Lead MLOps / AI Platform Engineer

SATCON Inc
Charlotte, United States of America
8 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior

Job location

Charlotte, United States of America

Tech stack

API
Artificial Intelligence
Azure
Cloud Computing
Profiling
Python
Network Security
Load Testing
Language Modeling
OpenShift
Performance Tuning
Reliability Engineering
TensorFlow
Graphics Processing Unit (GPU)
Google Cloud Platform
Large Language Models
Kubernetes Helm Charts
Parallel Computation
Generative AI
Containerization
AI Platforms
Kubernetes
HashiCorp
Machine Learning Operations
TensorRT
Terraform
Service Stack

Job description

We are seeking a highly skilled Lead MLOps / AI Platform Engineer to design, build, and optimize our next-generation Generative AI and Large Language Model (LLM) infrastructure. This role is pivotal in bridging the gap between cutting-edge AI research and robust production deployment. You will orchestrate high-performance GPU environments (specifically Nvidia H200s), optimize LLM inference, and maintain enterprise-grade infrastructure across both cloud (Google Cloud Platform and Azure) and on-premise environments.

Key responsibilities:

  • Deploy, scale, and manage large-scale language models using advanced inference frameworks such as vLLM, TensorRT-LLM, SGLang, and Triton Inference Server.
  • Implement and tune performance optimization strategies, including continuous batching and advanced parallelism techniques.
  • Conduct load testing, benchmarking, and profiling of LLM deployments using GuideLLM and Locust to ensure optimal latency and throughput.
  1. Cloud & Infrastructure Orchestration
  • Architect and maintain scalable, secure infrastructure on Google Cloud Platform and Azure using Infrastructure as Code (Terraform).
  • Design and implement cloud networking, landing zones, and organization policies and governance.
  • Manage secrets and secure workloads utilizing HashiCorp Vault.
  • Develop and champion Internal Developer Portals to streamline workflows for data science and product teams.
  2. On-Premise & Kubernetes Engineering
  • Orchestrate ML workloads on Kubernetes using KServe, OpenShift AI, OpenShift Functions, and Helm charts and Operators.
  • Manage compute clusters with a heavy focus on advanced GPU orchestration (Nvidia H200 ecosystems).
  • Apply deep, hands-on architectural expertise, including the know-how to "unfold" an LLM onto highly constrained or custom on-premise hardware.
  3. Observability & SRE
  • Implement end-to-end ML Observability and monitoring frameworks using Arize AI.
  • Establish Site Reliability Engineering (SRE) best practices, maintaining strict SLOs/SLIs for model deployment pipelines and production APIs.

Requirements

Core AI / MLOps Stack:

  • Inference Engines: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
  • ML Frameworks/Ops: KServe, OpenShift AI, Arize AI, GenAI Platforms, RAG architecture
  • Performance & Testing: GuideLLM, Locust, Continuous Batching, Parallelism optimization

Infrastructure & Cloud Stack:

  • Cloud Providers: Google Cloud Platform (GCP), Microsoft Azure
  • Containerization & Orchestration: Kubernetes, OpenShift, Helm/Operators, GPU Orchestration
  • IaC & Automation: Terraform, Python
  • Security & Networking: HashiCorp Vault, Landing Zones, Org Policy & Governance

Hardware Sanity Check:

  • Mandatory Experience: Direct, hands-on experience working with Nvidia H200 GPUs and optimizing workloads specifically for this architecture.
