Lead MLOps / AI Platform Engineer

SATCON Inc

Charlotte, United States of America

8 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Charlotte, United States of America

Tech stack

API

Artificial Intelligence

Azure

Cloud Computing

Profiling

Python

Network Security

Load Testing

Language Modeling

OpenShift

Performance Tuning

Reliability Engineering

TensorFlow

Graphics Processing Unit (GPU)

Google Cloud Platform

Large Language Models

Kubernetes Helm Charts

Parallel Computation

Generative AI

Containerization

AI Platforms

Kubernetes

HashiCorp

Machine Learning Operations

TensorRT

Terraform

Service Stack

Job description

We are seeking a highly skilled Lead MLOps / AI Platform Engineer to design, build, and optimize our next-generation Generative AI and Large Language Model (LLM) infrastructure. This role is pivotal in bridging the gap between cutting-edge AI research and robust production deployment. You will be responsible for orchestrating high-performance GPU environments (specifically leveraging Nvidia H200s), optimizing LLM inference, and maintaining enterprise-grade infrastructure across both Cloud (Google Cloud Platform/Azure) and On-Premise environments.

Deploy, scale, and manage large-scale language models using advanced inference frameworks such as vLLM, TensorRT-LLM, SGLang, and Triton Inference Server.
Implement and fine-tune performance optimization strategies, including Continuous Batching and advanced Parallelism techniques.
Conduct load testing, benchmarking, and profiling of LLM deployments using GuideLLM and Locust to ensure optimal latency and throughput.

Cloud & Infrastructure Orchestration

Architect and maintain scalable, secure infrastructure on Google Cloud Platform and Azure using Infrastructure as Code (Terraform).
Design and execute Cloud Networking, Landing Zones, and Organization Policies/Governance.
Manage secrets and secure workloads utilizing HashiCorp Vault.
Develop and champion Internal Developer Portals to streamline workflows for data science and product teams.

On-Premise & Kubernetes Engineering

Orchestrate ML workloads on Kubernetes, utilizing KServe, OpenShift AI / OpenShift Functions, and Helm charts/Operators.
Manage compute clusters with a heavy focus on advanced GPU Orchestration (Nvidia H200 ecosystems).
Demonstrate deep hands-on architecture expertise, including the know-how to "unfold" an LLM onto highly constrained or custom on-premise hardware setups.

Observability & SRE

Implement end-to-end ML Observability and monitoring frameworks using Arize AI.
Establish Site Reliability Engineering (SRE) best practices, maintaining strict SLOs/SLIs for model deployment pipelines and production APIs.

Requirements

Core AI / MLOps Stack:

Inference Engines: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
ML Frameworks/Ops: KServe, OpenShift AI, Arize AI, GenAI Platforms, RAG architecture
Performance & Testing: GuideLLM, Locust, Continuous Batching, Parallelism optimization

Infrastructure & Cloud Stack:

Cloud Providers: Google Cloud Platform (GCP), Microsoft Azure
Containerization & Orchestration: Kubernetes, OpenShift, Helm/Operators, GPU Orchestration
IaC & Automation: Terraform, Python
Security & Networking: HashiCorp Vault, Landing Zones, Org Policy & Governance

Hardware Sanity Check:

Mandatory Experience: Direct, hands-on experience working with Nvidia H200 GPUs and optimizing workloads specifically for this architecture.
