Solutions Architect, Inference Deployments
Job description
We're forming a team of innovators to roll out and enhance AI inference solutions at scale, using NVIDIA's GPU technology and Kubernetes. As a Solutions Architect focused on inference, you'll collaborate closely with our engineering and DevOps teams and with customers to develop enterprise AI solutions. Together, we'll deliver generative AI to production!
What you'll be doing:
- Build inference pipelines with tools like NVIDIA Dynamo, distributing tasks among GPU workers to improve efficiency.
- Collaborate with DevOps teams to orchestrate disaggregated inference using Kubernetes for complex workloads.
- Accelerate inference pipelines using TensorRT-LLM, vLLM, SGLang, and other backends to ensure seamless integration with disaggregated inference.
- Provide mentorship and technical leadership to customers and internal teams, guiding them through the deployment of disaggregated inference systems and resolving complex issues.
Requirements
- 5+ years in solutions architecture with a proven track record of deploying distributed systems and AI inference workloads on Kubernetes.
- Experience with at least one of NVIDIA Dynamo, Triton Inference Server, or TensorRT-LLM for model optimization and serving.
- Experience with GPU orchestration using NVIDIA GPU Operator, NIM Operator, and Multi-Instance GPU (MIG) partitioning.
- Experience solving complex problems in GPU allocation, memory hierarchies (HBM, DRAM, SSD), and low-latency networking (RDMA, UCX).
- Demonstrated success in tuning large language models for low-latency inference in enterprise environments.
- BS in CS/Engineering or equivalent experience.
Ways to stand out from the crowd:
- Prior experience deploying NVIDIA inference technologies such as Dynamo, NIM, NIXL, and Grove.
- Deep understanding of transformer neural networks and of inference acceleration techniques such as quantization, speculative decoding, and WideEP.
- NVIDIA Certified AI Engineer or similar credentials.
- Contributions to open-source projects including NVIDIA Dynamo, vLLM, KServe, or SGLang.
Benefits & conditions
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD to 241,500 USD.