Deep Learning Solutions Architect - Inference Optimization
Job description
NVIDIA's Worldwide Field Operations (WWFO) team is seeking a Solutions Architect with a deep understanding of neural network inference. As customers adopt increasingly complex inference pipelines on state-of-the-art infrastructure, experts are needed to guide the integration of advanced inference techniques such as speculative decoding, request-scheduler optimizations, and FP4 quantization. The ideal candidate will be proficient with tools such as TRT-LLM, vLLM, SGLang, or similar, and have the strong systems knowledge needed to enable customers to fully use NVIDIA's new GB300 NVL72 systems.
What You Will Be Doing
- Work directly with key customers to understand their technology and provide the best AI solutions.
- Perform in-depth analysis and optimization to ensure the best performance on GPU architectures, including large-scale inference pipelines on Grace/ARM-based systems.
- Partner with Engineering, Product, and Sales teams to develop and plan the solutions best suited to customers, driving product feature growth through customer feedback and proof-of-concept evaluations.
Requirements
- Excellent verbal and written communication and technical presentation skills in English.
- MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other engineering fields.
- 5+ years of work or research experience in software development with Python, C++, or similar languages.
- Work experience with and knowledge of modern NLP, including a strong understanding of transformer, state-space, diffusion, and MoE model architectures.
- Understanding of key libraries used for NLP/LLM training, such as Megatron-LM, NeMo, and DeepSpeed, and deployment libraries such as TensorRT-LLM, vLLM, or Triton Inference Server.
- Enthusiastic about collaborating across Engineering, Product, Sales, and Marketing teams, thriving in dynamic environments and staying focused amid constant change.
- Self-starter with a growth mindset, passion for continuous learning, and a willingness to share findings across the team.
Ways To Stand Out From The Crowd
- Demonstrated experience in running and debugging large-scale distributed deep learning training or inference processes.
- Experience working with large transformer-based architectures for NLP, CV, ASR, or other domains.
- Applied NLP technology in production environments.
- Proficiency with DevOps tools including Docker, Kubernetes, and Singularity.
- Understanding of HPC systems: data-center design, high-speed interconnect (InfiniBand), cluster storage, and scheduling design and/or management experience.
Benefits & conditions
Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.
Benefits: https://www.nvidiabenefits.com