Deep Learning Solutions Architect - Inference Optimization

NVIDIA
Madrid, Spain

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Madrid, Spain

Tech stack

Artificial Neural Networks
C++
Computer Engineering
Data Centers
Software Debugging
DevOps
InfiniBand
Python
Software Engineering
Graphics Processing Unit (GPU)
Large Language Models
Deep Learning
Kubernetes
Information Technology
Docker

Job description

NVIDIA's Worldwide Field Operations (WWFO) team is seeking a Solution Architect with a deep understanding of neural network inference. As customers adopt increasingly complex inference pipelines on state-of-the-art infrastructure, experts are needed to guide the integration of advanced inference techniques such as speculative decoding, request-scheduler optimizations, and FP4 quantization. The ideal candidate will be proficient with tools such as TRT-LLM, vLLM, or SGLang, and will have strong systems knowledge to enable customers to make full use of NVIDIA's new GB300 NVL72 systems.

What You Will Be Doing

  • Work directly with key customers to understand their technology stack and provide the best-fit AI solutions.
  • Perform in-depth analysis and optimization to ensure the best performance on NVIDIA GPU architectures, including large-scale inference pipelines on Grace/ARM-based systems.
  • Partner with Engineering, Product, and Sales teams to develop and plan the most suitable solutions for customers, enabling product feature growth through customer feedback and proof-of-concept evaluations.

Requirements

  • Excellent verbal and written communication skills and technical presentation skills in English.
  • MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other engineering fields.
  • 5+ years of work or research experience in software development with Python, C++, or similar languages.
  • Work experience with and knowledge of modern NLP, including a strong understanding of transformer, state-space, diffusion, and MoE model architectures.
  • Understanding of key libraries used for NLP/LLM training, such as Megatron-LM, NeMo, DeepSpeed, and deployment libraries like TensorRT-LLM, vLLM, or Triton Inference Server.
  • Enthusiastic about collaborating across Engineering, Product, Sales, and Marketing teams, thriving in dynamic environments and staying focused amid constant change.
  • Self-starter with a growth mindset, passion for continuous learning, and a willingness to share findings across the team.

Ways To Stand Out From The Crowd

  • Demonstrated experience in running and debugging large-scale distributed deep learning training or inference processes.
  • Experience working with large transformer-based architectures for NLP, CV, ASR, or other domains.
  • Experience applying NLP technology in production environments.
  • Proficiency with DevOps tools including Docker, Kubernetes, and Singularity.
  • Understanding of HPC systems: data-center design, high-speed interconnects (InfiniBand), cluster storage, and scheduler design and/or management experience.

Benefits & conditions

Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.

Benefits: https://www.nvidiabenefits.com
