Deep Learning Solutions Architect - Inference Optimization
Job description
NVIDIA's Worldwide Field Operations (WWFO) team is seeking a Solutions Architect with a deep understanding of neural network inference. As customers adopt increasingly complex inference pipelines on state-of-the-art infrastructure, experts are needed to guide the integration of advanced inference techniques such as speculative decoding, request-scheduler optimizations, and FP4 quantization. The ideal candidate will be proficient with tools such as TRT-LLM, vLLM, SGLang, or similar, and have the strong systems knowledge needed to enable customers to fully use NVIDIA's new GB300 NVL72 systems.
What You Will Be Doing
- Work directly with key customers to understand their technology and provide the best AI solutions.
- Perform in-depth analysis and optimization to ensure the best performance on GPU architectures, including large-scale inference pipelines on Grace/ARM-based systems.
- Partner with Engineering, Product, and Sales teams to develop and plan the solutions best suited to customers, driving product feature growth through customer feedback and proof-of-concept evaluations.
Requirements
- Excellent verbal and written communication and technical presentation skills in English.
- MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other engineering fields.
- 5+ years of work or research experience in software development with Python, C++, or similar languages.
- Work experience with and knowledge of modern NLP, including a strong understanding of transformer, state-space, diffusion, and MoE model architectures.
- Understanding of key libraries used for NLP/LLM training, such as Megatron-LM, NeMo, and DeepSpeed, and deployment libraries such as TensorRT-LLM, vLLM, or Triton Inference Server.
- Enthusiastic about collaborating across Engineering, Product, Sales, and Marketing teams, thriving in dynamic environments and staying focused amid constant change.
- Self-starter with a growth mindset, passion for continuous learning, and a willingness to share findings across the team.
Ways To Stand Out From The Crowd
- Demonstrated experience in running and debugging large-scale distributed deep learning training or inference processes.
- Experience working with large transformer-based architectures for NLP, CV, ASR, or other domains.
- Applied NLP technology in production environments.
- Proficiency with DevOps tools including Docker, Kubernetes, and Singularity.
- Understanding of HPC systems: data-center design, high-speed interconnect (InfiniBand), cluster storage, and scheduling design and/or management experience.
Benefits & conditions
Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.
Benefits: https://www.nvidiabenefits.com