Senior Applied Deep Learning Scientist - Large Vision Language Models

NVIDIA Corporation

Zürich, Switzerland

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Zürich, Switzerland

Tech stack

Training Data

Artificial Intelligence

Computer Vision

Linux

Python

Language Modeling

Open Source Technology

PyTorch

Large Language Models

Deep Learning

Information Technology

Machine Learning Operations

Software Coding

Data Pipelines

Docker

Job description

We are looking for a highly motivated Senior Applied Deep Learning Scientist with a passion for multimodal language models. Join our world-class NVIDIA team, spanning Finland, Germany, the Netherlands, and the USA, behind pioneering work such as Megatron-Energon, Nemotron 3 Nano Omni and our latest post-training datasets!

As core contributors to NVIDIA's Nemotron multimodal initiative, we are pushing the frontiers of state-of-the-art open-source multimodal models. Our perspective is distinctive: we strive for open models, open weights, and open data. Our mission is straightforward: create models that perform exceptionally in real-world applications right out of the box, while empowering and advancing the broader multimodal LLM ecosystem. As an applied research group, we prioritize delivering tangible impact and solving real-world problems. We're most excited when deep learning moves beyond theory into production at scale. If you share a pragmatic, delivery-focused perspective and care about turning research into reality, this team will feel like the right home!

What you will be doing:

Push the boundaries of the NVIDIA Nemotron Omni family of models to enable powerful downstream applications, including document intelligence, mathematical reasoning, multi-turn multimodal dialogue systems, and advanced software & agentic assistants. The role spans the full pipeline, from pre-training through post-training.
Help us prepare large-scale multimodal datasets to train cutting-edge foundation models across text, image, audio and video. This includes developing robust data processing pipelines to curate high-quality training data, augmenting it, synthetically generating labels and providing the infrastructure to load and serve data in real time.
Collaborate globally with other team members, with researchers and developers from different departments at NVIDIA, and with the AI startups we work with, to turn research and innovations into impactful products.

Requirements

M.Sc. or Ph.D. in Computer Science (or a related field), or equivalent research experience in LLMs, systems, or connected areas.
10+ years of industry experience in computer vision, including designing data pipelines for diverse data modalities and deploying models from research into production.
Strong understanding of the theoretical foundations of LLMs/VLMs and familiarity with the latest academic developments in the field.
Solid hands-on coding skills with PyTorch and Python, experience with multi-GPU training on large-scale compute clusters, fluency with Docker, and Linux systems expertise.

Ways to stand out from the crowd:

Contributions to open-source LLM systems or large-scale AI infrastructure.
Previous AI-related projects or entrepreneurial experience in a closely connected domain.
An academic track record of publications in deep learning.

About the company

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!
