Senior System Software Engineer - Embedded AI Inference

NVIDIA

München, Germany

4 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

München, Germany

Tech stack

Artificial Intelligence

Systems Engineering

Unit Testing

C++

CMake

Static Program Analysis

Profiling

Software Quality

Code Review

NVIDIA CUDA

Continuous Integration

Software Debugging

Linux

General-Purpose Computing on Graphics Processing Units

Python

Machine Learning

Open Source Technology

Software Engineering

System Software

Graphics Processing Unit (GPU)

Real Time Systems

PyTorch

Large Language Models

Deep Learning

GPU Programming

Git

Information Technology

Automotive

Decoding

Data Pipelines

Job description

We're hiring a Senior Software Engineer to develop production automotive software for AI inference and agent orchestration in C++. Join us on an exhilarating journey, where you'll build out the foundation for next-generation automotive software applications: in-car agentic AI and inference of cutting-edge AI models (LLM, VLM, VLA). You will have the opportunity to shape cutting-edge AI frameworks that enable unprecedented in-car AI experiences and provide a reliable backbone for a new generation of autonomous vehicles.

Design, implement, and maintain C++ agentic AI and AI inference solutions for embedded production platforms.
Integrate PyTorch Deep Learning models into C++ pipelines, and deploy them for real-time inference on NVIDIA GPUs.
Build and extend testable, modular libraries and components, including interfaces to models, sensor drivers, and vehicle control.
Profile, debug, and optimize C++ and CUDA code to meet strict latency and throughput targets.
Collaborate closely with ML researchers, systems engineers, and automotive partners to turn prototype algorithms into production-ready implementations.

Requirements

8+ years of professional software engineering experience, ideally in high-performance safety-critical software, automotive, robotics, or real-time systems.
Master's or PhD degree in Computer Science or Machine Learning.
Strong modern C++ (C++14/17 or later): templates, RAII, smart pointers, STL, and experience building large codebases.
Solid Python skills for tooling, training scripts, and glue code between data pipelines and C++ components.
Hands-on experience building agentic AI frameworks and with LLM/VLM inference, including related optimization techniques such as speculative decoding, LoRA, and MoE.
Experience developing on Linux: build systems (CMake), debugging (gdb, sanitizers), profiling, and git-based workflows in a CI/CD environment.
Familiarity with GPU programming and optimization, ideally with TensorRT.

Ways to stand out from the crowd

Experience with agentic AI, specifically agents based on edge-friendly models (2-7B parameters), including context management, reliable tool calling, and MCP, as well as experience with agentic coding.
Direct experience with the NVIDIA DRIVE AGX platform.
Knowledge of AI model optimization and deployment: quantization (INT8, FP8, 4-bit).
Familiarity with high-performance LLM inference frameworks like TensorRT-LLM or ONNX Runtime.
Understanding of software quality practices for safety-critical systems (code review, unit testing, static analysis; automotive standards knowledge is a plus).
Open-source contributions or published work in AI, robotics, or GPU computing.

Work on challenging, real-world in-car AI inference problems where your ML and C++ skills directly impact the cabin experience and the vehicle's self-driving capabilities. Collaborate with a talented, multidisciplinary team of researchers, engineers, and automotive experts. Solve hard technical problems at the intersection of deep learning, real-time systems, and production software engineering.