Senior AI-Native Systems Software Engineer, TensorRT

NVIDIA Ltd.

Santa Clara, United States of America

8 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$242K

Job location

Santa Clara, United States of America

Tech stack

Artificial Intelligence

C++

Profiling

Nvidia CUDA

Computer Engineering

Python

Performance Tuning

Software Architecture

Rapid Prototyping Process

Software Engineering

Large Language Models

Multi-Agent Systems

Deep Learning

Generative AI

Build Management

Information Technology

TensorRT

Stable Diffusion

Virtual Agents

C++14

Software Performance

Job description

Are you passionate about redefining how software is built in the age of Generative AI? Join NVIDIA's TensorRT team to help lead a first-of-its-kind, AI-native initiative designed to make TensorRT the default entry point for out-of-framework inference globally. We are moving beyond traditional development cycles with a new framework built from the ground up to leverage swarms of AI agents to produce high-performance, high-quality, modern C++ software at an unprecedented scale.

If you are a systems-thinking C++ engineer who wants to help scale out an agentic development framework, stay on top of state-of-the-art deep learning breakthroughs, and improve users' experience with lightning-fast model onboarding, we want to hear from you!

What you'll be doing:

Architecting an AI-native framework: Help design and build a codebase and architecture that scales beyond human capacity, supporting large numbers of AI agents working in parallel to generate, test, and validate production-grade software.
Scaling through agentic workflows: Improve the ratio of compute-to-software output by adopting and building AI-native tools, multi-agent orchestrators, and codebase harnesses that keep humans focused on the highest-value work.
Rapid prototyping with SOTA models: Act as a technical scout, identifying industry and academic breakthroughs (e.g., new attention mechanisms, KV cache strategies) and dispatching AI agent swarms to prototype and integrate these capabilities into our framework.
Delivering a great user experience: Ensure a seamless, high-performance path to production for the latest model families (LLMs, Diffusion, Audio, Vision, and multi-modal models).
Extreme performance optimization: Work at the intersection of Python orchestration and C++ engine-level optimizations to achieve major latency and throughput gains for critical customer use cases.

Requirements

BS, MS, or PhD in Computer Science, Computer Engineering, AI, or equivalent experience.
4+ years of relevant software development experience.
Strong modern C++ skills: Proficiency with C++11/14/17 (or newer) and the STL, with an emphasis on clean, maintainable, performant code.
Deep learning familiarity: Experience with modern inference frameworks and an understanding of the architectural nuances of LLMs, Diffusion, and multi-modal models.
Systems thinking: Interest in how software architecture must evolve to support automated, agent-driven development and indefinitely scaling codebases.
End-to-end product sense: Ability to translate high-level customer needs into concrete technical requirements and user-centric solutions.
Pragmatic execution: Demonstrated ability to go from customer requests to production-quality software on tight timelines.
Collaborative mindset: Excellent communication skills and comfort working across internal organizations and with customers.

Ways to stand out from the crowd:

Agentic framework experience: Hands-on work with AI agent orchestrators or multi-agent coding frameworks, or experience building custom agentic coding harnesses for production software.
CUDA & kernel expertise: Experience with CUDA programming or exposure to kernel generation / autotuning efforts.
High-velocity prototyping: A track record of rapidly turning state-of-the-art papers into working prototypes in days, not weeks.
Performance profiling skills: Expertise in software performance analysis, profiling, and optimization (CPU and/or GPU), including using tooling to drive measurable wins.

Benefits & conditions

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, autonomous, and love a challenge, come join our team and help us build the future of high-performance AI inference technology!

#LI-Hybrid

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.