Senior Staff Software Engineer - AI Agent Platform
NVIDIA Ltd.
Santa Clara, United States of America
yesterday
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $ 96K
Job location: Santa Clara, United States of America
Tech stack
API
Artificial Intelligence
Encodings
Continuous Delivery
Software Design Documents
Programming Tools
Distributed Systems
Python
Key Management
PostgreSQL
MongoDB
OAuth
Open Source Technology
Role-Based Access Control
Redis
Session Management
Software Engineering
Data Ingestion
React
Flask
Delivery Pipeline
FastAPI
Vue.js
GitLab CI
Kubernetes
Information Technology
Kafka
Virtual Agents
Data Pipelines
Job description
We are looking for a Sr. Engineer to design, build, and scale the infrastructure powering NVIDIA's AI agent ecosystem. You will work at the intersection of distributed systems, developer platforms, and agentic AI - building the foundational services that enable teams across the company to develop, deploy, orchestrate, and operate autonomous AI agents at production scale.
What you will be doing:
- Build and develop platform services that own the full agent lifecycle from registration through deployment, execution, and teardown
- Architect Kubernetes-based execution environments with pod lifecycle management, namespace isolation, persistent storage, and identity propagation
- Develop and maintain automated CI/CD pipelines using GitLab CI and ArgoCD, including reusable pipeline templates and deployment blueprints that standardize how agents are built across teams
- Build framework-agnostic infrastructure supporting multiple agent SDKs (Claude Code, OpenAI Codex, LangGraph), with hands-on experience using harnesses, lifecycle hooks, skills configurability, observability (OTEL), and memory services
- Build and operate Kafka-based message pipelines and real-time event streaming using Redis PubSub and SSE
- Develop data ingestion pipelines, access interfaces, and storage layers that power AI agent knowledge and context
- Implement session management for state persistence, conversation history, and agent recovery across sessions
- Develop multi-layer auth using OAuth 2.0, JWT validation, token exchange, and gateway integration, and manage secrets lifecycle with Vault (provisioning, rotation, container injection)
- Partner with security teams on compliance, access controls, and approval workflows for agent operations
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or related field (or equivalent experience), with 8+ years in software engineering - ideally in platform engineering, infrastructure, or developer tools
- Experience building and scaling AI agents in production using frameworks like Claude Code, Codex, or LangGraph
- Deep Kubernetes expertise including pod orchestration, persistent storage, RBAC, and multi-cluster management
- Strong Python skills with production API experience using FastAPI, Flask, or similar async frameworks
- Proven track record designing distributed systems with Kafka, Redis, and MongoDB or PostgreSQL
- Expertise building and managing robust CI/CD pipelines using GitLab CI and ArgoCD for continuous delivery to Kubernetes
- Experience designing AI data platform components (ingestion pipelines, vector stores, retrieval APIs, data preprocessing workflows) and building developer-facing platform APIs consumed by multiple engineering teams
- Solid grasp of auth and identity: OAuth 2.0, JWT, token exchange, and secrets management with Vault
- History of leading sophisticated technical projects such as migrations or greenfield platform builds, with strong interpersonal skills to drive alignment across teams and write clear design documents
- Experience building or operating AI agent platforms or agentic workflow systems, with hands-on expertise in agent protocols and frameworks like MCP, A2A, LangChain, or LangGraph
- Hands-on experience with RAG architectures, embedding pipelines, and vector databases (Milvus, Pinecone, or Weaviate)
- Full-stack skills with React or Vue for building developer portals and dashboards
- Contributions to open-source infrastructure or platform tooling
Benefits & conditions
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 270,250 USD for Level 4, and 200,000 USD - 322,000 USD for Level 5.