Technical Program Manager, AI Factory Infrastructure

NVIDIA Ltd.

Santa Clara, United States of America

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$ 322K

Job location

Santa Clara, United States of America

Tech stack

Artificial Intelligence

Data Centers

Software Design Documents

Network Topologies

Network Installation Services

AI Infrastructure

Service Stack

Job description

NVIDIA's AI Factory Infrastructure team develops global reference designs, de-risks technologies needed for the next generation of compute and network products and builds AI infrastructure at scale to validate solutions at scale. As a TPM on the team, you'll be responsible for leading teams that are highly multi-functional, including hardware, software and facility infrastructure teams, to deliver solutions and infrastructure. This role offers a truly unique blend of developing future solutions and products with building infrastructure at scale.

What You Will Be Doing:

Collaborate with product owners and technical leads to identify and collect requirements for next-generation AI Factories.
Build, supervise, and complete long-term programs including schedules, resourcing, and checkpoints.
Work with data center and hardware teams to find creative solutions to hard problems, and co-develop solutions and mitigation strategies.
Lead planning with key internal partners on capacity demands with engineering roadmaps and data center expansions.
Own end-to-end delivery of 100MW+ AI factory data center deployments, from construction readiness through commissioning, turn-up, and handoff to operations.
Coordinate general contractors, colocation providers, utilities, and OEMs to align electrical/mechanical scope, long-lead equipment, and site logistics for large-scale AI cluster deployments.
Drive integrated readiness reviews and acceptance criteria across power, liquid cooling, networking, and platform/hardware teams to ensure performance and reliability targets are met for AI factory applications.
Develop program plans for government grants and initiatives.
Translate program requirements into Basis Of Design documents
Bring together team members and foster a collaborative approach to delivery while holding team members accountable to action items and timelines.

Requirements

Outstanding long-term planning and execution skills to carry our data center lifecycle planning, including large-scale AI factory buildouts and expansions.
Experience managing end-to-end data center deployments for high-density AI infrastructure, including commissioning, readiness reviews, turn-up, and operational handoff.
Demonstrated ability to coordinate across colocation providers, general contractors, utilities, and OEMs to deliver complex electrical and mechanical scope (e.g., at 100MW+ campus scale).
Strong technical and program leadership across power delivery, liquid cooling, networking, and compute/platform teams to define acceptance criteria and ensure performance and reliability targets are met.
12+ years of experience providing program and project management leadership for data center projects covering construction of mechanical, electrical, and plumbing with large-scale server, storage, and network deployments.
BS or MS degree in Engineering (or equivalent experience)., * In-depth knowledge of infrastructure (hardware and software) data center facilities infrastructure (electrical and mechanical) technologies.
Familiarity with NVIDIA's AI compute technology stack and ability to translate platform requirements into data center infrastructure designs (power delivery, liquid cooling, space, and network topology) at scale.
Experience with colocation data center environments

Benefits & conditions

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 200,000 USD - 322,000 USD.