Technical Program Manager, AI Factory Infrastructure
Role details
Job location
Tech stack
Job description
NVIDIA's AI Factory Infrastructure team develops global reference designs, de-risks technologies needed for the next generation of compute and network products and builds AI infrastructure at scale to validate solutions at scale. As a TPM on the team, you'll be responsible for leading teams that are highly multi-functional, including hardware, software and facility infrastructure teams, to deliver solutions and infrastructure. This role offers a truly unique blend of developing future solutions and products with building infrastructure at scale.
What You Will Be Doing:
- Collaborate with product owners and technical leads to identify and collect requirements for next-generation AI Factories.
- Build, supervise, and complete long-term programs including schedules, resourcing, and checkpoints.
- Work with data center and hardware teams to find creative solutions to hard problems, and co-develop solutions and mitigation strategies.
- Lead planning with key internal partners on capacity demands with engineering roadmaps and data center expansions.
- Own end-to-end delivery of 100MW+ AI factory data center deployments, from construction readiness through commissioning, turn-up, and handoff to operations.
- Coordinate general contractors, colocation providers, utilities, and OEMs to align electrical/mechanical scope, long-lead equipment, and site logistics for large-scale AI cluster deployments.
- Drive integrated readiness reviews and acceptance criteria across power, liquid cooling, networking, and platform/hardware teams to ensure performance and reliability targets are met for AI factory applications.
- Develop program plans for government grants and initiatives.
- Translate program requirements into Basis Of Design documents
- Bring together team members and foster a collaborative approach to delivery while holding team members accountable to action items and timelines.
Requirements
- Outstanding long-term planning and execution skills to carry our data center lifecycle planning, including large-scale AI factory buildouts and expansions.
- Experience managing end-to-end data center deployments for high-density AI infrastructure, including commissioning, readiness reviews, turn-up, and operational handoff.
- Demonstrated ability to coordinate across colocation providers, general contractors, utilities, and OEMs to deliver complex electrical and mechanical scope (e.g., at 100MW+ campus scale).
- Strong technical and program leadership across power delivery, liquid cooling, networking, and compute/platform teams to define acceptance criteria and ensure performance and reliability targets are met.
- 12+ years of experience providing program and project management leadership for data center projects covering construction of mechanical, electrical, and plumbing with large-scale server, storage, and network deployments.
- BS or MS degree in Engineering (or equivalent experience)., * In-depth knowledge of infrastructure (hardware and software) data center facilities infrastructure (electrical and mechanical) technologies.
- Familiarity with NVIDIA's AI compute technology stack and ability to translate platform requirements into data center infrastructure designs (power delivery, liquid cooling, space, and network topology) at scale.
- Experience with colocation data center environments
Benefits & conditions
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 200,000 USD - 322,000 USD.