Sr. Technical Program Manager - Training at Scale

Advanced Micro Devices, Inc.
San Jose, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$ 206K

Job location

San Jose, United States of America

Tech stack

Artificial Intelligence
Nvidia CUDA
Computer Engineering
Distributed Computing Environment
Open Source Technology
Release Management
PyTorch
Information Technology
Machine Learning Operations

Job description

AMD is seeking a Senior Technical Program Manager (TPM) to lead Training at Scale programs for AMD Instinct products. You will drive end-to-end execution of large-scale AI training initiatives while owning multi-quarter planning, roadmap development, and the operating cadence that turns strategy into predictable delivery across the Training at Scale engineering portfolio.

In this role, you will be a core partner to engineering leadership, ensuring that near-term execution strength is matched by clear long-term plans, OKR rigor, and early risk/decision management across an evolving open-source AI ecosystem.

THE PERSON:

The ideal candidate is a highly structured program leader with technical depth in AI training frameworks and distributed training at scale, comfortable operating in ambiguity and turning strategy into executable roadmaps. You communicate crisply at all levels, build alignment across cross-functional teams, and proactively surface risks, tradeoffs, and decision points before they become delivery blockers.

You thrive in a fast-moving environment, bring a strong operating cadence (OKRs, reviews, dashboards), and can build durable planning mechanisms that reduce engineering overhead while improving delivery predictability.

  • Own Training at Scale portfolio planning: translate strategy into a multi-quarter roadmap, quarterly plans, and measurable outcomes.

  • Establish and run an execution operating model (OKRs, program reviews, decision logs, dashboards) to drive rigor, transparency, and predictable delivery.

  • Drive end-to-end delivery of large-scale AI training capabilities across cross-functional engineering teams; manage scope, milestones, dependencies, and critical path.

  • Apply technical judgment to identify and manage architecture-level tradeoffs, technical dependencies, and technical risk; proactively surface decision points and escalation paths.

  • Build alignment with engineering leadership and key stakeholders on priorities, sequencing, and resourcing.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

Requirements

  • Technical fluency in AI/ML systems, including distributed training and scalability/performance considerations.

  • Hands-on familiarity with AI training frameworks/ecosystems (e.g., PyTorch, JAX) and related tooling.

  • Understanding of GPU compute software stacks and performance considerations; familiarity with AMD ROCm and/or NVIDIA CUDA.

  • Experience working in open-source ecosystems (contributing, managing upstream dependencies, release planning, community/ecosystem coordination).

  • Track record of proactive risk management and executive-level stakeholder communication in ambiguous environments.

  • Master's or Bachelor's degree in Computer Engineering, Computer Science or Electrical Engineering is desired.

About the company

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
