Staff Machine Learning Engineer - ML Training Infrastructure

General Motors
Saint Paul, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$ 335K

Job location

Saint Paul, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Azure
Profiling
Computer Programming
ETL
Software Debugging
Distributed Computing Environment
Distributed Systems
General-Purpose Computing on Graphics Processing Units
Monitoring of Systems
Python
Machine Learning
TensorFlow
Software Engineering
AI Infrastructure
Google Cloud Platform
Cloud Platform System
PyTorch
Parallel Computation
Information Technology
Machine Learning Operations

Job description

We are seeking an experienced, technically strong, impact-driven expert in ML Training Infrastructure with a demonstrated ability to lead through hands-on technical work. In this role, you will be responsible for defining the technical direction and driving the design and development of scalable, reliable, and high-performance AI/ML platform infrastructure that enables advanced AI research and model development at scale.

As a Staff ML Engineer, you will operate as a technical leader across initiatives, partnering closely with machine learning engineers, research scientists, and platform teams to shape architecture, drive major technical decisions, and deliver state-of-the-art AI infrastructure that enables the future of intelligent driving technologies across General Motors vehicles.

What You'll Do:

  • Define and drive the architecture, design, and development of scalable, reliable, and high-performance ML frameworks and platform capabilities to support model training at scale.
  • Lead model training performance analysis and optimization efforts across distributed training workflows, improving scalability, efficiency, and cost across heterogeneous hardware environments.
  • Raise the bar on system observability, debuggability, operational excellence, and developer experience across the ML training stack.
  • Own large, ambiguous, cross-functional technical initiatives from strategy through execution, including technical roadmap definition, tradeoff analysis, and delivery.
  • Influence platform direction by identifying long-term infrastructure investments, setting engineering standards, and driving adoption of best practices across teams.
  • Collaborate across organizational boundaries to align requirements, resolve technical disagreements, and integrate new capabilities into the platform ecosystem.
  • Mentor engineers through design reviews, technical guidance, and hands-on partnership, while elevating engineering quality across the team.

This role is categorized as remote. This means the selected candidate may be based anywhere in the country of work and is not expected to report to a GM worksite unless directed by their manager.

Requirements

  • Bachelor's degree or higher in Computer Science or a related field, or equivalent practical experience.
  • 7+ years of professional software engineering experience.
  • 5+ years of specialized experience in AI/ML infrastructure, such as enabling distributed training for large-scale ML models.
  • Strong programming skills in Python, with deep proficiency in frameworks such as PyTorch (preferred), TensorFlow, or similar ML systems.
  • Proven experience designing and operating distributed systems for ML training, including distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure).
  • Demonstrated track record of leading technically ambiguous, cross-team infrastructure initiatives and driving them to measurable impact.
  • Strong architectural judgment and ability to make sound technical tradeoffs across performance, reliability, usability, and cost.
  • Willingness to travel to Sunnyvale, CA as needed.
  • Comfortable operating in highly ambiguous and dynamic environments.

What Will Give You a Competitive Edge (preferred qualifications):

  • Deep expertise in PyTorch 2.x+ and distributed training frameworks.
  • Experience designing and developing training platforms that support FSDP, pipeline parallelism, and other scalable solutions for training large foundational models.
  • Experience profiling, analyzing, debugging, and optimizing training and data loading performance at scale.
  • Strong record of technical leadership through architecture reviews, roadmap influence, and cross-team execution.
  • Excellent communication skills, with the ability to build consensus, navigate controversial decisions, communicate risks clearly, and provide constructive technical feedback.
  • Self-motivated, execution-oriented, and driven to deliver broad organizational impact.

Benefits & conditions

Compensation: The compensation information is a good faith estimate only. It is based on what a successful applicant might be paid in accordance with applicable state laws. The compensation may not be representative for positions located outside of the California Bay Area.

  • The salary range for this role is $185,000 to $335,300. The actual base salary a successful candidate will be offered within this range will vary based on factors relevant to the position.
  • Bonus Potential: An incentive pay program offers payouts based on company performance, job level, and individual performance.
  • Benefits: GM offers a variety of health and wellbeing benefit programs. Benefit options include medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts and more.

About the company

Company Vehicle: Upon successful completion of a motor vehicle report review, you will be eligible to participate in a company vehicle evaluation program, through which you will be assigned a General Motors vehicle to drive and evaluate. Note: program participants are required to purchase/lease a qualifying GM vehicle every four years unless one of a limited number of exceptions applies.

We believe we all must make a choice every day - individually and collectively - to drive meaningful change through our words, our deeds and our culture. Every day, we want every employee to feel they belong to one General Motors team.

General Motors is committed to being a workplace that is not only free of unlawful discrimination, but one that genuinely fosters inclusion and belonging. We strongly believe that providing an inclusive workplace creates an environment in which our employees can thrive and develop better products for our customers.
