Senior ML Infrastructure Engineer - Embodied AI Scaling Foundations
Job description
As a Senior ML Infra Engineer, you will build critical infrastructure that powers every machine learning engineer working on our cutting-edge autonomous driving models. From foundational models to state-of-the-art optimization, our goal is simple: dramatically accelerate the machine learning development cycle. We are committed to delivering products that are performant, easy to use, and exceptionally reliable. Your success will be measured by the success of our partner teams, who rely on our robust systems to build the world's most advanced driverless vehicles.
What you'll do:
- Lead the design, implementation, and deployment of scalable platforms and tools that drive machine learning model training and evaluation workflows across GM.
- Own complex technical projects end-to-end, making key architectural decisions and technical trade-offs. You will be a core contributor to team planning, design reviews, and code quality.
- Take a holistic view of projects, considering their impact across multiple teams, and proactively drive technical prioritization. Collaborate closely with partner teams to ensure they get the maximum benefit from the systems we build.
- Help shape our team through technical interviewing with high, well-calibrated standards, and play an essential role in recruiting. Mentor and onboard junior engineers and interns, helping them grow their careers.
Remote/Hybrid: This role is based remotely, but if you live within a 50-mile radius of an office, you are expected to report to that location at least three times a week.
Compensation: The compensation information is a good-faith estimate only. It is based on what a successful applicant might be paid in accordance with applicable state laws. The compensation may not be representative of positions located outside of the California Bay Area.
Requirements
- 3+ years of experience building large-scale distributed systems and applications or advanced ML applications.
- Proven track record of building robust frameworks with high-quality, long-lasting APIs.
- Deep understanding of, and practical experience with, machine learning algorithms.
- Expertise in building reliable, highly performant, and cost-efficient systems on modern cloud infrastructure.
- Hands-on experience with the entire ML development lifecycle and MLOps practices.
- Demonstrated ability to collaborate effectively across multiple teams and organizations.
- Proficiency with containerization and orchestration technologies (Docker, Kubernetes).
- A strong passion for self-driving technology and its transformative potential.
- Exceptional coding skills in Python or C++.
- BS, MS, or PhD in Computer Science, Math, or equivalent practical experience.
Exceptional candidates may also have:
- Experience with distributed training methodologies.
- A background in optimizing model training performance.
- Experience scaling model training across large clusters of GPUs, CPUs, or other accelerators.
- Familiarity with deep learning frameworks such as PyTorch or TensorFlow.
- A strong grasp of performance profiling and state-of-the-art training optimization algorithms, including their performance characteristics and effect on model convergence.
- Experience with advanced build systems (Bazel, Buck, Blaze, or CMake).
Benefits & conditions
- Salary: The salary range for this role is $153,200.00 to $234,100.00. The actual base salary offered to a successful candidate within this range will vary based on factors relevant to the position.
- Bonus Potential: An incentive pay program offers payouts based on company performance, job level, and individual performance.
- Benefits: GM offers a variety of health and wellbeing benefit programs. Benefit options include medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts, and more.