ML Infrastructure Engineer
Job description
- Lead the design, implementation, and deployment of scalable platforms and tools that drive machine learning model training and evaluation workflows across GM.
- Own complex technical projects end-to-end, making key architectural decisions and technical trade-offs. You will be a core contributor to team planning, design reviews, and code quality.
- Take a holistic view of projects, considering their impact across multiple teams, and proactively drive technical prioritization. Collaborate closely with partner teams to ensure maximum benefit from the systems we build.
- Help shape our team through technical interviewing with high, well-calibrated standards, and play an essential role in recruiting. Mentor and onboard junior engineers and interns, helping them grow their careers.
Requirements
- 3+ years of experience building large-scale distributed systems/applications or advanced ML applications.
- Proven track record of building robust frameworks with high-quality, long-lasting APIs.
- Deep understanding of, and practical experience with, machine learning algorithms.
- Expertise in building reliable, highly performant, and cost-efficient systems leveraging modern cloud infrastructure.
- Hands-on experience with the entire ML development lifecycle and MLOps practices.
- Demonstrated ability to collaborate effectively across multiple teams and organizations.
- Proficiency working with containerization and orchestration technologies (Docker, Kubernetes).
- A strong passion for self-driving technology and its transformative potential.
- Exceptional coding skills in Python or C++.
- BS, MS, or PhD in Computer Science, Math, or equivalent practical experience.
Exceptional candidates may also have:
- Experience with distributed training methodologies.
- A background in optimizing model training performance.
- Experience scaling model training across large clusters of GPUs or other accelerators.
- Familiarity with deep learning frameworks such as PyTorch, TensorFlow, etc.
- A strong grasp of performance profiling and state-of-the-art training optimization algorithms, including their performance characteristics and effect on model convergence.
- Experience with advanced build systems (Bazel, Buck, Blaze, or CMake).
Remote/Hybrid: This role is based remotely, but if you live within a 50-mile radius of an office, you are expected to report to that location at least three times a week.
Benefits & conditions
Compensation: The compensation information is a good faith estimate only. It is based on what a successful applicant might be paid in accordance with applicable state laws. The compensation may not be representative for positions located outside of the California Bay Area.
- The salary range for this role is $153,200.00 to $234,100.00. The actual base salary a successful candidate will be offered within this range will vary based on factors relevant to the position.
- Bonus Potential: An incentive pay program offers payouts based on company performance, job level, and individual performance.
Benefits:
- GM offers a variety of health and wellbeing benefit programs. Benefit options include medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts and more.