Engineering Manager (ML Platform)
Zoox
Foster City, United States of America
19 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $317K
Job location: Remote; Foster City, United States of America
Tech stack
Cloud Computing
Machine Learning
Software Architecture
Large Language Models
Deep Learning
Machine Learning Operations
TensorRT
Hardware Infrastructure
Job description
- Our growing Software Infrastructure engineering leadership team is looking for a Senior Engineering Manager, ML Platform
- The centralized ML Platform team at Zoox plays a crucial role in enabling innovations across all our Autonomy and Data Science teams to develop and deploy models across our robotaxi and cloud infrastructure, and to work on cutting-edge training and inference optimization techniques
- We are working on many interesting challenges to enable rapid experimentation, scale our multi-modal Foundation models and RL infrastructure, and ensure these models run efficiently on our vehicles while meeting our latency targets
- You will get to work across all ML teams within Zoox - Perception, Prediction, Planner, Simulation, Collision Avoidance, and our Advanced Hardware Engineering group, and have the opportunity to significantly push the boundaries of how ML is practiced within Zoox
- We build and operate the base layer of ML tools, deep learning frameworks, and inference libraries used by our applied research teams for in- and off-vehicle ML use cases
- You will lead a team of strong software engineers and managers and act as a force multiplier for our internal customers
- This team has many growth opportunities as we expand our robotaxi deployments and venture into new ML domains
- Vision: Develop and execute a strategic vision for our ML training platform, ensuring scalability, reliability, and performance to support large-scale Foundation and RL models
- Technical acumen: Lead the design, implementation, and operation of a robust and efficient ML training platform to enable the training, experimentation, validation, and monitoring of ML models
- Hiring: Attract, hire, and inspire a diverse world-class engineering team, fostering a culture of innovation, collaboration, and excellence
- Partnership: Collaborate closely with cross-functional teams, including ML researchers, software engineers, data engineers, and hardware engineers to define requirements and align on architectural decisions
- Mentorship: Enable the engineers in the team to grow their careers by providing the right opportunities along with clear and timely feedback
Requirements
- Experience with training frameworks like PyTorch, JAX, etc., leveraging GPUs for distributed model training
- 10+ years of relevant experience, including 4+ years managing other managers and engineers
- Experience with GPU-accelerated inference using TensorRT, Ray Serve, or similar frameworks
- Experience building user-friendly ML Infrastructure that enabled large-scale model training and high-throughput, low-latency serving use cases
Benefits & conditions
- Paid parental leave
- Affinity groups and sports clubs
- Work from home opportunities
- Health insurance
- Our crew's health and happiness are our first priority. We offer comprehensive health and mental health support, a wellbeing program, and unlimited and flexible paid time away
- We invest in our crew, and their families, for the long term. That includes generous family planning support, caregiver support, and strong cash compensation with great equity upside
- We look after our crew when they're in the office too. Our famous food program is a great example, featuring a daily changing menu of local and sustainable dishes
- There's a busy calendar of social events at Zoox, with more sports teams than you can count. And, of course, playing with robots is an important part of the job description