Senior Software Engineer, Training Efficiency
Deep East Texas Council of Governments
San Francisco, United States of America
yesterday
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: San Francisco, United States of America
Tech stack
Profiling
Distributed Systems
Data Flow Control
Python
Machine Learning
TensorFlow
Data Processing
Information Technology
Machine Learning Operations
Data Pipelines
Job description
- The Waymo ML Infrastructure team works with Research and Production teams to develop models in Perception and Planning that are core to our autonomous driving software
- We help our partners by offering the best solutions for the entire model development lifecycle
- These solutions are developed in close collaboration with teams at Google
- They are geared towards both scaling models and solving problems unique to ML for autonomous driving
- You will improve the runtime efficiency of input data pipelines for large-scale training workloads
- This is a unique opportunity to work on ML systems and improve on our model training processes
- Design and improve distributed input data pipelines for large-scale ML training workloads
- Collaborate with researchers and ML engineers to resolve bottlenecks in data pipeline performance
- Improve the runtime goodput of ML training workloads, including by optimizing input data processing systems and ensuring scalability and reliability across distributed environments
- Implement and maintain advanced ML infrastructure tools, including ML Pathways, Grain, JAX, and TensorFlow
- Evaluate and integrate modern technologies to enhance the performance and scalability of ML systems
- Promote best practices for distributed systems architecture and contribute to technical leadership within the team
Requirements
- B.S. in Computer Science or Math, or 8+ years of equivalent real-world experience
- Proficient in distributed systems design with an understanding of ML data pipeline optimization
- Experience with ML frameworks, including TensorFlow and JAX
- Hands-on experience with libraries like Grain or tf.data service
- Solid programming skills in Python and C++
- Practical familiarity with profiling tools to uncover performance bottlenecks
- (Desirable) MS in Computer Science or Math
- (Desirable) Familiarity with distributed dataflow frameworks like ML Pathways