Senior Software Engineer, Training Efficiency

Deep East Texas Council of Governments
San Francisco, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

San Francisco, United States of America

Tech stack

Profiling
Distributed Systems
Data Flow Control
Python
Machine Learning
TensorFlow
Data Processing
Information Technology
Machine Learning Operations
Data Pipelines

Job description

  • The Waymo ML Infrastructure team works with Research and Production teams to develop models in Perception and Planning that are core to our autonomous driving software
  • We help our partners by offering the best solutions for the entire model development lifecycle
  • These solutions are developed in close collaboration with teams at Google
  • They are geared towards both scaling models and solving problems unique to ML for autonomous driving
  • You will improve the runtime efficiency of input data pipelines for large-scale training workloads
  • This is a unique opportunity to work on ML systems and improve our model training processes
  • Design and improve distributed input data pipelines for large-scale ML training workloads
  • Collaborate with researchers and ML engineers to resolve bottlenecks in data pipeline performance
  • Improve the runtime goodput of ML training workloads by optimizing input data processing systems and ensuring scalability and reliability across distributed environments
  • Implement and maintain advanced ML infrastructure tools, including ML Pathways, Grain, JAX, and TensorFlow
  • Evaluate and integrate modern technologies to enhance the performance and scalability of ML systems
  • Promote best practices for distributed systems architecture and contribute to technical leadership within the team

Requirements

  • B.S. in Computer Science, Math, or 8+ years of equivalent real-world experience
  • Proficient in distributed systems design with an understanding of ML data pipeline optimization
  • Experience with ML frameworks, including TensorFlow and JAX
  • Hands-on experience with libraries like Grain or tf.data service
  • Solid programming skills in Python and C++
  • Practical familiarity with profiling tools to uncover performance bottlenecks
  • (Desirable) MS in Computer Science or Math
  • (Desirable) Familiarity with distributed dataflow frameworks like ML Pathways
