Technical Lead Manager (ML Platform Infrastructure)

Nuro Inc.

Mountain View, United States of America

10 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$235K

Job location

Remote

Mountain View, United States of America

Tech stack

Amazon Web Services (AWS)

Systems Engineering

Azure

Big Data

Cloud Computing

Cloud Engineering

Computer Clusters

ETL

Device Drivers

Distributed Computing Environment

Distributed Systems

Machine Learning

Redis

Ceph

Spark

Caching

Backend

Kubernetes

Information Technology

Slurm

Machine Learning Operations

Apache Beam

NVMe

Job description

Nuro is seeking an experienced Technical Lead Manager with deep expertise in large-scale infrastructure, workload orchestration, and batch and streaming data processing systems to join our ML Infrastructure team.
In this role, you will lead the evolution of our core platform, ensuring our researchers and engineers have seamless access to the compute and data resources required to build the future of autonomous driving.
You will drive the strategy for automated resource provisioning, high-performance workload scheduling, and efficient feature management.
As a TLM, you will balance hands-on technical leadership with people management, mentoring a high-performing team while partnering closely with ML Research and Autonomy teams to eliminate infrastructure bottlenecks and accelerate the Nuro Driver development lifecycle.
Setting Technical Strategy: Defining the roadmap for a unified ML platform that abstracts complex cloud infrastructure
Resource Provisioning & IaC: Scaling our automated infrastructure-as-code (IaC) pipelines to manage thousands of GPU/CPU nodes across diverse environments
Intelligent Scheduling: Designing and optimizing workload orchestration to maximize hardware utilization, minimize job wait times, and handle massive-scale distributed training
Data Dumping & ETL: Designing robust pipelines for the extraction and transformation of petabyte-scale sensor and telemetry data into ML-ready formats
Feature Caching & Feature Stores: Implementing robust feature caching and storage solutions to reduce redundant computations and ensure low-latency access to pre-computed features
Team Leadership: Mentoring and growing a team of software and systems engineers, fostering a culture of operational excellence and technical innovation

Requirements

Resource Provisioning: Deep familiarity with modern Infrastructure-as-Code and provisioning tools (e.g., Terraform, Pulumi, or Crossplane)
Feature Management: Experience implementing or maintaining feature stores and caching layers (e.g., Feast, Hopsworks, or Redis-based custom caching)
Experience: 6+ years of professional experience in ML Infrastructure, Backend Platform Engineering, or Distributed Systems with 3+ years of people/team management experience
Workload Scheduling: Hands-on experience building or managing large-scale orchestrators for compute-heavy workloads (e.g., Kubernetes/KubeRay, Ray, Slurm, or Volcano)
Data Dumping (ETL): Proven expertise in large-scale data extraction and transformation. You must be proficient in at least one distributed processing framework, such as Apache Spark or Apache Beam
Experience with high-performance storage systems (e.g., Lustre, Ceph, or specialized NVMe caching) for ML data loading
Knowledge of cost-optimization strategies for large-scale GPU clusters in public clouds (AWS/GCP/Azure)
Active contributor to open-source projects in the MLOps or Cloud-Native ecosystem (e.g., CNCF, Ray, or Kubeflow communities)
Advanced degree (Ph.D. or M.Sc.) in Computer Science, Systems Engineering, or a related technical field

Benefits & conditions

Free Caltrain pass and commuter benefits
Company stock options
Work from home opportunities
Health insurance
