ML Infrastructure Engineer - RL Systems

DIFFERENTIAL INC.
Daly City, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Tech stack

Artificial Intelligence
Nvidia CUDA
Distributed Computing Environment
Distributed Systems
Reinforcement Learning
PyTorch
Kubernetes
Machine Learning Operations

Job description

Move from maintaining a piece of the machine to defining the infrastructure that powers frontier AI research.

We're working with a frontier AI company that is building the infrastructure behind large-scale reinforcement learning research.

This role sits at the intersection of ML systems, distributed infrastructure and researcher enablement. You'll work closely with research teams to build the tooling, platforms and infrastructure required to train, evaluate and scale advanced AI models.

Why This Role?

  • Direct ownership of critical infrastructure rather than a small component within a much larger organisation.
  • Work closely with researchers and influence how experiments are run, scaled and evaluated.
  • Help shape the next generation of RL systems, tooling and infrastructure.
  • Fast feedback loops and visible impact on research velocity and model performance.
  • Solve genuinely hard distributed systems and ML infrastructure challenges at the frontier of AI.

Requirements

Experience in the following areas:

  • ML Infrastructure / ML Systems Engineering
  • Distributed Systems
  • Training or Inference Infrastructure
  • RL or Post-Training Systems
  • Researcher Tooling
  • Large-scale GPU workloads

Technologies

PyTorch, DeepSpeed, FSDP, Ray, vLLM, CUDA, Triton, Kubernetes, distributed training and serving systems.
