ML Infrastructure Engineer - RL Systems

DIFFERENTIAL INC.

Daly City, United States of America

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours per week)

Working hours

Regular working hours

Languages

English

Job location

Daly City, United States of America

Tech stack

Artificial Intelligence

Nvidia CUDA

Distributed Computing Environment

Distributed Systems

Reinforcement Learning

PyTorch

Kubernetes

Machine Learning Operations

Job description

Move from maintaining a piece of the machine to defining the infrastructure that powers frontier AI research.

We're working with a frontier AI company building the infrastructure that powers large-scale reinforcement learning research.

This role sits at the intersection of ML systems, distributed infrastructure and researcher enablement. You'll work closely with research teams to build the tooling, platforms and infrastructure required to train, evaluate and scale advanced AI models.

Why This Role?

Direct ownership of critical infrastructure rather than a small component within a much larger organisation.
Work closely with researchers and influence how experiments are run, scaled and evaluated.
Help shape the next generation of RL systems, tooling and infrastructure.
Fast feedback loops and visible impact on research velocity and model performance.
Solve genuinely hard distributed systems and ML infrastructure challenges at the frontier of AI.

Requirements

ML Infrastructure / ML Systems Engineering
Distributed Systems
Training or Inference Infrastructure
RL or Post-Training Systems
Researcher Tooling
Large-scale GPU workloads

Technologies

PyTorch, DeepSpeed, FSDP, Ray, vLLM, CUDA, Triton, Kubernetes, distributed training and serving systems.

