ML Infrastructure Engineer - RL Systems
DIFFERENTIAL INC.
Daly City, United States of America
2 days ago
Role details
Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Job location
Daly City, United States of America
Tech stack
Artificial Intelligence
Nvidia CUDA
Distributed Computing Environment
Distributed Systems
Reinforcement Learning
PyTorch
Kubernetes
Machine Learning Operations
Job description
Move from maintaining a piece of the machine to defining the infrastructure that powers frontier AI research.
We're working with a frontier AI company building the infrastructure that powers large-scale reinforcement learning research.
This role sits at the intersection of ML systems, distributed infrastructure and researcher enablement. You'll work closely with research teams to build the tooling, platforms and infrastructure required to train, evaluate and scale advanced AI models.
Why This Role?
- Direct ownership of critical infrastructure rather than a small component within a much larger organisation.
- Work closely with researchers and influence how experiments are run, scaled and evaluated.
- Help shape the next generation of RL systems, tooling and infrastructure.
- Fast feedback loops and visible impact on research velocity and model performance.
- Solve genuinely hard distributed systems and ML infrastructure challenges at the frontier of AI.
Requirements
- ML Infrastructure / ML Systems Engineering
- Distributed Systems
- Training or Inference Infrastructure
- RL or Post-Training Systems
- Researcher Tooling
- Large-scale GPU workloads
Technologies
PyTorch, DeepSpeed, FSDP, Ray, vLLM, CUDA, Triton, Kubernetes, distributed training and serving systems.