Machine Learning Engineer (Post-Training)
Job description
We're looking for an ambitious Machine Learning Engineer specializing in Post-Training to work on environments, evaluations, data pipelines, and tooling for robust training systems.
Your work will directly impact how our agents learn to make decisions in complex supply chain environments. You'll help shape how we approach reward modeling, environment design, and agent training.
This role blends research and engineering. You'll implement novel approaches and contribute to our research direction while shipping production-grade systems. If you're energized by pushing the boundaries of what's possible, this is your chance.
Responsibilities
- Design and implement post-training environments for supply chain decision-making
- Create evaluation frameworks to measure agent performance and catch failure modes
- Build data pipelines for training and human feedback collection
- Optimize training infrastructure for throughput, efficiency, and fault tolerance
- Debug complex issues in training pipelines and model behavior
- Collaborate with the team to translate research ideas into reliable systems
- Document what works (and what doesn't) so we can compound our learnings
- Stay on top of industry trends and cutting-edge use cases
Requirements
- Have trained or fine-tuned LLMs for agents with SFT/DPO
- Are proficient in Python, PyTorch, and HF Transformers
- Can balance research exploration with shipping working code
- Are comfortable working with large datasets and building data pipelines at scale
- Thrive in fast-moving environments where priorities shift
- Are excited about AI-assisted tools and getting the most out of them
- Care about craft in your work
- Have a deep sense of curiosity and make a habit of learning
- Think globally about how your work impacts the entire organization
Bonus points if
- You have hands-on experience with RL techniques (reward shaping and design, PPO, GRPO, and other RLHF approaches)
- You have experience with distributed training systems and techniques (DDP, FSDP, N-D parallelism)
- You have experience with human-in-the-loop ML systems
- You've built evaluation frameworks for open-ended tasks
- You're familiar with supply chain, logistics, or operations domains
- You have experience with Kubernetes and cloud infrastructure (AWS, GCP)
- You've worked on reward hacking detection or robustness problems
- You have a side project that shows you can't stop tinkering

If you want to know the heart of a company, take a look at their values. Ours unite us. They are what drive our success, and the success of our customers. Does your heart beat like ours? Find out here: Core Values