AI Engineer - Reinforcement Learning

BLUE SERVICE
Paris, France
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Paris, France

Tech stack

Artificial Intelligence
Learning Management Systems
Python
Reinforcement Learning
PyTorch
Large Language Models
Machine Learning Operations
Data Pipelines
Automation Anywhere

Job description

The AI Studio's mission is to find the fastest possible path to an autonomous supply chain. We're developing AI agents, learning systems, training models, and more to overcome the biggest challenges remaining in the global supply chain. In short, we are having a lot of fun.

Your mission in this role

We're looking for an ambitious AI Engineer specialising in Reinforcement Learning to work on environments, evaluations, data pipelines, and tooling for robust training systems. You'll help shape how we approach reward modeling, environment design, and agent training. If you're energised by pushing the boundaries of what's possible, this is your chance.

Responsibilities:

  • Design and implement RL environments for supply chain decision-making
  • Develop reward functions that capture what "good" looks like for our agents
  • Create evaluation frameworks to measure agent performance and catch failure modes
  • Build data pipelines for training and human feedback collection
  • Document what works (and what doesn't) so we can compound our learnings
  • Stay on top of industry trends and cutting-edge use cases

Requirements

  • You've trained or fine-tuned LLMs
  • You're excited about AI-assisted tools and getting the most out of them
  • You build and customize your own AI workflows
  • You have experience working with AI agents and RL environments in production
  • You're proficient in Python and PyTorch
  • You can balance research exploration with shipping working code
  • You have hands-on experience with RL techniques (reward shaping, policy optimization, RLHF)
  • You thrive in fast-moving environments where priorities shift
  • You care about craft in your work
  • You're curious about why things work, not just that they work

Bonus points if:

  • You have experience with human-in-the-loop ML systems
  • You've built evaluation frameworks for open-ended tasks
  • You're familiar with supply chain, logistics, or operations domains
  • You have a side project that shows you can't stop tinkering
