Senior AI Researcher

Aleph Alpha

Heidelberg, Germany

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Heidelberg, Germany

Tech stack

Artificial Intelligence

Python

Reinforcement Learning

Large Language Models

Job description

As a (senior) AI Researcher for reinforcement learning, you will shape and improve the underlying RL methodology, maintain a high-quality training codebase, and conduct large-scale experiments to hill-climb our performance benchmarks. This role is for you if you have both a strong theoretical background in RL and the engineering drive to bring these methods into production and improve them as part of the reinforcement learning team.

In your day-to-day work, you will conduct large-scale reinforcement learning experiments, derive hypotheses from the results, and iterate on both the implementation and the methodology based on your observations. Together with a collaborative team, you will have a direct impact on the models that we ship to our customers.

Hill-climb in large-scale training: Conduct large-scale LLM training runs, analyze evaluation scores in depth, propose hypotheses for improvement, and implement them directly to maximize performance on our benchmarks.
Theoretical innovation: Stay at the bleeding edge of RL research. You will identify, implement, and iterate on novel approaches to multi-turn reinforcement learning.
Scale our training infrastructure: Identify bottlenecks in our training setup and optimize our RL training loops for large-scale training.
Cross-functional collaboration: Partner with our other post-training teams to turn raw feedback into actionable training signals, ensuring that our RL iterations lead to measurable improvements in downstream performance.

Requirements

Do you have a Master's degree?

A deep understanding of Reinforcement Learning theory and how it relates to modern RL methods.
Experience with multi-node LLM training (ideally using RL). You understand how to scale multi-node RL training runs and can reason about and implement distributed algorithms.
Familiarity with statistical methods for evaluation and experiment design.
Ability to reason about what an evaluation or environment measures and whether it matters: not just running benchmarks, but understanding them.
Strong Python skills and comfort with ML tooling (especially torch.distributed).
Willingness to relocate to Heidelberg or travel regularly (potentially weekly).

PhD in reinforcement learning or equivalent research experience.
A history of contributions to top-tier venues (NeurIPS, ICML, ICLR, etc.) specifically regarding RL.
Experience evaluating LLMs and crafting environments for training.

About the company

Aleph Alpha is one of the few companies in Europe with end-to-end in-house model development including pre- and post-training. We're building models that have general-purpose capabilities, but also specifically excel at addressing the needs of our customers.