ML Systems Engineer - Model Training and Infrastructure (SWE-focused LLMs)

Cosine
Charing Cross, United Kingdom
13 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English
Experience level
Intermediate
Compensation
£ 110K

Job location

Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)
Apache HTTP Server
Unit Testing
Azure
Big Data
Cloud Computing
Software Quality
Information Engineering
Software Debugging
Distributed Systems
Python
Machine Learning
Open Source Technology
TensorFlow
Software Engineering
SQL Databases
PyTorch
Large Language Models
Containerization
Kubernetes
Information Technology
Production Code
Free and Open-Source Software
Machine Learning Operations
Docker
Data Generation

Job description

We're looking for an ML Systems Engineer to collaborate in training our Lumen models - our open-source-based software engineering LLMs. In this role you will:

  • Develop and manage synthetic data generation pipelines to curate datasets that will underpin future RL fine-tunes.
  • Design, build and deploy containerized services using Docker and platforms like Kubernetes to enable our RL infrastructure.
  • Build and iterate on large-scale RL loops where models write code, run tests or tools, and get rewarded (or penalized) accordingly.
  • Work hands-on across the stack: custom PyTorch dataloaders, RL objectives, and evaluation on real-world repos and tasks.

You'll collaborate closely with infra, product, and research to decide what to train next, how to train it, and how to measure whether it's actually better for engineers. You'll also apply RL on top of those models to align them with software-engineering objectives.

  • Architect synthetic data generation pipelines for RL and deploy using containerization technologies.
  • Ideate on novel and opinionated reward functions for the training of SWE agents.
  • Improve evaluation for SWE models:
      • Help maintain/extend an evaluation suite for code models (unit tests, benchmark suites, repo-level tasks).
      • Analyze failure modes and feed them back into data and training plans.

  • Direct impact: Your work directly shapes the next generations of Lumen Enterprise SWE models that engineers use every day.
  • Real scale: You'll work with large, modern open-source models, long context lengths, and multi-node training runs.
  • Full-stack ML engineering: From custom PyTorch code and distributed systems to data curation, RL infrastructure design and MLOps.

If this sounds like a fit, this is a role where you can meaningfully push the frontier of open-source-based software engineering models.

Requirements

  • Strong software engineering or computer science background:
      • Typically 3-5 years of experience.
      • You can read, debug, and write non-trivial production code (you'll mainly be working across Python and Go).
      • Experience with tools like Docker and container management/orchestration platforms, like Kubernetes.
      • Experience with at least one major cloud-computing platform like GCP, AWS or Azure.
      • You care about code quality, correctness, and maintainability as much as model metrics.
  • Knowledge of PyTorch/TensorFlow/JAX:
      • Comfortable implementing custom training loops, losses, and dataloaders.
  • Data engineering instincts:
      • Comfortable working with large-scale datasets, object storage, dataset sharding, and filtering.
      • Know that data quality and sampling strategies matter as much as architecture.
  • Clear communication and ownership:
      • Can take a vague modelling goal ("make Lumen better at X") and turn it into a concrete plan of experiments.
      • Comfortable documenting decisions and walking others through tradeoffs.

Nice to have

You don't need all of these, but the more you have, the more you'll hit the ground running:

  • Experience with synthetic data generation pipelines
  • Experience with data tooling like SQL, Apache Iceberg and DuckDB
  • Experience training LLMs in distributed environments
  • Safety, robustness, and reward shaping:
      • Experience with LLM-as-a-judge, reward hacking detection, or robustness evaluation.
  • Open-source contributions or research:
      • Contributions to open-source LLM tooling, RL libraries, etc.

Benefits & conditions

We're an in-office team, five days a week, by design. We believe the work we're doing benefits from being together, collaborating closely, and building shared context.

What you can expect:

  • Competitive salary, benchmarked to the market
  • Equity / share options, so you share in the upside you help create
  • 30 days' holiday + bank holidays
  • Genuine 9-5 working hours - we don't expect late nights or weekend work
  • Work hard in the office, collaborate closely, and switch off properly
  • Dog-friendly office - bring your dog to work
  • Daily lunch provided
  • Monthly team breakfasts
  • Monthly socials
  • Pension
  • High-quality equipment to do your best work

We care about focus, sustainability, and doing great work - not performative overwork. We value people who show up, contribute thoughtfully, collaborate well with their colleagues, and then go home.

This role won't suit everyone. But if you want structure, clarity, strong collaboration, and a team that takes both the work and work-life balance seriously, it's a great place to be.

About the company

At Cosine, we're building autonomous AI engineers that plan, write, and ship code inside real development workflows. Cosine is designed for on-premise and virtual private cloud (VPC) deployments, including fully air-gapped environments. We build our agent tooling entirely in-house and post-train open-source models to deliver reliable, enterprise-grade coding performance in security-critical settings. In 2024, Cosine achieved a 72% score on OpenAI's SWE-Lancer benchmark, placing us among the strongest real-world software-engineering AI systems evaluated. YC-backed and well-funded, Cosine was founded by experienced operators focused on building dependable, production-grade AI. This role is based in our Hoxton office, five days a week, because close collaboration, fast feedback, and shared context matter for the problems we're solving.

We want to make sure that the models we train are the best SWEs in the world. This doesn't just mean training them to get the right answer; it means training them to write readable, maintainable code that fits the architectural patterns already present in the codebase. We believe we're now in the anti-slop era of coding agents, where data, RL environments and opinionated reward functions will shape the future standards of SWE models. If this sounds exciting, then this could be the role for you.
