ML Infrastructure Engineer

Cubiq Recruitment
Charing Cross, United Kingdom
18 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
£111K

Job location

Remote
Charing Cross, United Kingdom

Tech stack

Continuous Integration
Performance Tuning
Graphics Processing Unit (GPU)
Terraform

Job description

You'll be joining the team that powers the core of their research. This isn't a support role. This is the group that builds the compute backbone behind every major breakthrough. You'll shape how their scientists train models, test ideas, and push their work forward at scale. And because you're joining early, your impact will be felt across the whole organisation.

You'll work on problems that matter. You'll help build fast, reliable GPU systems that let researchers move from idea to result without friction. You'll have room to experiment, try new approaches, and design systems in an organisation that backs bold thinking.

  • Build, run, and improve high-performance GPU training and inference clusters with a focus on reliability and automation
  • Design and implement high-throughput data paths, including work on caching, I/O, and data locality across compute and storage
  • Benchmark, profile, and fix performance issues across compute, network, and orchestration layers
  • Set up clear observability, resilience, and security controls for sensitive research environments
  • Work with Research, Data, and Applied teams to plan GPU and storage capacity and support smoother ML experimentation

Requirements

  • Strong experience designing and operating large-scale ML compute clusters
  • Good understanding of GPU architecture, high-speed networking, and performance tuning for distributed training
  • Experience with modern containerised systems and migrations from traditional schedulers
  • Knowledge of high-throughput storage systems for ML or HPC workloads
  • Solid experience with IaC and CI/CD (Terraform, Argo CD, or similar)

Benefits & conditions

  • Salary packages competitive with FAANG businesses
  • An opportunity to work on projects that will make a difference in the world: all projects are multi-decade programmes aimed at improving society and people's lives
  • A rare opportunity to take part in shaping the core ML infra team as it grows from the ground up
  • State-of-the-art resources, enabling you to push the boundaries of AI research and development quickly and ethically
  • Enhanced holiday pay
  • Pension
  • Life Assurance
  • Income Protection
  • Private Medical Insurance
  • Hospital Cash Plan
  • Therapy Services
  • Perk Box
  • Electric Car Scheme

About the company

We're partnering with a well-funded AI research company that is poised to build the largest and most advanced AI team in Europe in the coming years. There aren't many opportunities to work on the problems of tomorrow in a "don't be afraid to push boundaries and fail" environment. Competing at a DeepMind-esque level, you'll be addressing some of humanity's most pressing and enduring challenges, including next-generation drug discovery, combating climate change, the future of sustainable agriculture, and various other humanity-positive missions! By joining their team, you'll have the opportunity to contribute to research that directly shapes a better, more sustainable future for humanity. You'll be joining at an early stage, which means very few opportunities can compete with this one in terms of personal impact!
