Platform Engineer

Hlx Life Sciences
Charing Cross, United Kingdom
12 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
£91K

Tech stack

API
Artificial Intelligence
Backup Devices
Cloud Computing
Nvidia CUDA
Continuous Integration
Information Engineering
Linux
Disaster Recovery
Distributed Systems
Job Scheduling
Python
Reliability Engineering
Spark
Git
Kubernetes
Data Management
Slurm
Machine Learning Operations
Terraform
Data Pipelines

Job description

AI Platform / ML Infrastructure Engineers

  • Kubernetes-based compute platforms
  • GPU scheduling, batch & distributed workloads
  • Supporting ML training, inference, and experimentation at scale

HPC / GPU Engineers

  • Job schedulers, MPI, multi-node workloads
  • Hybrid cloud and on-prem compute
  • Performance, reliability, and cost optimisation

Strong Data Engineers

  • Large-scale data pipelines and data platforms
  • Data reliability, orchestration, and observability
  • Close collaboration with ML and research teams

What You'll Work On

  • Designing and evolving Kubernetes-based compute platforms across hybrid and multi-cloud environments
  • Building and operating GPU-enabled infrastructure for ML and scientific workloads
  • Developing and maintaining core platform services, APIs, and internal tooling
  • Improving CI/CD pipelines and Infrastructure-as-Code workflows
  • Implementing monitoring, alerting, and reliability engineering practices
  • Ensuring security, data protection, backup, and disaster recovery best practices
  • Partnering closely with ML engineers, data scientists, and researchers to unblock compute and data challenges

Requirements

  • Strong experience in one or more of:
      • Platform / infrastructure engineering
      • ML infrastructure or MLOps
      • HPC or GPU compute
      • Data engineering at scale
  • Solid experience with Linux and cloud environments
  • Hands-on work with Kubernetes or distributed systems
  • Experience with Python (or similar) for automation or services
  • Familiarity with CI/CD, Git-based workflows, and automation
  • Strong problem-solving skills and a collaborative mindset

Bonus

  • Terraform or other IaC tools
  • Slurm, Kueue, Ray, Spark, or similar systems
  • GPU tooling (CUDA, Nvidia operators, schedulers)
  • Experience supporting ML training or data science teams
