Platform Engineer

Opportunitywe

Charing Cross, United Kingdom

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Charing Cross, United Kingdom

Tech stack

API

Artificial Intelligence

Backup Devices

Nvidia CUDA

Continuous Integration

Linux

Disaster Recovery

Python

Reliability Engineering

Spark

Git

Kubernetes

Terraform

Data Pipelines

Requirements

Who This Role Is For (Choose Your Strength)

We're open to different profiles and will shape the role around your strengths:

AI Platform / ML Infrastructure Engineers
Kubernetes-based compute platforms
GPU scheduling, batch & distributed workloads
Supporting ML training, inference, and experimentation at scale

HPC / GPU Engineers
Job schedulers, MPI, multi-node workloads
Hybrid cloud and on-prem compute
Performance, reliability, and cost optimisation

Strong Data Engineers
Large-scale data pipelines and data platforms
Data reliability, orchestration, and observability
Close collaboration with ML and research teams

What You'll Work On

Designing and evolving Kubernetes-based compute platforms across hybrid and multi-cloud environments
Building and operating GPU-enabled infrastructure for ML and scientific workloads
Developing and maintaining core platform services, APIs, and internal tooling
Improving CI/CD pipelines and Infrastructure-as-Code workflows
Implementing monitoring, alerting, and reliability engineering practices
Ensuring security, data protection, backup, and disaster recovery best practices
Partnering closely with ML engineers, data scientists, and researchers to unblock compute and data challenges

What We're Looking For

Strong experience in one or more of: platform / infrastructure engineering, ML infrastructure or MLOps, HPC or GPU compute, data engineering at scale
Solid experience with Linux and cloud environments
Hands-on work with Kubernetes or distributed systems
Experience with Python (or similar) for automation or services
Familiarity with CI/CD, Git-based workflows, and automation
Strong problem-solving skills and a collaborative mindset

Bonus

Terraform or other IaC tools
Slurm, Kueue, Ray, Spark, or similar systems
GPU tooling (CUDA, Nvidia operators, schedulers)
Experience supporting ML training or data science teams