Platform Engineer
Requirements
Who This Role Is For (Choose Your Strength)

We're open to different profiles and will shape the role around your strengths:

AI Platform / ML Infrastructure Engineers
- Kubernetes-based compute platforms
- GPU scheduling, batch & distributed workloads
- Supporting ML training, inference, and experimentation at scale

HPC / GPU Engineers
- Job schedulers, MPI, multi-node workloads
- Hybrid cloud and on-prem compute
- Performance, reliability, and cost optimisation

Strong Data Engineers
- Large-scale data pipelines and data platforms
- Data reliability, orchestration, and observability
- Close collaboration with ML and research teams

What You'll Work On

- Designing and evolving Kubernetes-based compute platforms across hybrid and multi-cloud environments
- Building and operating GPU-enabled infrastructure for ML and scientific workloads
- Developing and maintaining core platform services, APIs, and internal tooling
- Improving CI/CD pipelines and Infrastructure-as-Code workflows
- Implementing monitoring, alerting, and reliability engineering practices
- Ensuring security, data protection, backup, and disaster recovery best practices
- Partnering closely with ML engineers, data scientists, and researchers to unblock compute and data challenges

What We're Looking For

- Strong experience in one or more of:
  - Platform / infrastructure engineering
  - ML infrastructure or MLOps
  - HPC or GPU compute
  - Data engineering at scale
- Solid experience with Linux and cloud environments
- Hands-on work with Kubernetes or distributed systems
- Experience with Python (or similar) for automation or services
- Familiarity with CI/CD, Git-based workflows, and automation
- Strong problem-solving skills and a collaborative mindset

Bonus

- Terraform or other IaC tools
- Slurm, Kueue, Ray, Spark, or similar systems
- GPU tooling (CUDA, Nvidia operators, schedulers)
- Experience supporting ML training or data science teams