DevOps Engineer
Role details
Job description
- Design and manage scalable cloud infrastructure for high-performance RL training and distributed environments
- Build and optimise CI/CD pipelines for open-source and enterprise components
- Implement containerisation and orchestration using Docker and Kubernetes
- Develop Infrastructure as Code solutions (Terraform, CloudFormation, Pulumi)
- Implement monitoring, logging, and alerting for distributed ML systems
- Collaborate with ML teams on resource optimisation and cost efficiency
- Apply security best practices, manage access controls, and ensure compliance
- Automate operational tasks: backups, disaster recovery, maintenance
- Support GPU clusters and distributed compute resources for RL workloads
- Maintain availability and performance of production ML systems
Requirements
- Degree in Computer Science/Engineering or 3+ years of DevOps/infrastructure experience
- Strong background with AWS, GCP, or Azure, including ML/AI workloads
- Proficiency with Docker, Kubernetes, and ML-focused orchestration
- Experience with Terraform/CloudFormation/Pulumi and configuration management
- Solid understanding of CI/CD tools (GitHub Actions, GitLab CI, Jenkins)
- Knowledge of monitoring/observability tools (Prometheus, Grafana, OpenObserve)
- Experience with GPU infrastructure and distributed ML compute frameworks
- Familiarity with MLOps tools and model lifecycle management
- Strong scripting skills (Python, Bash)
- Understanding of cloud networking, security, and database fundamentals
- Experience with HPC environments or schedulers is a plus
- Strong problem-solving and communication skills
Tech stack
- DevOps
- IaC
- CI/CD tools
- ML/AI
- Observability tools
- GPU infrastructure
Benefits & conditions
- Enhanced parental leave
- £500 annual learning and development budget
- Pension scheme
- Regular socials and quarterly gatherings
- Bike-to-Work scheme