DevOps Engineer

Matchtech
Charing Cross, United Kingdom
4 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
£85K

Job location

Remote
Charing Cross, United Kingdom

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Azure
Backup Devices
Bash
Cloud Computing
Configuration Management
Databases
Continuous Integration
DevOps
Disaster Recovery
GitHub
Python
Machine Learning
Open Source Technology
Prometheus
Data Logging
Pulumi
Scripting (Bash/Python/Go/Ruby)
Grafana
CloudFormation
GitLab CI
Kubernetes
Information Technology
Machine Learning Operations
Terraform
Docker
Jenkins

Job description

  • Design and manage scalable cloud infrastructure for high-performance RL training and distributed environments
  • Build and optimise CI/CD pipelines for open-source and enterprise components
  • Implement containerisation and orchestration using Docker and Kubernetes
  • Develop Infrastructure as Code solutions (Terraform, CloudFormation, Pulumi)
  • Implement monitoring, logging, and alerting for distributed ML systems
  • Collaborate with ML teams on resource optimisation and cost efficiency
  • Apply security best practices, manage access controls, and ensure compliance
  • Automate operational tasks: backups, disaster recovery, maintenance
  • Support GPU clusters and distributed compute resources for RL workloads
  • Maintain availability and performance of production ML systems

Requirements

  • Degree in Computer Science/Engineering or 3+ years of DevOps/infrastructure experience

  • Strong background with AWS, GCP, or Azure, including ML/AI workloads

  • Proficiency with Docker, Kubernetes, and ML-focused orchestration

  • Experience with Terraform/CloudFormation/Pulumi and configuration management

  • Solid understanding of CI/CD tools (GitHub Actions, GitLab CI, Jenkins)

  • Knowledge of monitoring/observability tools (Prometheus, Grafana, OpenObserve)

  • Experience with GPU infrastructure and distributed ML compute frameworks

  • Familiarity with MLOps tools and model lifecycle management

  • Strong scripting skills (Python, Bash)

  • Understanding of cloud networking, security, and database fundamentals

  • Experience with HPC environments or schedulers is a plus

  • Strong problem-solving and communication skills

  • DevOps

  • IaC

  • CI/CD tools

  • ML/AI

  • Observability tools

  • GPU infrastructure

Benefits & conditions

  • Enhanced parental leave
  • £500 annual learning and development budget
  • Pension scheme
  • Regular socials and quarterly gatherings
  • Bike-to-Work scheme
