MLOps Engineer (Kubernetes, Cloud, ML Workflows)

FitNext Co
Charing Cross, United Kingdom
5 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote
Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)
Cloud Computing
Continuous Integration
DevOps
GitHub
Python
Machine Learning
Prometheus
Grafana
Kubernetes
Machine Learning Operations
Terraform
Docker

Job description

The MLOps Engineer will be responsible for implementing best practices, managing high-scale ML workflows, and evolving a robust MLOps platform that supports the full ML lifecycle. The position involves close collaboration with ML engineers and product teams to deliver scalable, reliable, and secure infrastructure for machine learning systems.

  • Managing GPU-enabled Kubernetes clusters for distributed ML workloads.
  • Building automation and tooling in Python or Go.
  • Developing and maintaining CI/CD pipelines tailored for ML workflows.
  • Implementing observability solutions to ensure performance and reliability.
  • Driving innovation in infrastructure for large-scale, production-grade ML systems.

This is an on-site role in London (Soho, near Tottenham Court Road). Applications are considered only from candidates available to work on-site.

Requirements

Code, Python, Bedrock, Kubernetes, AWS, Infrastructure, Docker, Incident Response

  • 7+ years of experience as a DevOps Engineer in large-scale, cloud-based environments (AWS preferred).
  • 2+ years of hands-on experience in MLOps environments.
  • Strong expertise with Kubernetes (including GPU clusters) and Docker.
  • Proficiency in Python, Go, or similar languages, focused on automation/tooling.
  • Experience with CI/CD tools such as ArgoCD or GitHub Actions for ML workflows.
  • Knowledge of Infrastructure-as-Code, particularly Terraform.
  • Familiarity with observability tools such as Prometheus and Grafana.
  • Background in incident response, including on-call rotations.

Bonus skills

  • Practical experience with AWS ML services such as SageMaker or Bedrock.
  • Knowledge of emerging MLOps frameworks and tools.
  • Experience enabling Data Scientists with scalable ML experimentation infrastructure.

About the company

One of the world's fastest-growing AI companies is seeking an MLOps Engineer to help scale and optimize its machine learning infrastructure. The company collaborates with leading AI labs to advance frontier model capabilities in reasoning, coding, multimodality, multilinguality, and STEM knowledge, while also delivering mission-critical AI systems for global enterprises. Headquartered in the United States, the leadership team includes experts from Meta, Google, Microsoft, Apple, Amazon, and top universities such as Stanford, Caltech, and MIT. Recognized as one of the most promising B2B companies shaping the future of AI, the organization operates at the forefront of innovation.
