AI Platform / MLOps Engineer
Job description
What you'll do
- Operate and scale AI/ML platforms end-to-end, including training, inference, pipelines, and production environments
- Build and maintain robust ML infrastructure using tools such as AWS SageMaker, MLflow, feature stores, and related ML platform components
- Design and implement CI/CD pipelines for ML models, AI workloads, and platform services
- Set up and optimise training and inference environments for reliability, scalability, and performance
- Implement observability, monitoring, alerting, and cost-control mechanisms for AI workloads
- Support production deployments of ML/AI systems with a strong focus on automation and operational excellence
- Work with DevOps and platform tooling such as AWS, Terraform, Kubernetes, Docker, GitHub Actions / CI/CD tools
- Collaborate with AI Engineers, Data Scientists, Data Engineers, and Tech Leads to ensure AI solutions are production-ready
- Contribute to best practices around MLOps, model versioning, experiment tracking, deployment, monitoring, and governance
- Work with LLM and agentic tooling ecosystems such as LangChain, LangFuse, LangSmith, or similar platforms
- Troubleshoot production issues related to infrastructure, pipelines, inference performance, latency, reliability, and cost
Requirements
Must Have
- Solid background in Platform Engineering, DevOps, Cloud Engineering, MLOps, or ML Platform Engineering
- Hands-on experience with AWS and cloud-native services
- Experience with Infrastructure as Code, especially Terraform
- Strong experience building and maintaining CI/CD pipelines
- Experience with ML platform tooling such as SageMaker, MLflow, feature stores, or similar tools
- Understanding of ML/AI workflows: training, inference, model deployment, pipelines, monitoring, and lifecycle management
- Experience setting up and managing production environments for AI/ML workloads
- Strong understanding of observability, monitoring, alerting, scalability, and cost optimisation
- Familiarity with containerisation and orchestration tools such as Docker and Kubernetes
- Experience with LLM / agentic tooling such as LangChain, LangFuse, LangSmith, or similar frameworks/platforms
- Strong automation mindset and ability to build reliable, repeatable, production-grade systems
- Strong problem-solving skills and ownership mindset
- Fluent English and Spanish
Nice to Have
- Experience with data pipelines or data engineering workflows
- Experience with AWS Bedrock, vector databases, or LLM infrastructure
- Experience with model monitoring, drift detection, evaluation pipelines, or AI observability platforms
- Experience with workflow orchestration tools such as Airflow, Prefect, or similar
- Knowledge of security, governance, and compliance practices for AI/ML platforms
- Experience working in Agile / Scrum environments
- Previous experience in travel, aviation, digital platforms, or large-scale enterprise environments
Hybrid model - 2 days onsite per week
Why join this project?
People first - diverse and inclusive culture in an international environment. Modern cloud platforms and large-scale, globa