Senior MLOps Platform Architect (AWS | Kubernetes | Terraform)
Job description
Your Strategic Partner for HR, Payroll & Headhunting Solutions

We are hiring a senior MLOps / DevOps / SRE hybrid who can build an entire AI platform infrastructure end-to-end. This is not a research role and not a standard ML Engineer role. If you haven't designed production-grade MLOps infrastructure, haven't built CI/CD for ML, or haven't deployed ML workloads on Kubernetes at scale, this role is not a fit.

You will design, build, and own the AWS-based infrastructure, Kubernetes platform, CI/CD pipelines, and observability stack that supports our AI models (Agentic AI, NLU, ASR, Voice Biometrics, TTS). You will be the technical owner of MLOps infrastructure decisions, patterns, and standards.

Location: Remote - Europe (PL / ES / PT / CZ / CY)

Key Responsibilities

MLOps Platform Architecture (from scratch)
* Design and build AWS-based AI / ML infrastructure using Terraform (required).
* Define standards for security, automation, cost efficiency, and governance.
* Architect infrastructure for ML workloads, GPU / accelerators, scaling, and high availability.

Kubernetes & Model Deployment
* Architect, build, and operate production Kubernetes clusters.
* Containerize and productize ML models (Docker, Helm).
* Deploy latency-sensitive and high-throughput models (ASR / TTS / NLU / Agentic AI).
* Ensure GPU and accelerator nodes are properly integrated and optimized.

CI/CD for Machine Learning
* Build automated training, validation, and deployment pipelines (GitLab / Jenkins).
* Implement canary, blue-green, and automated rollback strategies.
* Integrate MLOps lifecycle tools (MLflow, Kubeflow, SageMaker Model Registry, etc.).

Observability & Reliability
* Implement full observability (Prometheus + Grafana).
* Own uptime, performance, and reliability for ML production services.
* Establish monitoring for latency, drift, model health, and infrastructure health.

Collaboration & Technical Leadership
* Work closely with ML engineers, researchers, and data scientists.
* Translate experimental models into production-ready deployments.
* Define best practices for MLOps across the company.

Requirements

Qualifications and Skills

We're looking for a senior engineer with a strong DevOps / SRE background who has worked extensively with ML systems in production. The ideal candidate brings a combination of infrastructure, automation, and hands-on MLOps experience.
* 5+ years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
* Strong experience designing, building, and maintaining Kubernetes clusters in production.
* Hands-on expertise with Terraform (or similar IaC tools) to manage cloud infrastructure.
* Solid programming skills in Python or Go for building automation, tooling, and ML workflows.
* Proven experience creating and maintaining CI/CD pipelines (GitLab or Jenkins).
* Practical experience deploying and supporting ML models in production (e.g., ASR, TTS, NLU, LLM / Agentic AI).
* Familiarity with ML workflow orchestration tools such as Kubeflow, Apache Airflow, or similar.
* Experience with experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry).
* Exposure to deploying models on GPU or specialized hardware (e.g., Inferentia, Trainium).
* Solid understanding of cloud infrastructure on AWS, including networking, scaling, storage, and security best practices.
* Experience with deployment tooling (Docker, Helm) and observability stacks (Prometheus, Grafana).

Ways to Know You'll Succeed
* You enjoy building platforms from the ground up and owning technical decisions.
* You're comfortable collaborating with ML engineers, researchers, and software teams to turn research into stable production systems.
* You like solving performance, automation, and reliability challenges in distributed systems.
* You bring a structured, pragmatic, and scalable approach to infrastructure design.
* Energetic and proactive individual, with a natural drive to take initiative and move things forward.
* Enjoys working closely with people - researchers, ML engineers, cloud architects, product teams.
* Comfortable sharing ideas openly, challenging assumptions, and contributing to technical discussions.
* Collaborative mindset: you like to build together, not work in isolation.
* Strong ownership mentality - you enjoy taking responsibility for systems end-to-end.
* Curious, hands-on, and motivated by solving complex technical challenges.
* Clear communicator who can translate technical work into practical recommendations.
* Thrives in a fast-paced environment where you can experiment, improve, and shape how things are done.

Benefits & conditions

What we offer
* Competitive fixed compensation based on experience and expertise.
* Work on cutting-edge AI systems used globally.
* Dynamic, multi-disciplinary teams engaged in digital transformation.
* Remote-first work model
* Long-term B2B contract
* 20+ days paid time off
* Apple gear
* Training & development budget

Our Core values at TheHRchapter
* Transparency: We believe in transparent and smooth recruitment processes. You will get feedback from us.
* Candidate experience: Perfect blend between automated and humanized recruitment processes. Don't hesitate to ask us for feedback, anytime.
* Talented pool: We bring highly skilled, motivated candidates to our clients. Our candidates match their company values and management style.
* Diversity and inclusion: There is no place for discrimination and intolerance. We care about diversity awareness and respect for any differences.