Senior MLOps / Machine Learning Engineer: LLMs & Agentic AI - Reply
Job description
As a Senior MLOps / ML Engineer at Data Reply, you will take ownership of architecting and deploying ML and GenAI solutions. You'll be hands-on at every stage, from proof of concept through production, and you'll help mentor junior AI engineers. A particular focus will be on deploying large language models (LLMs) and AI agents at scale, integrating them with enterprise workflows, and ensuring repeatable, cost-efficient AWS architectures.
- Lead solution workshops to design scalable ML systems on AWS using services like VPC, IAM, SageMaker Studio, Lambda, and EKS
- Build CI/CD pipelines using GitHub Actions, Jenkins, and AWS CodePipeline for deploying traditional ML models, GenAI models, and AI agents
- Deploy LLMs (e.g., via Hugging Face) and construct AI agent workflows using tools like LangChain, LangGraph, and custom orchestrators
- Reduce cloud costs through GPU acceleration, auto-scaling, and spot instances
- Implement model lifecycle tools (MLflow, SageMaker Model Registry), performance dashboards, alerts, and automated retraining pipelines
- Connect ML models to client systems using APIs and Kafka, and build agent workflows with vector databases (Pinecone, Weaviate)
- Enforce secure, compliant, and ethical practices: VPC design, IAM policies, data encryption, and adherence to GDPR
- Act as a trusted advisor and mentor, presenting technical solutions, managing expectations, and guiding junior team members
Requirements
- University degree in Computer Science, Mathematics, or a directly related field (minimum grade 2:1)
- 3+ years of MLOps/ML Engineering experience, plus 5+ years in Python software development or data science
- Skilled in SageMaker (training, endpoints, pipelines), Lambda, Step Functions, S3, and CloudWatch
- Proficiency with Terraform or AWS CDK, Docker, and Kubernetes (EKS/Fargate)
- Experience with MLflow (or alternatives), GitHub Actions, Jenkins, AWS CodePipeline, and automated testing
- Hands-on experience deploying LLMs and building AI agents using LangChain or custom frameworks
- Strong background in building data pipelines with Airflow/dbt and managing features via Feast or similar tools
- Experience building dashboards with CloudWatch/Prometheus/Grafana and implementing data validation with Great Expectations
- Nice to have: exposure to consulting/presales, MCP deployment, and Databricks, plus AWS ML Specialty certification