Data Engineer
Job description
We are looking for an MLOps / AIOps / LLMOps / AgentOps Engineer to join a multidisciplinary Data & AI team. The main mission of this role is to design, operate, and continuously evolve our AIOps platform, ensuring that our AI products run in a reliable, scalable, and cost-efficient way.
This position is strongly focused on platform, infrastructure, automation, observability, and operations rather than on building ML models or AI products themselves.
You will work with modern cloud technologies (mainly AWS, with some Azure exposure) and collaborate closely with Data Scientists, Data Engineers, and Product teams to bring AI solutions into production and keep them running smoothly.
We are open to candidates with strong expertise in at least one core area (e.g. cloud, DevOps, platform engineering, or ML operations) and solid foundational knowledge in the others, with motivation to grow across the full AI operations stack.
Responsibilities
- Design, maintain, and evolve the AIOps platform supporting:
  - Traditional machine learning models in production
  - LLM-based solutions such as RAG pipelines and AI Agents
  - Speech Analytics use cases (ASR, conversation analysis, NLP)
- Build and operate ML and LLM pipelines with a strong focus on:
  - Reliability, automation, and observability
  - Model and LLM quality, performance, and drift monitoring
  - Cloud cost control and optimization
- Implement LLMOps / AgentOps practices, including:
  - LLM evaluation and observability
  - Prompt management, traceability, and specialized logging
  - Agent integration, orchestration, and lifecycle management
- Ensure continuous operation of AI products, including:
  - Alerts, dashboards, SLOs / SLIs
  - Scalability strategies and basic auto-remediation mechanisms
- Manage deployments in cloud environments (AWS / Azure) and container platforms (Docker / Kubernetes)
- Collaborate closely with Data Scientists and Data Engineers to productionize robust, scalable AI solutions
- Contribute to internal standards, automation, and best practices across the AI and data ecosystem
Requirements
- Hands-on experience in MLOps, AIOps, or operating ML systems in production
- Solid understanding of LLMOps and AgentOps concepts (RAG, agents, evaluation, monitoring)
- Experience working with AWS and/or Azure in production environments
- Practical knowledge of containers and Kubernetes (Docker, basic Helm usage, etc.)
- Experience with CI/CD pipelines (GitHub Actions, GitLab CI, Azure DevOps, Jenkins, or similar)
- Familiarity with observability and monitoring concepts and tools (CloudWatch, OpenTelemetry, Prometheus, etc.)
- Experience managing infrastructure as code (Terraform, Bicep, CDK, or similar)
- Python experience and familiarity with the ML ecosystem (e.g. scikit-learn, PyTorch), even if you are not a Data Scientist
- Good understanding of the ML / LLM lifecycle, from development to production and monitoring
- Fluent English to work in an international environment
Nice to Have (Not Required, but Valuable)
- Experience with ML/AI platforms such as SageMaker, Azure ML, MLflow, Kubeflow
- Exposure to Speech Analytics technologies (ASR, diarization, conversational NLP)
- Experience with cloud cost optimization / FinOps, especially for AI workloads
- Experience building or operating AI agents, copilots, or conversational systems
- Familiarity with LLM frameworks (LangChain, LlamaIndex, Semantic Kernel, etc.)
- Experience with workflow and orchestration tools (Airflow, Argo, Step Functions, Durable Functions)
Professional Skills & Mindset
- Strong focus on reliability, automation, and scalability
- Ability to collaborate effectively in multidisciplinary teams
- Clear communication and documentation-oriented mindset
- Platform mindset: building reusable, maintainable, and robust solutions
- Proactive, analytical, and driven by continuous improvement
- Strong sense of ownership and end-to-end responsibility
- Motivation to learn and grow across the AI operations stack