Senior AIOps Engineer

Groupon
Municipality of Madrid, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Municipality of Madrid, Spain

Tech stack

API
Artificial Intelligence
Computing Platforms
Cloud Engineering
Computer Programming
DevOps
Python
Prometheus
Webui
Search Technologies
AI Infrastructure
Istio
Large Language Models
Grafana
Containerization
Kubernetes
Machine Learning Operations
Terraform
Docker
ELK

Job description

As a Senior AIOps Engineer, you won't just be managing servers; you will be the architect of the "Golden Paths": the reusable, automated infrastructure that enables our product teams to ship LLMs, Vector Search, and AI Agents faster than ever before.

  • Architect the AI Stack: Design and operate core infrastructure on Kubernetes, including Vector Databases, LLM Gateways (LiteLLM), and workflow automation tools (n8n).
  • Enable at Scale: Drive AI adoption by creating self-service "Golden Paths" using Terraform and Helm, allowing engineering teams to deploy RAG pipelines with one click.
  • Operational Excellence: Implement centralized observability, tracing (Langfuse), and governance to ensure our AI systems are reliable, auditable, and secure.
  • Fiscal Discipline: Own the "AI Bill" by monitoring token usage and latency to optimize spend while maintaining high performance.

You will work with a cutting-edge stack including Kubernetes, LiteLLM, Open WebUI, n8n, Langfuse, and Vector Databases, all running within a modern cloud-native ecosystem.

Requirements

  • Platform Expertise: 5+ years in Platform Engineering, SRE, or DevOps within a cloud-native environment.
  • Kubernetes Mastery: Deep experience managing stateful and stateless workloads (Helm, Istio, Docker).
  • AI Infrastructure Fluency: Hands-on experience deploying and operating AI/ML tools or data-intensive systems in production.
  • Coding Proficiency: Strong skills in Python or Go to build custom API wrappers and automate operational tasks.
  • Monitoring & Tracing: Expertise in Prometheus, Grafana, and ELK stack to ensure end-to-end observability of complex AI requests.

About the company

Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date, we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis. Groupon is on a radical journey to transform our business with a relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking, and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company: big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.
