Site Reliability Engineer

eDO
Barcelona, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote
Barcelona, Spain

Tech stack

Artificial Intelligence
Data analysis
JIRA
Azure
Cloud Computing
Continuous Integration
Software Debugging
Distributed Systems
Elasticsearch
GitHub
Identity and Access Management
Python
Virtual Desktops
Reliability Engineering
Zero Trust Network Access
YAML
Datadog
SSL Certificate Management
Google Cloud Platform
GitHub Copilot
Grafana
MTTR
Firewalls (Computer Science)
GitLab CI
Kubernetes
Data Analytics
Kafka
Fortinet
Kibana
Terraform
GPT
Docker
Jenkins
Go

Job description

We are looking for an experienced Site Reliability Engineer (SRE) to join our team in Barcelona. You will focus on ensuring the resilience and observability of our platform, applying SRE principles such as SLIs/SLOs, error budgets, and toil reduction. You will work with experts to keep our critical infrastructure stable, scalable, and highly available through automation and data analysis.

As an eDOer, you will have clear objectives, great challenges and a clear overview of how your work contributes to the global company project and its customers.

We use cloud platforms such as Google Cloud Platform, with a focus on the transition toward software-defined infrastructure. As an SRE, you will use these tools to maximize uptime and system stability:

  • Kubernetes
  • ArgoCD
  • Horizon WorkSpace (Virtual Desktops)
  • Certificate Management
  • GCVE
  • GKE
  • Docker
  • Google Cloud, Azure, Amazon
  • F5 LoadBalancers
  • Fortigate Firewall
  • ZTNA/SASE
  • Identity Management
  • Datadog
  • Grafana
  • Kibana
  • Elasticsearch
  • Kafka
  • Corporate Services
  • Jira Service
  • Security Services
  • GitHub
  • Jenkins

You will be responsible for:

  • Leading incident response, triage, and troubleshooting in complex distributed systems.
  • Designing and implementing automated remediation strategies to reduce operational toil.
  • Managing comprehensive observability (monitoring, alerts, logs) to maintain ecosystem health.
  • Facilitating blameless post-mortems and driving improvements based on incident learning.
  • Acting as an internal consultant and evangelist, training multidisciplinary product teams to instrument their code effectively (using OpenTelemetry/APM) and build their own custom dashboards.

We apply the GitOps paradigm and extreme automation to ensure stability, using:

  • Terraform for Infrastructure as Code
  • GitLab-CI and Jenkins for CI/CD
  • Docker for the build-ship-run-anywhere philosophy
  • Kubernetes as the preferred orchestrator
  • Helm for easily deployable applications

As an SRE, your mission will be to:

  • Define and monitor SLIs/SLOs to align technical performance with business needs.
  • Optimize infrastructure through code to ensure scalability and availability.
  • Create and manage an efficient, automated product deployment lifecycle.
  • Collaborate with other company teams to implement best practices for the development lifecycle.
  • Implement continuous integration, delivery and deployment methodologies.

Your ultimate goal is to drive down Mean Time to Detection (MTTD) and Mean Time to Resolution (MTTR) by providing automated, correlated insights during high-severity incidents.
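As a rough illustration of the two metrics named above, the sketch below computes MTTD and MTTR from a list of incident timestamps. The `Incident` record, its field names, and the sample data are hypothetical and not taken from our tooling. It also assumes both metrics are measured from the moment the fault began; some teams measure resolution time from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record, for illustration only."""
    started: datetime   # when the fault began
    detected: datetime  # when an alert fired or someone noticed
    resolved: datetime  # when service was restored


def mean_delta(deltas):
    """Average a collection of timedelta values."""
    deltas = list(deltas)
    if not deltas:
        raise ValueError("at least one incident is required")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean Time to Detection: average of (detected - started)."""
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean Time to Resolution: average of (resolved - started).

    Assumption: measured from the start of the fault, not from detection.
    """
    return mean_delta(i.resolved - i.started for i in incidents)


if __name__ == "__main__":
    # Made-up sample data.
    sample = [
        Incident(datetime(2024, 1, 1, 10, 0),
                 datetime(2024, 1, 1, 10, 5),
                 datetime(2024, 1, 1, 10, 35)),
        Incident(datetime(2024, 1, 2, 12, 0),
                 datetime(2024, 1, 2, 12, 15),
                 datetime(2024, 1, 2, 13, 25)),
    ]
    print("MTTD:", mttd(sample))  # MTTD: 0:10:00
    print("MTTR:", mttr(sample))  # MTTR: 1:00:00
```

Tracking both numbers separately shows whether an improvement comes from faster alerting or from faster remediation.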

Requirements

Are you passionate about system stability and continuous improvement? Join us to lead the reliability of our global services.

We are looking for professionals with a data-driven approach to reliability and a passion for technical diagnosis:

  • Solid experience in the incident management lifecycle and triage.
  • Advanced debugging skills in distributed systems.
  • Solid experience in Infrastructure as Code: Terraform, Terragrunt.
  • Experience with scripting and configuration languages: Python, YAML, Go, ...
  • Experience with automation tools: Terraform, ArgoCD, Crossplane...
  • Experience with orchestrators: Kubernetes.
  • Proactive and data-driven approach to reliability improvement.
  • Experience with GCP.
  • Methodical (Definition of Done, Ways of working).
  • Always looking for innovation and improvement.
  • A can-do attitude is a must.
  • An open-to-learning attitude is a must.
  • Desirable: Network Engineer experience.
  • Desirable: Docker experience.
  • Desirable: Experience creating MCPs and A2A.
  • Experience with Applied AI Tools: Demonstrated comfort using practical AI tools such as GitHub Copilot, ChatGPT, or other AI-powered coding assistants.
  • Experimentation Mindset: Curiosity and eagerness to explore, experiment with, and integrate emerging AI-driven solutions into different workflows.
  • AI-Enhanced Problem Solving: Ability to effectively leverage AI tools to enhance productivity.
  • Adaptability and Learning Agility: Enthusiastic about continuously learning and quickly adapting to new AI features and capabilities.
  • Collaboration with AI: Experience or interest in collaborating closely with AI tools to complement traditional practices.

Benefits & conditions

Prime Plus membership, a competitive salary and a benefits package that includes flexible benefits, performance-based bonuses, a day off on your birthday, discounts and partnerships, relocation support, and premium equipment with role-based selection options. Through our equipment lifecycle program, you can take ownership of your device when it reaches its refresh cycle.

Continuous learning to fuel your growth and explore new horizons! Learn and grow with free Coursera access, soft skills workshops, tech training, leadership development, and more. Plus, enjoy a great onboarding program.

Growth opportunities to empower your career and unleash your potential! Personalised career paths and the eVOLVE Program will help you discover, grow, and thrive. Internal mobility opportunities let you pursue horizontal career changes and promotions.

Your well-being is our priority. Embrace freedom and flexibility! At eDO, we value flexibility, employee care, and transparency. We offer a hybrid home-office model focused on outcomes. You'll be able to find the work-life balance that suits you best.

Work hard, party hard! We believe in having fun and connecting with colleagues! Join eDO for after-work events, padel tournaments, parties, and more. Create communities based on your passions, like sports and music. Come to work as you are, with no dress code, and enjoy free fruit, coffee, and tea at our offices.

Enjoy a dynamic and healthy environment! Be innovative, take risks, and share your ideas. Our diverse and open-minded teams support high performance, learning, and growth. You'll work in an Agile mindset environment with recognition at our core.

About the company

Why eDreams ODIGEO

We're the world's leading travel subscription platform.

  • We pioneered Prime, the first and largest travel subscription programme, which has passed 8 million Prime members since launching in 2017.
  • Millions of customers every year across 44 markets and 5 brands: eDreams, GO Voyages, Opodo, Travellink, and the metasearch engine Liligo.
  • More than 100 million searches per day on our websites and more than 6 billion daily AI predictions.
  • Over 1,700 employees, more than 60 different nationalities from all continents, and 99% permanent contracts.

Prime members are subscribed to global travel, gaining access to a comprehensive multi-product offering for all their travel needs, including hotels, rail, flights, dynamic packages and car rental, among others, compounded by industry-leading flexibility features and exclusive, member-only benefits. This entire Prime experience is powered by a proprietary, industry-leading AI platform that delivers a smarter, hyper-personalised service and comprehensive travel experience globally to its members.
