Site Reliability Engineer

eDO

Barcelona, Spain

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

Barcelona, Spain

Tech stack

Artificial Intelligence

Data analysis

JIRA

Azure

Cloud Computing

Continuous Integration

Software Debugging

Distributed Systems

Elasticsearch

GitHub

Identity and Access Management

Python

Virtual Desktops

Reliability Engineering

Zero Trust Network Access

YAML

Datadog

SSL Certificate Management

Google Cloud Platform

GitHub Copilot

Grafana

MTTR

Firewalls (Computer Science)

GitLab CI

Kubernetes

Data Analytics

Kafka

Fortinet

Kibana

Terraform

GPT

Docker

Jenkins

Job description

We are looking for an experienced Site Reliability Engineer (SRE) to join our team in Barcelona. You will focus on ensuring the resilience and observability of our platform, applying SRE principles such as SLIs/SLOs, error budgets, and toil reduction. You will work with experts to keep our critical infrastructure stable, scalable, and highly available through automation and data analysis.

As an eDOer, you will have clear objectives, great challenges, and a clear overview of how your work contributes to the global company project and its customers. We use cloud platforms such as Google Cloud Platform and are focused on the transition toward software-defined infrastructure. As an SRE, you will use these tools to maximize uptime and system stability:

Kubernetes
ArgoCD
Horizon WorkSpace (Virtual Desktops)
Certificate Management
GCVE
GKE
Docker
Google Cloud, Azure, Amazon
F5 load balancers
FortiGate firewall
ZTNA/SASE
Identity Management
Datadog
Grafana
Kibana
Elasticsearch
Kafka
Corporate Services
Jira Service
Security Services
GitHub
Jenkins

You will be responsible for:

Leading incident response, triage, and troubleshooting in complex distributed systems.
Designing and implementing automated remediation strategies to reduce operational toil.
Managing comprehensive observability (monitoring, alerts, logs) to maintain ecosystem health.
Facilitating blameless post-mortems and driving improvements based on incident learning.
Acting as an internal consultant and evangelist, training multidisciplinary product teams to instrument their code effectively (using OpenTelemetry/APM) and to build their own custom dashboards.

We apply the GitOps paradigm and extreme automation to ensure stability, using:

Terraform for Infrastructure as Code
GitLab CI and Jenkins for CI/CD
Docker for the build, ship, and run anywhere philosophy
Kubernetes as the preferred orchestrator
Helm for easily deployable applications

As an SRE, your mission will be to:

Define and monitor SLIs/SLOs to align technical performance with business needs.
Optimize infrastructure through code to ensure scalability and availability.
Create and manage an efficient, automated product deployment lifecycle.
Collaborate with other company teams to implement best practices for the development lifecycle.
Implement continuous integration, delivery and deployment methodologies.

Your ultimate goal is to drive down Mean Time to Detection (MTTD) and Mean Time to Resolution (MTTR) by providing automated, correlated insights during high-severity incidents.

Requirements

Are you passionate about system stability and continuous improvement? Join us to lead the reliability of our global services.

We are looking for professionals with a data-driven approach to reliability and a passion for technical diagnosis:

Solid experience in the incident management lifecycle and triage.
Advanced debugging skills in distributed systems.
Solid experience in Infrastructure as Code: Terraform, Terragrunt.
Experience with scripting and configuration languages: Python, Go, YAML...
Experience with automation tools: Terraform, ArgoCD, Crossplane...
Experience with orchestrators: Kubernetes.
Proactive and data-driven approach to reliability improvement.
Experience with GCP.
Methodical (Definition of Done, Ways of working).
Always looking for innovation and improvement.
A can-do attitude is a must.
An open-to-learning attitude is a must.
Desirable: Network Engineer experience.
Desirable: Docker experience.
Desirable: Experience creating MCPs and A2A.
Experience with Applied AI Tools: Demonstrated comfort using practical AI tools such as GitHub Copilot, ChatGPT, or other AI-powered coding assistants.
Experimentation Mindset: Curiosity and eagerness to explore, experiment with, and integrate emerging AI-driven solutions into different workflows.
AI-Enhanced Problem Solving: Ability to effectively leverage AI tools to enhance productivity.
Adaptability and Learning Agility: Enthusiastic about continuously learning and quickly adapting to new AI features and capabilities.
Collaboration with AI: Experience or interest in collaborating closely with AI tools to complement traditional practices.

Benefits & conditions

A Prime Plus membership and a competitive salary and benefits package, including flexible benefits, performance-based bonuses, a day off on your birthday, discounts and partnerships, relocation support, and premium equipment with role-based selection options. Through our equipment lifecycle program, you can take ownership of your device when it reaches its refresh cycle.

Continuous learning to fuel your growth and explore new horizons! Learn and grow with free Coursera access, soft skills workshops, tech training, leadership development, and more. Plus, enjoy a great onboarding program.

Growth opportunities to empower your career and unleash your potential! Personalised career paths and the eVOLVE Program will help you discover, grow, and thrive. Internal mobility opportunities let you pursue horizontal career changes and promotions.

Your well-being is our priority. Embrace freedom and flexibility! At eDO, we value flexibility, employee care, and transparency. We offer a hybrid home-office model focused on outcomes, so you can find the work-life balance that suits you best.

Work hard, party hard! We believe in having fun and connecting with colleagues! Join eDO for after-work events, padel tournaments, parties, and more. Create communities based on your passions, like sports and music. Come to work as you are, with no dress code, and enjoy free fruit, coffee, and tea at our offices.

Enjoy a dynamic and healthy environment! Be innovative, take risks, and share your ideas. Our diverse and open-minded teams support high performance, learning, and growth. You'll work in an Agile mindset environment with recognition at our core.

About the company

Why eDreams ODIGEO

We're the world's leading travel subscription platform.

We pioneered Prime, the first and largest travel subscription programme, which has topped 8 million Prime members since launching in 2017.
Millions of customers every year across 44 markets and 5 brands: eDreams, GO Voyages, Opodo, Travellink, and the metasearch engine Liligo.
More than 100 million searches per day on our websites and more than 6 billion daily AI predictions.
Over 1,700 employees of more than 60 different nationalities from all continents, 99% on permanent contracts.

Prime members are subscribed to global travel, gaining access to a comprehensive multi-product offering for all their travel needs, including hotels, rail, flights, dynamic packages and car rental, among others, compounded by industry-leading flexibility features and exclusive, member-only benefits. This entire Prime experience is powered by a proprietary, industry-leading AI platform that delivers a smarter, hyper-personalised service and comprehensive travel experience globally to its members.
