Site Reliability Engineer

Abbott

Barcelona, Spain

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Java

Amazon Web Services (AWS)

Azure

Cloud Computing

DevOps

Disaster Recovery

Distributed Systems

eHealth

Python

Load Testing

RabbitMQ

Reliability Engineering

Cloud Services

Prometheus

Data Streaming

Grafana

Backend

Kotlin

Kubernetes

Kafka

Data Management

Terraform

Establish and improve SLOs, SLIs, and SLAs across services; partner with engineering teams to embed reliability targets into product designs.
Build and evolve monitoring, alerting, and tracing systems to ensure rapid detection and resolution of issues.
Develop incident response processes, on-call rotations, and postmortem practices that drive continuous improvement.
Implement automation for deployment pipelines, failover, scaling, and capacity planning to reduce manual operations and error risk.
Champion security and compliance-driven infrastructure, including secrets management, secure networking, and audit readiness.
Collaborate on disaster recovery strategies and resilience testing (chaos engineering, load testing, rolling updates, blue/green deployments).
Partner with developers to identify performance bottlenecks, optimize services, and reduce infrastructure costs.
Contribute to internal tooling and developer experience to accelerate safe delivery of features in production.

REQUIRED QUALIFICATIONS

5+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles for distributed systems at scale.
Deep expertise with Kubernetes, container orchestration, and service meshes in production environments.
Strong skills in observability tooling (Prometheus, Grafana, OpenTelemetry, etc.) and incident management systems.
Experience designing HA/DR architectures, managing multi-region deployments, and optimizing for low-latency traffic flows.
Proficiency with cloud platforms (AWS/GCP/Azure) and infrastructure-as-code (Terraform, Helm).
Security and compliance mindset, comfortable with regulated environments (HIPAA/GDPR) and auditing requirements.
Excellent cross-functional communication and collaboration skills.

PREFERRED QUALIFICATIONS

Experience with streaming/messaging systems (Kafka, RabbitMQ) in production.
Background in digital health, IoT, or other mission-critical data platforms.
Familiarity with chaos engineering tools and cost-optimization strategies for global cloud services.
Development experience in a modern backend language (Java, Kotlin, Go, Python) for tooling and automation.