Site Reliability Engineer

Abbott
Barcelona, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Barcelona, Spain

Tech stack

Java
Amazon Web Services (AWS)
Azure
Cloud Computing
DevOps
Disaster Recovery
Distributed Systems
eHealth
Python
Load Testing
RabbitMQ
Reliability Engineering
Cloud Services
Prometheus
Data Streaming
Grafana
Backend
Kotlin
Kubernetes
Kafka
Data Management
Terraform
Go

Job description

  • Establish and improve SLOs, SLIs, and SLAs across services; partner with engineering teams to embed reliability targets into product designs.

  • Build and evolve monitoring, alerting, and tracing systems to ensure rapid detection and resolution of issues.

  • Develop incident response processes, on-call rotations, and postmortem practices that drive continuous improvement.

  • Implement automation for deployment pipelines, failover, scaling, and capacity planning to reduce manual operations and error risk.

  • Champion security and compliance-driven infrastructure, including secrets management, secure networking, and audit readiness.

  • Collaborate on disaster recovery strategies and resilience testing (chaos engineering, load testing, rolling updates, blue/green deployments).

  • Partner with developers to identify performance bottlenecks, optimize services, and reduce infrastructure costs.

  • Contribute to internal tooling and developer experience to accelerate safe delivery of features in production.

Requirements

  • 5+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles for distributed systems at scale.

  • Deep expertise with Kubernetes, container orchestration, and service meshes in production environments.

  • Strong skills in observability tooling (Prometheus, Grafana, OpenTelemetry, etc.) and incident management systems.

  • Experience designing HA/DR architectures, managing multi-region deployments, and optimizing for low-latency traffic flows.

  • Proficiency with cloud platforms (AWS/GCP/Azure) and infrastructure-as-code (Terraform, Helm).

  • Security and compliance mindset, comfortable with regulated environments (HIPAA/GDPR) and auditing requirements.

  • Excellent cross-functional communication and collaboration skills.

Preferred qualifications

  • Experience with streaming/messaging systems (Kafka, RabbitMQ) in production.

  • Background in digital health, IoT, or other mission-critical data platforms.

  • Familiarity with chaos engineering tools and cost-optimization strategies for global cloud services.

  • Development experience in a modern backend language (Java, Kotlin, Go, Python) for tooling and automation.
