Site Reliability Engineer

VanHack

24 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Job location: Remote

Tech stack

Amazon Web Services (AWS)

Data analysis

Azure

Reliability Engineering

Prometheus

Grafana

Reliability of Systems

Kubernetes

Terraform

Jenkins

Job description

We are seeking a Senior Site Reliability Engineer (f/m/d) to join our team in Barcelona, supporting the continued growth and stability of our cloud-based infrastructure.

In this role, you will be responsible for ensuring the reliability, scalability, and efficiency of our systems across multiple cloud platforms. You'll develop and enhance monitoring, observability, and automation frameworks, driving improvements in uptime and performance. Collaborating closely with development, security, and product teams, you'll help strengthen system resilience and optimize incident response and reliability processes using data-driven insights.

Requirements

Strong experience with cloud platforms such as AWS, GCP, Linode, or Azure
Proficiency with container orchestration technologies (Kubernetes, EKS, or GKE)
Hands-on experience with observability tools (Prometheus and Grafana stack)
Skilled in CI/CD pipelines and infrastructure-as-code (Jenkins, Argo, Terraform)
Analytical, proactive, and passionate about continuous improvement and system reliability