Site Reliability Engineer (SRE) - Kubernetes & Cloud Infrastructure

IONOS
Barcelona, Spain
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Barcelona, Spain

Tech stack

Bash
Cloud Computing
Computer Programming
Linux
DevOps
Distributed Systems
GitHub
Python
Log Analysis
Reliability Engineering
Prometheus
Data Logging
Scripting (Bash/Python/Go/Ruby)
Fluentd
Grafana
GitLab CI
Kubernetes
Terraform
ELK
Go

Requirements

Analyze and resolve complex issues in distributed systems, contributing to the continuous improvement of the platform
Develop and maintain monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK Stack) to proactively detect bottlenecks and sources of error
Participate in on-call rotations, one week every 4 to 5 weeks
Collaborate with product development teams to organize joint projects
Manage incidents end-to-end: initial analysis, ticket creation, resolution, and follow-up through Problem Management
Have access to up to one day per week for learning and continuous training

Qualifications & Skills

Several years of experience as an SRE or in similar roles (Linux System Administrator, DevOps Engineer, Platform Engineer, Full Stack Developer)
Advanced expertise in Linux, container technologies, and especially Kubernetes
Experience with Infrastructure as Code (preferably Terraform), CI/CD pipelines (GitLab CI/CD, GitHub Actions), and Helm charts
Proficiency in at least one programming or scripting language (Go, Python, Bash) for automation and monitoring tasks
Experience in operating and troubleshooting high-availability production environments
Knowledge of monitoring, alerting, and log analysis for distributed applications (Prometheus, Grafana, Fluentd, ELK, VictoriaMetrics, Icinga)
A proactive, solution-oriented, and independent working style, with the ability to systematically analyze and sustainably resolve technical problems
Good command of English (spoken and written)

Seniority level

Mid-Senior level

Job function & Industry

Technology, Information and Media

About the company

Site Reliability Engineer - IONOS Applications Team

IONOS is the largest European provider of cloud infrastructure, cloud services, and hosting solutions. We offer a long-term perspective in one of the most future-proof industries. Our culture is defined by open structures, flat hierarchies, first-name terms, and a strong team spirit. We firmly believe that work and fun are compatible and provide the right environment for it. Thanks to our continuous growth, we are looking for new colleagues to join us. Become part of IONOS and let's grow together!

Responsibilities

Contribute to the evolution of product infrastructure, integrating new services and applications into our cloud and Kubernetes environment
Ensure the stable and secure operation of our platform
Perform in-depth analysis and optimization of distributed and highly scalable environments
Drive automation using tools such as Terraform, GitLab CI/CD, and Argo CD, managing infrastructure declaratively and reproducibly
