Site Reliability Engineer

IONOS
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Tech stack

Bash
Cloud Computing
Continuous Integration
Linux
DevOps
Distributed Systems
GitHub
Python
Reliability Engineering
Cloud Services
Prometheus
Web Services
Data Logging
Fluentd
Grafana
GitLab CI
Kubernetes
Terraform
ELK
Go

Job description

IONOS is the largest European provider of cloud infrastructure, cloud services, and hosting solutions. We offer you a long-term perspective in one of the most future-proof industries.

Our culture is defined by open structures, flat hierarchies, first-name terms, and a strong team spirit. We firmly believe that work and fun are compatible and provide the right environment for it.

Thanks to our continuous growth, we are looking for new colleagues to join us. Become part of IONOS and let's grow together.

**Your Role**

As a Site Reliability Engineer (SRE) in the IONOS Applications team, you will be part of the technical backbone of critical platforms such as IONOS and STRATO Webmail, as well as other web services operated on our Kubernetes platform.

You will work alongside experienced colleagues on the design of new resilient and high-performance services and products, even under extreme loads.

**Main Responsibilities**

  • Contribute to the evolution of product infrastructure, integrating new services and applications into our cloud and Kubernetes environment.

  • Ensure the stable and secure operation of our platforms.

  • Perform in-depth analysis and optimization of distributed and highly scalable environments.

  • Drive automation using tools such as Terraform, GitLab CI/CD, and Argo CD, managing infrastructure declaratively and reproducibly.

  • Analyze and resolve complex issues in distributed systems, contributing to the continuous improvement of the platform.

  • Develop and maintain monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK Stack) to proactively detect bottlenecks and sources of error.

  • Participate in on-call rotations, one week every 4 to 5 weeks.

  • Collaborate with product development teams to organize joint projects.

  • Manage incidents end-to-end: initial analysis, ticket creation, resolution, and follow-up through Problem Management.

  • Have access to up to one day per week for learning and continuous training.

**Your Profile**

  • Several years of experience as an SRE or in similar roles (Linux System Administrator, DevOps Engineer, Platform Engineer, Full Stack Developer).

  • Advanced expertise in Linux, container technologies, and especially Kubernetes.

  • Experience with Infrastructure as Code (preferably Terraform), CI/CD pipelines (GitLab CI/CD, GitHub Actions), and Helm charts.

  • Proficiency in at least one programming or scripting language (Go, Python, Bash) for automation and monitoring tasks.

  • Experience in operating and troubleshooting high-availability production environments.

  • Knowledge of monitoring, alerting, and log analysis for distributed applications (Prometheus, Grafana, FluentD, ELK, VictoriaMetrics, Icinga).

  • A proactive, solution-oriented, and independent working style, with the ability to systematically analyze and sustainably resolve technical problems.

  • Good command of English (spoken and written).

About the company

IONOS is the largest European provider of cloud infrastructure, cloud services, and hosting solutions. We offer you a long-term perspective in one of the most future-proof industries. Our culture is defined by open structures, flat hierarchies, first-name terms, and a strong team spirit. We firmly believe that work and fun are compatible and provide the right environment for it. Thanks to our continuous growth, we are looking for new colleagues to join us. Become part of IONOS and let's grow together.
