Site Reliability Engineer
Job description
IONOS is the largest European provider of cloud infrastructure, cloud services, and hosting solutions. We offer you a long-term perspective in one of the most future-proof industries.
Our culture is defined by open structures, flat hierarchies, first-name terms, and a strong team spirit. We firmly believe that work and fun are compatible and provide the right environment for it.
Thanks to our continuous growth, we are looking for new colleagues to join us. Become part of IONOS and let's grow together.
**Your Role**
As a Site Reliability Engineer (SRE) in the IONOS Applications team, you will be part of the technical backbone of critical platforms such as IONOS and STRATO Webmail, as well as other web services operated on our Kubernetes platform. You will work alongside experienced colleagues on the design of new resilient and high-performance services and products, even under extreme loads.
**Main Responsibilities**
- Contribute to the evolution of product infrastructure, integrating new services and applications into our cloud and Kubernetes environment.
- Ensure the stable and secure operation of our platforms.
- Perform in-depth analysis and optimization of distributed and highly scalable environments.
- Drive automation using tools such as Terraform, GitLab CI/CD, and Argo CD, managing infrastructure declaratively and reproducibly.
- Analyze and resolve complex issues in distributed systems, contributing to the continuous improvement of the platform.
- Develop and maintain monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK Stack) to proactively detect bottlenecks and sources of error.
- Participate in on-call rotations, one week every 4 to 5 weeks.
- Collaborate with product development teams to organize joint projects.
- Manage incidents end-to-end: initial analysis, ticket creation, resolution, and follow-up through Problem Management.
- Have access to up to one day per week for learning and continuous training.
**Your Profile**
- Several years of experience as an SRE or in similar roles (Linux System Administrator, DevOps Engineer, Platform Engineer, Full Stack Developer).
- Advanced expertise in Linux, container technologies, and especially Kubernetes.
- Experience with Infrastructure as Code (preferably Terraform), CI/CD pipelines (GitLab CI/CD, GitHub Actions), and Helm charts.
- Proficiency in at least one programming or scripting language (Go, Python, Bash) for automation and monitoring tasks.
- Experience in operating and troubleshooting high-availability production environments.
- Knowledge of monitoring, alerting, and log analysis for distributed applications (Prometheus, Grafana, Fluentd, ELK, VictoriaMetrics, Icinga).
- A proactive, solution-oriented, and independent working style, with the ability to systematically analyze and sustainably resolve technical problems.
- Good command of English (spoken and written).