Site Reliability Engineer (all genders)
Job description
As a Site Reliability Engineer (all genders), you will join a deeply technical squad of around 6 engineers responsible for the reliable operation of the Intelligent Grid Platform (IGP) and the infrastructure behind it. The platform runs across 100+ customer instances on Kubernetes, spanning Azure, Open Telekom Cloud, and on-premise environments. You design, build, and maintain the platform foundation: cluster provisioning, deployment pipelines, observability, secrets management, and infrastructure-as-code. You partner closely with the new Operations team that will run IGP day to day for customers, providing them with the tooling, monitoring, and automation they need to operate confidently. You work in a squad together with other SREs, reporting to the Engineering Manager.
How You Make an Impact
- You maintain Kubernetes clusters across multiple clouds and on-premise environments, ensuring they are reliable, secure, and cost-effective
- You develop and maintain infrastructure-as-code (Terraform, SaltStack) to manage 100+ customer instances with layered configuration
- You design and maintain observability (monitoring, alerting, SLOs) so that production issues surface early and are resolved quickly
- You own and evolve secrets management, certificate automation, and security tooling across the platform
- You reduce operational toil through automation, better tooling, and solid runbooks
- You participate in incident response and root cause analysis, and drive follow-ups so the same issues do not recur
- You collaborate with development squads and the Operations team to improve the overall reliability of the IGP
- Agile working method with Kanban in cooperation with all squads
- Continuous integration / Continuous delivery
- Working in small batches with fast reviews
- Knowledge sharing sessions between developers
- "You Code it - You Own it" - Team responsibility for certain functional areas of the product
- Blameless post-mortems and culture of continuous improvement
Our Tech Stack
- Multi-cloud, hybrid on-prem setup with Kubernetes and Helm as the common denominator
- Application primarily written in Python and TypeScript
- Standard backing services like PostgreSQL, RabbitMQ, Redis
- GitLab & GitLab CI for managing the Software Delivery Lifecycle
- Terraform for Infrastructure as Code
Your Benefits
- Join us fully remote or at our lovely office in Cologne in a hybrid working mode
- Option for remote work from abroad (up to three months per year from anywhere in the EU or the USA)
- State-of-the-art technology and modern tech stack
- Excellent hardware equipment (16-inch MacBooks, 2 screens at your workplace)
- 30 holidays + 3 corporate holidays
- Support for your health through sports membership partnerships
- Flexible use of a monthly mobility budget (e.g. Jobrad, public transport)
- Time and resources for individual growth
- envelio pension plan
- Regular company and team events
Requirements
Perfection is a myth! We're more interested in the human behind the screen, so think of these criteria as helpful directions - we're excited to see how your unique skills might fit in.
- You have proven experience running production workloads on Kubernetes in a cloud or hybrid environment
- You are comfortable with Linux administration, networking, and distributed systems
- You have hands-on experience with infrastructure-as-code tools such as Terraform or CloudFormation
- You have worked with configuration management tools like SaltStack, Ansible, or Chef
- You have experience with container and orchestration technology (Docker, Kubernetes, Helm) in production
- You understand monitoring and observability and have worked with tools like Datadog, Prometheus, or Grafana
- You communicate effectively in asynchronous, remote-first environments
- You are curious, enjoy learning, and are open to using AI tools in your daily work
- You are business-fluent in English (Level C1)
- You have experience as a software developer, ideally with languages like Python or Go
- Nice to have: German language skills