Senior Site Reliability Engineer

SysEleven GmbH

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English, German

Experience level

Senior

Job location

Tech stack

API

Bash

Software as a Service

Databases

Linux

Python

PostgreSQL

OpenStack

Reliability Engineering

Data Logging

Grafana

GitLab CI

Kubernetes

Terraform

Job description

As a Senior Site Reliability Engineer (m/f/x) at SysEleven, you design, build, and operate APIs that power the automation and reliability of our as-a-Service products, such as Database as a Service. You use Infrastructure as Code to standardize and scale our platforms, and you continuously improve CI/CD pipelines to ensure secure, resilient, and efficient delivery processes. With GitOps practices and Kubernetes orchestration, you reduce operational complexity and enable stable, predictable deployments that support our customers' critical workloads. You take ownership of reliability end to end, contribute to a culture of continuous improvement, and lead by example in solving complex technical challenges that shape the future of our services.

Your tasks

Ensure the reliability, availability, and performance of our Database- and Observability-as-a-Service products
Manage container-based applications in Kubernetes with a strong focus on security and resilience
Lead incident response, root cause analysis, and sustainable remediation efforts
Apply GitOps principles using Helm and Argo CD
Develop API services and tooling in Go to deliver stable SaaS products
Build and optimize CI/CD pipelines to improve deployment safety and system stability
Design and manage scalable infrastructure using IaC tools (e.g., Terraform) in cloud environments

Our technologies and tech stack

Go, Python, Bash
OpenStack, Kubernetes, Cilium, Envoy, Kyverno
Terraform, Crossplane, Argo CD, GitLab CI
PostgreSQL, Grafana, Loki, Mimir

Requirements

Several years of experience operating highly available systems in Linux and Kubernetes environments
Strong understanding of observability concepts (monitoring, logging, tracing)
Practical development experience in Go (knowledge of Python or Rust is a plus)
Experience with Infrastructure-as-Code tools such as Terraform or OpenTofu
Hands-on experience in incident management and structured root cause analysis
Familiarity with CI systems, especially GitLab CI
Strong problem-solving skills and good communication skills in German and English (minimum B2 level)

About the company

At SysEleven, you take ownership of the reliability of customer-facing services such as Database as a Service and Observability as a Service, which are deeply integrated into our cloud and Kubernetes platforms. You actively contribute to the daily operations and continuous improvement of these services, focusing on stability, performance, and automation maturity. We value a blameless culture, open communication, and knowledge sharing, whether in day-to-day collaboration, internal "Show & Tell" sessions, or at external conferences. You will have the autonomy to drive reliability initiatives strategically and shape robust, sustainable platform solutions together with the team.