Senior Site Reliability Engineer

SysEleven GmbH
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English, German
Experience level
Senior

Tech stack

API
Bash
Software as a Service
Databases
Linux
Python
PostgreSQL
OpenStack
Reliability Engineering
Data Logging
Grafana
GitLab CI
Kubernetes
Terraform
Go

Job description

As a Senior Site Reliability Engineer (m/f/x) at SysEleven, you design, build, and operate APIs that power the automation and reliability of our as-a-Service products, such as Database as a Service. You use Infrastructure as Code to standardize and scale our platforms, and you continuously improve CI/CD pipelines to ensure secure, resilient, and efficient delivery processes. With GitOps practices and Kubernetes orchestration, you reduce operational complexity and enable stable, predictable deployments that support our customers' critical workloads. You take ownership of reliability end to end, contribute to a culture of continuous improvement, and lead by example in solving complex technical challenges that shape the future of our services.

Your tasks

  • Ensure the reliability, availability, and performance of our Database- and Observability-as-a-Service products
  • Manage container-based applications in Kubernetes with a strong focus on security and resilience
  • Lead incident response, root cause analysis, and sustainable remediation efforts
  • Apply GitOps principles using Helm and Argo CD
  • Develop API services and tooling in Go to deliver stable SaaS products
  • Build and optimize CI/CD pipelines to improve deployment safety and system stability
  • Design and manage scalable infrastructure using IaC tools (e.g., Terraform) in cloud environments

Our tech stack

  • Go, Python, Bash
  • OpenStack, Kubernetes, Cilium, Envoy, Kyverno
  • Terraform, Crossplane, Argo CD, GitLab CI
  • PostgreSQL, Grafana, Loki, Mimir

Requirements

  • Several years of experience operating highly available systems in Linux and Kubernetes environments
  • Strong understanding of observability concepts (monitoring, logging, tracing)
  • Practical development experience in Go (knowledge of Python or Rust is a plus)
  • Experience with Infrastructure-as-Code tools such as Terraform or OpenTofu
  • Hands-on experience in incident management and structured root cause analysis
  • Familiarity with CI systems, especially GitLab CI
  • Strong problem-solving skills and good communication in German and English (at least B2 level)

About the company

At SysEleven, you take ownership of the reliability of customer-facing services such as Database as a Service and Observability as a Service, which are deeply integrated into our cloud and Kubernetes platforms. You actively contribute to the daily operations and continuous improvement of these services, focusing on stability, performance, and automation maturity. We value a blameless culture, open communication, and knowledge sharing, whether in day-to-day collaboration, internal "Show & Tell" sessions, or at external conferences. You will have the autonomy to drive reliability initiatives strategically and to shape robust, sustainable platform solutions together with the team.

Apply for this position