Senior Site Reliability Engineer

Diligent Corporation

München, Germany

2 months ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English, German

Experience level

Senior

Job location

Remote

München, Germany

Tech stack

Kubernetes Security

Bash

Computer Networks

Continuous Integration

Couchbase

Data Infrastructure

Software Debugging

DevOps

GitHub

Python

Key Management

Network Segmentation

NoSQL

Octopus Deploy

Role-Based Access Control

Reliability Engineering

Prometheus

Data Logging

Istio

Grafana

Software Security

Indexer

GitLab CI

Git Flow

Kubernetes

Linkerd (Service Mesh)

ELK

Jenkins

Job description

In this role you'll join our operations team for our MeetingSuite product in Munich - a flat and diverse SRE team of four engineers. It's a team where influence comes from example rather than authority. Your day-to-day is keeping our Kubernetes platforms observable, resilient and boring-to-upgrade: GitOps with Flux, multi-AZ design, zero-downtime releases, and a centralised observability story every service owner can use without calling SRE. Alongside that, you'll partner closely with our Application Security Engineer on Kubernetes and container security - with room to grow into our security champion over time - to keep the bar high for the DAX 30 and other DACH customers we serve.

If multi-cluster Kubernetes, GitOps, logging, monitoring and NoSQL database management on Kubernetes are in your vocabulary, read on.

Here's a breakdown of what you'll do (not all of it, just the important stuff):

Operate and continuously improve our Kubernetes production platforms, contributing to zero-downtime upgrades and multi-AZ resilience as team-wide goals.
Grow into the team's expert on our ELK-based log platform - centralised cross-cluster monitoring and anomaly detection - so every service owner can see, alert on and debug their workload without SRE hand-holding. Maintain and evolve our Prometheus alerting rules and Grafana dashboards alongside the team.
Partner with our Application Security Engineer on Kubernetes and container security - admission control, workload identity, secrets management, network segmentation and runtime threat detection - with an interest in growing into our security champion over time.
Love automation. Chip away at operational toil - deployments, monitoring setup, internal reporting - building on the baseline the team already has, and ship reliably through our GitOps workflow (Flux, GitLab CI).
Participate in our Standby and Daily Business rotation, lead incident response, run blameless post-mortems and drive the resulting action items to completion.

Requirements

You're a seasoned Site Reliability Engineer with years spent running production Kubernetes at scale, and you're the kind of engineer who takes the initiative when something can be better - observability, resilience, a tricky upgrade, or the way the team thinks about security. You're looking for a role where that initiative has room to turn into real improvements on a platform that customers trust with their most confidential data.

Several years of hands-on SRE, DevOps or Platform Engineering experience, including meaningful time running production Kubernetes at scale.
Strong Kubernetes expertise with deep hands-on experience in at least one area - cluster lifecycle and upgrades, workload identity and RBAC, admission control, network policies, or custom resources and operators - and working familiarity with the rest.
Solid grasp of Kubernetes and container security - secrets management, network segmentation and runtime protection - and an interest in growing into our security champion alongside our Application Security Engineer.
Proven depth in the ELK stack (or a very similar log platform) - pipelines, indexing, dashboards, alerting - with an interest in growing into the team's observability expert. Working knowledge of Prometheus and Grafana.
Comfortable with GitOps and CI/CD as a daily way of working (we run Flux and GitLab CI; equivalents like Argo CD, GitHub Actions or Jenkins are fine), and hands-on experience with Helm and Kustomize for managing manifests. Solid coding in Go, Python or Bash, with a love for automating away repetitive work.
Comfortable being on-call and leading incidents calmly under pressure.
Professional fluency in German and excellent English; at home working in a diverse team.

It would be great if you had these too, but we'll support you if you don't:

Experience in regulated industries (financial services, legal, healthcare, defence) or under compliance frameworks such as ISO 27001 or C5.
Track record of designing or contributing to custom Kubernetes Operators.
Service-mesh experience (Istio, Linkerd, Cilium).
A demonstrated interest in working shoulder-to-shoulder with AppSec engineers to raise platform security posture.
Experience operating Couchbase (Couchbase Operator, server groups, XDCR) or another stateful data platform on Kubernetes.
Experience migrating ingress controllers or other cluster-wide components with zero customer downtime.
Experience with anomaly detection on platform telemetry.

About the company

Diligent is the AI leader in governance, risk and compliance (GRC) SaaS solutions, helping more than 1 million users and 700,000 board members to clarify risk and elevate governance. The Diligent One Platform gives practitioners, the C-Suite and the board a consolidated view of their entire GRC practice so they can more effectively manage risk, build greater resilience and make better decisions, faster.

At Diligent, we're building the future with people who think boldly and move fast. Whether you're designing systems that leverage large language models or part of a team reimagining workflows with AI, you'll help us unlock entirely new ways of working and thinking. Curiosity is in our DNA: we look for individuals willing to ask the big questions and experiment fearlessly - those who embrace change not as a challenge, but as an opportunity. The future belongs to those who keep learning, and we are building it together. At Diligent, you're not just building the future - you're an agent of positive change, joining a global community on a mission to make an impact. Learn more at diligent.com or follow us on LinkedIn and Facebook.

What Diligent Offers You

* Creativity is ingrained in our culture. We are innovative collaborators by nature. We thrive in exploring how things can be done differently, both in our internal processes and to help our clients.
* We care about our people. Diligent offers a flexible work environment, global days of service, comprehensive health benefits, meeting-free days, a generous time-off policy and wellness programs, to name a few.
* We have teams all over the world. We may be headquartered in New York City, but we have office hubs in Washington D.C., Vancouver, London, Galway, Budapest, Munich, Bengaluru, Singapore, and Sydney.
* Diversity is important to us. Growing, maintaining and promoting a diverse team is a top priority for us. We foster and encourage diversity through our Employee Resource Groups and provide access to resources and education to support the education of our team, facilitate dialogue, and foster understanding.

Diligent created the modern governance movement. Our world-changing idea is to empower leaders with the technology, insights and connections they need to drive greater impact and accountability - to lead with purpose. Our employees are passionate, smart, and creative people who not only want to help build the software company of the future, but who want to make the world a more sustainable, equitable and better place.

Headquartered in New York, Diligent has offices in Washington D.C., London, Galway, Budapest, Vancouver, Bengaluru, Munich, Singapore and Sydney.

To foster strong collaboration and connection, this role will follow a hybrid work model. If you are within commuting distance of one of our Diligent office locations, you will be expected to work onsite at least 50% of the time. We believe that in-person engagement helps drive innovation, teamwork, and a strong sense of community.
