Site Reliability Engineer

Helsing GmbH

Berlin, Germany

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Berlin, Germany

Tech stack

Artificial Intelligence

Bash

Cloud Engineering

Computer Networks

Software Debugging

Distributed Systems

Python

Machine Learning

Reliability Engineering

Ansible

Prometheus

Zero Trust Network Access

Software Engineering

Policy as Code

Scripting (Bash/Python/Go/Ruby)

Istio

Grafana

Reliability of Systems

Containerization

Templating

Kubernetes

Infrastructure Automation Frameworks

Linkerd (Service Mesh)

Machine Learning Operations

Terraform

Dynatrace

Job description

Much of our work takes place in high-security on-premises environments, and we are looking for a Site Reliability Engineer to support them.

Your role as a Site Reliability Engineer will be to design, implement, and manage our on-premises Kubernetes infrastructure.

We are looking for engineers with a strong work ethic and prioritisation skills. We value team players who communicate clearly, share knowledge generously, and collaborate effectively to move their team, and our mission, forward.

The day-to-day

As an SRE, you will design and build cloud-native infrastructure platforms on-premises, focusing on Kubernetes-based solutions that enable our development teams to operate services at scale.
You will create robust observability frameworks using Grafana, Prometheus, and distributed tracing to ensure system reliability and performance.
You will architect and implement secure, multi-tenant Kubernetes clusters with strong access controls, policy-as-code governance, and zero-trust networking between red and black network domains.
You will develop operators and controllers to automate infrastructure provisioning and compliance.
You will build and maintain MLOps platforms enabling AI researchers to deploy, monitor, and scale machine learning models in production.
You will collaborate closely with our Security teams to implement supply chain security, container scanning, and runtime protection across our cloud-native stack.

Requirements

Scripting: experience in Python, Go, Rust, or Bash/Shell for automation and tooling
Experience with GitOps workflows and CI/CD automation
Kubernetes Expertise: deep experience operating production Kubernetes clusters, writing custom controllers/operators, and implementing service mesh architectures (Istio/Linkerd)
Cloud-Native Technologies: hands-on experience with the CNCF ecosystem, e.g. Helm, ArgoCD, Flux, and container runtime security tools like Falco
Observability Stack: expert-level knowledge of Grafana, Prometheus, Loki, Tempo, and OpenTelemetry. Experience building custom dashboards, alerts, and SLI/SLO frameworks
Networking: expert understanding of networking concepts, protocols, and security
MLOps Platforms: experience with Kubeflow, MLflow, or similar platforms
Infrastructure as Code: proficiency with Terraform, Ansible, and Kubernetes manifest templating. Experience with policy-as-code tools like OPA/Gatekeeper
System Administration: deep understanding of Linux/Unix system administration and highly available, distributed systems
Comfortable building out data and telemetry pipelines for debugging and future-proofing solutions

You should apply if you

Have a high level of personal integrity, reliability, and attention to detail
Have a software engineering mindset with a passion for building platforms and tools that multiply developer productivity
Have experience running cloud-native workloads in on-premises or air-gapped environments
Are willing to relocate to Munich, London, or Berlin.

We are an ambitious and committed team of engineers, AI specialists and customer-facing programme managers. We are looking for mission-driven people to join our European teams - and apply their skills to solve the most complex and impactful problems. We embrace an open and transparent culture that welcomes healthy debates on the use of technology in defence, its benefits, and its ethical implications.

Benefits & conditions

Competitive compensation and VSOP options
Relocation support
Social and education allowances
Regular company events and all-hands to bring together employees as one team across Europe
A hands-on onboarding program (affectionately labelled "Infraduction"), in which you will be building tooling and applications to be used across the company. This is your opportunity to learn our tech stack, explore the company, and learn how we get things done - all whilst working with other engineering teams from day one

About the company

Helsing is a defence AI company. Our mission is to protect our democracies. We aim to achieve technological leadership, so that open societies can continue to make sovereign decisions and control their ethical standards. As democracies, we believe we have a special responsibility to be thoughtful about the development and deployment of powerful technologies like AI. We take this responsibility seriously.

Join Helsing and work with world-leading experts in their fields

* Helsing's work is important. You'll be directly contributing to the protection of democratic countries while balancing both ethical and geopolitical concerns.
* The work is unique. We operate in a domain that has highly unusual technical requirements and constraints, and where robustness, safety, and ethical considerations are vital. You will face unique Engineering and AI challenges that make a meaningful impact in the world.
* Our work frequently takes us right up to the state of the art in technical innovation, be it reinforcement learning, distributed systems, generative AI, or deployment infrastructure. The defence industry is entering the most exciting phase of the technological development curve. Advances in our field of work are not incremental: Helsing is part of, and often leading, historic leaps forward.
* In our domain, success is a matter of order-of-magnitude improvements and novel capabilities. This means we take bets, aim high, and focus on big opportunities. Despite being a relatively young company, Helsing has already been selected for multiple significant government contracts.
* We actively encourage healthy, proactive, and diverse debate internally about what we do and how we choose to do it. Teams and individual engineers are trusted (and encouraged) to practise responsible autonomy and critical thinking, and to focus on outcomes, not conformity. At Helsing you will have a say in how we (and you!) work, the opportunity to engage on what does and doesn't work, and to take ownership of aspects of our culture that you care deeply about.
