Principal Site Reliability Engineers (SRE)

Postaladdress Uk

Charing Cross, United Kingdom

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£70K

Bash

Cloud Computing

Cloud Engineering

Databases

Continuous Integration

DevOps

Python

PostgreSQL

OpenShift

Red Hat Enterprise Linux - RHEL

Reliability Engineering

Site Reliability Engineering Practices

Prometheus

Cloud Platform System

Istio

Grafana

Multi-Cloud

Kubernetes

Responsibilities:

Design, build, and maintain highly available, scalable, and resilient platforms, prioritising standardisation, reuse, and automation
Champion GitOps-first approaches, minimising manual configuration
Lead and contribute to Site Reliability Engineering practices, including error budgets, SLOs, SLIs, and incident management
Work in agile delivery teams, aligning engineering outcomes to customer and service reliability goals
Operate within defined on-call rotas, supporting services underpinning critical national infrastructure
Provide technical leadership and mentorship, developing the capability of engineers across teams
Promote and embed best practices in reliability, security, observability, and automation
Contribute to the evolution of cloud-native and SRE standards, patterns, and platform strategies

Technologies:

ArgoCD
Bash
CI/CD
Cloud
GitOps
Grafana
Helm
Istio
Kubernetes
OpenShift
Prometheus
Python
Security
DevOps

We are seeking experienced Principal Site Reliability Engineers (SRE) to join a high-performing engineering team delivering resilient, cloud-native platforms for UK-based customers. These roles blend senior technical leadership with hands-on delivery, covering both project-based work and the ongoing reliability, scalability, and security of critical services. You will work closely with other senior engineers in small, collaborative teams, taking ownership of platform reliability, setting best practices, and mentoring others. The role supports critical national infrastructure, requires participation in an on-call rota, and operates within a hybrid working model across UK offices, client sites, and home.

Requirements:

Proven leadership experience in Site Reliability Engineering or senior platform engineering roles
Strong expertise in Kubernetes and OpenShift (CKA/CKS certifications beneficial)
Experience designing complex multi-cloud or hybrid architectures
Hands-on knowledge of service mesh technologies such as Istio
Experience with enterprise-grade databases, including EDB Postgres
Deep understanding of observability and monitoring stacks, such as Prometheus, Grafana, Loki, Tempo, and LogiStack
Strong Infrastructure as Code experience using tools such as Helm or Kustomize
Proficiency in scripting and automation, including Bash and Python
CI/CD and GitOps pipeline management using tools such as ArgoCD, FluxCD, or Tekton
Experience with Red Hat ACM/ACS and advanced container networking (e.g. Submariner)
A strong focus on reliability, automation, and operational excellence