Principal Site Reliability Engineers (SRE)
Postaladdress Uk
Charing Cross, United Kingdom
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: £70K
Job location: Charing Cross, United Kingdom
Tech stack
Bash
Cloud Computing
Cloud Engineering
Databases
Continuous Integration
DevOps
Python
PostgreSQL
Openshift
Red Hat Enterprise Linux - RHEL
Reliability Engineering
Site Reliability Engineering Practices
Prometheus
Working Model 2D
Cloud Platform System
Istio
Grafana
Multi-Cloud
Kubernetes
Job description
- Design, build, and maintain highly available, scalable, and resilient platforms, prioritising standardisation, reuse, and automation
- Champion GitOps-first approaches, minimising manual configuration
- Lead and contribute to Site Reliability Engineering practices, including error budgets, SLOs, SLIs, and incident management
- Work in agile delivery teams, aligning engineering outcomes to customer and service reliability goals
- Operate within defined on-call rotas, supporting services underpinning critical national infrastructure
- Provide technical leadership and mentorship, developing the capability of engineers across teams
- Promote and embed best practices in reliability, security, observability, and automation
- Contribute to the evolution of cloud-native and SRE standards, patterns, and platform strategies
Technologies:
- ArgoCD
- Bash
- CI/CD
- Cloud
- GitOps
- Grafana
- Helm
- Istio
- Kubernetes
- OpenShift
- Prometheus
- Python
- Security
- DevOps

We are seeking experienced Principal Site Reliability Engineers (SRE) to join a high-performing engineering team delivering resilient, cloud-native platforms for UK-based customers. These roles blend senior technical leadership with hands-on delivery, covering both project-based work and the ongoing reliability, scalability, and security of critical services. You will work closely with other senior engineers in small, collaborative teams, taking ownership of platform reliability, setting best practices, and mentoring others. The role supports critical national infrastructure, requires participation in an on-call rota, and operates within a hybrid working model across UK offices, client sites, and home.
Requirements
- Proven leadership experience in Site Reliability Engineering or senior platform engineering roles
- Strong expertise in Kubernetes and OpenShift (CKA/CKS certifications beneficial)
- Experience designing complex multi-cloud or hybrid architectures
- Hands-on knowledge of service mesh technologies such as Istio
- Experience with enterprise-grade databases, including EDB Postgres
- Deep understanding of observability and monitoring stacks, such as Prometheus, Grafana, Loki, Tempo, and LokiStack
- Strong Infrastructure as Code experience using tools such as Helm or Kustomize
- Proficiency in scripting and automation, including Bash and Python
- CI/CD and GitOps pipeline management using tools such as ArgoCD, FluxCD, or Tekton
- Experience with Red Hat ACM/ACS and advanced container networking (e.g. Submariner)
- A strong focus on reliability, automation, and operational excellence