Site Reliability Engineer
Job description
As a Site Reliability Engineer, you'll ensure the reliability, performance and scalability of critical digital platforms. You'll monitor production systems, refine SLAs/SLOs and error budgets, design scalable solutions, improve architecture through telemetry insights, and build dashboards that provide clear visibility of system health. You'll also contribute to performance testing strategies and collaborate with engineering, operations and compliance teams to maintain high standards across the platform.
Requirements
- Strong understanding of reliability engineering, scalable architectures and performance optimisation
- Experience with observability, debugging and incident response
- Proficiency in a programming language for automation and tooling (Go or .NET preferred)
- Cloud experience, ideally AWS, and knowledge of container orchestration (Kubernetes) and Infrastructure as Code (Terraform)
- Experience with monitoring and observability tools such as Grafana, Prometheus or OpenTelemetry
- Strong understanding of networking fundamentals and distributed systems
- Ability to collaborate effectively with engineering, operations and product teams
Skills
- Monitoring
- AWS
- SRE
- Kubernetes
- Observability
Benefits & conditions
- Exposure to modern cloud-native tooling and reliability practices
- High-impact role supporting major digital events
- Strong engineering culture with collaboration across product, operations and platform teams