Site Reliability Engineer

Searchability

Ashton-under-Lyne, United Kingdom

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

£ 70K

Job location

Ashton-under-Lyne, United Kingdom

Tech stack

ASP.NET

.NET

Amazon Web Services (AWS)

Cloud Computing

Software Debugging

DevOps

Distributed Systems

Performance Tuning

Reliability Engineering

Prometheus

Web Platforms

Performance Testing

Grafana

Kubernetes

Terraform

Programming Languages

Job description

As a Site Reliability Engineer, you will ensure the reliability, performance, and scalability of critical digital platforms. You will monitor production systems, refine SLAs/SLOs and error budgets, design scalable solutions, and use telemetry insights to improve the architecture. You will also build dashboards that provide clear visibility of system health, contribute to performance testing strategies, and work closely with engineering, operations, and compliance teams to maintain high standards across the platform.

Technologies:

AWS
Cloud
Grafana
Kubernetes
Prometheus
Terraform
ASP.NET
DevOps

We offer a salary of up to £70,000 and provide a hybrid working environment, with three days a week onsite in Greater Manchester. You will be joining a modern SRE environment equipped with cloud-native tooling such as AWS, Kubernetes, and Terraform, focused on high-availability digital platforms and performance-critical workloads.

As part of our team, you will gain exposure to modern cloud-native tooling and reliability practices, playing a high-impact role that supports major digital events. We pride ourselves on fostering a strong engineering culture, encouraging collaboration across product, operations, and platform teams.

Requirements

Strong understanding of reliability engineering, scalable architectures, and performance optimization.
Experience with observability, debugging, and incident response.
Proficiency in a programming language for automation and tooling (Go or .NET preferred).
Cloud experience, ideally with AWS, and knowledge of container orchestration (Kubernetes) and Infrastructure as Code (Terraform).
Experience with monitoring and observability tools such as Grafana, Prometheus, or OpenTelemetry.
Strong understanding of networking fundamentals and distributed systems.
Ability to collaborate effectively with engineering, operations, and product teams.