Senior Site Reliability Engineer

Anson McCade

Charing Cross, United Kingdom

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£60K

Artificial Intelligence

Bash

Cloud Computing

Computer Programming

Continuous Integration

Data Systems

Linux

DevOps

Distributed Systems

Python

Reliability Engineering

Ansible

Software Engineering

Data Logging

Scripting (Bash/Python/Go/Ruby)

CloudFormation

Terraform

Design, build and maintain high-quality infrastructure platforms and services
Apply software engineering principles to improve reliability, scalability, performance and operability
Contribute to technical strategy, standards and long-term platform evolution
Lead and participate in incident response, root cause analysis and blameless post-mortems
Use data and observability to reduce mean time to detect (MTTD) and mean time to resolve (MTTR)
Drive improvements through SLOs, error budgets and reliability metrics
Develop automation and tooling using scripting and programming to remove toil
Build CI/CD, infrastructure-as-code and self-service capabilities
Champion continuous improvement through experimentation and measurement
Ensure secure configuration and operation of platforms
Embed security controls and resilience patterns into infrastructure by design
Work closely with product, architecture and engineering teams to define platform requirements and solutions
Influence technical direction, mentor engineers, and help mature SRE practices across the organisation

Technologies:

AI
Ansible
Bash
CI/CD
Cloud
DevOps
Linux
Python
Security
Terraform

We are hiring a Senior Site Reliability Engineer to help build and operate highly reliable, scalable, and secure platforms supporting mission-critical applications and data systems in a large, complex enterprise environment. This hands-on SRE role is for an engineer who enjoys working across software, infrastructure, and operations, using automation and engineering best practices to drive availability, performance, and resilience at scale. You will work on large-scale, business-critical platforms with real impact, and enjoy a competitive salary (£65k-£86k) plus bonus and strong benefits. We offer hybrid working with base locations in Glasgow or Greater Manchester.

Strong experience in Site Reliability Engineering, DevOps, or Platform Engineering
Solid programming and scripting skills (e.g. Python, Go, Bash)
Deep understanding of Linux, networking, distributed systems and cloud platforms
Experience with infrastructure-as-code and automation (e.g. Terraform, Ansible, CloudFormation)
Strong incident response, troubleshooting and fault-analysis skills using a scientific, data-driven approach
Experience with observability: metrics, logging, tracing, alerting and performance analysis
Ability to explain complex systems clearly and influence across technical and non-technical stakeholders
Nice to have: Experience driving SRE maturity (SLOs, error budgets, reliability reviews)
Nice to have: Exposure to large-scale, regulated or high-availability environments
Nice to have: Interest in using AI/ML to improve operations, monitoring or incident response
Nice to have: Passion for teaching, mentoring and building engineering culture