Senior Site Reliability Engineer
Anson McCade
Charing Cross, United Kingdom
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: £60K
Job location: Charing Cross, United Kingdom
Tech stack
Artificial Intelligence
Bash
Cloud Computing
Computer Programming
Continuous Integration
Data Systems
Linux
DevOps
Distributed Systems
Python
Reliability Engineering
Ansible
Software Engineering
Data Logging
Scripting (Bash/Python/Go/Ruby)
CloudFormation
Terraform
Go
Job description
- Design, build and maintain high-quality infrastructure platforms and services
- Apply software engineering principles to improve reliability, scalability, performance and operability
- Contribute to technical strategy, standards and long-term platform evolution
- Lead and participate in incident response, root cause analysis and blameless post-mortems
- Use data and observability to reduce mean time to detect and mean time to resolve incidents
- Drive improvements through SLOs, error budgets and reliability metrics
- Develop automation and tooling using scripting and programming to remove toil
- Build CI/CD, infrastructure-as-code and self-service capabilities
- Champion continuous improvement through experimentation and measurement
- Ensure secure configuration and operation of platforms
- Embed security controls and resilience patterns into infrastructure by design
- Work closely with product, architecture and engineering teams to define platform requirements and solutions
- Influence technical direction, mentor engineers, and help mature SRE practices across the organisation
Technologies:
- AI
- Ansible
- Bash
- CI/CD
- Cloud
- DevOps
- Linux
- Python
- Security
- Terraform
We are hiring a Senior Site Reliability Engineer to help build and operate highly reliable, scalable, and secure platforms supporting mission-critical applications and data systems in a large, complex enterprise environment. This hands-on SRE role is for an engineer who enjoys working across software, infrastructure, and operations, using automation and engineering best practices to drive availability, performance, and resilience at scale.
You will work on large-scale, business-critical platforms with a real impact, and enjoy a competitive salary (£65k-£86k) plus bonus and strong benefits. We offer hybrid working with base locations in Glasgow or Greater Manchester.
Requirements
- Strong experience in Site Reliability Engineering, DevOps, or Platform Engineering
- Solid programming and scripting skills (e.g. Python, Go, Bash)
- Deep understanding of Linux, networking, distributed systems and cloud platforms
- Experience with infrastructure-as-code and automation (e.g. Terraform, Ansible, CloudFormation)
- Strong incident response, troubleshooting and fault-analysis skills using a scientific, data-driven approach
- Experience with observability: metrics, logging, tracing, alerting and performance analysis
- Ability to explain complex systems clearly and influence across technical and non-technical stakeholders
- Nice to have: Experience driving SRE maturity (SLOs, error budgets, reliability reviews)
- Nice to have: Exposure to large-scale, regulated or high-availability environments
- Nice to have: Interest in using AI/ML to improve operations, monitoring or incident response
- Nice to have: Passion for teaching, mentoring and building engineering culture