Lead Site Reliability Engineer
Job description
We're looking for an experienced Lead SRE / Platform Lead to take ownership of a mission-critical, cloud-native platform that is transforming the UK housing market. This is a hands-on, high-impact role: you'll lead the UK Platform Team, work directly on platform reliability, observability, and incident resolution, and help embed DevOps practices across the organisation.
This role offers a rare opportunity to shape a platform from the ground up: setting operational standards, building resilience, and coaching your team to take full ownership. You'll have real influence on both the technical and operational culture of the business while collaborating with global engineering, security, and service teams. If you thrive in hands-on platform operations, incident leadership, and team coaching, this is your chance to work on a modern cloud-native platform and deliver a high-visibility, mission-critical service.
As Lead SRE, you'll be the operational and technical lead for the UK platform. Your remit includes:
- Ensuring stable, secure, and high-performing platform operations
- Leading incident management and service recovery
- Driving observability, monitoring, and alerting improvements
- Coaching and mentoring the Platform Team to take ownership and operate autonomously
- Collaborating with engineering, security, and business teams to embed reliability and DevOps practices
- Maintaining operational resilience, risk controls, and compliance
Key Focus Areas
- Own UK platform operations end-to-end, from day-to-day stability to patching, releases, and service transitions
- Lead major incidents with technical insight, quick triage, and clear communication to stakeholders
- Build observability and alerting strategies, dashboards, and automated health checks
- Shape technical and operational standards, embedding DevOps principles across teams
- Coach the UK Platform Team to become autonomous and accountable, improving delivery, prioritisation, and operational excellence
- Ensure platform resilience and compliance through DR/BCP exercises, risk management, and audit readiness
What's in it for you?
- High-impact, hands-on ownership of a nationally critical platform
- Opportunity to shape platform capability and culture from the ground up
- Exposure to global teams and influence on strategic operational decisions
- Work at the forefront of cloud-native technology (AWS, Terraform, observability stack)
- 1-2 days per month on-site in Leeds or Oxfordshire, whichever is the easier commute for you
Requirements
- Proven hands-on experience in Platform Operations/SRE with responsibility for platform reliability and high availability
- Technical expertise with:
  - AWS, Linux, Terraform, and CI/CD pipelines
  - Monitoring and observability tooling (Grafana, Prometheus, Splunk, New Relic, PagerDuty)
  - SQL/PostgreSQL diagnostics
- Experience leading P1/P2 incidents and driving rapid resolution
- Experience coaching, empowering, or developing a small team
- Comfortable working in regulated environments (FCA/PRA experience desirable) and with risk, audit, and DR/BCP responsibilities
- Excellent communicator, able to collaborate with engineering, security, and senior stakeholders
- Ability to influence wider organisational practices and embed DevOps principles in the broader tech culture