Lead Site Reliability Engineer

Corecom Consulting

Leeds, United Kingdom

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£100K

Job location

Remote

Leeds, United Kingdom

Tech stack

Amazon Web Services (AWS)

DevOps

Terraform

Job description

We're looking for an experienced Lead SRE / Platform Lead to take ownership of a mission-critical, cloud-native platform transforming the UK housing market. This is a hands-on, high-impact role where you'll lead the UK Platform Team, actively work on platform reliability, observability, and incident resolution, and help embed DevOps practices across the organisation.

This role offers a rare opportunity to shape a platform from the ground up, setting operational standards, building resilience, and coaching your team to take full ownership. You'll have a real influence on both the technical and operational culture of the business while collaborating with global engineering, security, and service teams.

If you thrive in hands-on platform operations, incident leadership, and team coaching, this is your chance to work on a modern cloud-native platform and deliver a high-visibility, mission-critical service.

As Lead SRE, you'll be the operational and technical lead for the UK platform. Your remit includes:

Ensuring stable, secure, and high-performing platform operations
Leading incident management and service recovery
Driving observability, monitoring, and alerting improvements
Coaching and mentoring the Platform Team to take ownership and operate autonomously
Collaborating with engineering, security, and business teams to embed reliability and DevOps practices
Maintaining operational resilience, risk controls, and compliance

Key Focus Areas

Own UK platform operations end-to-end, from day-to-day stability to patching, releases, and service transitions
Lead major incidents with technical insight, quick triage, and clear communication to stakeholders
Build observability and alerting strategies, dashboards, and automated health checks
Shape technical and operational standards, embedding DevOps principles across teams
Coach the UK Platform Team to become autonomous and accountable, improving delivery, prioritisation, and operational excellence
Ensure platform resilience and compliance through DR/BCP exercises, risk management, and audit readiness

What's in it for You?

High-impact, hands-on ownership of a nationally critical platform
Opportunity to shape platform capability and culture from the ground up
Exposure to global teams and influence on strategic operational decisions
Work at the forefront of cloud-native technology (AWS, Terraform, observability stack)
1-2 days per month in Leeds or Oxfordshire, depending on which is the easier commute

Requirements

Proven hands-on experience in Platform Operations/SRE with responsibility for platform reliability and high availability
Technical expertise with:
AWS, Linux, Terraform, CI/CD pipelines
Monitoring and observability (Grafana, Prometheus, Splunk, New Relic, PagerDuty)
SQL/PostgreSQL diagnostics
Experience leading P1/P2 incidents and driving rapid resolution
Experience coaching, empowering, or developing a small team
Comfortable working in regulated environments (FCA/PRA experience desirable) and with risk, audit, and DR/BCP responsibilities

Excellent communicator able to collaborate with engineering, security, and senior stakeholders
Ability to influence wider organisational practices and "bleed" DevOps principles into the broader tech culture

About the company

"At Corecom, we don't just accept differences, we celebrate them and thrive on them for the benefit of our employees, our clients and our candidates. Internally, we thrive from our differences and want our employees to be proud to be themselves and proud to be Corecom. Externally, we utilise those differences