Senior Site Reliability Engineer
Job description
- Support platforms serving millions of customers and billions of requests each month, ensuring availability, scalability and resiliency.
- Act as a key technical contributor within PEC, working with SRE guilds to improve cloud deployments, monitoring, CI/CD pipelines and cost efficiency.
- Explore and adopt new technologies and practices to advance SRE capabilities, including AI-driven tooling and automation.
- Apply hands-on experience running high-throughput production systems to deliver customer value beyond POCs.
- Define and implement SLAs, SLOs and SLIs across software and data teams.
- Improve incident management through better tooling, alerting, runbooks and automated remediation.
- Act as a subject matter expert in site reliability engineering, contributing to technical discussions and fostering a culture of continuous learning across the lab.
Requirements
We're seeking an experienced Site Reliability Engineer to join the Cloud Enabling team within the Personalised Experiences and Communication Platform. This role is central to strengthening our SRE capability and improving the resiliency, availability and security of our platforms. The ideal candidate will have experience in SRE, software engineering, data engineering or AI/MLOps, with a proven track record of supporting high-throughput systems at scale. Strong experience with hybrid-cloud architectures, Kubernetes-based workloads, networking and monitoring/logging solutions is essential, along with an engineering mindset suited to large, complex organisations.
- Proven hands-on experience of software development, testing, monitoring and operational stability at scale.
- Production experience with Kubernetes and monitoring tools such as Datadog or Dynatrace.
- Proven experience with automation, CI/CD and their best practices.
- Proven experience of running postmortems, defining SLAs, SLIs and SLOs, and participating in support rotas.
- Commercial coding and scripting experience (Python, Bash).
- Knowledge of databases, streaming and batch processing, and API design.
- Proficiency with Kubernetes (ideally microservice architectures using the Istio service mesh).
- Extensive experience of Cloud native solutions (ideally Google Cloud).
- Good understanding of cloud storage, networking, and resource provisioning.
Benefits & conditions
Our focus is to ensure we're inclusive every day, building an organisation that reflects modern society and celebrates diversity in all its forms.
We want our people to feel that they belong and can be their best, regardless of background, identity or culture.
We were one of the first major organisations to set goals on diversity in senior roles, create a menopause health package and introduce a dedicated Working with Cancer initiative.
That's why we especially welcome applications from under-represented groups.
We're disability confident. So, if you'd like reasonable adjustments to be made to our recruitment processes, just let us know.
We also offer a wide-ranging benefits package, which includes:
- A generous pension contribution of up to 15%.
- An annual bonus award, subject to Group performance.
- Share schemes including free shares.
- Benefits you can adapt to your lifestyle, such as discounted shopping.
- 30 days' holiday, with bank holidays on top.
- A range of wellbeing initiatives and generous parental leave policies.