Senior Site Reliability Engineer
Job description
We're looking for a Senior Site Reliability Engineer to join our Data CoE team. You'll play a key role in ensuring the reliability, scalability, and security of our cloud-native data platforms. This is a hands-on engineering role with a strong focus on automation, observability, incident response, and cross-team collaboration.
We're looking for an experienced and passionate Engineering Lead to take ownership of our Route to Live (RTL) team, a critical function within the IP&I Data Centre of Excellence (CoE). RTL is responsible for both Run and Build activities across our cloud-native estate, ensuring resilience, scalability, and innovation in how we deliver microservices and tooling to our internal customers.
Security is central to this position. Your duties will include identifying and remediating vulnerabilities in cloud-native environments, integrating security scanning into CI/CD pipelines, and working closely with platform and security teams to enforce secure standards and policies.
Collaboration is vital, as you'll partner with data science, engineering, and product teams to troubleshoot deployment and pipeline challenges. You will also share your expertise through comprehensive documentation, walkthroughs, and reusable templates, acting as a mentor and trusted technical advisor across teams.
Additionally, you will be responsible for creating essential runbooks, upgrade plans, and incident playbooks using Confluence, tracking work and risks via JIRA or similar tools, and communicating clearly with stakeholders from both technical and non-technical backgrounds.
This is an exciting opportunity for someone keen to make a real impact on platform reliability, security, and team collaboration in a forward-thinking, cloud-driven environment.
Requirements
- Proven experience in SRE, DevOps, or Cloud Engineering roles.
- Strong knowledge of cloud platforms: GCP (preferred), AWS, or Azure.
- Proficiency in Terraform, Docker, Kubernetes, and CI/CD tools (e.g., Jenkins, Harness).
- Experience with observability tools and distributed tracing.
- Solid understanding of cloud security principles and vulnerability management.
- Excellent communication and documentation skills.
- A collaborative mindset and a bias for action.
- You'll help shape the future of reliability engineering at Lloyds, contributing to the evolution of our SRE Community of Practice and influencing standards across IP&I.