Site Reliability Engineering Lead

Experian Information Solutions, Inc.

Nottingham, United Kingdom

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Nottingham, United Kingdom

Tech stack

Agile Methodologies

Amazon Web Services (AWS)

Cloud Computing

Disaster Recovery

Fault Tolerance

Monitoring of Systems

Reliability Engineering

Prometheus

Runbook

Software Engineering

Data Logging

System Availability

Grafana

MTTR

Information Technology

CloudWatch

DevSecOps

Job description

We are looking for an enthusiastic SRE Lead to work in Project Spring, at the forefront of our cloud modernisation within our Credit & Verification Services. This is a hybrid role that requires travel to the Hyderabad office for 40% of the time each month.

Background

This is an incredibly exciting time for the Experian UKI Region, as we look to build our presence in the UK and Hyderabad and work on a technology transformation to meet our aspiration to significantly scale our business over the next five years. This is an opportunity to join Credit & Verification Services on this journey and be part of a collaborative team that uses Agile DevSecOps principles to deliver business value. Credit and Verification Services currently comprises nearly 100 engineering teams that deliver over 200 products, generating significant annual revenue for our UK business. Our unique culture and agile ways of working offer a great opportunity to those seeking to join a talented set of diverse problem solvers who design, build and maintain our products. We pride ourselves on excellence, adopting best practices and holding ourselves to the highest standards.

The Domain

As a member of the Project Spring team within Credit and Verification Services, you'll be part of a forward-thinking delivery group at the forefront of transforming how credit information is accessed in the UK. We're leading the charge in moving the Experian UK credit report ecosystem to the cloud, modernizing legacy systems and unlocking new possibilities for data-driven insights. The Project Spring team thrives on collaboration, curiosity, and a shared passion for solving complex problems with elegant, scalable technology. If you're excited by the idea of shaping the future of financial data in the cloud, you'll feel right at home here.

Role Context

As the SRE Lead, you will own the reliability strategy for mission-critical systems and lead a team of engineers to ensure high availability, scalability, and performance. You will combine technical expertise with leadership skills to drive operational excellence and foster a culture of reliability across engineering teams.

Key Responsibilities

Leadership & Strategy

Define and implement SRE best practices across the organization. Proven expertise in production support, resilience engineering, disaster recovery (DCR), automation, and cloud operations. Mentor and guide a team of SREs, fostering growth and technical excellence. Collaborate with senior stakeholders to align reliability goals with business objectives.

Reliability & Performance

Establish SLIs, SLOs, and SLAs for critical services and ensure adherence. Drive initiatives to improve system resilience and reduce operational toil. Excellent at designing systems that detect and remediate issues without manual intervention (self-healing systems, runbook automation). Exposure to tools such as Gremlin, Chaos Monkey, and AWS FIS for simulating outages and improving fault tolerance.

Incident Management

Act as the primary point of escalation for critical production issues and lead major incident response, root cause analysis, and postmortems. Perform detailed post-incident investigations to identify underlying causes. Document findings and share learnings to prevent recurrence. Implement preventive measures and continuous improvement processes.

Observability

Champion monitoring, logging, and alerting strategies using tools such as Prometheus, Grafana, ELK, and AWS CloudWatch. Build real-time dashboards to visualize system health and reliability metrics. Configure intelligent alerting based on anomaly detection and thresholds. Combine metrics, logs, and traces to enable root cause analysis and reduce Mean Time to Resolution (MTTR). Knowledge of AIOps or ML-based anomaly detection for proactive reliability management.

Collaboration

Work closely with development teams to integrate reliability into application design and deployment. Promote a culture of shared responsibility for uptime and performance across engineering teams. Strong interpersonal and communication skills for technical and non-technical audiences.

Qualifications

A degree such as a B.Sc. in Computer Science, an MCA in Computer Science, or a Bachelor of Technology in Engineering, or a higher qualification. A hands-on technologist with a minimum of 12 years of software development experience, including at least 5 years leading an SRE team (and currently doing so). Deep expertise with a range of AWS services. Advanced knowledge of monitoring and observability tools. Strong leadership capabilities, with a focus on setting clear direction, aligning team efforts with organizational goals, and maintaining high levels of motivation and engagement across the team. Skilled in working with geographically distributed teams, fostering inclusive collaboration across diverse cultures and backgrounds to enhance productivity and innovation. Excellent communication skills, with the ability to articulate complex ideas, solutions, and feedback clearly to both technical and non-technical stakeholders. Adept at managing conflict constructively and facilitating consensus. Proven track record of building secure, mission-critical, high-volume transactional web-based software systems, preferably in regulated environments such as the finance and insurance industries. Passionate about solving technical business problems and about designing and developing solutions. Strong individual contributor and team player, capable of collaborating effectively within cross-functional teams.

Additional Information

Our uniqueness is that we celebrate yours. Experian's culture and people are important differentiators. We take our people agenda very seriously and focus on what matters: DEI, work/life balance, development, authenticity, collaboration, wellness, reward & recognition, volunteering... the list goes on. Experian's people-first approach is award-winning: World's Best Workplaces 2024 (Fortune Top 25), Great Place To Work in 24 countries, and Glassdoor Best Places to Work 2024, to name a few. Check out Experian Life on social media or our Careers Site to understand why.

Experian is proud to be an Equal Opportunity and Affirmative Action employer. Innovation is an important part of Experian's DNA and practices, and our diverse workforce drives our success. Everyone can succeed at Experian and bring their whole self to work, irrespective of their gender, ethnicity, religion, colour, sexuality, physical ability or age. If you have a disability or special need that requires accommodation, please let us know at the earliest opportunity.

Experian Careers: Creating a better tomorrow together. Find out what it's like to work for Experian.

Employee Status: Regular
Role Type: Hybrid
Department: Technology
Schedule: Full Time

Requirements

Site Reliability Engineering, Cloud Operations, Disaster Recovery, Automation, Monitoring, Observability, Incident Management, Agile DevSecOps, Leadership, Collaboration, AWS, Resilience Engineering, Technical Excellence, Problem Solving, Communication, Self-Healing Systems

About the company

Experian is a global data and technology company, powering opportunities for people and businesses around the world. We help to redefine lending practices, uncover and prevent fraud, simplify healthcare, create marketing solutions, and gain deeper insights into the automotive market, all using our unique combination of data, analytics and software. We also assist millions of people to realize their financial goals and help them save time and money. We operate across a range of markets, from financial services to healthcare, automotive, agribusiness, insurance, and many more industry segments. We invest in people and new advanced technologies to unlock the power of data. As a FTSE 100 Index company listed on the London Stock Exchange (EXPN), we have a team of 22,500 people across 32 countries. Our corporate headquarters are in Dublin, Ireland. Learn more at experianplc.com.