Public Cloud - Azure SRE Engineer
Job description
We are modernising with cloud, building a platform that is quick, secure and resilient for customers, and easy, modern and green for developers.

We're looking for a Site Reliability Engineer with strong Azure experience. In this role, you'll spend roughly half your time managing incidents to keep operations running smoothly and the other half improving our services through automation. Participation in out-of-hours support is required.

You will work collaboratively with Engineering Leads and Product Owners to build, contribute to and execute our platform roadmap. You will also participate in the planning and delivery of our goals, drive prioritisation, automate processes using traditional and Gen AI tools, escalate impediments and demonstrate successes. You will have the opportunity to participate in technical communities, to work with internal customers across several domains within the organisation to advance shared capabilities, and to be a role model and mentor to early career engineers as they advance their technical skills.

- Be a hands-on engineer, maintaining our Infrastructure as Code and CI/CD pipeline-based products and services by responding to change, implementing enhancements and improving reliability and customer experience
- Implement further automation and reduce toil by using existing Cloud tooling or implementing new technologies
- Improve operational excellence including monitoring, incident response, problem management, cost optimisation and reliability
- Be accountable for the day-to-day health of both production and non-production environments and respond to any incidents as required
- Observe, investigate and fix service issues with an engineering mindset, resolving them via code changes and implementing improvements to prevent repeat issues
- Engage in Agile team ceremonies and contribute to continuous improvement efforts
- Provide technical expertise and input to establish the risk tolerance of products and services
- Communicate incident status updates clearly and frequently to other teams, customers and stakeholders
- Apply SRE practices and introduce chaos engineering where appropriate to strengthen resilience and validate reliability

On top of our team ethos, we're genuine about both equal opportunity and our colleagues representing the communities we serve, developing and advancing the best in our people.
You'll also get:
- A performance share bonus
- A flexible cash pot to spend on benefits
- A generous pension contribution and private health cover
- Up to 30 days' leave, depending on grade and service, with the ability to purchase more
- Various colleague share schemes
If you have the skills we're seeking and a key role in a major data transformation appeals to you, get in touch. We'd love to hear from you!
Requirements
- Strong DevOps understanding, including experience of Infrastructure as Code and CI/CD pipelines, using tools such as Terraform, Jenkins and Harness, or alternatives such as Azure DevOps
- Experience working with a broad set of Public Cloud technologies and services
- Ability to quickly understand and update existing scripts, and to write new ones, in languages such as Python, Groovy, PowerShell and Bash
- A strong understanding of Cloud Security
- Proven problem-solving experience, with the ability to demonstrate logical thinking
- Experience with monitoring tools and techniques to ensure system reliability and performance
- Recent practical experience using SDKs and APIs to deliver automation
- Certifications in Azure or another cloud platform, such as GCP
- Candidates with less direct experience in cloud engineering will also be considered if they can demonstrate strong skills in other areas, such as sysadmin, software engineering or other technology/engineering fields, and can quickly adapt and pick up new skills
- Technology agnostic and willing to adapt their approach in pursuit of the best solution
- Continuous curiosity to learn and develop themselves with industry best-practice methods and tooling in the cloud function