Senior Site Reliability Engineer
Job description
We are looking for a Senior DevOps Engineer who combines strong C# coding knowledge with strong knowledge of DevOps tools such as Kubernetes (EKS or AKS) and the Azure or AWS cloud platforms, along with experience of monitoring tools such as DataDog, Grafana and Prometheus, to join a growing global Cloud Infrastructure team supporting SaaS products.
Our client, a global digital SaaS software company, has a fantastic fully remote opportunity for an experienced Senior DevOps Engineer to join its UK Cloud Infrastructure team.
Site Reliability Engineers at this company are responsible for keeping the SaaS products running properly. Using concepts of software and systems engineering, they work to improve the reliability of all cloud systems while keeping levels of manual work low. DevOps Engineers are expected to be experienced in software engineering principles, operational discipline and automation.
The Cloud and DevOps team works on a fully remote basis, in conjunction with the company's US and Australian teams. The company is a market leader in student community management software, and its unique SaaS platform is an essential part of life for millions of university students across the globe.
In this role, you will apply your software engineering experience to enhance system performance and reliability, and build internal systems and capabilities that eliminate manual work through automation. You'll be joining our Platforms team of globally dispersed Site Reliability and Platform Engineers, working in a "follow the sun" model to operate our products on a multi-region cloud platform.
Role Responsibilities:
- Provide technical leadership and mentoring within the team through knowledge sharing sessions, pair programming, code reviews and solution design
- Identify and implement technical solutions to improve platform reliability, including the creation of mitigation strategies and operational playbooks
- Implement and maintain monitoring/alerting/logging systems to identify and respond to incidents
- Ensure scalability and efficiency of cloud infrastructure and systems to handle traffic and data growth
- Conduct performance tests to identify and remediate bottlenecks
- Develop and maintain platform solutions, automating infrastructure provisioning, configuration and management tasks using Infrastructure as Code
- Monitor, review and tune databases to ensure high availability and performance
- Collaborate with product engineering teams to design/build fit-for-purpose and observable software
Requirements
- Proven experience in a Senior DevOps / Site Reliability Engineering role, with strong development experience in C# or a similar object-oriented language.
- Experience of supporting .NET applications as a DevOps Engineer is a big bonus in this role.
- Production experience operating containerization technologies - ideally with Kubernetes and/or Docker. Strong preference for AKS or EKS experience as well.
- Proficiency with one or more public cloud providers such as Azure, AWS or GCP.
- Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation.
- Experience with monitoring, observability and logging tools such as DataDog, Prometheus, Grafana, or similar.
- Proven track record of maintaining highly-available and performant production environments.
- Ability to identify and implement effective mitigation strategies and operational playbooks.
Useful / Bonus Skills to have:
- Experience with CI/CD tooling such as Azure DevOps, GitHub Actions or Octopus Deploy
- Relevant certifications in cloud platforms (e.g., Microsoft Certified: Azure Solutions Architect) and DevOps practices (e.g., Certified Kubernetes Administrator) are a plus
- Experience in database management/performance tuning, particularly MSSQL
This Senior Site Reliability Engineer role is with a market-leading global software company, and the job is part of a large programme of change and improvement in its Cloud SaaS products over the coming years. If you are looking for an interesting SRE role with a forward-thinking global organisation, then this would be a tremendous career opportunity to consider.
Tech stack
- C#
- .NET
- Azure
- DevOps
- Site Reliability Engineer
- Kubernetes
Benefits & conditions
- Opportunity to be part of a well-established, high-performance SaaS company with a 30+ year history.
- Excellent company pension scheme and life insurance.
- Excellent holiday allowance.
- A supportive team environment with emphasis on learning and development opportunities.
- Working with a team of caring, high-performing, and passionate people who have fun supporting our vision, innovation, and continuous improvement.