Site Reliability Engineer

Anson McCade
Gloucester, United Kingdom

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
£65K

Job location

Gloucester, United Kingdom

Tech stack

HTML
Java
JavaScript
Agile Methodologies
Amazon Web Services (AWS)
Data analysis
JIRA
Azure
Bash
Cloud Computing
Command Prompt
Databases
Linux
DevOps
MongoDB
Open Source Technology
OpenStack
Powershell
Scrum
Reliability Engineering
Selenium
Software Engineering
Systems Architecture
Scripting (Bash/Python/Go/Ruby)
Free and Open-Source Software
Web Technologies
Puppet
Docker
Microservices

Job description

As an SRE, you will bridge the gap between software engineering and systems operations. You will use your engineering expertise to replace manual tasks with automation, ensuring that traditional operational work (incidents, on-call, etc.) never exceeds 50% of your team's capacity.

Core Accountabilities

  • Service Excellence: Support and maintain essential services for core mission applications, proactively improving their availability, performance, and stability.
  • Automation First: Replace repetitive manual labor with innovative automated solutions.
  • Consultative Engineering: Work alongside product teams to advise on best practices for system design and resilience.
  • Observability: Instrument applications to improve monitoring and use data-driven insights to demonstrate daily system improvements.
  • Systems Architecture: Leverage your understanding of the relationship between software and infrastructure to build scalable, failure-resilient systems.
  • Community Engagement: Actively participate in the wider internal DevOps and SRE communities.

Requirements

We are looking for candidates with experience in the following areas:

  • Development: Software development in Java and web technologies (JavaScript, HTML).
  • Data & Infrastructure: Familiarity with database technologies (Elasticsearch, MongoDB) and cloud platforms (AWS, Azure, or OpenStack).
  • Scripting & OS: Proficiency in Linux and Windows command lines (Bash, PowerShell).
  • Configuration & Deployment: Hands-on experience with configuration management tools such as Chef and Puppet, and with Docker for container management and microservices.
  • Monitoring: Expertise in monitoring large-scale systems using technologies such as the ELK stack (Elasticsearch, Logstash, Kibana).
  • Problem Solving: Strong diagnostic skills across all levels of the tech stack and experience troubleshooting service outages.
  • Agile Methodology: Experience working within an Agile Scrum team and using supporting tools like Jira.
  • Testing & Open Source: Familiarity with test automation frameworks (Selenium) and a track record of contributing to open-source software.
