Site Reliability Engineer

Anson McCade

Gloucester, United Kingdom

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

£65K

Job location

Gloucester, United Kingdom

Tech stack

HTML

Java

JavaScript

Agile Methodologies

Amazon Web Services (AWS)

Data analysis

JIRA

Azure

Bash

Cloud Computing

Command Prompt

Databases

Linux

DevOps

MongoDB

Open Source Technology

OpenStack

PowerShell

Scrum

Reliability Engineering

Selenium

Software Engineering

Systems Architecture

Scripting (Bash/Python/Go/Ruby)

Free and Open-Source Software

Web Technologies

Puppet

Docker

Microservices

Job description

As an SRE, you will bridge the gap between software engineering and systems operations. You will use your engineering expertise to replace manual tasks with automation, ensuring that traditional operational work (incidents, on-call, etc.) never exceeds 50% of your team's capacity.

Core Accountabilities

Service Excellence: Support and maintain essential services for core mission applications, proactively enhancing availability, performance, and stability.
Automation First: Replace repetitive manual labor with innovative automated solutions.
Consultative Engineering: Work alongside product teams to advise on best practices for system design and resilience.
Observability: Instrument applications to improve monitoring and use data-driven insights to demonstrate daily system improvements.
Systems Architecture: Leverage your understanding of the relationship between software and infrastructure to build scalable, failure-resilient systems.
Community Engagement: Actively participate in the wider internal DevOps and SRE communities.

Requirements

We are looking for candidates with experience in the following areas:

Development: Software development in Java and web technologies (JavaScript, HTML).
Data & Infrastructure: Familiarity with database technologies (Elastic, MongoDB) and cloud platforms (AWS, Azure, or OpenStack).
Scripting & OS: Proficiency in Linux and Windows command lines (Bash, PowerShell).
Configuration & Deployment: Hands-on experience with tools such as Chef, Puppet, and Docker (container management/microservices).
Monitoring: Expertise in monitoring large-scale systems using technologies such as ELK.
Problem Solving: Strong diagnostic skills across all levels of the tech stack and experience troubleshooting service outages.
Agile Methodology: Experience working within an Agile Scrum team and using supporting tools like Jira.
Testing & Open Source: Familiarity with automation frameworks (Selenium) and a track record of improving Open Source Software.