Site Reliability Engineer / SRE / Systems Engineer

AWD
Altrincham, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
£70K

Job location

Remote
Altrincham, United Kingdom

Tech stack

Systems Engineering
Bash
Cloud Computing
Cloud Engineering
Computer Security
Dynamic Host Configuration Protocol
DevOps
DNS
GitHub
Monitoring of Systems
IPv4
IPv6
Python
Linux System Administration
Reliability Engineering
Ansible
Prometheus
Data Logging
Google Cloud Platform
Cloud Platform System
System Availability
Grafana
IT Architecture
Backend
Containerization
Kubernetes
Terraform
Splunk
PPPoE
Software Version Control
Docker

Job description

A fantastic opportunity for a Site Reliability Engineer / Systems Engineer to support highly available, scalable production systems within a fast-growing technology environment, working across cloud platforms, DevOps, networking and operational resilience.

If you have worked in any of the following roles, we'd also like to hear from you: DevOps Engineer, Operations Engineer, Cloud Engineer, Platform Engineer, Systems Engineer, Infrastructure Engineer, Production Engineer.

As a Site Reliability Engineer / Systems Engineer you will act as the vital link between operations, end users and backend development teams, ensuring system availability, performance optimisation and effective incident management across live environments.

This Site Reliability Engineer / Systems Engineer role offers the chance to work with modern cloud technologies, containerisation, observability tools and automation practices, while influencing long-term reliability improvements across business-critical systems.

Your duties as the Site Reliability Engineer / Systems Engineer include:

  • Incident Triage and Ownership: Acting as first-line technical escalation for live production issues through to resolution or handover

  • System Monitoring and Availability: Maintaining high availability, performance and scalability of production platforms and services

  • Observability Implementation: Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues

  • Reliability Improvements: Collaborating with development teams to translate operational insights into long-term platform resilience

  • Automation and Resilience: Supporting automation, incident response and continuous improvement practices

  • New Service Support: Ensuring new products and features are operable, reliable and scalable from day one

  • Cross-Team Collaboration: Working with network engineering, operations and support teams to diagnose service issues

  • Documentation and Reporting: Creating and maintaining runbooks, escalation guides and incident reports

  • Incident Prioritisation: Balancing customer impact with long-term system health and stability

  • Security and Compliance: Supporting compliance with security, availability and regulatory frameworks

Requirements

  • Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role

  • Experience supporting production services at scale within a DevOps or SRE environment

  • Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6

  • Experience with observability tools such as Prometheus, Grafana, ELK or Splunk

  • Hands-on experience with containerisation and orchestration using Docker and Kubernetes

  • Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices

  • Strong Linux administration skills with scripting capability in Bash, Python or similar

  • Familiarity with source control and CI/CD pipelines, using tools such as GitHub and GitHub Actions

  • Understanding of security frameworks and operational resilience best practices

Desirable

  • Experience within ISP, MSP or telecommunications environments

  • Familiarity with enterprise IT architectures including OSS and BSS systems

  • Knowledge of information security frameworks and regulations such as ISO 27001, NIST or GDPR

  • Experience with infrastructure automation tools such as Terraform or Ansible
