Site Reliability Engineer

AWD
Manchester, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
£70K

Job location

Manchester, United Kingdom

Tech stack

Systems Engineering
Bash
Cloud Computing
Computer Security
Continuous Integration
Dynamic Host Configuration Protocol
Linux
DevOps
DNS
GitHub
IPv4
IPv6
Python
Linux System Administration
Reliability Engineering
Ansible
Prometheus
Data Logging
Google Cloud Platform
Cloud Platform System
System Availability
Grafana
IT Architecture
Backend
Containerization
Kubernetes
Terraform
Splunk
PPPoE
Software Version Control
Docker

Job description

  • Acting as first-line technical escalation for live production issues through to resolution or handover
  • Maintaining high availability, performance and scalability of production platforms and services
  • Managing logging, monitoring, alerting and metrics to proactively identify and resolve issues
  • Collaborating with development teams to translate operational insights into long-term platform resilience
  • Supporting automation, incident response and continuous improvement practices
  • Ensuring new products and features are operable, reliable and scalable from day one
  • Working with network engineering, operations and support teams to diagnose service issues
  • Creating and maintaining runbooks, escalation guides and incident reports
  • Balancing customer impact with long-term system health and stability
  • Supporting compliance with security, availability and regulatory frameworks

Technologies:

  • Bash
  • CI/CD
  • Cloud
  • DevOps
  • Docker
  • ELK
  • GitHub
  • Grafana
  • Support
  • Kubernetes
  • Linux
  • Network
  • OSS
  • Prometheus
  • Python
  • Security
  • Splunk
  • Terraform
  • Ansible
  • Backend

Requirements

  • Previous experience in a Site Reliability Engineer, DevOps Engineer, Systems Engineer or Operations Engineer role
  • Experience supporting production services at scale within a DevOps or SRE environment
  • Strong working knowledge of ISP-related networking concepts including DNS, DHCP, PPPoE, RADIUS and IPv4/IPv6
  • Experience with observability tools such as Prometheus, Grafana, ELK or Splunk
  • Hands-on experience with containerisation and orchestration using Docker and Kubernetes
  • Cloud platform experience, ideally Google Cloud Platform, including automation and scaling practices
  • Strong Linux administration skills with scripting capability in Bash, Python or similar
  • Familiarity with source control and CI/CD pipelines, using tools such as GitHub and GitHub Actions
  • Understanding of security frameworks and operational resilience best practices

Desirable:

  • Experience within ISP, MSP or telecommunications environments
  • Familiarity with enterprise IT architectures including OSS and BSS systems
  • Knowledge of information security frameworks and regulations such as ISO 27001, NIST or GDPR
  • Experience with infrastructure automation tools such as Terraform or Ansible
