Site Reliability Engineer

People First
Charing Cross, United Kingdom
2 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
£ 133K

Job location

Charing Cross, United Kingdom

Tech stack

Microsoft Word
Amazon Web Services (AWS)
Azure
Multi-Cloud

Job description

  • Identify systemic reliability risks and drive lasting preventative improvements.
  • Define, implement, and refine SLIs, SLOs, and error budgets aligned with business and customer outcomes.
  • Lead complex incident management, post-incident analysis, and long-term remediation planning.

Architecture & Resilience

  • Review and influence system architecture to improve scalability, availability, and failure isolation.
  • Design strategies for high availability, graceful degradation, and disaster recovery across multi-region environments.
  • Quantify tradeoffs between performance, cost, and operational risk.

CI/CD & Deployment Safety

  • Improve deployment pipelines and implement automation that reduces risk and speeds up delivery.
  • Implement safe deployment patterns (canary, blue/green, progressive delivery).
  • Ensure robust rollback and recovery mechanisms.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • Recent experience in site reliability or production engineering is required.
  • Experience designing and operating CI/CD systems with deployment safety guardrails.
  • Experience with multi-cloud or multi-region resilience architecture.
  • Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog).
  • Experience working with Infrastructure as Code tools such as Terraform or CloudFormation.
  • Hands-on experience operating production workloads in AWS, GCP, or Azure environments, including multi-region deployments.
