Site Reliability Engineer

People First

Charing Cross, United Kingdom

2 days ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£133K

Job location

Charing Cross, United Kingdom

Tech stack

Microsoft Word

Amazon Web Services (AWS)

Azure

Multi-Cloud

Job description

Identify systemic reliability risks and drive lasting preventative improvements.
Define, implement, and refine SLIs, SLOs, and error budgets aligned with business and customer outcomes.
Lead complex incident management, post-incident analysis, and long-term remediation planning.

Architecture & Resilience

Review and influence system architecture to improve scalability, availability, and failure isolation.
Design strategies for high availability, graceful degradation, and disaster recovery across multi-region environments.
Quantify tradeoffs between performance, cost, and operational risk.

CI/CD & Deployment Safety

Improve deployment pipelines and implement automation that reduces risk and speeds up delivery.
Implement safe deployment patterns (canary, blue/green, progressive delivery).
Ensure robust rollback and recovery mechanisms.

Requirements

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
Recent experience in site reliability or production engineering is required.
Experience designing and operating CI/CD systems with deployment safety guardrails.
Experience with multi-cloud or multi-region resilience architecture.
Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog).
Experience working with Infrastructure as Code tools such as Terraform or CloudFormation.
Hands-on experience operating production workloads in AWS, GCP, or Azure environments, including multi-region deployments.
