Site Reliability Engineer
People First
Charing Cross, United Kingdom
2 days ago
Role details
Contract type: Temporary contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: £133K
Job location: Charing Cross, United Kingdom
Tech stack
Microsoft Word
Amazon Web Services (AWS)
Azure
Multi-Cloud
Job description
- Identify systemic reliability risks and drive lasting preventative improvements.
- Define, implement, and refine SLIs, SLOs, and error budgets aligned with business and customer outcomes.
- Lead complex incident management, post-incident analysis, and long-term remediation planning.
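
For candidates less familiar with the terminology, the error-budget arithmetic behind the SLI/SLO responsibility above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `error_budget`, the request-based SLI, and the 99.9% target in the usage note are hypothetical and do not describe the employer's actual tooling.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise error-budget usage for one SLO window (hypothetical helper).

    Assumes a request-based SLI: the fraction of requests that succeeded.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The SLI is the measured success ratio over the window.
    sli = 1.0 - failed_requests / total_requests
    # The error budget is the number of failures the SLO tolerates.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of that budget already spent (can exceed 1.0 if the SLO is breached).
    consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "budget_remaining": 1.0 - consumed,
    }
```

With an assumed 99.9% target over 1,000,000 requests, roughly 1,000 failures are tolerated, so 250 failed requests would consume about a quarter of the budget.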
Architecture & Resilience
- Review and influence system architecture to improve scalability, availability, and failure isolation.
- Design strategies for high availability, graceful degradation, and disaster recovery across multi-region environments.
- Quantify tradeoffs between performance, cost, and operational risk.
CI/CD & Deployment Safety
- Improve deployment pipelines and implement automation that reduces risk and speeds up delivery.
- Implement safe deployment patterns (canary, blue/green, progressive delivery).
- Ensure robust rollback and recovery mechanisms.
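
As a rough illustration of the deployment-safety guardrails listed above, a canary gate might compare the canary's error rate against the baseline before promoting or rolling back. The sketch below is a simplified assumption, not the employer's pipeline: `canary_decision`, its thresholds, and the three outcome strings are hypothetical names chosen for this example.

```python
def canary_decision(
    baseline_errors: int,
    baseline_total: int,
    canary_errors: int,
    canary_total: int,
    max_ratio: float = 2.0,          # assumed: canary may be at most 2x the baseline error rate
    min_requests: int = 500,         # assumed: minimum canary traffic before judging
    rate_floor: float = 0.001,       # assumed: floor so a near-zero baseline is not overly strict
) -> str:
    """Return 'wait', 'promote', or 'rollback' for a canary release (hypothetical helper)."""
    # Not enough canary traffic yet to make a meaningful comparison.
    if canary_total < min_requests:
        return "wait"

    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    canary_rate = canary_errors / canary_total

    # Allow the canary up to max_ratio times the (floored) baseline error rate.
    threshold = max(baseline_rate, rate_floor) * max_ratio
    return "promote" if canary_rate <= threshold else "rollback"
```

A real progressive-delivery setup would usually evaluate several signals (latency, saturation, business metrics) over multiple steps; a single error-rate comparison is used here only to keep the idea visible.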
Requirements
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
- Recent experience in site reliability or production engineering (required).
- Experience designing and operating CI/CD systems with deployment safety guardrails.
- Experience with multi-cloud or multi-region resilience architecture.
- Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog).
- Experience working with Infrastructure as Code tools such as Terraform or CloudFormation.
- Hands-on experience operating production workloads in AWS, GCP, or Azure environments, including multi-region deployments.