Site Reliability Engineer
Role details
Job description
As a Site Reliability Engineer, you won't just be supporting systems; you'll be ensuring the services that connect artists and fans around the globe are always on.
System Reliability & Performance:
- Design, build, and maintain critical services for availability, scalability, and performance.
- Develop and maintain robust monitoring, alerting, and observability systems (e.g., using AWS CloudWatch, Dynatrace) to ensure rapid issue detection and resolution.
- Monitor infrastructure capacity and performance, providing analysis and suggestions for service delivery improvement.
Automation & Efficiency:
- Drive the automation of repetitive operational tasks, including infrastructure provisioning, deployments, and scaling.
- Create and maintain scripts and custom code to support and enhance our operational toolset.
- Support and optimize CI/CD pipelines to improve deployment speed and reliability.
Incident Management & Collaboration:
- Participate in an on-call rotation to troubleshoot and mitigate production incidents.
- Lead post-incident reviews and root cause analyses to implement lasting solutions.
- Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle.
Requirements
Required Experience & Skills:
- A strong background in systems administration (Linux/Windows) in a large-scale environment.
- Proficiency in at least one programming language (e.g., Python, Go, Java).
- Hands-on experience with a major cloud platform (AWS, GCP, or Azure), with a strong preference for AWS.
- Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, Ansible).
- Experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, Dynatrace).
- Proven analytical and problem-solving abilities with experience in a high-pressure environment.
- Excellent communication skills and the ability to foster a collaborative team environment.
Preferred Experience & Skills:
- Bachelor's degree in an IT-related field.
- Experience managing large-scale, distributed systems for a global organization.
- Familiarity with IT governance standards like ITIL.
- Direct experience with ServiceNow for IT service management.
- Knowledge of chaos engineering, resilience testing, and advanced capacity planning.
© 2025, Jobsora.com