Senior Site Reliability Engineer

Selby Jennings
Charing Cross, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Charing Cross, United Kingdom

Tech stack

Cloud Computing
Linux
Python
Reliability Engineering
Kubernetes
Docker

Requirements

Our client, a leading systematic hedge fund, is seeking a Senior Site Reliability Engineer to join their London-based platform team. In this role, you will focus on enhancing the reliability, resilience, and day-to-day operability of a rapidly scaling engineering platform. You will work closely with software engineers and platform owners to strengthen observability, improve incident response processes, and drive measurable reliability outcomes.

To be successful, you will bring hands-on experience applying SRE principles in production environments, alongside strong expertise in Linux systems. You must be capable of building and operating containerized workloads using tools such as Docker or Podman, and have strong experience in Go and/or Python. We are looking for a highly technical individual with strong Infrastructure-as-Code (IaC) proficiency and the ability to query, interpret, and reason about metrics using PromQL. A key part of this role will involve owning and improving the overall effectiveness of the platform's observability.

Responsibilities:

Own the effectiveness of the observability platform, ensuring high-quality signals, alert fidelity, and ongoing suitability as the platform scales.
Build and maintain actionable, low-noise dashboards and alerting across metrics and logs.
Define and apply SLIs and SLOs where they support operational decision-making.
Apply IaC across observability and supporting systems.
Improve the reliability, scalability, and operability of core services through hands-on engineering changes.

Requirements:

Strong practical experience applying SRE principles in production environments.
Strong Linux systems knowledge.
Strong development experience in Go and/or Python.
Strong IaC proficiency.
OpenTelemetry experience (metrics, logs, traces).
Kubernetes and cloud-native platform experience.
