Senior SRE GCP

Robert Walters

Charing Cross, United Kingdom

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£90K

Job location

Charing Cross, United Kingdom

Tech stack

API

Artificial Intelligence

Bash

Cloud Computing

Databases

Python

Reliability Engineering

Software Engineering

Datadog

Google Cloud Platform

Istio

Reliability of Systems

Kubernetes

Build Tools

Dynatrace

Job description

As a Senior Site Reliability Engineer, you will be responsible for supporting high-throughput systems that serve millions of customers and billions of requests each month. You'll work on complex hybrid-cloud architectures, with a focus on Kubernetes-based workloads, networking, and monitoring solutions.

You'll also have the opportunity to drive improvements across cloud deployments, CI/CD pipelines, and cost optimisation while exploring new technologies and automation opportunities. Acting as a subject matter expert in site reliability engineering, you'll help foster a culture of continuous learning within the team.

Ensure critical systems are highly available, scalable, and resilient.
Develop and implement SLAs/SLOs/SLIs to enhance system reliability.
Build tools to improve incident management processes, including alerting mechanisms, runbooks, and auto-resolving solutions.
Drive innovation by exploring AI tooling and automation to improve SRE capabilities.
Collaborate with teams to optimise cloud deployments and monitoring solutions.
Actively participate in postmortems and support rotas to ensure operational excellence.

Requirements

We're seeking candidates with:

Proven experience in software development, testing, monitoring, and operational stability at scale.
Expertise in Kubernetes (ideally microservice architectures using Istio service mesh).
Strong knowledge of cloud-native solutions (preferably Google Cloud), including storage, networking, and resource provisioning.
Hands-on experience with monitoring tools such as Datadog or Dynatrace.
Proficiency in coding/scripting languages such as Python or Bash.
A solid understanding of automation best practices and CI/CD pipelines.
Experience designing APIs and working with database operations (streaming/batch).