Senior SRE GCP
Job description
As a Senior Site Reliability Engineer, you will be responsible for supporting high-throughput systems that serve millions of customers and billions of requests each month. You'll work on complex hybrid-cloud architectures, with a focus on Kubernetes-based workloads, networking, and monitoring solutions.
You'll also have the opportunity to drive improvements across cloud deployments, CI/CD pipelines, and cost optimisation while exploring new technologies and automation opportunities. Acting as a subject matter expert in site reliability engineering, you'll help foster a culture of continuous learning within the team.
- Ensure critical systems are highly available, scalable, and resilient.
- Develop and implement SLAs/SLOs/SLIs to enhance system reliability.
- Build tools to improve incident management processes, including alerting mechanisms, runbooks, and auto-resolving solutions.
- Drive innovation by exploring AI tooling and automation to improve SRE capabilities.
- Collaborate with teams to optimise cloud deployments and monitoring solutions.
- Actively participate in postmortems and support rotas to ensure operational excellence.
Requirements
We're seeking candidates with:
- Proven experience in software development, testing, monitoring, and operational stability at scale.
- Expertise in Kubernetes (ideally microservice architectures using Istio service mesh).
- Strong knowledge of cloud-native solutions (preferably Google Cloud), including storage, networking, and resource provisioning.
- Hands-on experience with monitoring tools such as Datadog or Dynatrace.
- Proficiency in coding/scripting languages such as Python or Bash.
- A solid understanding of automation best practices and CI/CD pipelines.
- Experience designing APIs and working with database operations, both streaming and batch.