Senior SRE
Job description
-
Help build and maintain AWS infrastructure with Terraform, using services such as ECS, Lambda, Aurora PostgreSQL, Kinesis, Firehose and S3 to support large-scale, event-driven systems.
-
Partner with software engineers building backend services (primarily Ruby) to embed SRE principles from design through production.
-
Improve observability across our systems using Terraform, OpenTelemetry, Grafana, CloudWatch and distributed tracing.
-
Help build and maintain a reliable containerised platform on ECS and automate everything from CI/CD pipelines to infrastructure provisioning.
-
Lead and contribute to incident response, embedding measurement, learning and continuous improvement.
-
Strengthen the security, networking and resilience posture of our platform through robust design and automation.
-
Optionally participate in the paid on-call rota, helping ensure Sky Protect customers receive a consistently excellent experience.
Requirements
-
Curiosity, collaboration and a continuous improvement mindset: someone who thrives in cross-functional teams.
-
Proficiency in at least one programming language (we commonly use Python; services run in Ruby).
-
Administrative and/or architectural experience with AWS or similar large-scale cloud platforms, plus strong Linux fundamentals.
-
Strong experience with Terraform or a similar infrastructure-as-code (IaC) tool, and a passion for automation and repeatability.
-
Hands-on knowledge of Docker containers and container-based, service-oriented architectures (e.g. Kubernetes, ECS).
-
Solid grounding in networking and security.
-
SRE skills across incident management, monitoring, measurement and reliability practices.
-
Bonus skills including Concourse CI, OpenTelemetry, Grafana, GCP & BigQuery, Azure/Azure AD, data engineering pipelines, AWS IoT, and experience using generative AI tools like GitHub Copilot for development.