Site Reliability Engineer

Spait Infotech Private Limited
Charing Cross, United Kingdom
5 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior
Compensation
£115K

Job location

Remote
Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)
Automation of Tests
Azure
Bash
Unix
Command-Line Interface
Cloud Computing
Cloud Engineering
Continuous Integration
DevOps
Distributed Systems
DNS
GitHub
Hypertext Transfer Protocols (HTTP)
Python
Reliability Engineering
Prometheus
Datadog
CircleCI
Pulumi
Scripting (Bash/Python/Go/Ruby)
Load Balancing
Grafana
Firewalls (Computer Science)
CloudFormation
GitLab CI
Kubernetes
Terraform
Jenkins
Microservices

Job description

  • Maintain and improve availability, performance, and reliability of production and staging systems.
  • Build and enhance deployment pipelines and automated provisioning using Infrastructure-as-Code (IaC).
  • Monitor system health through metrics, logs, and tracing; improve observability across the stack.
  • Support incident response, troubleshooting, and post-incident reviews.
  • Improve automation to reduce manual interventions and operational toil.
  • Contribute to operational documentation, runbooks, and best practices.
  • Assist senior engineers in maintaining cloud and platform infrastructure.
  • Learn and follow best practices in reliability, CI/CD, and cloud operations.
  • Improve CI/CD workflows and automated testing pipelines.

Requirements

  • Experience with Linux/Unix systems and command-line tools.
  • Familiarity with cloud platforms such as AWS, Azure, or GCP.
  • Understanding of CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, CircleCI, etc.).
  • Hands-on scripting experience with Python, Bash, or similar.
  • Basic understanding of networking, HTTP, DNS, load balancing, firewalls.
  • Enthusiasm for automation, reliability, and DevOps culture.
  • Experience with Kubernetes or container orchestration.
  • Experience with Infrastructure-as-Code using Terraform, Pulumi, or CloudFormation.
  • Familiarity with observability tools (Prometheus, Grafana, Datadog, ELK/EFK, OpenTelemetry).
  • Experience managing distributed systems or microservices architectures.
  • Production incident management and on-call experience.
  • Strong understanding of SRE principles (SLIs/SLOs, error budgets, Chaos Engineering).
  • Cloud architecture expertise (AWS/GCP/Azure).
