Site Reliability Engineer

Spait Infotech Private Limited

Charing Cross, United Kingdom

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Junior

Compensation

£115K

Job location

Remote

Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)

Test Automation

Azure

Bash

Unix

Command-Line Interface

Cloud Computing

Cloud Engineering

Continuous Integration

DevOps

Distributed Systems

DNS

GitHub

Hypertext Transfer Protocol (HTTP)

Python

Reliability Engineering

Prometheus

Datadog

CircleCI

Pulumi

Scripting (Bash/Python/Go/Ruby)

Load Balancing

Grafana

Firewalls

CloudFormation

GitLab CI

Kubernetes

Terraform

Jenkins

Microservices

Job description

Maintain and improve availability, performance, and reliability of production and staging systems.
Build and enhance deployment pipelines and automated provisioning using Infrastructure-as-Code (IaC).
Monitor system health through metrics, logs, and tracing; improve observability across the stack.
Support incident response, troubleshooting, and post-incident reviews.
Improve automation to reduce manual interventions and operational toil.
Contribute to operational documentation, runbooks, and best practices.
Assist senior engineers in maintaining cloud and platform infrastructure.
Learn and follow best practices in reliability, CI/CD, and cloud operations.
Improve CI/CD workflows and automated testing pipelines.

Requirements

Experience with Linux/Unix systems and command-line tools.

Familiarity with cloud platforms such as AWS, Azure, or GCP.
Understanding of CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, CircleCI, etc.).
Hands-on scripting experience with Python, Bash, or similar.
Basic understanding of networking, HTTP, DNS, load balancing, firewalls.
Enthusiasm for automation, reliability, and DevOps culture.
Experience with Kubernetes or container orchestration.
Infrastructure-as-Code using Terraform, Pulumi, or CloudFormation.
Observability tools (Prometheus, Grafana, Datadog, ELK/EFK, OpenTelemetry).
Experience managing distributed systems or microservices architectures.
Production incident management and on-call experience.
Strong understanding of SRE principles (SLIs/SLOs, error budgets, Chaos Engineering).
Cloud architecture expertise (AWS/GCP/Azure).