Site Reliability Engineer
Spait Infotech Private Limited
Charing Cross, United Kingdom
5 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Junior
Compensation: £115K
Job location: Remote; Charing Cross, United Kingdom
Tech stack
Amazon Web Services (AWS)
Automation of Tests
Azure
Bash
Unix
Command-Line Interface
Cloud Computing
Cloud Engineering
Continuous Integration
DevOps
Distributed Systems
DNS
GitHub
Hypertext Transfer Protocol (HTTP)
Python
Reliability Engineering
Prometheus
Datadog
CircleCI
Pulumi
Scripting (Bash/Python/Go/Ruby)
Load Balancing
Grafana
Firewalls
CloudFormation
GitLab CI
Kubernetes
Terraform
Jenkins
Microservices
Job description
- Maintain and improve availability, performance, and reliability of production and staging systems.
- Build and enhance deployment pipelines and automated provisioning using Infrastructure-as-Code (IaC).
- Monitor system health through metrics, logs, and tracing; improve observability across the stack.
- Support incident response, troubleshooting, and post-incident reviews.
- Improve automation to reduce manual interventions and operational toil.
- Contribute to operational documentation, runbooks, and best practices.
- Assist senior engineers in maintaining cloud and platform infrastructure.
- Learn and follow best practices in reliability, CI/CD, and cloud operations.
- Improve CI/CD workflows and automated testing pipelines.
Requirements
- Experience with Linux/Unix systems and command-line tools.
- Familiarity with cloud platforms such as AWS, Azure, or GCP.
- Understanding of CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, CircleCI, etc.).
- Hands-on scripting experience with Python, Bash, or similar.
- Basic understanding of networking: HTTP, DNS, load balancing, and firewalls.
- Enthusiasm for automation, reliability, and DevOps culture.
- Experience with Kubernetes or container orchestration.
- Infrastructure-as-Code using Terraform, Pulumi, or CloudFormation.
- Observability tools (Prometheus, Grafana, Datadog, ELK/EFK, OpenTelemetry).
- Experience managing distributed systems or microservices architectures.
- Production incident management and on-call experience.
- Strong understanding of SRE principles (SLIs/SLOs, error budgets, Chaos Engineering).
- Cloud architecture expertise (AWS/GCP/Azure).