Senior Site Reliability Engineer

Stratospherec Ltd

Charing Cross, United Kingdom

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£85K

Job location

Remote

Charing Cross, United Kingdom

Tech stack

ASP.NET

Java

Amazon Web Services (AWS)

Azure

Bash

C Sharp (Programming Language)

Cloud Computing

Code Review

Continuous Integration

DevOps

GitHub

Python

Object-Oriented Software Development

Pair Programming

PowerShell

Reliability Engineering

Ansible

Prometheus

Software Engineering

Datadog

Data Logging

Scripting (Bash/Python/Go/Ruby)

Grafana

Infrastructure as Code (IaC)

CloudFormation

Containerization

Kubernetes

Terraform

Docker

Job description

In this role, you will provide technical leadership and mentorship within the team through knowledge-sharing sessions, pair programming, code reviews, and solution design. You will identify and implement technical solutions to improve platform reliability, create mitigation strategies, and develop operational playbooks. You will also implement and maintain the monitoring, alerting, and logging systems that support incident response. You will ensure the scalability and efficiency of our cloud infrastructure, conduct performance tests to identify and address bottlenecks, and develop and maintain platform solutions, automating infrastructure provisioning and management tasks using Infrastructure as Code. Collaborating with product engineering teams to design and build fit-for-purpose, observable software will also be a key part of your role.

Technologies:

AWS
Ansible
Azure
Bash
C#
Cloud
Datadog
DevOps
Docker
GCP
Grafana
Java
Kubernetes
PowerShell
Prometheus
Python
Terraform
ASP.NET
Architect
CI/CD
GitHub

Requirements

We are seeking a Senior Site Reliability Engineer with a strong software development background in C#, Java, or a similar object-oriented programming language. You should have proven experience in a Site Reliability Engineering, DevOps, or Platform Engineering role. Familiarity with scripting languages such as Bash, Python, or PowerShell is essential. You must have hands-on experience with containerization technologies, preferably Kubernetes and/or Docker, and be proficient with one or more public cloud providers, such as Azure, AWS, or GCP. Additionally, experience using Infrastructure as Code (IaC) tools like Terraform, Ansible, or CloudFormation, and knowledge of monitoring and logging tools such as Datadog, Prometheus, or Grafana, are required. A track record of maintaining highly available and performant production environments, as well as the ability to identify and implement effective mitigation strategies, is crucial.