Senior Site Reliability Engineer

Stratospherec Ltd
Charing Cross, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
£85K

Job location

Remote
Charing Cross, United Kingdom

Tech stack

ASP.NET
Java
Amazon Web Services (AWS)
Azure
Bash
C Sharp (Programming Language)
Cloud Computing
Code Review
Continuous Integration
DevOps
GitHub
Python
Object-Oriented Software Development
Pair Programming
PowerShell
Reliability Engineering
Ansible
Prometheus
Software Engineering
Datadog
Data Logging
Scripting (Bash/Python/Go/Ruby)
Grafana
Infrastructure as Code (IaC)
CloudFormation
Containerization
Kubernetes
Terraform
Docker

Job description

  • In your role, you will provide technical leadership and mentorship within the team through knowledge sharing sessions, pair programming, code reviews, and solution design. You will identify and implement technical solutions to improve platform reliability, create mitigation strategies, and develop operational playbooks. Your responsibilities will also include implementing and maintaining monitoring, alerting, and logging systems to respond to incidents. You will ensure scalability and efficiency of our cloud infrastructure, conduct performance tests to identify and address bottlenecks, and develop and maintain platform solutions while automating infrastructure provisioning and management tasks using Infrastructure as Code. Collaborating with product engineering teams to design and build fit-for-purpose and observable software will also be a key part of your role.

Technologies:

  • AWS
  • Ansible
  • Azure
  • Bash
  • C#
  • Cloud
  • Datadog
  • DevOps
  • Docker
  • GCP
  • Grafana
  • Java
  • Kubernetes
  • PowerShell
  • Prometheus
  • Python
  • Terraform
  • ASP.NET
  • Architect
  • CI/CD
  • GitHub

Requirements

  • We are seeking a Senior Site Reliability Engineer with a strong software development background in C#, Java, or a similar object-oriented programming language. You should have proven experience in a Site Reliability Engineering, DevOps, or Platform Engineering role. Familiarity with scripting languages such as Bash, Python, or PowerShell is essential. You must have hands-on experience with containerization technologies, preferably Kubernetes and/or Docker, and be proficient with one or more public cloud providers, such as Azure, AWS, or GCP. Experience with Infrastructure as Code (IaC) tools like Terraform, Ansible, or CloudFormation, and knowledge of monitoring and logging tools such as Datadog, Prometheus, or Grafana, are also required. A track record of maintaining highly available and performant production environments, as well as the ability to identify and implement effective mitigation strategies, is crucial.
