Senior Site Reliability Engineer
Stratospherec Ltd
Charing Cross, United Kingdom
2 days ago
Role details
Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
£85K
Job location
Remote
Charing Cross, United Kingdom
Tech stack
ASP.NET
Java
Amazon Web Services (AWS)
Azure
Bash
C Sharp (Programming Language)
Cloud Computing
Code Review
Continuous Integration
DevOps
GitHub
Python
Object-Oriented Software Development
Pair Programming
PowerShell
Reliability Engineering
Ansible
Prometheus
Software Engineering
Datadog
Data Logging
Scripting (Bash/Python/Go/Ruby)
Grafana
Infrastructure as Code (IaC)
CloudFormation
Containerization
Kubernetes
Terraform
Docker
Job description
- In your role, you will provide technical leadership and mentorship within the team through knowledge-sharing sessions, pair programming, code reviews, and solution design.
- You will identify and implement technical solutions to improve platform reliability, create mitigation strategies, and develop operational playbooks.
- You will implement and maintain monitoring, alerting, and logging systems that support incident response.
- You will ensure the scalability and efficiency of our cloud infrastructure and conduct performance tests to identify and address bottlenecks.
- You will develop and maintain platform solutions, automating infrastructure provisioning and management tasks using Infrastructure as Code.
- You will collaborate with product engineering teams to design and build fit-for-purpose, observable software.
Technologies:
- AWS
- Ansible
- Azure
- Bash
- C#
- Cloud
- Datadog
- DevOps
- Docker
- GCP
- Grafana
- Java
- Kubernetes
- PowerShell
- Prometheus
- Python
- Terraform
- ASP.NET
- Architect
- CI/CD
- GitHub
Requirements
- We are seeking a Senior Site Reliability Engineer with a strong software development background in C#, Java, or a similar object-oriented programming language.
- You should have proven experience in a Site Reliability Engineering, DevOps, or Platform Engineering role.
- Familiarity with scripting languages such as Bash, Python, or PowerShell is essential.
- You must have hands-on experience with containerization technologies, preferably Kubernetes and/or Docker.
- You must be proficient with one or more public cloud providers, such as Azure, AWS, or GCP.
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation is required.
- Knowledge of monitoring and logging tools such as Datadog, Prometheus, or Grafana is required.
- A track record of maintaining highly available and performant production environments, together with the ability to identify and implement effective mitigation strategies, is crucial.