Site Reliability Engineer
Job description
- Deliver, upgrade and maintain core service and project platforms, systems and automation.
- Develop and manage core monitoring and management platforms, ensuring high-quality observability.
- Collaborate with engineers, developers, operations and QA to design, harden and optimise platforms.
- Create architecture and solution designs for new services, delivering PoCs through to production.
- Conduct security risk and vulnerability assessments, ensuring secure deployment.
- Design, build and maintain automation tooling to improve performance and reliability.
- Diagnose and resolve performance issues across Linux, Docker, Terraform, Kubernetes and related technologies.
- Produce and maintain high-quality documentation and support mentoring within the team.
Requirements
Essential
- Strong experience with Linux and Windows operating systems.
- Excellent Linux administration skills (e.g. Ubuntu).
- Proficiency in scripting languages (Bash, PHP, Python, PowerShell).
- Hands-on experience with automation/DevOps tools such as Ansible, Terraform, CI/CD pipelines and Git.
- Experience with Azure or similar cloud technologies.
- Understanding of Desired State Configuration (DSC).
- Strong problem-solving and analytical skills.
- Willingness to learn and research new technologies, and to work in a fast-paced environment.
Desirable
- Experience with Infrastructure as Code (IaC) best practices.
- Awareness of security considerations and best practice.
- Knowledge of configuration as code tools (DSC, Ansible).
- Experience with ARM Templates, Terraform and continuous inspection tooling.
- Familiarity with Docker/Podman and container orchestration (Kubernetes, Swarm).
- Experience with GitLab or similar configuration management tooling.
- Knowledge of Jira or similar project/issue management tools.
- Experience with Azure and Azure Stack Hub environments.
- Experience with monitoring and alerting tools (e.g. Nagios, Splunk).
- Strong documentation and communication skills (e.g. Confluence).
- Experience delivering and managing large-scale systems.
- Understanding of Agile and DevOps principles.
Security Clearance
Due to the nature of the work, candidates must be UK sole nationals and eligible to obtain UK Security Clearance.