Site Reliability Engineer

Cabinet Office
Charing Cross, United Kingdom
9 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English, Norwegian
Experience level
Senior

Job location

Remote
Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)
Bash
Continuous Integration
DevOps
GitHub
Python
Network Protocols
Reliability Engineering
Prometheus
Ruby
Web Application Security
Wi-Fi Technology
Datadog
Scripting (Bash/Python/Go/Ruby)
Delivery Pipeline
Infrastructure as Code (IaC)
CloudFormation
Concourse
Containerization
CloudWatch
Zendesk
Terraform
Docker
Jenkins

Job description

GovWifi is a government-critical service that enables secure, consistent WiFi access across the UK public sector, supporting staff and visitors in thousands of locations. We're looking for a skilled DevOps Engineer to help keep this high-profile platform reliable, secure, and future-ready. You'll work with a multi-disciplinary team to maintain service availability, automate infrastructure, and deliver improvements. From deploying secure solutions in AWS to strengthening monitoring and incident response, you'll play a vital role in keeping GovWifi resilient at scale. If you enjoy solving complex problems, collaborating with diverse teams, and want your engineering skills to directly benefit the public sector, this is your opportunity to make a real impact on a service used nationwide.

As a DevOps Engineer on the GovWifi service, you will be part of a cross-disciplinary team responsible for ensuring the secure, reliable, and efficient operation of a government-critical platform. Your work will directly support thousands of users across the UK public sector, helping create a seamless and secure WiFi experience in government buildings nationwide.

We'll assess you against these technical skills during the selection process:

  • AWS Cloud Platform Expertise
  • Infrastructure as Code (IaC)
  • Scripting and Automation
  • Containerisation and Orchestration
  • Networking and Security Fundamentals
  • CI/CD Pipeline Development and Maintenance
  • Incident Management and Troubleshooting
  • Monitoring and Observability

We only ask for evidence of these technical skills on your application form:

  • AWS Cloud Platform Expertise
  • Infrastructure as Code (IaC)
  • Scripting and Automation
  • CI/CD Pipeline Development and Maintenance

This job is open to:

  • UK nationals
  • nationals of the Republic of Ireland
  • nationals of Commonwealth countries who have the right to work in the UK
  • nationals of the EU, Switzerland, Norway, Iceland or Liechtenstein and family members of those nationalities with settled or pre-settled status under the European Union Settlement Scheme (EUSS)
  • nationals of the EU, Switzerland, Norway, Iceland or Liechtenstein and family members of those nationalities who have made a valid application for settled or pre-settled status under the European Union Settlement Scheme (EUSS)
  • individuals with limited leave to remain or indefinite leave to remain who were eligible to apply for EUSS on or before 31 December 2020
  • Turkish nationals, and certain family members of Turkish nationals, who have accrued the right to work in the Civil Service

Your responsibilities will include:

  • Maintaining service reliability: Monitor, manage and improve the availability of GovWifi, ensuring the platform consistently meets service level objectives. Respond to and resolve incidents quickly, serving as a point of escalation when needed.
  • Automating infrastructure: Use Terraform (or other IaC tools) to automate deployments and infrastructure changes, reducing manual intervention and improving consistency.
  • Deploying securely: Carry out safe, reliable deployments of code and configuration into AWS environments (ECS, EC2, CloudWatch, ELB, CodeBuild, CodePipeline).
  • Improving system resilience: Design, build and implement monitoring, alerting, and recovery mechanisms to keep systems highly available and secure.
  • Mitigating risks: Identify, assess, and reduce security vulnerabilities across the platform, applying web security best practices and implementing protective measures.
  • Supporting migrations and transitions: Assist with tool changes, platform improvements, or policy-driven migrations that affect GovWifi operations.
  • Building for users: Develop new features or improvements through prototyping, proof-of-concepts, and continuous iteration in collaboration with product managers and developers.
  • Knowledge sharing: Document technical decisions clearly, add to the team's knowledge base, and explain complex issues to non-technical colleagues in a clear, supportive way.
  • Customer support: Engage with end-user requests and issues through support tools such as Zendesk, helping resolve technical challenges directly impacting users.
  • Driving continuous improvement: Pair with teammates, contribute to engineering improvement initiatives, and promote best practices across the service.

Requirements

  • AWS Cloud Platform Expertise: Knowledge and experience with AWS services including ECS, load balancing, EC2, CloudWatch, CodeBuild, CodePipeline.
  • Infrastructure as Code (IaC): Proficiency with Terraform or similar tools (CloudFormation) for automating environment provisioning and management.
  • Scripting and Automation: Ability to write and maintain scripts in languages like Python, Bash, or Ruby for automation of operational tasks and deployments.
  • CI/CD Pipeline Development and Maintenance: Experience designing, implementing, and maintaining continuous integration and delivery pipelines using tools like Jenkins, GitHub Actions, Concourse or AWS CodePipeline.

Should a large number of applications be received, an initial sift may be undertaken using the lead Behaviour (Working Together). Candidates who pass the initial sift may be progressed to a full sift, or progressed straight to assessment/interview.

Technical skills

  • AWS Cloud Platform Expertise: Knowledge and experience with AWS services including ECS, load balancing, EC2, CloudWatch, CodeBuild, CodePipeline.
  • Infrastructure as Code (IaC): Proficiency with Terraform or similar tools (CloudFormation) for automating environment provisioning and management.
  • Scripting and Automation: Ability to write and maintain scripts in languages like Python, Bash, or Ruby for automation of operational tasks and deployments.
  • CI/CD Pipeline Development and Maintenance: Experience designing, implementing, and maintaining continuous integration and delivery pipelines using tools like Jenkins, GitHub Actions, Concourse or AWS CodePipeline.
  • Containerisation and Orchestration: Familiarity with Docker and container management to package, deploy, and run applications efficiently in cloud environments.
  • Networking and Security Fundamentals: Understanding of network protocols (TCP/UDP), AWS VPCs, security groups, firewall rules, and security best practices to maintain a secure platform.
  • Incident Management and Troubleshooting: Skills in detecting, diagnosing, and resolving operational incidents, including root cause analysis and automation of remediation.
  • Monitoring and Observability: Knowledge of monitoring tools (CloudWatch, Prometheus, Datadog) and setting up alerts and dashboards to maintain platform health and reliability.

Strengths will also be assessed at interview stage. Please note, you must successfully pass the first interview stage to progress to the second interview stage.
