SRE

UK Health Security Agency
Liverpool, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Part-time / full-time
Working hours
Regular working hours
Languages
English
Compensation
£ 71K

Job location

Remote
Liverpool, United Kingdom

Tech stack

Microsoft Windows
Artificial Intelligence
Amazon Web Services (AWS)
Systems Engineering
Azure
Bash
Cloud Computing
Continuous Delivery
Continuous Integration
Linux
DevOps
Distributed Systems
Python
Performance Tuning
Powershell
Reliability Engineering
Ansible
Prometheus
Scientific Computing
Software Engineering
Datadog
Google Cloud Platform
High Performance Computing
Grafana
Infrastructure as Code (IaC)
Kubernetes
Information Technology
Data Analytics
Terraform

Job description

The United Kingdom Health Security Agency (UKHSA) is a system leader for health security; taking action internationally to strengthen global health security, providing trusted advice to government and the public and reducing inequalities in the way different communities experience and are impacted by infectious disease, environmental hazards, and other threats to health. UKHSA's remit, as an agency with a global-to-local reach, is to protect the health of the nation from infectious diseases and other external threats to health. As the nation's expert national health security agency UKHSA will:

  • Prevent: anticipate threats to health and help build the nation's readiness, defences and health security

  • Detect: use cutting edge environmental and biological surveillance to proactively detect and monitor infectious diseases and threats to health

  • Analyse: use world-class science and data analytics to assess and continually monitor threats to health, identifying how best to control and mitigate the risks

  • Respond: take rapid, collaborative and effective actions nationally and locally to mitigate threats to health when they materialise

  • Lead: lead strong and sustainable global, national, regional and local partnerships designed to save lives, protect the nation from public health threats and reduce inequalities.

The Digital and Directorate has primary responsibility for scientific computing and research computing services and support. Key functions of the Digital Development and Operations unit are to provide and support such platforms required by the staff of UKHSA and provide technical capabilities to enable public health services, both within the Organisation and between the Organisation and its customers and stakeholders.

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our High Performance Computing (HPC) & SRE team. The role will be critical in ensuring the stability, scalability and performance of our services, combining software engineering and systems engineering to build, improve and run reliable, scalable production systems. The role will be responsible to the Principal Specialist Engineer SRE and is part of the HPC/SRE/AI & research computing unit.

The SRE will use engineering principles to remediate infrastructure and operational problems with primary focus on automation and Continuous Integration/Continuous Delivery (CI/CD) ensuring our services run reliably, are scalable and perform optimally in production environments. The SRE will monitor and manage these aspects while taking responsibility for multiple cloud infrastructure services. Observability of systems will be key to prioritising the operational service improvements and performance improvements to meet and exceed Service Level Objectives (SLOs).

  • Architect, develop and manage multi-cloud HPC platforms and on-premise infrastructure

  • Ensure services are highly available, scalable and resilient

  • Manage performance, capability and capacity planning

  • Support UKHSA's AI requirements

  • Ensure services are stable, scalable, performant and automated

  • Respond to incidents, troubleshoot issues and restore services promptly

  • Prioritise operational service improvements to meet/increase SLO, minimise downtime

  • Ensure effective monitoring/alerting is in place to proactively identify issues using tools and dashboards and reduce times to respond to issues

  • Leverage automation to streamline tasks, reduce overhead on repeatable operations, reduce manual intervention and improve efficiency

  • Write maintainable, clear and concise code

  • Optimise system performance using strong problem-solving skills to identify bottlenecks with an engineering mindset

  • Ensure system can handle current/future workloads through automation and capacity planning

  • Improve services through observability and identify ways to improve observability practices

  • Define SRE principles and influence/educate stakeholders to adopt implemented principles

  • Provide technical documentation for engineers and training

  • Work closely with engineering and technology teams to improve operational processes and reduce manual tasks, ensuring seamless collaboration and knowledge sharing, reducing risks and adapting to new ways of working

Working for our organisation

We pride ourselves on being an employer of choice where Everyone Matters, promoting equality of opportunity and actively encouraging applications from everyone, including groups currently underrepresented in our workforce. UKHSA's ethos is to be an inclusive organisation for all our staff and stakeholders, and to create, nurture and sustain an inclusive culture where differences drive innovative solutions to meet the needs of our workforce and wider communities. We do this through celebrating and protecting differences by removing barriers and promoting equity and equality of opportunity for all. Please visit our careers site for more information: https://gov.uk/ukhsa/careers

  • Ensure services are stable, scalable, and performant through engineering best practices and system design

  • Proactively identify and address system bottlenecks using advanced problem-solving and performance tuning techniques.

  • Conduct capacity planning and implement solutions to ensure systems can support current and future workloads

Incident Response & Troubleshooting

  • Respond swiftly to production incidents, ensuring minimal downtime and quick restoration of services

  • Lead root cause analysis and postmortems, implementing lessons learned to prevent recurrence

Monitoring, Alerting & Observability

  • Design and implement effective monitoring and alerting systems using tools and dashboards

  • Improve observability of services, ensuring issues are identified and addressed before impacting users

  • Continuously refine monitoring practices to reduce alert fatigue and improve response times

Automation & Tooling

  • Develop automation to eliminate manual, repetitive tasks and improve operational efficiency

  • Write clear, maintainable, and well-tested code to support automation efforts and system tooling

  • Drive initiatives to reduce operational toil and improve reliability through Infrastructure as Code (IaC)

Service Level Objectives & Operational Improvements

  • Define, track, and continuously improve SLOs, SLIs, and error budgets.

  • Identify and prioritize operational improvements that align with business goals and user experience

SRE Best Practices & Advocacy

  • Define and evangelize SRE principles across the organisation

  • Collaborate with stakeholders to integrate reliability practices into the development lifecycle

Collaboration & Knowledge Sharing

  • Work closely with software engineering, DevOps, and infrastructure teams to streamline deployment and operational workflows

  • Improve cross-functional collaboration and promote a culture of shared responsibility for service reliability

Documentation & Training

  • Maintain accurate technical documentation, runbooks, and post-incident reports

  • Provide training and mentorship to engineering teams on best practices and tools

The above is only an outline of the tasks, responsibilities and outcomes required of this role. You will carry out any other duties as may reasonably be required. The job description and person specification may be reviewed on an ongoing basis in accordance with the changing needs of the organisation.

You will be required to complete an application form. You will be assessed on the listed 9 Essential Criteria, and this will be in the form of:

  • An application form ('Employer/ Activity history' section on the application)

  • A 1000 word Statement of Suitability & Technical statements

This should outline how your skills, experience, and knowledge provide evidence of your suitability for the role. Healthjobs UK has a word limit of 1500, but your statement of suitability must be no more than 1000. The Application form and supporting statement will be marked together. The application form is the kind of information you would put into your CV - please be advised you will not be able to upload your CV. Please complete the application form in as much detail as possible.

Longlisting

If a large number of applications are received, we will longlist into 3 piles of:

  • Meets all essential criteria

  • Meets some essential criteria

  • Meets no essential criteria

Only those that meet all essential criteria will progress to shortlisting.

Shortlisting

In the event of a large number of applications we will shortlist against the lead criteria as follows:

  • Proven work experience as a Site Reliability Engineer, DevOps Engineer, Operations Engineer or similar role to the aforementioned

If you are successful at this stage, you will progress to interview. Please note: Feedback will not be provided at this stage.

Interview

You will be invited to a remote interview. This vacancy is being assessed using Success Profiles. Behaviours and Technical Skills will be tested at interview. Candidates will be required to take a technical test, presentation and pass the interview process successfully to enable us to set the rate of the MPS awarded. The Behaviours tested during the interview stage will be:

  • Changing and Improving - Lead Behaviour

  • Delivering at pace

  • Managing a Quality Service

  • Working Together

You will also be expected to prepare and present a 5 minute presentation during the interview. This will be based on either:

  • Designing a highly available and scalable service OR

  • Automating a complex operational process

This will be decided and confirmed ahead of the interview. There will also be a technical test during the interview, where you will be asked technical based questions to test your knowledge. This will be based on:

  • SRE principles

  • Troubleshooting/incident management

  • System design

  • Automation/coding

  • Knowledge in Linux & networking

Artificial Intelligence can be a useful tool to support your application, however, all examples and statements provided must be truthful, factually accurate and taken directly from your own experience. Where plagiarism has been identified (presenting the ideas and experiences of others, or generated by artificial intelligence, as your own) applications may be withdrawn and internal candidates may be subject to disciplinary action. Please see our candidate guidance for more information on appropriate and inappropriate use: Artificial intelligence and recruitment | Civil Service Careers

This is a Non-Reserved post under the Civil Service Nationality Rules. To be eligible for employment in the UK Civil Service applicants must meet the Civil Service Nationality Rules (CSNRs) which operate independently of and additionally to the Immigration Rules. Applicants must also meet necessary security and vetting requirements, along with any other relevant pre-employment checks. This job is broadly open to the following groups:

  • UK nationals

  • nationals of the Republic of Ireland

  • nationals of Commonwealth countries who have the right to work in the UK

  • nationals of the EU, Switzerland, Norway, Iceland or Liechtenstein and family members of those nationalities with settled or pre-settled status under the European Union Settlement Scheme (EUSS)

  • nationals of the EU, Switzerland, Norway, Iceland or Liechtenstein and family members of those nationalities who have made a valid application for settled or pre-settled status under the European Union Settlement Scheme (EUSS)

  • individuals with limited leave to remain or indefinite leave to remain who were eligible to apply for EUSS on or before 31 December 2020

  • Turkish nationals, and certain family members of Turkish nationals, who have accrued the right to work in the Civil Service

For more information on job nationality requirements and the right to work in the UK, see the Civil Service Nationality rules and the UK Visas and Immigration rules.

For posts on UKHSA Civil Service terms and conditions, new entrants to the Civil Service are expected to start on the minimum of the pay band. For existing Civil Servants and roles advertised across government, the rules of transfer apply, i.e., level transfers move on current salary or the pay range minimum, transfers on promotion move to new pay range minimum or receive a 10% increase. Either case is determined by whichever is the highest. The Civil Service pay structure and progression is different from NHS Agenda for Change (AfC), most local authority pay grades and other systems that have annual pay increments. For further details, please refer to the Information Sheet- Starting Salaries & Benefits attachment.

For AfC or Medical/Dental posts, you must have the correct professional registration to be appointed. The pay will follow the AfC or Medical & Dental terms & conditions. You may be asked to provide evidence of previous service whilst we are conducting pre-employment checks to determine your starting salary.

For Temporary Appointments, if you are not currently a civil servant, you will take up the post on a Fixed Term appointment. You may be able to take this role up as a Secondment. If you are an existing Civil Servant, based outside of the UKHSA, you will take up the post as a loan which you will need your department to agree. You cannot take the post up as a fixed term. If you are an existing UKHSA member of staff, you will take up the post as either a level transfer or a temporary promotion as per the UKHSA's Pay policy.

Given the nature of the work of the UKHSA, as a Category 1 responder under the Civil Contingencies Act, you may be required in an emergency, if deemed a necessity, to redeploy to another role at short notice. You may also be required to work at any other location, within reasonable travelling distance of your permanent home address, in line with the provisions set out in your contract of employment. Late Applications will unfortunately not be considered.

Working for the Civil Service

The Civil Service Code sets out the standards of behaviour expected of civil servants. We recruit by merit on the basis of fair and open competition, as outlined in the Civil Service Commission's recruitment principles. The Civil Service embraces diversity and promotes equality of opportunity. The law requires that selection for appointment to the Civil Service is on merit on the basis of fair and open competition, as outlined in the Civil Service Commission's Recruitment Principles.

If you feel your application has not been treated in accordance with the Recruitment Principles, and you wish to make a complaint, in the first instance, you should contact UKHSA Public Accountability Unit via email: [email protected] If you are not satisfied with the response you receive from the Department, you can contact the Civil Service Commission: Visit the Civil Service Commission website here.

Reserve List - If more than the required number of suitable candidates pass the interview criteria, you may be kept on a reserve list for 12 months subject to your agreement. You may be contacted, in merit-order, if similar roles with closely matching essential criteria become available and the department choose to appoint from a reserve list.

The panel will assess if candidates meet the requirement of the role first, using a specific benchmark system. If you are interviewed for the post and do not meet the required threshold for the specified grade, your application may be assessed against a similar, lower grade role and you may be offered the post should one be available. Interview expenses will not be reimbursed.

UKHSA is required to check employment and/or education history covering three consecutive years. Please ensure you give details of at least two different referees, even if you were employed in one company for three years or more.

If you are offered a job, information will also be transferred into the national NHS Electronic Staff Records system. Please note, all communication regarding your application will be made via email, please ensure you check your junk/spam folders as emails are sometimes filtered there. Any move to UKHSA from another employer will mean you can no longer access childcare vouchers. This includes moves between government departments. You may however be eligible for other government schemes, including Tax-Free Childcare. Determine your eligibility at https://www.childcarechoices.gov.uk/. Benefits of working at UKHSA include

Requirements

  • Proven work experience as a Site Reliability Engineer, DevOps Engineer, Operations Engineer or similar role to the aforementioned

  • Strong coding skills in languages such as Python, PowerShell or Bash

  • Deep understanding of Linux/Unix & Windows systems, networking, and distributed systems

  • Experience with CI/CD pipelines, cloud platforms (e.g. Amazon Web Services, Google Cloud Platform, Azure) and container orchestration (e.g., Kubernetes)

  • Hands-on experience with observability tools (e.g., Prometheus, Grafana, Datadog) and alerting systems.

  • Solid understanding of infrastructure automation (e.g., Terraform, Ansible, PowerShell, Helm)

  • Excellent communication and collaboration skills

  • Experience with security best practices

  • Possess problem solving skills and the ability to respond to sudden unexpected demands

Desirable Criteria

  • Experience leading post-incident reviews

  • Previous involvement in defining and driving adoption of SRE practices across an organization

  • Experience delivering training or mentoring junior engineers

Benefits & conditions

We offer great flexible working opportunities at UKHSA and operate using a hybrid working model where business needs allow. This provides us with greater flexibility about how and where we work, to get the best from our workforce. As a hybrid worker, you will be expected to spend a minimum of 60% of your contractual working hours (approximately 3 days a week pro rata, averaged over a month) working at one of UKHSA's core HQs (Birmingham, Leeds, Liverpool, and London) or at one of our scientific campus sites (Colindale, Porton and Chilton). If based at one of our scientific campuses, you will be required to have a Counter Terrorism Check security vetting check as a minimum. Our core HQ offices are modern and newly refurbished with excellent city centre transport links and benefit from co-location with other government departments such as the Department for Health and Social Care (DHSC).

Salary Breakdown (Grade 7)

  • Grade 7 National: £56,185 - £66,581

  • Grade 7 Outer London: £58,340 - £68,574

  • Grade 7 Inner London: £60,494 - £70,566

  • This role attracts a Market Pay Supplement of £5,000 to £10,000

Please note: If you are successful at interview, and are moving from another government department, NHS, or Local Authority, the relevant starting salary principles for level transfers or promotions will apply. Otherwise, roles are offered at the pay scale minimum for the grade, but in exceptional circumstances there may be flexibility if you are able to demonstrate you are already in receipt of an existing, higher salary. Pay increases are through the relevant annual pay award for the role and terms.

Security Clearance Level Requirement: All successful candidates must meet the basic security requirements before they can be appointed. The level of security needed is:

  • Basic Personnel Security Standard (BPSS)

DBS Requirement:

  • Basic DBS

For this role you will also need to meet:

  • Counter Terrorism Check (CTC)

For meaningful National Security Vetting checks to be carried out individuals need to have lived in the UK for a sufficient period of time. You should normally have been resident in the United Kingdom for the last 3 years as the role requires Counter Terrorism Check (CTC) clearance. In exceptional circumstances UK residency less than the outlined periods may not necessarily bar you from gaining national security vetting and applicants should contact the Vacancy Holder/Recruiting Manager listed in the advert for further advice.

  • Plus public holidays and one privilege day for the King's birthday

  • Access to a generous Defined Benefit pension scheme with employer contributions.

  • Access to a cycle-to-work salary sacrifice scheme, season ticket advances and payroll giving.

  • Access to a retail discounts and cashback site.

  • We also promote flexible working patterns (part-time, job-share, condensed hours). UKHSA views flexible working as essential in enabling us to recruit and retain talented people, ensuring that they are able to enjoy a long-lasting career with us. All employees have the right to apply for flexible working and there are a range of options available including working from home, compressed hours and job sharing.

  • We also offer a generous maternity/paternity and adoption leave package.

Hybrid Working

UKHSA operates a hybrid working model where business needs allow. This provides us with greater flexibility about how and where we work, to get the best from our workforce. As a hybrid worker, you will usually spend a minimum of 60% of your contracted hours (averaged over a month) working at one of UKHSA's locations (approximately 3 days a week pro rata) and the rest of your time working from home.

Disability Confident Scheme

The Civil Service embraces diversity and promotes equal opportunities. As such, we run a Disability Confident Scheme for candidates with disabilities who meet the minimum selection criteria at sift to ensure these candidates are invited to interview. If you wish to be included in this scheme please tick the box on your application form.

Reasonable Adjustments

The Civil Service is committed to making sure that our selection methods are fair to everyone. To help you during the recruitment process, we will take into account any reasonable adjustments that could help you. An adjustment is a change to the recruitment process or an adjustment at work. This is separate to the Disability Confident Scheme. If you need an adjustment to be made at any point during the recruitment process you should contact the recruitment team in confidence as soon as possible to discuss your needs. You can find out more information about reasonable adjustments across the Civil Service here: https://www.civil-service-careers.gov.uk/reasonable-adjustments/

International Police Check

If you have spent more than 6 months abroad over the last 3 years you may need an International Police Check. This would not necessarily have to be in a single block, and could be time accrued over that period.

Internal Fraud check

If successful for this role as one aspect of pre-employment screening, applicant's personal details - name, national insurance number and date of birth - will be checked against the Cabinet Office Internal Fraud Hub and anyone included on the database will be refused employment unless they can show exceptional circumstances. Currently this is only for External candidates to the Civil Service.

Security Vetting

Please check the Security Clearance needed for the role and follow the link for more information: https://www.gov.uk/government/publications/united-kingdom-security-vetting-clearance-levels/national-security-vetting-clearance-levels

Future location

UKHSA is investing in a new state-of-the-art National Biosecurity Centre in Harlow, Essex, which will eventually bring together teams currently based at Canary Wharf, Colindale and Porton Down. For more details, please see: Huge biosecurity centre investment to boost pandemic protection - GOV.UK. The new facilities will start becoming operational in the mid-2030s, with full completion by 2038. Staff will move in phases as facilities become available. If you're appointed to a role currently based at Canary Wharf, Colindale or Porton Down, please note that we'll continue investing in these sites for the next decade. As we get closer to the transition, we'll provide full information about relocation support available to staff.
