Site Reliability Engineer - AWS & Azure

Square One Resources Limited

Kilsby, United Kingdom

2 days ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

£ 157K

Job location

Kilsby, United Kingdom

Tech stack

Amazon Web Services (AWS)

Azure

Cloud Computing

Computer Engineering

DevOps

Monitoring of Systems

Information Technology Operations

Log Analysis

Reliability Engineering

Software Engineering

Data Processing

Containerization

Kubernetes

Information Technology

Terraform

Docker

Job description

We are seeking a highly skilled Site Reliability Engineer (SRE) with expertise in both Azure and AWS cloud platforms. This position is responsible for taking a lead role in migrating an existing on-prem HPC solution to the cloud and for enhancing the reliability, scalability, and performance of that cloud infrastructure through automation, software engineering practices, and proactive system management. The ideal candidate will bridge the gap between development and operations, applying a software engineering mindset to IT operations and infrastructure.

Work with existing solutions already in place in the US to redefine, implement, and maintain scalable, reliable cloud infrastructure across Azure and AWS for the UK business, which operates as a similar but separate entity.
Develop automation scripts and tools to streamline operational tasks such as log analysis, environment testing, and incident response.
Collaborate with development and operations teams to ensure seamless deployment and performance of applications and services.
Monitor system performance and availability, proactively identifying and resolving issues.
Apply software engineering principles to infrastructure management, improving efficiency and reducing manual effort.
Deliver value by monitoring spending, optimizing resource usage through right-sizing and automation, and implementing governance through tagging strategies and budget alerts.
Document the solution and deliver knowledge transfer and training to existing team members.

Requirements

Strong understanding of cloud-native architectures and services in Azure and AWS, including AKS/EKS and their automation.
Experience with infrastructure-as-code tools (e.g., Terraform).
Familiarity with CI/CD pipelines, containerization (Docker, Kubernetes), and monitoring tools.
Knowledge of data processing and configuration design.
Experience with IT infrastructure and monitoring systems.
Bachelor's degree in Computer Science, Computer Engineering, Information Technology, or a related field.
Extensive experience in site reliability engineering, DevOps, or cloud infrastructure roles.

About the company

Square One is acting as both an employment agency and an employment business, and is an equal opportunities recruitment business. Square One embraces diversity and will treat everyone equally. Please see our website for our full diversity statement.