Site Reliability Engineer

Experis

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

£100K

Job location

Tech stack

Java

Amazon Web Services (AWS)

Azure

Cloud Computing

Disaster Recovery

Distributed Systems

Python

Release Management

Reliability Engineering

Software Engineering

System Availability

Reliability of Systems

Kubernetes

Programming Languages

Job description

Detect and mitigate system issues to ensure high availability.

Automate operational tasks to improve efficiency and reduce manual intervention.

Prepare disaster recovery plans and ensure business continuity.

Monitor system health and optimize performance.

Collaborate with development teams to enhance system reliability.

Implement CI/CD pipelines for seamless deployment and release management.

Ensure compliance with security standards, governance policies, and regulatory requirements.

Requirements

Expertise in software development and engineering for large-scale distributed systems.

Strong proficiency in programming languages such as Golang, Java, or Python.

Extensive experience with cloud infrastructure providers (AWS, Azure, or GCP).

Deep knowledge of container orchestration platforms like Kubernetes.

Exceptional problem-solving skills and a passion for building scalable, secure solutions.

Excellent communication skills to collaborate with cross-functional teams.