SRE (Site Reliability Engineer)

Go Arrow
3 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
£ 122K

Job location

Tech stack

API
Amazon Web Services (AWS)
Server Applications
Azure
Bash
Oracle WebLogic Server
Ubuntu (Operating System)
CentOS
Cloud Computing
Configuration Management
Computer Networks
Databases
Continuous Integration
Linux
DevOps
Disaster Recovery
Distributed Systems
DNS
Elasticsearch
Perl
GitHub
Groovy
Web Servers
IBM WebSphere Application Server
WildFly (JBoss AS)
Python
Shell
Microsoft SQL Server
Team Foundation Server
Windows Server
MySQL
Nagios
Nginx
OpenStack
Oracle
PowerShell
Systems Development Life Cycle
Ruby on Rails
Release Management
Reliability Engineering
Ansible
Ruby
Software Deployment
Software Engineering
Transmission Control Protocol (TCP)
T-SQL
Virtual Machines
Web Services
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
SQL Optimization
Reliability of Systems
Firewalls (Computer Science)
GitLab
GitLab CI
Kubernetes
Puppet
REST
Terraform
Splunk
New Relic (SaaS)
Software Version Control
Docker
Jenkins
VMware
Microservices

Job description

We are seeking a highly skilled Site Reliability Engineer to join our dynamic IT team. The successful candidate will be responsible for ensuring the stability, scalability, and performance of our cloud-based and on-premise systems. This role involves developing automation solutions, managing infrastructure, and supporting software deployment processes across diverse environments. The ideal applicant will possess a strong background in system administration, cloud computing, and software development, with a keen eye for troubleshooting and incident management. This is an excellent opportunity for professionals passionate about maintaining high-availability systems and driving continuous improvement in system reliability.

  • Design, implement, and maintain scalable and reliable infrastructure using tools such as Kubernetes, Terraform, Ansible, Puppet, Chef, and VMware.
  • Monitor system performance with tools like New Relic, Splunk, Elasticsearch, and Nagios to proactively identify issues before they impact users.
  • Automate deployment pipelines leveraging Jenkins, GitLab CI/CD, TFS, and other continuous integration tools to streamline software releases.
  • Manage cloud environments including AWS, Azure, Google Cloud Platform (GCP), and OpenStack to optimise resource utilisation and cost-efficiency.
  • Develop scripts using PowerShell, Bash (Unix shell), Python, Ruby, Perl, Groovy, or Go to automate routine tasks and improve operational efficiency.
  • Troubleshoot complex issues related to web services such as REST APIs, web servers like NGINX, and application servers including WebSphere, WebLogic, or JBoss.
  • Implement disaster recovery plans and perform incident response activities to minimise downtime during outages or security breaches.
  • Collaborate with development teams on requirements gathering for new features or system upgrades following SDLC best practices.
  • Maintain comprehensive documentation of system configurations and procedures aligned with ITIL standards for release management and change control.

Requirements

Do you have experience in WebLogic?

  • Proven experience in a Site Reliability Engineering or DevOps role within a large-scale enterprise environment.
  • Extensive knowledge of containerisation technologies such as Docker and Kubernetes.
  • Hands-on experience with cloud platforms including AWS (Amazon S3, EC2), Azure (Virtual Machines), Google Cloud Platform (GCP), or OpenStack.
  • Strong proficiency in scripting languages such as Python, PowerShell, Bash (Unix shell), Ruby, or Groovy for automation tasks.
  • Familiarity with configuration management tools like Ansible, Puppet, or Chef; version control platforms including GitHub or GitLab; and CI/CD pipelines using Jenkins or TFS.
  • Experience managing distributed systems architecture involving microservices and APIs over TCP/IP networks.
  • Knowledge of databases including MySQL, Microsoft SQL Server (T-SQL), and Oracle Database, along with experience in SQL optimisation and disaster recovery planning.
  • Understanding of computer networking concepts such as DNS, TCP/IP protocols, firewalls, and LAN/WAN configurations.
  • Ability to troubleshoot software issues across various platforms including Linux (CentOS/Ubuntu) and Windows Server environments.

This role offers an engaging environment where technical expertise is valued and professional growth is encouraged through exposure to cutting-edge technology stacks and best practices in system reliability engineering.
