Site Reliability Engineer

C3 AI

Charing Cross, United Kingdom

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

£61K

Skills

Amazon Web Services (AWS)

Build Automation

Azure

Software as a Service

Configuration Management

Database Theory

Linux

DevOps

Fault Tolerance

Java Virtual Machine (JVM)

Python

NoSQL

Reliability Engineering

Ansible

Ruby

Scripting (Bash/Python/Go/Ruby)

Google Cloud Platform

Kubernetes

Information Technology

Cassandra

Puppet

Responsibilities

Maximize system uptime and availability, ensuring functional and performance SLAs are met.
Establish end-to-end monitoring and alerting for all critical aspects of the platform.
Solve complex problems for critical services and build automation to prevent problem recurrence.
Influence and create new designs, architectures, standards, and methods for supporting the platform.
Initiate and lead scripting and automation to streamline system updates and upgrades.
Set up critical infrastructure, tools, and frameworks to streamline the deployment cycle.

Qualifications

Demonstrated experience in deploying, managing, and operating scalable and fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and other public clouds.
Expertise in Linux operating systems, networking, and database concepts.
Experience with Cassandra (or another NoSQL alternative).
Expertise with cloud providers such as Amazon Web Services, Azure, and GCP.
Experience with configuration management systems such as Ansible or Puppet.
Experience using Ruby or Python to automate and monitor systems.
Excellent problem-solving, critical thinking, and communication skills.
Experience supporting commercial SaaS solutions as a DevOps engineer or system administrator.
BS or MS in Computer Science or a related field, or equivalent professional experience.