Site Reliability Engineer

C3 AI
Charing Cross, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
£61K

Job location

Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)
Build Automation
Azure
Software as a Service
Configuration Management
Database Theory
Linux
DevOps
Fault Tolerance
Java Virtual Machine (JVM)
Python
NoSQL
Reliability Engineering
Ansible
Ruby
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Kubernetes
Information Technology
Cassandra
Puppet

Job description

  • Maximize system uptime and availability, ensuring functional and performance SLAs are met.

  • Establish end-to-end monitoring and alerting for all critical aspects of the platform.

  • Solve complex problems for critical services and build automation to prevent problem recurrence.

  • Influence and create new designs, architectures, standards, and methods for supporting the platform.

  • Initiate and lead scripting and automation to streamline system updates and upgrades.

  • Set up critical infrastructure, tools, and frameworks to streamline the deployment cycle.

Requirements

  • Demonstrated experience in deploying, managing, and operating scalable and fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and other public clouds.

  • Expertise in Linux Operating Systems, Networking, and Database concepts.

  • Experience with Cassandra (or another NoSQL alternative).

  • Expertise with cloud providers such as Amazon Web Services, Azure, and GCP.

  • Experience with configuration management systems such as Ansible or Puppet.

  • Experience using Ruby or Python to automate and monitor systems.

  • Excellent problem-solving, critical thinking, and communication skills.

  • Experience supporting commercial SaaS solutions as a DevOps engineer or system administrator.

  • BS or MS in Computer Science, related field, or equivalent professional experience.
