Senior Infrastructure Engineer

Alfa AI
Charing Cross, United Kingdom
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
£150K

Tech stack

Link Aggregation (Ethernet)
Bash
Border Gateway Protocol
Unix
Data Centers
Disaster Recovery
Distributed Data Store
Distributed Systems
Python
Open Source Technology
Performance Tuning
Ansible
Software Engineering
Ceph
Scripting (Bash/Python/Go/Ruby)
SaltStack
Containerization
Kubernetes
Puppet
Terraform
Docker

Job description

We are seeking a highly experienced Senior Infrastructure Engineer to join our platform engineering team. This role focuses on the design, implementation, and maintenance of our large-scale, distributed storage infrastructure, with a primary emphasis on Ceph. You will play a key part in ensuring the reliability, performance, and scalability of our storage solutions, which are critical to all our services. The ideal candidate has a deep understanding of Ceph architecture and a proven track record of managing petabyte-scale clusters in a production environment.

  • Design, deploy, and manage highly available, scalable, and performant Ceph storage clusters across multiple data centers.
  • Develop and maintain automation for cluster provisioning, monitoring, and lifecycle management using tools like Ansible, Puppet, or SaltStack.
  • Act as the subject matter expert for all storage-related issues, providing advanced troubleshooting, performance tuning, and root cause analysis.
  • Collaborate with software engineering and SRE teams to define storage requirements, establish best practices, and integrate storage solutions into our CI/CD pipelines.
  • Lead capacity planning, disaster recovery strategies, and major version upgrades for the Ceph ecosystem.

Requirements

  • 6+ years of hands-on experience managing large-scale Ceph clusters (1 PB+) in a 24/7 production environment.
  • Expert-level knowledge of Ceph architecture, including RADOS, RGW, and RBD.
  • Strong proficiency in Linux/Unix administration and scripting (e.g., Python, Bash).
  • Experience with infrastructure-as-code and automation tools (e.g., Ansible, Terraform, Puppet).

Nice-to-Have Qualifications

  • Experience with containerization and orchestration technologies (Docker, Kubernetes).
  • Familiarity with networking concepts (BGP, LACP) in the context of distributed systems.
  • Contributions to open-source projects, particularly Ceph.
