Senior Linux Systems Administrator
Role details
Job location
Tech stack
Job description
scripting (e.g. Python, Bash), source code management (e.g. Git) and infrastructure-as-code tools (e.g. Terraform, OpenTofu)
A strong understanding of network design and switch/router/firewall configuration
The desire to work collaboratively with users and other stakeholders to iteratively optimise systems
The ability to travel to data centres and install rack-mounted equipment

Experience with any or all of the following will also be highly valued:
Modern GPU server deployment, tuning and management
High-performance or high-availability storage servers/clusters (Lustre, Ceph, NFS)
Advanced networking technologies (InfiniBand, RDMA, RoCE)
HPC workload managers (Slurm, LSF)

Our infrastructure is used for research and development. Support will generally only be required during UK business hours; however, major maintenance may occasionally be scheduled for weekends.

A collaborative and supportive work environment
The opportunity to have a high impact in a growing organisation
Competitive salary package and pension
Professional development opportunities
Networking opportunities with influential people from across the tech sector and academia
A vibrant office environment located a few minutes' walk away from Cambridge train station

CommonAI CIC is an equal opportunity employer and is committed to creating an inclusive and diverse workplace.

Responsibilities
Provision and maintain multi-rack GPU server clusters designed for both inference and training workloads.
Collaborate with users and stakeholders to optimise systems iteratively.
Requirements
Linux, System Administration, Scripting, Python, Bash, Git, Infrastructure-as-Code, Terraform, Networking, GPU Servers, Storage Servers, HPC Workload Managers, High-Availability, Hypervisors, VMs, Collaboration