Systems Administrator
Role details
Job description
We have an excellent opportunity for a proactive and technically proficient Systems Administrator to join our dedicated on-site IT team. Reporting to our IT Manager, you'll collaborate closely to maintain, enhance and scale our High-Performance Computing (HPC) environment.
This is an outstanding opportunity for an experienced Linux Systems Administrator to play a key role in ensuring the reliability and efficiency of our HPC systems. You'll be responsible for implementing optimisations, driving performance improvements, and supporting the scalability of our infrastructure.
With a strong foundation in Linux systems, automation, and HPC environments, you'll bring a passion for infrastructure reliability and high performance to our growing team.
- Be responsible for the administration and maintenance of HPC clusters, including compute nodes, storage systems, and networking
- Monitor system performance and ensure high availability of HPC resources
- Lead the deployment, configuration, and troubleshooting of HPC tooling (e.g. Slurm)
- Automate routine tasks using scripting and automation tools (e.g. Bash, Python, Ansible)
- Collaborate with the IT Manager to plan and implement improvements to the HPC infrastructure
- Build and maintain documentation for systems, processes, and configurations
- Provide sound technical support to users of the HPC environment
- Ensure security and compliance across the HPC estate
- Maintain and develop the off-site backup strategy, ensuring data integrity, disaster recovery readiness, and compliance with data retention policies
- Maintain Synopsys EDA tooling, including license server management, environment configuration, and performance optimisation for electronic design automation workflows
Requirements
- Proven experience as a Systems Administrator (or similar role), managing Linux HPC clusters with Slurm, optimising performance, reliability, and resource utilisation for high-throughput computing workloads
- Solid experience in a Red Hat environment
- Previous hands-on experience managing HPC systems and tooling, such as job schedulers (Slurm), containerisation (Singularity or Docker), and parallel file systems
- Strong scripting and automation skills, including Bash, Python and Ansible
- Experience with monitoring tools (such as Prometheus) and performance tuning
- Excellent understanding of networking concepts and protocols
- Experience in managing backup solutions and disaster recovery planning
- Experience with Synopsys EDA tools, or similar electronic design automation environments
- Excellent problem-solving and communication skills, with the ability to respond flexibly in a changing environment
- Ability to work both independently and collaboratively in a fast-paced environment
- A strong commitment to delivering a stable, reliable, high-performance service for users and workloads
Even better if you have…
- Red Hat Certified System Administrator (RHCSA) certification
- Knowledge of CI/CD pipelines, DevOps practices, and GitHub integration
- Exposure to scientific computing or research environments
- Experience with cloud-based HPC or hybrid environments
Benefits & conditions
What you can expect from us
- A comprehensive benefits package that includes an annual bonus plan, private medical insurance, life insurance, and a contributory pension scheme
- Equity, so that our team can share in the long-term success of Riverlane
- 28 days' annual leave, plus bank holidays and enhanced family leave
- A diverse work environment that brings together experts in many fields (including software and hardware development, quantum information theory, physics and maths) and over 20 different nationalities
- A learning environment that encourages individual, team and company growth and development, including a regular programme of learning events and training and conference budgets