Systems Engineer
Job description
We are seeking a Senior Systems Engineer with a strong interest in high-performance computing (HPC). The ideal candidate will have experience deploying and configuring complex open-source packages and managing cluster schedulers. A solid background in Linux, scripting, and parallel file systems is essential. In this role, you will be responsible for the maintenance and development of our scientific computing cluster. The primary workloads on the cluster are genome-wide proteomics, human genome next-generation sequencing, and large-scale statistical modeling.
Your role
Deploy and maintain the Scientific Computing computational and data science ecosystem in a production environment: approximately 1,000 cores with high-bandwidth, low-latency interconnects, GPUs, large shared-memory nodes, scientific workflows, and 2.5 petabytes of storage.
Troubleshoot, isolate, and resolve technical issues across applications, systems, hardware, software, and network components.
Actively monitor system performance and integrity.
Design, develop, and implement system administration tasks, including hardware and software configuration, configuration management, networking, and metrics.
Respond to and resolve user support tickets.
Environment
Dell nodes with Ice Lake and Sapphire Rapids CPUs, totaling approximately 1,000 cores and 48 TB RAM
EDR InfiniBand interconnects
Lustre: 2 file servers, 2 metadata servers, 2.5 PB net capacity
GPUs: three nodes with one NVIDIA L4 (24 GB GPU memory); four nodes with two NVIDIA H100 PCIe (94 GB GPU memory each); one node with NVIDIA HGX H100 (four GPUs, each with 80 GB GPU memory)
Requirements
Bachelor's degree in computer science, engineering, or another scientific field (master's or PhD preferred).
More than 8 years of progressive experience in HPC system administration and operations, preferably in a Red Hat/CentOS Linux and batch HPC cluster environment.
Proven expertise in troubleshooting.
Demonstrated ability to work collaboratively as part of a team and maintain a customer-focused approach.
Familiarity with job schedulers such as Slurm.
Proficiency in scripting and programming.
Experience with parallel file systems and storage solutions such as Lustre is preferred.
Knowledge of InfiniBand is an advantage.
Experience with configuration management and orchestration software, such as BCM, is preferred.
Competence in reproducible software installations is desired.
Familiarity with GPU technologies is an advantage.
Experience with containerization tools such as Enroot, Pyxis, or Apptainer is preferred.
Proficiency in version control systems, such as Git, is preferred.
Ability to multitask effectively in a dynamic environment.
Strong analytical abilities.
Strong written, verbal, and interpersonal communication skills.
Benefits & conditions
Minimum gross monthly salary of EUR 4,500, paid 14 times a year.
The normal working week is 40 hours.
Core working time, during which the employee must be present at the workplace, is Monday to Thursday from 09:00 to 15:00 and Friday from 09:00 to 13:00.
A maximum of one home office day per calendar week may be used.
This role is based in Vienna, Austria. If you are based elsewhere, you will be required to relocate and will receive a generous relocation allowance to support you.