Linux Systems Engineer - HPC/AI (all genders)

AITHYRA GmbH

Vienna, Austria

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Shift work

Languages

English

Compensation

€ 58K

Job location

Vienna, Austria

Tech stack

Artificial Intelligence

Bash

Ubuntu (Operating System)

Configuration Management

Nvidia CUDA

Computer Networks

Data Centers

Dynamic Host Configuration Protocol

Debian Linux

Linux

Document Management Systems

File Systems

DNS

General Parallel File Systems

Monitoring of Systems

InfiniBand

Networking Hardware

Job Scheduling

Python

Linux System Administration

Lua

Networking Basics

Network File Systems

Red Hat Enterprise Linux - RHEL

Ansible

TensorFlow

Prometheus

Scientific Computing

TCP/IP

Virtual Local Area Networks

AI Infrastructure

High Performance Computing

PyTorch

CheckMK

Grafana

Firewalls (Computer Science)

Kubernetes

Information Technology

Slurm

Hardware Infrastructure

Puppet

Docker

Job description

We are seeking a talented Linux Systems Engineer - HPC/AI (all genders) who will join a world-class team and help build and operate the foundational infrastructure needed to support groundbreaking research. This is a unique opportunity to be part of something from the very beginning.

As a Linux Systems Engineer with a focus on HPC/AI, you will help build and operate an HPC cluster specialized for AI workloads. This role is ideal for someone with solid Linux systems administration experience who is excited to grow into the world of High-Performance Computing and AI infrastructure. You will contribute to bringing advanced AI solutions to life, using your technical skills to support scalable, reliable, and high-performance systems for cutting-edge research.

Reporting to Stephan Stadlbauer, Head of Scientific Computing, your role combines Linux systems engineering, hardware and infrastructure support, and close collaboration with multidisciplinary teams. This position focuses on helping design, implement, and operate infrastructure for innovative AI research. If you are passionate about Linux systems and on-premises infrastructure and want to develop further in HPC and AI, this role is an excellent opportunity.

Deploy, rack, cable, configure, and maintain server hardware, GPU nodes, networking equipment, and storage systems in our on-premises data centers.
Administer and harden a large-scale Linux environment (Debian/Ubuntu) that forms the backbone of the HPC/AI cluster.
Assist in designing, building, and scaling our HPC cluster specifically optimized for AI workloads - learning HPC best practices along the way.
Configure and manage the workload manager (SLURM) to efficiently schedule, monitor, and manage diverse jobs including AI training and inference.
Implement and optimize high-performance storage solutions (e.g., BeeGFS, Lustre) tailored for large-scale AI/HPC datasets and model training.
Install and configure key software components, including parallel file systems, networking fabrics, and AI-specific libraries and frameworks (e.g., TensorFlow, PyTorch).
Troubleshoot and resolve complex technical issues related to hardware, software, and networking components during the cluster build and initial operation phases.
Provide technical support and guidance to scientists for running their AI workloads on the cluster, including job submission, monitoring, and basic troubleshooting.
Monitor system performance, resource utilization, and job efficiency to optimize throughput and infrastructure.
Document system design, configurations, procedures, and best practices for building and operating the AI HPC cluster.

Requirements

Education in Computer Science, Information Technology, or a related field (or equivalent practical experience).
Solid, hands-on experience in Linux system administration (e.g., Ubuntu, Debian, RHEL) in professional or large-scale environments.
Proficiency in scripting and automation (e.g., Bash, Python, Lua) for system management, deployment, and monitoring tasks.
Practical experience with server hardware -- you are comfortable racking equipment, diagnosing hardware faults, and working in a data-center environment.
Familiarity with configuration management and automation tools (e.g., Ansible, Puppet, Salt) and a strong desire to apply automation best practices at scale.
Good understanding of networking fundamentals (TCP/IP, VLANs, firewalls, DNS/DHCP); experience with high-speed networking or InfiniBand is a plus.
Interest in or initial exposure to HPC concepts (job schedulers, parallel file systems, cluster management) -- with a genuine eagerness to learn and develop deep expertise.
Interest in or initial exposure to GPU-accelerated computing and AI workloads -- with a willingness to grow into this area.
Excellent problem-solving skills and a proactive, hands-on attitude towards tackling complex technical challenges in a fast-paced environment.
Ability to communicate effectively in English and collaborate with technical and research teams.

Desired Skills:

Experience with HPC systems, cluster management tools, or job schedulers (SLURM, PBS).
Experience with containers and orchestration (e.g., Docker, Apptainer, Kubernetes).
Familiarity with parallel or network file systems (e.g., BeeGFS, Lustre, GPFS).
Exposure to GPU management, CUDA toolkits, or AI frameworks (TensorFlow, PyTorch).
Experience working with research scientists or in an academic environment.
Familiarity with monitoring and observability stacks (Prometheus, Grafana, CheckMK).

Benefits & conditions

A competitive salary (minimum gross annual salary of EUR 58,000)
Support for your wellbeing, including access to a company doctor
Fresh fruits, sweet treats, and free coffee & tea are available every day
Flexible working arrangements, with the option for one home office day per week
Core hours: Monday-Thursday 09:00-15:00, Friday 09:00-13:00
Meal allowance to make your day a little easier
A welcoming community with diverse social and cultural activities
Relocation support to help you settle in comfortably if you're moving to join us

About the company

AITHYRA is a pioneering biomedical research institute in Vienna, Austria, where artificial intelligence meets life sciences to drive the next biological revolution. AITHYRA is an institute of the Austrian Academy of Sciences (ÖAW) and was established with generous funding from the not-for-profit Boehringer Ingelheim Foundation Mainz. We are building a world-class collaborative environment that brings together AI specialists, experimental scientists, and engineers to push the boundaries of biomedical innovation and improve human health. AITHYRA's mission is to transform the way life sciences are conducted using AI to drive the biological revolution in the next decade, with the ultimate goal of improving human health. Join the best of the academic, corporate, and start-up world, and support AITHYRA, the Research Institute for Biomedical Artificial Intelligence in Vienna.

We are a curiosity-driven, globally minded organization committed to building an inclusive and flexible workplace. At AITHYRA, we believe diverse perspectives strengthen collaboration and spark innovation. We welcome applicants from all backgrounds, cultures, and experiences to help us create teams that reflect the communities our science serves. Your unique contribution matters here - come realize your full potential with us.
