Linux Systems Engineer - HPC/AI (all genders)

AITHYRA GmbH
Vienna, Austria
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English
Compensation
€ 58K

Job location

Vienna, Austria

Tech stack

Artificial Intelligence
Bash
Ubuntu (Operating System)
Configuration Management
Nvidia CUDA
Computer Networks
Data Centers
Dynamic Host Configuration Protocol
Debian Linux
Linux
Document Management Systems
File Systems
DNS
General Parallel File Systems
Monitoring of Systems
InfiniBand
Networking Hardware
Job Scheduling
Python
Linux System Administration
Lua
Networking Basics
Network File Systems
Red Hat Enterprise Linux - RHEL
Ansible
TensorFlow
Prometheus
Scientific Computing
TCP/IP
Virtual Local Area Networks
AI Infrastructure
High Performance Computing
PyTorch
CheckMK
Grafana
Firewalls (Computer Science)
Kubernetes
Information Technology
Slurm
Hardware Infrastructure
Puppet
Docker

Job description

We are seeking a talented Linux Systems Engineer - HPC/AI (all genders) who will join a world-class team and help build and operate the foundational infrastructure needed to support groundbreaking research. This is a unique opportunity to be part of something from the very beginning.

As a Linux Systems Engineer with a focus on HPC/AI, you will help build and operate an HPC cluster specialized for AI workloads. This role is ideal for someone with solid Linux systems administration experience who is excited to grow into the world of High-Performance Computing and AI infrastructure. You will contribute to bringing advanced AI solutions to life, using your technical skills to support scalable, reliable, and high-performance systems for cutting-edge research.

Reporting to Stephan Stadlbauer, Head of Scientific Computing, your role combines Linux systems engineering, hardware and infrastructure support, and close collaboration with multidisciplinary teams. This position focuses on helping design, implement, and operate infrastructure for innovative AI research. If you are passionate about Linux systems and on-premises infrastructure, and want to develop further in HPC and AI, this role is an excellent opportunity.

  • Deploy, rack, cable, configure, and maintain server hardware, GPU nodes, networking equipment, and storage systems in our on-premises data centers.
  • Administer and harden a large-scale Linux environment (Debian/Ubuntu) that forms the backbone of the HPC/AI cluster.
  • Assist in designing, building, and scaling our HPC cluster specifically optimized for AI workloads - learning HPC best practices along the way.
  • Configure and manage the workload manager (SLURM) to efficiently schedule, monitor, and manage diverse jobs including AI training and inference.
  • Implement and optimize high-performance storage solutions (e.g., BeeGFS, Lustre) tailored for large-scale AI/HPC datasets and model training.
  • Install and configure key software components, including parallel file systems, networking fabrics, and AI-specific libraries and frameworks (e.g., TensorFlow, PyTorch).
  • Troubleshoot and resolve complex technical issues related to hardware, software, and networking components during the cluster build and initial operation phases.
  • Provide technical support and guidance to scientists for running their AI workloads on the cluster, including job submission, monitoring, and basic troubleshooting.
  • Monitor system performance, resource utilization, and job efficiency to optimize throughput and infrastructure.
  • Document system design, configurations, procedures, and best practices for building and operating the AI HPC cluster.

Requirements

  • Education in Computer Science, Information Technology, or a related field (or equivalent practical experience).
  • Solid, hands-on experience in Linux system administration (e.g., Ubuntu, Debian, RHEL) in professional or large-scale environments.
  • Proficiency in scripting and automation (e.g., Bash, Python, Lua) for system management, deployment, and monitoring tasks.
  • Practical experience with server hardware -- you are comfortable racking equipment, diagnosing hardware faults, and working in a data-center environment.
  • Familiarity with configuration management and automation tools (e.g., Ansible, Puppet, Salt) and a strong desire to apply automation best practices at scale.
  • Good understanding of networking fundamentals (TCP/IP, VLANs, firewalls, DNS/DHCP); experience with high-speed networking or InfiniBand is a plus.
  • Interest in or initial exposure to HPC concepts (job schedulers, parallel file systems, cluster management) -- with a genuine eagerness to learn and develop deep expertise.
  • Interest in or initial exposure to GPU-accelerated computing and AI workloads -- with a willingness to grow into this area.
  • Excellent problem-solving skills and a proactive, hands-on attitude towards tackling complex technical challenges in a fast-paced environment.
  • Ability to communicate effectively in English and collaborate with technical and research teams.

Desired Skills:

  • Experience with HPC systems, cluster management tools, or job schedulers (SLURM, PBS).
  • Experience with containers and orchestration (e.g., Docker, Apptainer, Kubernetes).
  • Familiarity with parallel or network file systems (e.g., BeeGFS, Lustre, GPFS).
  • Exposure to GPU management, CUDA toolkits, or AI frameworks (TensorFlow, PyTorch).
  • Experience working with research scientists or in an academic environment.
  • Familiarity with monitoring and observability stacks (Prometheus, Grafana, CheckMK).

Benefits & conditions

  • A competitive salary (minimum gross annual salary of EUR 58000)
  • Support for your wellbeing, including access to a company doctor
  • Fresh fruit, sweet treats, and free coffee & tea available every day
  • Flexible working arrangements, with the option for one home office day per week
  • Core hours: Monday-Thursday 09:00-15:00, Friday 09:00-13:00
  • Meal allowance to make your day a little easier
  • A welcoming community with diverse social and cultural activities
  • Relocation support to help you settle in comfortably if you're moving to join us

About the company

AITHYRA is a pioneering biomedical research institute in Vienna, Austria, where artificial intelligence meets life sciences to drive the next biological revolution. AITHYRA is an institute of the Austrian Academy of Sciences (ÖAW) and was established with generous funding from the not-for-profit Boehringer Ingelheim Foundation Mainz. We are building a world-class collaborative environment that brings together AI specialists, experimental scientists, and engineers to push the boundaries of biomedical innovation and improve human health. AITHYRA's mission is to transform the way life sciences are conducted using AI to drive the biological revolution in the next decade, with the ultimate goal of improving human health. Join the best of the academic, corporate, and start-up world, and support AITHYRA, the Research Institute for Biomedical Artificial Intelligence in Vienna.

We are a curiosity-driven, globally minded organization committed to building an inclusive and flexible workplace. At AITHYRA, we believe diverse perspectives strengthen collaboration and spark innovation. We welcome applicants from all backgrounds, cultures, and experiences to help us create teams that reflect the communities our science serves. Your unique contribution matters here - come realize your full potential with us.
