HPC Engineer (m/f) - Remote

1st solution consulting gmbh
2 days ago

Role details

Contract type
Contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote

Tech stack

C
Artificial Intelligence
Systems Engineering
Bash
C++
Nvidia CUDA
Computer Programming
System Configuration
Linux
InfiniBand
Job Scheduling
Python
OpenMP
Performance Tuning
Red Hat Enterprise Linux - RHEL
Ansible
Product Software Implementation Methods
Scripting (Bash/Python/Go/Ruby)
High Performance Computing
Parallel Computation

Job description

HPC Engineer (m/f)

Start: ASAP

Duration: 12 months +++

Location: remote

Tasks:

  • Execution and fulfillment of service requests for HPC resources.
  • Troubleshooting complex issues in HPC compute environments.
  • Documentation of HPC system configurations, procedures, and software implementations.
  • Creation and execution of test plans for HPC environments.
  • Proactive development and improvement of current and future environments.
  • Close collaboration with domain experts, data scientists, and hardware specialists.
  • Support day-to-day HPC operations and ensure systems remain stable and performant.
  • Address vague or incomplete user requests by applying diagnostic skills.
  • Develop HPC architectures with a focus on performance, scalability, and security.
  • Analyze and design complex HPC workloads.
  • Evaluate new technologies and prepare technical concepts.
  • Actively engage with the AI community, offering expert consultation to support ongoing improvement and strategic development.
  • Monitor application-based AI trends in our ecosystem and advise on potential use cases.

Skills:

  • Minimum 3 years of professional experience in HPC-focused software or systems engineering.
  • Strong Linux (RHEL) administration knowledge.
  • Programming/Scripting skills: C, C++, Python, Bash, Ansible.
  • Knowledge of MPI, OpenMP, CUDA and other parallel computing frameworks.
  • Experience with cluster management, job scheduling and high-speed networking (InfiniBand).
  • Practical experience installing, configuring, troubleshooting, and performance-tuning engineering simulation workloads on HPC clusters.
  • ITIL understanding for structured incident and service management.
  • Strong analytical capabilities, communication skills, and adaptability.
