HPC Engineer (m/f) - Remote
1st solution consulting gmbh
2 days ago
Role details
Contract type: Contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Job location: Remote
Tech stack
C
Artificial Intelligence
Systems Engineering
Bash
C++
Nvidia CUDA
Computer Programming
System Configuration
Linux
InfiniBand
Job Scheduling
Python
OpenMP
Performance Tuning
Red Hat Enterprise Linux - RHEL
Ansible
Product Software Implementation Methods
Scripting (Bash/Python/Go/Ruby)
High Performance Computing
Parallel Computation
Job description
HPC Engineer (m/f)
Start: ASAP
Duration: 12+ months
Location: remote
Tasks:
- Execution and fulfillment of service requests for HPC resources.
- Troubleshooting complex issues in HPC compute environments.
- Documentation of HPC system configurations, procedures, and software implementations.
- Creation and execution of test plans for HPC environments.
- Proactive development and improvement of current and future environments.
- Close collaboration with domain experts, data scientists, and hardware specialists.
- Support day-to-day HPC operations and ensure systems remain stable and performant.
- Address vague or incomplete user requests by applying diagnostic skills.
- Develop HPC architectures with a focus on performance, scalability, and security.
- Analyze and design complex HPC workloads.
- Evaluate new technologies and prepare technical concepts.
- Actively engage with the AI community to offer expert consultation for ongoing improvement and strategic development.
- Monitor application-driven AI trends in our ecosystem and advise on potential use cases.
Skills:
- Minimum 3 years of professional experience in HPC-focused software or systems engineering.
- Strong Linux (RHEL) administration knowledge.
- Programming/Scripting skills: C, C++, Python, Bash, Ansible.
- Knowledge of MPI, OpenMP, CUDA and other parallel computing frameworks.
- Experience with cluster management, job scheduling and high-speed networking (InfiniBand).
- Practical experience installing, configuring, troubleshooting, and performance tuning engineering simulation workloads on HPC clusters.
- ITIL understanding for structured incident and service management.
- Strong analytical capabilities, communication skills, and adaptability.