High Performance Computing (HPC) Systems Architect

Everforth Apex
Houston, United States of America
7 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English
Experience level
Senior

Job location

Remote
Houston, United States of America

Tech stack

Bash
Data Centers
Linux
Job Scheduling
Python
Linux System Administration
Linux Servers
Performance Tuning
Shell Script
Software Configuration Management
High Performance Computing
Slurm
Isilon

Job description

  • Manage, monitor, and maintain the new HPC cluster, including compute, GPU, high-memory, hybrid, and management nodes.
  • Work with researchers and data scientists in an academic, public health, or scientific research context.
  • Oversee and optimize the Slurm job scheduler, including configuration, policies, queues, troubleshooting user jobs, and performance tuning.
  • Operate and support Tier 1 storage (PixStore) and its integration with Tier 2 storage (Dell EMC Isilon/PowerScale).
  • Act as a technical liaison between IT and the research community, supporting researchers and data scientists with onboarding, software configuration, and workload optimization.
  • Translate research needs into practical workflows on the cluster, providing guidance on best practices for running jobs and managing data.
  • Develop user-facing documentation, quick-start guides, and FAQs for researchers and data scientists.
  • Deliver trainings, workshops, and onboarding sessions to help users learn command-line basics, use scientific tools, and manage jobs via Slurm.
  • Collaborate with internal teams, faculty liaisons, and external vendors on support, enhancements, and long-term planning for HPC services.

This position follows a hybrid work model with a mix of on-site and remote work, typically three days on-site and two days remote, with flexibility. The candidate must be available to come on-site as needed for data center access, physical hardware issues, and vendor visits. The schedule consists of standard daytime hours with some flexibility required for urgent issues affecting research workloads.

Requirements

An organization is deploying a new High Performance Computing (HPC) cluster and seeks an HPC-focused professional to administer, support, and enable research workloads on this environment. The role centers on HPC environment management and research support: the successful candidate will be the primary owner and advocate for the HPC environment and its users within a team that has strong Linux system administration expertise. Previous experience as a data scientist, or in mentoring data scientists, is preferred.

Experience: Hands-on experience with High Performance Computing (HPC) environments is required, not just standalone Linux servers. Experience working with researchers or data scientists, ideally in an academic, public health, or scientific research context, is necessary. The candidate must be comfortable working on-site with physical hardware and data center environments.

Technical Skills: Strong Linux experience, particularly in a server or cluster environment, is required. Practical experience with job schedulers, specifically Slurm (configuration, job submission, troubleshooting, and optimization), is essential. The role requires the ability to understand and support software commonly used in research/HPC, such as Python-based workflows and scientific libraries, and to communicate technical concepts clearly to non-experts.

Preferred Qualifications

  • A strong background as a data scientist or in a research computing support role.
  • Experience with PixStore or similar high-performance storage systems.
  • Experience with Dell EMC Isilon / PowerScale or other large-scale NAS platforms.
  • Familiarity with Bash, shell scripting, and scientific Python ecosystems.
  • Prior experience designing or managing HPC clusters and delivering user training.
  • Experience in higher education, healthcare, or public health research environments.

About the company

Everforth Apex is a world-class IT services company that serves thousands of clients across the globe. When you join Everforth Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico. Everforth Apex uses a virtual recruiter as part of the application process.
