HPC Systems Engineer - TS/SCI Required job in Charlottesville
Job description
Phoenix is seeking a High Performance Computing (HPC) Systems Engineer to support the build, configuration, and sustainment of advanced Linux-based HPC cluster environments. This role is critical to enabling distributed compute workloads, scientific simulations, and GPU-accelerated processing within a secure research environment.
You will work in a cluster-scale computing environment where performance optimization, scheduler configuration, and distributed workload execution are key to mission success.
What You'll Do
- Configure, deploy, and maintain multi-node Linux HPC clusters
- Administer and optimize workload schedulers (e.g., Slurm, PBS)
- Troubleshoot distributed compute workloads across cluster environments
- Perform performance analysis across compute, storage, and network layers
- Support GPU-enabled workloads and CUDA-based processing
- Develop and maintain automation scripts and operational tooling
- Assist in cluster provisioning and node deployment (e.g., xCAT, Warewulf)
- Support containerized workloads within HPC environments

Our technical competencies include Big Data analytics (batch and streaming), Cloud Computing infrastructure, multi-INT visualization, and enterprise architectures. We support operational missions (All-Source, Financial, CND) and serve as Product Owners for our open-source research initiatives.
Requirements
- Active TS/SCI clearance
- Ability to work onsite in Charlottesville, VA
- 6+ years of Linux systems administration experience
- Hands-on experience with HPC clusters or distributed compute environments
- Experience with workload schedulers such as Slurm, PBS / PBS Pro, Torque, or similar
- Strong command-line Linux administration skills (RHEL preferred)
- Experience with scripting or automation (Bash, Python, or similar)
- Ability to obtain DoD 8140 (8570) IAT Level II certification

- Experience administering multi-node HPC cluster environments
- Familiarity with parallel/distributed file systems (Lustre, BeeGFS, GPFS)
- Experience with MPI, OpenMP, or other parallel computing frameworks
- Experience supporting GPU compute environments (CUDA)
- Familiarity with container technologies (Docker, Podman, Singularity/Apptainer)
- Experience with configuration management tools (Ansible, Puppet)
- Background supporting research labs, university HPC, or defense environments
Technical Environment
You'll work with cutting-edge technologies, including:
- Linux-based HPC clusters
- High-performance networking (RDMA, InfiniBand)
- Distributed compute frameworks (MPI, OpenMP)
- GPU-enabled processing (CUDA)
- Cluster provisioning tools (xCAT, Warewulf)

- HPC cluster administration
- Research computing or university HPC centers
- National labs or scientific computing programs
- Defense or intelligence community computing environments

- Scheduler expertise (Slurm, PBS, etc.)
- Linux administration in multi-node environments
- Troubleshooting distributed workloads
- Automation and scripting experience
Benefits & conditions
Medical, Dental, Vision Insurance - 100% Company Paid Premiums
STD, LTD, and Life Insurance - 100% Company paid
401(k) - Automatic 10% company contribution, no matching required
PTO - 4 weeks/year
Holidays - 11 paid/year
Birthdays off with pay
Referral Bonuses - Upfront AND Annually Recurring
Open Source Bonuses - Contribute to our GitHub projects
Professional Development - Paid training, Certifications, and Enrichment