HPC Systems Engineer - TS/SCI Required job in Charlottesville

THE PHOENIX
Charlottesville, United States of America
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Charlottesville, United States of America

Tech stack

Systems Engineering
Bash
Big Data
Command-Line Interface
Cloud Computing
Configuration Management
Nvidia CUDA
Linux
Distributed File Systems
General Parallel File Systems
InfiniBand
Python
Network Layer
Linux System Administration
Node.js
OpenMP
Open Source Technology
Parallel Computing
Performance Tuning
Remote Direct Memory Access
Red Hat Enterprise Linux - RHEL
Ansible
Scientific Computing
Data Streaming
Scripting (Bash/Python/Go/Ruby)
High Performance Computing
Slurm
Puppet
Docker

Job description

Phoenix is seeking a High Performance Computing (HPC) Systems Engineer to support the build, configuration, and sustainment of advanced Linux-based HPC cluster environments. This role is critical to enabling distributed compute workloads, scientific simulations, and GPU-accelerated processing within a secure research environment.

You will work in a cluster-scale computing environment where performance optimization, scheduler configuration, and distributed workload execution are key to mission success.

What You'll Do

  • Configure, deploy, and maintain multi-node Linux HPC clusters
  • Administer and optimize workload schedulers (e.g., Slurm, PBS)
  • Troubleshoot distributed compute workloads across cluster environments
  • Perform performance analysis across compute, storage, and network layers
  • Support GPU-enabled workloads and CUDA-based processing
  • Develop and maintain automation scripts and operational tooling
  • Assist in cluster provisioning and node deployment (e.g., xCAT, Warewulf)
  • Support containerized workloads within HPC environments

Our technical competencies include Big Data analytics (batch and streaming), Cloud Computing infrastructure, multi-INT visualization, and enterprise architectures. We support operational missions (All-Source, Financial, CND) and serve as Product Owners for our open-source research initiatives.

Requirements

  • Active TS/SCI clearance
  • Ability to work onsite in Charlottesville, VA
  • 6+ years of Linux systems administration experience
  • Hands-on experience with HPC clusters or distributed compute environments
  • Experience with workload schedulers such as Slurm, PBS / PBS Pro, Torque, or similar
  • Strong command-line Linux administration skills (RHEL preferred)
  • Experience with scripting or automation (Bash, Python, or similar)
  • Ability to obtain DoD 8140 (8570) IAT Level II certification
  • Experience administering multi-node HPC cluster environments
  • Familiarity with parallel/distributed file systems (Lustre, BeeGFS, GPFS)
  • Experience with MPI, OpenMP, or other parallel computing frameworks
  • Experience supporting GPU compute environments (CUDA)
  • Familiarity with container technologies (Docker, Podman, Singularity/Apptainer)
  • Experience with configuration management tools (Ansible, Puppet)
  • Background supporting research labs, university HPC, or defense environments

Technical Environment

You'll work with cutting-edge technologies, including:

  • Linux-based HPC clusters
  • High-performance networking (RDMA, InfiniBand)
  • Distributed compute frameworks (MPI, OpenMP)
  • GPU-enabled processing (CUDA)
  • Cluster provisioning tools (xCAT, Warewulf)

Relevant backgrounds include:

  • HPC cluster administration
  • Research computing or university HPC centers
  • National labs or scientific computing programs
  • Defense or intelligence community computing environments

Key skills:

  • Scheduler expertise (Slurm, PBS, etc.)
  • Linux administration in multi-node environments
  • Troubleshooting distributed workloads
  • Automation and scripting experience

Benefits & conditions

Medical, Dental, Vision Insurance - 100% Company Paid Premiums

STD, LTD, and Life Insurance - 100% Company paid

401(k) - Automatic 10% company contribution; no employee match required

PTO - 4 weeks/year

Holidays - 11 paid/year

Birthdays off with pay

Referral Bonuses - Upfront AND Annually Recurring

Open Source Bonuses - Contribute to our GitHub projects

Professional Development - Paid training, Certifications, and Enrichment

About the company

Phoenix Operations Group is a high-end engineering services company dedicated to protecting and advancing our national cyber resources. As a small company, we rely on innovation to continually advance our employees' skills and provide game-changing solutions to our customers.

Phoenix Operations Group is an Equal Opportunity Employer. Phoenix Operations Group does not discriminate based on race, religion, color, sex, gender, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law. All employment is decided based on qualifications, merit, and business needs.
