TS/SCI HPC Systems Engineer

Insight Global
Rivanna, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$180K

Tech stack

Systems Engineering
Bash
Command-Line Interface
Linux
Python
Linux System Administration
Scripting (Bash/Python/Go/Ruby)
High Performance Computing
Git
Information Technology
Slurm
Software Version Control
Server Operating Systems & Platforms

Job description

A federal IT services client of Insight Global is hiring a highly skilled HPC Systems Engineer to join their team full time in Charlottesville, VA. This role requires an active TS/SCI clearance and is on site 5 days/week. Relocation packages are available!

The HPC (High Performance Computing) Systems Engineer will work directly with engineers, analysts, and researchers to support job execution, troubleshoot workload failures, and improve the performance and efficiency of compute workloads running on HPC clusters. The Engineer will assist users with scheduler job scripts, application execution, and workload performance troubleshooting while promoting HPC best practices for efficient cluster utilization. This role serves as the primary interface between mission users and HPC platform infrastructure teams.

  • Provide direct support to users running computational workloads on HPC clusters (classified & unclassified)
  • Assist with creating, submitting, and troubleshooting job scripts (Slurm, PBS), including CPU/GPU resource allocation
  • Diagnose and resolve failing, slow, or hanging jobs (including MPI, parallel, and GPU workloads)
  • Support application setup, compilation, and execution in Linux-based HPC environments
  • Advise users on best practices to improve job performance, efficiency, and resource utilization
  • Monitor workload usage and recommend optimizations to maximize cluster throughput
  • Develop and maintain automation scripts/tools (Bash/Python) and manage them in version control (Git)
  • Collaborate with infrastructure teams and maintain documentation to resolve system issues and support users

Requirements

  • Active TS/SCI clearance
  • ONE of the following certifications: Security+, CCNA Security, CySA+, GICSP, GSEC, CND, SSCP, CAP, CASP+, CISM, CISSP, GSLC, CCISO, HCISPP
  • 5+ years of experience working in Linux environments supporting distributed compute workloads or HPC cluster platforms
  • Experience executing or troubleshooting workloads using HPC workload schedulers such as Slurm, PBS, Torque, or similar systems
  • Experience administering command-line Linux systems, including scripting (Bash, Python, etc.) and troubleshooting applications in multi-user server environments
  • Experience supporting systems within DoD/DoW or IC environments

Benefits & conditions

Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.
