HPC Linux Systems Engineer, Classified Environment
Job description
- Install, integrate, and administer Linux-based HPC clusters, storage systems, and high-speed networks.
- Monitor and optimize system performance, reliability, and scalability for large-scale computational workloads.
- Diagnose complex hardware and software issues, coordinating with vendors and internal engineering teams to implement solutions.
- Participate in system design, deployment, acceptance testing, and upgrades for leadership-class and research computing systems.
- Develop and maintain automation, configuration management, and monitoring solutions using tools such as Ansible, Puppet, Bash, or Python.
- Collaborate with scientists, researchers, and technical staff to ensure HPC resources effectively support scientific and mission objectives.
- Support identity management, authentication, and access control frameworks to maintain secure and compliant environments.
- Document system architectures, processes, and best practices, and contribute to internal knowledge sharing.
- Participate in on-call rotations and off-hours maintenance windows as required to support 24x7 operations.
- Collaborate with diverse teams of scientists, engineers, and technologists from across the DOE complex and academia.
- Grow your career in a mission-driven, innovation-focused environment with access to professional development and leadership opportunities.
- Enjoy life in East Tennessee, with a thriving research community, scenic outdoor recreation, and a high quality of life.
This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.
We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) files up to 5 MB in size. Resumes from third-party vendors will not be accepted; these resumes will be deleted, and the candidates submitted will not be considered for employment.
Requirements
The Field Intelligence Operations Division invites candidates to apply to join our National Security Computing team to contribute to the design, implementation, and management of HPC systems within a classified environment. We are looking for candidates with experience in HPC architecture, cluster management, and parallel computing, with a proven ability to work within highly secure and regulated environments. This role involves close collaboration with security teams, scientists, and IT leadership to ensure that the HPC infrastructure meets the stringent performance, security, and compliance requirements necessary for classified work.
- BS degree in computer science, engineering, or a related field.
- A minimum of 5 years of experience in Linux systems administration, or an equivalent combination of education and experience.
- Experience administering HPC clusters or large-scale Linux computing environments.
- Familiarity with batch schedulers (e.g., SLURM, PBS, LSF) and parallel file systems (Lustre, GPFS/Spectrum Scale).
- Experience implementing and managing automation and configuration management frameworks (Ansible, Puppet, Salt).
- Proficiency in scripting or programming (Python, Bash, Go).
- Understanding of networking fundamentals and high-speed interconnects (InfiniBand, Ethernet).
- Experience deploying or supporting identity management and multi-factor authentication systems (PingFederate, RSA SecurID, Entra ID).
- Familiarity with virtualization or containerization technologies (VMware, KVM, Podman, Apptainer).
- Experience troubleshooting and tuning high-performance storage, networking, and compute systems.
- Excellent communication, collaboration, and problem-solving skills.
- Demonstrated ability to lead or contribute to complex technical projects with minimal supervision.
Special Requirement:
- SCI Clearance: This position requires the ability to obtain and maintain a Sensitive Compartmented Information (SCI) clearance from the Department of Energy. As such, this position is a Workplace Substance Abuse Program (WSAP) testing-designated position. WSAP positions require passing a pre-placement drug test and participation in an ongoing random drug testing program. In addition, due to the SCI, you may also be subject to random polygraph testing.