HPC User Support Engineer
Job description
The Argonne Leadership Computing Facility (ALCF) is seeking a skilled HPC User Support Engineer who will play a critical role in advancing ALCF's mission by providing technical support for its high-performance computing (HPC) systems and services. This role focuses on ensuring a seamless user experience for researchers using ALCF resources.
The successful candidate will work closely with leading scientific researchers from government, academia, and industry to resolve technical issues, create user documentation, and deliver training sessions. This position also involves stewarding users through ALCF allocation programs and improving workflows to enhance system usability and efficiency.
Key responsibilities include, but are not limited to:
- Managing and resolving technical issues; debugging, installing, compiling, and running large-scale user applications
- Supporting users in writing scripts for automated execution and optimizing workflows
- Assisting with job scheduling, debugging, and troubleshooting HPC-related challenges
- Collaborating with ALCF domain experts to provide resolutions to user requests and technical issues
- Developing and maintaining documentation, both internally and on user-facing websites
- Conducting training sessions and onboarding new users to ensure effective utilization of ALCF resources
- Enabling secure access to HPC systems and ensuring compliance with ALCF policies
- Providing support in AI technologies, including machine learning frameworks, and assisting users in deploying and optimizing AI workflows on HPC systems
This position is eligible for fully remote work.
Requirements
- Degree in Computer Science or related field
- Strong technical consulting and support skills, exercised with diplomacy and tact
- Excellent oral and written communication skills
- Hands-on experience with programming and scripting languages such as Python, C/C++, Fortran, and shell scripting
- Considerable experience working on UNIX systems
- Working knowledge of HPC systems and concepts
- Understanding of job scheduling and experience using popular schedulers such as PBSPro
- Effective analytical, problem-solving, and learning skills
- Ability to model Argonne's core values of impact, safety, respect, integrity, and teamwork.
- To perform the essential functions of this position, successful applicants must provide proof of U.S. citizenship, which is required to comply with federal regulations and contract.
- Experience working in an HPC center supporting user codes
- Experience working with parallel codes using MPI implementations and OpenMP
- Experience with common machine learning frameworks
- Experience with source code management systems like Git, and CI tools like Jenkins or GitLab
- Experience with database management systems (DBMS)
- Experience with containerization
- Master's degree in computer science or computational science or related field, plus work experience
- Experience developing technical training documentation for users
- Strong interest in emerging technologies and applications
- Ability to work on multiple concurrent projects efficiently and effectively
- Highly motivated and user focused
This position can be filled at one of two levels; the selected candidate will be placed at the appropriate level (PT2 or PT3) depending on the depth and breadth of relevant knowledge and skills. The minimum requirements for the two levels are as follows:
- PT2: Bachelor's degree and 2+ years of experience, or equivalent. The expected hiring range for this level is $69,750 - $108,810 annually.
- PT3: Bachelor's degree and 4+ years of experience, or a Master's degree and 2+ years of experience, or equivalent. The expected hiring range for this level is $86,299 - $134,626 annually.