High-Performance Computing System & Server Administrator, College of Science and Mathematics

James Madison University
Harrisonburg, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Harrisonburg, United States of America

Tech stack

Microsoft Windows
C++
Computer Clusters
Nvidia CUDA
Linux
Disaster Recovery
Fortran
R
Python
OpenMP
Software Systems
High Performance Computing
Slurm
Hardware Infrastructure
Network Server
VMware

Job description

  • Install and maintain server hardware including vendor coordination. Installation includes lifting and racking of servers.
  • Install and maintain HPC software and operating system infrastructure.
  • In consultation with vendor resources, maintain a 2-5 year plan for the HPC cluster. Work with PIs and the CSM leadership to allocate resources in line with this plan as funding becomes available.
  • Coordinate HPC users' access to an external vendor for HPC consulting, and assist collaboration between users and vendor in these consultations.
  • Develop training, documentation, and informational resources for faculty and students on use of the HPC system. This includes documenting support structures and hardware and software systems to improve access and workload efficiency for the research community (faculty, staff, and students).
  • Work with administration, JMU IT, faculty, and students to maintain and expand existing HPC resources, and facilitate research projects while balancing HPC system use.
  • Work with university IT and administration to ensure security, connectivity, and disaster recovery, and to build future capacity as needed for the university's research needs.

Requirements

Bachelor's degree in a related field and a minimum of two to three years of work experience as an HPC or Linux administrator, or an equivalent combination of education and experience. Knowledge of scientific programming languages such as C/C++, Fortran, Python, and R. Working knowledge of software related to cluster computing and servers, such as Flex, Slurm, PBS/TORQUE, OpenMP, CUDA, OpenMPI/MPICH, and VMware. Experience using and administering Linux and Windows systems. Ability to liaise effectively with faculty and the external HPC vendor for troubleshooting. Ability to utilize and develop tools within the technology field. Willingness to work with and train undergraduate students and faculty researchers; experience providing training or instruction on HPC software and hardware is preferred.

About the company

Mission: We are a community committed to preparing students to be educated and enlightened citizens who lead productive and meaningful lives.

Vision: To be the national model for the engaged university: engaged with ideas and the world.

Who We Are: Situated in the heart of Virginia's beautiful Shenandoah Valley, the city of Harrisonburg is a vibrant community with a population of approximately 52,000. Harrisonburg is conveniently located approximately 120 miles from Washington, D.C. and Richmond, VA. JMU is a selective, public institution with a growing national reputation for offering experiences that lead to an outstanding education and supportive environment for students, faculty and staff. The student body includes approximately 21,000 undergraduate and 1,800 graduate students, with over 1,000 full-time instructional faculty. JMU offers thriving programs in the liberal arts, science and technology, and professional disciplines at the undergraduate, master's and doctoral levels. JMU has achieved national recognition for the high quality of its academic programs, focus on maintaining strong student/faculty interaction, and innovative faculty research.

The College of Science and Mathematics (CSM) is seeking a full-time High-Performance Computing (HPC) System and Server Administrator to support primarily Linux-based research computing environments. The CSM has a wide range of computational needs, with faculty spread across five departments, including new, interdisciplinary data science programs, which add to the variety of the college's HPC requirements. This position represents an opportunity to manage existing on-premises server and computational infrastructure to support current and future efforts related to servers and HPC. CSM server and storage resource management and maintenance would be in collaboration with JMU's central IT; HPC duties would be in collaboration with JMU's central IT and an external vendor.
The JMU College of Science and Mathematics High Performance Computing cluster is a relatively new Linux cluster that is built on RHEL, is managed by Bright Cluster Manager, and uses the Slurm workload manager. It consists of approximately 20 nodes, including both CPU and NVIDIA GPU nodes. Additionally, the CSM maintains enterprise computing infrastructure consisting of 20+ servers/devices. This includes various storage devices (TrueNAS, Synology), web servers, and Windows file/print/terminal servers.
