HPC Systems Engineer

The University of Chicago
Chicago, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$110K

Job location

Chicago, United States of America

Tech stack

Systems Engineering
Build Automation
Computer Clusters
Program Optimization
Computer Networks
Linux
Distributed File Systems
File Systems
Distributed Computing Environment
Perl
General Parallel File Systems
Python
Performance Tuning
Queue Management Systems
Ansible
Scientific Computing
Shell Script
Subsystems
Network Switches
Scripting (Bash/Python/Go/Ruby)
Network Storage
Git
Information Technology
Performance Monitor
Free and Open-Source Software
Slurm
Puppet
Docker

Job description

Installs, configures, and maintains large computer clusters, servers, and software. Handles day-to-day operations of the systems, including systems administration, monitoring, and storage performance, up to and including network components. Manages the system's network switch, parallel file system, and HPC software stack and tools. Configures the scheduling and queuing system. Diagnoses and resolves system operational problems quickly and effectively. Coordinates with vendors to resolve hardware and software problems. Assists users with access and other help desk ticket requests or issues. Builds and deploys open-source software and software from vendors/partners. Provides reliable and efficient backups/restores for all managed systems. Maintains and monitors the security of the HPC systems and servers. Documents system administration procedures for routine and complex tasks.

Technical Environment: Linux build automation in a large, distributed computing environment with Puppet/Ansible/Git/Docker; scripting with Python/Shell/Perl.

Experience must include: implementing automation and monitoring using shell scripting; installing, configuring, and maintaining job management tools (SLURM/Moab/TORQUE/PBS); operating system deployment with xCAT/Rocks; configuring, administering, and supporting network storage subsystems (IBM/NetApp/DataDirect Networks/LSI); distributed file systems (GPFS/Lustre/Gluster); and configuring, installing, tuning, and maintaining scientific application software and performance monitoring and optimization tools.

Requirements

Bachelor's degree in Computer Science, Electronics Engineering, or a closely related field, plus 5 years of progressive experience in scientific computing, required.

Required: 5 years of Linux build automation in a large, distributed computing environment with Puppet/Ansible/Git/Docker; scripting with Python/Shell/Perl.

Experience must include: implementing automation and monitoring using shell scripting; installing, configuring, and maintaining job management tools (SLURM/Moab/TORQUE/PBS); operating system deployment with xCAT/Rocks; configuring, administering, and supporting network storage subsystems (IBM/NetApp/DataDirect Networks/LSI); distributed file systems (GPFS/Lustre/Gluster); and configuring, installing, tuning, and maintaining scientific application software and performance monitoring and optimization tools.

Benefits & conditions

The University of Chicago offers a wide range of benefits programs and resources for eligible employees, including health, retirement, and paid time off.

Pay Rate Type: Salary

Pay Range: $95,930.00 - $110,000.00

The included pay rate or range represents the University's good faith estimate of the possible compensation offer for this role at the time of posting.

Scheduled Weekly Hours: 37.5

About the company

The University of Chicago Research Computing Center (RCC), a unit in the Office of Research, provides high-end research computing resources to researchers at the University of Chicago. It is dedicated to enabling research by providing access to centrally managed High-Performance Computing (HPC), storage, and visualization resources. These resources include hardware, software, high-level scientific and technical user support, and the education and training required to help researchers make full use of modern HPC technology and local and national supercomputing resources. The Office of Research oversees the conduct of sponsored research, research program development, and contract management functions.
