High Performance Computing Engineer
Job description
As a High-Performance Computing (HPC) Engineer, you will support the implementation, operation, and lifecycle management of secure and scalable HPC environments that enable computational research across HMS. Working as part of the Research Computing Infrastructure team, you will contribute to the provisioning and administration of compute clusters, workload scheduling systems such as Slurm, user-facing software environments, and secure platforms that meet institutional compliance needs. This role emphasizes hands-on technical execution, operational reliability, and collaboration with colleagues and researchers to support evolving HPC workflows and infrastructure.
Core Duties:
- Perform provisioning, configuration, and decommissioning of HPC compute clusters.
- Support the administration and tuning of workload schedulers (e.g., Slurm) to ensure efficient job management and cluster utilization.
- Help maintain secure, regulated compute environments (e.g., NIST 800-171).
- Contribute to the integration of user accounts and identity management with institutional systems.
- Maintain and optimize user-facing software environments, including module systems and containerized applications.
- Support development and maintenance of scripts, automation, and tools used in cluster operations.
- Monitor system health, respond to alerts, and assist with compliance reporting and documentation.
- Collaborate with team members and researchers to troubleshoot and improve the computing environment.
- Contribute to operational documentation and support knowledge-sharing across the team.
- Participate in an off-hours on-call rotation.
- Perform other duties as assigned.
Requirements
- Minimum of two years' post-secondary education or relevant work experience.
- Bachelor's degree preferred.
- Experience managing Linux-based systems in a research or academic environment.
- Familiarity with workload schedulers (Slurm preferred), cluster provisioning, or performance tuning.
- Experience with infrastructure monitoring, configuration management (e.g., Ansible), and containerization (e.g., Apptainer/Singularity, Docker).
- Understanding of security and compliance frameworks relevant to research computing.
- Strong troubleshooting, communication, and collaboration skills.
- Ability to work in a team-oriented environment and adapt to evolving priorities.
- Demonstrated service orientation and commitment to operational reliability.
- Willingness to learn and grow technical depth in HPC tools and methodologies.
- Effective time management and documentation habits.
Certificates and Licenses:
- Completion of Harvard IT Academy specified foundational courses (or external equivalent) preferred.
Benefits & conditions
This position is salary grade level 057. Please visit Harvard's Salary Ranges to view the corresponding salary range and related information.
Harvard offers a comprehensive benefits package that is designed to support a healthy work-life balance and your physical, mental and financial wellbeing. Because here, you are what matters. Our benefits include, but are not limited to:
- Generous paid time off including parental leave
- Medical, dental, and vision health insurance coverage starting on day one
- Retirement plans with university contributions
- Wellbeing and mental health resources
- Support for families and caregivers
- Professional development opportunities including tuition assistance and reimbursement
- Commuter benefits, discounts and campus perks