HPC Storage Engineer

Susquehanna International Group, LLP
Philadelphia, United States of America
5 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Philadelphia, United States of America

Tech stack

Artificial Intelligence
Big Data
C++
Data Transmissions
ETL
File Systems
Distributed Systems
General Parallel File System (GPFS)
Linux System Administration
Performance Tuning
Ceph
Data Storage Management
Storage Technologies
Slurm

Job description

We are looking for an experienced HPC Storage Engineer to design, implement, and optimize the storage and data movement infrastructure that underpins our high-performance computing (HPC) environment. This role focuses on distributed and parallel filesystems, storage systems, and large-scale data movement, ensuring reliable, high-throughput access to data for compute-intensive workloads.

You will work closely with HPC platform engineers, compute and networking teams, and application users to deliver scalable, performant, and resilient storage solutions that tightly integrate the storage layer with compute nodes.

What you'll do

  • Design, deploy, and operate HPC storage systems and parallel/distributed filesystems (e.g., Lustre, GPFS/IBM Spectrum Scale, BeeGFS, Ceph)
  • Own data movement workflows across environments, including data ingest, replication, tiering, and archiving
  • Optimize filesystem and storage performance for large-scale parallel workloads
  • Design and tune load-balancing strategies across storage targets, metadata services, and data movement pipelines to ensure even utilization, high throughput, and predictable performance at scale
  • Troubleshoot storage, I/O, and data movement issues across HPC compute clusters
  • Develop and maintain automation for storage provisioning, monitoring, and lifecycle management
  • Partner with compute and networking teams to ensure end-to-end performance and reliability
  • Advise users and application teams on best practices for I/O patterns, data layout, and performance tuning
  • Evaluate and integrate new storage technologies and architectures as requirements evolve

Requirements

  • Hands-on experience with parallel or distributed filesystems in production environments
  • Strong understanding of Linux systems administration
  • Experience with high-performance I/O, data locality, and throughput optimization
  • Proficiency in large-scale distributed systems development, preferably in C++
  • Proven ability to troubleshoot complex performance and reliability issues across storage and compute stacks
  • Experience with data transfer and movement tools
  • Familiarity with object storage and hierarchical storage management (HSM)
  • Experience integrating storage with HPC schedulers (e.g., Slurm) and compute workflows
  • Background supporting scientific, ML/AI, or other data-intensive workloads

About the company

Susquehanna is a global quantitative trading firm powered by scientific rigor, curiosity, and innovation. Our culture is intellectually driven and highly collaborative, bringing together researchers, engineers, and traders to design and deploy impactful strategies in our systematic trading environment. To meet the unique challenges of global markets, Susquehanna applies machine learning and advanced quantitative research to vast datasets to uncover actionable insights and build effective strategies. By uniting deep market expertise with cutting-edge technology, we excel at solving complex problems and pushing boundaries together.
