Director, HPC Infrastructure Engineering

Guardant Health Inc.

Palo Alto, United States of America

25 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Compensation

$ 317K

Job location

Palo Alto, United States of America

Tech stack

Amazon Web Services (AWS)

IBM System I

Azure

Big Data

Unix

Cloud Computing

File Systems

General Parallel File Systems

Python

Network Protocols

Shell Script

Software Construction

TCP/IP

Virtualization Technology

Graphics Processing Unit (GPU)

Google Cloud Platform

High Performance Computing

Containerization

Kubernetes

Information Technology

Slurm

Docker

Job description

Oversee and manage the HPC environment - compute, storage, network, physical infrastructure, and software - serving multiple Production and Development clusters
Integrate HPC systems with on-prem and cloud-based systems and data sources as required
Administer multiple HPC clusters and associated cluster file systems
Research, design, and implement next-generation HPC solutions
Diagnose and resolve production system stack issues, leveraging software utilities down to the source code level (e.g., shell scripts, Python, etc.)
Maintain and monitor infrastructure and facilities to ensure operational stability
Drive continuous improvement initiatives to enhance reliability and performance as workloads and data volumes scale
Ensure control, integrity, and accessibility across systems and applications serving multiple concurrent users
Provide operational oversight for systems at remote and international locations
Collaborate with offsite consultants to sustain and optimize infrastructure performance
Partner with vendors to procure, troubleshoot, upgrade, repair, and replace systems as needed
Foster a culture of continuous engineering improvement through design and architecture review, mentoring, feedback, and development and monitoring of key performance metrics
Hire, coach, and mentor individuals; build a strong cross-functional organization
Partner with a diverse customer base to understand requirements, priorities, and processes
Propose and implement new projects or recommend system improvements
Observe quality standards appropriate for an FDA-governed and CLIA/CAP-compliant diagnostic laboratory
Manage budgets to balance refresh of obsolete equipment and software, scaling to support company growth, utilizing fixed headcount and contractor/consulting resources
Participate in a 24/7 on-call rotation

Requirements

B.S. in Computer Science or related technical field or equivalent experience
10 years' experience with high-performance computing platforms within a commercial enterprise, preferably in organizations handling large volumes of sequenced genomic data
Experience with software-defined infrastructure and cloud computing - Google Cloud Platform, Amazon Web Services (AWS), etc.
Experience managing GPUs and petabyte-scale storage platforms
Design, deployment, support, and troubleshooting experience in a complex computing environment
HPC Engineering team management experience (either directly or in a matrixed environment)
4+ years of networking experience, with CCNA certification or better
4+ years of Linux/Unix system administration, knowledge of Unix network protocols, TCP/IP, core infrastructure technologies and virtualization
2+ years of experience with large-scale data storage and compute cluster (HPC) infrastructure
2+ years working in and with on-premises and cloud-based (AWS, Google, IBM, and Azure) data centers
2+ years building software release and operations processes and automation toolsets
2+ years providing documentation of system administration

Preferred:

Proficiency with Arista and compatible networking, up to and including 400 Gb/s links
Hands-on administration of IBM's General Parallel File System
Operational oversight of Slurm scheduler
Working knowledge of cloud bursting technologies
Familiarity with wide area file systems
Practical expertise in Docker and container technologies
Working experience with Kubernetes
Operation of infrastructure compliant with HIPAA and SOX standards

Success Profile:

Excels in agile, high-velocity technical environments.
Demonstrates self-leadership and a commitment to advancing both individual and team expertise.
Combines engineering rigor with pragmatic adaptability.
Successfully manages operational SLAs while leading initiatives critical to business growth.

Hybrid Work Model: This section is applicable to onsite employees who are eligible for hybrid work location as specified by management and related policies. Guardant has defined days for in-person/onsite collaboration and work-from-home days for individual-focused time. All U.S. employees who live within 50 miles of a Guardant facility will be required to be onsite on Mondays, Tuesdays, and Thursdays. We have found aligning our scheduled in-office days allows our teams to do the best work and creates the focused thinking time our innovative work requires. At Guardant, our work model has created flexibility for better work-life balance while keeping teams connected to advance our science for our patients.

The annualized base salary ranges for the primary location and any additional locations are listed below. This range does not include benefits or, if applicable, bonus, commission, or equity. Each candidate's compensation offer will be based on multiple factors including, but not limited to, geography, experience, education, job-related skills, job duties, and business need.

Primary Location: Palo Alto, CA
Primary Location Base Pay Range: $230,200 - $316,600
Other US Location(s) Base Pay Range: $195,700 - $269,100
If the role is performed in Colorado, the pay range for this job is: $207,200 - $284,950

Employee may be required to lift routine office supplies and use office equipment. Majority of the work is performed in a desk/office environment; however, there may be exposure to high noise levels, fumes, and biohazard material in the laboratory environment. Ability to sit for extended periods of time.

A background screening including criminal history is required for this role. GH will consider qualified applicants with criminal arrest or conviction histories in a manner consistent with applicable law including but not limited to the LA County Fair Chance Policies and the Fair Chance Act (Gov. Code Section 12952).

About the company

Guardant Health is a leading precision oncology company focused on guarding wellness and giving every person more time free from cancer. Founded in 2012, Guardant is transforming patient care and accelerating new cancer therapies by providing critical insights into what drives disease through its advanced blood and tissue tests, real-world data and AI analytics. Guardant tests help improve outcomes across all stages of care, including screening to find cancer early, monitoring for recurrence in early-stage cancer, and treatment selection for patients with advanced cancer. For more information, visit guardanthealth.com and follow the company on LinkedIn, X (Twitter) and Facebook.

Guardant Health's High-Performance Computing (HPC) team builds and operates the computational technology infrastructure backbone of the company. This includes scalable data storage that holds petabytes of genomics data, high performance compute clusters running a custom bioinformatics pipeline in production and R&D environments, and the software infrastructure that hosts an ecosystem of services for internal data processing and external data integration. To facilitate Guardant Health's fast growth in the next few years, the HPC team is seeking a strong technical engineering leader who can help maintain and grow the HPC infrastructure during its expansion, while partnering with other engineering functions (Corporate IT, SQA and DevOps/SRE) as well as the R&D user community and Lab Operations. This is a hands-on technical leadership position that will leverage your expertise in HPC environments, as well as your experience leading and managing a team.
