Staff Engineer, High Performance Computing

Pfizer Inc.
Groton, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
Up to $207K

Job location

Groton, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Computing Platforms
Big Data
Cloud Computing
Cloud Engineering
Continuous Integration
DevOps
Distributed Systems
Monitoring of Systems
Systems Analysis
Job Scheduling
Linux System Administration
Open Source Technology
Prometheus
Software Deployment
Data Logging
Google Cloud Platform
High Performance Computing
Grafana
Parallel Computation
CloudFormation
Kubernetes
Infrastructure Automation Frameworks
Information Technology
CloudWatch
Terraform
Software Version Control

Job description

This role will lead the development and operationalization of the cloud-based HPC infrastructure required for research, modeling, and large-scale data processing across multiple cloud environments.

  • Serve as a primary technical expert; evaluate, advocate for, and drive consensus among senior managers and engineers for the go-forward technology platforms and toolkits used for HPC service delivery.
  • Collaborate with stakeholders, users, and leaders to develop a long-term technical roadmap for cloud-based HPC services.
  • Lead deep-dive discussions with technical partners at major cloud providers, defining HPC-related requirements and deliverables for Statements of Work.
  • Drive a culture of shared ownership, transparency, and engineering excellence through mentoring, coaching, and example setting.
  • Perform troubleshooting, system analysis, and benchmarking to manage escalated, difficult-to-resolve issues and maintain a high-performance environment.

HPC Platform Architecture and Engineering

  • Design and own robust and dependable high-throughput, parallel, low-latency infrastructure for HPC and ML/AI workloads in multiple cloud environments (AWS/GCP).
  • Establish technical standards, best practices, architectural frameworks, and implementation guidelines for reproducible HPC platform and application deployments.
  • Recommend cutting-edge HPC technologies including specialized accelerators, novel storage solutions, managed services, and open-source toolkits that will be integrated into the platform.
  • Own OS image development, job scheduler configuration, and high-performance storage systems.
  • Ensure high performance, reliability, scalability, cost efficiency, and security.

Automation and DevOps

  • Drive adoption of infrastructure automation using IaC tools like Terraform and CloudFormation.
  • Establish, promote, and enforce internal standards (naming, tagging, documentation, version control, and change procedures) to ensure repeatable environment provisioning and scaling.
  • Establish infrastructure lifecycle management procedures, from provisioning to operations, support, updating, and teardown of production computing platforms.

Monitoring and Reliability

  • Determine KPIs to guide monitoring, logging, and alerting strategies for the infrastructure.
  • Collaborate with stakeholders, users, and senior managers to develop meaningful user-facing dashboards and to drive resource management, cost efficiency, and workload optimization.
  • Design workflows, alerting systems, and utilities to improve observability and the user and administrator experience.

Requirements

  • B.S. in computer science, life science, data science or similar fields with 6+ years of experience in cloud infrastructure engineering.
  • A proven track record of developing and supporting robust HPC frameworks in a cloud environment.
  • Expert-level experience with at least one of AWS or GCP, including knowledge of core compute and storage services relevant to HPC.
  • Deep understanding of modern CI/CD practices, observability and monitoring of cloud-based HPC infrastructure.
  • Strong knowledge of distributed systems and production system reliability.
  • Familiarity with monitoring and observability frameworks (CloudWatch, Prometheus, Grafana, etc.).
  • Solid understanding of cloud networking, identity, security controls, and core services.
  • M.S. in computer science, life science, data science or similar fields.
  • 10-15 years of experience in HPC/cloud engineering.
  • Expertise with distributed computing environments, especially EKS/GKE/Kubernetes.
  • Deep experience with HPC environments, job schedulers, and NVIDIA GPU compute.
  • Prior experience with HPC deployment utilities, including AWS ParallelCluster, AWS Parallel Computing Service, and Google Cloud Cluster Toolkit.
  • Familiarity with other aspects of managing HPC services in a cloud environment: cloud financial models, cost optimization, user support services, application delivery, Linux administration, job scheduling, resource optimization.

The candidate demonstrates a breadth of diverse leadership experiences and capabilities, including the ability to influence and collaborate with peers, develop and coach others, and oversee and guide the work of other colleagues to achieve meaningful outcomes and create business impact.

Benefits & conditions

The annual base salary for this position ranges from $124,400.00 to $207,400.00. In addition, this position is eligible for participation in Pfizer's Global Performance Plan with a bonus target of 17.5% of the base salary and eligibility to participate in our share-based long-term incentive program.

We offer comprehensive and generous benefits and programs to help our colleagues lead healthy lives and to support each of life's moments. Benefits offered include a 401(k) plan with Pfizer Matching Contributions and an additional Pfizer Retirement Savings Contribution; paid vacation, holiday, and personal days; paid caregiver/parental and medical leave; and health benefits that include medical, prescription drug, dental, and vision coverage. Learn more at the Pfizer Candidate Site - U.S. Benefits (uscandidates.mypfizerbenefits.com).

Pfizer compensation structures and benefit packages are aligned based on the location of hire. The United States salary range provided does not apply to Tampa, FL or any location outside of the United States. This role is posted in multiple locations. If you are applying for the role in a secondary job posting location where pay transparency regulations apply, your Talent Advisor will share the local pay information with you during the first interview.

Relocation assistance may be available based on business needs and/or eligibility.

Candidates must be authorized to be employed in the U.S. by any employer.

U.S. work visa sponsorship (such as TN, O-1, H-1B, etc.) is not available for this role now or in the future.

Sunshine Act

Pfizer reports payments and other transfers of value to health care providers as required by federal and state transparency laws and implementing regulations. These laws and regulations require Pfizer to provide government agencies with information such as a health care provider's name, address, and the type of payments or other value received, generally for public disclosure. Subject to further legal review and statutory or regulatory clarification, which Pfizer intends to pursue, reimbursement of recruiting expenses for licensed physicians may constitute a reportable transfer of value under the federal transparency law commonly known as the Sunshine Act. Therefore, if you are a licensed physician who incurs recruiting expenses as a result of interviewing with Pfizer that we pay or reimburse, your name, address, and the amount of payments made will currently be reported to the government. If you have questions regarding this matter, please do not hesitate to contact your Talent Acquisition representative.

About the company

Pfizer is committed to the application of computational science in drug discovery and development and has recently initiated a large-scale migration of its computational infrastructure to the cloud. This role provides technical vision and will drive the execution of high-performance computing (HPC) solutions that support computational workloads across the organization.
