Staff Hardware Systems Engineer in San Francisco

Energy Jobline
San Francisco, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$253K

Job location

San Francisco, United States of America

Tech stack

Board Bringup
Artificial Intelligence
Data analysis
Systems Engineering
Computer Engineering
Software Debugging
Firmware
Hardware Design
InfiniBand
Python
PCI Express
Cloud Services
System Testing
Systems Integration
High Performance Computing
Hardware Testing
Data Analytics
NVMe

Job description

We are seeking a Hardware Production / Sustaining Engineer to strengthen Crusoe's Hardware Systems Engineering team and close critical skill gaps in debugging, validation, and production support of high-performance compute systems. In this role, you will take ownership of the full hardware lifecycle, from prototype bring-up to large-scale production, while driving automation, deep issue resolution, and reliability across Crusoe Cloud's GPU- and CPU-based infrastructure.

You will work closely with cross-functional teams to support, debug, and improve hardware platforms at scale, with a particular focus on PCIe, InfiniBand, and NVMe/storage, which have been identified as essential areas for deeper expertise. Your work will directly impact Crusoe's ability to deploy and operate sustainable, AI-first compute systems with world-class performance and reliability.

What You'll Be Working On:

  • Drive the full hardware development and sustaining lifecycle, including feasibility, bring-up, validation, deployment, and ongoing production support.
  • Develop and maintain scripting and automation frameworks for hardware testing, diagnostics, and continuous reliability improvements.
  • Lead deep troubleshooting and debugging across:
      • PCIe (link training, topology, performance issues)
      • InfiniBand (fabric debugging, throughput, connectivity issues)
      • NVMe/storage (performance bottlenecks, firmware interactions, failure analysis)
  • Conduct rigorous system validation and characterization for GPU, CPU, and high-performance compute platforms.
  • Support end-to-end (E2E) integration and solution testing to ensure Crusoe Cloud products meet performance, reliability, and scalability expectations.
  • Collaborate with mechanical, thermal, firmware, software, and manufacturing teams to resolve system-level issues and enable stable production operation.
  • Drive prototyping, qualification, and readiness for high-volume manufacturing with both internal teams and external vendors.
  • Identify opportunities for new hardware technologies, testing methods, and sustainability improvements aligned with Crusoe's long-term objectives.
  • Provide data-driven insights to influence Crusoe's hardware roadmap and reliability strategy.

Requirements

  • 8-10+ years of experience in hardware development, validation, sustaining engineering, or production engineering.
  • Strong hands-on expertise in PCIe, InfiniBand, and NVMe/storage debugging and development.
  • Deep proficiency in hardware bring-up, board-level debugging, and system-level validation.
  • Ability to design and implement automation frameworks for hardware testing (Python, Shell, or similar).
  • Technical background in digital and analog design, server architecture, and high-performance compute hardware.
  • Experience working across thermal, mechanical, firmware, and software functions in multidisciplinary environments.
  • Strong analytical and problem-solving skills with a data-driven approach.
  • Excellent communication and collaboration skills for working with internal teams and external partners.
  • Bachelor's or Master's degree in Electrical Engineering, Computer Engineering, or equivalent experience.

Bonus Points:

  • Experience designing or optimizing GPU-to-GPU communication architectures for AI/ML workloads.
  • Direct experience integrating NVLink or other next-generation GPU interconnect technologies.
  • Familiarity with cutting-edge GPU architectures and how to leverage them in AI/HPC environments.
  • Expertise supporting or designing systems across both ARM and x86 server architectures.
  • Background in sustainable or energy-efficient hardware design practices.
  • Advanced certifications or coursework in AI/HPC hardware systems.

Benefits & conditions

  • Competitive compensation
  • Restricted Stock Units
  • Paid time off & paid holidays
  • Comprehensive health, dental & vision insurance
  • Employer contributions to HSA account
  • Paid parental leave
  • Paid life insurance, short-term and long-term disability
  • Professional development & tuition reimbursement
  • Mental health & wellness support
  • Commuter benefits (parking & transit)
  • Cell phone stipend
  • 401(k) Retirement plan with company match up to 4% of salary
  • Volunteer time off

Compensation will be paid in the range of $208,000 - $253,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

About the company

Energy Jobline is the largest and fastest growing global Energy Job Board and Energy Hub. We have an audience reach of over 7 million energy professionals, 400,000+ monthly advertised global energy and engineering jobs, and work with the leading energy companies worldwide. We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets as well as emerging technologies in EV, Battery, and Fusion. We are committed to ensuring that we offer the most exciting career opportunities from around the world for our jobseekers.

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
