Staff Network Deployment Engineer, Lab

Crusoe's Inc

San Francisco, United States of America

6 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$ 234K

Job location

San Francisco, United States of America

Tech stack

Border Gateway Protocol

Big Data

BIOS

Cloud Computing

Computer Networks

Data Centers

Firmware

High-Level Architecture

Networking Hardware

Junos

Python

Laboratory Information Management Systems

Network Architecture

Routing

Packet Analyzer

Ansible

AI Infrastructure

Graphics Processing Unit (GPU)

Computer Networking Systems

Juniper

Deployment Automation

Job description

Crusoe Cloud is seeking a high-energy, detail-oriented Staff Network Deployment Engineer to lead the physical and logical implementation of our lab research and integration lab. This role is focused on the deployment, management, advanced diagnosis, maintenance, and repair of high-performance networks for GPU compute clusters. As we rapidly expand our footprint of high-performance compute (HPC) and GPU-based AI infrastructure, you will be the primary driver behind performing new product integration for network tools.

This is a technical, hands-on role that sits at the intersection of Network Engineering, Data Center Operations, and Project Management. The ideal candidate will be hands-on with network level troubleshooting and will work closely with data center operations, engineering, and vendors to support cutting-edge network infrastructure to support the latest NVIDIA and AMD GPUs. This position plays a critical role in supporting Crusoe's new product integration, research and experimentation efforts.

What You'll Be Working On:

Execute new product build-outs: Lead the end-to-end deployment of network infrastructure for our Crusoe lab.
Bridge Design and Reality: Take high-level designs from the Network Development team and translate them into implementation plans, cable maps, and configuration templates.
Validate new equipment: Perform rigorous "Burn-in" testing acceptance testing (SAT) for new network clusters.
Support optimizing deployment automation: Use Python, Ansible, and ZTP (Zero Touch Provisioning) to automate the staging and configuration of network devices.
Lab management: Work with remote hands, cabling vendors, and to ensure physical layer standards (fiber paths, power requirements, and cooling) meet Crusoe's stringent HPC requirements.
Manage a lab environment consisting of the latest GPU systems and networking devices.
Working closely with OEM for Networking and data center infrastructure components.
Perform diagnosis and troubleshooting of hardware faults for network systems.
Support networking for GPU platforms including NVIDIA A100, H200, GB200, B200, B300 and AMD 350X / 355X.
Execute component-level diagnosis and remediation for failed or degraded network hardware.
Partner with data center operations to manage and perform field-replaceable unit (FRU) repairs for networking systems.
Perform firmware and BIOS upgrades.
Maintain detailed documentation of maintenance activities, failures, and resolutions in ticketing and asset management systems.
For lab operations, develop and update standard operating procedures (SOPs) for troubleshooting, repair, and validation workflows.
Collaborate with engineering, software, and data center operations teams to identify root causes of systemic failures and document remediations.
Participate in a rotating infrastructure on-call schedule (about one week every 4-6 weeks) with daytime coverage and handoff to the Europe team.

Requirements

8+ years of experience in network engineering with a heavy focus on large-scale data center deployments and infrastructure projects.
Mastery of Physical Layer Standards: Expert knowledge of structured cabling (SMF/MMF, MPO/MTP), optical transceivers (400G/800G), and data center power/cooling requirements.
Strong Routing and Switching Knowledge: Hands-on experience configuring Arista (EOS), Juniper (Junos), and NVIDIA/Mellanox platforms in a leaf-spine architecture.
Protocol Proficiency: Solid understanding of BGP, EVPN-VXLAN as they relate to large-scale fabric provisioning.
Automation-First Mindset: Proficiency in Python and Ansible for automating repetitive deployment tasks and validating configuration state.
Logistical Excellence: Proven ability to manage multiple complex projects simultaneously across different time zones and physical locations.
Troubleshooting Expertise: Ability to diagnose complex physical layer and link-layer issues using OTDRs, light meters, and packet captures.
Education: Bachelor's degree in a technical field or equivalent practical experience in hyperscale or ISP environments.

Benefits & conditions

Competitive compensation
Restricted Stock Units
Paid time off & paid holidays
Comprehensive health, dental & vision insurance
Employer contributions to HSA account
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) Retirement plan with company match up to 4% of salary
Volunteer time off, Compensation will be paid in the range of $193,000 - $234,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

About the company

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI. We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

Role details

Job location

Tech stack

Job description

Requirements

Benefits & conditions

About the company

Apply for this position

Good distractions

Moments

Videos View all