Staff Network Deployment Engineer, Lab
Job description
Crusoe Cloud is seeking a high-energy, detail-oriented Staff Network Deployment Engineer to lead the physical and logical implementation of our research and integration lab. This role is focused on the deployment, management, advanced diagnosis, maintenance, and repair of high-performance networks for GPU compute clusters. As we rapidly expand our footprint of high-performance compute (HPC) and GPU-based AI infrastructure, you will be the primary driver of new product integration for network tools.
This is a technical, hands-on role that sits at the intersection of Network Engineering, Data Center Operations, and Project Management. The ideal candidate will have hands-on experience with network-level troubleshooting and will work closely with data center operations, engineering, and vendors to support cutting-edge network infrastructure for the latest NVIDIA and AMD GPUs. This position plays a critical role in supporting Crusoe's new product integration, research, and experimentation efforts.
What You'll Be Working On:
- Execute new product build-outs: Lead the end-to-end deployment of network infrastructure for our Crusoe lab.
- Bridge Design and Reality: Take high-level designs from the Network Development team and translate them into implementation plans, cable maps, and configuration templates.
- Validate new equipment: Perform rigorous burn-in testing and site acceptance testing (SAT) for new network clusters.
- Support optimizing deployment automation: Use Python, Ansible, and ZTP (Zero Touch Provisioning) to automate the staging and configuration of network devices.
- Lab management: Work with remote hands and cabling vendors to ensure physical layer standards (fiber paths, power requirements, and cooling) meet Crusoe's stringent HPC requirements.
- Manage a lab environment consisting of the latest GPU systems and networking devices.
- Work closely with OEMs for networking and data center infrastructure components.
- Perform diagnosis and troubleshooting of hardware faults for network systems.
- Support networking for GPU platforms including NVIDIA A100, H200, GB200, B200, and B300, and AMD MI350X / MI355X.
- Execute component-level diagnosis and remediation for failed or degraded network hardware.
- Partner with data center operations to manage and perform field-replaceable unit (FRU) repairs for networking systems.
- Perform firmware and BIOS upgrades.
- Maintain detailed documentation of maintenance activities, failures, and resolutions in ticketing and asset management systems.
- For lab operations, develop and update standard operating procedures (SOPs) for troubleshooting, repair, and validation workflows.
- Collaborate with engineering, software, and data center operations teams to identify root causes of systemic failures and document remediations.
- Participate in a rotating infrastructure on-call schedule (about one week every 4-6 weeks) with daytime coverage and handoff to the Europe team.
Requirements
- 8+ years of experience in network engineering with a heavy focus on large-scale data center deployments and infrastructure projects.
- Mastery of Physical Layer Standards: Expert knowledge of structured cabling (SMF/MMF, MPO/MTP), optical transceivers (400G/800G), and data center power/cooling requirements.
- Strong Routing and Switching Knowledge: Hands-on experience configuring Arista (EOS), Juniper (Junos), and NVIDIA/Mellanox platforms in a leaf-spine architecture.
- Protocol Proficiency: Solid understanding of BGP and EVPN-VXLAN as they relate to large-scale fabric provisioning.
- Automation-First Mindset: Proficiency in Python and Ansible for automating repetitive deployment tasks and validating configuration state.
- Logistical Excellence: Proven ability to manage multiple complex projects simultaneously across different time zones and physical locations.
- Troubleshooting Expertise: Ability to diagnose complex physical layer and link-layer issues using OTDRs, light meters, and packet captures.
- Education: Bachelor's degree in a technical field or equivalent practical experience in hyperscale or ISP environments.
Benefits & conditions
- Competitive compensation
- Restricted Stock Units
- Paid time off & paid holidays
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA account
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) Retirement plan with company match up to 4% of salary
- Volunteer time off
Compensation will be paid in the range of $193,000 - $234,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.