Principal Systems Software Engineer
Job description
As the Principal Systems Architect, you will serve as the visionary lead for Crusoe's next-generation AI infrastructure. This is a role for an industry-recognized expert who has already "seen the movie" at hyperscale and is ready to redefine the I/O path for the age of generative AI. You aren't just building a cloud; you are designing the fluid fabric that unifies Bare-Metal-as-a-Service (BMaaS), Intelligent IaaS, and Elastic CaaS into a single, high-performance pool of intelligence.
In this position, you will bridge the gap between silicon and software, advising executive leadership on critical hardware/software co-design pivots while remaining hands-on enough to lead elite R&D teams in shipping production-grade kernel and orchestration code. We are looking for a master of the I/O path who can push massive-scale training workloads to the theoretical limits of hardware. This is a full-time position.
What You'll Be Working On:
- Unifying Infrastructure Pillars:
  - Bare-Metal-as-a-Service (BMaaS): Architect systems that deliver raw GPU throughput via ultra-low-latency InfiniBand/RDMA fabrics for massive-scale training.
  - Intelligent IaaS: Design highly optimized, thin virtualization layers using KVM or custom micro-VMs to provide enterprise-grade isolation without the "virtualization tax."
  - Elastic CaaS: Build a high-performance container substrate (utilizing Kubernetes or Slurm) that allows AI workloads to burst and scale across heterogeneous GPU nodes.
- Mastering the I/O Path: Lead the architectural design of our internal cloud fabric, drawing on experience from top-tier hyperscalers to drive the technical roadmap for SR-IOV, RDMA, and virtualized GPU scheduling.
- Advanced R&D Leadership: Lead elite workstreams to prototype and productionize novel methods for managing memory, networking, and compute that don't yet exist in standard cloud distributions.
- Technical Strategy & Documentation: Draft white papers and RFCs that define the next two years of Crusoe's compute and networking stack.
- High-Level Debugging: Work alongside Staff and Senior engineers to resolve complex race conditions in the I/O path and optimize kernel-level memory pinning for GPU clusters.
- Industry Influence: Represent Crusoe in open-source communities and industry forums to influence the global direction of cloud-native AI infrastructure.
Preferred Qualifications:
- Patent Holder: Possession of patents related to network virtualization, GPU scheduling, or distributed file systems.
- Open Source Leadership: Maintainer status or significant contributions to the Linux Kernel, Kubernetes, or specialized HPC projects.
- AI/ML Workload Expertise: Direct experience optimizing infrastructure for Large Language Model (LLM) training and inference at scale.
Requirements
- Hyperscale Provenance: 12+ years of experience designing and shipping core infrastructure at a major hyperscaler (e.g., OCI, AWS, Azure, Google Cloud Platform) or a specialized HPC cloud.
- Deep Systems Authority: Authoritative knowledge of the Linux kernel, virtualization internals (KVM, QEMU, Firecracker), and high-performance networking (RoCE v2, InfiniBand).
- Hardware-Software Co-Design: Proven ability to design software that maximizes the performance of NVIDIA/AMD GPUs and high-speed NICs.
- R&D Leadership: Experience leading cross-functional teams through high-ambiguity projects and delivering production-ready, mission-critical systems.
- Industry Contributions: A portfolio of significant contributions to the field, which may include patents, major open-source contributions, or published research in distributed systems.
- Communication Mastery: The rare ability to explain the nuances of memory-mapped I/O to an engineer and the business value of a new fabric architecture to the Board.
- Mandatory Education: A Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related analytical field (or equivalent professional experience).
Benefits & conditions
- Competitive compensation
- Restricted Stock Units
- Paid time off & paid holidays
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) Retirement plan with company match up to 4% of salary
- Volunteer time off
Compensation Range
$260,000 - $340,000 + Significant Equity & Bonus. Compensation is determined by the applicant's depth of expertise, previous impact at scale, and alignment with our architectural goals.