Director of Network Engineering
Job description
The Network Engineering Team is responsible for the design, development, and ongoing operation of all networking services that underpin both the internal management platform and the customer-facing cloud infrastructure, including internal transit, WAN connectivity, and DC networking. The team also acts as a third/fourth-line escalation point for the support organisation.
Nscale is looking for a deeply experienced Director of Network Engineering to lead this team strategically as our AI DC footprint grows.
What You'll be Doing
- Lead the delivery of network infrastructure programmes across Ethernet, InfiniBand, and WAN domains, owning scope, timelines, dependencies, and outcomes from design through delivery and operations.
- Set technical direction and standards for high-performance Ethernet fabrics (VLANs, LACP, VxLAN, BGP, EVPN) and ensure consistent implementation through DevOps practices, including network automation and CI/CD pipelines.
- Provide technical leadership for InfiniBand deployments: guiding architecture choices, configuration, performance tuning, and complex troubleshooting (Subnet Managers, QoS, routing, congestion control, and firmware management).
- Own prioritisation and execution planning for the network roadmap, balancing delivery commitments, operational risk, technical debt, and stakeholder requirements across multiple concurrent projects.
- Partner with deployment, DC operations, procurement, and vendors to validate BOM correctness and ensure designs are fit for purpose and delivered without surprises.
- Oversee data centre network integration, providing input into DC layout and rack elevations to ensure reference architectures are implemented correctly and consistently across sites.
- Drive operational excellence and incident reduction, leading root-cause analysis for performance or stability issues across Ethernet/InfiniBand/WAN, and implementing preventative controls, runbooks, and measurable SLOs.
- Coordinate cross-team dependencies (for example, with DC operations), leveraging Python/Ansible-driven tooling for provisioning, configuration validation, compliance, knowledge sharing, and change management across multi-vendor environments.
- Manage team execution and technical development: aligning engineers to outcomes, unblocking delivery, raising capability through coaching/reviews, and ensuring clear ownership of technical areas and projects.
- Maintain strong practical knowledge of optics and hardware: guiding platform/transceiver/cabling decisions and supporting effective BOM and troubleshooting practices when needed.
Requirements
- Proven people-management experience: building and leading a technically strong network engineering function through coaching, mentoring, performance management, and clear ownership of outcomes. Able to translate strategy into execution, delivering results with credibility and strong technical depth.
- Strong technical depth in high-performance Ethernet fabrics, with hands-on experience in VLANs, VxLAN overlays, EVPN, production operations, and troubleshooting (BGP, LACP, QoS, multicast). Comfortable using Wireshark and CLI tooling, and supporting or guiding implementation across the team.
- Proven capability leading InfiniBand programmes for accelerated compute clusters (NVIDIA Quantum/QM/Mellanox gear). Comfortable with complex performance optimisation (Subnet Managers, QoS, RDMA/RoCE, and fabric health).
- Able to articulate trade-offs and technical practices.
- Depth in at least one large-scale data centre network design (Clos/spine-leaf topologies and SD-WAN), with responsibility for defining resilient, secure multi-site connectivity and phasing efficient vendor/protocol management.
- Nice to have: experience building and operating SONiC-based switches (eBPF, device validation, and day-2 operational tooling across multi-vendor environments), and articulating approaches to a delivery way of working.
- Nice to have: production InfiniBand operational experience (SR-IOV, multi-tenancy, isolation, compatibility validation, and practical troubleshooting across transceivers, cabling, and hardware constraints).
- Leadership in multi-team or project delivery with cross-functional dependencies (for example, DC operations teams): leading incident response at senior escalation levels, driving SLA/problem management, and delivering measurable improvements in reliability, performance, and change success.
Benefits & conditions
- Highly competitive package (base + equity) with reviews every 12 months.
- Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI.
- Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
- Human-First Flexibility: We treat you as humans first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
- Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.