Sr. Staff Software Engineer - HPC Network...

LinkedIn Corporation
Mountain View, United States of America
5 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$181K to $297K

Job location

Mountain View, United States of America

Tech stack

Artificial Intelligence
Systems Engineering
C++
Clos Networks
Computer Clusters
Profiling
Computer Networks
Network Congestion
Linux
Distributed Systems
Ethernet
Network Interface Controllers
Python
Machine Learning
Network Architecture
Network Protocols
Performance Tuning
Remote Direct Memory Access
Systems Architecture
Data Processing
High Performance Computing
Large Language Models
Backend
Containerization
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Low Latency
Apache Flink
Kafka
Spark Streaming
Stream Processing
Data Pipelines
Go
Programming Languages
Microservices

Job description

We are seeking an HPC Network Engineer to design, deploy, and operate high-performance, low-latency Ethernet fabrics for large-scale GPU clusters. The role focuses on RoCE v2-based GPU interconnect networks supporting AI/ML training, inference, and HPC workloads. You will work closely with systems, GPU, platform, and software teams to build scalable, lossless Ethernet networks optimized for RDMA traffic.

As a Senior Staff Software Engineer, you will define long-term technical direction, lead cross-org initiatives, mentor senior engineers, and drive solutions for complex distributed systems challenges at massive scale. This role requires deep expertise in backend systems, data processing, and large-scale system design, along with a strong understanding of networking concepts.

Responsibilities:

  • Lead network architecture and design for large-scale LLM training and inference workloads.

  • Design RoCE v2-based GPU interconnect fabrics for multi-rack and multi-pod GPU clusters.

  • Define lossless Ethernet architectures (Clos / fat-tree / leaf-spine) optimized for RDMA.

  • Select and validate 400G / 800G Ethernet switching platforms and NICs (ConnectX, BlueField, etc.).

  • Architect host-level and Kubernetes pod networking, including enabling high-performance features such as RDMA and GPUDirect.

  • Tune host network performance for large-scale collective communications, balancing latency, throughput, and congestion control.

  • Analyze system performance and diagnose complex cross-layer issues.

A request for an accommodation will be responded to within three business days. However, non-disability-related requests, such as following up on an application, will not receive a response.

LinkedIn will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by LinkedIn, or (c) consistent with LinkedIn's legal duty to furnish information.

San Francisco Fair Chance Ordinance

Pursuant to the San Francisco Fair Chance Ordinance, LinkedIn will consider for employment qualified applicants with arrest and conviction records.

Pay Transparency Policy Statement

As a federal contractor, LinkedIn follows the Pay Transparency and non-discrimination provisions described at this link: https://lnkd.in/paytransparency.

Requirements

  • BA/BS degree in Computer Science or a related technical discipline, or equivalent practical experience.

  • 10+ years of experience building and operating large-scale distributed systems or data-intensive backend platforms.

  • Experience in one or more programming languages such as Go, Python, C++, or similar.

  • Experience in Linux system engineering and host networking.

  • Demonstrated knowledge of network protocols, fabric design, and performance optimization.

  • Proven ability to lead complex technical initiatives end-to-end in a multi-team environment.

  • Strong system design skills with a focus on scalability, reliability, and performance.

  • Experience with container platforms (Kubernetes) and microservices.

Preferred Qualifications:

  • Experience supporting large-scale AI or HPC workloads.

  • Familiarity with LLM training frameworks and communication libraries (e.g., NCCL, MPI).

  • Experience with streaming systems (Kafka, Flink, Spark Streaming, or similar) and high-throughput data pipeline architectures.

  • Experience with performance benchmarking and profiling tools.

  • Experience with infrastructure automation or configuration management tools.

  • Demonstrated influence across organizations (tech lead, architect, principal/IC leadership roles).

Suggested Skills:

  • Distributed Systems

  • HPC Networking

  • Performance Optimization

  • Technical Leadership

Benefits & conditions

We strongly believe in the well-being of our employees and their families. That is why we offer generous health and wellness programs and time away for employees of all levels. LinkedIn is committed to fair and equitable compensation practices.

The pay range for this role is $181,000 to $297,000. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to skill set, depth of experience, certifications, and specific work location. This may be different in other locations due to differences in the cost of labor.

The total compensation package for this position may also include annual performance bonus, stock, benefits and/or other applicable incentive compensation plans. For more information, visit https://careers.linkedin.com/benefits.

Equal Opportunity Statement

We seek candidates with a wide range of perspectives and backgrounds and we are proud to be an equal opportunity employer. LinkedIn considers qualified applicants without regard to race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.

LinkedIn is committed to offering an inclusive and accessible experience for all job seekers, including individuals with disabilities. Our goal is to foster an inclusive and accessible workplace where everyone has the opportunity to be successful.

About the company

LinkedIn is the world's largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We're also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that's built on trust, care, inclusion, and fun, where everyone can succeed. Join us to transform the way the world works.
