Kubernetes Platform Engineer

Hewlett-Packard Enterprise

Bloomington, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

$ 212K

Job location

Bloomington, United States of America

Tech stack

API

Artificial Intelligence

Automation of Tests

DevOps

Distributed Systems

Remote Direct Memory Access

Cloud Platform System

Large Language Models

Kubernetes

TensorRT

NVIDIA NIM

Webhooks

Microservices

Job description

This role has been designed as 'Hybrid' with an expectation that you will work on average 2 days per week from an HPE office.

We are seeking a Kubernetes Platform Engineer (High-Performance Networking) to lead Kubernetes-native, RDMA-class networking for distributed AI inference platforms on HPC clusters. You will own the end-to-end technical design that allows Kubernetes-orchestrated inference workloads (NVIDIA NIMs, vLLM, TensorRT-LLM) to transparently consume high-speed fabrics (e.g., HPE Slingshot/CXI) using Operators, DRA, CDI, Multus/secondary CNI, and Kubernetes networking abstractions, without container rebuilds, privileged pods, or manual tuning. This role is central to transforming a traditionally HPC-centric fabric into a first-class Kubernetes resource, aligned with modern AI Factory and inference-as-a-service deployment models.

Make HPC fabric capabilities consumable from standard containers

Design the mechanisms to expose RDMA-capable NIC resources and required runtime components without baking the fabric into images, including mounting/injecting host user-space libraries (e.g., libcxi + libfabric) in a controlled, supportable way.

Define and implement the reference design for Kubernetes-native RDMA enablement across:

Dynamic Resource Allocation (DRA)
Container Device Interface (CDI)
Multus + secondary CNIs
Operator-driven lifecycle management

Own API and CRD design (ResourceClaims, DeviceClasses, custom CRDs) with long-term compatibility guarantees.

Make and defend architectural tradeoffs between:
Device plugins vs DRA
CDI vs runtime hooks vs admission webhooks
Shared vs exclusive NIC models
Performance vs operability vs isolation

Kubernetes Operator Ownership

Define how distributed inference patterns (KV-cache movement, prefill/decode separation) map onto Kubernetes primitives.

Ensure out-of-the-box compatibility with:

NVIDIA NIMs and the NIM Operator
KServe ServingRuntime / InferenceService
GPU Operator (CDI mode)

Publish deployment patterns and validated manifests for inference workloads using RDMA fast paths.

Requirements

Cloud Architectures, Cross Domain Knowledge, Design Thinking, Development Fundamentals, DevOps, Distributed Computing, Microservices Fluency, Full Stack Development, Security-First Mindset, Solutions Design, Testing & Automation, User Experience (UX)

Benefits & conditions

The expected salary/wage range for this position is provided below. Actual offer may vary from this range based upon geographic location, work experience, education/training, and/or skill level.

United States of America: Annual Salary USD 111,500 - 211,500 in Colorado // 106,000 - 243,000 in Minnesota & Texas. The listed salary range reflects base salary. Variable incentives may also be offered.

About the company

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today's complex world. Our culture thrives on finding new and better ways to accelerate what's next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career our culture will embrace you. Open up opportunities with HPE.

HPE will comply with all applicable laws related to employer use of arrest and conviction records, including laws requiring employers to consider for employment qualified applicants with criminal histories.
