GPU Cluster Architect - Data Center

Hamilton Barnes
Amsterdam, Netherlands
8 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote
Amsterdam, Netherlands

Tech stack

Artificial Intelligence
Data Centers
Ethernet
InfiniBand
Python
Systems Architecture
Scripting (Bash/Python/Go/Ruby)
Graphics Processing Unit (GPU)
Large Language Models
Reliability of Systems
Kubernetes
Low Latency
Slurm

Job description

We're looking for a GPU Cluster Architect to lead the design and development of our client's next-generation AI infrastructure powering large-scale, GPU-accelerated workloads. In this hands-on role, you'll own architectural decisions across compute, networking, and storage, building platforms capable of supporting the scale, performance, and reliability demands of modern AI and ML systems.

You'll define how tens of thousands of GPUs are interconnected, powered, cooled, and optimized across multiple data center sites. Working alongside world-class engineering teams, you'll shape the backbone of one of the most advanced AI clouds in the world.

If you're passionate about designing ultra-scale systems, optimizing performance for LLM training and inference, and building the core infrastructure that powers AI innovation, this is your opportunity.

Responsibilities

  • Architect scalable GPU cluster topologies spanning compute nodes, interconnects (InfiniBand, Ethernet), storage, and control planes
  • Model and analyze AI/ML workloads (LLM training, inference) to drive tradeoffs in latency, bandwidth, GPU density, and performance
  • Collaborate with network architects to design and validate low-latency, high-throughput interconnects (InfiniBand HDR/NDR, RoCEv2) at POD and data center scale
  • Integrate and optimize storage solutions to support training datasets, checkpointing, and high-performance I/O operations
  • Design for reliability, incorporating telemetry, automation, and monitoring to detect and resolve issues early
  • Partner with cross-functional teams including SRE, networking, storage, and data center engineering to operationalize your designs

Requirements

  • 5+ years of experience designing GPU or HPC clusters at scale
  • Deep understanding of modern GPU architectures (NVIDIA, AMD)
  • Expertise with HPC interconnects (InfiniBand, RoCE) and low-latency networking
  • Strong background in systems architecture, compute, and hardware reliability
  • Proficiency in scripting and automation (Python, Go)

Bonus

  • Experience with AI/ML workload optimization and performance modeling
  • Familiarity with large-scale data center design and cooling/power strategies
  • Exposure to orchestration systems (Kubernetes, Slurm) or telemetry frameworks

Benefits & conditions

  • Bonus scheme
  • Company shares
  • Flexible remote working

Salary

  • Up to €200,000 gross per year

About the company

We are partnered with a fast-growing global technology organisation specialising in full-stack cloud infrastructure designed for the artificial intelligence era. Headquartered in Amsterdam and listed on Nasdaq, it builds and operates cutting-edge AI cloud platforms and large-scale GPU-powered data centres that enable developers, researchers and enterprises to train, deploy and scale AI workloads with unmatched performance and reliability. With a presence across Europe, North America and Israel, the business combines deep technical expertise with a mission to democratise access to advanced AI infrastructure, supporting innovation across sectors from life sciences to media and beyond.
