Anshul Jindal & Martin Piercy
Your Next AI Needs 10,000 GPUs. Now What?
#1 · about 2 minutes
Introduction to large-scale AI infrastructure challenges
An overview of the topics to be covered, from the progress of generative AI to the compute requirements for training and inference.
#2 · about 4 minutes
Understanding the fundamental shift to generative AI
Generative AI creates novel content, moving beyond prediction to unlock new use cases in coding, content creation, and customer experience.
#3 · about 6 minutes
Using NVIDIA NIMs and blueprints to deploy models
NVIDIA Inference Microservices (NIMs) and blueprints provide pre-packaged, optimized containers to quickly deploy models for tasks like retrieval-augmented generation (RAG).
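As a concrete illustration: NIM containers expose an OpenAI-compatible HTTP endpoint, so a deployed model can be queried with standard client code. A minimal sketch, assuming a NIM already running locally on port 8000; the model name (meta/llama3-8b-instruct) is illustrative and depends on which container you pull.

```python
# Minimal sketch: query a local NIM container through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # the NIM's OpenAI-compatible endpoint
    api_key="not-used",                   # local containers typically ignore the key
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",      # illustrative; match it to your NIM
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```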
#4 · about 4 minutes
An overview of the AI model development lifecycle
Building a production-ready model involves a multi-stage process including data curation, distributed training, alignment, optimized inference, and implementing guardrails.
#5 · about 6 minutes
Understanding parallelism techniques for distributed AI training
Training massive models requires splitting them across thousands of GPUs using tensor, pipeline, and data parallelism to manage compute and communication.
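To make the data-parallel axis concrete, here is a minimal PyTorch DistributedDataParallel sketch; tensor and pipeline parallelism are usually layered on top via frameworks such as Megatron-LM or DeepSpeed, and the toy model and hyperparameters below are illustrative only.

```python
# Minimal data-parallel training step with PyTorch DDP.
# Launch with: torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")      # NCCL carries the GPU-to-GPU traffic
local_rank = int(os.environ["LOCAL_RANK"])   # set per process by torchrun
torch.cuda.set_device(local_rank)

# A toy stand-in for a real transformer; each rank holds a full model copy.
model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 4096, device=local_rank)  # each rank trains on its own data shard
loss = model(x).square().mean()
loss.backward()                               # gradients are all-reduced across ranks here
opt.step()
dist.destroy_process_group()
```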
#6 · about 2 minutes
The scale of GPU compute for training and inference
Training large models like Llama requires millions of GPU hours, while inference for a single large model can demand a full multi-GPU server.
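A back-of-envelope way to sanity-check such figures is the common compute ≈ 6 × parameters × tokens approximation. All numbers below are assumptions for illustration, not figures from the talk.

```python
# Rough training-cost estimate via FLOPs ≈ 6 * params * tokens.
params = 70e9   # assume a 70B-parameter model
tokens = 2e12   # assume 2T training tokens
peak = 989e12   # H100 BF16 dense peak, ~989 TFLOP/s
mfu = 0.4       # assume 40% model-FLOPs utilization

gpu_hours = (6 * params * tokens) / (peak * mfu) / 3600
print(f"~{gpu_hours:,.0f} GPU-hours")  # on the order of 600,000 GPU-hours
```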
#7 · about 3 minutes
Key hardware and network design for AI infrastructure
Effective multi-node training depends on high-speed interconnects like NVLink and network architectures designed to minimize communication latency between GPUs.
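One quick way to probe this on a single node is to ask which GPU pairs support direct peer-to-peer access (e.g., over NVLink); a minimal sketch follows, though `nvidia-smi topo -m` gives the fuller topology picture.

```python
# Probe which GPU pairs on this node can access each other directly
# (peer-to-peer, e.g. over NVLink) instead of staging through host memory.
import torch

n = torch.cuda.device_count()
for src in range(n):
    for dst in range(n):
        if src != dst:
            ok = torch.cuda.can_device_access_peer(src, dst)
            print(f"GPU {src} -> GPU {dst}: {'direct P2P' if ok else 'via host'}")
```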
#8 · about 3 minutes
Accessing global GPU capacity with DGX Cloud Lepton
NVIDIA's DGX Cloud Lepton is a marketplace connecting developers to a global network of cloud partners for scalable, on-demand GPU compute.
Matching moments
00:53 · The rise of general-purpose GPU computing (Accelerating Python on GPUs)
20:32 · Accessing software, models, and training resources (Accelerating Python on GPUs)
00:48 · The evolution of GPUs from graphics to AI computing (Accelerating Python on GPUs)
18:20 · NVIDIA's platform for the end-to-end AI workflow (Trends, Challenges and Best Practices for AI at the Edge)
28:04 · Building a cloud architecture for large-scale ML (Geometric deep learning for drug discovery)
07:36 · Highlighting impactful contributions and the rise of open models (Open Source: The Engine of Innovation in the Digital Age)
01:11 · How GPUs evolved from graphics to AI powerhouses (Accelerating Python on GPUs)
24:21 · The future of computing requires scaling out to data centers (Coffee with Developers - Stephen Jones - NVIDIA)
Related Videos
WWC24 - Unlocking the Future: Breakthrough Application Performance and Capabilities with NVIDIA
Ankit Patel
A Deep Dive on How To Leverage the NVIDIA GB200 for Ultra-Fast Training and Inference on Kubernetes
Kevin Klues
Efficient deployment and inference of GPU-accelerated LLMs
Adolf Hohl
Unveiling the Magic: Scaling Large Language Models to Serve Millions
Patrick Koss
How AI Models Get Smarter
Ankit Patel
AI Factories at Scale
Thomas Schmidt
Exploring LLMs across clouds
Tomislav Tipurić
Generative AI power on the web: making web apps smarter with WebGPU and WebNN
Christian Liebel
From learning to earning
Jobs that call for the skills explored in this talk.

Senior AI Software Developer & Mentor
Dynatrace · Linz, Austria · Senior
Skills: Java, TypeScript, AI Frameworks, Agile Methodologies

Senior Software Architect - Deep Learning and HPC Communications
NVIDIA Corporation · Remote · Senior
Skills: C++, Linux, Node.js, PyTorch, +1

Cloud & Reliability Engineer - Focus: Machine Learning & Generative AI
Barmer
Skills: Bash, Docker, Grafana, Terraform, Prometheus, +4