Cloud & AI Infrastructure

Accelerating AI Inference at Scale: A Deep Dive Into NVIDIA Dynamo on Kubernetes

with Anshul Jindal & Mohak Chadha

Thursday 9 July 15:00 – 17:00 Room M2 (40 Seats)

About This Session

As foundation models move toward deeper test-time computation, inference becomes the dominant scaling constraint. Latency, throughput, and cost are governed by a small set of forces: autoregressive decoding, KV-cache growth, memory bandwidth, and scheduling under contention. This workshop frames large-scale inference through these emerging laws of inference, starting from first principles and building toward real systems. Participants deploy NVIDIA Dynamo on Kubernetes and operate both aggregated and disaggregated inference architectures on a 4xA100 node, using built-in KV-aware routing and scheduling, then compare the performance of the two. The outcome is a principled understanding of where inference time and money go, and how architectural choices bend those curves in production.
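To make the KV-cache and memory-bandwidth forces named above concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the workshop material: the function names and the example model shape (32 layers, 8 KV heads, head dimension 128, FP16) are hypothetical, and the decode estimate assumes a purely bandwidth-bound step in which all weights and the full KV cache are read once per generated token.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV-cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing one key tensor and one
    value tensor per layer. bytes_per_elem=2 corresponds to FP16/BF16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def decode_step_floor_seconds(weight_bytes, kv_bytes, bandwidth_bytes_per_s):
    """Lower bound on one decode step if it is purely memory-bandwidth bound.

    Assumption: every weight byte and every KV-cache byte is read once per
    generated token; compute and communication time are ignored.
    """
    return (weight_bytes + kv_bytes) / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    per_token = kv_cache_bytes(32, 8, 128, seq_len=1)
    per_sequence = kv_cache_bytes(32, 8, 128, seq_len=4096)
    print(f"KV cache per token:    {per_token / 1024:.0f} KiB")
    print(f"KV cache at 4096 toks: {per_sequence / 1024**2:.0f} MiB")

    # Illustrative figures: 16 GB of weights, 2 TB/s of memory bandwidth.
    floor = decode_step_floor_seconds(16e9, per_sequence, 2e12)
    print(f"Decode step floor:     {floor * 1e3:.2f} ms")
```

The cache grows linearly with both sequence length and batch size, which is why long contexts and high concurrency compete for the same GPU memory and bandwidth during decoding.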

Topics

AI Models
Agentic AI
Distributed Systems
Docker
Infrastructure
NVIDIA

Speakers

Anshul Jindal

Senior Solution Architect · NVIDIA

Anshul is a Senior Solution Architect on NVIDIA's DGX Cloud team, where he specializes in helping customers deploy their workloads at scale. He has a strong background in SRE and extensive experience managing production-grade Kubernetes clusters across various Cloud Service Providers (CSPs). He received his Ph.D. in computer science from TU Munich, graduating summa cum laude.

Mohak Chadha

Solution Architect · NVIDIA

Mohak Chadha is a Solutions Architect at NVIDIA, based in Munich, where he focuses on designing solutions for enterprise AI applications. Before that, he was at Firebolt, a unicorn startup, where he managed its cloud infrastructure and worked on improving security. With a background in distributed systems, Mohak specializes in cloud computing, parallel computing, and high-performance computing. He shares his expertise by presenting at major developer conferences, including KubeCon + CloudNativeCon and the Open Source Summit.