Cloud & AI Infrastructure

Accelerating AI Inference at Scale: A Deep Dive Into NVIDIA Dynamo on Kubernetes

with Anshul Jindal & Mohak Chadha

Thursday 9 July 15:00 – 17:00 Room M2 (40 Seats)

About This Session

As foundation models move toward deeper test-time computation, inference becomes the dominant scaling constraint. Latency, throughput, and cost are governed by a small set of forces: autoregressive decoding, KV-cache growth, memory bandwidth, and scheduling under contention. This workshop frames large-scale inference through these emerging laws of inference, starting from first principles and building toward real systems. Participants deploy NVIDIA Dynamo on Kubernetes on a 4xA100 node, run both aggregated and disaggregated inference architectures with Dynamo's built-in KV-aware routing and scheduling, and compare the performance of the two. The outcome is a principled understanding of where inference time and money go, and of how architectural choices bend those curves in production.
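
To make the KV-cache and memory-bandwidth forces concrete, here is a minimal back-of-the-envelope sketch. It is not workshop material and does not use any Dynamo API. The model shape (32 layers, 8 KV heads, head dimension 128, fp16), the batch and context sizes, the 16 GB weight footprint, and the 2 TB/s bandwidth figure are all hypothetical values chosen for illustration. The latency function assumes decoding is purely memory-bandwidth bound, so it gives only a rough lower bound.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    """KV-cache bytes stored for one token of one sequence."""
    # Factor 2: each layer keeps one key and one value vector per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(per_token: int, context_len: int, batch_size: int) -> int:
    """Total KV-cache bytes; grows linearly in context length and batch size."""
    return per_token * context_len * batch_size


def decode_step_lower_bound_s(weight_bytes: float, kv_bytes: float,
                              bandwidth_bytes_per_s: float) -> float:
    """Rough lower bound on one decode step, assuming every step must
    stream all weights and the whole KV cache from memory once."""
    return (weight_bytes + kv_bytes) / bandwidth_bytes_per_s


# Hypothetical model and workload (assumptions, not from the session page).
PER_TOKEN = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8,
                                     head_dim=128, bytes_per_elem=2)
TOTAL = kv_cache_bytes(PER_TOKEN, context_len=8192, batch_size=16)

# Hypothetical hardware figures: 16 GB of weights, 2 TB/s memory bandwidth.
STEP_S = decode_step_lower_bound_s(weight_bytes=16e9, kv_bytes=TOTAL,
                                   bandwidth_bytes_per_s=2e12)

print(f"KV cache per token: {PER_TOKEN / 1024:.0f} KiB")
print(f"KV cache total:     {TOTAL / 2**30:.0f} GiB")
print(f"Decode step bound:  {STEP_S * 1e3:.2f} ms")
```

Under these assumptions the cache for the batch is about as large as the weights themselves. That is one way to see why cache placement and routing become first-order concerns once context lengths and batch sizes grow.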

Topics

  • AI Models
  • Agentic AI
  • Distributed Systems
  • Docker
  • Infrastructure
  • NVIDIA