Workshop

Accelerating AI Inference at Scale: A Deep Dive Into NVIDIA Dynamo on Kubernetes

with Anshul Jindal & Mohak Chadha

  • AI Models
  • Agentic AI
  • Distributed Systems
  • Docker
  • Infrastructure
  • NVIDIA

Free for All Attendees · Seats Limited

Workshops are included with your event ticket at no extra cost. Seats fill up fast. Registration opens through the official event app approximately one week before the event; follow app notifications to know the moment sign-ups go live.

Starts

Thu 9 Jul, 13:00

Ends

Thu 9 Jul, 15:00

About This Workshop

As foundation models move toward deeper test-time computation, inference becomes the dominant scaling constraint. Latency, throughput, and cost are governed by a small set of forces: autoregressive decoding, KV-cache growth, memory bandwidth, and scheduling under contention. This workshop frames large-scale inference through these emerging laws of inference, starting from first principles and building toward real systems. Participants deploy NVIDIA Dynamo on Kubernetes on a 4xA100 node, run both aggregated and disaggregated inference architectures with built-in KV-aware routing and scheduling, and compare the performance of the two. The outcome is a principled understanding of where inference time and money go, and how architectural choices bend those curves in production.
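To give a feel for two of the forces named above (KV-cache growth and memory bandwidth), here is a minimal back-of-the-envelope sketch. It is not workshop material and does not use any Dynamo API. The function names, the model shape (32 layers, 8 KV heads, head dimension 128), the 16 GB weight footprint, and the 2 TB/s bandwidth figure are all illustrative assumptions, and the bandwidth bound ignores compute, batching effects, and kernel overheads.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV-cache size for a decoder-only transformer.

    The leading factor of 2 counts one key tensor and one value tensor
    per layer. bytes_per_elem=2 assumes FP16/BF16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def decode_step_floor_seconds(weight_bytes, kv_bytes, mem_bandwidth_bytes_per_s):
    """Rough lower bound on one decode step when it is memory-bound.

    Assumes every weight and every cached key/value is read once per
    generated token, and that nothing else limits the step.
    """
    return (weight_bytes + kv_bytes) / mem_bandwidth_bytes_per_s


# Hypothetical model shape, chosen only for illustration.
kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)
print(f"KV cache per sequence: {kv / 2**20:.0f} MiB")  # 512 MiB

# Illustrative numbers: 16 GB of weights, 2 TB/s of memory bandwidth.
floor_s = decode_step_floor_seconds(16e9, kv, 2e12)
print(f"Memory-bound floor per decoded token: {floor_s * 1e3:.2f} ms")
```

Under these assumptions the cache grows linearly with both sequence length and batch size, which is one reason prefill and decode stress hardware differently and why separating them (disaggregated serving) can change the cost curve.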
