Workshop

Accelerating AI Inference at Scale: A Deep Dive Into NVIDIA Dynamo on Kubernetes

with Anshul Jindal & Mohak Chadha

AI Models
Agentic AI
Distributed Systems
Docker
Infrastructure
NVIDIA

Free for All Attendees · Seats Limited

Workshops are included with your event ticket at no extra cost. Seats fill up fast: registration opens in the official event app approximately one week before the event. Enable app notifications to know the moment sign-ups go live.

Starts: Thu 9 Jul, 13:00

Ends: Thu 9 Jul, 15:00

About This Workshop

As foundation models move toward deeper test-time computation, inference becomes the dominant scaling constraint. Latency, throughput, and cost are governed by a small set of forces: autoregressive decoding, KV-cache growth, memory bandwidth, and scheduling under contention. This workshop frames large-scale inference through these emerging laws of inference, starting from first principles and building toward real systems. Participants deploy NVIDIA Dynamo on Kubernetes on a node with four A100 GPUs, operate both aggregated and disaggregated inference architectures using Dynamo's built-in KV-aware routing and scheduling, and compare the performance of the two. The outcome is a principled understanding of where inference time and money go, and how architectural choices bend those curves in production.
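To make "KV-cache growth" concrete, here is a minimal sketch that estimates KV-cache memory for a standard decoder-only transformer: each token in the context stores one key vector and one value vector per layer, per KV head. The model shape used below (32 layers, 8 KV heads, head dimension 128, FP16 storage) is a hypothetical illustration, not a figure from the workshop or from Dynamo.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing both a key and a value
    vector per token, per layer, per KV head. bytes_per_elem=2 assumes
    FP16/BF16 storage (an assumption; quantized caches use fewer bytes).
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    layers, kv_heads, head_dim = 32, 8, 128
    for seq_len in (1_024, 4_096, 32_768):
        for batch in (1, 16):
            size = kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch)
            print(f"seq_len={seq_len:>6} batch={batch:>2} -> {size / 2**30:.2f} GiB")
```

Cache size grows linearly in both sequence length and batch size: this hypothetical model needs 128 KiB per token, so 32,768 tokens across a batch of 16 already occupy 64 GiB for the cache alone. That is the kind of memory pressure the workshop explores.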

Your Speakers

Anshul Jindal

Senior Solution Architect · NVIDIA

Anshul is a Senior Solution Architect on NVIDIA's DGX Cloud team, where he specializes in helping customers deploy their workloads at scale. He has a strong background in SRE and extensive experience managing production-grade Kubernetes clusters across various Cloud Service Providers (CSPs). He received his Ph.D. in computer science from TU Munich, graduating summa cum laude.

Mohak Chadha

Solution Architect · NVIDIA

Mohak Chadha is a Solutions Architect at NVIDIA, based in Munich, where he focuses on designing solutions for enterprise AI applications. Before joining NVIDIA, he worked at Firebolt, a unicorn startup, where he managed its cloud infrastructure and worked on improving its security. With a background in distributed systems, Mohak specializes in cloud computing, parallel computing, and high-performance computing. He shares his expertise in distributed systems by presenting at major developer conferences, including KubeCon + CloudNativeCon and the Open Source Summit.

More to Explore

More Workshops

More hands-on sessions are waiting: find the one that fits your stack.

Prompt Engineering Hands-on

Teaching Kubernetes Security in your Cluster

The Decisions Developers Make Without Noticing – And How to Make Them Better

Rust on Robots: Hands-on Embedded Rust on STM32

From COBOL to Java: How Developers Transition 60 Years of Legacy into Modern Java Services

Agents of Football: Build AI Agents That Compete in Live Football Matches

What do we need to deliver high quality products?

Deep Dive: Mastering Agents, MCP and other hypes

From messy addresses to production-ready data: Build a location enrichment pipeline on AWS with HERE

Beyond the Thor’s Hammer: Pragmatic Agentic AI with Caching, Reuse, and Cost Guardrails

Engineering Customer Journey Analytics at Scale: Lessons from Germany's Largest Banking Platform

Trust code you didn't write: From code review to confidence

Getting started with Hexagonal Architecture

Build an agentic full-stack tabletop game master application

MCP is all you need to make an AI-agent consume your RESTful API

Build a Production-Ready AI Agent in 90 Minutes

The Art and Science behind evaluating AI Agents at scale

How to Pitch Innovation to Your CEO

How does a Java agent work? Building a Java agent from scratch.

Adopting GitOps for microservices delivery via Argo CD

Build a data-intensive dashboard (that actually works)

Ideate & Strategize: Defining Your Football for Good Prototype

Zero to Binary: Building a Production-Ready AI Agent in Go

From Vector Search to Better Understanding: How Hybrid RAG Improves Answers, Not Just Matches

Build a Multi-Agent Marathon Planner with ADK and A2A

Teaching AI to Code in Every Language with NVIDIA NeMo

Defending the Modern Supply Chain: Hands-On Vulnerability Remediation

Hands-on AMQP with LavinMQ: Decoupling Services with Message Queues

You shall (not) pass!? An Introduction to Testing authentication

Building Modern Distributed Systems using Less AI Tokens

Give your agent a wallet: Build an AI agent with stablecoin payment capabilities

Building a Better Tomorrow: Tips and Tricks for Docker Builds

HR Workshop - Vibe:athon: Where HR Goes to Build 1/4

GitHub Copilot: From Zero to Hero

Building AI apps with the Google ecosystem

The Bright Data Build-Off

Building & scaling custom serverless AI

The best SDLC is the one you build yourself: Why orchestration changes everything

HR Workshop - Vibe:athon: Where HR Goes to Build 2/4

Level Up Your Automation with GitHub Agentic Workflows

Developing Crash-Proof Java Applications

Faster Together: Train and Deploy a Speculative Decoding Model for Low-Latency LLM Inference

Managing Sovereign AI Infrastructure: MLOps and LLMOps in Highly Regulated Banking IT

Spec-Driven Development with Agentic Skills

Ducks, Sensors & Agents: Hands-On Edge AI with Arduino UNO Q

Shall we play a Game? LLM Security in Practice

HR Workshop - The Human Advantage: Storytelling in the Age of AI

AI That Acts: Orchestrating Agents in Modern Developer Workflows (securely)

Agents at Scale: Multi-Agent Architecture with A2A Protocol on Agent Runtime and ADK Integration

Always-On Autonomous AI Agents: Exploring the OpenClaw Abstraction

Vibe³ Cross-Platform Apps with Lynx: A TikTok Hackathon

Exploring Server Side Rendering

From Hallucination to Justification: Hands-On Explainability for LLMs

Bridging LLMs and Systems: Practical Automation with MCP Tools and Function Calls

HR Workshop - Vibe:athon: Where HR Goes to Build 3/4

Teaching GitHub Copilot COBOL: A Practical Guide to Agentic AI Legacy Modernization

Vibe Coding with Postgres: From Zero to Prod in Your IDE

Compress, Cut, and Distill: The Latest Gen AI Model Compression Techniques in Practice

How to mess up JWT's - a practitioner's guide

Create Your Own Role-Playing Game with Agentic AI using Strands Agents

Build Agents That Can Pay with x402: From Your Laptop to a Live Network

Let agents buy your API: Build a payment-gated service with x402

GenAI in Testing: Using GitHub Copilot to Accelerate Quality Without Losing Trust

HR Workshop - The Human Advantage: Storytelling in the Age of AI

Building Multi-Agent AI Systems with MAF: From Copilot to Orchestrated Agents

Agents That Own Their Inference: Building Production AI Agents on Dedicated GPUs

Generate Synthetic Data for Physical AI with NVIDIA Cosmos World Foundation Models

Hack Me, Bro: An Antifragile AI Battle Arena

Build a Multi-Channel AI Agent

Never say refactoring is impossible

HR Workshop - Vibe:athon: Where HR Goes to Build 4/4

Context > Models: How to make your agents truly intelligent