Workshop

Faster Together: Train and Deploy a Speculative Decoding Model for Low-Latency LLM Inference

with Amit Kushwaha

  • AI Models
  • Tokenomics

Free for All Attendees · Seats Limited

Workshops are included with your event ticket at no extra cost. Seats fill up fast — registration opens through the official event app approximately one week before the event. Follow app notifications to know the moment sign-ups go live.

Starts

Thu 9 Jul, 15:30

Ends

Thu 9 Jul, 17:30

About This Workshop

Dive deep into the theory and practice of low-latency inference by deploying NVIDIA TensorRT-LLM with advanced speculative decoding techniques. You'll train an Eagle-3 draft head to propose candidate tokens efficiently, serve it, and benchmark it with AIPerf to quantify how much these strategies reduce latency.
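For readers new to the idea, here is a minimal sketch of the draft-and-verify loop behind speculative decoding, assuming greedy decoding and two toy next-token functions. It is not Eagle-3 itself and does not use the TensorRT-LLM or AIPerf APIs; every name in it (`speculative_decode`, `toy_target`, `toy_draft`, `greedy_decode`) is hypothetical and chosen for illustration only. A cheap draft model proposes `k` tokens, and the expensive target model checks them in what would be a single batched forward pass on real hardware.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_decode(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding sketch; returns (tokens, target_passes)."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The draft model proposes k tokens autoregressively (cheap).
        proposal: List[Token] = []
        ctx = list(tokens)
        for _ in range(k):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The target model verifies every position. A real server does
        #    this in one batched forward pass, so it is counted once here.
        target_passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target_next(tokens + proposal[:i])
            # Always keep the target's token, so the output matches what
            # plain greedy decoding with the target would produce.
            accepted.append(expected)
            if proposal[i] != expected:
                break  # first mismatch: discard the rest of the draft
        else:
            # All k drafts accepted: the same pass also yields a bonus token.
            accepted.append(target_next(tokens + proposal))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], target_passes


def greedy_decode(next_fn: NextTokenFn, prompt: List[Token], n: int) -> List[Token]:
    """Reference: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(next_fn(tokens))
    return tokens


# Toy models (assumptions for illustration): the target counts upward
# modulo 10, and the draft agrees except after the token 2.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


output, passes = speculative_decode(toy_target, toy_draft, [0], max_new_tokens=12)
print(output, passes)
```

In this toy run the output is identical to plain greedy decoding with the target, but the target is invoked far fewer times than once per token. That reduction in target passes is the source of the latency gain, and it depends on how often the draft's proposals are accepted.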
