Agents That Own Their Inference: Building Production AI Agents on Dedicated GPUs

About This Session

Every production agent today is renting its intelligence. You're paying per token, sending your customers' data to someone else's servers, and hoping the provider doesn't rate-limit you during your launch. For most teams, that's fine. But for a growing number of teams (those in regulated industries, or those with high-volume products, latency-sensitive workloads, or rising token bills), it's starting to look like a liability.

In this 120-minute hands-on workshop, you'll get a dedicated GPU and build an agent that runs on infrastructure you control. You'll stand up vLLM, point your agent at it, and drive concurrent load through the stack until batching, KV cache pressure, and throughput limits become visible in the metrics. Then you'll tune the deployment to raise throughput while keeping per-request latency within budget. The focus isn't agent frameworks; it's the inference layer underneath them. You'll leave with working code and a practical understanding of continuous batching under real concurrency, KV cache tradeoffs, vLLM's metrics, and the bottlenecks that only show up when you operate the inference server yourself.
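To make "KV cache pressure" concrete, here is a rough back-of-envelope sketch (not taken from the workshop materials). It estimates how many bytes of GPU memory the KV cache needs per token, and how many sequences fit in a given memory budget if every one of them reaches the same length at once. The model dimensions (a Llama-3-8B-like shape: 32 layers, 8 KV heads, head dimension 128, fp16), the 20 GiB budget, and the helper names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical model dimensions used only for this estimate."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # assumes an fp16/bf16 KV cache


def kv_bytes_per_token(m: ModelShape) -> int:
    # Every layer stores one key vector and one value vector (the factor of 2)
    # per KV head for each token in the sequence.
    return 2 * m.num_layers * m.num_kv_heads * m.head_dim * m.bytes_per_elem


def max_concurrent_sequences(m: ModelShape, kv_budget_bytes: int, tokens_per_seq: int) -> int:
    # Worst case: how many sequences fit if all of them hold tokens_per_seq tokens simultaneously.
    per_seq_bytes = kv_bytes_per_token(m) * tokens_per_seq
    return kv_budget_bytes // per_seq_bytes


if __name__ == "__main__":
    llama3_8b_like = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128)
    budget = 20 * 1024**3  # assumption: 20 GiB left for KV cache after the weights load
    print(kv_bytes_per_token(llama3_8b_like))  # 131072 bytes (128 KiB) per token
    print(max_concurrent_sequences(llama3_8b_like, budget, 8192))  # 20
```

Because vLLM allocates the KV cache in fixed-size blocks (PagedAttention) and most requests never reach their maximum length at the same moment, the number of requests running concurrently is usually higher than this worst-case figure. When the cache does fill up, the scheduler has to hold back or preempt requests, which is the pressure that shows up in the metrics. vLLM's `--gpu-memory-utilization` and `--max-model-len` options shift the two inputs to this estimate.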

Speaker

Du’An Lightfoot

Senior Developer Advocate · Akamai

Du’An Lightfoot is a Sr. AI Engineer at Akamai, where he specializes in advancing agentic AI systems and helping organizations use AI to solve infrastructure challenges and increase developer productivity. Drawing on his diverse background as a USAF veteran and his technical experience at Cisco and Cerner, Du’An brings a distinctive perspective on implementing AI solutions in enterprise environments. He has presented at major industry events, including AWS re:Invent, WeAreDevelopers, Code With Claude, and the AI Engineer World's Fair, focusing on practical AI applications that transform how technical teams work. Through his hands-on experience with agentic AI technologies and his dedication to mentoring professionals through AI transformation, Du’An provides actionable insights and concrete strategies for developers to embrace AI-powered workflows and become productivity champions within their organizations.