Workshop

Agents That Own Their Inference: Building Production AI Agents on Dedicated GPUs

with Duan Lightfoot

AI Models
Agents
Generative AI (GenAI)
Infrastructure
Large Language Models (LLMs)
Llama
LLMOps
Ollama
Small Language Models (SLMs)

Free for All Attendees · Seats Limited

Workshops are included with your event ticket at no extra cost. Seats fill up fast. Registration opens in the official event app approximately one week before the event; follow app notifications to know the moment sign-ups go live.

Starts

Fri 10 Jul, 14:45

Ends

Fri 10 Jul, 16:45

About This Workshop

Most production agents today rent their intelligence. You pay per token, send your customers' data to someone else's servers, and hope the provider doesn't rate-limit you during your launch. For most teams, that's fine. But for a growing number of teams in regulated industries, or with high-volume products, latency-sensitive workloads, or rising token bills, it's starting to look like a liability.

In this 120-minute hands-on workshop, you'll get a dedicated GPU and build an agent that runs on infrastructure you control. You'll stand up vLLM, point your agent at it, and drive concurrent load through the stack until you can see batching, KV cache pressure, and throughput limits in the metrics. Then you'll optimize the deployment to improve throughput while keeping per-request latency in line.

The focus isn't agent frameworks; it's the inference layer underneath them. You'll leave with working code and a practical understanding of continuous batching under real concurrency, KV cache tradeoffs, vLLM's metrics, and the bottlenecks that only show up when you operate the inference server yourself.
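To give a feel for the "drive concurrent load and read the metrics" step, here is a minimal Python sketch. It is not the workshop's code. The helper names (`parse_vllm_metrics`, `run_load`, `p95`) are hypothetical, the `send` callable is a stand-in for whatever HTTP client you use, and the parser assumes Prometheus text-format output without trailing timestamps.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def parse_vllm_metrics(text):
    """Parse Prometheus text-format output, keeping only `vllm:` series."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: the sample value is the last space-separated field
        # (no trailing timestamp on the line).
        series, _, value = line.rpartition(" ")
        if series.startswith("vllm:"):
            metrics[series] = float(value)
    return metrics


def run_load(send, prompts, concurrency):
    """Call send(prompt) from `concurrency` threads.

    Returns (per-request latencies in seconds, requests per second).
    """
    def timed(prompt):
        start = time.perf_counter()
        send(prompt)
        return time.perf_counter() - start

    began = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, prompts))
    elapsed = time.perf_counter() - began
    return latencies, len(prompts) / elapsed


def p95(latencies):
    """95th-percentile latency; needs at least two samples."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return quantiles(latencies, n=20)[-1]


if __name__ == "__main__":
    # Hypothetical stand-in for a real request to the inference server.
    def fake_send(prompt):
        time.sleep(0.01)

    latencies, rps = run_load(fake_send, ["hello"] * 32, concurrency=8)
    print(f"requests/s: {rps:.1f}, p95 latency: {p95(latencies):.4f}s")

    # Illustrative sample text, not captured from a real server.
    sample = (
        "# TYPE vllm:num_requests_running gauge\n"
        'vllm:num_requests_running{model_name="demo"} 3.0\n'
        'vllm:num_requests_waiting{model_name="demo"} 5.0\n'
    )
    print(parse_vllm_metrics(sample))
```

In a real run, `send` would POST to the server's OpenAI-compatible `/v1/chat/completions` route, and the metrics text would come from a GET on `/metrics`. Raising `concurrency` while watching the running and waiting request gauges is one way to see where batching stops helping and queueing begins.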

Your Speaker

Duan Lightfoot

Senior Developer Advocate · Akamai

Du’An Lightfoot is a Sr. AI Engineer at Akamai, where he specializes in advancing agentic AI systems and helping organizations use AI to solve infrastructure challenges and increase developer productivity. Drawing on his background as a USAF veteran and his technical experience at Cisco and Cerner, Du’An brings a distinct perspective on implementing AI solutions in enterprise environments. He has presented at major industry events including AWS re:Invent, WeAreDevelopers, Code With Claude, and the AI Engineer World's Fair, focusing on practical AI applications that change how technical teams work. Through his hands-on experience with agentic AI technologies and his dedication to mentoring professionals through AI transformation, Du’An offers actionable insights and concrete strategies for developers to adopt AI-powered workflows and become productivity champions within their organizations.

More to Explore

More Workshops

More hands-on sessions waiting — find the one that fits your stack.

Prompt Engineering Hands-on

Teaching Kubernetes Security in your Cluster

The Decisions Developers Make Without Noticing – And How to Make Them Better

Rust on Robots: Hands-on Embedded Rust on STM32

From COBOL to Java: How Developers Transition 60 Years of Legacy into Modern Java Services

Agents of Football: Build AI Agents That Compete in Live Football Matches

What do we need to deliver high quality products?

Deep Dive: Mastering Agents, MCP and other hypes

From messy addresses to production-ready data: Build a location enrichment pipeline on AWS with HERE

Beyond the Thor’s Hammer: Pragmatic Agentic AI with Caching, Reuse, and Cost Guardrails

Engineering Customer Journey Analytics at Scale: Lessons from Germany's Largest Banking Platform

Trust code you didn't write: From code review to confidence

Getting started with Hexagonal Architecture

Build an agentic full-stack tabletop game master application

MCP is all you need to make an AI-agent consume your RESTful API

Build a Production-Ready AI Agent in 90 Minutes

The Art and Science behind evaluating AI Agents at scale

How to Pitch Innovation to Your CEO

How does a Java agent work? Building a Java agent from scratch.

Adopting GitOps for microservices delivery via Argo CD

Build a data-intensive dashboard (that actually works)

Ideate & Strategize: Defining Your Football for Good Prototype

Zero to Binary: Building a Production-Ready AI Agent in Go

From Vector Search to Better Understanding: How Hybrid RAG Improves Answers, Not Just Matches

Build a Multi-Agent Marathon Planner with ADK and A2A

Teaching AI to Code in Every Language with NVIDIA NeMo

Defending the Modern Supply Chain: Hands-On Vulnerability Remediation

Hands-on AMQP with LavinMQ: Decoupling Services with Message Queues

You shall (not) pass!? An Introduction to Testing authentication

Building Modern Distributed Systems using Less AI Tokens

Building the Agentic Economy with x402 and Stablecoins

Building a Better Tomorrow: Tips and Tricks for Docker Builds

HR Workshop - Vibe:athon: Where HR Goes to Build 1/4

GitHub Copilot: From Zero to Hero

Building AI apps with the Google ecosystem

Accelerating AI Inference at Scale: A Deep Dive Into NVIDIA Dynamo on Kubernetes

The Bright Data Build-Off | AI writes the UI. Bright Data brings the data. Win on what you build

Building & scaling custom serverless AI

The best SDLC is the one you build yourself: Why orchestration changes everything

HR Workshop - Vibe:athon: Where HR Goes to Build 2/4

Level Up Your Automation with GitHub Agentic Workflows

Developing Crash-Proof Java Applications

Faster Together: Train and Deploy a Speculative Decoding Model for Low-Latency LLM Inference

Managing Sovereign AI Infrastructure: MLOps and LLMOps in Highly Regulated Banking IT

Spec-Driven Development with Agentic Skills

Ducks, Sensors & Agents: Hands-On Edge AI with Arduino UNO Q

Shall we play a Game? LLM Security in Practice

HR Workshop - The Human Advantage: Storytelling in the Age of AI

AI That Acts: Orchestrating Agents in Modern Developer Workflows (securely)

Agents at Scale: Multi-Agent Architecture with A2A Protocol on Agent Runtime and ADK Integration

Always-On Autonomous AI Agents: Exploring the OpenClaw Abstraction

Vibe³ Cross-Platform Apps with Lynx: A TikTok Hackathon

Exploring Server Side Rendering

From Hallucination to Justification: Hands-On Explainability for LLMs

Bridging LLMs and Systems: Practical Automation with MCP Tools and Function Calls

HR Workshop - Vibe:athon: Where HR Goes to Build 3/4

Teaching GitHub Copilot COBOL: A Practical Guide to Agentic AI Legacy Modernization

Vibe Coding with Postgres: From Zero to Prod in Your IDE

Compress, Cut, and Distill: The Latest Gen AI Model Compression Techniques in Practice

How to mess up JWT's - a practitioner's guide

Create Your Own Role-Playing Game with Agentic AI using Strands Agents

Build Agents That Can Pay with x402: From Your Laptop to a Live Network

Building the Agentic Economy with x402 and Stablecoins

GenAI in Testing: Using GitHub Copilot to Accelerate Quality Without Losing Trust

HR Workshop - The Human Advantage: Storytelling in the Age of AI

Building Multi-Agent AI Systems with MAF: From Copilot to Orchestrated Agents

Generate Synthetic Data for Physical AI with NVIDIA Cosmos World Foundation Models

Hack Me, Bro: An Antifragile AI Battle Arena

Build a Multi-Channel AI Agent

Never say refactoring is impossible

HR Workshop - Vibe:athon: Where HR Goes to Build 4/4

Context > Models: How to make your agents truly intelligent