Context is Money: Slashing RAG Production Costs with Foundry IQ

About This Session

Building RAG prototypes is easy, but pushing data retrieval to production at scale gets expensive fast. Between messy document layouts flooding prompts with structural noise and unoptimized queries running redundant search loops, token consumption quickly spirals out of control. This session explores how to build a lean, centralized knowledge plane using Azure AI Foundry IQ. We will look at how to combine layout-aware ingestion (turning messy data into dense, high-fidelity Markdown) with the open-standard Model Context Protocol (MCP) to serve your applications from a single, optimized data endpoint. Through live architectural breakdowns and practical demos, you will see how to leverage server-side semantic caching, enforce identity-based security boundaries, and monitor real-time token telemetry. Walk away with a scalable blueprint to slash your system’s token overhead by up to 50% while maximizing retrieval performance.

Speaker

Kiran Panchal

AI Solution Engineer · Microsoft

Kiran is an AI & Apps Solution Engineer at Microsoft, working with enterprise customers, especially in the automotive industry in EMEA, to design and scale AI-driven applications. She focuses on translating complex business challenges into practical, production-ready solutions using Microsoft Foundry, AI Services, frameworks, and modern agent architectures. With deep hands-on experience across architecture design, workshops, and real-world deployments, she operates at the intersection of business and engineering, bridging the gap between strategy and execution without losing pace on either side. When she isn’t debugging runtime telemetry or configuring Foundry Agents, Kiran channels her focus into martial arts, music, and canvas art. She tackles rogue bugs and heavy enterprise scale with the same grit, curiosity, and zero-sugarcoating mindset she used to navigate moving across the globe from India to Germany.