The Limits of Prompting: Architecting Trustworthy Coding Agents
Prompt engineering has its limits. Learn how a multi-agent architecture, enriched with deep context, boosted our AI agent's suggestion acceptance rate from 12% to over 60%.
#1 · about 2 minutes
Prototyping a basic AI code review agent
A simple prototype using a GitHub webhook and a single LLM call reveals the potential for understanding code semantics beyond static analysis.
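A prototype of this kind fits in a few lines of Python. The sketch below assumes GitHub's `pull_request` webhook payload (its `action` field and the `title` and `diff_url` fields of the `pull_request` object). The prompt text and the injected `fetch_diff` and `call_llm` callables are hypothetical stand-ins for an HTTP client and an LLM SDK, not the talk's implementation.

```python
from typing import Callable, Optional

# Hypothetical prompt template, not taken from the talk.
REVIEW_PROMPT = (
    "You are a code reviewer. List concrete problems (bugs, unsafe "
    "patterns, unclear logic) in this pull request.\n\n"
    "Title: {title}\n\nDiff:\n{diff}"
)


def handle_pull_request_event(
    payload: dict,
    fetch_diff: Callable[[str], str],
    call_llm: Callable[[str], str],
) -> Optional[str]:
    """Review one GitHub `pull_request` webhook event with a single LLM call."""
    # Only react when a PR is opened or receives new commits.
    if payload.get("action") not in ("opened", "synchronize"):
        return None
    pr = payload["pull_request"]
    diff = fetch_diff(pr["diff_url"])
    prompt = REVIEW_PROMPT.format(title=pr["title"], diff=diff)
    # The entire "agent" is this one call; its text becomes the review comment.
    return call_llm(prompt)


# Usage with stubbed network calls (hypothetical URL and responses).
event = {
    "action": "opened",
    "pull_request": {"title": "Add retry", "diff_url": "https://example.invalid/1.diff"},
}
review = handle_pull_request_event(
    event,
    fetch_diff=lambda url: "+ client.retry(times=-1)",
    call_llm=lambda prompt: "retry count is negative" if "times=-1" in prompt else "LGTM",
)
print(review)  # retry count is negative
```

Injecting the two callables keeps the handler testable offline; a production version would also verify the webhook signature before trusting the payload.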
#2 · about 2 minutes
Iteratively improving prompts to handle edge cases
Simple prompts fail to account for developer comments or the model's knowledge cutoff, so more detailed instructions are needed to improve accuracy.
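One way to encode both fixes is to add explicit instructions to the prompt. The sketch below is an assumption about what such instructions might look like: the wording, the `ReviewContext` fields, and the cutoff date passed in are hypothetical and should match whichever model is actually called.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ReviewContext:
    diff: str
    model_cutoff: date  # hypothetical: training cutoff of the model in use
    # Existing PR discussion, e.g. "intentional: keeps the old API working".
    developer_comments: list[str] = field(default_factory=list)


def build_prompt(ctx: ReviewContext) -> str:
    sections = [
        "You are reviewing a pull request diff.",
        # Edge case 1: don't re-flag something the author already justified.
        "Do not raise issues that the developer comments below already "
        "explain or justify.",
        # Edge case 2: the model may not know libraries or APIs released
        # after its training data ends.
        f"Your knowledge ends around {ctx.model_cutoff.isoformat()}. If the code "
        "uses a library version, API, or language feature you do not recognise, "
        "do not call it an error; at most ask the author to confirm it.",
    ]
    if ctx.developer_comments:
        sections.append(
            "Developer comments:\n" + "\n".join(f"- {c}" for c in ctx.developer_comments)
        )
    sections.append("Diff:\n" + ctx.diff)
    return "\n\n".join(sections)


prompt = build_prompt(ReviewContext(
    diff="+ except Exception: pass",
    model_cutoff=date(2024, 6, 1),
    developer_comments=["intentional: this cleanup step is best-effort"],
))
print(prompt)
```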
#3 · about 5 minutes
Establishing a robust benchmarking process for agents
A reliable benchmarking pipeline uses a large dataset, concurrent execution, and an LLM-as-a-judge (LLJ) to measure and track performance improvements.
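The concurrency and judging parts of such a pipeline can be sketched with `asyncio`. Here `agent` and `judge` are hypothetical async callables that would wrap LLM calls. The demo replaces the judge with a substring check purely so the example runs offline, and the two-case dataset is toy data, not the talk's benchmark.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Case:
    diff: str
    expected_issue: str  # issue a human reviewer labelled for this diff


Agent = Callable[[str], Awaitable[str]]
Judge = Callable[[str, str], Awaitable[bool]]


async def run_benchmark(cases: list[Case], agent: Agent, judge: Judge,
                        max_concurrency: int = 8) -> float:
    """Run the agent on every case concurrently; return the judge's pass rate."""
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(case: Case) -> bool:
        # Cap in-flight LLM calls to respect rate limits.
        async with sem:
            review = await agent(case.diff)
            # LLM-as-a-judge: a second model decides whether the review
            # identifies the expected issue, tolerating paraphrase.
            return await judge(review, case.expected_issue)

    results = await asyncio.gather(*(run_one(c) for c in cases))
    return sum(results) / len(results) if results else 0.0


# Offline stand-ins so the sketch runs without any API keys.
async def fake_agent(diff: str) -> str:
    return "possible None dereference" if "None" in diff else "looks fine"


async def fake_judge(review: str, expected: str) -> bool:
    return expected in review  # a real judge would prompt an LLM instead


cases = [
    Case("x = None\nx.strip()", "None dereference"),
    Case("total = 1", "unused variable"),
]
score = asyncio.run(run_benchmark(cases, fake_agent, fake_judge))
print(f"pass rate: {score:.0%}")  # pass rate: 50%
```

Re-running the same cases after each prompt or architecture change yields a comparable score to track over time.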
#4 · about 2 minutes
Decomposing large tasks into specialized agents
To combat inconsistency and hallucinations, a single large task like code review is broken down into smaller subtasks, each handled by a specialized agent.
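A minimal sketch of the decomposition, assuming each specialized agent is a callable with one narrow focus that returns structured findings. The `Finding` type, the priority-by-order merge rule, and the two rule-based agents are hypothetical stand-ins; in a real system each agent would wrap its own focused LLM prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    message: str
    agent: str


# A specialized agent reviews one concern only. Here each is a plain function;
# in practice each would wrap its own narrowly scoped LLM prompt.
SpecializedAgent = Callable[[str, str], list[Finding]]


def security_agent(file: str, code: str) -> list[Finding]:
    return [
        Finding(file, n, "SQL built with string formatting; use query parameters", "security")
        for n, text in enumerate(code.splitlines(), start=1)
        if "execute(" in text and "%" in text
    ]


def style_agent(file: str, code: str) -> list[Finding]:
    return [
        Finding(file, n, "Line exceeds 100 characters", "style")
        for n, text in enumerate(code.splitlines(), start=1)
        if len(text) > 100
    ]


def review(file: str, code: str, agents: list[SpecializedAgent]) -> list[Finding]:
    """Run every specialized agent and merge their findings."""
    findings = [f for agent in agents for f in agent(file, code)]
    # Hypothetical merge rule: one comment per line; agents earlier in the
    # list win (sorted() is stable, so their findings stay first).
    merged: list[Finding] = []
    seen: set[tuple[str, int]] = set()
    for f in sorted(findings, key=lambda x: (x.file, x.line)):
        if (f.file, f.line) not in seen:
            seen.add((f.file, f.line))
            merged.append(f)
    return merged


code = 'cur.execute("SELECT * FROM users WHERE id = %s" % user_id)\nok = True'
for f in review("app/db.py", code, [security_agent, style_agent]):
    print(f.line, f.agent, f.message)
```

Because each agent sees only one concern, its prompt stays short and its output is easier to check than one monolithic review.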
#5 · about 6 minutes
Leveraging codebase context for deeper insights
Moving beyond prompts, providing codebase context through vector-similarity retrieval (RAG) and module dependency graphs built from abstract syntax trees (ASTs) unlocks high-quality, human-like feedback.
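Both context sources can be sketched with the Python standard library: cosine similarity over precomputed embeddings for the retrieval step, and the `ast` module to build an import graph for the dependency step. The embedding vectors, the toy module names, and the choice to track only repository-local imports are assumptions for illustration.

```python
import ast
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """RAG step: the k code chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
    return ranked[:k]


def imported_modules(source: str) -> set[str]:
    """Dependency step: modules a Python file imports, read from its AST."""
    deps: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module)
    return deps


def dependency_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the repository-local modules it imports."""
    local = set(files)
    return {name: imported_modules(src) & local for name, src in files.items()}


def dependents(graph: dict[str, set[str]], module: str) -> set[str]:
    """Modules that import `module` directly, i.e. likely affected by a change."""
    return {name for name, deps in graph.items() if module in deps}


files = {
    "billing": "import db\nfrom utils import fmt\n",
    "db": "import os\n",
    "utils": "def fmt(x):\n    return str(x)\n",
}
graph = dependency_graph(files)
print(sorted(graph["billing"]))  # ['db', 'utils']
print(dependents(graph, "db"))  # {'billing'}

index = {"billing.charge": [0.9, 0.1], "db.connect": [0.1, 0.9], "utils.fmt": [0.7, 0.3]}
print(top_k_similar([1.0, 0.0], index, k=2))  # ['billing.charge', 'utils.fmt']
```

When a diff touches `db`, the reviewer's context could then include the similar chunks plus the code of `billing`, the module that depends on it.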
#6 · about 3 minutes
Introducing Awesome Reviewers for community standards
Awesome Reviewers is a collection of prompts derived from open-source projects that can be used to enforce team-specific coding standards.
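Reusing curated review rules can be sketched as selecting the prompts that apply to the files in a pull request and composing them into the agent's instructions. The `ReviewerPrompt` structure, the glob-based matching, and the example rules are hypothetical; they do not reflect the actual file format of Awesome Reviewers.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class ReviewerPrompt:
    name: str
    applies_to: list[str]  # glob patterns such as "*.py" (hypothetical field)
    instructions: str


def select_reviewers(changed_files: list[str], reviewers: list[ReviewerPrompt]) -> list[ReviewerPrompt]:
    """Keep only the reviewer prompts relevant to at least one changed file."""
    return [
        r for r in reviewers
        if any(fnmatch(path, pattern) for path in changed_files for pattern in r.applies_to)
    ]


def compose_instructions(base: str, selected: list[ReviewerPrompt]) -> str:
    """Append each selected team standard to the base review instructions."""
    return "\n\n".join([base] + [f"## {r.name}\n{r.instructions}" for r in selected])


reviewers = [
    ReviewerPrompt("Error handling", ["*.py"], "Flag bare `except:` clauses that hide failures."),
    ReviewerPrompt("Migrations", ["migrations/*.sql"], "Every migration needs a rollback."),
]
selected = select_reviewers(["src/payments/api.py"], reviewers)
print([r.name for r in selected])  # ['Error handling']
print(compose_instructions("Review this diff.", selected))
```

Selecting only the relevant rules keeps the prompt short, which matters when a team maintains many standards.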
#7 · about 1 minute
Key takeaways for building reliable LLM agents
The path to a reliable agent involves starting with a proof-of-concept, benchmarking rigorously, using prompt engineering for quick fixes, and investing in deep context.