Truth, Lies, and Probabilities: Testing AI Hallucinations

About This Session

Last year, while preparing a talk, I was looking for a research paper I had read in the past but could no longer find, so I decided to ask AI for help. Instead of the right paper, it generated an impressive list of academic citations. They looked convincing, but when I checked, most of them didn’t exist. Although I already knew about hallucinations, the mathematician in me immediately wanted to understand why this happens systematically, not just occasionally, and that led me into an intensive investigation of how these models operate.

AI models like LLMs don’t “think” like humans; they generate outputs based on probabilities, producing a statistically likely sequence of words. This means they can sound confident while being completely wrong. These moments, known as hallucinations, are inherent to how generative AI works, and if left undetected they can result in false information being delivered with absolute confidence. Unlike traditional software, where a defect is a deviation from expected results, hallucinations are an expected outcome of the model’s design. And that makes me think: how do we as testers detect, measure, and manage risks that are basically built into the system itself?

In this session, we’ll explore hallucinations from an intuitive, mathematical perspective, without heavy formulas, so anyone can understand why they occur. Then we’ll look at practical methods for evaluating AI outputs, since conventional testing approaches don’t apply here. You’ll understand how to test for AI hallucinations, calculate the confidence and risk of AI outputs, and explain findings effectively. Practical takeaways include testing on ground-truth data, using adversarial prompts, and verifying outputs through cross-validation with external sources. Although it’s mathematically not possible to avoid hallucinations completely, these methods allow you to estimate the rate of occurrence and reduce their impact.
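As a rough illustration of what estimating the rate of occurrence can look like, the sketch below takes a set of model outputs that have already been checked against ground-truth data and reports the observed hallucination rate with a Wilson score interval. This is not taken from the session itself: the function name `hallucination_rate_interval`, the boolean labeling scheme, and the sample counts are all hypothetical, and the Wilson interval is one common choice among several for a proportion.

```python
import math


def hallucination_rate_interval(labels, z=1.96):
    """Estimate a hallucination rate from labeled outputs.

    labels: iterable of bools, True if the output contradicted the
            ground truth (hypothetical labeling scheme).
    z:      normal quantile; 1.96 corresponds to roughly 95% confidence.

    Returns (observed_rate, lower_bound, upper_bound) using the
    Wilson score interval for a binomial proportion.
    """
    labels = list(labels)
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one labeled output")

    k = sum(labels)          # number of hallucinated outputs
    p = k / n                # observed rate

    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom

    return p, max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Made-up example: 12 hallucinations found in 100 checked outputs.
    sample = [True] * 12 + [False] * 88
    rate, low, high = hallucination_rate_interval(sample)
    print(f"observed rate {rate:.3f}, interval [{low:.3f}, {high:.3f}]")
```

Reporting the interval rather than the single observed rate makes it visible how much the estimate depends on the size of the test set: the same observed rate from a small sample gives a much wider range.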

Speaker

Anastasia Simou

Quality Engineering Specialist · Accenture

Anastasia holds a diploma in Mathematics and works as a Quality Engineering Specialist at Accenture, where she leads the Test Data Management practice. Drawing on that background and a passion for problem solving, she specializes in designing strategies that combine effective testing with data safety, security, and compliance. With a strong focus on data protection and a particular interest in AI security, she works on protecting sensitive data in test environments and integrating security into quality engineering practices. She is passionate about helping teams build software that is not only functional, but secure by design.