
AI Engineering

Garbage In, Garbage Out: Engineering Reliable AI Document Extraction Pipelines

with Nazeer Saeed

Friday 10 July 16:20 – 16:50 Stage 8 - powered by Red Hat

About This Session

Your OCR problem isn’t really an OCR problem. It’s everything that happens before and after it. Most AI pipelines don’t fail because the model is bad. They fail because we send it garbage. We throw blurry mobile scans, crooked receipts, and massive blocks of unstructured text at an LLM and wonder why it’s expensive, inconsistent, and hallucinatory.

In this talk, you’ll see how to design an end-to-end document extraction pipeline that:

  • Validates and improves image quality at capture time
  • Extracts structured, high-context data instead of dumping raw text
  • Sends lean, intentional payloads to LLMs so they’re cheaper and more predictable
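The three pipeline stages described above can be illustrated with a minimal Python sketch. It is not the speaker's implementation. The capture check uses the variance of the Laplacian, a common blur heuristic swapped in here for illustration, and the field extraction uses simple regular expressions as a stand-in for real layout-aware extraction. All names (`passes_capture_check`, `extract_fields`, `build_llm_payload`), the receipt field patterns, and `BLUR_THRESHOLD` are hypothetical.

```python
import json
import re
from typing import Optional

# Hypothetical threshold: a real value depends on camera, resolution and document type.
BLUR_THRESHOLD = 100.0


def laplacian_variance(gray: list[list[int]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Few sharp edges give a low variance, a common heuristic for blur.
    """
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def passes_capture_check(gray: list[list[int]], threshold: float = BLUR_THRESHOLD) -> bool:
    """Stage 1: reject a capture before OCR if it looks too blurry."""
    return laplacian_variance(gray) >= threshold


# Hypothetical receipt fields; a real pipeline would use layout-aware extraction.
FIELD_PATTERNS = {
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "total": re.compile(r"\bTOTAL\b[^\d\n]*(\d+\.\d{2})", re.IGNORECASE),
}


def extract_fields(ocr_text: str) -> dict[str, Optional[str]]:
    """Stage 2: keep only the fields we need instead of the raw OCR dump."""
    fields: dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


def build_llm_payload(fields: dict[str, Optional[str]]) -> str:
    """Stage 3: send a small, explicit JSON payload to the LLM."""
    return json.dumps({
        "task": "Validate these receipt fields and normalise the date.",
        "fields": {k: v for k, v in fields.items() if v is not None},
        "missing": [k for k, v in fields.items() if v is None],
    })


if __name__ == "__main__":
    sharp = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]
    flat = [[128] * 6 for _ in range(6)]
    print(passes_capture_check(sharp), passes_capture_check(flat))
    fields = extract_fields("ACME Market\n2024-03-15\nSubtotal 38.00\nTOTAL: $42.50")
    print(build_llm_payload(fields))
```

The design point is that each stage shrinks or filters what reaches the model: a blurry capture is rejected before any OCR or LLM cost is incurred, and the model receives a few named fields plus a list of what is missing rather than the full block of recognised text.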

Topics

  • Data Pipelines
  • Generative AI (GenAI)
  • Large Language Models (LLMs)
  • System Design
  • Workflow Automation