Garbage In, Garbage Out: Engineering Reliable AI Document Extraction Pipelines

About This Session

Your OCR problem isn’t really an OCR problem. It’s everything that happens before and after it. Most AI pipelines don’t fail because the model is bad. They fail because we send it garbage. We throw blurry mobile scans, crooked receipts, and massive blocks of unstructured text at an LLM and wonder why it’s expensive, inconsistent, and hallucinatory.

In this talk, you’ll see how to design an end-to-end document extraction pipeline that:

- Validates and improves image quality at capture time
- Extracts structured, high-context data instead of dumping raw text
- Sends lean, intentional payloads to LLMs so they’re cheaper and more predictable

Speaker

Nazeer Saeed

Staff Solutions Engineer · Apryse

With over a decade of experience in solution engineering, consulting, and digital transformation, I help organizations design scalable technology solutions that solve complex business challenges. At Apryse, I work with enterprises and development teams on document processing, OCR, SDK integrations, workflow automation, and AI-driven extraction pipelines. My focus is on building reliable, secure, and efficient systems that turn unstructured documents into actionable data. My background spans R&D in AI, knowledge management, and Industry 4.0 across Europe. As a speaker, I share practical engineering patterns that help developers build smarter, more predictable document AI solutions.