Nathaniel Okenwa
Performant Architecture for a Fast Gen AI User Experience
#1 about 2 minutes
Building a real-time translator inspired by sci-fi
The Babel fish from "Hitchhiker's Guide to the Galaxy" serves as the inspiration for a real-time audio translation project.
#2 about 4 minutes
Analyzing the latency of a basic AI architecture
A demonstration of the initial 2019 architecture, built on Google Cloud, shows a simple translation taking more than ten seconds.
#3 about 2 minutes
Reducing latency by upgrading the AI service stack
Switching to modern, specialized APIs such as Deepgram and ElevenLabs cuts the total processing time from twelve seconds to five.
#4 about 2 minutes
Implementing streaming to reduce response wait times
Adopting a streaming approach provides a major performance boost, but a naive implementation results in chaotic and low-quality audio output.
#5 about 2 minutes
Using chunking to balance streaming speed and quality
Chunking data based on sentence punctuation controls the streaming waterfall, improving the quality of generated audio without sacrificing speed.
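As a rough illustration of the chunking idea, the sketch below buffers streamed text fragments and releases them one complete sentence at a time, so a downstream text-to-speech step would receive whole sentences rather than arbitrary fragments. The function name `chunk_by_sentence` and the punctuation rule (`.`, `!`, or `?` followed by whitespace) are assumptions made for this example, not details taken from the talk.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_by_sentence(fragments: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from a stream of arbitrary text fragments."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last ends at a sentence boundary.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be mid-sentence, so keep buffering it.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


# Fragments arrive split at arbitrary points, as they might from a streaming API.
stream = ["Hel", "lo the", "re. How ", "are you", "? Fine", "."]
for sentence in chunk_by_sentence(stream):
    print(sentence)
# prints:
# Hello there.
# How are you?
# Fine.
```

Because the generator yields each sentence as soon as its boundary is seen, the first sentence can be handed on before the rest of the stream has arrived.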
#6 about 6 minutes
Eliminating network latency with local and edge models
Running a smaller, local AI model like Whisper on the edge eliminates cross-continental network latency and provides near-instantaneous results.
#7 about 3 minutes
Using caching to serve pre-generated AI responses
Implementing caching, from simple request matching to semantic search with vector databases, avoids redundant generation and speeds up common queries.
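The two ends of that caching range can be sketched together: an exact-match lookup first, then a similarity search over stored embeddings as a fallback. Everything named here is hypothetical and chosen for illustration: `ResponseCache`, the 0.8 threshold, and especially `toy_embed`, which counts words and merely stands in for a real embedding model. A production setup would likely use a vector database rather than the linear scan shown.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    # Hypothetical stand-in for a real embedding model: lowercase word counts.
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ResponseCache:
    def __init__(self, embed: Callable[[str], Vector] = toy_embed,
                 threshold: float = 0.8) -> None:
        self._embed = embed
        self._threshold = threshold  # assumed cut-off; tune for a real model
        self._exact: Dict[str, str] = {}
        self._entries: List[Tuple[Vector, str]] = []

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalise case and whitespace for the exact-match lookup.
        return " ".join(prompt.lower().split())

    def put(self, prompt: str, response: str) -> None:
        self._exact[self._key(prompt)] = response
        self._entries.append((self._embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        # 1. Cheap path: simple request matching.
        hit = self._exact.get(self._key(prompt))
        if hit is not None:
            return hit
        # 2. Fallback: nearest stored prompt by cosine similarity.
        query = self._embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self._threshold else None


cache = ResponseCache()
cache.put("where is the train station", "¿Dónde está la estación de tren?")
print(cache.get("Where is the  train station"))        # exact match after normalising
print(cache.get("please where is the train station"))  # semantic match
print(cache.get("good morning"))                       # prints None (cache miss)
```

On a miss the caller would generate the response as usual and store it with `put`, so later requests that are identical or close enough skip generation entirely.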
#8 about 2 minutes
Optimizing prompts and user experience for speed
Fine-tuning performance involves optimizing prompts to generate fewer tokens and improving perceived speed with clear loading states for the user.
#9 about 2 minutes
Summary of key performance optimization techniques
A final recap covers the essential strategies for building fast Gen AI experiences, including streaming, edge computing, caching, and prompt optimization.