Lee Boonstra
Raise your voice!
#1 about 1 minute
Building a custom voice AI with WebRTC and Google APIs
An overview of the architecture for streaming voice from a browser to a backend for processing with conversational AI.
#2 about 4 minutes
Comparing custom voice AI to public assistants
A custom voice AI provides more control over technical requirements and terms of service compared to public platforms like Google Assistant or Alexa.
#3 about 1 minute
Handling short versus long user utterances
Public assistants are optimized for short commands, whereas custom AI for use cases like contact centers must be designed to handle long, complex user stories.
#4 about 3 minutes
Demo of a voice-enabled self-service kiosk
A demonstration of a web-based airport kiosk that answers user questions spoken in different languages using a custom voice AI.
#5 about 1 minute
The core challenge of integrating voice technologies
The main difficulty in building a voice AI is not using individual APIs, but integrating the entire pipeline from frontend audio stream to backend processing.
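To make the integration point concrete, here is a minimal sketch of how the backend stages could be chained for one utterance. It is not the talk's actual code: the VoiceStages interface, the handleUtterance function, and all field names are hypothetical, and each stage is injected so the flow can be tried without any cloud credentials.

```typescript
// Hypothetical stage signatures; real implementations would wrap the
// Speech-to-Text, Dialogflow and Text-to-Speech clients.
interface VoiceStages {
  transcribe(audio: Uint8Array, languageCode: string): Promise<string>;
  detectIntent(text: string, languageCode: string): Promise<string>;
  synthesize(text: string, languageCode: string): Promise<Uint8Array>;
}

interface VoiceReply {
  transcript: string;
  replyText: string;
  replyAudio: Uint8Array;
}

// Runs one utterance through the pipeline: audio in, audio out.
async function handleUtterance(
  stages: VoiceStages,
  audio: Uint8Array,
  languageCode: string
): Promise<VoiceReply> {
  const transcript = await stages.transcribe(audio, languageCode);
  if (transcript.trim() === "") {
    // Nothing was recognized, so skip the intent and synthesis stages.
    return { transcript: "", replyText: "", replyAudio: new Uint8Array(0) };
  }
  const replyText = await stages.detectIntent(transcript, languageCode);
  const replyAudio = await stages.synthesize(replyText, languageCode);
  return { transcript, replyText, replyAudio };
}
```

Injecting the stages keeps the wiring testable on its own, which matters when the hard part is the plumbing rather than any single API call.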
#6 about 3 minutes
Capturing cross-browser microphone audio with RecordRTC
The RecordRTC library is used to abstract away browser inconsistencies and reliably capture microphone audio streams for processing.
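As a rough illustration, the recorder settings could be assembled as below. This is a sketch under assumptions: the option names (type, mimeType, desiredSampRate, numberOfAudioChannels, timeSlice, ondataavailable) follow RecordRTC's documented options as understood here and should be checked against the installed version, while buildRecorderOptions and the chosen values (16 kHz mono, 4-second slices) are hypothetical.

```typescript
// Subset of the options object passed to RecordRTC(stream, options).
interface AudioRecorderOptions {
  type: "audio";
  mimeType: string;
  desiredSampRate: number;      // target sample rate in Hz
  numberOfAudioChannels: 1 | 2; // 1 = mono
  timeSlice: number;            // milliseconds between ondataavailable calls
  ondataavailable: (chunk: unknown) => void;
}

// Builds recorder options for speech capture (hypothetical helper).
function buildRecorderOptions(
  onChunk: (chunk: unknown) => void,
  timeSliceMs: number = 4000
): AudioRecorderOptions {
  if (timeSliceMs <= 0) {
    throw new RangeError("timeSliceMs must be positive");
  }
  return {
    type: "audio",
    mimeType: "audio/wav",
    desiredSampRate: 16000,
    numberOfAudioChannels: 1,
    timeSlice: timeSliceMs,
    ondataavailable: onChunk,
  };
}

// In the browser (not executed here) the wiring would look roughly like:
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const recorder = RecordRTC(stream, {
//     ...buildRecorderOptions(sendChunk),
//     recorderType: RecordRTC.StereoAudioRecorder,
//   });
//   recorder.startRecording();
```

Mono audio at 16 kHz is a common choice for speech recognition input; the slice length trades latency against the number of messages sent.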
#7 about 2 minutes
Streaming audio to the backend with Socket.IO
Socket.IO and the socket.io-stream module enable real-time, bidirectional streaming of binary audio data from the browser to a Node.js backend.
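The sketch below shows the general idea of pushing audio to the server in pieces. Note that it swaps in plain binary emit calls instead of the socket.io-stream module, and that the BinaryEmitter interface, the streamAudio function, and the "audio-chunk" and "audio-end" event names are all hypothetical. Any object with a compatible emit method, such as a Socket.IO client socket, could be passed in.

```typescript
// The only part of a socket this sketch relies on.
interface BinaryEmitter {
  emit(event: string, payload: unknown): void;
}

// Sends the audio in fixed-size chunks, then a final end marker.
// Returns the number of audio chunks emitted.
function streamAudio(
  socket: BinaryEmitter,
  audio: Uint8Array,
  chunkSize: number = 4096
): number {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  let sent = 0;
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    // slice() copies the bytes, so each chunk is independent of the source.
    socket.emit("audio-chunk", audio.slice(offset, offset + chunkSize));
    sent++;
  }
  socket.emit("audio-end", { chunks: sent });
  return sent;
}
```

On the server, a matching handler would append each chunk to the recognition stream and close it when the end marker arrives.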
#8 about 3 minutes
Transcribing audio with the Speech-to-Text API
Google's Speech-to-Text API converts the incoming audio stream into text using a streaming recognition call that handles data as it arrives.
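A streaming recognition call is configured with a request object before any audio is written. The sketch below builds such a request; the field names follow the Speech-to-Text streaming configuration as understood here, while buildStreamingRequest and its defaults (16 kHz LINEAR16, interim results on) are assumptions for illustration.

```typescript
// Shape of the first message of a streaming recognition call.
interface StreamingRecognizeRequest {
  config: {
    encoding: "LINEAR16";
    sampleRateHertz: number;
    languageCode: string;
  };
  interimResults: boolean;  // emit partial transcripts while the user speaks
  singleUtterance: boolean; // stop after the first detected utterance
}

// Builds a streaming request for the given language (hypothetical helper).
function buildStreamingRequest(
  languageCode: string,
  singleUtterance: boolean = false
): StreamingRecognizeRequest {
  if (languageCode.trim() === "") {
    throw new Error("languageCode is required");
  }
  return {
    config: {
      encoding: "LINEAR16",
      sampleRateHertz: 16000,
      languageCode,
    },
    interimResults: true,
    singleUtterance,
  };
}

// With the Node.js client (not executed here) this would be used roughly as:
//   const recognizeStream = speechClient
//     .streamingRecognize(buildStreamingRequest("en-US"))
//     .on("data", (data) => handleTranscript(data.results[0]));
//   audioStream.pipe(recognizeStream);
```

Leaving singleUtterance off suits longer user stories, since the stream then keeps listening past the first pause.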
#9 about 4 minutes
Understanding user intent with Dialogflow
Dialogflow uses natural language understanding to match transcribed user text to predefined intents, entities, and knowledge bases to determine the user's goal.
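The following sketch shows what a text query to Dialogflow and a reading of its answer might look like. It assumes Dialogflow ES conventions (a session path of the form projects/.../agent/sessions/..., and a query result with intent.displayName and fulfillmentText); the helper names and the QueryResultLike type are hypothetical.

```typescript
// Request shape for a text-based detect intent call.
interface DetectIntentRequest {
  session: string;
  queryInput: { text: { text: string; languageCode: string } };
}

// The parts of a query result this sketch reads.
interface QueryResultLike {
  fulfillmentText: string;
  intent?: { displayName: string };
}

// Builds the request for one user sentence (hypothetical helper).
function buildDetectIntentRequest(
  projectId: string,
  sessionId: string,
  text: string,
  languageCode: string = "en-US"
): DetectIntentRequest {
  return {
    session: `projects/${projectId}/agent/sessions/${sessionId}`,
    queryInput: { text: { text, languageCode } },
  };
}

// Reduces a query result to the matched intent name and the reply text.
function summarizeResult(result: QueryResultLike): { intent: string; reply: string } {
  return {
    // No intent object means nothing matched; label it explicitly.
    intent: result.intent ? result.intent.displayName : "(no match)",
    reply: result.fulfillmentText,
  };
}
```

Keeping one session ID per conversation lets Dialogflow carry context across consecutive questions from the same user.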
#10 about 4 minutes
Adding multi-language support with the Translate API
The Translate API enables multi-language support by translating non-English input into English for Dialogflow processing and then translating the response back into the user's language.
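The round trip described here can be sketched as a small wrapper around an English-only answering step. The TranslateFn and AnswerFn types and the answerInUserLanguage function are hypothetical; in practice translate would wrap the Translate API client and answerInEnglish would wrap the Dialogflow call.

```typescript
// Translates text into the target language (for example "en" or "nl").
type TranslateFn = (text: string, targetLanguage: string) => Promise<string>;

// Produces a reply for an English question.
type AnswerFn = (englishText: string) => Promise<string>;

// Translates the question to English if needed, gets the answer,
// and translates the answer back into the user's language.
async function answerInUserLanguage(
  userText: string,
  userLanguage: string,
  translate: TranslateFn,
  answerInEnglish: AnswerFn
): Promise<string> {
  const lang = userLanguage.toLowerCase();
  const isEnglish = lang === "en" || lang.startsWith("en-");
  const englishQuestion = isEnglish ? userText : await translate(userText, "en");
  const englishAnswer = await answerInEnglish(englishQuestion);
  // English speakers skip both translation calls entirely.
  return isEnglish ? englishAnswer : translate(englishAnswer, userLanguage);
}
```

With this shape the agent only needs to be trained in English, at the cost of two extra translation calls for every non-English exchange.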
#11 about 3 minutes
Generating audio responses with Text-to-Speech
The Text-to-Speech API synthesizes a natural-sounding voice from the text response, which is then sent back to the browser as an audio buffer to be played.
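As an illustration, a synthesis request could be built as follows. The field names (input, voice, audioConfig) follow the Text-to-Speech request format as understood here; buildSynthesizeRequest and its defaults (neutral voice, MP3 output) are assumptions rather than the talk's settings.

```typescript
// Request shape for a speech synthesis call.
interface SynthesizeRequest {
  input: { text: string };
  voice: { languageCode: string; ssmlGender: "NEUTRAL" | "FEMALE" | "MALE" };
  audioConfig: { audioEncoding: "MP3" | "LINEAR16" | "OGG_OPUS" };
}

// Builds a synthesis request for a reply text (hypothetical helper).
function buildSynthesizeRequest(
  text: string,
  languageCode: string,
  audioEncoding: "MP3" | "LINEAR16" | "OGG_OPUS" = "MP3"
): SynthesizeRequest {
  if (text.trim() === "") {
    throw new Error("cannot synthesize empty text");
  }
  return {
    input: { text },
    voice: { languageCode, ssmlGender: "NEUTRAL" },
    audioConfig: { audioEncoding },
  };
}

// With the Node.js client (not executed here) this would be used roughly as:
//   const [response] = await ttsClient.synthesizeSpeech(
//     buildSynthesizeRequest("Your gate is B4.", "en-US")
//   );
//   socket.emit("audio-reply", response.audioContent); // hypothetical event name
```

The returned audio bytes can then be sent over the same socket and decoded for playback in the browser.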
#12 about 1 minute
Deployment considerations and open source code
Browsers require HTTPS for microphone access, so a deployed voice application must be served over HTTPS; services like App Engine Flex make this easy to configure. The full project code is available on GitHub.
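A quick pre-flight check along these lines can explain a missing microphone prompt before anything is recorded. This is only an approximation of the browser's own rules: the isLikelySecureContext function is hypothetical, it covers just HTTPS and the usual local development hosts, and in a real page the browser's window.isSecureContext flag is the authoritative answer.

```typescript
// Approximates whether a page origin would be allowed to use the microphone.
// Takes the same values as location.protocol and location.hostname.
function isLikelySecureContext(protocol: string, hostname: string): boolean {
  if (protocol === "https:") {
    return true;
  }
  // Local development hosts are generally treated as secure even over HTTP.
  const host = hostname.toLowerCase();
  return host === "localhost" || host === "127.0.0.1" || host.endsWith(".localhost");
}
```

For example, a page served from http://localhost:8080 passes the check during development, while the same page on a plain HTTP production host does not.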
Related Videos
Creating bots with Dialogflow CX
Xavier Portilla Edo
Minimal infrastructure for Real‑Time Phone Agents: transcripts in, responses out
Chris Heilmann, Daniel Cranney, Marius Obert & Staff Developer Evangelist at Twilio
WeAreDevelopers LIVE – AI vs the Web & AI in Browsers
Chris Heilmann, Daniel Cranney & Raymond Camden
WeAreDevelopers LIVE – Real-Time Phone Agents, Unsafe VPNs & More
Chris Heilmann, Daniel Cranney & Marius Obert
OpenAI for FinTech: Building a Stock Market Advisor Chatbot
Akmal Chaudhri
From Syntax to Singularity: AI’s Impact on Developer Roles
Anna Fritsch-Weninger
Integrate your Cognitive Assistant with 3rd-party DBs and software
Felix Augenstein
From ML to LLM: On-device AI in the Browser
Nico Martin