Tobias Münch

Is the web ready for voice user interfaces?

The Web Speech API is not ready for production. Its accuracy is like a coin flip and it has critical privacy flaws.

#1 about 3 minutes

Why voice user interfaces are important for accessibility

Voice interfaces can significantly improve web accessibility for users with disabilities and provide hands-free convenience for mobile professionals.

#2 about 1 minute

Understanding the Web Speech API's core functions

The Web Speech API is a W3C specification with two parts: speech recognition, which converts voice to text, and speech synthesis, which converts text to voice.
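
The synthesis half is the simpler of the two, and a minimal sketch may help make the split concrete. The helper name `speak`, its `globalScope` parameter (normally `window`), and the default language are assumptions made for illustration; `speechSynthesis` and `SpeechSynthesisUtterance` are the interfaces browsers expose for text-to-speech.

```javascript
// Minimal sketch of the synthesis half (text to voice).
// `globalScope` is normally `window`; it is a parameter here (an assumption
// of this sketch) so the helper can also be exercised outside a browser.
function speak(globalScope, text, lang = "en-US") {
  const synth = globalScope.speechSynthesis;
  const Utterance = globalScope.SpeechSynthesisUtterance;

  // Feature-detect first: not every environment provides speech synthesis.
  if (!synth || !Utterance) {
    return false;
  }

  const utterance = new Utterance(text);
  utterance.lang = lang; // BCP 47 language tag, for example "en-US"
  synth.speak(utterance); // queues the utterance for playback
  return true;
}

// In a browser: speak(window, "Hello from the Web Speech API");
```

The recognition half follows the same feature-detection pattern but is event-driven, because transcripts arrive asynchronously.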

#3 about 2 minutes

Reviewing VUI research and its current limitations

Research projects like the Conversational Web and a wheelchair VUI demonstrate potential but suffer from inconsistent accuracy, online-only functionality, and a lack of wake-word support.

#4 about 3 minutes

How to implement the Web Speech API in JavaScript

Learn the step-by-step process of implementing speech recognition, including loading the class, configuring grammar with JSGF, starting the listener, and processing the results.
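
The four steps named above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code shown in the talk: the helper name `createRecognizer`, the `globalScope` parameter (normally `window`), the `onTranscript` callback, and the example command grammar are all hypothetical. Grammar support also differs between browsers, and a grammar may simply be ignored.

```javascript
// Sketch of the four steps: load the class, configure a JSGF grammar,
// start the listener, process the results.
// `globalScope` is normally `window`; passing it in is an assumption of
// this sketch that keeps the helper testable outside a browser.
function createRecognizer(globalScope, onTranscript) {
  // Step 1: load the class. Chromium-based browsers use a webkit prefix.
  const Recognition =
    globalScope.SpeechRecognition || globalScope.webkitSpeechRecognition;
  if (!Recognition) {
    return null; // speech recognition is not available here
  }

  const recognizer = new Recognition();
  recognizer.lang = "en-US";
  recognizer.interimResults = false; // deliver final results only
  recognizer.maxAlternatives = 1;

  // Step 2: configure a JSGF grammar where the browser offers a grammar list.
  // The command words are made up for this example.
  const GrammarList =
    globalScope.SpeechGrammarList || globalScope.webkitSpeechGrammarList;
  if (GrammarList) {
    const grammars = new GrammarList();
    grammars.addFromString(
      "#JSGF V1.0; grammar commands; public <command> = open | close | next | back ;",
      1 // weight of this grammar relative to others
    );
    recognizer.grammars = grammars;
  }

  // Step 4: process results. Take the top alternative of the newest result.
  recognizer.onresult = (event) => {
    const best = event.results[event.resultIndex][0];
    onTranscript(best.transcript, best.confidence);
  };
  recognizer.onerror = (event) => {
    console.error("Speech recognition error:", event.error);
  };

  return recognizer;
}

// Step 3, in a browser: start the listener, typically from a click handler.
// const recognizer = createRecognizer(window, (text, confidence) => {
//   console.log(text, confidence);
// });
// if (recognizer) recognizer.start();
```

Returning `null` when the class is missing keeps the caller in charge of the fallback, for example hiding the microphone button.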

#5 about 2 minutes

Navigating the Web Speech API's result data structure

The API returns a nested data structure containing a list of results, each with alternatives that include the text transcript and a confidence score.
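
To make the nesting concrete, here is a small sketch that walks the structure: a list of results, each holding alternatives with a `transcript` and a `confidence`. The helper names `flattenResults` and `bestFinalTranscript` and the 0.5 threshold are assumptions made for illustration. Both levels are array-like (a `length` plus numeric indexes), so plain index loops work on real browser objects as well as on test data.

```javascript
// Flatten the nested result list into plain records, one per alternative.
function flattenResults(results) {
  const rows = [];
  for (let i = 0; i < results.length; i++) {
    const result = results[i]; // one recognized phrase
    for (let j = 0; j < result.length; j++) {
      const alternative = result[j]; // one candidate transcription
      rows.push({
        resultIndex: i,
        alternativeIndex: j,
        isFinal: Boolean(result.isFinal),
        transcript: alternative.transcript,
        confidence: alternative.confidence,
      });
    }
  }
  return rows;
}

// Pick the most confident alternative among final results, or null when
// nothing reaches the threshold. The 0.5 default is an arbitrary example.
function bestFinalTranscript(results, minConfidence = 0.5) {
  let best = null;
  for (const row of flattenResults(results)) {
    if (!row.isFinal || row.confidence < minConfidence) {
      continue;
    }
    if (best === null || row.confidence > best.confidence) {
      best = row;
    }
  }
  return best === null ? null : best.transcript;
}

// In a browser:
// recognizer.onresult = (event) => console.log(bestFinalTranscript(event.results));
```

Checking the confidence score before acting on a transcript is one way to cope with the inconsistent accuracy discussed in this talk.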

#6 about 3 minutes

Key challenges limiting Web Speech API adoption

The API's adoption is hindered by significant issues including poor developer experience, privacy risks from cloud processing, no offline support, and inconsistent browser implementations.

#7 about 3 minutes

A look inside the browser's implementation of speech recognition

An analysis of the Chromium source code reveals how the Web Speech API is implemented through layers that manage and dispatch recognition tasks to either remote cloud services or local OS-dependent engines.

#8 about 5 minutes

The future of VUIs with Stanford's React Genie

Stanford's React Genie project offers a new paradigm by loosely coupling a voice agent with React state, allowing for complex voice commands that can manipulate off-screen content and application logic.

#9 about 1 minute

Final verdict on the web's readiness for voice UIs

The current Web Speech API is suitable for experimentation but not reliable enough for production use. Promising research, however, points to a more capable future for web-based voice interfaces.
