Nico Martin
From ML to LLM: On-device AI in the Browser
#1 about 2 minutes
Using machine learning to detect verbal filler words
A personal project to detect and count filler words in Swiss German speech highlights the limitations of standard speech-to-text APIs.
#2 about 2 minutes
Comparing TensorFlow.js backends for performance
TensorFlow.js performance depends on the chosen backend, with WebGPU offering significant speed improvements over CPU, WebAssembly, and WebGL.
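Since speed depends on the backend, an application typically tries the fastest one first and falls back when it is unavailable. A minimal sketch of that fallback logic follows. The preference order and the helper name `pickBackend` are assumptions for illustration (the talk only states that WebGPU is the fastest of the four); in a real page the chosen name would be passed to TensorFlow.js via `tf.setBackend(name)`.

```typescript
// Assumed preference order, fastest first. Only "WebGPU is fastest" comes
// from the talk summary; the order of the remaining three is a guess.
const PREFERRED_BACKENDS: string[] = ["webgpu", "webgl", "wasm", "cpu"];

// Hypothetical helper: returns the first preferred backend that the
// caller reports as available, or "cpu" if none of them is listed.
function pickBackend(available: string[]): string {
  for (const name of PREFERRED_BACKENDS) {
    if (available.indexOf(name) !== -1) {
      return name;
    }
  }
  return "cpu";
}

// In the browser, the result would then be handed to TensorFlow.js:
//   await tf.setBackend(pickBackend(detectedBackends));
//   await tf.ready();
// where `detectedBackends` is a hypothetical list built by feature detection.
```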
#3 about 2 minutes
Real-time face landmark detection with WebGPU
A live demo showcases how the WebGPU backend in TensorFlow.js achieves 30 frames per second for face detection, far outpacing CPU and WebGL.
#4 about 1 minute
Building a browser extension for gesture control
A Chrome extension uses a hand landmark detection model to enable website navigation and interaction through pinch gestures.
#5 about 2 minutes
Training a custom speech model with Teachable Machine
Teachable Machine provides a no-code interface to train a custom speech command model directly in the browser for recognizing specific words.
#6 about 2 minutes
The technical challenges of running LLMs in browsers
To run LLMs on-device, we must understand their internal workings, from tokenizers that convert text to numbers to the massive model weights.
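To make the "text to numbers" step concrete, here is a toy tokenizer sketch. It uses a tiny hand-written vocabulary and greedy longest-match lookup; both are assumptions for illustration. Real LLM tokenizers use learned vocabularies with tens of thousands of entries and more elaborate matching rules.

```typescript
// Toy vocabulary (hypothetical). A token's ID is its index in this array.
const tokens: string[] = ["<unk>", "on", "-", "device", " ", "AI", "in", "the", "browser"];

// Greedy longest-match encoding: at each position, take the longest
// vocabulary entry that matches; unknown characters map to "<unk>" (ID 0).
function encode(text: string): number[] {
  const ids: number[] = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    for (let len = text.length - pos; len > 0; len--) {
      const id = tokens.indexOf(text.slice(pos, pos + len));
      if (id !== -1) {
        ids.push(id);
        pos += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      ids.push(0);
      pos += 1;
    }
  }
  return ids;
}

// Decoding maps each ID back to its token string and concatenates them.
function decode(ids: number[]): string {
  return ids.map((id) => tokens[id]).join("");
}
```

With this vocabulary, `encode("on-device AI in the browser")` yields `[1, 2, 3, 4, 5, 4, 6, 4, 7, 4, 8]`, and `decode` turns that list back into the original string. The model itself only ever sees the numbers.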
#7 about 2 minutes
Reducing LLM size for browser use with quantization
Quantization is a key technique for reducing the file size of LLM weights by using lower-precision numbers, making them feasible for browser deployment.
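The idea can be sketched with symmetric 8-bit quantization: store each weight as a small integer plus one shared scale factor. This is one simple scheme chosen for illustration, not necessarily the one used by the models in the talk; the function names are hypothetical.

```typescript
interface QuantizedTensor {
  scale: number;     // multiply an int8 value by this to approximate the original weight
  values: Int8Array; // one byte per weight instead of four
}

// Symmetric quantization: map the range [-maxAbs, maxAbs] onto [-127, 127].
function quantizeInt8(weights: Float32Array): QuantizedTensor {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    const a = Math.abs(weights[i]);
    if (a > maxAbs) maxAbs = a;
  }
  const scale = maxAbs > 0 ? maxAbs / 127 : 1;
  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    values[i] = Math.round(weights[i] / scale);
  }
  return { scale: scale, values: values };
}

// Recover approximate float weights; the rounding error is at most scale / 2.
function dequantize(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < q.values.length; i++) {
    out[i] = q.values[i] * q.scale;
  }
  return out;
}

// Approximate weight size in gigabytes (10^9 bytes) for a given bit width.
function weightSizeGB(paramCount: number, bitsPerWeight: number): number {
  return (paramCount * bitsPerWeight) / 8 / 1e9;
}
```

For a hypothetical model with 2 billion parameters, `weightSizeGB(2e9, 32)` gives 8 and `weightSizeGB(2e9, 4)` gives 1, which is the difference between an impractical download and a feasible one. The price is a small loss of precision in every weight.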
#8 about 2 minutes
Running on-device models with the WebLLM library
The WebLLM library, powered by Apache TVM, simplifies the process of loading and running quantized LLMs directly within a web application.
#9 about 2 minutes
A live demo of on-device text generation
A markdown editor demonstrates fast, local text generation using the Gemma 2B model, with all processing happening in the browser without cloud requests.
#10 about 1 minute
Mitigating LLM hallucinations with RAG
Retrieval-Augmented Generation (RAG) improves LLM accuracy by providing relevant source documents alongside the user's prompt to ground the response in facts.
#11 about 3 minutes
Building an on-device RAG solution for PDFs
A demo application shows how to implement a fully client-side RAG system that processes a PDF and uses vector embeddings to answer questions.
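The retrieval step of such a system can be sketched as a cosine-similarity search over chunk embeddings. The sketch below assumes the embeddings have already been computed by some embedding model (not shown); the `Chunk` type and the `topK` helper are hypothetical names, and the demo in the talk may rank chunks differently.

```typescript
interface Chunk {
  text: string;        // a passage extracted from the PDF
  embedding: number[]; // its vector, assumed to come from an embedding model
}

// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topK(queryEmbedding: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk: chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The selected chunks are then placed in the prompt next to the user's question, so the model answers from the document rather than from memory alone.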
#12 about 1 minute
Forcing an LLM to admit when it doesn't know
By instructing the model to only use the provided context, a RAG system can reliably respond that it doesn't know the answer if it's not in the source document.
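One way to express that instruction is in the prompt template itself. The sketch below is an assumption about how such a prompt might look; the exact wording used in the talk is not given in this summary, and `buildGroundedPrompt` and `FALLBACK_ANSWER` are hypothetical names.

```typescript
// The fixed reply the model is asked to give when the context has no answer.
const FALLBACK_ANSWER = "I don't know.";

// Build a prompt that restricts the model to the retrieved context chunks.
function buildGroundedPrompt(question: string, contextChunks: string[]): string {
  const context = contextChunks
    .map((chunk, i) => "[" + (i + 1) + "] " + chunk)
    .join("\n");
  return [
    "Answer the question using only the context below.",
    "If the answer is not in the context, reply exactly: " + FALLBACK_ANSWER,
    "",
    "Context:",
    context,
    "",
    "Question: " + question,
  ].join("\n");
}
```

Asking for an exact fallback string also lets the application detect the "not found" case with a simple string comparison instead of trying to interpret free-form text.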
#13 about 2 minutes
The future of on-device AI hardware and APIs
The performance of on-device AI is heavily hardware-dependent, but future improvements in chips (NPUs) and browser APIs like WebNN will broaden access.
#14 about 2 minutes
Key benefits of running AI in the browser
Browser-based AI offers significant advantages, including privacy by default, zero installation, high interactivity, and infinite scalability, since users provide the compute.


