His journey started with detecting filler words in his native dialect. It ended with running a private LLM in the browser to answer questions about any PDF.
#1 about 2 minutes
Using machine learning to detect verbal filler words
A personal project to detect and count filler words in Swiss German speech highlights the limitations of standard speech-to-text APIs.
#2 about 2 minutes
Comparing TensorFlow.js backends for performance
TensorFlow.js performance depends on the chosen backend, with WebGPU offering significant speed improvements over CPU, WebAssembly, and WebGL.
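A minimal sketch of how an app might fall back from the fastest backend to slower ones. The helper name `pickFastestBackend` and the preference order after WebGPU are assumptions for illustration. The `setBackend` parameter has the same shape as `tf.setBackend` (a backend name in, a promise of a boolean out), so the sketch runs without TensorFlow.js installed.

```typescript
// Shape of a backend switcher. TensorFlow.js's tf.setBackend(name) has this
// signature; it is injected here so the sketch stays self-contained.
type SetBackend = (name: string) => Promise<boolean>;

// Try backends in order of expected speed and return the first that works.
// The default order after "webgpu" is an assumption, not a measured ranking.
async function pickFastestBackend(
  setBackend: SetBackend,
  preference: readonly string[] = ["webgpu", "webgl", "wasm", "cpu"],
): Promise<string> {
  for (const name of preference) {
    try {
      if (await setBackend(name)) {
        return name;
      }
    } catch {
      // The backend is not registered or failed to initialize: try the next.
    }
  }
  throw new Error("No usable backend found");
}
```

In a real page this would presumably be called as `pickFastestBackend(tf.setBackend)` after importing the backend packages the app wants to offer.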
#3 about 2 minutes
Real-time face landmark detection with WebGPU
A live demo showcases how the WebGPU backend in TensorFlow.js achieves 30 frames per second for face detection, far outpacing CPU and WebGL.
#4 about 1 minute
Building a browser extension for gesture control
A Chrome extension uses a hand landmark detection model to enable website navigation and interaction through pinch gestures.
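One way a pinch could be detected from hand landmarks is to measure the distance between the thumb tip and the index fingertip. The sketch below assumes a 21-point hand layout with normalized coordinates, in which index 4 is the thumb tip and index 8 is the index fingertip (the MediaPipe convention). The talk does not confirm which model the extension uses, and the `threshold` value is a hypothetical starting point that would need tuning.

```typescript
// One hand keypoint in normalized image coordinates (assumed format).
interface Landmark {
  x: number;
  y: number;
  z: number;
}

// Indices in the assumed 21-point hand layout.
const THUMB_TIP = 4;
const INDEX_FINGER_TIP = 8;

// Returns true when thumb tip and index fingertip are closer than `threshold`.
function isPinching(landmarks: readonly Landmark[], threshold = 0.05): boolean {
  const thumb = landmarks[THUMB_TIP];
  const index = landmarks[INDEX_FINGER_TIP];
  if (!thumb || !index) {
    return false; // No complete hand detected in this frame.
  }
  const distance = Math.hypot(thumb.x - index.x, thumb.y - index.y, thumb.z - index.z);
  return distance < threshold;
}
```

An extension could call `isPinching` once per video frame and trigger a click or scroll when the result changes from false to true.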
#5 about 2 minutes
Training a custom speech model with Teachable Machine
Teachable Machine provides a no-code interface to train a custom speech command model directly in the browser for recognizing specific words.
#6 about 2 minutes
The technical challenges of running LLMs in browsers
To run LLMs on-device, we must understand their internal workings, from tokenizers that convert text to numbers to the massive model weights.
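To make the "text to numbers" step concrete, here is a deliberately simplified tokenizer. It is a toy: production LLM tokenizers use learned subword vocabularies, whereas this sketch assigns an ID to each new word or punctuation mark as it appears. The class name `ToyTokenizer` is made up for illustration.

```typescript
// Toy word-level tokenizer: maps each distinct word or punctuation mark to an
// integer ID. Real LLM tokenizers use fixed, learned subword vocabularies.
class ToyTokenizer {
  private readonly idByToken = new Map<string, number>();
  private readonly tokenById: string[] = [];

  // Convert text into a list of token IDs, growing the vocabulary on the fly.
  encode(text: string): number[] {
    const pieces: string[] = text.toLowerCase().match(/[a-z0-9]+|[^\sa-z0-9]/g) ?? [];
    return pieces.map((piece) => {
      let id = this.idByToken.get(piece);
      if (id === undefined) {
        id = this.tokenById.length;
        this.idByToken.set(piece, id);
        this.tokenById.push(piece);
      }
      return id;
    });
  }

  // Convert token IDs back into space-separated text.
  decode(ids: readonly number[]): string {
    return ids.map((id) => this.tokenById[id] ?? "<unk>").join(" ");
  }
}
```

The model itself only ever sees the ID sequence. Its output is another ID sequence that is decoded back into text.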
#7 about 2 minutes
Reducing LLM size for browser use with quantization
Quantization is a key technique for reducing the file size of LLM weights by using lower-precision numbers, making them feasible for browser deployment.
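The size argument is simple arithmetic: a model with 2 billion parameters stored as 32-bit floats needs about 8 GB, while 4 bits per weight brings that down to about 1 GB. The sketch below shows the idea with symmetric 8-bit quantization of a single tensor. This is one common scheme chosen for illustration, not necessarily the one used by the models in the talk, and the function names are made up.

```typescript
// Storage needed for a model's weights at a given precision, in bytes.
function weightBytes(parameterCount: number, bitsPerWeight: number): number {
  return (parameterCount * bitsPerWeight) / 8;
}

interface QuantizedTensor {
  values: Int8Array; // One byte per weight instead of four.
  scale: number; // Multiply a stored value by this to approximate the original.
}

// Symmetric per-tensor int8 quantization: map [-maxAbs, maxAbs] onto [-127, 127].
function quantizeInt8(weights: Float32Array): QuantizedTensor {
  const maxAbs = weights.reduce((max, w) => Math.max(max, Math.abs(w)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { values, scale };
}

// Recover approximate float weights; the rounding error is at most scale / 2.
function dequantize(tensor: QuantizedTensor): Float32Array {
  return Float32Array.from(tensor.values, (v) => v * tensor.scale);
}
```

The trade-off is precision: each restored weight is only an approximation of the original, which is why more aggressive quantization can affect output quality.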
#8 about 2 minutes
Running on-device models with the WebLLM library
The WebLLM library, powered by Apache TVM, simplifies the process of loading and running quantized LLMs directly within a web application.
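Once a model is loaded, application code talks to it through a chat-style request. The sketch below assumes an engine that exposes an OpenAI-style `chat.completions.create` method, which is the style of interface WebLLM offers. The `ChatEngine` interface here is a narrowed, hypothetical slice written for illustration, not the library's own type, and `generateLocally` is a made-up helper name.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical minimal slice of an OpenAI-style chat engine. An engine created
// with WebLLM is expected to fit this shape, possibly with a type cast.
interface ChatEngine {
  chat: {
    completions: {
      create(request: { messages: ChatMessage[] }): Promise<{
        choices: { message: { content: string | null } }[];
      }>;
    };
  };
}

// Send one prompt to the on-device model and return the generated text.
async function generateLocally(engine: ChatEngine, userPrompt: string): Promise<string> {
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a concise writing assistant." },
      { role: "user", content: userPrompt },
    ],
  });
  return reply.choices[0]?.message.content ?? "";
}
```

Because the engine is passed in as a parameter, the same function can be exercised with a stub in tests and with a real on-device engine in the browser.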
#9 about 2 minutes
A live demo of on-device text generation
A markdown editor demonstrates fast, local text generation using the Gemma 2B model, with all processing happening in the browser without cloud requests.
#10 about 1 minute
Mitigating LLM hallucinations with RAG
Retrieval-Augmented Generation (RAG) improves LLM accuracy by providing relevant source documents alongside the user's prompt to ground the response in facts.
#11 about 3 minutes
Building an on-device RAG solution for PDFs
A demo application shows how to implement a fully client-side RAG system that processes a PDF and uses vector embeddings to answer questions.
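The retrieval half of such a system can be sketched in a few functions: split the extracted PDF text into chunks, embed each chunk as a vector, and return the chunks whose vectors are most similar to the question's vector. Everything below is an illustrative assumption rather than the demo's actual code. In particular, the `Embedder` is injected and synchronous to keep the sketch short, whereas a real embedding model in the browser would be asynchronous.

```typescript
// Turns text into a vector. In a real app this would wrap an embedding model.
type Embedder = (text: string) => number[];

interface IndexedChunk {
  text: string;
  vector: number[];
}

// Split extracted document text into chunks of at most `maxWords` words.
function splitIntoChunks(text: string, maxWords = 50): string[] {
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += maxWords) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
  }
  return chunks;
}

// Embed every chunk once, up front, so questions only need one extra embedding.
function buildIndex(chunks: readonly string[], embed: Embedder): IndexedChunk[] {
  return chunks.map((text) => ({ text, vector: embed(text) }));
}

// Cosine similarity: 1 for the same direction, 0 for unrelated vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const length = Math.min(a.length, b.length);
  for (let i = 0; i < length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the `k` chunks most similar to the question, best match first.
function retrieve(
  question: string,
  index: readonly IndexedChunk[],
  embed: Embedder,
  k = 3,
): IndexedChunk[] {
  const queryVector = embed(question);
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The retrieved chunks are then placed into the prompt next to the user's question, so the model answers from the document rather than from memory.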
#12 about 1 minute
Forcing an LLM to admit when it doesn't know
By instructing the model to only use the provided context, a RAG system can reliably respond that it doesn't know the answer if it's not in the source document.
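A prompt of this kind might be assembled as follows. The exact wording, the `NO_ANSWER` fallback sentence, and the function name are assumptions for illustration. The talk describes the technique but the summary does not quote the prompt it used.

```typescript
// Fixed sentence the model is asked to return when the context has no answer.
const NO_ANSWER = "I don't know based on the provided document.";

// Build a prompt that restricts the model to the retrieved context chunks.
function buildGroundedPrompt(question: string, contextChunks: readonly string[]): string {
  const context = contextChunks.map((chunk, i) => `[${i + 1}] ${chunk}`).join("\n");
  return [
    "Answer the question using only the context below.",
    `If the context does not contain the answer, reply exactly: "${NO_ANSWER}"`,
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
    "Answer:",
  ].join("\n");
}
```

Using one fixed fallback sentence also lets the application detect the "no answer" case with a simple string comparison and show its own message instead.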
#13 about 2 minutes
The future of on-device AI hardware and APIs
The performance of on-device AI is heavily hardware-dependent, but future improvements in chips (NPUs) and browser APIs like WebNN will broaden access.
#14 about 2 minutes
Key benefits of running AI in the browser
Browser-based AI offers privacy by default, zero installation, high interactivity, and scaling without server-side inference costs, since users provide the compute.