Fabien Vauchelles
Cracking the Code: Decoding Anti-Bot Systems!
#1about 5 minutes
The fundamental challenge of web scraping as a turing test
Web scraping is fundamentally a Turing test where automated scripts must mimic natural human behavior to avoid detection by anti-bot systems.
#2about 10 minutes
How anti-bot systems analyze the browser stack for signals
Anti-bot systems analyze signals from the entire browser stack, including IP address, TCP/TLS/HTTP2 fingerprints, JavaScript execution, and user navigation patterns.
#3about 2 minutes
Exploiting the business need to minimize false positives
The necessity for websites to avoid blocking real customers (false positives) forces anti-bot systems to focus on a limited set of the most effective signals.
#4about 5 minutes
Tools and techniques to identify anti-bot systems
Use tools like Wappalyzer, browser dev tools, and proxy interceptors to identify the specific anti-bot protection and analyze its architecture and encrypted payloads.
#5about 7 minutes
A step-by-step methodology for building robust scrapers
Follow an incremental approach to bypass protections, starting with basic scraper tuning and progressively adding proxies, headless browsers, and unblocker APIs.
#6about 4 minutes
Designing a scalable architecture for data collection
Build a scalable scraping infrastructure using a central data store, an orchestrator, a proxy management layer, and a farm of diverse browsers.
#7about 7 minutes
Decoding common javascript obfuscation techniques
Anti-bot systems use JavaScript obfuscation techniques like string concealing, code flow confusion, and control flow flattening to make their code unreadable.
#8about 3 minutes
Identifying the five key signal types after deobfuscation
After deobfuscating the code, identify the five main types of signals collected: configuration details, automation flags, rendering fingerprints, reverse engineering checks, and integrity controls.
#9about 1 minute
The next frontier in anti-bot is javascript virtual machines
The next evolution in anti-bot technology involves JavaScript virtual machines that execute proprietary, undocumented bytecode, making reverse engineering significantly more difficult.
#10about 14 minutes
Answering questions on scraping legality, VPNs, and rate limits
The Q&A session addresses common questions about the legality of web scraping, the effectiveness of VPNs, managing rate limits, and the cat-and-mouse game with anti-bot providers.
Related jobs
Jobs that call for the skills explored in this talk.
Matching moments
04:01 MIN
Navigating the complexities of modern web scraping
How to scrape modern websites to feed AI agents
49:19 MIN
Defending systems with honeypots and tarpits
Honeypots and Tarpits, Benefits of Building your own Tools and more with Salma Alam-Naylor
17:41 MIN
Presenting live web scraping demos at a developer conference
Tech with Tim at WeAreDevelopers World Congress 2024
19:38 MIN
Solving scaling challenges in web data collection
Tech with Tim at WeAreDevelopers World Congress 2024
11:08 MIN
Overcoming blocking techniques and messy HTML
Scrape, Train, Predict: The Lifecycle of Data for AI Applications
21:28 MIN
How developers can prevent browser AI interference
Exploring the Future of Web AI with Google
27:19 MIN
Key takeaways on IDE and developer tool security
You click, you lose: a practical look at VSCode's security
36:58 MIN
Addressing security, privacy, and cross-browser concerns
Project Fugu: Progressive Web Apps, Superpowered
Featured Partners
Related Videos
The attacker's footprint
Antonio de Mello & Amine Abed
Getting under the skin: The Social Engineering techniques
Mauro Verderosa
WeAreDevelopers LIVE: Scammer Payback with Python, Grok Goes Unhinged, The Future of Chromium and mo
Dan Cranney, Chris Heilmann & Brian Rountree
Plants vs. Thieves: Automated Tests in the World of Web Security
Ramona Schwering
WeAreDevelopers LIVE - Chrome for Sale? Comet - the upcoming perplexity browser Stealing and leaking
Chris Heilmann & Daniel Cranney & Ramona Schwering
Skynet wants your Passwords! The Role of AI in Automating Social Engineering
Wolfgang Ettlinger & Alexander Hurbean
Typed Security: Preventing Vulnerabilities By Design
Michael Koppmann
WeAreDevelopers Live: Browser Extensions, Honey Scam, Jailbreaking LLMs and more
Chris Heilmann & Daniel Cranney
From learning to earning
Jobs that call for the skills explored in this talk.

Senior PHP Developer (NL based only)
Online Payment Platform
Delft, Netherlands
€75-95K
Senior
PHP
MySQL
Laravel

![Senior Software Engineer [TypeScript] (Prisma Postgres)](https://wearedevelopers.imgix.net/company/283ba9dbbab3649de02b9b49e6284fd9/cover/oKWz2s90Z218LE8pFthP.png?w=400&ar=3.55&fit=crop&crop=entropy&auto=compress,format)
Senior Software Engineer [TypeScript] (Prisma Postgres)
Prisma
Remote
Senior
Node.js
TypeScript
PostgreSQL

Lead Fullstack Engineer AI
Hubert Burda Media
München, Germany
€80-95K
Intermediate
React
Python
Vue.js
Langchain
+1




