Fabien Vauchelles
Cracking the Code: Decoding Anti-Bot Systems!
#1 (about 5 minutes)
The fundamental challenge of web scraping as a Turing test
Web scraping is fundamentally a Turing test where automated scripts must mimic natural human behavior to avoid detection by anti-bot systems.
#2 (about 10 minutes)
How anti-bot systems analyze the browser stack for signals
Anti-bot systems analyze signals from the entire browser stack, including IP address, TCP/TLS/HTTP2 fingerprints, JavaScript execution, and user navigation patterns.
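To make the cross-layer idea concrete, here is a minimal Python sketch of how a detector might compare signals from different layers of the stack. The `ClientSignals` fields, the `find_inconsistencies` helper, and its rules are hypothetical illustrations, not any vendor's actual logic. The point is that layers can contradict each other, for example a Chrome User-Agent sent over a TLS handshake that does not look like Chrome.

```python
from dataclasses import dataclass


@dataclass
class ClientSignals:
    """Hypothetical snapshot of signals collected across the browser stack."""
    ip_is_datacenter: bool     # network: IP belongs to a hosting-provider range
    tls_family: str            # TLS: client family inferred from the ClientHello fingerprint
    user_agent: str            # HTTP: the declared User-Agent header
    navigator_webdriver: bool  # JavaScript: value of navigator.webdriver
    pointer_events: int        # behavior: mouse/touch events before the first click


def declared_family(user_agent: str) -> str:
    """Rough browser family claimed by a User-Agent string."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:  # Chromium-based browsers typically include "Chrome/"
        return "chrome"
    return "other"


def find_inconsistencies(s: ClientSignals) -> list[str]:
    """Return one message per layer whose signal looks automated or contradictory."""
    issues = []
    if s.ip_is_datacenter:
        issues.append("ip: datacenter address")
    if s.tls_family != declared_family(s.user_agent):
        issues.append("tls: handshake does not match the declared browser")
    if s.navigator_webdriver:
        issues.append("js: navigator.webdriver is true")
    if s.pointer_events == 0:
        issues.append("behavior: no pointer activity")
    return issues


CHROME_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
human = ClientSignals(False, "chrome", CHROME_UA, False, 57)
script = ClientSignals(True, "python-requests", CHROME_UA, False, 0)
print(find_inconsistencies(human))        # []
print(len(find_inconsistencies(script)))  # 3
```

The script in this example only spoofs the User-Agent, so the network, TLS, and behavior layers still give it away.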
#3 (about 2 minutes)
Exploiting the business need to minimize false positives
The necessity for websites to avoid blocking real customers (false positives) forces anti-bot systems to focus on a limited set of the most effective signals.
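The false-positive constraint can be illustrated with a toy risk score. In this Python sketch, assuming hypothetical flag names, weights, and a made-up sample of sessions, a low blocking threshold catches every bot but also blocks half of the genuine visitors, while a higher threshold removes the false positives at the cost of letting a weak-signal bot through. This is why strong, low-noise signals end up carrying most of the weight.

```python
# Hypothetical weights: strong, rarely-wrong signals get high weight;
# noisy signals that real users also trigger get low weight.
WEIGHTS = {
    "webdriver_flag": 0.9,   # navigator.webdriver set by automation
    "tls_ua_mismatch": 0.7,  # TLS fingerprint disagrees with User-Agent
    "datacenter_ip": 0.3,    # also true for many VPN and corporate users
    "odd_timezone": 0.1,     # travellers and VPN users trigger this too
}


def risk_score(flags: set[str]) -> float:
    """Sum the weights of the raised flags, capped at 1.0."""
    return min(1.0, sum(WEIGHTS[f] for f in flags))


def rates(humans: list[set[str]], bots: list[set[str]], threshold: float) -> tuple[float, float]:
    """Return (false_positive_rate, detection_rate) for a blocking threshold."""
    false_pos = sum(risk_score(h) >= threshold for h in humans) / len(humans)
    detected = sum(risk_score(b) >= threshold for b in bots) / len(bots)
    return false_pos, detected


# Made-up sessions for illustration only.
HUMANS = [set(), {"datacenter_ip"}, {"datacenter_ip", "odd_timezone"}, {"odd_timezone"}]
BOTS = [{"webdriver_flag"}, {"tls_ua_mismatch", "datacenter_ip"}, {"datacenter_ip"}]

print(rates(HUMANS, BOTS, 0.25))  # (0.5, 1.0)
print(rates(HUMANS, BOTS, 0.6))   # (0.0, 0.6666666666666666)
```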
#4 (about 5 minutes)
Tools and techniques to identify anti-bot systems
Use tools like Wappalyzer, browser dev tools, and proxy interceptors to identify the specific anti-bot protection and analyze its architecture and encrypted payloads.
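Alongside Wappalyzer and the browser dev tools, a quick first pass is to look for cookie and header names that vendors are commonly seen setting. The Python sketch below assumes the marker table shown (names observed in the wild, such as Cloudflare's `__cf_bm` or Akamai's `_abck`); vendors can change them at any time, so verify them in your own captures. Note also that a `cf-ray` header only shows the site sits behind Cloudflare's network, not that bot management is enabled.

```python
# Cookie and header names commonly observed for each vendor. This table is an
# assumption to re-check in your own dev tools: vendors can rename markers.
VENDOR_MARKERS: dict[str, dict[str, set[str]]] = {
    "Cloudflare": {"cookies": {"__cf_bm", "cf_clearance"}, "headers": {"cf-ray"}},
    "Akamai Bot Manager": {"cookies": {"_abck", "bm_sz"}, "headers": set()},
    "DataDome": {"cookies": {"datadome"}, "headers": set()},
    "PerimeterX (HUMAN)": {"cookies": {"_px3", "_pxhd"}, "headers": set()},
}


def guess_vendors(response_headers: dict[str, str], cookie_names: set[str]) -> list[str]:
    """List vendors whose known cookie or header names appear in a response."""
    header_names = {name.lower() for name in response_headers}  # header names are case-insensitive
    return [
        vendor
        for vendor, markers in VENDOR_MARKERS.items()
        if markers["cookies"] & cookie_names or markers["headers"] & header_names
    ]


print(guess_vendors({"CF-RAY": "8a1b2c3d4e5f-CDG"}, {"__cf_bm"}))  # ['Cloudflare']
print(guess_vendors({"Server": "nginx"}, {"_abck", "bm_sz"}))     # ['Akamai Bot Manager']
```

A match only narrows down where to look; the payload analysis mentioned above is still needed to understand what the protection actually checks.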
#5 (about 7 minutes)
A step-by-step methodology for building robust scrapers
Follow an incremental approach to bypass protections, starting with basic scraper tuning and progressively adding proxies, headless browsers, and unblocker APIs.
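The incremental approach can be expressed as an escalation ladder: try the cheapest option first and move to a costlier tier only when the response looks blocked. In this Python sketch the tier names and the stub fetchers are hypothetical placeholders for a tuned HTTP client, proxies, a headless browser, and an unblocker API; each fetcher is assumed to return `None` when it detects a block.

```python
from typing import Callable, Optional

# A fetcher returns the page body, or None when the response looks blocked
# (challenge page, captcha, HTTP 403, ...).
Fetcher = Callable[[str], Optional[str]]


def fetch_with_escalation(url: str, tiers: list[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Try tiers from cheapest to most expensive; return (tier name, body) of the first success."""
    for name, fetch in tiers:
        body = fetch(url)
        if body is not None:
            return name, body
    raise RuntimeError(f"every tier was blocked for {url}")


# Hypothetical stubs standing in for real clients.
tiers: list[tuple[str, Fetcher]] = [
    ("tuned HTTP client", lambda url: None),
    ("HTTP client + proxies", lambda url: None),
    ("headless browser", lambda url: "<html>product page</html>"),
    ("unblocker API", lambda url: "<html>product page</html>"),
]
print(fetch_with_escalation("https://example.com/item/1", tiers))
# ('headless browser', '<html>product page</html>')
```

A natural extension, not shown here, is to remember which tier last worked for each domain so later requests skip the tiers known to fail.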
#6 (about 4 minutes)
Designing a scalable architecture for data collection
Build a scalable scraping infrastructure using a central data store, an orchestrator, a proxy management layer, and a farm of diverse browsers.
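A toy, single-process version of that architecture shows how the pieces connect. The class names (`ProxyManager`, `BrowserFarm`, `Orchestrator`) are hypothetical, a plain dict stands in for the central data store, and `render` returns a placeholder string instead of driving a real browser. In a production setup each component would more likely be a separate service.

```python
import itertools
from collections import deque


class ProxyManager:
    """Proxy layer: hands out proxies round-robin (a real one would also track bans and health)."""

    def __init__(self, proxies: list[str]):
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self) -> str:
        return next(self._cycle)


class BrowserFarm:
    """Pool of diverse browser profiles, used in rotation."""

    def __init__(self, profiles: list[str]):
        self._cycle = itertools.cycle(profiles)

    def render(self, url: str, proxy: str) -> str:
        profile = next(self._cycle)
        # Placeholder: a real farm would load `url` in this browser through `proxy`.
        return f"{url} rendered by {profile} via {proxy}"


class Orchestrator:
    """Pulls jobs from a queue, pairs each with a proxy and a browser, and stores results."""

    def __init__(self, store: dict[str, str], proxies: ProxyManager, farm: BrowserFarm):
        self.store = store
        self.proxies = proxies
        self.farm = farm
        self.queue: deque[str] = deque()

    def submit(self, url: str) -> None:
        self.queue.append(url)

    def run(self) -> None:
        while self.queue:
            url = self.queue.popleft()
            self.store[url] = self.farm.render(url, self.proxies.next_proxy())


store: dict[str, str] = {}  # stands in for the central data store
orchestrator = Orchestrator(store, ProxyManager(["proxy-a", "proxy-b"]),
                            BrowserFarm(["chrome-linux", "firefox-windows"]))
for page in ("https://example.com/1", "https://example.com/2"):
    orchestrator.submit(page)
orchestrator.run()
print(store["https://example.com/2"])  # https://example.com/2 rendered by firefox-windows via proxy-b
```

Keeping the proxy layer and the browser farm behind small interfaces like these is what lets each scale or be swapped independently of the orchestrator.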
#7 (about 7 minutes)
Decoding common JavaScript obfuscation techniques
Anti-bot systems use JavaScript obfuscation techniques like string concealing, code flow confusion, and control flow flattening to make their code unreadable.
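As an example of undoing string concealing, a common pattern is to move every string literal into one array and replace each use with an indexed lookup. The Python sketch below reverses the simplest form of that pattern on a made-up JavaScript snippet using regular expressions. The `_0x`-prefixed naming is an assumption modelled on typical obfuscator output; production obfuscators usually add array rotation, encoding, and wrapper functions, which a real deobfuscator would handle by working on a parsed JavaScript AST rather than with regexes. Code flow confusion and control flow flattening are not addressed here.

```python
import ast
import re

# Matches a declaration like: var _0x3f2a = ['webdriver', 'userAgent'];
ARRAY_DECL = re.compile(r"var\s+(_0x[0-9a-f]+)\s*=\s*(\[[^\]]*\]);")
# Matches bracket access with a plain identifier string: obj['prop']
BRACKET_PROP = re.compile(r"\[\s*'([A-Za-z_$][\w$]*)'\s*\]")


def inline_string_array(js: str) -> str:
    """Replace _0xNAME[i] lookups with the literal string they refer to."""
    match = ARRAY_DECL.search(js)
    if match is None:
        return js
    name, literal = match.group(1), match.group(2)
    strings = ast.literal_eval(literal)  # only safe for simple quoted-string arrays
    body = js[:match.start()] + js[match.end():]  # drop the array declaration
    lookup = re.compile(re.escape(name) + r"\[(\d+)\]")
    # Assumes the strings contain no single quotes.
    body = lookup.sub(lambda m: "'" + strings[int(m.group(1))] + "'", body)
    # Turn navigator['webdriver'] back into navigator.webdriver
    body = BRACKET_PROP.sub(r".\1", body)
    return body.strip()


obfuscated = (
    "var _0x3f2a = ['webdriver', 'userAgent'];\n"
    "var a = navigator[_0x3f2a[0]];\n"
    "var b = navigator[_0x3f2a[1]];"
)
print(inline_string_array(obfuscated))
# var a = navigator.webdriver;
# var b = navigator.userAgent;
```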
#8 (about 3 minutes)
Identifying the five key signal types after deobfuscation
After deobfuscating the code, identify the five main types of signals collected: configuration details, automation flags, rendering fingerprints, reverse engineering checks, and integrity controls.
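Once the code is readable, sorting the browser APIs it touches into the five families gives a checklist of what a scraper must get right. The prefix table in this Python sketch is a hypothetical, deliberately small sample: for instance `navigator.webdriver` for automation flags, `debugger` statements for reverse engineering checks, and `Function.prototype.toString` (often used to spot patched native functions) for integrity controls. Real scripts probe many more properties.

```python
# Hypothetical prefix table: which API accesses hint at which signal family.
SIGNAL_FAMILIES: dict[str, tuple[str, ...]] = {
    "configuration details": ("navigator.languages", "navigator.hardwareConcurrency",
                              "screen.", "Intl.DateTimeFormat"),
    "automation flags": ("navigator.webdriver", "window.callPhantom"),
    "rendering fingerprints": ("HTMLCanvasElement.prototype.toDataURL",
                               "WebGLRenderingContext.prototype.getParameter"),
    "reverse engineering checks": ("debugger",),
    "integrity controls": ("Function.prototype.toString",),
}


def classify(accesses: list[str]) -> dict[str, list[str]]:
    """Group API accesses found in deobfuscated code by signal family."""
    report: dict[str, list[str]] = {family: [] for family in SIGNAL_FAMILIES}
    report["unclassified"] = []
    for access in accesses:
        for family, prefixes in SIGNAL_FAMILIES.items():
            if access.startswith(prefixes):  # str.startswith accepts a tuple
                report[family].append(access)
                break
        else:  # no family matched
            report["unclassified"].append(access)
    return report


found = ["navigator.webdriver", "screen.width", "HTMLCanvasElement.prototype.toDataURL",
         "Function.prototype.toString", "debugger", "navigator.plugins"]
report = classify(found)
print(report["automation flags"])  # ['navigator.webdriver']
print(report["unclassified"])      # ['navigator.plugins']
```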
#9 (about 1 minute)
The next frontier in anti-bot: JavaScript virtual machines
The next evolution in anti-bot technology involves JavaScript virtual machines that execute proprietary, undocumented bytecode, making reverse engineering significantly more difficult.
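To see why a VM is harder to analyze, consider a toy stack machine: the logic lives in a numeric bytecode array, and the only readable code is a generic interpreter loop. The opcodes, their numbering, and the program below are entirely hypothetical and written in Python for illustration. In a real anti-bot VM the interpreter would itself be obfuscated JavaScript and, as noted above, the bytecode is proprietary and undocumented, so an analyst has to recover what each opcode does before the program means anything.

```python
# Hypothetical opcode numbering; a real VM could reshuffle this on every build.
PUSH, LOAD, STORE, ADD, XOR, HALT = range(6)


def run(bytecode: list[int], names: list[str], env: dict[str, int]) -> dict[str, int]:
    """Interpret a toy stack-machine program, reading and writing variables in env."""
    stack: list[int] = []
    pc = 0  # program counter
    while True:
        op = bytecode[pc]
        if op == PUSH:          # PUSH <constant>
            stack.append(bytecode[pc + 1])
            pc += 2
        elif op == LOAD:        # LOAD <name index>
            stack.append(env[names[bytecode[pc + 1]]])
            pc += 2
        elif op == STORE:       # STORE <name index>
            env[names[bytecode[pc + 1]]] = stack.pop()
            pc += 2
        elif op in (ADD, XOR):  # binary ops on the two topmost values
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == ADD else left ^ right)
            pc += 1
        elif op == HALT:
            return env
        else:
            raise ValueError(f"unknown opcode {op} at {pc}")


# token = (screen_width + screen_height) ^ 0x5A, expressed only as numbers
names = ["screen_width", "screen_height", "token"]
program = [LOAD, 0, LOAD, 1, ADD, PUSH, 0x5A, XOR, STORE, 2, HALT]
print(run(program, names, {"screen_width": 1920, "screen_height": 1080})["token"])  # 3042
```

Printed as raw integers, `program` is `[1, 0, 1, 1, 3, 0, 90, 4, 2, 2, 5]`, which says nothing about screen sizes until the opcode table is known.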
#10 (about 14 minutes)
Answering questions on scraping legality, VPNs, and rate limits
The Q&A session addresses common questions about the legality of web scraping, the effectiveness of VPNs, managing rate limits, and the cat-and-mouse game with anti-bot providers.
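On the rate-limit question, one common client-side approach is a token bucket per target domain, which allows short bursts while capping the sustained request rate. This Python sketch is a generic illustration, not necessarily what the talk recommends; the rate and capacity values are arbitrary, and the injectable `clock` parameter exists only to make the behavior easy to test.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Spend one token if available; return False when the caller should wait."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Simulated clock so the example is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2, clock=lambda: fake_now[0])
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
fake_now[0] = 0.5  # half a second later, one token has been refilled
print(bucket.try_acquire())  # True
```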