Lars Kölker
Data is Key: Scraping Metadata from Websites
#1 about 2 minutes
How social media sites generate link previews
Social media platforms scrape hidden metadata like titles and descriptions from URLs to transform a simple link into a rich preview.
#2 about 1 minute
Defining web scraping and its primary use cases
Web scraping is the practice of extracting data directly from a website's pages rather than through an API, typically because an API is missing, rate-limited, or too expensive.
#3 about 2 minutes
Why CSS selector-based scraping is brittle
Relying on specific CSS selectors for scraping creates a fragile solution that is tied to a single site and breaks whenever the source code changes.
#4 about 4 minutes
Generic scraping with schema.org and JSON-LD
Schema.org provides a standardized vocabulary for structured data, enabling the creation of generic scrapers using formats like JSON-LD.
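As a rough illustration of the generic approach (not the speaker's code), the sketch below collects every `application/ld+json` script block from a page using only the Python standard library. The sample HTML, the `JsonLdParser` class, and the `extract_json_ld` helper are hypothetical names made up for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the parsed contents of every JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_json_ld(html):
    """Return a list of decoded JSON-LD objects found in the HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    return parser.items


# Hypothetical page: the class name on <h1> can change without affecting us.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Mug",
 "offers": {"@type": "Offer", "price": "12.50", "priceCurrency": "EUR"}}
</script>
</head><body><h1 class="title-v2">Example Mug</h1></body></html>
"""

if __name__ == "__main__":
    for item in extract_json_ld(SAMPLE_HTML):
        print(item["@type"], item["name"], item["offers"]["price"])
```

Because the extractor keys on the script type rather than on site-specific markup, the same function can in principle be pointed at any page that publishes schema.org data this way.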
#5 about 5 minutes
Using meta tags for structured data extraction
Protocols like Open Graph (OGP) and Twitter Cards extend standard HTML meta tags to provide rich, structured metadata for social sharing and scraping.
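A minimal sketch of meta-tag extraction, assuming the usual conventions that Open Graph tags use `property="og:..."` and Twitter Cards use `name="twitter:..."`. The sample HTML and the helper names `extract_meta` and `build_preview` are hypothetical, and the fallback order in `build_preview` is one possible choice rather than a rule from either protocol.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects og:* and twitter:* meta tags into a dictionary."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Open Graph uses the "property" attribute, Twitter Cards use "name".
        key = attrs.get("property") or attrs.get("name")
        content = attrs.get("content")
        if key and content is not None and key.startswith(("og:", "twitter:")):
            self.meta.setdefault(key, content)  # keep the first occurrence


def extract_meta(html):
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


def build_preview(meta):
    """Pick preview fields, preferring Open Graph over Twitter Cards."""
    def first(*keys):
        return next((meta[k] for k in keys if k in meta), None)

    return {
        "title": first("og:title", "twitter:title"),
        "description": first("og:description", "twitter:description"),
        "image": first("og:image", "twitter:image"),
        "card": first("twitter:card"),
    }


# Hypothetical page head used only for illustration.
SAMPLE_HTML = """
<head>
<title>Fallback title</title>
<meta property="og:title" content="Data is Key">
<meta property="og:description" content="Scraping metadata from websites">
<meta property="og:image" content="https://example.com/cover.png">
<meta name="twitter:card" content="summary_large_image" />
<meta name="description" content="Plain description">
</head>
"""

if __name__ == "__main__":
    print(build_preview(extract_meta(SAMPLE_HTML)))
```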
#6 about 4 minutes
The oEmbed protocol for embedded content
The oEmbed protocol offers a standardized endpoint for retrieving embeddable representations of a URL, which is essential for sites like Instagram.
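To make the request/response shape concrete, here is a small sketch of the consumer side of oEmbed: building the request URL and reading the fields of a JSON response. The endpoint `https://provider.example/oembed`, the sample response body, and the helper names are assumptions for illustration only. Real providers publish their own endpoints, and some require authentication. No network call is made here.

```python
import json
from urllib.parse import urlencode


def build_oembed_url(endpoint, page_url, maxwidth=None):
    """Build a consumer request: the target page goes in the 'url' parameter."""
    params = {"url": page_url, "format": "json"}
    if maxwidth is not None:
        params["maxwidth"] = maxwidth
    return f"{endpoint}?{urlencode(params)}"


def parse_oembed(body):
    """Extract commonly used fields from a JSON oEmbed response body."""
    data = json.loads(body)
    return {
        "type": data.get("type"),
        "title": data.get("title"),
        "author_name": data.get("author_name"),
        "html": data.get("html"),  # embeddable markup for rich/video types
    }


# Hypothetical response, shaped like a "rich" oEmbed result.
SAMPLE_RESPONSE = """
{"version": "1.0", "type": "rich", "title": "A photo post",
 "author_name": "someone", "html": "<blockquote>embedded post</blockquote>",
 "width": 400, "height": null}
"""

if __name__ == "__main__":
    url = build_oembed_url(
        "https://provider.example/oembed",  # assumed endpoint
        "https://provider.example/p/123",
        maxwidth=400,
    )
    print(url)
    print(parse_oembed(SAMPLE_RESPONSE)["html"])
```

In practice the returned `html` would be inserted into the consuming page, so a consumer should treat it as untrusted input from a third party.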
#7 about 1 minute
Showcasing a powerful multi-protocol scraper
A demonstration shows how combining different scraping techniques can extract rich information, including product prices and author images, from various websites.
#8 about 3 minutes
Q&A on legality, rate limits, and frameworks
The speaker addresses audience questions regarding the legality of scraping, managing rate limits, and recommended frameworks like Beautiful Soup.