Philipp Krenn
Make Your Data FABulous
#1 (about 7 minutes)
Understanding the CAP theorem for distributed systems
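The CAP theorem states that a distributed data store can provide at most two of three guarantees: consistency, availability, and partition tolerance. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.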
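The choice during a partition can be made concrete with a deliberately tiny Python model. It assumes two replicas, `a` and `b`, of a single value. `ToyRegister`, its method names, and the `"CP"`/`"AP"` mode labels are all hypothetical; this is not how any real datastore is implemented. When the link between the replicas is cut, the CP variant refuses requests instead of risking a stale answer. The AP variant keeps answering and accepts that replica `b` falls behind.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRegister:
    """Two replicas (`a`, `b`) of one value; a hypothetical toy, not a real datastore."""
    mode: str                 # "CP" or "AP" (labels chosen for this sketch)
    a: int = 0
    b: int = 0
    partitioned: bool = False

    def write_to_a(self, value: int) -> bool:
        """Write via replica `a`; returns False if the write is rejected."""
        if not self.partitioned:
            self.a = self.b = value   # link is up: replicate synchronously
            return True
        if self.mode == "CP":
            return False              # cannot reach `b`: give up availability
        self.a = value                # AP: accept locally, `b` is now stale
        return True

    def read_from_b(self) -> Optional[int]:
        """Read via replica `b`; returns None if the read is refused."""
        if self.partitioned and self.mode == "CP":
            return None               # refuse rather than risk a stale answer
        return self.b


cp = ToyRegister("CP")
cp.write_to_a(1)
cp.partitioned = True
print(cp.write_to_a(2), cp.read_from_b())  # False None (consistent, unavailable)

ap = ToyRegister("AP")
ap.write_to_a(1)
ap.partitioned = True
print(ap.write_to_a(2), ap.read_from_b())  # True 1 (available, inconsistent)
```

With only two replicas, neither side of the partition holds a majority, so the CP toy refuses everything. Real CP systems typically keep the majority side of a partition available.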
#2 (about 3 minutes)
Introducing the FAB theory for datastore tradeoffs
The FAB theory proposes another set of tradeoffs for data stores, where you can only pick two of three attributes: fast, accurate, or big.
#3 (about 7 minutes)
How terms aggregation trades accuracy for speed
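Elasticsearch's terms aggregation may return inaccurate counts by default. To improve performance, each shard sends only its top local terms to the coordinating node, so a term that ranks low on some shards can be undercounted or missing from the merged result.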
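The following toy Python simulation shows how a shard-local cutoff can produce a wrong answer. It is not Elasticsearch's implementation. The function names are hypothetical, and `size` and `shard_size` only mirror the names of the terms aggregation's parameters. Each "shard" is a list of term values. Each shard reports its top `shard_size` terms, and the coordinator sums those partial lists and keeps the top `size`.

```python
from collections import Counter


def shard_top_terms(docs, shard_size):
    """Each shard counts its own terms and reports only its top `shard_size`."""
    return Counter(docs).most_common(shard_size)


def terms_aggregation(shards, size, shard_size):
    """Coordinator: sum the partial per-shard lists, then keep the top `size`."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_terms(docs, shard_size):
            merged[term] += count
    return merged.most_common(size)


def exact_terms(shards, size):
    """Ground truth: count every term on every shard."""
    total = Counter()
    for docs in shards:
        total.update(docs)
    return total.most_common(size)


shards = [
    ["A"] * 5 + ["B"] * 4,   # shard 1: A leads locally
    ["A"] * 5 + ["B"] * 4,   # shard 2: A leads locally
    ["B"] * 6,               # shard 3: only B
]

print(terms_aggregation(shards, size=1, shard_size=1))  # [('A', 10)] wrong
print(exact_terms(shards, size=1))                      # [('B', 14)] correct
print(terms_aggregation(shards, size=1, shard_size=2))  # [('B', 14)]
```

B is the true top term. However, B never wins on shards 1 and 2, so with `shard_size=1` its counts from those shards are dropped. Raising `shard_size` makes each shard report more terms, which restores accuracy in exchange for more work and more data sent to the coordinator.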
#4 (about 8 minutes)
Inconsistent relevance scores in distributed full-text search
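Full-text relevance scores based on TF-IDF can be inconsistent because inverse document frequency is calculated per shard rather than across the whole index. As a result, the same document can score differently depending on which shard holds it.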
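The per-shard IDF effect can be illustrated with a minimal Python sketch. It uses the textbook formula `idf = log(N / df)`, which is simpler than Lucene's actual scoring formula. The shard contents and helper names are made up for this example. The same document text, `"elastic search"`, appears on both shards but receives a different score on each, because each shard sees a different document frequency for `elastic`.

```python
import math


def idf(term, docs):
    """Textbook IDF, log(N / df), computed only over the documents it is given."""
    df = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / df)


def tf_idf(term, doc, docs):
    """Term frequency in `doc` times the IDF seen by this set of documents."""
    tf = doc.split().count(term)
    return tf * idf(term, docs)


shard_1 = ["elastic search", "elastic stack", "elastic cloud", "kibana"]
shard_2 = ["elastic search", "lucene", "kibana", "logstash"]
everything = shard_1 + shard_2   # what a single, global view would see

doc = "elastic search"
per_shard_1 = tf_idf("elastic", doc, shard_1)      # log(4/3): term is common here
per_shard_2 = tf_idf("elastic", doc, shard_2)      # log(4/1): term is rare here
global_score = tf_idf("elastic", doc, everything)  # log(8/4)

print(round(per_shard_1, 3), round(per_shard_2, 3), round(global_score, 3))
# 0.288 1.386 0.693
```

Computing the statistics over the union of all shards, as `everything` does here, removes the discrepancy. The price is gathering term statistics from every shard before scoring.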
#5 (about 2 minutes)
Using a single shard to ensure data accuracy
Forcing an index to use a single shard guarantees accurate aggregations and relevance scores by eliminating distributed calculations, but sacrifices horizontal scaling.
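The toy terms-aggregation model can show why a single shard removes the error. This is a hypothetical Python sketch, not Elasticsearch code. In a real cluster, the shard count is fixed when an index is created, through the `number_of_shards` index setting. When all documents live on one shard, that shard's local top list is already the global top list, provided `shard_size >= size`. The merge step then has nothing to approximate.

```python
from collections import Counter


def distributed_top_terms(shards, size, shard_size):
    """Toy coordinator: merge each shard's top `shard_size` terms, keep top `size`."""
    merged = Counter()
    for docs in shards:
        merged.update(dict(Counter(docs).most_common(shard_size)))
    return merged.most_common(size)


docs = ["A"] * 5 + ["B"] * 4 + ["A"] * 5 + ["B"] * 4 + ["B"] * 6

three_shards = [docs[0:9], docs[9:18], docs[18:]]   # same data, split three ways
one_shard = [docs]                                  # everything on a single shard

print(distributed_top_terms(three_shards, size=1, shard_size=1))  # [('A', 10)]
print(distributed_top_terms(one_shard, size=1, shard_size=1))     # [('B', 14)]
print(Counter(docs).most_common(1))                               # [('B', 14)]
```

The single-shard result matches the exact count. The trade-off named in the summary still holds: one shard cannot spread indexing and query load across nodes.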
#6 (about 1 minute)
Why you must consciously choose your data tradeoffs
It is crucial to understand and explicitly choose the tradeoffs in your data systems, such as those described by the CAP theorem and the FAB theory, to avoid unexpected behavior.