How do you apply machine learning when your dataset is too big for a single machine? Discover PySpark's powerful, distributed ML pipelines.
#1 about 3 minutes
Combining big data and machine learning for business insights
The exponential growth of data necessitates combining big data processing with machine learning to personalize user experiences and drive revenue.
#2 about 3 minutes
An introduction to the Apache Spark analytics engine
Apache Spark is a unified analytics engine for large-scale data processing that provides high-level APIs and specialized libraries like Spark SQL and MLlib.
#3 about 4 minutes
Understanding Spark's core data APIs and abstractions
Spark's data abstractions evolved from the low-level Resilient Distributed Dataset (RDD) to the more optimized and user-friendly DataFrame and Dataset APIs.
#4 about 11 minutes
How the Spark cluster architecture enables parallel processing
Spark's architecture uses a driver program that requests resources from a cluster manager and schedules tasks on executors running on worker nodes, which process partitions of the data in parallel.
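The driver-and-executor pattern can be illustrated with a small standard-library analogy. This is only a sketch of the idea, not Spark itself: a thread pool stands in for the executors, and the names `split_into_partitions`, `sum_of_squares_task`, and `run_job` are hypothetical helpers invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Split a list into contiguous partitions of roughly equal size."""
    size, extra = divmod(len(records), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional record.
        end = start + size + (1 if i < extra else 0)
        partitions.append(records[start:end])
        start = end
    return partitions


def sum_of_squares_task(partition):
    """The task each 'executor' runs independently on its own partition."""
    return sum(x * x for x in partition)


def run_job(records, num_partitions=4, num_executors=2):
    """'Driver' logic: partition the data, schedule one task per partition,
    then combine the partial results."""
    partitions = split_into_partitions(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial_sums = list(executors.map(sum_of_squares_task, partitions))
    return sum(partial_sums)


total = run_job(list(range(1, 11)))
print(total)  # 385
```

In a real cluster the partitions live on different machines and the cluster manager decides where executors run. The analogy only captures the split, parallel map, and combine flow.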
#5 about 5 minutes
Using Python with Spark through the PySpark library
PySpark provides a Python API for Spark, using the Py4J library to communicate between the Python process and Spark's core JVM environment.
#6 about 5 minutes
Exploring the key features of the Spark MLlib library
Spark's MLlib offers a comprehensive toolkit for machine learning, including pre-built algorithms, featurization tools, pipelines for workflow management, and model persistence.
#7 about 4 minutes
The standard workflow for machine learning in PySpark
A typical machine learning workflow in Spark involves using DataFrames, applying Transformers for feature engineering, training a model with an Estimator, and orchestrating these steps with a Pipeline.
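The contract between these pieces can be modeled in a few lines of plain Python. This is a simplified sketch and not the `pyspark.ml` API: lists of dicts stand in for DataFrames, the method names `fit` and `transform` mirror the roles described above, and `Scaler`, `MeanThresholdEstimator`, and `MeanThresholdModel` are hypothetical stages made up for illustration.

```python
class Scaler:
    """Transformer: adds an 'x_scaled' field without learning anything."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, rows):
        return [{**row, "x_scaled": row["x"] * self.factor} for row in rows]


class MeanThresholdModel:
    """Fitted model, itself a Transformer: predicts 1 above the learned threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [
            {**row, "prediction": int(row["x_scaled"] > self.threshold)}
            for row in rows
        ]


class MeanThresholdEstimator:
    """Estimator: fit() learns a parameter from data and returns a model."""

    def fit(self, rows):
        mean = sum(row["x_scaled"] for row in rows) / len(rows)
        return MeanThresholdModel(mean)


class PipelineModel:
    """Result of fitting a Pipeline: a chain of transformers only."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Runs stages in order; estimators are fitted, transformers pass through."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # Estimator becomes a fitted model
            fitted.append(stage)
            rows = stage.transform(rows)  # feed the output to the next stage
        return PipelineModel(fitted)


train = [{"x": 1.0}, {"x": 2.0}, {"x": 6.0}]
model = Pipeline([Scaler(10), MeanThresholdEstimator()]).fit(train)
print([row["prediction"] for row in model.transform(train)])  # [0, 0, 1]
```

The point of the design is that fitting a pipeline yields an object that only transforms, so the same sequence of feature steps and the trained model can be applied unchanged to new data.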
#8 about 3 minutes
Pre-built algorithms and utilities available in Spark MLlib
MLlib includes a variety of common, pre-built algorithms for classification, regression, and clustering, such as logistic regression, linear support vector machines (SVM), and k-means clustering.