Alan Mazankiewicz
Fully Orchestrating Databricks from Airflow
#1 about 5 minutes
Exploring the core features of the Databricks workspace
A walkthrough of the Databricks UI shows how to create Spark clusters, run code in notebooks, and define scheduled jobs with multi-task dependencies.
#2 about 6 minutes
Understanding the fundamentals of Apache Airflow orchestration
Airflow provides powerful workflow orchestration with features like dynamic task generation, complex trigger rules, and a detailed UI for monitoring DAGs.
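As a rough illustration of two of these features, the following library-free sketch models them in plain Python. It does not use Airflow itself: `should_run` and `make_task_ids` are hypothetical helpers, and the evaluation logic is a simplification. Only the rule names `all_success`, `one_failed`, and `all_done` are taken from Airflow's trigger rules.

```python
# States in which an upstream task counts as finished (simplified model).
FINISHED = {"success", "failed", "skipped", "upstream_failed"}


def should_run(trigger_rule, upstream_states):
    """Decide whether a task may run, given the states of its upstream tasks."""
    if trigger_rule == "all_success":
        return all(s == "success" for s in upstream_states)
    if trigger_rule == "one_failed":
        return any(s in {"failed", "upstream_failed"} for s in upstream_states)
    if trigger_rule == "all_done":
        return all(s in FINISHED for s in upstream_states)
    raise ValueError(f"rule not covered by this sketch: {trigger_rule}")


def make_task_ids(tables):
    """Dynamic task generation: one task id per table, built in a loop."""
    return [f"load_{table}" for table in tables]


task_ids = make_task_ids(["orders", "customers"])
cleanup_runs = should_run("all_done", ["success", "failed"])
notify_runs = should_run("all_success", ["success", "failed"])
```

In a real DAG, the same loop would instantiate operators instead of strings, and the rule would be set through each operator's `trigger_rule` argument.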
#3 about 5 minutes
Integrating Databricks and Airflow with built-in operators
The DatabricksRunNowOperator and DatabricksSubmitRunOperator allow Airflow to trigger predefined or dynamically defined jobs in Databricks via its REST API.
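The difference between the two operators comes down to the request body each one sends. The sketch below builds both bodies in plain Python so it runs without Airflow installed. The helper names, the job id, the notebook path, and the cluster settings are all placeholders chosen for illustration, not values from the talk.

```python
def run_now_payload(job_id, notebook_params=None):
    """Body for triggering a job that already exists in Databricks."""
    payload = {"job_id": job_id}
    if notebook_params:
        payload["notebook_params"] = dict(notebook_params)
    return payload


def submit_run_payload(run_name, notebook_path, new_cluster):
    """Body for a one-off run whose cluster and task are defined on the fly."""
    return {
        "run_name": run_name,
        "new_cluster": new_cluster,
        "notebook_task": {"notebook_path": notebook_path},
    }


# Placeholder values; adjust to your workspace.
run_now = run_now_payload(job_id=42, notebook_params={"run_date": "2024-01-01"})
submit = submit_run_payload(
    run_name="adhoc_report",
    notebook_path="/Shared/report",
    new_cluster={
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
)
```

In a DAG, dictionaries like these would typically be handed to the `json` argument of `DatabricksRunNowOperator` and `DatabricksSubmitRunOperator` respectively.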
#4 about 3 minutes
Creating a custom operator for full Databricks API control
To overcome the limitations of built-in operators, you can create a generic custom operator by subclassing BaseOperator and using the DatabricksHook to make arbitrary API calls.
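A minimal sketch of such an operator might look like the following. The class name `GenericDatabricksOperator` and its parameters are hypothetical, and the code is not taken from the talk. It relies on `DatabricksHook._do_api_call`, a private method whose behavior (including whether the endpoint needs an `api/` prefix) can differ between provider versions, so treat this as an assumption to verify against your installed version. The import fallback exists only so the sketch can run without Airflow.

```python
try:
    from airflow.models import BaseOperator
    from airflow.providers.databricks.hooks.databricks import DatabricksHook
except ImportError:
    # Stand-ins so the sketch can be read and run without Airflow installed.
    DatabricksHook = None

    class BaseOperator:
        def __init__(self, task_id, **kwargs):
            self.task_id = task_id


class GenericDatabricksOperator(BaseOperator):
    """Hypothetical operator that forwards one request to any Databricks endpoint."""

    # Airflow renders Jinja templates in these attributes before execute() runs.
    template_fields = ("endpoint", "payload")

    def __init__(self, *, method, endpoint, payload=None,
                 databricks_conn_id="databricks_default", hook=None, **kwargs):
        super().__init__(**kwargs)
        self.method = method
        self.endpoint = endpoint
        self.payload = payload
        self.databricks_conn_id = databricks_conn_id
        self.hook = hook  # optional injected hook, mainly useful in tests

    def execute(self, context):
        # Without Airflow installed, a hook must be injected by the caller.
        hook = self.hook or DatabricksHook(databricks_conn_id=self.databricks_conn_id)
        # The hook takes a (method, endpoint) tuple plus the request body.
        return hook._do_api_call((self.method, self.endpoint), self.payload or {})
```

Returning the response from `execute` means Airflow stores it as an XCom, so downstream tasks can read values such as a newly created cluster id.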
#5 about 3 minutes
Implementing a custom operator to interact with DBFS
A practical example demonstrates how to use the custom generic operator to make a 'put' request to the DBFS API, including the use of Jinja templates for dynamic paths.
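For reference, the request body for a DBFS 'put' could be assembled roughly as below. The DBFS API expects the file contents base64-encoded. The helper `dbfs_put_payload`, the target path, and the file text are made up for illustration. The path contains `{{ ds }}`, Airflow's built-in template variable for the logical date, which is left unrendered here because Airflow only substitutes it at runtime for templated operator fields.

```python
import base64

# Endpoint for uploading a small file in a single request.
DBFS_PUT = ("POST", "api/2.0/dbfs/put")


def dbfs_put_payload(path, text, overwrite=True):
    """Build the JSON body for a DBFS put; contents must be base64-encoded."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {"path": path, "contents": encoded, "overwrite": overwrite}


# The Jinja expression stays a literal string until Airflow renders it.
payload = dbfs_put_payload(
    path="/tmp/exports/{{ ds }}/greeting.txt",
    text="hello from Airflow",
)
```

Because Airflow renders templates recursively inside dictionaries, passing this payload through a templated field is enough for the date to appear in the final path.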
#6 about 2 minutes
Developing advanced operators for complex cluster management
For complex scenarios, custom operators can be built to create an all-purpose cluster, wait for it to be ready, submit multiple jobs, and then terminate it.
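The lifecycle described here can be sketched as one function, assuming a hypothetical `call_api(method, endpoint, payload)` callable that performs the HTTP request (for example, a thin wrapper around a Databricks hook). The function names, polling defaults, and payload shapes are illustrative choices rather than the speaker's implementation. The `try`/`finally` ensures the cluster is terminated even if a step fails.

```python
import time

TERMINAL_RUN_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def poll(fetch_state, done_states, fail_states=(), interval=30.0, attempts=60):
    """Call fetch_state() repeatedly until it returns a state in done_states."""
    for _ in range(attempts):
        state = fetch_state()
        if state in done_states:
            return state
        if state in fail_states:
            raise RuntimeError(f"unexpected state: {state}")
        time.sleep(interval)
    raise TimeoutError("gave up waiting")


def run_on_shared_cluster(call_api, cluster_spec, notebook_paths, interval=30.0):
    """Create a cluster, wait for it, run notebooks on it, then terminate it."""
    cluster_id = call_api("POST", "api/2.0/clusters/create", cluster_spec)["cluster_id"]
    try:
        poll(
            lambda: call_api("GET", "api/2.0/clusters/get",
                             {"cluster_id": cluster_id})["state"],
            done_states={"RUNNING"},
            fail_states={"ERROR", "TERMINATING", "TERMINATED"},
            interval=interval,
        )
        run_ids = []
        for i, path in enumerate(notebook_paths):
            body = {
                "run_name": f"run_{i}",
                "tasks": [{
                    "task_key": f"task_{i}",
                    "existing_cluster_id": cluster_id,
                    "notebook_task": {"notebook_path": path},
                }],
            }
            run_ids.append(call_api("POST", "api/2.1/jobs/runs/submit", body)["run_id"])
        # Wait for every run to finish before the cluster goes away.
        for run_id in run_ids:
            poll(
                lambda: call_api("GET", "api/2.1/jobs/runs/get",
                                 {"run_id": run_id})["state"]["life_cycle_state"],
                done_states=TERMINAL_RUN_STATES,
                interval=interval,
            )
        return run_ids
    finally:
        # clusters/delete terminates the cluster; it does not remove its config.
        call_api("POST", "api/2.0/clusters/delete", {"cluster_id": cluster_id})
```

In Airflow, each of these steps could equally be its own task, with the terminate step given a trigger rule such as `all_done` so that it runs regardless of upstream failures.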
#7 about 5 minutes
Answering questions on deployment, performance, and tooling
The discussion covers running Airflow in production environments like Kubernetes, optimizing Spark performance on Databricks, and comparing Airflow to Azure Data Factory.
#8 about 10 minutes
Discussing preferred data stacks and career advice
The speaker shares insights on their preferred data stack for different use cases, offers advice for beginners learning Python, and describes a typical workday as a data engineer.