Registration required!
December 7, 2021
1:00 pm
1:45 pm

Fully Orchestrating Databricks from Airflow

About the session

This talk introduces how to use Databricks, a popular cloud service for hosting Apache
Spark applications for distributed data processing, in combination with Apache Airflow,
an orchestration framework for ETL batch workflows. After a brief exploration of the
Databricks Workspace and the fundamentals of Airflow, we will take a deeper look at the
functionality Databricks provides in Airflow for orchestrating its workspace. Afterwards,
we will find out how to extend and customize that functionality to manage virtually every
aspect of the Databricks Workspace from Airflow.

The talk does not require any prior knowledge of Databricks, Spark, or Airflow, but it does
assume familiarity with the fundamentals of the Python programming language, especially
object-oriented programming and REST API requests. Distributed data processing with
Apache Spark itself is not the focus of this talk.
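
To give a feel for the assumed background (object-oriented Python plus REST API requests), here is a minimal sketch and not material from the talk itself. It builds, but never sends, requests against two Databricks REST endpoints (`api/2.0/jobs/runs/submit` and `api/2.0/clusters/list`) using only the standard library. The class names `DatabricksRequestBuilder` and `WorkspaceRequests`, the example host, the token, and the cluster ID are all hypothetical placeholders. In an actual Airflow deployment one would more likely build on the `DatabricksHook` and `DatabricksSubmitRunOperator` classes shipped with the Airflow Databricks provider.

```python
import json
from typing import Optional
from urllib.request import Request


class DatabricksRequestBuilder:
    """Hypothetical base class: prepares authenticated Databricks REST requests."""

    def __init__(self, host: str, token: str) -> None:
        self.host = host.rstrip("/")  # tolerate a trailing slash in the workspace URL
        self.token = token            # personal access token (placeholder in this sketch)

    def build(self, method: str, endpoint: str, payload: Optional[dict] = None) -> Request:
        # Encode the JSON body only when a payload is given (GET requests have none).
        body = json.dumps(payload).encode("utf-8") if payload is not None else None
        headers = {"Authorization": f"Bearer {self.token}"}
        if body is not None:
            headers["Content-Type"] = "application/json"
        return Request(f"{self.host}/{endpoint}", data=body, headers=headers, method=method)


class WorkspaceRequests(DatabricksRequestBuilder):
    """Hypothetical subclass: extends the base with workspace-specific calls."""

    def submit_notebook_run(self, run_name: str, cluster_id: str, notebook_path: str) -> Request:
        # One-time run of a notebook on an existing cluster.
        payload = {
            "run_name": run_name,
            "existing_cluster_id": cluster_id,
            "notebook_task": {"notebook_path": notebook_path},
        }
        return self.build("POST", "api/2.0/jobs/runs/submit", payload)

    def list_clusters(self) -> Request:
        return self.build("GET", "api/2.0/clusters/list")


if __name__ == "__main__":
    api = WorkspaceRequests("https://example.cloud.databricks.com", "dummy-token")
    request = api.submit_notebook_run("demo-run", "1234-567890-abc123", "/Shared/demo")
    print(request.get_method(), request.full_url)
```

Passing a prepared request to `urllib.request.urlopen` would actually send it. The sketch stops short of that so it runs without a workspace or credentials.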

About the speaker

Alan Mazankiewicz
Machine Learning Engineer at inovex GmbH
