Fully Orchestrating Databricks from Airflow
Alan Mazankiewicz
In this talk we will introduce Databricks, a popular cloud service for hosting Apache Spark applications for distributed data processing, in combination with Apache Airflow, an orchestration framework for batch ETL workflows. After a brief tour of the Databricks workspace and the fundamentals of Airflow, we will take a deeper look at the functionality Databricks provides in Airflow for orchestrating its workspace. Afterwards, we will see how to extend and customize that functionality to manage virtually every aspect of the Databricks workspace from Airflow. The talk requires no prior knowledge of Databricks, Spark, or Airflow, but it does assume familiarity with the fundamentals of Python, especially object-oriented programming and REST API requests. The distributed data processing with Apache Spark itself is not the focus of this talk.
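As a rough illustration of the two ideas in the abstract, the sketch below shows an Airflow DAG that first submits a notebook run through the provider's built-in DatabricksSubmitRunOperator and then manages another part of the workspace with a plain Databricks REST API call. This is not code from the talk; the connection id, workspace host, token handling, cluster spec, and notebook path are all illustrative assumptions.

```python
# Minimal sketch: orchestrating Databricks from Airflow with the built-in
# operator plus a custom REST call for features the provider does not wrap.
# Hostname, token, connection id, and notebook path are placeholders.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # assumption
DATABRICKS_TOKEN = "dapi..."  # in practice, read from an Airflow connection


def list_clusters():
    """Call a Databricks REST endpoint directly (here: clusters/list)
    to manage an aspect of the workspace beyond the built-in operators."""
    resp = requests.get(
        f"{DATABRICKS_HOST}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


with DAG(
    dag_id="databricks_demo",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    # Built-in provider operator: submit a one-off notebook run.
    run_notebook = DatabricksSubmitRunOperator(
        task_id="run_notebook",
        databricks_conn_id="my_databricks_conn",  # assumed Airflow connection
        new_cluster={
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Shared/example_notebook"},
    )

    # Custom step: hit the Databricks REST API directly.
    inspect_clusters = PythonOperator(
        task_id="list_clusters",
        python_callable=list_clusters,
    )

    run_notebook >> inspect_clusters
```

In practice one would wrap such REST calls in a custom operator or hook rather than a PythonOperator, which is the kind of extension the talk explores.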