In the previous post I described how to set up the resources required to build our ETL pipeline: creating an Azure Databricks workspace, provisioning ADLS Gen2 as the data source and destination, and mounting ADLS Gen2 to DBFS.
In this post, let’s write some PySpark/Spark SQL code to extract &…
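As a rough sketch of what the extract step could look like, the snippet below reads a CSV file from the mounted ADLS Gen2 container into a DataFrame and exposes it to Spark SQL. The mount point `/mnt/datalake` and the file `raw/movies.csv` are placeholder names, not values from this series; the `spark.read` part only runs inside a Databricks notebook, where `spark` is predefined.

```python
def mounted_path(mount_point: str, relative: str) -> str:
    """Join a DBFS mount point and a relative file path, normalizing slashes."""
    return f"{mount_point.rstrip('/')}/{relative.lstrip('/')}"

# Hypothetical source file inside the mounted container
source_path = mounted_path("/mnt/datalake", "raw/movies.csv")

# Inside a Databricks notebook (spark is predefined there):
# df = (
#     spark.read.format("csv")
#     .option("header", "true")        # first row holds column names
#     .option("inferSchema", "true")   # let Spark guess column types
#     .load(source_path)
# )
# # Register a temp view so the same data can also be queried with Spark SQL
# df.createOrReplaceTempView("movies")
# spark.sql("SELECT COUNT(*) FROM movies").show()
```

Keeping the path construction in a small helper makes the notebook easier to parameterize later, e.g. when the same extract logic is reused for different containers.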
With the growing need to structure large volumes of data (big data) and derive insights from it, ETL plays a major role, yet it remains a tedious task. Traditional big data technologies like Hadoop MapReduce and Apache Hive laid a sophisticated path to achieving this goal…
Note: This article gives a step-by-step walkthrough, with a detailed explanation, of mounting ADLS Gen2 to DBFS using a service principal and OAuth 2.0. If you only need a high-level overview, refer to the Databricks documentation here.
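The mount itself boils down to passing a set of OAuth 2.0 Spark configs to `dbutils.fs.mount`. The sketch below builds that config dict; the secret-scope, container, and storage-account names are placeholders for illustration, and the `dbutils` calls only work inside a Databricks notebook.

```python
def oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Build the Spark configs Databricks expects for service-principal OAuth 2.0."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# Inside a Databricks notebook (dbutils is only available there),
# with the service principal's credentials stored in a secret scope:
# configs = oauth_configs(
#     client_id=dbutils.secrets.get("etl-scope", "sp-client-id"),
#     client_secret=dbutils.secrets.get("etl-scope", "sp-client-secret"),
#     tenant_id=dbutils.secrets.get("etl-scope", "sp-tenant-id"),
# )
# dbutils.fs.mount(
#     source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
#     mount_point="/mnt/datalake",
#     extra_configs=configs,
# )
```

Pulling the credentials from a secret scope rather than hard-coding them keeps the service principal's secret out of the notebook source.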
What is Databricks File System?
Databricks File System (DBFS) is a distributed file system mounted into…
In this post I will show you how to create a generic pipeline that copies data from Azure SQL Database to Azure Data Lake Storage (ADLS Gen2).
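One way to sketch such a copy in Databricks is to read the table over JDBC and land it in the mounted data lake as Parquet. The server, database, table, and mount-point names below are placeholders, and the `spark.read` part assumes a Databricks cluster where `spark` and `dbutils` are predefined.

```python
def sqlserver_jdbc_url(server: str, database: str) -> str:
    """Build a JDBC URL for an Azure SQL Database logical server."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;loginTimeout=30;"
    )

# On a Databricks cluster (spark is predefined there):
# df = (
#     spark.read.format("jdbc")
#     .option("url", sqlserver_jdbc_url("my-sql-server", "salesdb"))
#     .option("dbtable", "dbo.Orders")
#     .option("user", "etl_user")
#     .option("password", dbutils.secrets.get("etl-scope", "sql-password"))
#     .load()
# )
# # Land the table in the lake; path assumes the mount from the earlier post
# df.write.mode("overwrite").parquet("/mnt/datalake/curated/orders")
```

Parameterizing the server, database, and table names is what makes the pipeline "generic": the same notebook can copy any table given different inputs.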
Setup resources for data copy:
Before proceeding further, let’s gather all the resources required to create the generic pipeline: