Module 1: Deep Dive into Delta Lake
Syllabus of the course
Introduction and Architecture of Delta Lake
How Delta tables differ from normal tables
Create a Delta table
Generated Columns in a Delta table
Read a Delta table
Write to a Delta table
replaceWhere while writing to a Delta table
Delete a Delta table
Update a Delta table
Upsert or Merge Statement in Delta tables
Understand transactional logs in _delta_log folder
Time travel in a Delta table using History
Restore Delta table to previous version
How to add constraints in a Delta table
How to add user metadata to a Delta table
Schema evolution and enforcement in Delta table
Shallow and Deep Clone of Delta table
More on Deep and Shallow Clone of a Delta table
How to enable Change Data Feed in Delta table
Reduce the small-file issue using OPTIMIZE
Download Module 1 Notebooks
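Module 1's "Upsert or Merge Statement" lesson covers Delta's MERGE: match source rows to target rows on a key, update the matches, and insert the rest. As a minimal plain-Python sketch of that semantics (illustration only, not the actual Delta `MERGE INTO` / `DeltaTable.merge` code the lesson uses):

```python
# Sketch of upsert (MERGE) semantics on plain Python dicts.
# Delta Lake performs the same logic transactionally on table files.

def upsert(target, source, key="id"):
    """Upsert source rows into target: update on key match, else insert."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)   # WHEN MATCHED THEN UPDATE
        else:
            merged[row[key]] = dict(row)   # WHEN NOT MATCHED THEN INSERT
    return sorted(merged.values(), key=lambda r: r[key])

target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
source = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}]
print(upsert(target, source))
# → [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'B'}, {'id': 3, 'name': 'c'}]
```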
Module 2: End-to-End Project
Architecture video of the Project
Understand the requirement
Create Blob Storage and Upload Excel file
Create ADLS Gen2 for source and upload data files
Create Azure SQL Server and Database, and create a table
Set up the API and understand it in brief
Complete source dataset files
Create Sink Data Lake for Delta Lake - Bronze, Silver, and Gold layers
Set up Key Vault and SPN (App Registration)
Create mount points for ADLS Gen2 and Blob Storage (Source Location)
Create a mount point for ADLS Gen2 (Sink Location)
Read Excel file and write to Delta Lake (Bronze Layer)
Improved version - Read Excel file and write to Delta Lake (Bronze Layer)
Read CSV, Parquet, and Text files and write to Delta Lake (Bronze Layer)
Read SQL tables from Azure SQL
Read data from API
Read CSV file using Spark File Streaming
Add FileName and Insert LoadTime
Framework to validate schema across all sources
Improvements to Ingestion - Raw Layer
Cleansing Raw tables - Part 1
Cleansing Raw tables - Part 2
Cleansing Raw tables - Part 3
Cleansing Raw tables - Part 4
Improve validation framework and Create Raw tables
Improve replaceWhere while writing data into Delta
Create DDL for Raw tables
Improve Merge logic and Automate the flow
Fix small file issues in Raw and Cleansed Layer
Improvement in Streaming source
Create Fact Table - 1
Create Fact Table - 2
Create Dimension - 1
Create Dimension - 2
Setup workflow and Jobs in Databricks
Why we should VACUUM Delta tables
Manually run a Notebook Job
Setup Git Repo in Databricks
Create Views and Import in PowerBI to visualize the data
Update Workflows
Update Workflows - Part2
Distribute data files to End users using Python
DeltaLake - Databricks - Code
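Module 2's "Framework to validate schema across all sources" lesson checks incoming files against an expected schema before they land in the Raw/Bronze layer. A hypothetical plain-Python sketch of that check (all names and schemas here are illustrative, not the course's actual framework):

```python
# Hypothetical schema-validation step: compare an incoming file's columns
# against the expected schema for its source before ingestion.

EXPECTED_SCHEMAS = {
    "orders": ["order_id", "customer_id", "amount", "order_date"],
}

def validate_schema(source_name, incoming_columns):
    """Return (ok, missing, unexpected) for an incoming file's columns."""
    expected = set(EXPECTED_SCHEMAS[source_name])
    incoming = set(incoming_columns)
    missing = sorted(expected - incoming)       # required but absent
    unexpected = sorted(incoming - expected)    # present but not expected
    return (not missing and not unexpected, missing, unexpected)

ok, missing, unexpected = validate_schema(
    "orders", ["order_id", "customer_id", "amount", "load_time"]
)
print(ok, missing, unexpected)
# → False ['order_date'] ['load_time']
```

A real framework would typically pull `incoming_columns` from `df.columns` of the just-read Spark DataFrame and fail or quarantine the file when validation returns False.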
Preview - Build Real-Time DeltaLake Project using PySpark and Spark-SQL with Databricks