Spark Architecture
Spark Architecture in short
What is Databricks?
Databricks Setup and Introduction
What is RDD?
RDD - Practical
What is a DataFrame and how to create one
Read different types of files in Spark
Types of file formats supported in Apache Spark
Read CSV file
Handle bad records in CSV file while reading
Read JSON file
Read XML file
Read Parquet file
Read Excel File
Read Data from REST API
Read data from SQL Databases
Read ORC and Avro files
Manipulate data using PySpark
Select columns in DataFrame
Add New Column in a DataFrame
Rename Column in a DataFrame
Case when in a DataFrame
Filter a DataFrame
Sort a DataFrame
Drop columns and drop duplicate rows
Handle null values in a DataFrame
Group data in a DataFrame
Handle dates in a DataFrame
Collect list vs Collect set
Explode in a DataFrame
Row_Number vs Rank vs Dense Rank
Join Dataframes
Different types of views
Write data into a table
Managed vs External table
Manipulation - Notebooks
Delta Lake Features
Delta table introduction
Create data in a Delta table
Update data in a Delta table
Delete data from a Delta table
Write data in a Delta table -- OverwriteSchema and MergeSchema
Replace Where while writing data in a Delta table
History and restore of data in a Delta table
Vacuum command in a Delta table
Optimize command -- fix small-file issues
Data Lake vs Delta Lake
Batch vs Streaming data
Batch vs Streaming Data
How to write batch vs streaming data code
Autoloader in detail
Output Mode
Types of triggers
Databricks Utilities
Databricks Utility File System Commands
Databricks Widgets Commands - Parameterize Notebooks
Databricks Notebook Commands
Spark Optimization Tricks
Databricks UI Simulator & Performance Tuning
Databricks UI Simulator
Performance Issues
Runtime Architecture of Spark In Databricks
Runtime Architecture of Spark In Databricks
Skew in Databricks
SKEW UI Simulator
SKEW JOIN Optimization
Spill in Databricks
SPILL UI Simulator
How to solve the shuffling problem?
SHUFFLE UI simulator
Storage UI Simulator
What is Serialization?
Vectorized UDFs
Serialization UI Simulator
What is Z-Order? How to optimize tables?
Process petabytes of data in seconds
When and why should we use the Vacuum command in a Delta table?
Deletion vector
Deletion Vector
Liquid Clustering in Detail
What is Predictive Optimization?
Predictive Optimization
How to do Cluster Tuning in Databricks?
Performance tuning in Databricks
Unity Catalog
What is Unity Catalog
Create Databricks, Azure Data Lake Gen2 and Connector
Set up Access Connector and create Metastore
Create external location and storage credential
Create catalog, schema and table - External and Managed
What is Lineage
Unity Catalog Delta Sharing (Non Databricks Customer)
Different Compute Types
How to create and manage Groups
SQL commands for Unity Catalog
Masking columns - Sensitive Data
Row Level Access Control in Unity Catalog table
Create SPN and Grant access
Utilities and Framework
What are init scripts
How to do logging in Databricks notebooks?
Introduction of Pytest in PySpark Databricks
Build your first package and Import in Databricks
Write PyTest in Python for PySpark
Integrate Pytest (Test cases) in CI-CD Pipeline
Unit Testing in PySpark
Testing of Databricks Notebook | Unit Testing
Delta Live Tables
Delta Live Tables - Introduction
Delta Live Tables: views, tables, streaming
How to implement data quality checks in DLT
End to End Delta Live Tables
Databricks Workflows and CI-CD
Git Integration in Databricks
Databricks Notebooks Deployment
Create Jobs and Workflows
Deploy workflows using GitHub Actions
Workflows with SPN
Hugging Face LLM model application using Databricks
Architecture and Problem Statement
Cleaning the data
Creating Embeddings in Silver Layer
Save data into VectorDB
Download IMDb data file
Preview - Databricks Certified Data Engineer - Zero to Hero