Projects - Rana Nasri Ghazzi

Featured Projects:

A selection of data engineering and analytics work

A portfolio of ETL and Analytics projects built on the Databricks ecosystem using Medallion Architecture. Each project ingests data from a different source platform using a purpose-fit ingestion method, applies transformations through the Silver layer, and delivers a Gold layer ready for AI and analytics workloads. Every pipeline showcases a distinct Databricks capability — reflecting real-world design choices tailored to different data sources, volumes, and business needs.

Several projects include embedded analytics and visualizations to surface data quality issues and deliver actionable insights for business users. All pipelines are production-ready, implemented with DAB (Databricks Asset Bundles) for deployment automation and managed under version control via GitHub.

Note: These projects have been intentionally designed to demonstrate a variety of ingestion modes and methods. Certain pipelines have been restructured (for example, some sources use streaming rather than batch ingestion) to showcase different data engineering concepts across the pipeline inventory.

Data Ingestion Pipeline Inventory

Pipeline Name	Source	Source Type	Ingestion Type	Load Mode	Target Architecture	Table Type	Schedule / Trigger	Monitoring	Data Quality (QA)
AirFlights	Aviation API	JSON	Full refresh	Overwrite	Medallion (Bronze-Silver-Gold)	Delta	Daily batch	Custom logging	Filter rules + row count checks
GA4	Google Analytics 4	Lakehouse Connect - Event Stream	Incremental (DLT live tables)	Append	Medallion (Bronze-Silver-Gold)	Streaming	Daily batch	DLT built-in event log	DLT built-in expectations
Wikimedia	Wikimedia API	JSON	Incremental (DLT live tables)	Append + Merge (SCD)	Medallion (Bronze-Silver-Gold)	Streaming	Daily batch	DLT built-in event log	DLT built-in expectations
Stocks	Alpha Vantage API	JSON	Incremental (DLT live tables)	Append	Medallion (Bronze-Silver-Gold)	Streaming	Daily batch	DLT built-in event log	DLT built-in expectations
IBM	Alpha Vantage API	JSON	Incremental (Structured Streaming)	Append	Medallion (Bronze-Silver-Gold)	Streaming	Daily batch	DLT built-in event log	DLT built-in expectations
Survey	Survey Files	CSV / Excel	Full refresh	Overwrite	Medallion (Bronze-Silver-Gold)	Delta	Weekly batch	Custom logging	Filter rules
Uber	Uber Trip Files	CSV	Full refresh	Overwrite	Medallion (Bronze-Silver-Gold)	Delta	Weekly batch	Custom logging	Filter rules
Cat Breeds	Cat API	JSON	Full refresh	Overwrite	Medallion (Bronze-Silver-Gold)	Delta	Weekly batch	Custom logging	Filter rules

AirFlights

An end-to-end ETL pipeline built with Python, SerpApi, and Delta Lake that ingests raw Google Flights data, transforms it through a medallion architecture, and produces aggregated flight intelligence for trip planning.

Bronze Layer	Silver Layer	Gold Layer
Extracting and storing raw flight data to a Delta table in Databricks.	Processing raw flight data from Bronze tables into cleaned, curated silver tables.	Processing cleaned flight data to create business-ready analytics for round-trip flight combinations
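
As a rough illustration of the Gold-layer idea above, the following sketch pairs outbound and return flights into round-trip combinations and ranks them by total price. It is plain Python rather than the project's Spark and Delta code, and the `Flight` fields, the `round_trips` helper, the `min_stay_days` parameter, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import product


@dataclass(frozen=True)
class Flight:
    # Hypothetical, simplified shape of a cleaned Silver-layer flight row.
    airline: str
    day: date
    price: float


def round_trips(outbound, inbound, min_stay_days=1):
    """Pair each outbound flight with every return flight that departs at
    least `min_stay_days` later, sorted by total price (cheapest first)."""
    combos = [
        (out, ret, out.price + ret.price)
        for out, ret in product(outbound, inbound)
        if (ret.day - out.day).days >= min_stay_days
    ]
    return sorted(combos, key=lambda combo: combo[2])


# Made-up sample data, for illustration only.
outbound = [Flight("AA", date(2025, 6, 1), 320.0), Flight("BB", date(2025, 6, 3), 280.0)]
inbound = [Flight("AA", date(2025, 6, 2), 300.0), Flight("CC", date(2025, 6, 10), 350.0)]
best = round_trips(outbound, inbound, min_stay_days=2)
```

In a Spark implementation the same pairing would typically be a join between outbound and return tables with a date condition, but that is an assumption about the design, not a description of the actual pipeline.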

Stocks ETL Pipeline

A production-ready ETL pipeline for processing stock market data using Databricks Lakehouse architecture. This project implements a medallion architecture (Bronze → Silver → Gold) with automated data quality checks and orchestrated execution.

Bronze Layer	Silver Layer	Gold Layer
Raw, unprocessed stock data ingestion with full historical refresh capability	Cleaned and validated data with quality checks applied	Business-ready aggregations and analytics-optimized datasets
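
To make the Silver-layer quality checks more concrete, here is a minimal plain-Python sketch of expectation-style validation: each rule is a named predicate, rows that fail any rule are dropped, and failures are counted per rule. The rule names, the column names (`symbol`, `close`, `high`, `low`), and the sample rows are assumptions for illustration; the project itself relies on DLT's built-in expectations rather than this helper.

```python
def apply_expectations(rows, rules):
    """Keep rows that pass every rule and count failures per rule name."""
    failures = {name: 0 for name in rules}
    kept = []
    for row in rows:
        row_ok = True
        for name, check in rules.items():
            if not check(row):
                failures[name] += 1
                row_ok = False
        if row_ok:
            kept.append(row)
    return kept, failures


# Hypothetical rules for daily stock rows.
rules = {
    "valid_symbol": lambda r: bool(r.get("symbol")),
    "positive_close": lambda r: r.get("close") is not None and r["close"] > 0,
    "high_ge_low": lambda r: (
        r.get("high") is not None
        and r.get("low") is not None
        and r["high"] >= r["low"]
    ),
}

# Made-up sample rows, not real market data.
bronze_rows = [
    {"symbol": "IBM", "close": 231.5, "high": 233.0, "low": 229.8},
    {"symbol": "", "close": 10.0, "high": 11.0, "low": 9.0},
    {"symbol": "MSFT", "close": -1.0, "high": 5.0, "low": 6.0},
]
silver_rows, failure_counts = apply_expectations(bronze_rows, rules)
```

Counting failures per rule, instead of only dropping rows, is what lets a monitoring layer report which check is degrading over time.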

IBM Stocks

This project is an automated, scheduled ETL pipeline built in Databricks that ingests IBM daily stock data from an external API and processes it through a two-layer Delta Lake architecture (Bronze → Silver).

Bronze Layer	Silver Layer	Dashboard – Tableau
Incremental raw data ingestion that implements Change Data Capture (CDC) using a watermark approach.	Delta upsert pipeline that turns raw ingested data into curated Silver tables.	Interactive Tableau dashboards surfacing IBM stock performance metrics and market trends for business consumption.
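
The watermark and upsert ideas in the Bronze and Silver layers can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's code: `incremental_load` and `upsert` are hypothetical helpers, rows are keyed by an ISO-formatted `date` string, and an in-memory list and dict stand in for the Bronze and Silver Delta tables.

```python
def incremental_load(source_rows, bronze, watermark):
    """Append only the source rows newer than the stored watermark.

    Returns the new watermark: the latest date loaded so far.
    ISO date strings (YYYY-MM-DD) compare correctly as plain strings.
    """
    new_rows = [row for row in source_rows if row["date"] > watermark]
    bronze.extend(new_rows)
    return max((row["date"] for row in new_rows), default=watermark)


def upsert(silver, rows, key="date"):
    """Merge rows into `silver`: update when the key exists, insert otherwise."""
    for row in rows:
        silver[row[key]] = row
    return silver


# Made-up API responses: the second call returns history plus one new day.
api_day1 = [{"date": "2025-01-02", "close": 100.0}, {"date": "2025-01-03", "close": 101.0}]
api_day2 = api_day1 + [{"date": "2025-01-06", "close": 102.0}]

bronze = []
watermark = incremental_load(api_day1, bronze, "1900-01-01")
watermark = incremental_load(api_day2, bronze, watermark)  # loads only 2025-01-06

silver = upsert({}, bronze)
upsert(silver, [{"date": "2025-01-03", "close": 101.5}])  # a late correction
```

On Delta Lake the upsert step would normally be expressed as a `MERGE INTO` statement keyed on the business key; the dict assignment here mirrors only its update-or-insert semantics.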

GA4 Pipeline

This project builds a production-grade data pipeline that ingests raw Google Analytics 4 (GA4) event data, cleans and flattens it, and makes it ready for analysis. It is built on Databricks using Delta Live Tables (DLT) and PySpark, following the Medallion Architecture pattern.

INGESTION	SILVER LAYER	GOLD LAYER
COMING SOON
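
Until the layer write-ups are published, the flattening step described above can be sketched as follows. The sketch assumes events shaped like the GA4 BigQuery export, where `event_params` is a list of key and typed-value records; whether the pipeline's source delivers exactly this shape is an assumption, and `flatten_event` and the `param_` column prefix are hypothetical.

```python
def flatten_event(event):
    """Turn one nested GA4-style event into a flat dict of scalar columns."""
    # Copy top-level scalar fields as-is.
    flat = {k: v for k, v in event.items() if k != "event_params"}
    for param in event.get("event_params", []):
        value = param.get("value", {})
        # Each parameter has several typed slots; take the first populated one.
        scalar = next((v for v in value.values() if v is not None), None)
        flat[f"param_{param['key']}"] = scalar
    return flat


# Made-up sample event for illustration.
raw = {
    "event_name": "page_view",
    "event_timestamp": 1700000000000000,
    "event_params": [
        {"key": "page_title", "value": {"string_value": "Home", "int_value": None}},
        {"key": "engagement_time_msec", "value": {"string_value": None, "int_value": 1200}},
    ],
}
flat = flatten_event(raw)
```

In PySpark the equivalent is usually an explode of the parameter array followed by a pivot, or a set of filtered lookups per parameter key.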

IT Professional Survey

A multi-layer ETL pipeline for analyzing IT professional survey data using Medallion Architecture with Python and PostgreSQL.

ETL	Analysis – EDA
Processing a large IT professional survey dataset through a structured three-layer Medallion Architecture.	In-depth Exploratory Data Analysis (EDA)

Medallion ETL – Dashboard – EDA

Crypto – Medallion ETL	Uber Drive	Cat Breed
Processed cryptocurrency market data through a Medallion ETL pipeline.	Analyzed Uber ride data to improve operational efficiency metrics (dashboard).	Coming soon

Wikimedia Live Edit Stream — Custom PySpark Structured Streaming Pipeline

A real-time streaming data pipeline built on Apache Spark (Databricks) that continuously polls the Wikipedia/Wikimedia API for recent edits and processes them as a structured stream.
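
One detail a polling-based stream has to handle is overlap between consecutive polls. The sketch below shows one way to return only unseen edits from each poll. It assumes each edit carries a unique `rcid` field, as the MediaWiki recent changes API returns; `poll_once` is a hypothetical helper, and `fetch` is a stand-in for the real HTTP call so the example runs offline.

```python
def poll_once(fetch, seen_ids):
    """Call `fetch()` for the latest batch and return only unseen edits."""
    fresh = []
    for edit in fetch():
        if edit["rcid"] not in seen_ids:
            seen_ids.add(edit["rcid"])
            fresh.append(edit)
    return fresh


# Simulated API responses: the second poll overlaps with the first.
batches = iter([
    [{"rcid": 1, "title": "A"}, {"rcid": 2, "title": "B"}],
    [{"rcid": 2, "title": "B"}, {"rcid": 3, "title": "C"}],
])


def fetch():
    return next(batches)


seen = set()
first = poll_once(fetch, seen)
second = poll_once(fetch, seen)
```

A long-running pipeline would need to bound `seen_ids` (for example by tracking only the highest id or a recent time window); the unbounded set is kept here for brevity.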

Databricks Pipeline Monitoring Framework

Built a lightweight, native monitoring framework for Databricks pipelines that tracks both operational health and data quality in a single queryable system. The framework captures per-run metrics — duration, row counts, SLO compliance — alongside granular quality check results, all linked by a shared run_id key stored in two Delta tables in Unity Catalog.
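
The two-table design can be illustrated with a small, self-contained sketch. SQLite from the Python standard library stands in for the Delta tables in Unity Catalog, and the table names (`pipeline_runs`, `quality_checks`), the column names, the `record_run` helper, and the 600-second SLO default are all hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipeline_runs (
    run_id TEXT PRIMARY KEY, pipeline TEXT, duration_s REAL,
    row_count INTEGER, slo_met INTEGER
);
CREATE TABLE quality_checks (
    run_id TEXT, check_name TEXT, passed INTEGER
);
""")


def record_run(pipeline, duration_s, row_count, checks, slo_s=600):
    """Write one run row plus one row per quality check, sharing a run_id."""
    run_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO pipeline_runs VALUES (?, ?, ?, ?, ?)",
        (run_id, pipeline, duration_s, row_count, int(duration_s <= slo_s)),
    )
    conn.executemany(
        "INSERT INTO quality_checks VALUES (?, ?, ?)",
        [(run_id, name, int(ok)) for name, ok in checks.items()],
    )
    return run_id


# Made-up run metrics for illustration.
run_id = record_run(
    "stocks", 42.0, 1000, {"not_null_symbol": True, "positive_close": False}
)

# Join operational health and data quality on the shared run_id.
summary = conn.execute(
    """
    SELECT r.pipeline, r.slo_met, SUM(1 - q.passed) AS failed_checks
    FROM pipeline_runs r
    JOIN quality_checks q ON r.run_id = q.run_id
    GROUP BY r.run_id, r.pipeline, r.slo_met
    """
).fetchall()
```

Keeping run metrics and check results in separate tables avoids repeating run-level columns for every check, while the shared `run_id` keeps a single join enough to answer questions such as "which runs met their SLO but failed a quality check".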