Air Flights Silver Layer – Data Cleaning & Transformation
Tools: Databricks | Pandas | PyArrow | PySpark | Delta Lake
Description:
This notebook implements a Silver layer ETL pipeline for flight booking data using the Medallion Architecture (Bronze → Silver). It processes raw flight data from Bronze tables into cleaned, curated Silver tables.
What it does:
- Reads Bronze data – Loads airflights_outbound and airflights_return tables from workspace.bronze
- Cleans and transforms – Combines flights, adds search route formatting, converts duration strings to minutes, removes duplicates
- Splits by flight type – Separates direct flights (0 stops) from connecting flights (>0 stops)
- Writes to Silver tables – Saves cleaned data to workspace.silver.airflights_direct and workspace.silver.airflights_connecting
- Verifies output – Queries the silver tables to confirm successful load
Schema includes: flight number, airline, origin/destination, times, duration, stops, price, currency, trip type, and fetch timestamp.
