Skip to content

Rana Nasri Ghazzi

Data Analyst & Visualization Expert

Turning Data into Decisions

Home
Portfolio
Projects
Contact

Share on Facebook
Share on LinkedIn
Share on WhatsApp
Print this Page
Email this Page

IBM Stocks – ETL Pipeline

IBM Stocks ETL – Silver layer

Tools: Databricks | Pyspark | Pandas| Numpy | delta.tables

Description :

The Silver layer takes the Bronze data and produces a clean, curated, always-current version of each record. It applies type casting, null validation, and deduplication, then uses a Delta MERGE (upsert) operation keyed on the date field — updating existing records if they’ve changed and inserting new ones if they haven’t been seen before.

Task 3: Silver_Merge

Notebook: IBM _silver
Dependencies: Waits for Auto_Loader_bronze to complete

What it does:

Loads configuration from config_Parms
Creates silver table schema if it doesn’t exist
Reads from workspace.bronze.ibm
Data Quality Transformations:
- Casts string columns to proper types (DATE, DOUBLE, BIGINT, TIMESTAMP)
- Drops null values on critical columns (Date, Open, High, Low, Close, Volume)
- Deduplicates by Date (keeps latest record per date)
- Filters invalid data (Close > 0)
- Adds ingested_at timestamp
MERGE operation into workspace.silver.ibm:
- If Date exists: UPDATE all columns
- If Date is new: INSERT new row
Result: Clean, typed, deduplicated data ready for analytics

Output: The end result is a reliable, incrementally-updated data foundation ready to power business intelligence tools and real-time dashboards with accurate, deduplicated IBM stock history.

Home
Portfolio
Projects
Contact