IBM Stocks ETL – Silver layer
Tools: Databricks | Pyspark | Pandas| Numpy | delta.tables
Description :
The Silver layer takes the Bronze data and produces a clean, curated, always-current version of each record. It applies type casting, null validation, and deduplication, then uses a Delta MERGE (upsert) operation keyed on the date field — updating existing records if they’ve changed and inserting new ones if they haven’t been seen before.
Task 3: Silver_Merge
Notebook: IBM _silver
Dependencies: Waits for Auto_Loader_bronze to complete
What it does:
- Loads configuration from config_Parms
- Creates silver table schema if it doesn’t exist
- Reads from workspace.bronze.ibm
- Data Quality Transformations:
- Casts string columns to proper types (DATE, DOUBLE, BIGINT, TIMESTAMP)
- Drops null values on critical columns (Date, Open, High, Low, Close, Volume)
- Deduplicates by Date (keeps latest record per date)
- Filters invalid data (Close > 0)
- Adds ingested_at timestamp
- MERGE operation into workspace.silver.ibm:
- If Date exists: UPDATE all columns
- If Date is new: INSERT new row
- Result: Clean, typed, deduplicated data ready for analytics
Output: The end result is a reliable, incrementally-updated data foundation ready to power business intelligence tools and real-time dashboards with accurate, deduplicated IBM stock history.
