{"id":5165,"date":"2026-05-30T23:24:31","date_gmt":"2026-05-30T23:24:31","guid":{"rendered":"https:\/\/ranaghazzi.com\/?p=5165"},"modified":"2026-05-30T23:25:30","modified_gmt":"2026-05-30T23:25:30","slug":"auto-loader-the-smartest-way-to-ingest-streaming-data-in-databricks","status":"publish","type":"post","link":"https:\/\/ranaghazzi.com\/?p=5165","title":{"rendered":"Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks"},"content":{"rendered":"<p><style>\n    .light-font-container, .light-font-container p, .light-font-container h2, .light-font-container li {<br \/>\n        font-weight: #FFFFFF !important;<br \/>\n    }<br \/>\n<\/style>\n<\/p>\n<div class=\"light-font-container\" style=\"background-color: #FFFFFF; padding: 40px; border-radius: 15px;\">\n\n\n<h2 class=\"wp-block-heading\">Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks<\/h2>\n\n\n<div class=\"wp-block-post-date\"><time datetime=\"2026-05-30T23:25:25.619Z\">May 30, 2026<\/time><\/div>\n\n\n<p class=\"wp-block-paragraph\">If you&#8217;ve ever built a data pipeline that ingests files from cloud storage, you know the pain: polling for new files, tracking what&#8217;s already been processed, handling duplicates, and scaling when data volumes spike. Databricks Auto Loader was built to solve exactly these problems \u2014 elegantly and at scale.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">What Is Auto Loader?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Auto Loader is a Databricks-native structured streaming source that incrementally and efficiently ingests new data files as they arrive in cloud storage (S3, ADLS, GCS). 
It&#8217;s built on top of Apache Spark&#8217;s Structured Streaming engine and handles all the complexity of file discovery, state tracking, and schema management for you.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At its core, Auto Loader answers one question: &#8220;Which files have arrived since I last ran?&#8221; \u2014 and answers it reliably, at any scale.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">How It Works<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Auto Loader offers two file discovery modes:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">1. Directory Listing Mode (Default)<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Periodically lists the contents of the source directory and compares it against already-processed files stored in a checkpoint location. Simple to set up, works everywhere, but can become slow for directories with millions of files.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">2. File Notification Mode (Recommended for Scale)<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Uses cloud-native event services (AWS SNS + SQS, Azure Event Grid, GCS Pub\/Sub) to receive real-time notifications when new files arrive. 
This is far more efficient at scale since it avoids full directory scans entirely.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>df = (spark.readStream\n        .format(\"cloudFiles\")\n        .option(\"cloudFiles.format\", \"json\")\n        .option(\"cloudFiles.schemaLocation\", \"\/mnt\/schema\/orders\")\n        .load(\"\/mnt\/raw\/orders\/\"))<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The key format name is cloudFiles \u2014 that&#8217;s Auto Loader&#8217;s identifier in Spark.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Key Features<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Exactly-Once Processing<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Auto Loader uses Spark checkpointing to track which files have been processed. If your pipeline crashes mid-run, it resumes from exactly where it left off \u2014 no duplicates, no missed files.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Automatic Schema Inference &amp; Evolution<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">One of Auto Loader&#8217;s standout features. It infers the schema from your data on the first run and stores it at the schemaLocation. When new columns appear in incoming files, it can:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Fail<\/strong> and stay stopped until you update the schema or remove the offending file (failOnNewColumns)<\/li>\n\n\n\n<li><strong>Rescue<\/strong> unexpected data into a _rescued_data column without changing the schema (rescue)<\/li>\n\n\n\n<li><strong>Add<\/strong> new columns to the schema (addNewColumns, the default when you don&#8217;t supply a schema): the stream stops on the first file that carries them and picks them up on restart<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>.option(\"cloudFiles.schemaEvolutionMode\", \"addNewColumns\")<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Scalable File Discovery<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Auto Loader can handle billions of files efficiently. 
In notification mode, it processes file arrival events rather than scanning directories \u2014 a critical advantage for high-volume pipelines.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Built-in Metadata Column<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Every ingested row carries a hidden _metadata column with useful context; it only appears in the output when you select it explicitly:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>df.select(\n    \"_metadata.file_path\",\n    \"_metadata.file_name\",\n    \"_metadata.file_modification_time\",\n    \"_metadata.file_size\"\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This makes auditing, debugging, and lineage tracking trivial.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">A Complete Auto Loader Pipeline<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>from pyspark.sql.functions import current_timestamp\n\n# 1. Ingest with Auto Loader\nraw_df = (spark.readStream\n    .format(\"cloudFiles\")\n    .option(\"cloudFiles.format\", \"parquet\")\n    .option(\"cloudFiles.schemaLocation\", \"\/mnt\/checkpoints\/orders\/schema\")\n    .option(\"cloudFiles.schemaEvolutionMode\", \"addNewColumns\")\n    .option(\"cloudFiles.inferColumnTypes\", \"true\")\n    .load(\"\/mnt\/raw\/orders\/\"))\n\n# 2. Add ingestion metadata\nenriched_df = raw_df.withColumn(\"ingested_at\", current_timestamp()) \\\n                    .withColumn(\"source_file\", raw_df&#91;\"_metadata.file_name\"])\n\n# 3. 
Write to Delta Lake\n(enriched_df.writeStream\n    .format(\"delta\")\n    .option(\"checkpointLocation\", \"\/mnt\/checkpoints\/orders\/stream\")\n    .option(\"mergeSchema\", \"true\")\n    .trigger(availableNow=True)          # batch-style trigger\n    .toTable(\"silver.orders\"))<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Trigger Modes<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Auto Loader supports multiple trigger strategies depending on your latency and cost requirements:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Trigger<\/th><th>Behavior<\/th><th>Best For<\/th><\/tr><\/thead><tbody><tr><td>availableNow=True<\/td><td>Processes all backlog, then stops<\/td><td>Scheduled batch jobs<\/td><\/tr><tr><td>processingTime=\"5 minutes\"<\/td><td>Starts a micro-batch every 5 minutes, continuously<\/td><td>Near real-time pipelines<\/td><\/tr><tr><td>once=True <em>(deprecated)<\/em><\/td><td>Processes the backlog in a single batch, then stops; superseded by availableNow<\/td><td>Legacy pipelines<\/td><\/tr><tr><td>Default (no trigger)<\/td><td>Runs as fast as possible<\/td><td>True streaming use cases<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">availableNow is the recommended pattern for most production pipelines \u2014 it gives you incremental batch behavior with the simplicity of streaming.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Auto Loader vs. COPY INTO<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Databricks offers two incremental ingestion approaches. 
Here&#8217;s when to use each:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><\/th><th>Auto Loader<\/th><th>COPY INTO<\/th><\/tr><\/thead><tbody><tr><td>Volume<\/td><td>Millions+ of files<\/td><td>Thousands of files<\/td><\/tr><tr><td>Mode<\/td><td>Streaming<\/td><td>Batch SQL<\/td><\/tr><tr><td>Schema evolution<\/td><td>Built-in<\/td><td>Manual<\/td><\/tr><tr><td>File tracking<\/td><td>Checkpoint-based<\/td><td>Internal state<\/td><\/tr><tr><td>Best for<\/td><td>Continuous pipelines<\/td><td>Ad-hoc \/ scheduled loads<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">For large-scale, production-grade pipelines, Auto Loader is the clear winner.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Best Practices<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Always set a schemaLocation \u2014 even if you define the schema manually. It protects against schema drift causing silent failures downstream.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Use _rescued_data in production to capture unexpected columns rather than failing the entire stream:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>.option(\"cloudFiles.schemaEvolutionMode\", \"rescue\")<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Partition your source paths when possible. 
Loading from \/raw\/orders\/date=2026-05-30\/ instead of \/raw\/orders\/ limits the file scan scope dramatically.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Separate checkpoint and schema locations to keep things organized:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/checkpoints\/{pipeline}\/schema\/   \u2190 schemaLocation\n\/checkpoints\/{pipeline}\/stream\/   \u2190 checkpointLocation<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Use availableNow with a job scheduler (like Databricks Workflows) rather than running a perpetual streaming cluster \u2014 it&#8217;s cheaper and operationally simpler for most batch-oriented pipelines.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">When Not to Use Auto Loader<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Auto Loader is a file-based ingestion tool. It&#8217;s not the right choice when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your source is a message queue (Kafka, Kinesis) \u2014 use the native Spark connectors instead.<\/li>\n\n\n\n<li>You need full table replication from a database \u2014 use tools like Debezium or Fivetran.<\/li>\n\n\n\n<li>You&#8217;re reading from Delta tables themselves \u2014 use Delta&#8217;s native streaming (readStream.format(\"delta\")).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Wrapping Up<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Auto Loader removes the operational burden of building and maintaining incremental file ingestion pipelines. 
Schema evolution, exactly-once semantics, scalable file discovery, and deep Delta Lake integration make it the backbone of most modern Databricks medallion architectures.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If you&#8217;re landing files in cloud storage and loading them into Delta Lake, Auto Loader is almost certainly the right tool \u2014 and cloudFiles is the single format name that unlocks all of it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks If you&#8217;ve ever built a data pipeline that ingests files from cloud storage, you know the pain: polling for new files, tracking what&#8217;s already been processed, handling duplicates, and scaling when data volumes spike. Databricks Auto Loader was built to solve exactly these problems [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-5165","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks - Rana Nasri Ghazzi<\/title>\n<meta name=\"description\" content=\"Explore Rana Ghazzi&#039;s data analytics portfolio \u2014 dashboards, visualizations, and insights built with Tableau, Power BI &amp; Python.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ranaghazzi.com\/?p=5165\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks - Rana Nasri 
Ghazzi\" \/>\n<meta property=\"og:description\" content=\"Explore Rana Ghazzi&#039;s data analytics portfolio \u2014 dashboards, visualizations, and insights built with Tableau, Power BI &amp; Python.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ranaghazzi.com\/?p=5165\" \/>\n<meta property=\"og:site_name\" content=\"Rana Nasri Ghazzi\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-30T23:24:31+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-30T23:25:30+00:00\" \/>\n<meta name=\"author\" content=\"Rana Ghazzi\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rana Ghazzi\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/?p=5165#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/?p=5165\"},\"author\":{\"name\":\"Rana Ghazzi\",\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/#\\\/schema\\\/person\\\/d8ee34f53cb0df9faaf816fb5363a4cc\"},\"headline\":\"Auto Loader: The Smartest Way to Ingest Streaming Data in 
Databricks\",\"datePublished\":\"2026-05-30T23:24:31+00:00\",\"dateModified\":\"2026-05-30T23:25:30+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/?p=5165\"},\"wordCount\":772,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/#\\\/schema\\\/person\\\/d8ee34f53cb0df9faaf816fb5363a4cc\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ranaghazzi.com\\\/?p=5165#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/?p=5165\",\"url\":\"https:\\\/\\\/ranaghazzi.com\\\/?p=5165\",\"name\":\"Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks - Rana Nasri Ghazzi\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/#website\"},\"datePublished\":\"2026-05-30T23:24:31+00:00\",\"dateModified\":\"2026-05-30T23:25:30+00:00\",\"description\":\"Explore Rana Ghazzi's data analytics portfolio \u2014 dashboards, visualizations, and insights built with Tableau, Power BI & Python.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/?p=5165#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ranaghazzi.com\\\/?p=5165\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/?p=5165#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/ranaghazzi.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/#website\",\"url\":\"https:\\\/\\\/ranaghazzi.com\\\/\",\"name\":\"Rana Nasri Ghazzi\",\"description\":\"Turning Data into 
Decisions\",\"publisher\":{\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/#\\\/schema\\\/person\\\/d8ee34f53cb0df9faaf816fb5363a4cc\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ranaghazzi.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/#\\\/schema\\\/person\\\/d8ee34f53cb0df9faaf816fb5363a4cc\",\"name\":\"Rana Ghazzi\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/logo.png\",\"url\":\"https:\\\/\\\/ranaghazzi.com\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/logo.png\",\"contentUrl\":\"https:\\\/\\\/ranaghazzi.com\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/logo.png\",\"width\":1024,\"height\":1024,\"caption\":\"Rana Ghazzi\"},\"logo\":{\"@id\":\"https:\\\/\\\/ranaghazzi.com\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/logo.png\"},\"url\":\"https:\\\/\\\/ranaghazzi.com\\\/?author=2\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks - Rana Nasri Ghazzi","description":"Explore Rana Ghazzi's data analytics portfolio \u2014 dashboards, visualizations, and insights built with Tableau, Power BI & Python.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ranaghazzi.com\/?p=5165","og_locale":"en_US","og_type":"article","og_title":"Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks - Rana Nasri Ghazzi","og_description":"Explore Rana Ghazzi's data analytics portfolio \u2014 dashboards, visualizations, and insights built with Tableau, Power BI & Python.","og_url":"https:\/\/ranaghazzi.com\/?p=5165","og_site_name":"Rana Nasri Ghazzi","article_published_time":"2026-05-30T23:24:31+00:00","article_modified_time":"2026-05-30T23:25:30+00:00","author":"Rana Ghazzi","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Rana Ghazzi","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ranaghazzi.com\/?p=5165#article","isPartOf":{"@id":"https:\/\/ranaghazzi.com\/?p=5165"},"author":{"name":"Rana Ghazzi","@id":"https:\/\/ranaghazzi.com\/#\/schema\/person\/d8ee34f53cb0df9faaf816fb5363a4cc"},"headline":"Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks","datePublished":"2026-05-30T23:24:31+00:00","dateModified":"2026-05-30T23:25:30+00:00","mainEntityOfPage":{"@id":"https:\/\/ranaghazzi.com\/?p=5165"},"wordCount":772,"commentCount":0,"publisher":{"@id":"https:\/\/ranaghazzi.com\/#\/schema\/person\/d8ee34f53cb0df9faaf816fb5363a4cc"},"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ranaghazzi.com\/?p=5165#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ranaghazzi.com\/?p=5165","url":"https:\/\/ranaghazzi.com\/?p=5165","name":"Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks - Rana Nasri Ghazzi","isPartOf":{"@id":"https:\/\/ranaghazzi.com\/#website"},"datePublished":"2026-05-30T23:24:31+00:00","dateModified":"2026-05-30T23:25:30+00:00","description":"Explore Rana Ghazzi's data analytics portfolio \u2014 dashboards, visualizations, and insights built with Tableau, Power BI & Python.","breadcrumb":{"@id":"https:\/\/ranaghazzi.com\/?p=5165#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ranaghazzi.com\/?p=5165"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ranaghazzi.com\/?p=5165#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ranaghazzi.com\/"},{"@type":"ListItem","position":2,"name":"Auto Loader: The Smartest Way to Ingest Streaming Data in Databricks"}]},{"@type":"WebSite","@id":"https:\/\/ranaghazzi.com\/#website","url":"https:\/\/ranaghazzi.com\/","name":"Rana Nasri Ghazzi","description":"Turning Data into 
Decisions","publisher":{"@id":"https:\/\/ranaghazzi.com\/#\/schema\/person\/d8ee34f53cb0df9faaf816fb5363a4cc"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ranaghazzi.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/ranaghazzi.com\/#\/schema\/person\/d8ee34f53cb0df9faaf816fb5363a4cc","name":"Rana Ghazzi","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ranaghazzi.com\/wp-content\/uploads\/2025\/11\/logo.png","url":"https:\/\/ranaghazzi.com\/wp-content\/uploads\/2025\/11\/logo.png","contentUrl":"https:\/\/ranaghazzi.com\/wp-content\/uploads\/2025\/11\/logo.png","width":1024,"height":1024,"caption":"Rana Ghazzi"},"logo":{"@id":"https:\/\/ranaghazzi.com\/wp-content\/uploads\/2025\/11\/logo.png"},"url":"https:\/\/ranaghazzi.com\/?author=2"}]}},"_links":{"self":[{"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=\/wp\/v2\/posts\/5165","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5165"}],"version-history":[{"count":7,"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=\/wp\/v2\/posts\/5165\/revisions"}],"predecessor-version":[{"id":5175,"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=\/wp\/v2\/posts\/5165\/revisions\/5175"}],"wp:attachment":[{"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5165"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=%2Fwp%2Fv
2%2Fcategories&post=5165"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ranaghazzi.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5165"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}