CI/CD on Databricks
Databricks Asset Bundles (DABs)
CI/CD on Databricks centers on automating the testing and deployment of data pipelines, notebooks, and jobs across environments using Databricks Asset Bundles (DABs) as the primary infrastructure-as-code tool, with everything version-controlled in Git.
The CI side covers linting, unit testing with pytest or Nutter, and bundle validation on every code push, while the CD side automates promotion from dev to staging to prod using platform tools like GitHub Actions or Azure DevOps.
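As a minimal sketch of the unit-testing step, assuming transformation logic is kept in plain Python functions so it can be tested without a cluster: `normalize_orders` and its fields are hypothetical names used only for illustration.

```python
# Hypothetical transformation under test; the name and fields are illustrative.
def normalize_orders(rows):
    """Drop rows without an order_id and normalize the status field."""
    cleaned = []
    for row in rows:
        if row.get("order_id") is None:
            continue  # skip malformed records
        status = str(row.get("status", "")).strip().lower()
        cleaned.append({**row, "status": status})
    return cleaned


# pytest collects any function named test_*; plain asserts are enough.
def test_normalize_orders_drops_missing_ids_and_normalizes_status():
    rows = [
        {"order_id": 1, "status": " SHIPPED "},
        {"order_id": None, "status": "NEW"},
    ]
    assert normalize_orders(rows) == [{"order_id": 1, "status": "shipped"}]
```

Running pytest against a file containing this code on every push would cover the unit-test part of CI; linting and bundle validation would run as separate steps.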
Credentials are managed through service principals and Databricks Secrets (never hardcoded), and Unity Catalog provides clean environment separation at the data layer.
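One way to keep credentials out of the code is to have the CI system inject them as environment variables and fail fast when any are missing. The sketch below assumes OAuth machine-to-machine authentication for a service principal; the helper name `load_service_principal_env` is hypothetical.

```python
import os

# Environment variables read by Databricks unified authentication for
# OAuth machine-to-machine (service principal) login. Assumption: the CI
# system populates these from its own secret store.
REQUIRED_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
)


def load_service_principal_env(env=None):
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing CI secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling this at the start of a deploy script makes a misconfigured pipeline fail with a clear message rather than partway through a deployment.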
Key best practices include:
- idempotent jobs
- separate workspaces per environment
- all deploys running through the pipeline (never from a local machine)
- post-deploy observability via job run history, Delta table history, and external monitoring tools
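To illustrate what idempotent means for a job, here is a small in-memory sketch of a keyed upsert: running it twice with the same batch leaves the result unchanged. It is only an analogy for a merge-by-key write to a table; the function and data are hypothetical.

```python
def upsert(target, batch, key="id"):
    """Merge batch into target by key; repeating the same batch changes nothing."""
    merged = {row[key]: row for row in target}
    for row in batch:
        merged[row[key]] = row  # insert new keys, overwrite existing ones
    return sorted(merged.values(), key=lambda row: row[key])


table = [{"id": 1, "amount": 10}]
batch = [{"id": 1, "amount": 12}, {"id": 2, "amount": 5}]

once = upsert(table, batch)
twice = upsert(once, batch)  # simulate a retried job run
assert once == twice
```

An append-only write would produce duplicates on the retry, which is why a keyed merge (or an overwrite of a well-defined partition) is the usual choice for jobs that may be re-run.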
The recommended modern stack combines GitHub Actions, DABs, Unity Catalog, and MLflow for end-to-end automated delivery of Databricks workloads.
DABs are the recommended modern approach and replace legacy databricks-cli deploys.
- Version-controlled as YAML in your repo
- Define everything in databricks.yml
- Supports multi-environment targets (dev, staging, prod)
- Deploy with: databricks bundle deploy --target prod
- Run with: databricks bundle run <job-or-pipeline-key>
- Supports: Jobs, DLT pipelines, notebooks, Python wheels, Model Serving endpoints
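
The deploy step above can be sketched as a small script that a CI job calls with the target name. This assumes the Databricks CLI is installed on the runner and that databricks.yml defines targets named dev, staging, and prod; the helper names are hypothetical.

```python
import subprocess

# Assumed target names; they must match the targets in databricks.yml.
TARGETS = ("dev", "staging", "prod")


def bundle_commands(target):
    """Return the validate and deploy commands for one bundle target."""
    if target not in TARGETS:
        raise ValueError(f"Unknown target: {target!r}")
    return [
        ["databricks", "bundle", "validate", "--target", target],
        ["databricks", "bundle", "deploy", "--target", target],
    ]


def deploy(target, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for command in bundle_commands(target):
        runner(command, check=True)
```

A pipeline step would then call `deploy("staging")` or `deploy("prod")`. Validating before deploying means a malformed bundle fails before anything in the workspace is changed, and the `runner` parameter exists only so the function can be exercised without the CLI.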
