Engineering

Delta Live Tables vs Classic ETL: Which Fits Your Pipeline?

8 min read·Tags: delta live tables, DLT, databricks, etl, declarative etl, streaming etl, data quality, pyspark

You've built classic ETL pipelines: PySpark jobs, Airflow DAGs, explicit MERGE statements. It works. Then someone on your team mentions Delta Live Tables and you wonder whether it's genuinely better or just new syntax over the same complexity. The answer: DLT solves specific problems very well and introduces different problems of its own. Here's how to evaluate the Delta Live Tables vs classic ETL tradeoff without the hype.

What Each Approach Actually Is

Classic ETL is explicit pipeline code: you write PySpark (or SQL) transformations, wire them together with an orchestrator (Airflow, Prefect, Databricks Workflows), manage dependencies manually, and implement your own error handling and quality checks.

Delta Live Tables (DLT) is Databricks' declarative ETL framework. You define what tables should contain, not how to build them. DLT handles dependency resolution, pipeline execution ordering, quality enforcement, and retry logic. It's opinionated by design.

The fundamental difference: classic ETL is imperative (you control execution), DLT is declarative (you declare expectations and DLT handles execution).
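To make "declarative" concrete, here is a toy sketch of the model in plain Python — this is an illustration of the idea, not DLT's actual internals. Table functions declare what they read; the framework derives execution order itself, which is what "DLT builds the DAG" means in practice. All names here (`table`, `run_pipeline`, the registry) are invented for the sketch.

```python
# Toy sketch of declarative dependency resolution (NOT real DLT internals).
# Table functions declare their inputs; the "framework" topologically sorts them.
from graphlib import TopologicalSorter

registry = {}  # table name -> (builder function, declared dependencies)

def table(name, depends_on=()):
    """Register a table definition instead of executing it immediately."""
    def decorator(fn):
        registry[name] = (fn, tuple(depends_on))
        return fn
    return decorator

@table("bronze_orders")
def bronze_orders():
    # Stand-in for raw ingestion
    return [{"order_id": "A1", "amount": 25.0}, {"order_id": "A2", "amount": -5.0}]

@table("silver_orders", depends_on=["bronze_orders"])
def silver_orders(bronze):
    # Stand-in for a cleaning transformation
    return [r for r in bronze if r["amount"] > 0]

def run_pipeline():
    """Resolve execution order automatically from the declared dependencies."""
    graph = {name: deps for name, (_, deps) in registry.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = registry[name]
        results[name] = fn(*(results[d] for d in deps))
    return results

tables = run_pipeline()
```

Notice the pipeline author never wires `bronze_orders` to run before `silver_orders` — the ordering falls out of the declarations. That is the core of the declarative tradeoff: less wiring, less direct control.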

Feature Comparison

| Dimension | Delta Live Tables | Classic ETL (PySpark + Airflow) |
| --- | --- | --- |
| Paradigm | Declarative — define what, not how | Imperative — define how, step by step |
| Dependency resolution | Automatic (DLT builds the DAG) | Manual (you wire jobs/tasks) |
| Data quality checks | Built-in expectations (warn/drop/fail) | DIY (assert statements, custom checks) |
| Streaming support | Native (batch and streaming in one pipeline) | Structured Streaming (separate setup) |
| Schema evolution | Automatic | Manual handling required |
| Error handling | Built-in retry, quarantine tables | Custom error handling |
| Observability | Pipeline UI with lineage graph | Depends on orchestrator + logging setup |
| Debugging | Harder — less control over execution order | Easier — run individual jobs in isolation |
| Testing | Limited (DLT unit test framework is early stage) | Standard pytest / databricks-connect |
| Flexibility | Constrained — the DLT API is the boundary | Full — write any valid Spark/Python code |
| Multi-platform | Databricks only | Platform-agnostic (runs on any Spark) |
| Learning curve | Low for simple pipelines | High (Spark + Airflow + Delta mastery) |

DLT Pricing

Last verified: March 2026. DLT adds a surcharge on top of standard Databricks DBU costs. Verify current figures at databricks.com/product/pricing.

| Pipeline Type | DLT Surcharge | When to Use |
| --- | --- | --- |
| Core (formerly Classic) | 0.2 DBU/hour additional | Development, simple batch pipelines |
| Pro | 0.25 DBU/hour additional | Change Data Capture (CDC), advanced streaming |
| Advanced | 0.36 DBU/hour additional | Enhanced autoscaling, SLA guarantees |

DLT surcharges are additive to your underlying cluster compute. A medium-sized cluster running DLT Pro pipelines can cost meaningfully more than the equivalent classic jobs — model this before committing, especially for high-frequency streaming pipelines.
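A quick back-of-envelope sketch of that modeling, using the surcharge figures from the table above. The $0.30/DBU rate is an assumed placeholder, not a quoted price — substitute your negotiated rate and verify current surcharges before relying on the numbers.

```python
# Back-of-envelope DLT surcharge estimate. Surcharge figures come from the
# table above; the DBU price here ($0.30) is an ASSUMED placeholder rate.
def monthly_dlt_surcharge(surcharge_dbu_per_hour, dbu_price_usd, hours_per_month=730):
    """Extra monthly cost attributable to the DLT surcharge alone,
    on top of underlying cluster compute."""
    return surcharge_dbu_per_hour * dbu_price_usd * hours_per_month

# Example: Pro tier (0.25 DBU/hour surcharge) for an always-on streaming
# pipeline at an assumed $0.30/DBU — roughly $55/month of pure surcharge,
# before any cluster compute. Scale by cluster size as applicable.
extra = monthly_dlt_surcharge(0.25, 0.30)
```

The point of the exercise: for always-on streaming, the surcharge compounds over every hour of the month, so even a small per-hour figure is worth putting in the spreadsheet.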


DLT in Practice: The Expectations Syntax

The most compelling feature of DLT is data quality expectations. Instead of writing custom assertion code, you declare quality rules that DLT enforces at runtime.

# Python — Delta Live Tables with expectations
# Databricks Runtime with Delta Live Tables

import dlt
from pyspark.sql import functions as F

# Bronze layer: raw ingestion (DLT handles scheduling and incrementalism)
@dlt.table(
    name="bronze_orders",
    comment="Raw order events from the source API — no transformations applied",
    table_properties={"quality": "bronze"}
)
def bronze_orders():
    return (
        spark.readStream
        .format("cloudFiles")          # Auto Loader — handles new file detection
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/datalake/schemas/orders")
        .load("/mnt/datalake/landing/orders/")
    )


# Silver layer: cleaned orders with quality expectations
@dlt.table(
    name="silver_orders",
    comment="Cleaned and validated orders — enforced quality contract",
    table_properties={"quality": "silver"}
)
# expect: record the violation, but keep the row (for monitoring)
@dlt.expect("positive_amount", "amount > 0")
# expect_or_fail: halt the pipeline if ANY row violates this rule
@dlt.expect_or_fail("non_null_order_id", "order_id IS NOT NULL")
# expect_or_drop: silently remove rows that violate this rule
@dlt.expect_or_drop("valid_status", "status IN ('pending', 'shipped', 'delivered', 'cancelled')")
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
        .select(
            F.col("order_id").cast("string"),
            F.col("customer_id").cast("string"),
            F.col("amount").cast("double"),
            F.col("status").cast("string"),
            F.to_timestamp(F.col("order_date"), "yyyy-MM-dd'T'HH:mm:ss").alias("order_date")
        )
    )


# Gold layer: daily revenue aggregation (batch, reads from Silver)
@dlt.table(
    name="gold_daily_revenue",
    comment="Daily revenue aggregated from silver_orders — rebuilt daily"
)
def gold_daily_revenue():
    return (
        dlt.read("silver_orders")
        .groupBy(F.date_trunc("day", F.col("order_date")).alias("date"))
        .agg(
            F.sum("amount").alias("total_revenue"),
            F.count("*").alias("order_count")
        )
        .orderBy("date")
    )

The three expectation modes are the core DLT differentiator:

  • @dlt.expect — log the violation in the pipeline event log, keep the row
  • @dlt.expect_or_fail — stop the pipeline on any violation (good for critical keys)
  • @dlt.expect_or_drop — silently quarantine invalid rows (good for optional fields)

In classic ETL, you'd implement all three modes as custom code — typically 30-50 lines of assertion logic, custom exception handling, and logging setup. DLT handles this in one decorator.
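For a sense of what that custom code looks like, here is a sketch of the three modes in plain Python over rows — a real classic pipeline would express these as PySpark filters and metrics, but the logic DLT's decorators replace is the same. All names here (`apply_expectations`, `QualityViolation`) are illustrative, not from any library.

```python
# Sketch of the custom quality logic that DLT expectations replace.
# Plain Python over row dicts for clarity; a real classic ETL job would
# implement the same three behaviors as PySpark filters plus logging.
import logging

logger = logging.getLogger("quality")

class QualityViolation(Exception):
    """Raised for the expect_or_fail-style critical violations."""

def apply_expectations(rows):
    kept = []
    for row in rows:
        # expect_or_fail equivalent: halt the whole run on a critical violation
        if row.get("order_id") is None:
            raise QualityViolation(f"null order_id: {row!r}")
        # expect_or_drop equivalent: silently remove invalid rows
        if row.get("status") not in {"pending", "shipped", "delivered", "cancelled"}:
            continue
        # expect equivalent: log the violation but keep the row for monitoring
        if not row.get("amount", 0) > 0:
            logger.warning("non-positive amount kept for monitoring: %r", row)
        kept.append(row)
    return kept
```

Multiply this by every table with quality rules, add metrics emission and a quarantine path, and the 30-50 line estimate is conservative.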

Honest Trade-offs

Where DLT Genuinely Wins

Data quality enforcement is legitimately better. The expectations system covers the 80% case with zero boilerplate. The pipeline event log captures every expectation violation with row-level detail — this is something most classic ETL pipelines only have if someone invested significant engineering effort.

Streaming + batch in one framework. DLT abstracts whether a table is streaming or batch — you can switch a table from batch to streaming by changing dlt.read to dlt.read_stream without restructuring the pipeline. Classic ETL keeps these as fundamentally different code paths.

Observability out of the box. The DLT pipeline graph UI shows data flow, lineage, and quality metrics without any setup. Classic pipelines require assembling this from Airflow/Databricks Workflows logs, custom dashboards, and Great Expectations or similar.

Where Classic ETL Wins

Debugging. DLT pipelines are harder to debug in isolation. You can't easily run a single table definition outside the pipeline context. In classic ETL, you run the Spark job directly in a notebook and inspect intermediate DataFrames.

Testing. Unit testing DLT code is an active pain point. The DLT unit testing framework is still evolving. Classic PySpark code is testable with standard pytest and databricks-connect.
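One common workaround — a team pattern, not an official DLT testing API — is to keep the transformation logic in a plain function that the `@dlt.table` wrapper merely calls, so pytest can exercise it without a pipeline. Shown here over plain dicts for a self-contained example; in practice the function would take and return a Spark DataFrame, and `clean_order` is an invented name.

```python
# Pattern sketch: keep business logic in a plain, pipeline-agnostic function.
# The @dlt.table function stays a thin wrapper around it; tests call it directly.
VALID_STATUSES = {"pending", "shipped", "delivered", "cancelled"}

def clean_order(raw):
    """Normalize one raw order record, or return None to drop it."""
    if raw.get("order_id") is None or raw.get("status") not in VALID_STATUSES:
        return None
    return {
        "order_id": str(raw["order_id"]),
        "amount": float(raw["amount"]),
        "status": raw["status"],
    }

# Plain pytest-style tests — no DLT runtime, no cluster required.
def test_clean_order_drops_unknown_status():
    assert clean_order({"order_id": 1, "amount": "3", "status": "bogus"}) is None

def test_clean_order_normalizes_types():
    out = clean_order({"order_id": 7, "amount": "19.5", "status": "shipped"})
    assert out == {"order_id": "7", "amount": 19.5, "status": "shipped"}
```

This doesn't test DLT's own behavior (expectations, incremental processing), but it keeps the logic you wrote testable in isolation — which is most of what goes wrong.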

Flexibility. DLT is constrained to the DLT API surface. If you need custom checkpoint logic, complex conditional branching, or integration with non-Databricks systems, you hit the framework boundaries fast. Classic ETL has no such ceiling.

Portability. DLT is Databricks-only. Classic PySpark runs on any Spark cluster — EMR, GCP Dataproc, a self-hosted cluster. If cross-cloud portability matters, DLT is a lock-in risk.

When to Choose Each

Choose DLT when:

  • You're building streaming pipelines on Databricks and want batch/streaming unified
  • Data quality enforcement is a core requirement and you want it without DIY boilerplate
  • The pipeline follows a clear Bronze → Silver → Gold pattern with defined expectations
  • Your team is Databricks-focused and operational simplicity matters more than flexibility
  • You're comfortable with the Databricks cost premium

Choose Classic ETL when:

  • You need platform portability (may leave Databricks or run on multi-cloud)
  • Debugging and unit testing in isolation are priorities
  • The pipeline logic is complex enough to require full programmatic control
  • You have existing Airflow infrastructure and team expertise
  • The pipeline involves non-Databricks systems or custom checkpointing

The Gold Layer and What Comes After

Whether you use DLT or classic ETL, the Gold-layer tables it produces need to be explored. That's where the pipeline ends and analysis begins — and for teams doing ad-hoc exploration without spinning up a full BI tool, Harbinger Explorer lets you query those tables directly in the browser using DuckDB WASM with natural language query support.

Conclusion

DLT is not a universal upgrade over classic ETL — it's a different set of tradeoffs. If your team is Databricks-native, building streaming lakehouses with data quality requirements, and willing to accept reduced debugging flexibility, DLT is genuinely the right choice. If you need portability, testability, or pipeline complexity that exceeds what the DLT API handles, classic ETL with Airflow remains the more practical option.

The expectation syntax and streaming unification are DLT's real arguments. Evaluate them against your actual pipeline needs, not the abstract promise of "less code."




Continue Reading

Try Harbinger Explorer for free

Connect any API, upload files, and explore with AI — all in your browser. No credit card required.

Start Free Trial