Harbinger Explorer


ETL vs ELT: Which Pipeline Fits Your Data Stack?

Tags: etl, elt, data pipelines, dbt, data warehouse, snowflake, data engineering

Every data team eventually hits this decision: do you transform data before loading it into your warehouse, or after? The answer shapes your architecture, your costs, and how fast your analysts get answers. ETL and ELT sound nearly identical, but the difference between them determines whether your pipeline is a bottleneck or a backbone.

TL;DR — ETL vs ELT at a Glance

| Dimension | ETL | ELT |
| --- | --- | --- |
| Transform location | Dedicated middleware / staging server | Inside the target warehouse |
| Compute used | External (Informatica, Talend, SSIS) | Warehouse engine (Snowflake, BigQuery, Databricks) |
| Raw data preserved? | Usually not — only transformed output lands | Yes — raw data lands first, then transforms run |
| Latency | Higher (transform before load) | Lower (load first, transform on demand) |
| Scalability | Limited by middleware server capacity | Scales with warehouse compute |
| Schema flexibility | Schema-on-write (rigid) | Schema-on-read (flexible) |
| Cost model | Middleware licensing + server infra | Warehouse compute costs |
| Best for | Legacy systems, strict compliance, small volumes | Cloud warehouses, large volumes, iterative analytics |
| Typical tools | Informatica, SSIS, Talend, DataStage | dbt, Spark SQL, Snowflake tasks, BigQuery scheduled queries |

What Is ETL?

ETL — Extract, Transform, Load — is the traditional approach. Data is pulled from source systems, transformed on a dedicated middleware server, and only the cleaned, structured result is loaded into the target database.

This was the standard for decades because on-premise data warehouses (Teradata, Oracle, Netezza) had expensive, limited compute. Running heavy transformations inside the warehouse was wasteful. It made sense to do the heavy lifting elsewhere.

A Typical ETL Flow

Source DBs → Extract → Staging Server (transform) → Load → Data Warehouse

The transformation step handles deduplication, type casting, joins, business logic, and aggregation — all before data touches the warehouse.

ETL Example — SSIS-Style Pseudoflow (SQL Server)

-- SQL Server / SSIS: Transform in staging before loading to warehouse
-- Step 1: Extract into staging table
SELECT 
    order_id,
    customer_id,
    order_date,
    total_amount,
    status
INTO staging.raw_orders
FROM OPENROWSET('SQLNCLI', 'Server=source_erp;Database=sales;',
    'SELECT order_id, customer_id, order_date, total_amount, status FROM orders WHERE order_date >= DATEADD(day, -1, GETDATE())')

-- Step 2: Transform in staging (clean + enrich before loading)
SELECT 
    o.order_id,
    o.customer_id,
    c.customer_segment,
    CAST(o.order_date AS DATE) AS order_date,
    ROUND(o.total_amount, 2) AS total_amount,
    CASE 
        WHEN o.status = 'C' THEN 'Completed'
        WHEN o.status = 'P' THEN 'Pending'
        WHEN o.status = 'X' THEN 'Cancelled'
        ELSE 'Unknown'
    END AS order_status,
    GETDATE() AS loaded_at
INTO staging.transformed_orders
FROM staging.raw_orders o
LEFT JOIN staging.customer_dim c ON o.customer_id = c.customer_id  -- LEFT JOIN so orders without a customer match aren't silently dropped

-- Step 3: Load only the clean result into the warehouse
INSERT INTO warehouse.fact_orders
SELECT * FROM staging.transformed_orders

-- Step 4: Clean up staging
DROP TABLE staging.raw_orders
DROP TABLE staging.transformed_orders

The raw data is gone after loading. If you discover a bug in the transform logic three months later, you're re-extracting from source — if the data still exists there.

What Is ELT?

ELT — Extract, Load, Transform — flips the order. Raw data lands in the warehouse first, then transformations run inside it using the warehouse's own compute engine.

This approach became viable (and dominant) when cloud data warehouses like Snowflake, BigQuery, and Databricks made compute elastic and cheap. Why maintain a separate transformation server when your warehouse can spin up a cluster in seconds?

A Typical ELT Flow

Source DBs → Extract → Load (raw) → Data Warehouse → Transform (inside warehouse)

ELT Example — dbt + Snowflake (Snowflake SQL)

-- Snowflake SQL: Raw data already loaded into raw.orders by an ingestion tool (Fivetran, Airbyte, etc.)

-- dbt model: models/staging/stg_orders.sql
-- Transforms run inside Snowflake using warehouse compute

WITH source AS (
    SELECT * FROM {{ source('raw', 'orders') }}
),

cleaned AS (
    SELECT
        order_id,
        customer_id,
        order_date::DATE AS order_date,
        ROUND(total_amount, 2) AS total_amount,
        CASE 
            WHEN status = 'C' THEN 'Completed'
            WHEN status = 'P' THEN 'Pending'
            WHEN status = 'X' THEN 'Cancelled'
            ELSE 'Unknown'
        END AS order_status,
        CURRENT_TIMESTAMP() AS transformed_at
    FROM source
    WHERE order_date >= DATEADD(day, -1, CURRENT_DATE())
)

SELECT * FROM cleaned

The raw data stays in raw.orders. The transformed view sits alongside it. If your business logic changes, you rerun the dbt model — no re-extraction needed.
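In dbt that rerun is a single `dbt run --select stg_orders`. Under the hood it amounts to re-executing the transform SQL against the preserved raw table, roughly the following sketch (table names are hypothetical, and a new status rule has been added after the fact to show why this matters):

```sql
-- Rebuild the model from raw data still sitting in the warehouse.
-- No connection back to the source ERP is needed.
CREATE OR REPLACE TABLE analytics.stg_orders AS
SELECT
    order_id,
    customer_id,
    order_date::DATE AS order_date,
    ROUND(total_amount, 2) AS total_amount,
    CASE
        WHEN status = 'C' THEN 'Completed'
        WHEN status = 'P' THEN 'Pending'
        WHEN status = 'X' THEN 'Cancelled'
        WHEN status = 'R' THEN 'Refunded'  -- business rule discovered months later
        ELSE 'Unknown'
    END AS order_status
FROM raw.orders;  -- the raw layer that ELT preserved
```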

When ETL Still Makes Sense

ETL isn't dead. It's the right choice in specific scenarios:

Compliance and data minimization. If regulations require that raw PII never land in your warehouse (GDPR Art. 5, HIPAA), transforming before load — masking, pseudonymizing, or dropping sensitive fields — is a legitimate pattern. You can't redact what's already landed.
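A sketch of what that masking step can look like in the staging database, using SQL Server-style syntax to match the earlier example (table and column names are hypothetical):

```sql
-- Pseudonymize PII in staging; only the masked table is loaded downstream.
SELECT
    customer_id,
    HASHBYTES('SHA2_256', LOWER(email)) AS email_hash,  -- irreversible pseudonym
    LEFT(postal_code, 3) AS postal_prefix,              -- coarsened location
    signup_date
INTO staging.customers_masked
FROM source.customers;
-- Raw email addresses never cross into the warehouse.
```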

Legacy on-premise warehouses. If you're running Teradata or Oracle on fixed hardware, compute is genuinely expensive. Pre-transforming data to reduce warehouse load is rational economics, not nostalgia.

Small, stable datasets. If you're loading a 10-row config table from a REST API once a day, the ETL vs ELT distinction is academic. Use whatever's simpler.

Bandwidth-constrained environments. When loading data over a slow network, transforming first to reduce volume makes pragmatic sense.
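For example, collapsing raw events into daily rollups at the source side can cut transfer volume by orders of magnitude. A sketch with hypothetical tables:

```sql
-- Aggregate before transfer: ship daily totals instead of raw event rows.
SELECT
    CAST(event_time AS DATE) AS event_date,
    device_id,
    COUNT(*) AS event_count,
    SUM(payload_kb) AS total_payload_kb
INTO staging.daily_device_rollup
FROM source.telemetry_events
GROUP BY CAST(event_time AS DATE), device_id;
```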

When ELT Is the Better Choice

For most modern cloud-native teams, ELT is the default for good reasons:

Raw data preservation. The single biggest advantage. When (not if) your business logic changes, you don't need to re-extract from source. The raw data is already in your warehouse. This alone can save a team weeks of rework per year.

Warehouse-native compute scales. Snowflake, BigQuery, and Databricks auto-scale. You're not bottlenecked by a middleware server's RAM and CPU. A transformation that takes 45 minutes on an ETL server might take 90 seconds on Snowflake with an XL warehouse.
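In Snowflake, for instance, transforms can get their own elastically sized warehouse that you resize per job (warehouse name and sizes are illustrative):

```sql
-- Dedicated compute for transforms; suspends itself when idle.
CREATE WAREHOUSE IF NOT EXISTS transform_wh
    WAREHOUSE_SIZE = 'XLARGE'
    AUTO_SUSPEND = 60      -- seconds of idle time before suspending
    AUTO_RESUME = TRUE;

-- Burst up for a heavy backfill, then scale back down.
ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'XXLARGE';
-- ...run the backfill...
ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'XLARGE';
```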

Analyst self-service. When raw data lives in the warehouse, analysts can explore it directly. They can build their own staging models in dbt without waiting for the ETL team to add fields to the extraction.

Iterative development. Changing a dbt model and rerunning it takes minutes. Changing an Informatica mapping, testing it in a staging environment, and deploying to production takes days.

Cost Comparison

| Cost factor | ETL | ELT |
| --- | --- | --- |
| Middleware licensing | $10k–$500k+/year (Informatica, DataStage) | $0 (dbt Core is free) |
| Server infrastructure | Dedicated staging servers | None — uses warehouse compute |
| Warehouse compute | Lower (only querying clean data) | Higher (transforms run here too) |
| Engineering time | High (complex middleware management) | Lower (SQL-based transforms) |
| Re-processing after logic changes | High (re-extract + re-transform) | Low (just rerun transforms) |
| Typical total for a mid-size team | Higher (licensing is the killer) | Lower (compute is elastic and metered) |

Last verified: March 2026

The Hybrid Reality

In practice, most teams run a hybrid. Here's what that looks like:

ELT for the main pipeline: Ingest raw data → warehouse → dbt transforms. This covers 80-90% of use cases.

ETL for edge cases: PII masking before load, data from legacy sources that need format conversion, or high-volume streams that benefit from pre-aggregation.

The modern stack typically looks like:

  • Extract + Load: Fivetran, Airbyte, or custom Python scripts
  • Transform: dbt (running inside the warehouse)
  • Orchestrate: Airflow, Dagster, or Prefect
  • ETL exceptions: Spark jobs or Python scripts for PII handling

Common Mistakes

Mistake 1: "We do ELT" but actually loading pre-transformed data. If your ingestion tool is applying joins, filters, or aggregations before landing data in the warehouse, you're doing ETL. Know what you're actually running.

Mistake 2: Loading everything raw without governance. ELT doesn't mean "dump everything and figure it out later." You still need a schema registry, data contracts, or at minimum a documented raw layer convention. Otherwise your raw zone becomes a data swamp.

Mistake 3: Running heavy transforms during peak query hours. Just because transforms run in-warehouse doesn't mean they should run when analysts are querying. Schedule transform jobs during off-peak windows or use isolated compute (Snowflake warehouse scheduling, BigQuery reservations).
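With Snowflake tasks, for example, a nightly transform can run at 02:00 on its own warehouse so analyst queries are unaffected (names and the inline transform are placeholders):

```sql
CREATE TASK IF NOT EXISTS nightly_orders_transform
    WAREHOUSE = transform_wh
    SCHEDULE = 'USING CRON 0 2 * * * UTC'  -- off-peak window
AS
    CREATE OR REPLACE TABLE analytics.stg_orders AS
    SELECT * FROM raw.orders;  -- real transform logic goes here

-- Tasks are created suspended; enable explicitly.
ALTER TASK nightly_orders_transform RESUME;
```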

Mistake 4: Choosing ETL because "that's what we know." Sunk-cost reasoning. If your team has Informatica expertise but you're moving to Snowflake, the migration to dbt-based ELT often pays for itself in licensing savings alone.

Exploring Your Data Before Committing to an Architecture

Before deciding how to transform your data, it helps to actually look at it. If you're evaluating new data sources — APIs, CSVs, or uploaded datasets — Harbinger Explorer lets you query data directly in your browser using DuckDB WASM, no warehouse setup required. You can explore schemas, test transformations with natural-language queries, and understand your data shape before building pipeline infrastructure around it.

Make the Call Based on Your Stack

If you're running a modern cloud warehouse, start with ELT. Load raw, transform in-warehouse with dbt or Spark SQL, and keep your raw layer intact. Add ETL exceptions only where compliance or legacy constraints require it. The architecture that preserves raw data and pushes transforms to elastic compute will outperform middleware-dependent pipelines in cost, speed, and flexibility.

Your next step: audit your current pipeline. If you're running transformations outside the warehouse that could run inside it, you've found your first migration candidate.

