DuckDB vs SQLite: Which Embedded Database Fits Your Workflow?

7 min read·Tags: duckdb, sqlite, embedded-database, olap, oltp, python, sql, data-engineering

You've got data in CSV files, Parquet exports, or local tables — and you need to query it without spinning up a server. Both DuckDB and SQLite promise exactly that: embedded, zero-config SQL databases. But they're built for fundamentally different workloads, and picking the wrong one costs you hours of frustration. Here's how to decide.

TL;DR — The 30-Second Version

SQLite is the world's most deployed database — optimized for transactional workloads, embedded apps, and row-level operations. DuckDB is an analytical engine designed for column-oriented queries, aggregations, and scanning large datasets. They're not competitors — they solve different problems.

If you're building an app that reads/writes individual records: SQLite. If you're analyzing data, running aggregations, or exploring Parquet files: DuckDB.

Feature-by-Feature Comparison

| Dimension | DuckDB | SQLite |
| --- | --- | --- |
| Primary Workload | OLAP (analytical queries) | OLTP (transactional operations) |
| Storage Format | Columnar | Row-oriented |
| Concurrency Model | Single-writer, multiple readers (MVCC) | Single-writer, multiple readers (WAL mode) |
| File Format | .duckdb (or in-memory) | .sqlite / .db |
| Direct Parquet/CSV Queries | Yes — native, zero-copy reads | No — requires import or extension |
| SQL Dialect | PostgreSQL-compatible | SQLite SQL (custom dialect) |
| Window Functions | Full support, optimized | Supported since 3.25.0 |
| Nested Types (LIST, STRUCT, MAP) | Native support | JSON extension only |
| Max Database Size | Limited by disk | 281 TB theoretical max |
| Language Bindings | Python, R, Java, Node.js, Rust, Go, C/C++ | Nearly every language ever made |
| WASM Support | Yes (duckdb-wasm) | Yes (sql.js) |
| Maturity | Stable since 2023 (v0.8+), v1.0 released 2024 | 24+ years, battle-tested |
| License | MIT | Public domain |
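The concurrency row is easy to see in practice. Here's a minimal sketch using Python's built-in sqlite3 module (the file path and table are made up for illustration): with WAL mode enabled, a second connection can keep reading the last committed snapshot while a write transaction is still open.

```python
import os
import sqlite3
import tempfile

# WAL mode needs a file-backed database (":memory:" databases ignore it),
# so this sketch uses a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # 'wal'

writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
writer.execute("INSERT INTO kv VALUES ('a', '1')")
writer.commit()

# Begin a write that is not yet committed...
writer.execute("INSERT INTO kv VALUES ('b', '2')")

# ...while a second connection reads the last committed snapshot.
reader = sqlite3.connect(path)
rows = reader.execute("SELECT k, v FROM kv ORDER BY k").fetchall()
print(rows)  # [('a', '1')]

writer.commit()
reader.close()
writer.close()
```

DuckDB's MVCC model gives the same single-writer, multiple-reader behavior, just implemented inside its own storage engine rather than via a write-ahead log file next to the database.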

Where DuckDB Wins: Analytical Queries

DuckDB was purpose-built for the analytical workload that data engineers and analysts deal with daily. Its columnar storage means aggregations, GROUP BYs, and scans over millions of rows run dramatically faster than SQLite.

Example: Aggregating Sales Data

-- DuckDB (PostgreSQL-compatible SQL)
-- Query a Parquet file directly — no import step needed
SELECT
    region,
    product_category,
    DATE_TRUNC('month', order_date) AS month,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers,
    ROUND(AVG(revenue), 2) AS avg_order_value
FROM read_parquet('sales_2025/*.parquet')
WHERE order_date >= '2025-01-01'
GROUP BY region, product_category, month
ORDER BY total_revenue DESC;

This query reads directly from Parquet files on disk — no ETL, no import, no schema definition. Try that with SQLite and you'd first need to load everything into a table, costing you time and doubling your storage.

Why It's Faster for Analytics

DuckDB uses vectorized execution — processing data in batches of 2,048 values at a time rather than row-by-row. Combined with columnar storage (only reading the columns your query touches), a 10-million-row aggregation that takes 45 seconds in SQLite often finishes in under 2 seconds in DuckDB.
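The columnar half of that advantage can be shown with a toy sketch in pure Python. This is an illustration of the storage layouts, not DuckDB or SQLite internals: a SUM over one column only has to touch that column's array in a column store, while a row store drags every row through memory in full.

```python
# Toy illustration of row-oriented vs columnar layout (pure Python).
# The table has three columns, but the query only needs "revenue".
rows = [
    {"region": "EU", "revenue": 120.0, "customer_id": 1},
    {"region": "US", "revenue": 80.0,  "customer_id": 2},
    {"region": "EU", "revenue": 50.0,  "customer_id": 3},
]

# Row-oriented: every row is visited in full, even for SUM(revenue).
total_row_store = sum(r["revenue"] for r in rows)

# Columnar: one contiguous array per column. SUM(revenue) scans a
# single array and never sees region or customer_id at all.
columns = {
    "region":      ["EU", "US", "EU"],
    "revenue":     [120.0, 80.0, 50.0],
    "customer_id": [1, 2, 3],
}
total_col_store = sum(columns["revenue"])

print(total_row_store, total_col_store)  # 250.0 250.0
```

The vectorized half is the same idea applied to execution: instead of interpreting the query plan once per row, DuckDB interprets it once per batch of 2,048 values, amortizing the overhead across the whole batch.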

Where SQLite Wins: Transactional Workloads and Embeddability

SQLite isn't trying to be an analytics engine. It's the most deployed database on the planet — running inside every iPhone, Android device, web browser, and countless desktop applications. Its strength is reliable, fast, single-record operations.

Example: Application State Management

-- SQLite SQL
-- Classic transactional pattern: insert or update a user session
INSERT INTO user_sessions (user_id, session_token, last_active, ip_address)
VALUES ('usr_42', 'tok_abc123', datetime('now'), '192.168.1.50')
ON CONFLICT(user_id) DO UPDATE SET
    session_token = excluded.session_token,
    last_active = excluded.last_active,
    ip_address = excluded.ip_address;

-- Fast point lookup by primary key
SELECT * FROM user_sessions WHERE user_id = 'usr_42';

For this kind of workload — single-row inserts, updates, and lookups by key — SQLite is extremely fast. Its row-oriented storage means reading a complete record is one sequential disk read, while DuckDB would need to reconstruct the row from multiple column segments.
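Here's the same upsert pattern driven from Python's built-in sqlite3 module, against an in-memory database. The table and values mirror the SQL above; note that ON CONFLICT ... DO UPDATE requires SQLite 3.24+, which any recent Python bundles.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE user_sessions (
        user_id       TEXT PRIMARY KEY,
        session_token TEXT,
        last_active   TEXT,
        ip_address    TEXT
    )
""")

upsert = """
    INSERT INTO user_sessions (user_id, session_token, last_active, ip_address)
    VALUES (?, ?, datetime('now'), ?)
    ON CONFLICT(user_id) DO UPDATE SET
        session_token = excluded.session_token,
        last_active   = excluded.last_active,
        ip_address    = excluded.ip_address
"""
con.execute(upsert, ("usr_42", "tok_abc123", "192.168.1.50"))
con.execute(upsert, ("usr_42", "tok_def456", "10.0.0.7"))  # same key: updates in place
con.commit()

row = con.execute(
    "SELECT session_token, ip_address FROM user_sessions WHERE user_id = ?",
    ("usr_42",),
).fetchone()
count = con.execute("SELECT COUNT(*) FROM user_sessions").fetchone()[0]
print(row, count)  # ('tok_def456', '10.0.0.7') 1
```

Two upserts on the same key leave exactly one row, updated in place, which is precisely the session-management behavior the SQL above is after.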

Why SQLite Dominates Embedded Apps

  • Zero configuration: Copy one file, you have a database
  • Single-file storage: The entire database is one portable file
  • Exceptional reliability: Used in aviation software, banking, and operating systems
  • Tiny footprint: ~600 KB library size
  • 24 years of battle-testing: Edge cases you haven't even imagined are already handled

The Python Experience: Side by Side

For data engineers working in Python, the developer experience differs significantly.

DuckDB: DataFrame-Friendly

import duckdb

# Query Parquet files, CSVs, and even Pandas DataFrames seamlessly
con = duckdb.connect()

# Read directly from files
result = con.sql("""
    SELECT
        country,
        COUNT(*) AS event_count,
        AVG(magnitude) AS avg_magnitude
    FROM read_csv_auto('earthquake_data.csv')
    WHERE magnitude > 3.0
    GROUP BY country
    ORDER BY event_count DESC
    LIMIT 20
""").fetchdf()  # Returns a Pandas DataFrame directly

print(result)

# You can also query existing DataFrames
import pandas as pd
df = pd.read_csv('sensors.csv')

# DuckDB auto-detects the DataFrame variable
high_readings = con.sql("""
    SELECT sensor_id, MAX(reading) as peak
    FROM df
    WHERE reading > 100
    GROUP BY sensor_id
""").fetchdf()

SQLite: Simple and Reliable

import sqlite3

con = sqlite3.connect('app.db')
cur = con.cursor()

# Standard CRUD operations
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

cur.execute(
    "INSERT INTO events (event_type, payload) VALUES (?, ?)",
    ('user_login', '{"user_id": 42}')
)
con.commit()

# Fetch recent events
cur.execute("SELECT * FROM events ORDER BY created_at DESC LIMIT 10")
rows = cur.fetchall()

The difference is clear: DuckDB is built for the analytical workflow (files in, DataFrames out), while SQLite excels at structured application data management.

When to Choose DuckDB

  • Ad-hoc data exploration: You have CSVs, Parquet files, or JSON dumps and need to query them now
  • Local analytics: Running aggregations, window functions, or complex joins on datasets from thousands to hundreds of millions of rows
  • Replacing Pandas for SQL workflows: You think in SQL and want DataFrame-compatible output
  • Prototyping data pipelines: Test transformations locally before deploying to Spark or a warehouse
  • WASM analytics: Running analytical queries in the browser (dashboards, data tools)

When to Choose SQLite

  • Application backends: Mobile apps, desktop software, IoT devices, CLI tools
  • Configuration and state storage: App settings, caches, session stores
  • Low-volume transactional data: User records, logs, queues — where individual record access matters
  • Maximum portability: You need a database that works literally everywhere
  • Production systems requiring proven stability: When "battle-tested for 24 years" matters more than analytical speed

Common Mistakes and Misconceptions

"DuckDB replaces SQLite" — No. They target different workloads. Using DuckDB for a mobile app's local storage is like using a forklift to carry groceries. Using SQLite for analytical queries over 50 million rows is like carrying groceries one item at a time.

"SQLite can't handle big data" — SQLite handles large databases fine for transactional workloads. The issue is analytical query performance, not capacity.

"DuckDB is too new for production" — DuckDB hit v1.0 in June 2024 and has been stable for production analytical workloads since 0.8+. It's used in production at numerous companies. That said, SQLite's 24-year track record is hard to match for reliability requirements.

"I should just use PostgreSQL" — If you need a server-based database, sure. But both DuckDB and SQLite shine precisely because they're embedded — no server, no setup, no DevOps overhead.

Can You Use Both?

Absolutely — and many teams do. A practical pattern:

  1. SQLite manages your application's transactional data (user records, configs, job metadata)
  2. DuckDB handles analytical queries over exported data, log files, or Parquet snapshots
  3. DuckDB can even attach and query SQLite databases directly:
-- DuckDB SQL
-- Attach a SQLite database and query it with DuckDB's analytical engine
INSTALL sqlite;
LOAD sqlite;
ATTACH 'app.db' AS app_data (TYPE SQLITE);

-- Now run analytical queries over your SQLite data using DuckDB's engine
SELECT
    event_type,
    COUNT(*) AS occurrences,
    DATE_TRUNC('hour', created_at) AS hour
FROM app_data.events
GROUP BY event_type, hour
ORDER BY occurrences DESC;

This is genuinely powerful — you get SQLite's reliability for writes and DuckDB's speed for reads.

Running Analytical Queries in the Browser

Both databases support WebAssembly, but for different use cases. If you're building browser-based data exploration — querying uploaded CSVs, running aggregations over Parquet files, or letting users ask questions about their data in natural language — DuckDB WASM is the natural fit. Tools like Harbinger Explorer use exactly this approach: DuckDB WASM runs analytical SQL directly in your browser tab, with no server roundtrip, so your data never leaves your machine.

The Bottom Line

DuckDB and SQLite aren't competing — they're complementary. SQLite is the best embedded transactional database ever built. DuckDB is becoming the best embedded analytical database available. Pick based on your workload, not hype.

Your next step: If you're a data engineer who hasn't tried DuckDB yet, install it (pip install duckdb) and point it at your messiest CSV. Run one GROUP BY query. You'll immediately understand why it exists.
