DuckDB vs SQLite: Which Embedded Database Fits Your Workflow?
You've got data in CSV files, Parquet exports, or local tables — and you need to query it without spinning up a server. Both DuckDB and SQLite promise exactly that: embedded, zero-config SQL databases. But they're built for fundamentally different workloads, and picking the wrong one costs you hours of frustration. Here's how to decide.
TL;DR — The 30-Second Version
SQLite is the world's most deployed database — optimized for transactional workloads, embedded apps, and row-level operations. DuckDB is an analytical engine designed for column-oriented queries, aggregations, and scanning large datasets. They're not competitors — they solve different problems.
If you're building an app that reads/writes individual records: SQLite. If you're analyzing data, running aggregations, or exploring Parquet files: DuckDB.
Feature-by-Feature Comparison
| Dimension | DuckDB | SQLite |
|---|---|---|
| Primary Workload | OLAP (analytical queries) | OLTP (transactional operations) |
| Storage Format | Columnar | Row-oriented |
| Concurrency Model | Single-writer, multiple readers (MVCC) | Single-writer, multiple readers (WAL mode) |
| File Format | .duckdb (or in-memory) | .sqlite / .db |
| Direct Parquet/CSV Queries | Yes (native, no import step) | No (requires import or an extension) |
| SQL Dialect | PostgreSQL-compatible | SQLite SQL (custom dialect) |
| Window Functions | Full support, optimized | Supported since 3.25.0 |
| Nested Types (LIST, STRUCT, MAP) | Native support | JSON extension only |
| Max Database Size | Limited by disk | 281 TB theoretical max |
| Language Bindings | Python, R, Java, Node.js, Rust, Go, C/C++ | Nearly every language ever made |
| WASM Support | Yes (duckdb-wasm) | Yes (sql.js) |
| Maturity | Stable since 2023 (v0.8+), v1.0 released 2024 | 24+ years, battle-tested |
| License | MIT | Public domain |
Where DuckDB Wins: Analytical Queries
DuckDB was purpose-built for the analytical workload that data engineers and analysts deal with daily. Its columnar storage means aggregations, GROUP BYs, and scans over millions of rows run dramatically faster than SQLite.
Example: Aggregating Sales Data
```sql
-- DuckDB (PostgreSQL-compatible SQL)
-- Query a Parquet file directly — no import step needed
SELECT
    region,
    product_category,
    DATE_TRUNC('month', order_date) AS month,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers,
    ROUND(AVG(revenue), 2) AS avg_order_value
FROM read_parquet('sales_2025/*.parquet')
WHERE order_date >= '2025-01-01'
GROUP BY region, product_category, month
ORDER BY total_revenue DESC;
```
This query reads directly from Parquet files on disk — no ETL, no import, no schema definition. Try that with SQLite and you'd need to first load everything into a table, losing time and doubling your storage.
Why It's Faster for Analytics
DuckDB uses vectorized execution, processing data in batches of 2,048 values at a time rather than row by row. Combined with columnar storage (only the columns your query touches are read from disk), an aggregation over 10 million rows that takes tens of seconds in SQLite will often finish in a second or two in DuckDB; the exact gap depends on hardware and query shape.
Where SQLite Wins: Transactional Workloads and Embeddability
SQLite isn't trying to be an analytics engine. It's the most deployed database on the planet — running inside every iPhone, Android device, web browser, and countless desktop applications. Its strength is reliable, fast, single-record operations.
Example: Application State Management
```sql
-- SQLite SQL
-- Classic transactional pattern: insert or update a user session
-- (ON CONFLICT(user_id) requires user_id to be a PRIMARY KEY or UNIQUE column)
INSERT INTO user_sessions (user_id, session_token, last_active, ip_address)
VALUES ('usr_42', 'tok_abc123', datetime('now'), '192.168.1.50')
ON CONFLICT(user_id) DO UPDATE SET
    session_token = excluded.session_token,
    last_active = excluded.last_active,
    ip_address = excluded.ip_address;

-- Fast point lookup by primary key
SELECT * FROM user_sessions WHERE user_id = 'usr_42';
```
For this kind of workload — single-row inserts, updates, and lookups by key — SQLite is extremely fast. Its row-oriented storage means reading a complete record is one sequential disk read, while DuckDB would need to reconstruct the row from multiple column segments.
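You can verify that SQLite serves this lookup from the primary-key index rather than a full-table scan by asking for its query plan. The schema below mirrors the session example above, with user_id declared as the primary key:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE user_sessions (
        user_id       TEXT PRIMARY KEY,  -- also what ON CONFLICT(user_id) relies on
        session_token TEXT NOT NULL,
        last_active   TEXT,
        ip_address    TEXT
    )
""")
con.execute(
    "INSERT INTO user_sessions VALUES (?, ?, datetime('now'), ?)",
    ("usr_42", "tok_abc123", "192.168.1.50"),
)

# EXPLAIN QUERY PLAN shows how SQLite will execute the statement:
# a SEARCH on the primary-key index, not a SCAN of the whole table.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM user_sessions WHERE user_id = ?",
    ("usr_42",),
).fetchall()
print(plan)  # the detail column mentions a SEARCH using the index
```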
Why SQLite Dominates Embedded Apps
- Zero configuration: Copy one file, you have a database
- Single-file storage: The entire database is one portable file
- Exceptional reliability: Used in aviation software, banking, and operating systems
- Tiny footprint: ~600 KB library size
- 24 years of battle-testing: Edge cases you haven't even imagined are already handled
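The single-file portability point is easy to demonstrate from Python's standard library: Connection.backup() snapshots a live database into another file, and that file is itself a complete, standalone database (the filenames here are invented):

```python
import sqlite3

# A live application database (on disk this would be one file, e.g. app.db)
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
src.execute("INSERT INTO settings VALUES ('theme', 'dark')")
src.commit()

# Connection.backup() copies the whole database safely, even while in use.
dst = sqlite3.connect("snapshot.db")
src.backup(dst)

# The copy is a complete database in its own right.
row = dst.execute("SELECT value FROM settings WHERE key = 'theme'").fetchone()
print(row)  # ('dark',)
```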
The Python Experience: Side by Side
For data engineers working in Python, the developer experience differs significantly.
DuckDB: DataFrame-Friendly
```python
import duckdb

# Query Parquet files, CSVs, and even Pandas DataFrames seamlessly
con = duckdb.connect()

# Read directly from files
result = con.sql("""
    SELECT
        country,
        COUNT(*) AS event_count,
        AVG(magnitude) AS avg_magnitude
    FROM read_csv_auto('earthquake_data.csv')
    WHERE magnitude > 3.0
    GROUP BY country
    ORDER BY event_count DESC
    LIMIT 20
""").fetchdf()  # Returns a Pandas DataFrame directly

print(result)

# You can also query existing DataFrames
import pandas as pd

df = pd.read_csv('sensors.csv')

# DuckDB auto-detects the DataFrame variable by name
high_readings = con.sql("""
    SELECT sensor_id, MAX(reading) AS peak
    FROM df
    WHERE reading > 100
    GROUP BY sensor_id
""").fetchdf()
```
SQLite: Simple and Reliable
```python
import sqlite3

con = sqlite3.connect('app.db')
cur = con.cursor()

# Standard CRUD operations
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

cur.execute(
    "INSERT INTO events (event_type, payload) VALUES (?, ?)",
    ('user_login', '{"user_id": 42}')
)
con.commit()

# Fetch recent events
cur.execute("SELECT * FROM events ORDER BY created_at DESC LIMIT 10")
rows = cur.fetchall()
```
The difference is clear: DuckDB is built for the analytical workflow (files in, DataFrames out), while SQLite excels at structured application data management.
When to Choose DuckDB
- Ad-hoc data exploration: You have CSVs, Parquet files, or JSON dumps and need to query them now
- Local analytics: Running aggregations, window functions, or complex joins on datasets from thousands to hundreds of millions of rows
- Replacing Pandas for SQL workflows: You think in SQL and want DataFrame-compatible output
- Prototyping data pipelines: Test transformations locally before deploying to Spark or a warehouse
- WASM analytics: Running analytical queries in the browser (dashboards, data tools)
When to Choose SQLite
- Application backends: Mobile apps, desktop software, IoT devices, CLI tools
- Configuration and state storage: App settings, caches, session stores
- Low-volume transactional data: User records, logs, queues — where individual record access matters
- Maximum portability: You need a database that works literally everywhere
- Production systems requiring proven stability: When "battle-tested for 24 years" matters more than analytical speed
Common Mistakes and Misconceptions
"DuckDB replaces SQLite" — No. They target different workloads. Using DuckDB for a mobile app's local storage is like using a forklift to carry groceries. Using SQLite for analytical queries over 50 million rows is like carrying groceries one item at a time.
"SQLite can't handle big data" — SQLite handles large databases fine for transactional workloads. The issue is analytical query performance, not capacity.
"DuckDB is too new for production" — DuckDB hit v1.0 in June 2024 and has been stable for production analytical workloads since 0.8+. It's used in production at numerous companies. That said, SQLite's 24-year track record is hard to match for reliability requirements.
"I should just use PostgreSQL" — If you need a server-based database, sure. But both DuckDB and SQLite shine precisely because they're embedded — no server, no setup, no DevOps overhead.
Can You Use Both?
Absolutely — and many teams do. A practical pattern:
- SQLite manages your application's transactional data (user records, configs, job metadata)
- DuckDB handles analytical queries over exported data, log files, or Parquet snapshots
- DuckDB can even attach and query SQLite databases directly:
```sql
-- DuckDB SQL
-- Attach a SQLite database and query it with DuckDB's analytical engine
INSTALL sqlite;
LOAD sqlite;
ATTACH 'app.db' AS app_data (TYPE SQLITE);

-- Now run analytical queries over your SQLite data using DuckDB's engine
SELECT
    event_type,
    COUNT(*) AS occurrences,
    DATE_TRUNC('hour', created_at) AS hour
FROM app_data.events
GROUP BY event_type, hour
ORDER BY occurrences DESC;
```
This is genuinely powerful — you get SQLite's reliability for writes and DuckDB's speed for reads.
Running Analytical Queries in the Browser
Both databases support WebAssembly, but for different use cases. If you're building browser-based data exploration — querying uploaded CSVs, running aggregations over Parquet files, or letting users ask questions about their data in natural language — DuckDB WASM is the natural fit. Tools like Harbinger Explorer use exactly this approach: DuckDB WASM runs analytical SQL directly in your browser tab, with no server roundtrip, so your data never leaves your machine.
The Bottom Line
DuckDB and SQLite aren't competing — they're complementary. SQLite is the best embedded transactional database ever built. DuckDB is becoming the best embedded analytical database available. Pick based on your workload, not hype.
Your next step: If you're a data engineer who hasn't tried DuckDB yet, install it (pip install duckdb) and point it at your messiest CSV. Run one GROUP BY query. You'll immediately understand why it exists.
Continue Reading
Data Lakehouse Architecture Explained
How data lakehouse architecture works, when to use it over a warehouse or lake, and the common pitfalls that trip up data engineering teams.
dbt vs Spark SQL: How to Choose
dbt or Spark SQL for your transformation layer? A side-by-side comparison of features, pricing, and use cases — with code examples for both and honest trade-offs for analytics engineers.
Delta Live Tables vs Classic ETL: Which Fits Your Pipeline?
DLT vs classic ETL compared honestly: declarative expectations, streaming, debugging, testing, and pricing. Includes DLT code example with expectations syntax.