Excel to SQL: A Migration Guide for Business Analysts
You've been managing data in Excel for years — VLOOKUP, SUMIF, pivot tables, the works. It gets the job done, mostly. But then the dataset grows past 100k rows, three people are editing the same file, and suddenly your formulas are referencing cells that no longer exist. SQL solves these problems, and the learning curve is smaller than you think if you already know Excel. This is the practical guide to excel to sql migration that nobody bothered to write clearly.
Why This Migration Is Worth Doing
Before the mechanics: SQL isn't just "Excel but faster." It changes how you think about data.
Excel is cell-based. You think in terms of individual cells, ranges, and formulas that reference coordinates. SQL is set-based. You think in terms of operations on entire tables at once. Once that mental model clicks, you start solving data problems faster.
Concrete wins from switching:
- Datasets of any size (millions of rows are routine)
- No version conflicts — the database is the single source of truth
- Reproducible analysis — your query is the documentation
- Collaboration without file-locking nightmares
- Auditability — queries are text, they go in version control
Your Migration Roadmap
Loading diagram...
① Start with your actual workbook — don't try to "redesign" first. Get it working in SQL, then optimize.
② Audit your data structure — identify which worksheets are raw data tables vs. calculation sheets vs. dashboards. Only the raw data tables move to SQL directly.
③ Map concepts — use the reference table below to translate your mental model.
④ Load the data — export as CSV and import into a SQL database (PostgreSQL is the friendliest for beginners). Many BI tools and SQL environments accept CSV directly.
⑤ Rewrite your formulas — work through them one by one using the mapping guide.
⑥ Validate — run your SQL query and your Excel formula side by side on a known sample. Numbers must match before you trust the SQL.
⑦ Retire — once the SQL version is validated and running reliably, archive the spreadsheet.
The Excel → SQL Concept Map
This is the reference table you'll use constantly during migration. Bookmark it.
| Excel Concept | SQL Equivalent | Notes |
|---|---|---|
| Worksheet / Tab | Table | Each worksheet becomes a table |
| Column header | Column name | Use snake_case: order_date, not "Order Date" |
| Row | Row / Record | |
| Cell | Value / Field | |
| Filter (dropdown) | WHERE clause | WHERE region = 'North' |
| Sort (A→Z) | ORDER BY col ASC | ORDER BY col DESC for Z→A |
| VLOOKUP / XLOOKUP | JOIN (LEFT JOIN) | Match on a key column, not column position |
| SUMIF | SUM() with WHERE or GROUP BY | |
| COUNTIF | COUNT() with WHERE or GROUP BY | |
| AVERAGEIF | AVG() with WHERE or GROUP BY | |
| Pivot Table | GROUP BY with aggregate functions | |
| IF formula | CASE WHEN … THEN … ELSE … END | Can be nested |
| Nested IFs | Nested CASE WHEN | |
CONCATENATE / & | CONCAT() or || operator | Dialect varies |
| LEFT(text, n) | LEFT(col, n) | Works in most SQL dialects |
| MID(text, start, n) | SUBSTRING(col, start, n) | |
| RIGHT(text, n) | RIGHT(col, n) | |
| LEN(text) | LENGTH(col) or LEN(col) | PostgreSQL: LENGTH; SQL Server: LEN |
| TRIM(text) | TRIM(col) | |
| UPPER / LOWER | UPPER(col) / LOWER(col) | |
| TEXT(date, format) | TO_CHAR(col, 'format') | PostgreSQL; FORMAT() in SQL Server |
| DATEDIF | DATEDIFF() or DATE_DIFF() | Dialect varies significantly |
| Remove Duplicates | SELECT DISTINCT | |
| Named Range | CTE (WITH name AS (…)) | More readable than subqueries |
| Workbook-to-workbook links | JOIN across tables | Put both in the same database |
| IFERROR | COALESCE(expr, fallback) | Returns fallback if expr is NULL |
| RANK | RANK() OVER (ORDER BY col) | Window function |
| Array formula | Subquery or window function | Depends on what the formula does |
| Conditional formatting | Not in SQL — apply in BI layer |
SQL Code Examples
Example 1: Replacing SUMIF with GROUP BY
In Excel, you'd use =SUMIF(region_column, "North", sales_column) in each row of a summary table — one formula per region, repeated.
In SQL, you calculate all regions at once:
-- PostgreSQL
-- Replaces: multiple SUMIF formulas, one per region in a summary sheet
-- Shows: total sales, order count, and average order value per region
SELECT
region,
SUM(amount) AS total_sales,
COUNT(*) AS order_count,
ROUND(AVG(amount), 2) AS avg_order_value
FROM orders
WHERE order_date >= '2025-01-01'
AND order_date < '2026-01-01'
GROUP BY region
ORDER BY total_sales DESC;
This replaces what would have been a pivot table in Excel — and it runs the same way on 100 rows or 100 million rows.
Example 2: Replacing VLOOKUP with a JOIN
VLOOKUP is one of the most-used Excel formulas. It looks up a value in one table and returns a column from a matching row. In SQL, this is a JOIN.
-- PostgreSQL
-- Replaces: =VLOOKUP(A2, customers_sheet, 3, FALSE) to get customer_name for each order
-- Uses LEFT JOIN to keep all orders even if no matching customer exists
-- (equivalent to VLOOKUP returning #N/A — here it returns NULL instead)
SELECT
o.order_id,
o.order_date,
o.amount,
c.customer_name,
c.email,
c.region
FROM orders AS o
LEFT JOIN customers AS c
ON o.customer_id = c.customer_id
ORDER BY o.order_date DESC;
Notice the LEFT JOIN — it keeps all rows from orders even when there's no matching customer. An INNER JOIN would silently drop those rows, which is often not what you want (and would be hard to notice, just like a VLOOKUP returning #N/A that gets hidden by error handling).
Bonus Example: Replacing a Pivot Table Summary
-- PostgreSQL
-- Replaces: a pivot table showing monthly revenue by product category
SELECT
DATE_TRUNC('month', order_date)::date AS month,
category,
SUM(amount) AS revenue,
COUNT(DISTINCT order_id) AS num_orders
FROM orders
JOIN products USING (product_id)
WHERE order_date >= '2025-01-01'
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
Common Pitfalls
1. Treating column positions like Excel ranges
In Excel, VLOOKUP(A2, B:D, 3, FALSE) means "return the 3rd column." In SQL, you always reference columns by name. This is actually better — your queries don't break when columns are reordered.
2. Forgetting NULL vs. empty string
Excel treats blank cells as zero in numeric formulas. SQL treats NULL differently from 0 or ''. SUM(amount) ignores NULLs. COUNT(*) counts them. amount = NULL never returns true — use amount IS NULL. This trips up almost every Excel-to-SQL migrant.
3. JOINs silently dropping rows
INNER JOIN drops rows with no match. This is equivalent to VLOOKUP where #N/A rows are simply removed — which you'd never accept in Excel. Start with LEFT JOIN until you fully understand your data relationships.
4. Overcomplicating the first query Analysts often try to rewrite their most complex Excel model in one SQL query first. Write the simplest possible query. Validate it. Add complexity incrementally.
5. Ignoring SQL dialects
DATEDIFF('day', start, end) works in Databricks SQL. DATE_DIFF(end, start, DAY) works in BigQuery. end - start (returns days as integer) works in PostgreSQL. Always check the dialect for date functions specifically.
You Don't Have to Write All the SQL Yourself
One of the biggest friction points in the Excel → SQL transition is not knowing the exact syntax for an unfamiliar operation. This is exactly where Harbinger Explorer helps — you can type a question in plain English ("show me total revenue by region for last quarter") and it generates the SQL query for you. It's a good way to bridge the gap while you're still building SQL fluency, and you can learn by reading the generated queries rather than memorizing syntax cold.
Conclusion
The mental model shift from cell-based to set-based thinking is the hardest part of excel to sql migration. The syntax is learnable in a few weeks. The judgment about when to use GROUP BY vs. a window function, or LEFT JOIN vs. INNER JOIN — that comes with practice.
Start with one worksheet that has a clear, bounded purpose: a sales summary, a customer list, a monthly report. Get it working in SQL and validate it against Excel. Then move to the next. Don't try to migrate everything at once — let confidence build incrementally.
Continue Reading
- Databricks vs Snowflake vs BigQuery (2026)
- Medallion Architecture Explained
- Delta Live Tables vs Classic ETL: Which Fits Your Pipeline?
[VERIFY] PostgreSQL DATE_TRUNC cast ::date — confirm this syntax is valid (yes, in PostgreSQL DATE_TRUNC returns timestamp, cast to date is valid)
[VERIFY] DATEDIFF function signature varies significantly across dialects — verify examples match the target dialect
Continue Reading
Natural Language SQL: Ask Your Data Questions in Plain English
How NL2SQL works, real examples of natural language questions converted to SQL, an honest comparison of tools, and where it fails.
DuckDB Tutorial: Analytical SQL Directly in Your Browser
Get started with DuckDB in 15 minutes. Learn read_parquet, read_csv_auto, PIVOT, and when DuckDB beats SQLite and PostgreSQL for analytical SQL.
Building a REST API Data Pipeline in Python
A step-by-step guide to building a production-grade REST API data pipeline in Python. Covers authentication, pagination, rate limits, schema validation, and common pitfalls with real runnable code.
Try Harbinger Explorer for free
Connect any API, upload files, and explore with AI — all in your browser. No credit card required.
Start Free Trial