Back to Knowledge Hub
Tutorials

Excel to SQL: A Migration Guide for Business Analysts

9 min read·Tags: excel to sql, sql for analysts, sql tutorial, vlookup to join, business analyst, data migration, learn sql

You've been managing data in Excel for years — VLOOKUP, SUMIF, pivot tables, the works. It gets the job done, mostly. But then the dataset grows past 100k rows, three people are editing the same file, and suddenly your formulas are referencing cells that no longer exist. SQL solves these problems, and the learning curve is smaller than you think if you already know Excel. This is the practical guide to excel to sql migration that nobody bothered to write clearly.

Why This Migration Is Worth Doing

Before the mechanics: SQL isn't just "Excel but faster." It changes how you think about data.

Excel is cell-based. You think in terms of individual cells, ranges, and formulas that reference coordinates. SQL is set-based. You think in terms of operations on entire tables at once. Once that mental model clicks, you start solving data problems faster.

Concrete wins from switching:

  • Datasets of any size (millions of rows are routine)
  • No version conflicts — the database is the single source of truth
  • Reproducible analysis — your query is the documentation
  • Collaboration without file-locking nightmares
  • Auditability — queries are text, they go in version control

Your Migration Roadmap

Loading diagram...

Start with your actual workbook — don't try to "redesign" first. Get it working in SQL, then optimize.

Audit your data structure — identify which worksheets are raw data tables vs. calculation sheets vs. dashboards. Only the raw data tables move to SQL directly.

Map concepts — use the reference table below to translate your mental model.

Load the data — export as CSV and import into a SQL database (PostgreSQL is the friendliest for beginners). Many BI tools and SQL environments accept CSV directly.

Rewrite your formulas — work through them one by one using the mapping guide.

Validate — run your SQL query and your Excel formula side by side on a known sample. Numbers must match before you trust the SQL.

Retire — once the SQL version is validated and running reliably, archive the spreadsheet.

The Excel → SQL Concept Map

This is the reference table you'll use constantly during migration. Bookmark it.

Excel ConceptSQL EquivalentNotes
Worksheet / TabTableEach worksheet becomes a table
Column headerColumn nameUse snake_case: order_date, not "Order Date"
RowRow / Record
CellValue / Field
Filter (dropdown)WHERE clauseWHERE region = 'North'
Sort (A→Z)ORDER BY col ASCORDER BY col DESC for Z→A
VLOOKUP / XLOOKUPJOIN (LEFT JOIN)Match on a key column, not column position
SUMIFSUM() with WHERE or GROUP BY
COUNTIFCOUNT() with WHERE or GROUP BY
AVERAGEIFAVG() with WHERE or GROUP BY
Pivot TableGROUP BY with aggregate functions
IF formulaCASE WHEN … THEN … ELSE … ENDCan be nested
Nested IFsNested CASE WHEN
CONCATENATE / &CONCAT() or || operatorDialect varies
LEFT(text, n)LEFT(col, n)Works in most SQL dialects
MID(text, start, n)SUBSTRING(col, start, n)
RIGHT(text, n)RIGHT(col, n)
LEN(text)LENGTH(col) or LEN(col)PostgreSQL: LENGTH; SQL Server: LEN
TRIM(text)TRIM(col)
UPPER / LOWERUPPER(col) / LOWER(col)
TEXT(date, format)TO_CHAR(col, 'format')PostgreSQL; FORMAT() in SQL Server
DATEDIFDATEDIFF() or DATE_DIFF()Dialect varies significantly
Remove DuplicatesSELECT DISTINCT
Named RangeCTE (WITH name AS (…))More readable than subqueries
Workbook-to-workbook linksJOIN across tablesPut both in the same database
IFERRORCOALESCE(expr, fallback)Returns fallback if expr is NULL
RANKRANK() OVER (ORDER BY col)Window function
Array formulaSubquery or window functionDepends on what the formula does
Conditional formattingNot in SQL — apply in BI layer

SQL Code Examples

Example 1: Replacing SUMIF with GROUP BY

In Excel, you'd use =SUMIF(region_column, "North", sales_column) in each row of a summary table — one formula per region, repeated.

In SQL, you calculate all regions at once:

-- PostgreSQL
-- Replaces: multiple SUMIF formulas, one per region in a summary sheet
-- Shows: total sales, order count, and average order value per region

SELECT
    region,
    SUM(amount)         AS total_sales,
    COUNT(*)            AS order_count,
    ROUND(AVG(amount), 2) AS avg_order_value
FROM orders
WHERE order_date >= '2025-01-01'
  AND order_date <  '2026-01-01'
GROUP BY region
ORDER BY total_sales DESC;

This replaces what would have been a pivot table in Excel — and it runs the same way on 100 rows or 100 million rows.

Example 2: Replacing VLOOKUP with a JOIN

VLOOKUP is one of the most-used Excel formulas. It looks up a value in one table and returns a column from a matching row. In SQL, this is a JOIN.

-- PostgreSQL
-- Replaces: =VLOOKUP(A2, customers_sheet, 3, FALSE) to get customer_name for each order
-- Uses LEFT JOIN to keep all orders even if no matching customer exists
--   (equivalent to VLOOKUP returning #N/A — here it returns NULL instead)

SELECT
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_name,
    c.email,
    c.region
FROM orders AS o
LEFT JOIN customers AS c
    ON o.customer_id = c.customer_id
ORDER BY o.order_date DESC;

Notice the LEFT JOIN — it keeps all rows from orders even when there's no matching customer. An INNER JOIN would silently drop those rows, which is often not what you want (and would be hard to notice, just like a VLOOKUP returning #N/A that gets hidden by error handling).

Bonus Example: Replacing a Pivot Table Summary

-- PostgreSQL
-- Replaces: a pivot table showing monthly revenue by product category

SELECT
    DATE_TRUNC('month', order_date)::date  AS month,
    category,
    SUM(amount)                            AS revenue,
    COUNT(DISTINCT order_id)               AS num_orders
FROM orders
JOIN products USING (product_id)
WHERE order_date >= '2025-01-01'
GROUP BY 1, 2
ORDER BY 1, 3 DESC;

Common Pitfalls

1. Treating column positions like Excel ranges In Excel, VLOOKUP(A2, B:D, 3, FALSE) means "return the 3rd column." In SQL, you always reference columns by name. This is actually better — your queries don't break when columns are reordered.

2. Forgetting NULL vs. empty string Excel treats blank cells as zero in numeric formulas. SQL treats NULL differently from 0 or ''. SUM(amount) ignores NULLs. COUNT(*) counts them. amount = NULL never returns true — use amount IS NULL. This trips up almost every Excel-to-SQL migrant.

3. JOINs silently dropping rows INNER JOIN drops rows with no match. This is equivalent to VLOOKUP where #N/A rows are simply removed — which you'd never accept in Excel. Start with LEFT JOIN until you fully understand your data relationships.

4. Overcomplicating the first query Analysts often try to rewrite their most complex Excel model in one SQL query first. Write the simplest possible query. Validate it. Add complexity incrementally.

5. Ignoring SQL dialects DATEDIFF('day', start, end) works in Databricks SQL. DATE_DIFF(end, start, DAY) works in BigQuery. end - start (returns days as integer) works in PostgreSQL. Always check the dialect for date functions specifically.

You Don't Have to Write All the SQL Yourself

One of the biggest friction points in the Excel → SQL transition is not knowing the exact syntax for an unfamiliar operation. This is exactly where Harbinger Explorer helps — you can type a question in plain English ("show me total revenue by region for last quarter") and it generates the SQL query for you. It's a good way to bridge the gap while you're still building SQL fluency, and you can learn by reading the generated queries rather than memorizing syntax cold.

Conclusion

The mental model shift from cell-based to set-based thinking is the hardest part of excel to sql migration. The syntax is learnable in a few weeks. The judgment about when to use GROUP BY vs. a window function, or LEFT JOIN vs. INNER JOIN — that comes with practice.

Start with one worksheet that has a clear, bounded purpose: a sales summary, a customer list, a monthly report. Get it working in SQL and validate it against Excel. Then move to the next. Don't try to migrate everything at once — let confidence build incrementally.


Continue Reading


[VERIFY] PostgreSQL DATE_TRUNC cast ::date — confirm this syntax is valid (yes, in PostgreSQL DATE_TRUNC returns timestamp, cast to date is valid) [VERIFY] DATEDIFF function signature varies significantly across dialects — verify examples match the target dialect


Continue Reading

Try Harbinger Explorer for free

Connect any API, upload files, and explore with AI — all in your browser. No credit card required.

Start Free Trial