Harbinger Explorer

Back to Knowledge Hub
Tutorials

SQL Window Functions Tutorial: Rank, Aggregate, Compare

9 min read·Tags: sql, window-functions, postgresql, spark-sql, bigquery, analytics, tutorial

Every analyst hits the same wall: you need a running total, a row number, or a comparison to the previous row — and GROUP BY collapses the detail you need. SQL window functions solve this without subqueries or self-joins, and once you learn them, you'll wonder how you survived without them.

This tutorial covers the most useful window functions with runnable examples in PostgreSQL syntax (most work identically in Spark SQL, BigQuery Standard SQL, and Snowflake SQL). By the end, you'll handle rankings, running aggregates, and row-to-row comparisons confidently.

What Are SQL Window Functions?

A window function performs a calculation across a set of rows related to the current row — without collapsing them into a single output row like GROUP BY does. The key clause is OVER(), which defines the "window" each row sees.

ConceptGROUP BYWindow Function
Rows returnedOne per groupEvery original row
AggregationCollapses rowsAdds a column alongside existing data
Use caseSummary reportsRankings, running totals, row comparisons
Syntax markerGROUP BY colfunction() OVER(...)

Anatomy of a Window Function Call

-- PostgreSQL
SELECT
    employee_id,
    department,
    salary,
    RANK() OVER (
        PARTITION BY department    -- defines the group (like GROUP BY, but no collapse)
        ORDER BY salary DESC       -- defines the order within the partition
    ) AS dept_salary_rank
FROM employees;
  • PARTITION BY splits rows into groups (optional — omit it and the window is the entire result set)
  • ORDER BY determines the sequence within each partition
  • Frame clause (optional) further narrows which rows within the partition the function sees

The Three Families of Window Functions

1. Ranking Functions

Ranking functions assign a position to each row within its partition.

FunctionTiesGaps after tiesExample (values 100, 100, 90)
ROW_NUMBER()Breaks arbitrarilyNo gaps1, 2, 3
RANK()Same rankGaps after ties1, 1, 3
DENSE_RANK()Same rankNo gaps1, 1, 2
NTILE(n)Distributes into n bucketsN/ADepends on n

Practical Example: Top 3 Salespeople per Region

-- PostgreSQL
WITH ranked AS (
    SELECT
        salesperson_name,
        region,
        total_revenue,
        DENSE_RANK() OVER (
            PARTITION BY region
            ORDER BY total_revenue DESC
        ) AS revenue_rank
    FROM sales_summary
)
SELECT *
FROM ranked
WHERE revenue_rank <= 3
ORDER BY region, revenue_rank;

Why DENSE_RANK here? If two people tie at #2, you still want to see the person at #3. RANK() would skip to #4 after a tie, potentially returning fewer than 3 people per region.

2. Aggregate Window Functions

Any standard aggregate — SUM, AVG, COUNT, MIN, MAX — can be used as a window function by adding OVER().

Running Total

-- PostgreSQL
SELECT
    order_date,
    order_amount,
    SUM(order_amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
ORDER BY order_date;

Moving Average (7-Day)

-- PostgreSQL
SELECT
    metric_date,
    daily_value,
    AVG(daily_value) OVER (
        ORDER BY metric_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM daily_metrics
ORDER BY metric_date;

Frame clauses matter. The ROWS BETWEEN clause controls exactly which rows feed the aggregate. Common frames:

FrameMeaning
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWAll rows from start to current (running total)
ROWS BETWEEN 6 PRECEDING AND CURRENT ROWCurrent row + 6 before it (7-row window)
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWINGCurrent row to end
RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROWDate-based range (PostgreSQL)

ROWS vs RANGE: ROWS counts physical rows. RANGE uses logical values — useful for date-based windows but not supported uniformly across engines. Spark SQL supports RANGE with numeric/date types; BigQuery Standard SQL supports it for date expressions. When in doubt, use ROWS.

3. Value Functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE

These let you reach across rows without a self-join.

FunctionReturns
LAG(col, n, default)Value from n rows before
LEAD(col, n, default)Value from n rows after
FIRST_VALUE(col)First value in the window frame
LAST_VALUE(col)Last value in the window frame
NTH_VALUE(col, n)Nth value in the window frame

Month-over-Month Growth Rate

-- PostgreSQL
SELECT
    report_month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY report_month) AS prev_month_revenue,
    ROUND(
        (revenue - LAG(revenue, 1) OVER (ORDER BY report_month))::numeric
        / NULLIF(LAG(revenue, 1) OVER (ORDER BY report_month), 0)
        * 100, 1
    ) AS mom_growth_pct
FROM monthly_revenue
ORDER BY report_month;

Always provide a default for LAG/LEAD when nulls could cause downstream issues:

-- PostgreSQL
LAG(revenue, 1, 0) OVER (ORDER BY report_month)

First Purchase Date per Customer

-- PostgreSQL
SELECT DISTINCT
    customer_id,
    FIRST_VALUE(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_purchase_date
FROM orders;

Watch out with LAST_VALUE: The default frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the current row unless you explicitly set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

Combining Multiple Window Functions

You can use multiple window functions in the same query. When you repeat the same OVER() clause, use a named window to keep things DRY:

-- PostgreSQL
SELECT
    employee_id,
    department,
    salary,
    RANK() OVER dept_window AS salary_rank,
    AVG(salary) OVER dept_window AS dept_avg_salary,
    salary - AVG(salary) OVER dept_window AS diff_from_avg
FROM employees
WINDOW dept_window AS (PARTITION BY department ORDER BY salary DESC)
ORDER BY department, salary_rank;

Engine support for named windows: PostgreSQL supports WINDOW clauses natively. Spark SQL 3.x supports them as well. BigQuery Standard SQL does not support named windows — you'll need to repeat the OVER() clause. [VERIFY]

Common Mistakes and Pitfalls

1. Forgetting the Frame Clause Default

When ORDER BY is present inside OVER(), the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — not the entire partition. This catches people off guard with SUM() and AVG():

-- PostgreSQL — This is a running total, NOT a partition total
SELECT SUM(amount) OVER (PARTITION BY dept ORDER BY hire_date) FROM employees;

-- This IS a partition total (no ORDER BY = entire partition as frame)
SELECT SUM(amount) OVER (PARTITION BY dept) FROM employees;

2. Using Window Functions in WHERE

Window functions execute after WHERE. You can't filter on them directly:

-- ❌ This fails
SELECT * FROM employees WHERE RANK() OVER (ORDER BY salary DESC) <= 5;

-- ✅ Use a CTE or subquery
WITH ranked AS (
    SELECT *, RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT * FROM ranked WHERE rnk <= 5;

3. Non-Deterministic ROW_NUMBER

If your ORDER BY doesn't uniquely identify rows, ROW_NUMBER() assigns positions arbitrarily among ties — and the order can change between executions:

-- ❌ Non-deterministic: ties on salary get arbitrary row numbers
ROW_NUMBER() OVER (ORDER BY salary DESC)

-- ✅ Add a tiebreaker
ROW_NUMBER() OVER (ORDER BY salary DESC, employee_id ASC)

4. Performance: Missing Indexes

Window functions with ORDER BY benefit from indexes on the sort columns, especially on large tables. PARTITION BY + ORDER BY combinations work best with a composite index matching both.

5. LAST_VALUE Returning Current Row

As mentioned above — always set the frame explicitly when using LAST_VALUE:

-- ✅ Correct way to get the actual last value in the partition
LAST_VALUE(salary) OVER (
    PARTITION BY department
    ORDER BY hire_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

Window Functions Across SQL Engines

FeaturePostgreSQLSpark SQLBigQuery Standard SQLSnowflake SQL
Named windows (WINDOW)✅ (3.x+)
RANGE with dates✅ (numeric/date)
NTH_VALUE
QUALIFY (filter on window)
NTILE

BigQuery and Snowflake offer QUALIFY as a cleaner alternative to CTE-based filtering:

-- BigQuery Standard SQL / Snowflake SQL
SELECT *
FROM sales_summary
QUALIFY DENSE_RANK() OVER (PARTITION BY region ORDER BY total_revenue DESC) <= 3;

This eliminates the CTE pattern entirely — worth knowing if you work in those environments.

When Window Functions Aren't the Right Tool

  • Simple aggregation without row-level detail — just use GROUP BY.
  • Cross-table lookups — that's a JOIN, not a window function.
  • Recursive patterns (e.g., tree traversal) — use recursive CTEs.
  • Very large partitions with complex frames — can be memory-intensive. If a partition contains millions of rows and you're using ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, consider whether a pre-aggregated approach would perform better.

Quick Reference Cheat Sheet

TaskFunctionKey Detail
Rank with gapsRANK()Ties share rank, next rank skipped
Rank without gapsDENSE_RANK()Ties share rank, next rank is +1
Unique row numberROW_NUMBER()Add tiebreaker to ORDER BY
Running totalSUM() OVER (ORDER BY ...)Default frame handles this
Moving averageAVG() OVER (ROWS BETWEEN n PRECEDING AND CURRENT ROW)Specify frame explicitly
Previous row valueLAG(col, 1)Provide default to avoid NULLs
Next row valueLEAD(col, 1)Provide default to avoid NULLs
Split into bucketsNTILE(n)Even distribution across n groups

Next Steps

Once you're comfortable with window functions, explore these patterns:

  • Sessionization — use LAG to detect gaps in event timestamps and assign session IDs
  • DeduplicationROW_NUMBER() partitioned by the natural key, filtered to rnk = 1
  • Cumulative distributionCUME_DIST() and PERCENT_RANK() for percentile calculations
  • Gap-and-island problems — combine ROW_NUMBER with date arithmetic to find consecutive sequences

If you're working with data from APIs or CSVs and want to run window function queries directly in your browser without setting up a database, Harbinger Explorer lets you load data sources and query them with DuckDB SQL — including full window function support — right from a browser tab.

Continue Reading


[VERIFY] BigQuery Standard SQL named window support — confirmed not available as of 2024, verify for 2026. [VERIFY] Spark SQL named window support — confirmed in 3.x, verify exact version.


Continue Reading

Try Harbinger Explorer for free

Connect any API, upload files, and explore with AI — all in your browser. No credit card required.

Start Free Trial

Command Palette

Search for a command to run...