SQL Window Functions Tutorial: Rank, Aggregate, Compare
Every analyst hits the same wall: you need a running total, a row number, or a comparison to the previous row — and GROUP BY collapses the detail you need. SQL window functions solve this without subqueries or self-joins, and once you learn them, you'll wonder how you survived without them.
This tutorial covers the most useful window functions with runnable examples in PostgreSQL syntax (most work identically in Spark SQL, BigQuery Standard SQL, and Snowflake SQL). By the end, you'll handle rankings, running aggregates, and row-to-row comparisons confidently.
What Are SQL Window Functions?
A window function performs a calculation across a set of rows related to the current row — without collapsing them into a single output row like GROUP BY does. The key clause is OVER(), which defines the "window" each row sees.
| Concept | GROUP BY | Window Function |
|---|---|---|
| Rows returned | One per group | Every original row |
| Aggregation | Collapses rows | Adds a column alongside existing data |
| Use case | Summary reports | Rankings, running totals, row comparisons |
| Syntax marker | GROUP BY col | function() OVER(...) |
Anatomy of a Window Function Call
```sql
-- PostgreSQL
SELECT
    employee_id,
    department,
    salary,
    RANK() OVER (
        PARTITION BY department  -- defines the group (like GROUP BY, but no collapse)
        ORDER BY salary DESC     -- defines the order within the partition
    ) AS dept_salary_rank
FROM employees;
```
- PARTITION BY splits rows into groups (optional — omit it and the window is the entire result set)
- ORDER BY determines the sequence within each partition
- Frame clause (optional) further narrows which rows within the partition the function sees
The Three Families of Window Functions
1. Ranking Functions
Ranking functions assign a position to each row within its partition.
| Function | Ties | Gaps after ties | Example (values 100, 100, 90) |
|---|---|---|---|
| ROW_NUMBER() | Breaks ties arbitrarily | No gaps | 1, 2, 3 |
| RANK() | Same rank | Gaps after ties | 1, 1, 3 |
| DENSE_RANK() | Same rank | No gaps | 1, 1, 2 |
| NTILE(n) | Distributes into n buckets | N/A | Depends on n |
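A quick way to see the tie behavior from the table above is to run all three ranking functions side by side. This sketch uses Python's bundled sqlite3 (window functions require SQLite 3.25+); the scores table and its values are invented for illustration.

```python
import sqlite3

# Three ranking functions over the same tied data: 100, 100, 90.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 100), ("b", 100), ("c", 90)],
)

rows = conn.execute("""
    SELECT
        player,
        ROW_NUMBER() OVER w AS row_num,
        RANK()       OVER w AS rnk,
        DENSE_RANK() OVER w AS dense_rnk
    FROM scores
    WINDOW w AS (ORDER BY score DESC)
    ORDER BY score DESC, player
""").fetchall()

for player, row_num, rnk, dense_rnk in rows:
    print(player, row_num, rnk, dense_rnk)
# ROW_NUMBER gives 1, 2, 3; RANK gives 1, 1, 3; DENSE_RANK gives 1, 1, 2
```

Note that ROW_NUMBER splits the tie arbitrarily, so which of the two 100-score players gets row number 1 is not guaranteed between runs.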
Practical Example: Top 3 Salespeople per Region
```sql
-- PostgreSQL
WITH ranked AS (
    SELECT
        salesperson_name,
        region,
        total_revenue,
        DENSE_RANK() OVER (
            PARTITION BY region
            ORDER BY total_revenue DESC
        ) AS revenue_rank
    FROM sales_summary
)
SELECT *
FROM ranked
WHERE revenue_rank <= 3
ORDER BY region, revenue_rank;
```
Why DENSE_RANK here? It guarantees you see the top 3 distinct revenue values in each region. With RANK(), a tie at #2 produces ranks 1, 2, 2, 4, so filtering on revenue_rank <= 3 would exclude the person with the third-highest distinct revenue.
2. Aggregate Window Functions
Any standard aggregate — SUM, AVG, COUNT, MIN, MAX — can be used as a window function by adding OVER().
Running Total
```sql
-- PostgreSQL
SELECT
    order_date,
    order_amount,
    SUM(order_amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
ORDER BY order_date;
```
Moving Average (7-Day)
```sql
-- PostgreSQL
SELECT
    metric_date,
    daily_value,
    AVG(daily_value) OVER (
        ORDER BY metric_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM daily_metrics
ORDER BY metric_date;
```
Frame clauses matter. The ROWS BETWEEN clause controls exactly which rows feed the aggregate. Common frames:
| Frame | Meaning |
|---|---|
| ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | All rows from start to current (running total) |
| ROWS BETWEEN 6 PRECEDING AND CURRENT ROW | Current row + 6 before it (7-row window) |
| ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING | Current row to end |
| RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW | Date-based range (PostgreSQL) |
ROWS vs RANGE: ROWS counts physical rows. RANGE uses logical values, which is useful for date-based windows but not supported uniformly across engines. Spark SQL supports RANGE with numeric/date types; BigQuery Standard SQL supports it for date expressions. When in doubt, use ROWS.
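The practical difference between ROWS and RANGE shows up when ORDER BY values tie. The sketch below uses Python's sqlite3 (SQLite 3.25+ supports RANGE with UNBOUNDED PRECEDING / CURRENT ROW endpoints); the table and values are invented.

```python
import sqlite3

# Two rows share day=2, so they are "peers" under RANGE but distinct under ROWS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (2, 30), (3, 40)])

rows = conn.execute("""
    SELECT
        day,
        amt,
        SUM(amt) OVER (ORDER BY day
            ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
        SUM(amt) OVER (ORDER BY day
            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM t
    ORDER BY day, amt
""").fetchall()
# RANGE gives both day=2 rows the same sum (60) because they are peers;
# ROWS counts physical rows, so the two day=2 rows get different sums.
```

This is also why a running SUM with the default frame (which is RANGE-based) can produce "stepped" totals when the sort key has duplicates.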
3. Value Functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE
These let you reach across rows without a self-join.
| Function | Returns |
|---|---|
| LAG(col, n, default) | Value from n rows before |
| LEAD(col, n, default) | Value from n rows after |
| FIRST_VALUE(col) | First value in the window frame |
| LAST_VALUE(col) | Last value in the window frame |
| NTH_VALUE(col, n) | Nth value in the window frame |
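To make the LAG/LEAD semantics concrete, here is a small runnable sketch using Python's sqlite3 (SQLite 3.25+); the monthly_revenue rows are invented for illustration, and the third argument supplies a default for rows with no neighbor.

```python
import sqlite3

# LAG looks one row back, LEAD one row ahead; 0 fills in where no row exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_revenue (report_month TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO monthly_revenue VALUES (?, ?)",
    [("2024-01", 100), ("2024-02", 120), ("2024-03", 90)],
)

rows = conn.execute("""
    SELECT
        report_month,
        revenue,
        LAG(revenue, 1, 0)  OVER (ORDER BY report_month) AS prev_rev,
        LEAD(revenue, 1, 0) OVER (ORDER BY report_month) AS next_rev
    FROM monthly_revenue
    ORDER BY report_month
""").fetchall()
# The first row has no predecessor and the last has no successor,
# so the default (0) appears instead of NULL:
# [('2024-01', 100, 0, 120), ('2024-02', 120, 100, 90), ('2024-03', 90, 120, 0)]
```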
Month-over-Month Growth Rate
```sql
-- PostgreSQL
SELECT
    report_month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY report_month) AS prev_month_revenue,
    ROUND(
        (revenue - LAG(revenue, 1) OVER (ORDER BY report_month))::numeric
        / NULLIF(LAG(revenue, 1) OVER (ORDER BY report_month), 0)
        * 100, 1
    ) AS mom_growth_pct
FROM monthly_revenue
ORDER BY report_month;
```
Always provide a default for LAG/LEAD when nulls could cause downstream issues:
```sql
-- PostgreSQL
LAG(revenue, 1, 0) OVER (ORDER BY report_month)
```
First Purchase Date per Customer
```sql
-- PostgreSQL
SELECT DISTINCT
    customer_id,
    FIRST_VALUE(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_purchase_date
FROM orders;
```
Watch out with LAST_VALUE: When ORDER BY is present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the current row's value unless you explicitly set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
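You can reproduce this pitfall directly. The sketch below uses Python's sqlite3 (SQLite 3.25+); the table and values are invented, and it compares LAST_VALUE with the default frame against an explicit full-partition frame.

```python
import sqlite3

# LAST_VALUE with the default frame vs an explicit whole-partition frame.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

rows = conn.execute("""
    SELECT
        id,
        -- default frame ends at CURRENT ROW, so this just echoes the current value
        LAST_VALUE(val) OVER (ORDER BY id) AS last_default,
        -- explicit frame sees the whole partition, so this is the true last value
        LAST_VALUE(val) OVER (
            ORDER BY id
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS last_explicit
    FROM t
    ORDER BY id
""").fetchall()
# rows: [(1, 10, 30), (2, 20, 30), (3, 30, 30)]
```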
Combining Multiple Window Functions
You can use multiple window functions in the same query. When you repeat the same OVER() clause, use a named window to keep things DRY:
```sql
-- PostgreSQL
SELECT
    employee_id,
    department,
    salary,
    RANK() OVER dept_window AS salary_rank,
    -- extend the named window with an explicit frame so the average covers
    -- the whole department, not just the rows up to the current one
    AVG(salary) OVER (
        dept_window ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS dept_avg_salary,
    salary - AVG(salary) OVER (
        dept_window ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS diff_from_avg
FROM employees
WINDOW dept_window AS (PARTITION BY department ORDER BY salary DESC)
ORDER BY department, salary_rank;
```
Engine support for named windows: PostgreSQL supports the WINDOW clause natively, and Spark SQL 3.x supports it as well. Support elsewhere varies by engine and version, so check your engine's current documentation; if your engine rejects a WINDOW clause, repeat the full OVER() clause instead.
Common Mistakes and Pitfalls
1. Forgetting the Frame Clause Default
When ORDER BY is present inside OVER(), the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — not the entire partition. This catches people off guard with SUM() and AVG():
```sql
-- PostgreSQL: this is a running total, NOT a partition total
SELECT SUM(amount) OVER (PARTITION BY dept ORDER BY hire_date) FROM employees;

-- This IS a partition total (no ORDER BY = entire partition as frame)
SELECT SUM(amount) OVER (PARTITION BY dept) FROM employees;
```
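The difference is easy to verify. This sketch uses Python's sqlite3 (SQLite 3.25+); the emp table and its rows are invented for illustration.

```python
import sqlite3

# Same SUM, with and without ORDER BY inside OVER(): running total vs partition total.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, hire_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("eng", "2023-01", 10), ("eng", "2023-02", 20), ("eng", "2023-03", 30)],
)

rows = conn.execute("""
    SELECT
        hire_date,
        SUM(amount) OVER (PARTITION BY dept ORDER BY hire_date) AS running_sum,
        SUM(amount) OVER (PARTITION BY dept) AS dept_total
    FROM emp
    ORDER BY hire_date
""").fetchall()
# running_sum climbs 10, 30, 60; dept_total is 60 on every row
```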
2. Using Window Functions in WHERE
Window functions execute after WHERE. You can't filter on them directly:
```sql
-- ❌ This fails
SELECT * FROM employees WHERE RANK() OVER (ORDER BY salary DESC) <= 5;

-- ✅ Use a CTE or subquery
WITH ranked AS (
    SELECT *, RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT * FROM ranked WHERE rnk <= 5;
```
3. Non-Deterministic ROW_NUMBER
If your ORDER BY doesn't uniquely identify rows, ROW_NUMBER() assigns positions arbitrarily among ties — and the order can change between executions:
```sql
-- ❌ Non-deterministic: ties on salary get arbitrary row numbers
ROW_NUMBER() OVER (ORDER BY salary DESC)

-- ✅ Add a tiebreaker
ROW_NUMBER() OVER (ORDER BY salary DESC, employee_id ASC)
```
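With the tiebreaker in place, the assignment becomes fully deterministic. This sketch uses Python's sqlite3 (SQLite 3.25+); the employees rows are invented, with a deliberate salary tie.

```python
import sqlite3

# Two employees tie at salary 100; employee_id breaks the tie deterministically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(2, 100), (1, 100), (3, 90)])

rows = conn.execute("""
    SELECT
        employee_id,
        ROW_NUMBER() OVER (ORDER BY salary DESC, employee_id ASC) AS rn
    FROM employees
    ORDER BY rn
""").fetchall()
# the salary tie between employees 1 and 2 is resolved by employee_id:
# [(1, 1), (2, 2), (3, 3)]
```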
4. Performance: Missing Indexes
Window functions with ORDER BY benefit from indexes on the sort columns, especially on large tables. PARTITION BY + ORDER BY combinations work best with a composite index matching both.
5. LAST_VALUE Returning Current Row
As mentioned above — always set the frame explicitly when using LAST_VALUE:
```sql
-- ✅ Correct way to get the actual last value in the partition
LAST_VALUE(salary) OVER (
    PARTITION BY department
    ORDER BY hire_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
```
Window Functions Across SQL Engines
| Feature | PostgreSQL | Spark SQL | BigQuery Standard SQL | Snowflake SQL |
|---|---|---|---|---|
| Named windows (WINDOW) | ✅ | ✅ (3.x+) | ❌ | ✅ |
| RANGE with dates | ✅ | ✅ (numeric/date) | ✅ | ✅ |
| NTH_VALUE | ✅ | ✅ | ✅ | ✅ |
| QUALIFY (filter on window) | ❌ | ❌ | ✅ | ✅ |
| NTILE | ✅ | ✅ | ✅ | ✅ |
BigQuery and Snowflake offer QUALIFY as a cleaner alternative to CTE-based filtering:
```sql
-- BigQuery Standard SQL / Snowflake SQL
SELECT *
FROM sales_summary
QUALIFY DENSE_RANK() OVER (PARTITION BY region ORDER BY total_revenue DESC) <= 3;
```
This eliminates the CTE pattern entirely — worth knowing if you work in those environments.
When Window Functions Aren't the Right Tool
- Simple aggregation without row-level detail: just use GROUP BY.
- Cross-table lookups: that's a JOIN, not a window function.
- Recursive patterns (e.g., tree traversal): use recursive CTEs.
- Very large partitions with complex frames can be memory-intensive. If a partition contains millions of rows and you're using ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, consider whether a pre-aggregated approach would perform better.
Quick Reference Cheat Sheet
| Task | Function | Key Detail |
|---|---|---|
| Rank with gaps | RANK() | Ties share rank, next rank skipped |
| Rank without gaps | DENSE_RANK() | Ties share rank, next rank is +1 |
| Unique row number | ROW_NUMBER() | Add tiebreaker to ORDER BY |
| Running total | SUM() OVER (ORDER BY ...) | Default frame handles this |
| Moving average | AVG() OVER (ROWS BETWEEN n PRECEDING AND CURRENT ROW) | Specify frame explicitly |
| Previous row value | LAG(col, 1) | Provide default to avoid NULLs |
| Next row value | LEAD(col, 1) | Provide default to avoid NULLs |
| Split into buckets | NTILE(n) | Even distribution across n groups |
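The cheat sheet's NTILE row deserves one concrete note: when the row count doesn't divide evenly, earlier buckets absorb the remainder. This sketch uses Python's sqlite3 (SQLite 3.25+) with invented values.

```python
import sqlite3

# 10 rows into 3 buckets: NTILE gives the earlier buckets the extra row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (v INTEGER)")
conn.executemany("INSERT INTO vals VALUES (?)", [(i,) for i in range(1, 11)])

rows = conn.execute("""
    SELECT v, NTILE(3) OVER (ORDER BY v) AS bucket
    FROM vals
    ORDER BY v
""").fetchall()

buckets = [b for _, b in rows]
# buckets: [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]  (sizes 4, 3, 3)
```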
Next Steps
Once you're comfortable with window functions, explore these patterns:
- Sessionization: use LAG to detect gaps in event timestamps and assign session IDs
- Deduplication: ROW_NUMBER() partitioned by the natural key, filtered to rnk = 1
- Cumulative distribution: CUME_DIST() and PERCENT_RANK() for percentile calculations
- Gap-and-island problems: combine ROW_NUMBER with date arithmetic to find consecutive sequences
If you're working with data from APIs or CSVs and want to run window function queries directly in your browser without setting up a database, Harbinger Explorer lets you load data sources and query them with DuckDB SQL — including full window function support — right from a browser tab.
Continue Reading
- Apache Spark Tutorial: From DataFrames to Production Jobs
- Data Pipeline Monitoring: Catching Failures Before Your Stakeholders Do
- What Is dbt? The SQL Transformation Tool Explained