SQL Window Functions Tutorial: Rank, Aggregate, Compare
Every analyst hits the same wall: you need a running total, a row number, or a comparison to the previous row — and GROUP BY collapses the detail you need. SQL window functions solve this without subqueries or self-joins, and once you learn them, you'll wonder how you survived without them.
This tutorial covers the most useful window functions with runnable examples in PostgreSQL syntax (most work identically in Spark SQL, BigQuery Standard SQL, and Snowflake SQL). By the end, you'll handle rankings, running aggregates, and row-to-row comparisons confidently.
What Are SQL Window Functions?
A window function performs a calculation across a set of rows related to the current row — without collapsing them into a single output row like GROUP BY does. The key clause is OVER(), which defines the "window" each row sees.
| Concept | GROUP BY | Window Function |
|---|---|---|
| Rows returned | One per group | Every original row |
| Aggregation | Collapses rows | Adds a column alongside existing data |
| Use case | Summary reports | Rankings, running totals, row comparisons |
| Syntax marker | GROUP BY col | function() OVER(...) |
Anatomy of a Window Function Call
```sql
-- PostgreSQL
SELECT
    employee_id,
    department,
    salary,
    RANK() OVER (
        PARTITION BY department  -- defines the group (like GROUP BY, but no collapse)
        ORDER BY salary DESC     -- defines the order within the partition
    ) AS dept_salary_rank
FROM employees;
```
- PARTITION BY splits rows into groups (optional — omit it and the window is the entire result set)
- ORDER BY determines the sequence within each partition
- Frame clause (optional) further narrows which rows within the partition the function sees
The Three Families of Window Functions
1. Ranking Functions
Ranking functions assign a position to each row within its partition.
| Function | Ties | Gaps after ties | Example (values 100, 100, 90) |
|---|---|---|---|
| ROW_NUMBER() | Breaks ties arbitrarily | No gaps | 1, 2, 3 |
| RANK() | Same rank | Gaps after ties | 1, 1, 3 |
| DENSE_RANK() | Same rank | No gaps | 1, 1, 2 |
| NTILE(n) | Distributes into n buckets | N/A | Depends on n |
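A quick way to see the tie behavior from the table above is to run all three ranking functions side by side. This sketch uses Python's bundled sqlite3 (window functions require SQLite 3.25+); the scores table and its values are invented for illustration.

```python
import sqlite3

# Three ranking functions over the same tied data: 100, 100, 90.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 100), ("b", 100), ("c", 90)],
)

rows = conn.execute("""
    SELECT
        player,
        ROW_NUMBER() OVER w AS row_num,
        RANK()       OVER w AS rnk,
        DENSE_RANK() OVER w AS dense_rnk
    FROM scores
    WINDOW w AS (ORDER BY score DESC)
    ORDER BY score DESC, player
""").fetchall()

for player, row_num, rnk, dense_rnk in rows:
    print(player, row_num, rnk, dense_rnk)
# ROW_NUMBER gives 1, 2, 3; RANK gives 1, 1, 3; DENSE_RANK gives 1, 1, 2
```

Note that ROW_NUMBER splits the tie arbitrarily, so which of the two 100-score players gets row number 1 is not guaranteed between runs.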
Practical Example: Top 3 Salespeople per Region
```sql
-- PostgreSQL
WITH ranked AS (
    SELECT
        salesperson_name,
        region,
        total_revenue,
        DENSE_RANK() OVER (
            PARTITION BY region
            ORDER BY total_revenue DESC
        ) AS revenue_rank
    FROM sales_summary
)
SELECT *
FROM ranked
WHERE revenue_rank <= 3
ORDER BY region, revenue_rank;
```
Why DENSE_RANK here? It guarantees you see the top 3 distinct revenue values in each region. With RANK(), a tie at #2 produces ranks 1, 2, 2, 4, so filtering on revenue_rank <= 3 would exclude the person with the third-highest distinct revenue.
2. Aggregate Window Functions
Any standard aggregate — SUM, AVG, COUNT, MIN, MAX — can be used as a window function by adding OVER().
Running Total
```sql
-- PostgreSQL
SELECT
    order_date,
    order_amount,
    SUM(order_amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
ORDER BY order_date;
```
Moving Average (7-Day)
```sql
-- PostgreSQL
SELECT
    metric_date,
    daily_value,
    AVG(daily_value) OVER (
        ORDER BY metric_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM daily_metrics
ORDER BY metric_date;
```
Frame clauses matter. The ROWS BETWEEN clause controls exactly which rows feed the aggregate. Common frames:
| Frame | Meaning |
|---|---|
| ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | All rows from start to current (running total) |
| ROWS BETWEEN 6 PRECEDING AND CURRENT ROW | Current row + 6 before it (7-row window) |
| ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING | Current row to end |
| RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW | Date-based range (PostgreSQL) |
ROWS vs RANGE: ROWS counts physical rows. RANGE uses logical values, which is useful for date-based windows but not supported uniformly across engines. Spark SQL supports RANGE with numeric/date types; BigQuery Standard SQL supports it for date expressions. When in doubt, use ROWS.
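The practical difference between ROWS and RANGE shows up when ORDER BY values tie. The sketch below uses Python's sqlite3 (SQLite 3.25+ supports RANGE with UNBOUNDED PRECEDING / CURRENT ROW endpoints); the table and values are invented.

```python
import sqlite3

# Two rows share day=2, so they are "peers" under RANGE but distinct under ROWS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (2, 30), (3, 40)])

rows = conn.execute("""
    SELECT
        day,
        amt,
        SUM(amt) OVER (ORDER BY day
            ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
        SUM(amt) OVER (ORDER BY day
            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM t
    ORDER BY day, amt
""").fetchall()
# RANGE gives both day=2 rows the same sum (60) because they are peers;
# ROWS counts physical rows, so the two day=2 rows get different sums.
```

This is also why a running SUM with the default frame (which is RANGE-based) can produce "stepped" totals when the sort key has duplicates.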
3. Value Functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE
These let you reach across rows without a self-join.
| Function | Returns |
|---|---|
| LAG(col, n, default) | Value from n rows before |
| LEAD(col, n, default) | Value from n rows after |
| FIRST_VALUE(col) | First value in the window frame |
| LAST_VALUE(col) | Last value in the window frame |
| NTH_VALUE(col, n) | Nth value in the window frame |
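To make the LAG/LEAD semantics concrete, here is a small runnable sketch using Python's sqlite3 (SQLite 3.25+); the monthly_revenue rows are invented for illustration, and the third argument supplies a default for rows with no neighbor.

```python
import sqlite3

# LAG looks one row back, LEAD one row ahead; 0 fills in where no row exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_revenue (report_month TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO monthly_revenue VALUES (?, ?)",
    [("2024-01", 100), ("2024-02", 120), ("2024-03", 90)],
)

rows = conn.execute("""
    SELECT
        report_month,
        revenue,
        LAG(revenue, 1, 0)  OVER (ORDER BY report_month) AS prev_rev,
        LEAD(revenue, 1, 0) OVER (ORDER BY report_month) AS next_rev
    FROM monthly_revenue
    ORDER BY report_month
""").fetchall()
# The first row has no predecessor and the last has no successor,
# so the default (0) appears instead of NULL:
# [('2024-01', 100, 0, 120), ('2024-02', 120, 100, 90), ('2024-03', 90, 120, 0)]
```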
Month-over-Month Growth Rate
```sql
-- PostgreSQL
SELECT
    report_month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY report_month) AS prev_month_revenue,
    ROUND(
        (revenue - LAG(revenue, 1) OVER (ORDER BY report_month))::numeric
        / NULLIF(LAG(revenue, 1) OVER (ORDER BY report_month), 0)
        * 100, 1
    ) AS mom_growth_pct
FROM monthly_revenue
ORDER BY report_month;
```
Always provide a default for LAG/LEAD when nulls could cause downstream issues:
```sql
-- PostgreSQL
LAG(revenue, 1, 0) OVER (ORDER BY report_month)
```
First Purchase Date per Customer
```sql
-- PostgreSQL
SELECT DISTINCT
    customer_id,
    FIRST_VALUE(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_purchase_date
FROM orders;
```
Watch out with LAST_VALUE: When ORDER BY is present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE returns the current row's value unless you explicitly set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
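You can reproduce this pitfall directly. The sketch below uses Python's sqlite3 (SQLite 3.25+); the table and values are invented, and it compares LAST_VALUE with the default frame against an explicit full-partition frame.

```python
import sqlite3

# LAST_VALUE with the default frame vs an explicit whole-partition frame.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

rows = conn.execute("""
    SELECT
        id,
        -- default frame ends at CURRENT ROW, so this just echoes the current value
        LAST_VALUE(val) OVER (ORDER BY id) AS last_default,
        -- explicit frame sees the whole partition, so this is the true last value
        LAST_VALUE(val) OVER (
            ORDER BY id
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS last_explicit
    FROM t
    ORDER BY id
""").fetchall()
# rows: [(1, 10, 30), (2, 20, 30), (3, 30, 30)]
```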
Combining Multiple Window Functions
You can use multiple window functions in the same query. When you repeat the same OVER() clause, use a named window to keep things DRY:
```sql
-- PostgreSQL
SELECT
    employee_id,
    department,
    salary,
    RANK() OVER dept_window AS salary_rank,
    -- extend the named window with an explicit frame so the average covers
    -- the whole department, not just the rows up to the current one
    AVG(salary) OVER (
        dept_window ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS dept_avg_salary,
    salary - AVG(salary) OVER (
        dept_window ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS diff_from_avg
FROM employees
WINDOW dept_window AS (PARTITION BY department ORDER BY salary DESC)
ORDER BY department, salary_rank;
```
Engine support for named windows: PostgreSQL supports the WINDOW clause natively, and Spark SQL 3.x supports it as well. Support elsewhere varies by engine and version, so check your engine's current documentation; if your engine rejects a WINDOW clause, repeat the full OVER() clause instead.
Common Mistakes and Pitfalls
1. Forgetting the Frame Clause Default
When ORDER BY is present inside OVER(), the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — not the entire partition. This catches people off guard with SUM() and AVG():
```sql
-- PostgreSQL: this is a running total, NOT a partition total
SELECT SUM(amount) OVER (PARTITION BY dept ORDER BY hire_date) FROM employees;

-- This IS a partition total (no ORDER BY = entire partition as frame)
SELECT SUM(amount) OVER (PARTITION BY dept) FROM employees;
```
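The difference is easy to verify. This sketch uses Python's sqlite3 (SQLite 3.25+); the emp table and its rows are invented for illustration.

```python
import sqlite3

# Same SUM, with and without ORDER BY inside OVER(): running total vs partition total.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, hire_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("eng", "2023-01", 10), ("eng", "2023-02", 20), ("eng", "2023-03", 30)],
)

rows = conn.execute("""
    SELECT
        hire_date,
        SUM(amount) OVER (PARTITION BY dept ORDER BY hire_date) AS running_sum,
        SUM(amount) OVER (PARTITION BY dept) AS dept_total
    FROM emp
    ORDER BY hire_date
""").fetchall()
# running_sum climbs 10, 30, 60; dept_total is 60 on every row
```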
2. Using Window Functions in WHERE
Window functions execute after WHERE. You can't filter on them directly:
```sql
-- ❌ This fails
SELECT * FROM employees WHERE RANK() OVER (ORDER BY salary DESC) <= 5;

-- ✅ Use a CTE or subquery
WITH ranked AS (
    SELECT *, RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT * FROM ranked WHERE rnk <= 5;
```
3. Non-Deterministic ROW_NUMBER
If your ORDER BY doesn't uniquely identify rows, ROW_NUMBER() assigns positions arbitrarily among ties — and the order can change between executions:
```sql
-- ❌ Non-deterministic: ties on salary get arbitrary row numbers
ROW_NUMBER() OVER (ORDER BY salary DESC)

-- ✅ Add a tiebreaker
ROW_NUMBER() OVER (ORDER BY salary DESC, employee_id ASC)
```
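With the tiebreaker in place, the assignment becomes fully deterministic. This sketch uses Python's sqlite3 (SQLite 3.25+); the employees rows are invented, with a deliberate salary tie.

```python
import sqlite3

# Two employees tie at salary 100; employee_id breaks the tie deterministically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(2, 100), (1, 100), (3, 90)])

rows = conn.execute("""
    SELECT
        employee_id,
        ROW_NUMBER() OVER (ORDER BY salary DESC, employee_id ASC) AS rn
    FROM employees
    ORDER BY rn
""").fetchall()
# the salary tie between employees 1 and 2 is resolved by employee_id:
# [(1, 1), (2, 2), (3, 3)]
```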
4. Performance: Missing Indexes
Window functions with ORDER BY benefit from indexes on the sort columns, especially on large tables. PARTITION BY + ORDER BY combinations work best with a composite index matching both.
5. LAST_VALUE Returning Current Row
As mentioned above — always set the frame explicitly when using LAST_VALUE:
```sql
-- ✅ Correct way to get the actual last value in the partition
LAST_VALUE(salary) OVER (
    PARTITION BY department
    ORDER BY hire_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
```
Window Functions Across SQL Engines
| Feature | PostgreSQL | Spark SQL | BigQuery Standard SQL | Snowflake SQL |
|---|---|---|---|---|
| Named windows (WINDOW) | ✅ | ✅ (3.x+) | ❌ | ✅ |
| RANGE with dates | ✅ | ✅ (numeric/date) | ✅ | ✅ |
| NTH_VALUE | ✅ | ✅ | ✅ | ✅ |
| QUALIFY (filter on window) | ❌ | ❌ | ✅ | ✅ |
| NTILE | ✅ | ✅ | ✅ | ✅ |
BigQuery and Snowflake offer QUALIFY as a cleaner alternative to CTE-based filtering:
```sql
-- BigQuery Standard SQL / Snowflake SQL
SELECT *
FROM sales_summary
QUALIFY DENSE_RANK() OVER (PARTITION BY region ORDER BY total_revenue DESC) <= 3;
```
This eliminates the CTE pattern entirely — worth knowing if you work in those environments.
When Window Functions Aren't the Right Tool
- Simple aggregation without row-level detail: just use GROUP BY.
- Cross-table lookups: that's a JOIN, not a window function.
- Recursive patterns (e.g., tree traversal): use recursive CTEs.
- Very large partitions with complex frames can be memory-intensive. If a partition contains millions of rows and you're using ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, consider whether a pre-aggregated approach would perform better.
Quick Reference Cheat Sheet
| Task | Function | Key Detail |
|---|---|---|
| Rank with gaps | RANK() | Ties share rank, next rank skipped |
| Rank without gaps | DENSE_RANK() | Ties share rank, next rank is +1 |
| Unique row number | ROW_NUMBER() | Add tiebreaker to ORDER BY |
| Running total | SUM() OVER (ORDER BY ...) | Default frame handles this |
| Moving average | AVG() OVER (ROWS BETWEEN n PRECEDING AND CURRENT ROW) | Specify frame explicitly |
| Previous row value | LAG(col, 1) | Provide default to avoid NULLs |
| Next row value | LEAD(col, 1) | Provide default to avoid NULLs |
| Split into buckets | NTILE(n) | Even distribution across n groups |
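The cheat sheet's NTILE row deserves one concrete note: when the row count doesn't divide evenly, earlier buckets absorb the remainder. This sketch uses Python's sqlite3 (SQLite 3.25+) with invented values.

```python
import sqlite3

# 10 rows into 3 buckets: NTILE gives the earlier buckets the extra row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (v INTEGER)")
conn.executemany("INSERT INTO vals VALUES (?)", [(i,) for i in range(1, 11)])

rows = conn.execute("""
    SELECT v, NTILE(3) OVER (ORDER BY v) AS bucket
    FROM vals
    ORDER BY v
""").fetchall()

buckets = [b for _, b in rows]
# buckets: [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]  (sizes 4, 3, 3)
```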
Next Steps
Once you're comfortable with window functions, explore these patterns:
- Sessionization: use LAG to detect gaps in event timestamps and assign session IDs
- Deduplication: ROW_NUMBER() partitioned by the natural key, filtered to rnk = 1
- Cumulative distribution: CUME_DIST() and PERCENT_RANK() for percentile calculations
- Gap-and-island problems: combine ROW_NUMBER with date arithmetic to find consecutive sequences
If you're working with data from APIs or CSVs and want to run window function queries directly in your browser without setting up a database, Harbinger Explorer lets you load data sources and query them with DuckDB SQL — including full window function support — right from a browser tab.
Continue Reading
- Apache Spark Tutorial: From DataFrames to Production Jobs
- Data Pipeline Monitoring: Catching Failures Before Your Stakeholders Do
- What Is dbt? The SQL Transformation Tool Explained