Session 36 - Window Functions in SQL | DSMP 2023
Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Window functions compute analytical metrics across related rows while returning results for each original row, unlike GROUP BY which collapses to one row per group.
Briefing
Window functions in SQL are positioned as the key upgrade from basic aggregation: they let analysts compute metrics like averages, ranks, and “next/previous” values while keeping the original row-level detail. Instead of collapsing data into one row per group (as with GROUP BY), window functions return results for every row, which makes it possible to run more complex analytical queries without losing context.
The session starts by grounding the concept. A window function is described as an analytical function that performs calculations across a set of rows related to the current row. That “set” is defined using window specifications—most importantly PARTITION BY (which splits rows into groups based on a column) and ORDER BY (which defines the sequence within each partition). The instructor emphasizes that this is conceptually similar to GROUP BY, but with a crucial behavioral difference: GROUP BY produces one output row per group, while window functions produce output row-by-row.
A worked example uses a student marks table with columns like Student ID, Name, Branch, and Marks. The goal is to compute average marks per branch. With GROUP BY, the output becomes a smaller table: Branch plus average marks. With a window function such as AVG(Marks) OVER (PARTITION BY Branch), the average is computed per branch but repeated on every original row belonging to that branch—so each student row carries the branch-level average alongside their own marks. The practical payoff is highlighted: even small differences in output shape enable a wide range of analytics.
The session then contrasts “overall” aggregation with windowed aggregation. An AVG over the entire dataset yields a single overall average repeated nowhere useful for row-level comparisons. Adding OVER and PARTITION BY Branch turns it into a per-branch average repeated for each student, enabling queries like “show only students whose marks are greater than their branch average.” The approach avoids correlated subqueries by using the windowed average directly in a comparison.
Next comes a core set of ranking functions that analysts repeatedly rely on: RANK, DENSE_RANK, and ROW_NUMBER. All three rank rows within a partition based on ORDER BY (typically Marks descending). RANK assigns the same rank to ties but skips subsequent ranks; DENSE_RANK also assigns the same rank to ties but does not skip; ROW_NUMBER assigns a unique sequential number to every row, even when marks tie. The instructor also notes that ORDER BY is essential for deterministic ranking.
The session expands into “value extraction” functions—FIRST_VALUE, LAST_VALUE, and NTH_VALUE—used to pull the first, last, or nth element within a window after sorting. A major conceptual warning appears here: LAST_VALUE depends on the window frame. That leads to the idea of FRAMES (the subset of rows included in the calculation for the current row). Using the default frame can produce surprising LAST_VALUE results; adjusting the frame to something like “UNBOUNDED FOLLOWING” ensures the last value truly reflects the end of the partition for each row.
Finally, the session covers LAG and LEAD for previous/next row values within a partition. These functions enable rolling analytics such as month-over-month growth: compute monthly revenue totals, then use LAG to subtract the previous month and calculate growth percentage. The session closes by previewing that the remaining window functions and more difficult query patterns will be tackled next, with practice-focused tasks and scenarios.
Cornell Notes
Window functions let SQL compute analytical metrics across related rows while still returning results for every original row. By using PARTITION BY to split data into groups and ORDER BY to define row order, functions like AVG, RANK, and ROW_NUMBER can produce group-aware outputs without collapsing rows like GROUP BY does. The session highlights ranking behavior differences: RANK skips ranks after ties, DENSE_RANK doesn’t, and ROW_NUMBER always assigns unique numbers. It also stresses that FIRST_VALUE/LAST_VALUE depend on the window frame, so adjusting frames (e.g., UNBOUNDED FOLLOWING) is necessary for correct “last” results. LAG/LEAD then support row-to-row comparisons such as month-over-month revenue growth.
How does a window function differ from GROUP BY in output shape, and why does that matter?
What do PARTITION BY and ORDER BY control inside a window function?
When should RANK, DENSE_RANK, and ROW_NUMBER be used, and what’s the tie behavior difference?
Why can LAST_VALUE produce unexpected results, and how do window frames fix it?
How do LAG and LEAD enable month-over-month growth calculations?
How can a query filter rows using a windowed aggregate without correlated subqueries?
Review Questions
- In the student marks example, what exact change turns a GROUP BY average into a windowed average that repeats per row?
- Given ties in marks within a branch, how would the output ranks differ between RANK and DENSE_RANK?
- What frame adjustment is needed so LAST_VALUE returns the true last value of the partition rather than the last value within the default frame?
Key Points
- 1
Window functions compute analytical metrics across related rows while returning results for each original row, unlike GROUP BY which collapses to one row per group.
- 2
PARTITION BY defines the window’s groups (e.g., one partition per Branch), and ORDER BY defines the row order inside each partition.
- 3
RANK, DENSE_RANK, and ROW_NUMBER differ mainly in how they handle ties: RANK skips ranks, DENSE_RANK doesn’t, and ROW_NUMBER always produces unique positions.
- 4
FIRST_VALUE, LAST_VALUE, and NTH_VALUE depend on the window frame; incorrect default frames can make LAST_VALUE behave like “last so far.”
- 5
Adjusting the frame (e.g., using UNBOUNDED FOLLOWING) ensures LAST_VALUE reflects the end of the partition.
- 6
LAG and LEAD provide previous/next row values within a partition and are ideal for rolling comparisons like month-over-month revenue growth.
- 7
Windowed aggregates can be used for filtering (e.g., marks above branch average) without correlated subqueries by computing the needed metric per row.