SQL Practice Quiz
SQL Practice Quiz — Study Guide
Advanced SQL: Window Functions, Indexing, CTEs & More
Mastering SQL beyond basic SELECT statements is what separates junior developers from senior engineers. This guide covers the advanced SQL concepts you'll need to write performant, elegant queries — from window functions and recursive CTEs to indexing strategies and common NULL gotchas.
Window Functions
Window functions perform calculations across a set of rows related to the current row, without collapsing them like GROUP BY does.
SELECT
employee_id,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;Ranking Functions Compared
| Function | Gaps in Sequence? | Unique per Partition? |
|---|---|---|
ROW_NUMBER() | No gaps | Always unique |
RANK() | Has gaps (1,1,3) | No |
DENSE_RANK() | No gaps (1,1,2) | No |
Key insight: DENSE_RANK() assigns unique sequential integers with no gaps — the answer to "which function has no gaps?"LAG and LEAD
Access previous or next row values without a self-join:
SELECT
order_date,
revenue,
LAG(revenue, 1) OVER (ORDER BY order_date) AS prev_revenue,
LEAD(revenue, 1) OVER (ORDER BY order_date) AS next_revenue
FROM sales;Window Frames
Control which rows are included in the calculation:
-- Running total using a frame
SUM(revenue) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)Common frame options: ROWS BETWEEN, RANGE BETWEEN, UNBOUNDED PRECEDING, CURRENT ROW, UNBOUNDED FOLLOWING.
Top-N Per Group
Use ROW_NUMBER() in a CTE or subquery to get top N rows per partition:
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 3;CTEs and Recursive Queries
Common Table Expressions (CTEs)
CTEs make complex queries readable by naming subqueries:
WITH high_earners AS (
SELECT * FROM employees WHERE salary > 100000
)
SELECT department, COUNT(*) FROM high_earners GROUP BY department;Recursive CTEs
Use WITH RECURSIVE to traverse hierarchical data (org charts, file trees) where depth is unknown:
WITH RECURSIVE org_chart AS (
-- Base case
SELECT id, name, manager_id, 0 AS level
FROM employees WHERE manager_id IS NULL UNION ALL
-- Recursive case
SELECT e.id, e.name, e.manager_id, oc.level + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart;
NULL Gotchas
NULL represents unknown, not zero or empty. This causes surprising behavior:
-- This returns NO rows even if NULL exists in the subquery!
SELECT * FROM orders WHERE customer_id NOT IN (SELECT id FROM blacklist);
-- If blacklist has ANY NULL, the result is always emptyWhy? x NOT IN (1, NULL) evaluates as x <> 1 AND x <> NULL → always UNKNOWN.
Fix: Use NOT EXISTS instead:
SELECT * FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM blacklist b WHERE b.id = o.customer_id);Other NULL rules:
NULL = NULL → FALSE (use IS NULL)COUNT() ignore NULLsCOALESCE(value, fallback) replaces NULLsIndexing & Performance
Index Types
| Index Type | Best For |
|---|---|
| B-Tree | Equality, range scans, ORDER BY |
| Hash | Equality only |
| GIN/GiST | Full-text, arrays, JSON |
| BRIN | Very large tables ordered by value (e.g., timestamps) |
For a 1-billion row table with range scans on a timestamp, a BRIN index is most efficient — it stores min/max per block range, using minimal space.
Covering Index
A covering index includes all columns needed by a query, avoiding a table lookup:
-- Query: SELECT name, email FROM users WHERE department = 'Engineering'
CREATE INDEX idx_covering ON users (department, name, email);
-- All needed columns are IN the index — no heap access neededPartial Index
Index only a subset of rows — great for filtering common conditions:
CREATE INDEX idx_active_users ON users (email) WHERE active = true;
-- Much smaller and faster than indexing all usersEXPLAIN and Query Optimization
EXPLAIN shows the query plan without running it. EXPLAIN ANALYZE actually executes the query and shows real timing and row counts.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;Look for:
Best Practices
SELECT * in productionEXISTS instead of COUNT(*) to check existenceWHERE YEAR(created_at) = 2023 kills index usageJoins Deep Dive
Join Types
| Join | Returns |
|---|---|
| INNER JOIN | Matching rows only |
| LEFT JOIN | All left rows + matched right |
| RIGHT JOIN | All right rows + matched left |
| FULL OUTER JOIN | All rows from both sides |
| CROSS JOIN | Cartesian product |
| LATERAL JOIN | Subquery that references outer query columns |
LATERAL Joins
LATERAL allows a subquery to reference columns from preceding tables — like a correlated subquery in the FROM clause:
SELECT u.name, recent.order_date
FROM users u,
LATERAL (
SELECT order_date FROM orders o
WHERE o.user_id = u.id
ORDER BY order_date DESC LIMIT 1
) recent;Aggregation, Deduplication & Pivot
Deduplication
-- Keep one row per duplicate using ROW_NUMBER
WITH deduped AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
FROM users
)
DELETE FROM users WHERE id IN (SELECT id FROM deduped WHERE rn > 1);PIVOT (Conditional Aggregation)
Most databases use CASE for pivoting:
SELECT
department,
SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS male_count,
SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS female_count
FROM employees
GROUP BY department;Date Functions
DATE_TRUNC('month', created_at) -- PostgreSQL
DATE_FORMAT(created_at, '%Y-%m') -- MySQL
DATEADD(day, 7, order_date) -- SQL ServerKey Takeaways
DENSE_RANK() gives sequential ranks with no gaps; ROW_NUMBER() is always unique; RANK() skips numbers after ties.NOT IN with NULLs always returns empty results — use NOT EXISTS as a safe alternative.EXPLAIN ANALYZE actually runs the query and returns real execution stats; plain EXPLAIN only estimates.