🧪Advanced Query Techniques

This module covers Window Functions, Recursive Queries, Set Operations, Pivoting and Unpivoting.

📘 Window Functions

Window functions are powerful tools in SQL that allow you to perform calculations across a set of table rows that are somehow related to the current row. They are especially useful for running totals, cumulative aggregates, ranking, and more.

💡 Key Window Function Concepts

`OVER()`: Defines the window of rows that will be used for calculation.
`PARTITION BY`: Divides the result set into partitions before applying the window function.
`ORDER BY`: Determines the order in which rows are processed within each partition.

💡 Common Window Functions

`ROW_NUMBER()`: Assigns a unique number to each row in the result set.
`RANK()`: Assigns a rank to each row within its partition of the result set.
`DENSE_RANK()`: Similar to RANK(), but without gaps between ranks.
`LAG()`: Accesses data from a previous row in the same result set.
`LEAD()`: Accesses data from a subsequent row in the same result set.

✅ ROW_NUMBER() Example

SELECT
  employee_id,
  name,
  salary,
  ROW_NUMBER() OVER(ORDER BY salary DESC) AS row_num
FROM employees;

This example assigns a unique sequential number to each employee based on their salary in descending order.

💡 PARTITION BY Example

SELECT
  department,
  name,
  salary,
  ROW_NUMBER() OVER(PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

In this example, the `PARTITION BY` clause ensures that row numbers reset for each department.

💡 RANK() vs DENSE_RANK()

Both functions assign rankings, but `RANK()` leaves gaps when there are ties (e.g., two employees with the same salary), while `DENSE_RANK()` does not leave gaps.

✅ LAG() and LEAD() Examples

SELECT
  date,
  sales,
  LAG(sales, 1) OVER(ORDER BY date) AS prev_sales,
  LEAD(sales, 1) OVER(ORDER BY date) AS next_sales
FROM monthly_sales;

This query retrieves the previous and next month's sales values for each row.

💡 Best Practices

Always specify `PARTITION BY` when you need calculations to reset at partition boundaries.
Use `ORDER BY` within the window function to ensure rows are processed in the correct sequence.
Combine window functions with `DISTINCT` or aggregate functions when needed.
Test your queries incrementally to understand how each part affects the result.

❌ Common Mistakes

Forgetting to include `ORDER BY` in the window function.
Not using `PARTITION BY` when you need separate calculations for different groups.
Misusing `LAG()` and `LEAD()` without specifying the number of rows to offset.
Overcomplicating queries - break them down into smaller, manageable parts.

📘 Recursive Queries

Recursive queries are powerful SQL techniques used to handle hierarchical or self-referential data. They allow you to query data that refers back to itself, such as organizational structures, family trees, or nested categories.

💡 What is a Recursive Query?

A recursive query uses a Common Table Expression (CTE) to repeatedly execute the query until a specified condition is met. This allows you to traverse hierarchies and retrieve data that would otherwise require multiple queries.

✅ Common Table Expressions (CTEs)

CTEs are temporary result sets defined within a single query. They can be used to simplify complex queries and make them more readable. Recursive CTEs have two parts: the anchor member and the recursive member.

WITH cte_name AS (
    -- Anchor part
    SELECT ...
    UNION ALL
    -- Recursive part
    SELECT ...
)
SELECT * FROM cte_name;

💡 Key Components of a Recursive Query

`ANCHOR PART`: The initial query that provides the starting point for the recursion.
`RECURSIVE PART`: The query that joins with the CTE to continue the process.
`UNION ALL`: Combines results from the anchor and recursive parts.

✅ When to Use Recursive Queries?

Organizational hierarchies (e.g., reporting structures)
Tree structures (e.g., categories, file systems)
Genealogical relationships
Nested comments or forum threads

💡 Limitations and Best Practices

Ensure proper termination conditions to prevent infinite loops.
Use `MAXRECURSION` hint in SQL Server to limit recursion depth.
Test queries with small datasets first.
Avoid deep recursion when possible.

✅ Example: Employee Hierarchy

WITH EmployeeHierarchy AS (
    -- Anchor part
    SELECT employee_id, first_name, last_name, manager_id
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive part
    SELECT e.employee_id, e.first_name, e.last_name, e.manager_id
    FROM employees e
    INNER JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM EmployeeHierarchy;

❌ Common Mistakes to Avoid

Not including a termination condition.
Using `UNION` instead of `UNION ALL` (can cause unnecessary duplicates).
Forgetting to alias the CTE correctly.
Not testing with small datasets.

📘 Set Operations

Set operations in SQL are powerful tools for combining, intersecting, or excluding result sets from multiple queries. These operations allow you to manipulate data at a higher level of abstraction, making complex queries more manageable and efficient.

💡 Basic Set Operations

`UNION`: Combines two result sets without duplicates. Use `UNION ALL` to include duplicates.
`INTERSECT`: Returns only the rows that appear in both result sets.
`EXCEPT`: Returns rows from the first query that are not present in the second query.

✅ UNION and UNION ALL

The `UNION` operator combines two or more result sets into a single set, eliminating duplicate rows. Use `UNION ALL` when you want to include all results, including duplicates.

SELECT column1 FROM table1
UNION
SELECT column1 FROM table2;

Use `UNION` when you want to combine two tables but don't want duplicates. For example, combining user data from multiple sources.

SELECT id, name FROM employees
UNION ALL
SELECT id, name FROM contractors;

Use `UNION ALL` when performance is a priority and you don't mind duplicates. This operation doesn't check for duplicates, so it's faster.

✅ INTERSECT: Finding Common Rows

The `INTERSECT` operator returns only the rows that are present in both result sets. This is useful for finding common elements between two datasets.

SELECT id FROM orders
INTERSECT
SELECT id FROM customers;

Use `INTERSECT` when you need to find overlapping data between tables, such as finding users who are both active and premium.

✅ EXCEPT: Removing Matching Rows

The `EXCEPT` operator returns rows from the first query that are not present in the second query. This is useful for finding differences between two datasets.

SELECT id FROM employees
EXCEPT
SELECT id FROM contractors;

Use `EXCEPT` when you need to find data that exists in one table but not in another, such as identifying users who haven't completed a certain action.

💡 Best Practices for Set Operations

Always use parentheses to clarify the order of operations when combining multiple set operations.
Avoid using `UNION` or `INTERSECT` with large datasets due to potential performance impacts.
Use `EXCEPT` sparingly and test for performance in production environments.

❌ Common Mistakes to Avoid

Don't use `UNION` when you need to include duplicates - use `UNION ALL` instead.
Avoid mixing different column counts or types in your queries.
Don't assume the order of rows will be preserved after a set operation.

📘 Pivoting and Unpivoting

Pivoting and unpivoting are powerful techniques in SQL that allow you to dynamically transform the shape of your data. These operations enable you to convert rows into columns (pivot) or columns into rows (unpivot), making it easier to analyze and present your data in various formats.

💡 Key Concepts

Pivoting: Transforms rows into columns, creating dynamic columns based on values.
Unpivoting: Transforms columns into rows, returning data to a normalized format.
Use cases include data aggregation, report generation, and dynamic column manipulation.

✅ Pivot Operations

The `PIVOT` clause allows you to rotate rows into columns based on specified values. This is particularly useful for creating crosstabular reports or aggregating data across multiple dimensions.

SELECT Product, [2023] AS Sales_2023, [2024] AS Sales_2024
FROM (
    SELECT Product, Year, SUM(Sales) AS TotalSales
    FROM SalesData
    GROUP BY Product, Year
) AS SourceTable
PIVOT (
    SUM(TotalSales)
    FOR Year IN (2023, 2024)
) AS PivotTable;

In this example, we're pivoting sales data by year, creating dynamic columns for each year.

✅ Unpivot Operations

The `UNPIVOT` clause transforms columns back into rows. This is useful when you need to revert data from a pivoted format or normalize it for further processing.

SELECT Product, Year, Sales
FROM (
    SELECT Product, [2023], [2024]
    FROM PivotTable
) AS SourceTable
UNPIVOT (
    Sales FOR Year IN ([2023], [2024])
) AS UnpivotTable;

Here, we're unpivoting the sales data to return it to a normalized format.

💡 Dynamic Pivoting with CASE Expressions

When you need more flexibility, `CASE` expressions can be used to dynamically determine pivot columns.

SELECT Product,
    SUM(CASE WHEN Year = 2023 THEN Sales ELSE 0 END) AS Sales_2023,
    SUM(CASE WHEN Year = 2024 THEN Sales ELSE 0 END) AS Sales_2024
FROM SalesData
GROUP BY Product;

💡 Best Practices for Pivoting and Unpivoting

Always specify column names explicitly when using `PIVOT` or `UNPIVOT`.
Use aggregate functions appropriately to avoid data duplication.
Test your queries with small datasets before scaling up.
Consider performance impacts, especially with large datasets.
Document your pivoting logic clearly for maintainability.

❌ Common Mistakes to Avoid

Don't forget to alias columns when using `PIVOT` or `UNPIVOT`.
Avoid using wildcard characters in pivot operations without careful consideration.
Don't use `PIVOT` for datasets with a high number of unique values, as it can lead to excessive column creation.

// Bad example: Using wildcards without aliasing
SELECT *
FROM SalesData
PIVOT (
    SUM(Sales)
    FOR Year IN (2023, 2024)
);

Always explicitly alias columns to maintain clarity and control over the output.

💡 Performance Considerations

Indexing is crucial for efficient pivoting operations.
Consider the cardinality of your pivot columns to avoid excessive memory usage.
Use `UNPIVOT` sparingly, as it can impact performance when dealing with wide tables.

For high-performance pivoting, ensure that the columns used in the `FOR` clause are properly indexed and have a manageable number of unique values.

Quiz

Question 1 of 16

Which window function assigns a unique sequential number to each row?

RANK()
ROW_NUMBER()
DENSE_RANK()
LAG()