This module covers GROUP BY and HAVING, ROLLUP, CUBE, and GROUPING SETS.
The `GROUP BY` clause is used in SQL to group rows that have the same values in specified columns. This allows you to perform aggregate functions like counting, summing, averaging, etc., on these groups of rows.
SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1;
The `HAVING` clause is used to filter groups based on a condition. It's similar to the `WHERE` clause but applies to aggregated data.
SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1
HAVING COUNT(*) > 10;
You can group by multiple columns or even expressions. For example, grouping by a substring of a column.
SELECT SUBSTRING(column1, 1, 3), COUNT(*)
FROM table_name
GROUP BY SUBSTRING(column1, 1, 3);
Let's say you have an `orders` table and want to find the total sales per customer.
SELECT customer_id, SUM(amount) AS total_sales
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 1000;
In advanced SQL aggregations, particularly in OLAP (Online Analytical Processing) scenarios, we often need to generate subtotals and totals across multiple dimensions. This is where **ROLLUP**, **CUBE**, and **GROUPING SETS** come into play. These powerful tools allow you to create complex hierarchical groupings and summarize data efficiently.
**ROLLUP** generates subtotals for the specified grouping columns, as well as a grand total. It is particularly useful when you need to create hierarchical groupings and see totals at each level.
SELECT region, year, SUM(sales) AS total_sales
FROM sales_data
GROUP BY ROLLUP(region, year);
**CUBE** generates all possible combinations of the specified grouping columns. This includes individual groupings, pairwise combinations, and the grand total. Use **CUBE** when you need to analyze data across multiple dimensions simultaneously.
SELECT region, product_category, SUM(sales) AS total_sales
FROM sales_data
GROUP BY CUBE(region, product_category);
**GROUPING SETS** allow you to explicitly define which groupings should be included in the result. This provides precise control over the output, making it easier to customize your aggregations.
SELECT region, product_category, SUM(sales) AS total_sales
FROM sales_data
GROUP BY GROUPING SETS(
(region),
(product_category)
);
SELECT region, year, product_category,
SUM(sales) AS total_sales
FROM sales_data
GROUP BY CUBE(region, year, product_category)
ORDER BY GROUPING_ID();
This query will generate all possible combinations of the three dimensions and include a `GROUPING_ID()` for easier interpretation. The results will show detailed sales data along with various subtotal and total aggregations.
Question 1 of 11