31 SQL Group by
31.1 Group by intro
GROUP BY clause in SQL is used to group rows that have the same values in specified columns. GROUP by is often used with aggregate functions like
- avg
- sum
- min
- max
- count
SELECT CustomerId , COUNT(*) as invoice_count, SUM(Total) as total_sales
FROM Invoice
GROUP BY CustomerId;
The result will be something like below:
Above query will
- Group all invoices by CustomerId
- count number of invoices for each customer
- calculate sum of invoices for each customer
If we use order by in this query, like below, we could see which customers are making the most purchases.
SELECT CustomerId , COUNT(*) as invoice_count, SUM(Total) as total_sales
FROM Invoice
GROUP BY CustomerId
order By total_sales desc;
Other aggregate functions could be used to infer more insights to invoice data.
Group by real life example
- From Ministry of Labor and Social Security
- how many companies each city has?
- how many people are employed by companies in each different city?
31.2 where vs having
Difference between having and where clauses in sql is where they are applied in query execution.
First lets remember select query list. Below is simplified version where some clauses like order by is omitted.
- SELECT select_list [ INTO new_table ]
- [ FROM table_source ]
- [ WHERE search_condition ]
- [ GROUP BY group_by_expression ]
- [ HAVING search_condition ]
WHERE Clause
- Where clause is used to filter rows before any group by or aggregation.
- where filter rows using FROM information using row level.
- Where could be used with SELECT, UPDATE, and DELETE statements.
SELECT i.CustomerId , COUNT(i.InvoiceId) as invoice_count
FROM Invoice i
WHERE i.Total > 4
GROUP BY i.CustomerId ;
This query first filters invoices greater than 4, then count how many invoice exists for each customer in the remaining rows. That is this query answer the question: How many invoices greater than 4 each customer has?.
HAVING Clause
- HAVING clause is used to filter rows after group by.
- HAVING clause could use aggregate functions
- HAVING clause used with SELECT statements and GROUP BY.
SELECT i.CustomerId , SUM(i.Total) as total_sales
FROM Invoice i
GROUP BY i.CustomerId
HAVING SUM(i.Total) > 45;
Key Differences
- Order of execution: WHERE is applied before grouping, while HAVING is applied after grouping
- Aggregate functions: HAVING can use aggregate functions (like COUNT, SUM, AVG), whereas WHERE cannot
- Usage with GROUP BY: HAVING is used with GROUP BY
- Type of operations: WHERE works on rows, while HAVING works on aggregated data
In practice, it is common to see both WHERE and HAVING together, with WHERE filtering individual rows and HAVING filtering the grouped results.