SQL Concepts: Subqueries, Aggregations, and CTEs

Today, we will explore more advanced SQL concepts: Subqueries, Aggregates with GROUP BY, and Common Table Expressions (CTEs). Don't worry if these terms sound intimidating – we'll break them down step by step and show you how these powerful tools can help you write more complex and efficient queries.

We'll continue using our bookstore database example to keep things familiar. Here's a quick refresher on our database structure and sample data:

CREATE TABLE Authors (
    AuthorID INT PRIMARY KEY,
    Name VARCHAR(100)
);

CREATE TABLE Books (
    BookID INT PRIMARY KEY,
    Title VARCHAR(200),
    AuthorID INT,
    Price DECIMAL(10, 2),
    FOREIGN KEY (AuthorID) REFERENCES Authors(AuthorID)
);

INSERT INTO Authors (AuthorID, Name) VALUES 
(1, 'J.K. Rowling'),
(2, 'George Orwell'),
(3, 'Jane Austen');

INSERT INTO Books (BookID, Title, AuthorID, Price) VALUES
(1, 'Harry Potter and the Philosopher''s Stone', 1, 20.99),
(2, '1984', 2, 15.99),
(3, 'Pride and Prejudice', 3, 12.99),
(4, 'Animal Farm', 2, 11.99);

Now, let's explore our advanced concepts using this data.

Subqueries: Queries Within Queries

Query-ception!

Imagine you're solving a complex puzzle. Sometimes, you need to solve a smaller puzzle first to get a piece for the bigger puzzle. That's essentially what a subquery does in SQL. It's a query nested inside another query, helping you break down complex problems into smaller, manageable parts.

Filtering with Dynamically Calculated Values

Let's say we want to find all books priced higher than the average book price. We could do this in two steps:

  1. Calculate the average price
  2. Find books priced above that average

With a subquery, we can do both in one go:

SELECT Title, Price
FROM Books
WHERE Price > (SELECT AVG(Price) FROM Books);

This query will return:

Title                                       | Price
--------------------------------------------|-------
Harry Potter and the Philosopher's Stone    | 20.99

Let's break this down:

  • The subquery (SELECT AVG(Price) FROM Books) calculates the average price of all books.
  • The outer query then uses this average price in its WHERE clause to filter the books.

It's like asking, "What's the average price?" and then using that answer to ask, "Which books cost more than that?"

Creating Temporary Data Sets

Sometimes, we need to perform calculations on our data before we can use it. A subquery in the FROM clause allows us to create a temporary data set (also called a derived table) that we can then query further.

For example, let's find the authors who have written more than one book:

SELECT Name, BookCount
FROM Authors
JOIN (
    SELECT AuthorID, COUNT(*) as BookCount
    FROM Books
    GROUP BY AuthorID
    HAVING COUNT(*) > 1
) AS AuthorBooks ON Authors.AuthorID = AuthorBooks.AuthorID;

This query will return:

Name           | BookCount
---------------|----------
George Orwell  | 2

This query is doing several things:

  1. The subquery (everything inside the parentheses after JOIN) counts how many books each author has written.
  2. It only keeps authors who have written more than one book (that's what HAVING COUNT(*) > 1 does).
  3. We give this temporary result set the name "AuthorBooks".
  4. We then join this with our Authors table to get the author's name.

It's like creating a temporary list of prolific authors and then looking up their names in our main author list.

Aggregations and GROUP BY: Summarizing Data

Aggregation functions are like super-powered calculators for your data. They perform calculations on a set of values and return a single result. Common aggregation functions include COUNT(), SUM(), AVG(), MAX(), and MIN().

The GROUP BY clause is the aggregation functions' best friend. It allows us to split our data into groups and perform our calculations on each group separately.

Let's calculate the total value of books for each author:

SELECT Authors.Name, SUM(Books.Price) as TotalValue
FROM Authors
LEFT JOIN Books ON Authors.AuthorID = Books.AuthorID
GROUP BY Authors.AuthorID, Authors.Name;

This query will return:

Name           | TotalValue
---------------|------------
J.K. Rowling   | 20.99
George Orwell  | 27.98
Jane Austen    | 12.99

Here's what this query does:

  1. It joins the Authors and Books tables, so we have all the information in one place.
  2. The GROUP BY Authors.AuthorID, Authors.Name tells SQL to create a group for each unique author.
  3. For each of these groups (i.e., for each author), it sums up the prices of their books.

It's like sorting all the books into piles by author and adding up the prices of each pile.

The LEFT JOIN ensures that we see all authors, even if they haven't written any books (in which case their TotalValue would be NULL or 0, depending on how we handle NULLs).

Common Table Expressions (CTEs)

[Previous sections remain the same]

Common Table Expressions (CTEs): Breaking Down Complex Queries

Common Table Expressions, or CTEs, are a powerful feature in SQL that allows you to create named temporary result sets that exist only for the duration of a query. Think of a CTE as a temporary table that you define at the beginning of your query and can reference later in the same query.

Why Use CTEs?

  • Readability: CTEs make complex queries easier to read and understand by breaking them into smaller, named parts.
  • Reusability: You can reference a CTE multiple times within the same query.
  • Recursion: CTEs can be recursive, allowing you to work with hierarchical or tree-structured data.

Basic CTE Syntax

The basic syntax of a CTE looks like this:

WITH cte_name AS (
    -- CTE query definition
)
SELECT * FROM cte_name;

Let's start with a simple example. Say we want to find out which books are priced above the average book price. We could use a subquery, but let's see how we can do this with a CTE:

WITH AveragePrice AS (
    SELECT AVG(Price) as AvgPrice
    FROM Books
)
SELECT Books.Title, Books.Price
FROM Books, AveragePrice
WHERE Books.Price > AveragePrice.AvgPrice;

Let's break this down:

  1. We start with the WITH keyword, which signals the beginning of our CTE definition.
  2. We name our CTE AveragePrice. This name can be anything you choose, but picking something descriptive is good.
  3. Inside the parentheses, we define our CTE query. Here, we're calculating the average price of all books.
  4. After the CTE definition, we write our main query. This query can reference our CTE by name (AveragePrice).
  5. In the main query, we select books with a higher price than the average price we calculated in the CTE.

This query will return all books priced above the average.

Now, let's revisit our more complex example of ranking books by price for each author:

WITH RankedBooks AS (
    SELECT 
        Books.Title,
        Authors.Name as AuthorName,
        Books.Price,
        ROW_NUMBER() OVER (PARTITION BY Books.AuthorID ORDER BY Books.Price DESC) as PriceRank
    FROM Books
    JOIN Authors ON Books.AuthorID = Authors.AuthorID
)
SELECT *
FROM RankedBooks
WHERE PriceRank = 1;

Let's break this down step by step:

  1. We define a CTE named RankedBooks.

  2. Inside the CTE:

    • We SELECT the book title, author name, and price.
    • We use the ROW_NUMBER() function to assign a rank to each book based on its price.
    • OVER (PARTITION BY Books.AuthorID ORDER BY Books.Price DESC) is a window function that:
      • Partitions the data by AuthorID (i.e., groups the books by author)
      • Orders the books within each partition by price in descending order
      • Assigns a row number (rank) to each book within its partition
  3. In the main query, we SELECT from our CTE and filter for only the top-ranked book for each author (WHERE PriceRank = 1).

This query will return the highest-priced book for each author:

Title                                       | AuthorName    | Price | PriceRank
--------------------------------------------|---------------|-------|----------
Harry Potter and the Philosopher's Stone    | J.K. Rowling  | 20.99 | 1
1984                                        | George Orwell | 15.99 | 1
Pride and Prejudice                         | Jane Austen   | 12.99 | 1

You can define multiple CTEs in a single query, separated by commas. For example:

WITH 
AuthorBookCounts AS (
    SELECT AuthorID, COUNT(*) as BookCount
    FROM Books
    GROUP BY AuthorID
),
AverageBookCount AS (
    SELECT AVG(BookCount) as AvgBookCount
    FROM AuthorBookCounts
)
SELECT Authors.Name, AuthorBookCounts.BookCount
FROM Authors
JOIN AuthorBookCounts ON Authors.AuthorID = AuthorBookCounts.AuthorID
CROSS JOIN AverageBookCount
WHERE AuthorBookCounts.BookCount > AverageBookCount.AvgBookCount;

This query defines two CTEs:

  1. AuthorBookCounts: Counts the number of books for each author
  2. AverageBookCount: Calculates the average number of books per author

The main query then uses both CTEs to find authors who have written more books than average.

CTEs are particularly useful when:

  • Your query involves multiple steps or complex logic
  • You need to reference the same subquery multiple times
  • You want to improve the readability of your complex queries
  • You're working with recursive queries (like traversing a tree structure)

Note: CTEs only exist for the duration of the query. If you need a more permanent solution, consider alternative options.

With practice, you'll find that CTEs can make your complex SQL queries much easier to write, read, and maintain.

Practice Exercise

To solidify your understanding, try writing queries to answer these questions:

  1. Find all books that are priced higher than the average price of books by the same author. (Hint: Use a subquery in the WHERE clause)
  2. For each author, find the difference between their highest and lowest priced books. (Hint: Use GROUP BY with MIN and MAX)
  3. Create a CTE that calculates the average book price, then use it to categorize each book as 'Expensive' if it's above average or 'Cheap' if it's below average.

Remember, these advanced concepts take time to master. Don't be discouraged if you find them challenging at first – every SQL expert started as a beginner.

In our next article, we'll explore how to optimize these types of queries for better performance. Until then, happy querying!

PostgresqlSqlBeginnerDatabase
Avatar for Niall Maher

Written by Niall Maher

Founder of Codú - The web developer community! I've worked in nearly every corner of technology businesses: Lead Developer, Software Architect, Product Manager, CTO, and now happily a Founder.

Loading

Fetching comments

Hey! 👋

Got something to say?

or to leave a comment.