SQL Queries and Filtering Data

In our previous articles, we covered the basics of creating tables, inserting data, and performing simple operations in SQL.

It's time to dive deeper into SQL by querying and filtering data. These skills are essential for extracting meaningful information from your databases.

Resetting Your Database

Before we dive into our examples, you might find that your tables don't look exactly like the ones we're using, especially if you've been experimenting with your database. Don't worry! Here's how you can reset your tables to match our examples:

Deleting Existing Tables

If you need to start fresh, you can delete existing tables using the DROP TABLE command. Be very careful with this command, as it will permanently delete the table and all its data!

DROP TABLE IF EXISTS Posts;
DROP TABLE IF EXISTS Users;

The IF EXISTS clause prevents an error if the table doesn't exist.

Deleting Data from Tables

If you want to keep your table structure but remove all the data, you can use the DELETE command without a WHERE clause:

DELETE FROM Posts;
DELETE FROM Users;

This removes all rows from the tables but keeps the table structure intact.

Recreating Tables

After deleting the tables or their data, you can recreate them using the CREATE TABLE commands we provided earlier:

CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Username VARCHAR(50),
    Email VARCHAR(100),
    DateJoined DATE,
    IsActive BOOLEAN
);

CREATE TABLE Posts (
    PostID INT PRIMARY KEY,
    UserID INT,
    Content VARCHAR(500),
    PostDate DATE,
    Likes INT
);

Inserting Sample Data

To follow along with our examples, you'll need some data in your tables. Here's a set of INSERT statements to add sample data:

INSERT INTO Users (UserID, Username, Email, DateJoined, IsActive)
VALUES 
    (1, 'john_doe', 'john@example.com', '2024-08-15', TRUE),
    (2, 'jane_smith', 'jane@example.com', '2024-09-02', TRUE),
    (3, 'bob_johnson', 'bob@example.com', '2024-09-10', FALSE);

INSERT INTO Posts (PostID, UserID, Content, PostDate, Likes)
VALUES 
    (1, 1, 'Hello, World!', '2024-08-16', 10),
    (2, 1, 'Learning SQL is fun!', '2024-08-20', 15),
    (3, 2, 'My first post', '2024-09-03', 5),
    (4, 2, 'SQL is powerful', '2024-09-05', 20),
    (5, 3, 'Am I doing this right?', '2024-09-11', 3);

Now that we've covered resetting your database let's dive into querying and filtering data!

In the rest of the article, we will continue using our simple social media database example with Users and Posts tables.

Expanding on SELECT Statements

The SELECT statement is your primary tool for retrieving data from a database. We've used SELECT * to get all columns, but that's the beginning.

Selecting Specific Columns

Often, you'll want only specific columns, not everything. Here's how you do that:

SELECT Username, Email FROM Users;

This query retrieves only the Username and Email columns from the Users table. Why is this useful? It can:

  • Make your queries faster by retrieving less data
  • Simplify your results by showing only the information you need
  • Protect sensitive data by excluding columns you don't want to expose

Try it yourself: Can you write a query to get just the Content and Likes from the Posts table?

Using Aliases

Aliases allow you to rename columns in your result set. This can make your results more readable or clarify the meaning of a column:

SELECT 
    Username AS "User Name", 
    Email AS "Contact Email",
    DateJoined AS "Join Date"
FROM Users;

In this result, instead of "Username", "Email", and "DateJoined", your columns will be titled "User Name", "Contact Email", and "Join Date".

Aliases are handy for calculations or complex queries where column names might become unwieldy.

Filtering Data with WHERE

The WHERE clause is your best friend when filtering data. It allows you to specify conditions the data must meet to be included in the results.

Basic Comparisons

Let's start with some simple comparisons:

SELECT * FROM Users WHERE IsActive = TRUE;
SELECT * FROM Posts WHERE Likes > 10;
SELECT * FROM Users WHERE DateJoined >= '2024-09-01';

These queries demonstrate different types of comparisons:

  • Equality (=): Finding exact matches
  • Greater than (>): Finding values above a threshold
  • Greater than or equal to (>=): Finding values at or above a point

You can also use < (less than), <= (less than or equal to), and <> or != (not equal to).

Add more data if some of the conditions don't return any data to confirm that your logic works.

Combining Conditions

You can also combine conditions using AND, OR, and NOT to get even more specific:

SELECT * 
FROM Posts 
WHERE Likes > 5 AND PostDate >= '2024-09-01';

SELECT *
FROM Users
WHERE IsActive = TRUE OR DateJoined >= '2024-09-01';

SELECT *
FROM Users
WHERE NOT IsActive = FALSE;
  • AND: Both conditions must be true
  • OR: At least one condition must be true
  • NOT: Inverts the condition

You can use parentheses to group conditions for more complex logic:

SELECT *
FROM Posts
WHERE (Likes > 10 OR PostDate >= '2024-09-01') AND UserID = 1;

This finds popular or recent posts, but only from user 1.

Using BETWEEN and IN

BETWEEN is excellent for a range query:

SELECT * 
FROM Posts 
WHERE PostDate BETWEEN '2024-09-01' AND '2024-09-30';

This selects all posts from September 2024. BETWEEN is inclusive, meaning it includes the boundary values.

IN allows you to specify a list of possible values:

SELECT * 
FROM Users 
WHERE UserID IN (1, 3, 5);

This is equivalent to UserID = 1 OR UserID = 3 OR UserID = 5, but more concise.

Sorting Results with ORDER BY

Sorting your results can make them much more useful. The ORDER BY clause does this for you:

SELECT * FROM Posts ORDER BY Likes DESC;

This orders posts by the number of likes, from highest to lowest. DESC means descending order. If you omit DESC, it defaults to ASC (ascending order).

You can sort by multiple columns:

SELECT * 
FROM Posts 
ORDER BY PostDate DESC, Likes DESC;

This orders posts first by date (newest first), and then by likes (highest first) for posts on the same date. It's like saying "Show me the most popular posts for each day, with the most recent days first."

Limiting Results with LIMIT

Sometimes you only want a certain number of results. That's where LIMIT comes in:

SELECT * 
FROM Posts 
ORDER BY Likes DESC 
LIMIT 5;

This query returns the 5 most-liked posts. LIMIT is often used with ORDER BY to get "top N" results.

Note: The syntax for LIMIT can vary between database systems. In some systems, you might see TOP or FETCH FIRST used instead.

Using Arithmetic in Queries

SQL isn't just for retrieving data as-is. You can perform calculations right in your queries:

SELECT 
    PostID, 
    Likes, 
    Likes * 2 AS "Double Likes",
    Likes + 10 AS "Likes Plus Ten",
    Likes / 2 AS "Half Likes"
FROM Posts;

This shows the actual number of likes, double that number, likes plus ten, and half the number of likes for each post.

You can use +, -, *, and / for basic arithmetic. Many SQL systems also support functions like ROUND() for rounding numbers:

SELECT 
    PostID, 
    Likes, 
    ROUND(Likes / 2.0, 1) AS "Half Likes Rounded"
FROM Posts;

This rounds "Half Likes" to one decimal place.

6. Aggregate Functions and GROUP BY

Aggregate functions perform calculations across a set of rows. They're incredibly powerful for summarizing data. Here are some common ones:

COUNT()

Counts the number of rows:

SELECT COUNT(*) AS "Total Users" FROM Users;
SELECT COUNT(DISTINCT UserID) AS "Users Who Posted" FROM Posts;

The second query counts unique UserIDs in the Posts table, showing how many users have made at least one post.

SUM() and AVG()

SUM() adds up values, while AVG() calculates the average:

SELECT 
    SUM(Likes) AS "Total Likes",
    AVG(Likes) AS "Average Likes"
FROM Posts;

This gives you the total number of likes across all posts and the average number of likes per post.

MIN() and MAX()

These find the minimum and maximum values:

SELECT 
    MIN(DateJoined) AS "First User Joined", 
    MAX(DateJoined) AS "Latest User Joined" 
FROM Users;

This shows the earliest and latest join dates for users.

Using GROUP BY

The GROUP BY clause is used with aggregate functions to group the result-set by one or more columns. This allows you to perform calculations on groups of rows instead of the entire table.

For example, to find the number of posts made by each user:

SELECT 
    UserID,
    COUNT(*) AS "Number of Posts"
FROM Posts
GROUP BY UserID;

This groups the posts by UserID and counts how many posts are in each group.

You can use multiple aggregate functions with GROUP BY:

SELECT 
    UserID,
    COUNT(*) AS "Number of Posts",
    SUM(Likes) AS "Total Likes",
    AVG(Likes) AS "Average Likes"
FROM Posts
GROUP BY UserID;

This gives you a summary of each user's posting activity.

You can also use WHERE with GROUP BY, but the WHERE clause filters rows before they're grouped:

SELECT 
    UserID,
    COUNT(*) AS "Number of Posts"
FROM Posts
WHERE PostDate >= '2024-09-01'
GROUP BY UserID;

This counts posts for each user, but only considering posts from September 2024 onwards.

If you want to filter the groups after they're created, you use the HAVING clause:

SELECT 
    UserID,
    COUNT(*) AS "Number of Posts"
FROM Posts
GROUP BY UserID
HAVING COUNT(*) > 5;

This shows only users who have made more than 5 posts.

Practice Exercises

Now it's time to practice!

Try these exercises:

  1. Retrieve the usernames of all users who joined in September 2024.
  2. Find all posts with more than 20 likes, ordered by the number of likes.
  3. Get the 3 most recent posts.
  4. Calculate the total number of posts for each user.
  5. Find the user(s) with the most recent join date.
  6. Get the average number of likes for posts made by each user, but only for users whose posts average more than 10 likes.

Here are some hints to get you started:

For exercise 1, you'll need to use WHERE with a condition on the DateJoined column.

For exercise 2, you'll combine WHERE and ORDER BY.

Exercise 3 involves ORDER BY and LIMIT.

Exercise 4 is a perfect use case for COUNT() and GROUP BY.

For exercise 5, think about how you can use MAX() with the DateJoined column.

Exercise 6 combines GROUP BY, AVG(), and HAVING.

Don't worry if you find these challenging. SQL takes practice!

Remember, you can't break anything by running SELECT queries - they only read data, they don't modify it.

In our next article, we'll dive into data types and table design, which will help you understand how to structure your data more effectively.

PostgresqlBeginnerDatabaseSql
Avatar for Niall Maher

Written by Niall Maher

Founder of Codú - The web developer community! I've worked in nearly every corner of technology businesses: Lead Developer, Software Architect, Product Manager, CTO, and now happily a Founder.

Loading

Fetching comments

Hey! 👋

Got something to say?

or to leave a comment.