Understanding Migrations in PostgreSQL

As your application grows and evolves, so too must your database structure. Managing these changes effectively is crucial, especially when working in a team or deploying to multiple environments. This is where database migrations come in.

In this article, we'll explore how to implement a basic migration system for PostgreSQL databases using built-in tools and simple bash scripts.

What are Database Migrations?

Database migrations are a way to manage changes to your database schema over time. They allow you to:

Track changes to your database structure
Apply these changes consistently across different environments
Roll back changes if necessary
Collaborate effectively in a team setting

Think of migrations like a version control system (similar to Git) for your database schema. I think it's good to understand why we use them in nearly every project you create. Here are the top reasons I could think of:

Consistency: Ensures all environments (development, staging, production) have the same database structure.
History: Provides a clear record of how your database has evolved.
Collaboration: Allows team members to share and apply database changes easily.
Deployments: Simplifies updating your database structure when deploying new versions of your application.
Rollbacks: Allows you to undo changes if something goes wrong.

Setting Up a DIY Migration System

Before diving in, it's important to note that developers typically use specialized tools like Flyway, Liquibase, or Alembic to manage database migrations in real-world scenarios. These tools provide robust features and handle many edge cases. However, understanding the underlying process is valuable, so we'll create a simple system using PostgreSQL's built-in tools: pg_dump and psql.

We'll use some basic shell scripts to manage our migrations. Don't worry if you're not familiar with shell scripting – the goal here is to understand the process, not the specific syntax.

Directory Structure

First, set up your project structure:

your_project/
├── migrations/
│   ├── applied/
│   └── pending/
├── scripts/
│   ├── apply_migration.sh
│   └── rollback_migration.sh
└── database_version

The Migration Scripts

We'll create two Bash scripts to manage our migrations. Don't worry if you're not familiar with Bash scripting—the goal here is to understand the process, not the specific syntax.

apply_migration.sh: This script applies a new migration to the database.

#!/bin/bash

DB_NAME="your_database"
MIGRATIONS_DIR="./migrations"
VERSION_FILE="./database_version"

# Get the current database version
current_version=$(cat $VERSION_FILE)
new_version=$((current_version + 1))

# Check if there's a migration to apply
if [ ! -f "$MIGRATIONS_DIR/pending/${new_version}_up.sql" ]; then
    echo "No new migration to apply"
    exit 0
fi

# Apply the migration
psql -d $DB_NAME -f "$MIGRATIONS_DIR/pending/${new_version}_up.sql"

if [ $? -eq 0 ]; then
    # Update the version file
    echo $new_version > $VERSION_FILE
    # Move the applied migration files
    mv "$MIGRATIONS_DIR/pending/${new_version}_up.sql" "$MIGRATIONS_DIR/applied/"
    mv "$MIGRATIONS_DIR/pending/${new_version}_down.sql" "$MIGRATIONS_DIR/applied/"
    echo "Migration $new_version applied successfully"
else
    echo "Error applying migration $new_version"
    exit 1
fi

rollback_migration.sh: Then, this script rolls back the last applied migration. This means this script is like your undo button.

#!/bin/bash

DB_NAME="your_database"
MIGRATIONS_DIR="./migrations"
VERSION_FILE="./database_version"

# Get the current database version
current_version=$(cat $VERSION_FILE)

if [ $current_version -eq 0 ]; then
    echo "No migrations to roll back"
    exit 0
fi

# Roll back the last migration
psql -d $DB_NAME -f "$MIGRATIONS_DIR/applied/${current_version}_down.sql"

if [ $? -eq 0 ]; then
    # Update the version file
    new_version=$((current_version - 1))
    echo $new_version > $VERSION_FILE
    # Move the rolled back migration files
    mv "$MIGRATIONS_DIR/applied/${current_version}_up.sql" "$MIGRATIONS_DIR/pending/"
    mv "$MIGRATIONS_DIR/applied/${current_version}_down.sql" "$MIGRATIONS_DIR/pending/"
    echo "Migration $current_version rolled back successfully"
else
    echo "Error rolling back migration $current_version"
    exit 1
fi

Make these scripts executable:

chmod +x scripts/apply_migration.sh scripts/rollback_migration.sh

Creating and Applying Migrations

Now that we have our scripts set up let's walk through the process of creating and applying a migration. Remember to DROP tables if they already exist in your example database so you can follow these exercises.

Here's what we are going to do:

Decide on the change you want to make to your schema.
Create two new files in the migrations/pending/ directory:
- 1_up.sql: Contains the SQL to apply the change
- 1_down.sql: Contains the SQL to revert the change

For example, let's create a new books table, let's assume we have an empty database and want to create it like this:

1_up.sql:

CREATE TABLE IF NOT EXISTS books (
    id SERIAL PRIMARY KEY,
    title VARCHAR(100) NOT NULL,
    author VARCHAR(100) NOT NULL,
    publication_year INTEGER
);

Then if you needed to remove it, you would write the code to DROP it:

1_down.sql:

DROP TABLE IF EXISTS books;

Applying a Migration

To apply the migration, simply run:

./scripts/apply_migration.sh

This script will:

Check the current database version
Apply the next pending migration
Update the version number
Move the applied migration files to the applied/ directory

Rolling Back a Migration

If you need to undo the last migration, run:

./scripts/rollback_migration.sh

This script will:

Check the current database version
Apply the corresponding down migration
Decrement the version number
Move the migration files back to the pending/ directory

Best Practices for Migrations

Error-Proof Database Updates: They should be able to run multiple times without causing errors. Use CREATE TABLE IF NOT EXISTS or DROP TABLE IF EXISTS where appropriate. You'll notice you can run the migrations we wrote above multiple times without getting errors (but also the right results).

Use transactions: Wrap your migrations in transactions to ensure they're all-or-nothing operations. Our above action only has one operation so it didn't matter but you should use a transaction when there is multiple operations:

BEGIN;
-- Your migration SQL here
COMMIT;

Keep migrations small: One logical change per migration makes it easier to manage and roll back if needed. Treat them like commits.

Never modify an existing migration: Treat it as immutable once a migration has been applied and shared with others. If you need to change something, create a new migration.

Use descriptive names: Name your migration files descriptively, e.g., 1_create_books_table_up.sql.

Test migrations thoroughly: Always test your migrations, including rollbacks, in a safe environment before applying them to production.

Version control your migrations: Include your migration files in your project's version control system.

Limitations

While this DIY approach is excellent for understanding the concepts of database migrations, it has limitations:

Lack of advanced features: Professional migration tools offer features like dependency management, automatic rollbacks, and database-agnostic migrations.

Manual version tracking: Our simple version file could become a point of failure in more complex scenarios.

Limited error handling: Our scripts don't handle all possible error scenarios that could occur during migrations.

As mentioned, teams often use specialized database migration tools in real-world production environments that provide more robust features and better handle complex scenarios. Some popular tools include Flyway, Liquibase, and Alembic. These tools build upon the concepts we've explored here, adding features like:

Automatic detection and running of new migrations
Support for multiple database types
Integration with build and deployment pipelines
More sophisticated version tracking and conflict resolution

However, understanding the underlying concepts of how migrations work is valuable even when using these more advanced tools. It helps you better understand what these tools are doing behind the scenes and can be helpful when debugging migration issues.

Hopefully, by now, you can see why migrations are required in nearly every PostgreSQL project. Understanding how migrations work, including version tracking, applying changes in order, and providing a way to roll back changes, will be helpful throughout your career.