KnowledgebaseDatabases › PostgreSQL backup with WAL archiving — point-in-time recovery

PostgreSQL backup with WAL archiving — point-in-time recovery

Nightly logical dumps (pg_dump) give you a daily snapshot — restore loses up to 24 hours of changes since the last dump. For workloads where that's unacceptable (financial transactions, real-time SaaS data, anything where "lost an hour" is a real problem), PostgreSQL's WAL archiving + base backup pattern gives you point-in-time recovery to any second. This article walks through it.

How it works

  1. PostgreSQL continuously writes a Write-Ahead Log (WAL) — every change to the database appears in the WAL stream first.
  2. You take a periodic base backup (full snapshot of the data directory).
  3. Between base backups, you archive every WAL segment as PostgreSQL completes it (typically 16 MB segments, completed every few minutes on a busy DB).
  4. To restore: restore the base backup, then replay WAL segments forward to whatever point in time you want.

End-to-end RPO: bounded by how often you ship WAL off-host. Synchronous shipping = effectively zero data loss; periodic shipping = the period.

Configuration

Edit postgresql.conf:

wal_level = replica                    # enables WAL archiving (default in recent versions)
archive_mode = on
archive_command = '/usr/local/bin/wal-archive.sh "%p" "%f"'
archive_timeout = 300                   # force WAL segment switch every 5 min
                                        # (so backup doesn't lag forever during low-write periods)

The archive_command is a shell command PostgreSQL invokes for each completed WAL segment. %p is the absolute path; %f is just the filename. The command MUST return 0 on success — failures cause PostgreSQL to retry until the disk fills with un-archived WAL.

A working /usr/local/bin/wal-archive.sh:

#!/bin/bash
set -e
SRC="$1"      # full path to WAL segment
NAME="$2"     # just the filename
DEST="/var/lib/postgresql/wal_archive/$NAME"
# Copy locally first (cheap, fast)
cp "$SRC" "$DEST"
# Then ship off-host
restic backup "$DEST" --tag wal --quiet
# Optionally: prune old local copies past N days
find /var/lib/postgresql/wal_archive -type f -mtime +7 -delete

Make it executable, owned by the postgres user, mode 0700.

Restart PostgreSQL after enabling archiving: systemctl restart postgresql.

Taking the base backup

sudo -u postgres pg_basebackup -D /var/lib/postgresql/basebackup -Fp -Xs -P
  • -D destination directory (will be created; must not exist).
  • -Fp plain (file-by-file) format.
  • -Xs include WAL stream during backup (so the backup is self-consistent at the end of the run).
  • -P show progress.

Schedule weekly (or daily for smaller DBs):

# /etc/cron.weekly/pg-basebackup
#!/bin/bash
DEST="/var/lib/postgresql/basebackups/$(date +\%F)"
sudo -u postgres pg_basebackup -D "$DEST" -Fp -Xs -z
restic backup "$DEST" --tag basebackup
# Keep last 4 weekly base backups locally
find /var/lib/postgresql/basebackups -mindepth 1 -maxdepth 1 -type d -mtime +28 -exec rm -rf {} +

Restoring to a point in time

The scenario: it's 14:23 on Wednesday; at 14:15, someone ran a DELETE that wiped a critical table. You want to roll back to 14:14 — keeping all the legitimate work done that day before the bad DELETE.

  1. Stop the broken PostgreSQL: systemctl stop postgresql
  2. Move the broken data dir aside: mv /var/lib/postgresql/15/main /var/lib/postgresql/15/main.broken
  3. Restore the latest base backup: cp -r /var/lib/postgresql/basebackups/2026-06-22 /var/lib/postgresql/15/main (or restore via restic)
  4. Restore WAL segments for the period between base backup and your target time:
    # Pull all WAL since the base backup from restic / B2 / wherever
    restic restore latest --target /tmp/wal-restore --include /var/lib/postgresql/wal_archive
    # Copy to where PostgreSQL expects them
    cp /tmp/wal-restore/var/lib/postgresql/wal_archive/* /var/lib/postgresql/15/main/pg_wal/
  5. Configure recovery target. In postgresql.conf (or a recovery signal file in newer versions):
    recovery_target_time = '2026-06-25 14:14:00'
    recovery_target_action = 'promote'
    And create /var/lib/postgresql/15/main/recovery.signal (empty file).
  6. Start PostgreSQL. It replays WAL segments forward to your target time, then becomes available.
  7. Verify the data is what you expected. Connect, check the critical table, confirm the row counts.

Operational notes

  • WAL archiving uses disk + bandwidth. Steady-state WAL volume depends on write workload — a busy DB can generate gigabytes per day. Plan storage accordingly.
  • archive_command failures fill the disk. If your shipping script breaks, WAL accumulates locally; eventually the data partition fills and PostgreSQL refuses writes. Monitor disk space + the success rate of the archive command.
  • Test the restore. A WAL archiving setup that has never been tested doesn't work. Restore a test instance from your backups quarterly; verify you can hit any timestamp.
  • Base backup + WAL ≠ pg_dump. Base backup is file-level; if your data files are corrupt (rare but possible), the corruption is preserved. pg_dump is logical; corrupt rows would error during the dump and you'd know.

The pragmatic position: run BOTH. Logical pg_dump nightly for catching corruption; base backup + WAL archive for point-in-time recovery. Storage is cheap; insurance is the point.

Also Read

« « Back

Powered by WHMCompleteSolution