PostgreSQL backup with WAL archiving — point-in-time recovery
Nightly logical dumps (pg_dump) give you a daily snapshot — restore loses up to 24 hours of changes since the last dump. For workloads where that's unacceptable (financial transactions, real-time SaaS data, anything where "lost an hour" is a real problem), PostgreSQL's WAL archiving + base backup pattern gives you point-in-time recovery to any second. This article walks through it.
How it works
- PostgreSQL continuously writes a Write-Ahead Log (WAL) — every change to the database appears in the WAL stream first.
- You take a periodic base backup (full snapshot of the data directory).
- Between base backups, you archive every WAL segment as PostgreSQL completes it (typically 16 MB segments, completed every few minutes on a busy DB).
- To restore: restore the base backup, then replay WAL segments forward to whatever point in time you want.
End-to-end RPO: bounded by how often you ship WAL off-host. Synchronous shipping = effectively zero data loss; periodic shipping = the period.
Configuration
Edit postgresql.conf:
wal_level = replica # enables WAL archiving (default in recent versions)
archive_mode = on
archive_command = '/usr/local/bin/wal-archive.sh "%p" "%f"'
archive_timeout = 300 # force WAL segment switch every 5 min
# (so backup doesn't lag forever during low-write periods)
The archive_command is a shell command PostgreSQL invokes for each completed WAL segment. %p is the absolute path; %f is just the filename. The command MUST return 0 on success — failures cause PostgreSQL to retry until the disk fills with un-archived WAL.
A working /usr/local/bin/wal-archive.sh:
#!/bin/bash
set -e
SRC="$1" # full path to WAL segment
NAME="$2" # just the filename
DEST="/var/lib/postgresql/wal_archive/$NAME"
# Copy locally first (cheap, fast)
cp "$SRC" "$DEST"
# Then ship off-host
restic backup "$DEST" --tag wal --quiet
# Optionally: prune old local copies past N days
find /var/lib/postgresql/wal_archive -type f -mtime +7 -delete
Make it executable, owned by the postgres user, mode 0700.
Restart PostgreSQL after enabling archiving: systemctl restart postgresql.
Taking the base backup
sudo -u postgres pg_basebackup -D /var/lib/postgresql/basebackup -Fp -Xs -P
-Ddestination directory (will be created; must not exist).-Fpplain (file-by-file) format.-Xsinclude WAL stream during backup (so the backup is self-consistent at the end of the run).-Pshow progress.
Schedule weekly (or daily for smaller DBs):
# /etc/cron.weekly/pg-basebackup
#!/bin/bash
DEST="/var/lib/postgresql/basebackups/$(date +\%F)"
sudo -u postgres pg_basebackup -D "$DEST" -Fp -Xs -z
restic backup "$DEST" --tag basebackup
# Keep last 4 weekly base backups locally
find /var/lib/postgresql/basebackups -mindepth 1 -maxdepth 1 -type d -mtime +28 -exec rm -rf {} +
Restoring to a point in time
The scenario: it's 14:23 on Wednesday; at 14:15, someone ran a DELETE that wiped a critical table. You want to roll back to 14:14 — keeping all the legitimate work done that day before the bad DELETE.
- Stop the broken PostgreSQL:
systemctl stop postgresql - Move the broken data dir aside:
mv /var/lib/postgresql/15/main /var/lib/postgresql/15/main.broken - Restore the latest base backup:
cp -r /var/lib/postgresql/basebackups/2026-06-22 /var/lib/postgresql/15/main(or restore via restic) - Restore WAL segments for the period between base backup and your target time:
# Pull all WAL since the base backup from restic / B2 / wherever restic restore latest --target /tmp/wal-restore --include /var/lib/postgresql/wal_archive # Copy to where PostgreSQL expects them cp /tmp/wal-restore/var/lib/postgresql/wal_archive/* /var/lib/postgresql/15/main/pg_wal/ - Configure recovery target. In
postgresql.conf(or a recovery signal file in newer versions):
And createrecovery_target_time = '2026-06-25 14:14:00' recovery_target_action = 'promote'/var/lib/postgresql/15/main/recovery.signal(empty file). - Start PostgreSQL. It replays WAL segments forward to your target time, then becomes available.
- Verify the data is what you expected. Connect, check the critical table, confirm the row counts.
Operational notes
- WAL archiving uses disk + bandwidth. Steady-state WAL volume depends on write workload — a busy DB can generate gigabytes per day. Plan storage accordingly.
- archive_command failures fill the disk. If your shipping script breaks, WAL accumulates locally; eventually the data partition fills and PostgreSQL refuses writes. Monitor disk space + the success rate of the archive command.
- Test the restore. A WAL archiving setup that has never been tested doesn't work. Restore a test instance from your backups quarterly; verify you can hit any timestamp.
- Base backup + WAL ≠ pg_dump. Base backup is file-level; if your data files are corrupt (rare but possible), the corruption is preserved. pg_dump is logical; corrupt rows would error during the dump and you'd know.
The pragmatic position: run BOTH. Logical pg_dump nightly for catching corruption; base backup + WAL archive for point-in-time recovery. Storage is cheap; insurance is the point.
Also Read
Powered by WHMCompleteSolution