Full disaster-recovery walkthrough — from "VPS is gone" to back online

The hour after you realize you need to restore from off-host backup is not the time to be reading documentation. This article is the runbook: concrete steps from "my VPS is unreachable / corrupt / accidentally wiped" to "service is responding to traffic again," in the order you'd actually do them. It targets a single VPS running a typical stack (web server + database + some application data). Adapt as needed for PBX or mail server specifics; the structure is the same.

0. Before the disaster: have these on hand

If any of the following lives only on the dead VPS, you have a real problem. Make sure today, not after:

  • Backup repository URL + credentials.
  • Backup encryption passphrase.
  • SSH private key authorized on the backup server.
  • DNS provider login (to repoint A records).
  • LYLIX portal credentials (to order replacement VPS).
  • Anything else needed to bring the service back: API keys, license files, deployment scripts.

Store these in a password manager you can access from anywhere, not in a notes file on the dead box.

1. Confirm the VPS is actually unrecoverable

Before pulling the trigger on a full restore, rule out the cheaper recoveries:

  • Portal status: is the VPS running? Stopped? Suspended (billing)? An unintentional stop is fixed by clicking Start in the portal.
  • Browser console: log into the portal's console (bypasses SSH/network). If you can get a shell prompt, you have a network or sshd issue, not a destroyed VPS — fix that and skip the rest of this runbook.
  • Snapshot rollback: if a snapshot from before the problem exists in the portal, rolling back to it is the fastest recovery. Use this for "I made a config change and broke something" cases.
  • Rescue mode: if the VPS won't boot, rescue mode (in the portal) gives you a clean recovery environment with your disk mounted. Fix the issue (rebuild a broken initramfs, correct /etc/fstab, reinstall GRUB) and reboot.

If none of those help — the VPS is genuinely lost, or restoring it would take longer than starting fresh — proceed.

2. Order a replacement VPS

From the LYLIX portal, order a fresh VPS sized similarly to the one you're replacing. Match the distro family if possible — restoring a Debian system onto AlmaLinux works but creates friction (different package names, paths, init expectations).

While it's deploying:

  • If it's an emergency, ask staff to transfer the original IP to the new VPS, either in the same order or via a ticket. Keeping the IP is not self-serve, but it is often possible if you request it early.
  • If you don't get the old IP back, you'll need to update DNS in step 7.

3. Set up SSH + the backup tool on the new VPS

# Get in via the browser console first (root password is in the welcome email).
# Then install your normal SSH key into authorized_keys.

mkdir -p ~/.ssh && chmod 700 ~/.ssh
echo "ssh-ed25519 AAAA... your-key" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

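# Disable password login, but only after confirming key login works from a second session
sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
systemctl reload sshd   # if the unit is not found on Debian / Ubuntu, try: systemctl reload ssh

# Install the backup tool
apt update && apt install -y restic   # Debian / Ubuntu
# OR
dnf install -y epel-release && dnf install -y restic   # AlmaLinux (restic is in EPEL)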
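
The sed edit above only changes /etc/ssh/sshd_config. On images whose main config includes drop-in files from /etc/ssh/sshd_config.d/ near the top, a drop-in can still set a different value and win. A minimal sketch for checking what sshd will actually enforce, assuming OpenSSH's `sshd -T` test mode is available and that you run it as root; the function name is hypothetical.

```shell
# Print the effective PasswordAuthentication value ("yes" or "no")
# as sshd resolves it from the main config plus any included drop-ins.
effective_password_auth() {
    sshd -T 2>/dev/null | awk '$1 == "passwordauthentication" { print $2; exit }'
}

# Example (run as root on the new VPS):
#   [ "$(effective_password_auth)" = "no" ] || echo "password login is still enabled"
```

If the check still reports "yes", look for a PasswordAuthentication line in the drop-in directory before assuming the box is key-only.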

Confirm you can reach the backup repo:

export RESTIC_REPOSITORY=...    # B2/S3/SSH URL
export RESTIC_PASSWORD=...
restic snapshots | head -5

If restic snapshots lists your snapshots, you're in. If it fails, fix that before continuing; restoration without repo access is a non-starter.
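
Once access works, it can help to narrow the listing to the dead VPS and run a structural check before committing to a restore. A minimal sketch, assuming RESTIC_REPOSITORY and RESTIC_PASSWORD are exported as above; the function name and the hostname in the example are hypothetical. Note that a plain `restic check` verifies repository structure and does not read back every data blob.

```shell
# Show the newest snapshots taken from one host, then check the repository.
# Returns non-zero as soon as either restic command fails.
repo_preflight() {
    old_host=$1
    restic snapshots --host "$old_host" --latest 3 || return 1
    restic check || return 1
}

# Example ("web01" stands in for the old VPS's hostname):
#   repo_preflight web01
```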

4. Restore in two passes

Pass 1: restore everything to a staging directory. Don't restore directly to / on the new VPS — you'd be overwriting the package manager's state with your old install and creating a mess.

mkdir /restore
restic restore latest --target /restore
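
A full restore into /restore needs roughly as much free disk as the snapshot's contents, on top of the fresh install. A minimal sketch of a free-space check to run before the restore, assuming a POSIX `df`; the function name and the byte figure in the example are hypothetical.

```shell
# Succeed only if the filesystem holding DIR has at least NEEDED_BYTES free.
enough_space() {
    needed_bytes=$1
    dir=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$((avail_kb * 1024))" -ge "$needed_bytes" ]
}

# Example: read the snapshot's restore size from restic, then compare.
#   restic stats latest --mode restore-size
#   enough_space 25000000000 / || echo "not enough room for /restore"
```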

Pass 2: pick out what you actually need and copy it into place on the new system.

  • /etc/ — selectively. Don't overwrite the new system's /etc/passwd, /etc/shadow, /etc/group wholesale (UIDs may differ). Do restore: application configs (/etc/nginx/, /etc/postfix/, etc.), TLS certs, custom systemd units.
  • /var/www/, /srv/, /opt/<your-app>/ — restore directly. Fix ownership after.
  • Database data — restore the dumps (don't copy /var/lib/mysql directly) and re-import.
  • /home/<users>/ — restore, and recreate the users on the new system with matching UIDs if possible.
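
Restored files keep their numeric owners, so an account that received a different UID on the new system will not own its own files. A minimal sketch for spotting those accounts, assuming the old /etc/passwd is present under /restore; the function name is hypothetical.

```shell
# Print users whose UID in the restored passwd differs from the live one.
# Output columns: name, restored UID, current UID.
uid_mismatches() {
    restored_passwd=$1
    live_passwd=$2
    awk -F: '
        NR == FNR { live[$1] = $3; next }   # first file: the live system
        ($1 in live) && live[$1] != $3 { print $1, $3, live[$1] }
    ' "$live_passwd" "$restored_passwd"
}

# Example:
#   uid_mismatches /restore/etc/passwd /etc/passwd
```

For each name it prints, either recreate the user with the old UID or chown the restored files to the new one.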

5. Re-install packages, then re-import config

On the new VPS, install the same software stack: apt install nginx postfix dovecot .... Use a script if you have one; otherwise grep the restored /var/log/dpkg.log or /var/log/dnf.log for what was installed on the old box.
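
The dpkg.log grep can be sketched as follows, assuming the usual dpkg log layout (date, time, action, package:arch, versions); the function name is hypothetical. The result is a starting point rather than an exact manifest: it includes dependencies and packages that were later removed, and it misses anything recorded only in rotated logs you do not pass in.

```shell
# List unique package names from "install" entries in one or more dpkg logs.
installed_from_dpkg_log() {
    awk '$3 == "install" { sub(/:.*/, "", $4); print $4 }' "$@" | sort -u
}

# Example:
#   installed_from_dpkg_log /restore/var/log/dpkg.log
```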

After packages are installed, drop your restored configs into /etc/ on top of the defaults, fix ownership (most config files want root:root 0644 but secrets like TLS keys want 0600), and restart services.

6. Re-import database dumps

# MySQL / MariaDB
zcat /restore/backups/mysql-2026-06-22.sql.gz | mysql

# PostgreSQL (custom-format dump from pg_dump -Fc; the target database must already exist)
createdb mydb
pg_restore -d mydb /restore/backups/mydb-2026-06-22.dump

Verify with row counts against what you remember from production. If counts are off, your last dump was incomplete — see the backup verification article for the drill that catches this.
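
A truncated dump can often be caught before the import rather than after it. A minimal sketch, assuming a gzipped mysqldump file written with default options, which end the file with a "-- Dump completed" comment; dumps made with --skip-comments or --compact lack that trailer and would fail this check even when intact. The function name is hypothetical.

```shell
# Succeed only if the gzip stream is intact and the dump ends with
# the trailer line mysqldump writes when it finishes normally.
dump_looks_complete() {
    dump=$1
    gzip -t "$dump" 2>/dev/null || return 1
    gzip -dc "$dump" | tail -n 5 | grep -q '^-- Dump completed'
}

# Example:
#   dump_looks_complete /restore/backups/mysql-2026-06-22.sql.gz || echo "dump is truncated or damaged"
```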

7. Cut DNS over

If you got the old IP back, no DNS changes are needed; the service is reachable as soon as it is running. If you got a new IP, update the A/AAAA records at your DNS provider. A low TTL only helps if it was set before the incident; during the incident, you set the new IP and wait out whatever TTL resolvers have already cached.
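
To see whether a public resolver has picked up the change, you can query it directly instead of trusting your local cache. A minimal sketch, assuming `dig` is installed; the function name, the default resolver, and the example name and address are placeholders.

```shell
# Succeed if NAME's A record, as answered by RESOLVER, includes EXPECTED_IP.
resolves_to() {
    name=$1
    expected_ip=$2
    resolver=${3:-1.1.1.1}
    dig +short A "$name" "@$resolver" | grep -Fxq "$expected_ip"
}

# Example:
#   resolves_to www.example.com 203.0.113.10 && echo "resolver sees the new IP"
```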

8. Smoke-test from outside

  • HTTP/HTTPS: curl from a different network (your phone on cellular) and confirm the right content loads.
  • Mail: send a test message in and out, check the headers, verify SPF/DKIM/DMARC still pass.
  • SSH: from a fresh location, confirm key-only login works.
  • Application-specific: log in, perform a few common actions, check that data the user expects to see is there.
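
While old DNS answers are still cached, the HTTPS check can be pointed at the new address explicitly. A minimal sketch using curl's `--resolve` option, which pins a hostname to an IP for one request so the certificate and virtual host are still exercised; the function name and the example values are hypothetical.

```shell
# Print the HTTP status code for https://HOST/ with HOST forced to resolve to IP.
http_status_via_ip() {
    host=$1
    ip=$2
    curl -sS -o /dev/null -w '%{http_code}' --max-time 10 \
        --resolve "$host:443:$ip" "https://$host/"
}

# Example:
#   http_status_via_ip www.example.com 203.0.113.10
```

A status code alone does not prove the right content loaded, so treat this as a quick first check before the manual tests above.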

9. Post-incident

  • Document what was missing from the backup and add it.
  • Document how long the recovery actually took. Your stated RTO is whatever this turned out to be; the number you wrote on a planning doc is a wish.
  • If the old VPS had a public-facing identity (mail sender reputation, SSH host-key fingerprints customers had in known_hosts), warn users that those may change.
  • If the cause was unclear or external (hypervisor failure rather than user error), open a ticket with LYLIX to get the postmortem context.

The under-an-hour version

If your stack is simple (one web app, one database, one config directory), and you've practiced this drill before, the whole runbook from "VPS is gone" to "DNS cut over and service responding" can fit inside 30 minutes. Most operators discover the first time that theirs is more like 3 hours because of the things they hadn't actually tested. The fix is to do the drill before the disaster — see the backup verification article.
