Paperless-ngx on a LYLIX VPS — document scanning and search
Paperless-ngx turns a pile of scanned PDFs into a searchable, taggable document archive. OCR runs on every document, and the web UI lets you filter by tag, correspondent, or date, or run a full-text search. It solves the "I have 10 years of receipts and tax docs in a drawer" problem and runs comfortably on a 2 GB VPS for personal use.
Install (Docker Compose)
The official repo ships a docker-compose example. Adapt:
# /opt/paperless/docker-compose.yml
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data
  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "127.0.0.1:8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_TIME_ZONE: America/New_York
      PAPERLESS_URL: https://docs.example.com
      PAPERLESS_SECRET_KEY: <long-random-string>
volumes:
  data:
  media:
  pgdata:
  redisdata:
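The PAPERLESS_SECRET_KEY placeholder in the Compose file needs a real value before the first start. One possible way to produce it, sketched here with standard tools only; the function name generate_secret_key is made up for this example, and any long random string from another source works just as well:

```shell
# Generate a random value for PAPERLESS_SECRET_KEY.
# 48 random bytes encode to exactly 64 base64 characters (no padding).
generate_secret_key() {
    head -c 48 /dev/urandom | base64 | tr -d '\n'
}

key="$(generate_secret_key)"

# Print a line that can be pasted under the webserver "environment:" key.
# The value is quoted because base64 output may contain "+" and "/".
printf 'PAPERLESS_SECRET_KEY: "%s"\n' "$key"
```

Generate the key once and keep it stable; it should not change between restarts.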
Start the stack and create the first admin account:
cd /opt/paperless
docker compose up -d
# Create superuser
docker compose exec webserver python3 manage.py createsuperuser
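Put a TLS-terminating reverse proxy (nginx, Caddy, or similar) in front of 127.0.0.1:8000. The public hostname must match PAPERLESS_URL, and the proxy must pass WebSocket upgrades through, because the web UI uses them for live task status.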
How you actually use it
The ./consume directory on the VPS is the inbox. Anything dropped there gets OCR'd and added to the archive. Workflows:
- Scan to consume directory via SFTP / SMB: point the scanner at the directory so scans land there directly.
- Email fetching: add a mail account and a mail rule in the web UI so Paperless-ngx polls an IMAP inbox; receipts emailed to docs@example.com then appear in your archive automatically. (The PAPERLESS_EMAIL_* env vars configure outgoing SMTP mail, not fetching.)
- API upload: phone apps like Paperless Mobile or Paperless Share use the REST API.
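The same REST API can be used from a script. The following is a minimal sketch, assuming an API token has already been created for your user; PAPERLESS_BASE_URL, PAPERLESS_API_TOKEN, and the function name upload_document are names invented for this example and are not read by Paperless-ngx itself:

```shell
# Upload one file to Paperless-ngx through the REST API.
# PAPERLESS_BASE_URL and PAPERLESS_API_TOKEN are hypothetical variable names
# used only by this sketch.
upload_document() {
    file="$1"
    # The file goes in a multipart form field named "document";
    # the token is sent in the Authorization header.
    curl --fail --silent --show-error \
        -H "Authorization: Token ${PAPERLESS_API_TOKEN}" \
        -F "document=@${file}" \
        "${PAPERLESS_BASE_URL}/api/documents/post_document/"
}
```

Usage, with placeholder values: set PAPERLESS_BASE_URL=https://docs.example.com and PAPERLESS_API_TOKEN to your token, then run upload_document receipt.pdf. The upload is queued and processed like any file dropped into the consume directory.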
OCR performance
OCR (Tesseract) is CPU-bound. A single-page PDF takes a few seconds; a 50-page scanned document takes minutes on a 1-vCPU VPS. The number of worker processes is set with PAPERLESS_TASK_WORKERS. The default of 1 is conservative, so raise it to 2-4 if your VPS has the cores.
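To pick a value, check how many cores the VPS actually has. The sketch below is one possible rule of thumb, not an official recommendation: it uses one worker per core and caps the result at 4, following the range mentioned above. The function name suggest_task_workers is made up for this example, and the rule ignores memory, which each extra worker also needs.

```shell
# Suggest a PAPERLESS_TASK_WORKERS value from a core count.
# Assumption of this sketch: one worker per core, never fewer than 1,
# never more than 4.
suggest_task_workers() {
    n="$1"
    if [ "$n" -lt 1 ]; then n=1; fi
    if [ "$n" -gt 4 ]; then n=4; fi
    echo "$n"
}

# nproc (GNU coreutils) reports the cores available; fall back to 1 if absent.
cores="$(nproc 2>/dev/null || echo 1)"

# Print a line for the webserver "environment:" section of the Compose file.
echo "PAPERLESS_TASK_WORKERS: $(suggest_task_workers "$cores")"
```

After changing the value, run docker compose up -d again so the webserver container is recreated with the new environment.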
Backups
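Two volumes matter: data (search index and other application state) and media (the actual document files). Plus the PostgreSQL database, which holds the document metadata. Back up all three with restic, taking the database as a pg_dump rather than a copy of the live pgdata volume. Restore order: PostgreSQL first, then media and data.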
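A backup run along these lines could look like the sketch below. It rests on several assumptions that may not match your host: the Compose project lives in /opt/paperless (so Docker names the volumes paperless_data and paperless_media), volumes sit in Docker's default directory, and restic is already configured through RESTIC_REPOSITORY and a password variable. The function name backup_paperless and the DUMP_DIR path are invented for this example.

```shell
# Back up a Paperless-ngx stack: dump PostgreSQL, then snapshot the dump
# together with the data and media volumes.
# All three paths are assumptions of this sketch; override them as needed.
COMPOSE_DIR="${COMPOSE_DIR:-/opt/paperless}"
DUMP_DIR="${DUMP_DIR:-/opt/paperless/backup}"
VOLUME_ROOT="${VOLUME_ROOT:-/var/lib/docker/volumes}"

backup_paperless() {
    mkdir -p "$DUMP_DIR" || return 1

    # pg_dump produces a consistent SQL dump while the database is running.
    # -T disables TTY allocation so the output can be redirected to a file.
    ( cd "$COMPOSE_DIR" && docker compose exec -T db \
        pg_dump -U paperless paperless ) > "$DUMP_DIR/paperless.sql" || return 1

    # Snapshot the dump plus both named volumes in one restic run.
    restic backup \
        "$DUMP_DIR/paperless.sql" \
        "$VOLUME_ROOT/paperless_data/_data" \
        "$VOLUME_ROOT/paperless_media/_data"
}
```

Call backup_paperless as root from cron or a systemd timer. On restore, load paperless.sql into a fresh db container first, then put the media and data directories back before starting the webserver.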
Storage planning
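Median scanned page: 100-500 KB. 10,000 documents averaging 5 pages each is 50,000 pages, or roughly 5-25 GB of originals. By default Paperless-ngx also keeps an OCR'd archive copy of each document next to the original, so budget about twice that, plus room for the database and search index. Plan accordingly.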
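The estimate is easy to rerun with your own numbers. The sketch below multiplies it out in decimal gigabytes; the function name estimate_storage_gb and the "copies" argument are assumptions of this example, where 1 counts originals only and 2 roughly approximates keeping an archive copy of similar size.

```shell
# Estimate archive size in decimal GB (1 GB = 1,000,000 KB).
# Arguments: documents, pages per document, KB per page, copies per document.
estimate_storage_gb() {
    echo $(( $1 * $2 * $3 * $4 / 1000000 ))
}

# 10,000 documents of 5 pages each, at both ends of the per-page range.
echo "originals, 100 KB/page: $(estimate_storage_gb 10000 5 100 1) GB"      # 5 GB
echo "originals, 500 KB/page: $(estimate_storage_gb 10000 5 500 1) GB"      # 25 GB
echo "two copies, 500 KB/page: $(estimate_storage_gb 10000 5 500 2) GB"     # 50 GB
```

Integer division rounds down, so treat the result as a floor and leave headroom on the disk.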