
Paperless-ngx on a LYLIX VPS — document scanning and search

Paperless-ngx turns a pile of scanned PDFs into a searchable, taggable document archive. It runs OCR on every document, and the web UI lets you filter by tag, correspondent, or date, or run a full-text search. It solves the "I have 10 years of receipts and tax docs in a drawer" problem and runs comfortably on a 2 GB VPS for personal use.

Install (Docker Compose)

The official repository ships example Docker Compose files. Adapt one along these lines:

# /opt/paperless/docker-compose.yml
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "127.0.0.1:8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_TIME_ZONE: America/New_York
      PAPERLESS_URL: https://docs.example.com
      PAPERLESS_SECRET_KEY: <long-random-string>

volumes:
  data:
  media:
  pgdata:
  redisdata:

Start the stack and create the first admin user:

cd /opt/paperless
docker compose up -d
# Create superuser
docker compose exec webserver python3 manage.py createsuperuser
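
The <long-random-string> placeholder for PAPERLESS_SECRET_KEY needs a real value before first start. One way to produce it, as a minimal sketch: the helper name generate_secret_key is hypothetical, and 64 hex characters is an arbitrary but generous length.

```shell
# Hypothetical helper: print 64 hex characters (256 bits) from the kernel RNG.
generate_secret_key() {
  # Read 32 random bytes, dump them as hex, strip od's spaces and newlines.
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print a line ready to paste into the environment section of the compose file.
echo "PAPERLESS_SECRET_KEY: $(generate_secret_key)"
```

Generate the key once and keep it stable; changing it later invalidates existing login sessions.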

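Put a reverse proxy with TLS in front of 127.0.0.1:8000 as usual, and make sure PAPERLESS_URL matches the public hostname it serves.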

How you actually use it

The ./consume directory on the VPS is the inbox. Anything dropped there gets OCR'd and added to the archive. Workflows:

  • Scan to the consume directory over SFTP or SMB: set up the scanner to deposit files there directly.
  • Email fetching: add a mail account and a mail rule in the web UI so Paperless-ngx polls an IMAP inbox; receipts emailed to docs@example.com then appear in your archive automatically. (The PAPERLESS_EMAIL_* env vars configure outgoing SMTP, not fetching.)
  • API upload: phone apps like Paperless Mobile or Paperless Share use the REST API.
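
The API route also works from a script. The sketch below posts a file to the documents/post_document endpoint with token authentication. It assumes curl is installed and that PAPERLESS_URL and PAPERLESS_TOKEN are exported in your shell (the token being an API token created for your Paperless user); the function name upload_document is hypothetical.

```shell
# Hypothetical helper: upload one file through the Paperless-ngx REST API.
# Assumes PAPERLESS_URL (e.g. https://docs.example.com) and PAPERLESS_TOKEN
# are set in the environment.
upload_document() {
  # Refuse early if the file is missing or unreadable.
  if [ ! -r "$1" ]; then
    echo "cannot read file: $1" >&2
    return 1
  fi
  # --fail makes curl return non-zero on HTTP errors (4xx/5xx).
  curl --fail --silent --show-error \
    -H "Authorization: Token ${PAPERLESS_TOKEN}" \
    -F "document=@$1" \
    "${PAPERLESS_URL%/}/api/documents/post_document/"
}

# Usage: upload_document ~/scans/receipt.pdf
```

Uploaded files go through the same OCR and indexing pipeline as anything dropped into ./consume.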

OCR performance

Tesseract OCR is CPU-bound. A single-page PDF takes a few seconds; a 50-page scanned document takes minutes on a 1-vCPU VPS. The number of worker processes is set with PAPERLESS_TASK_WORKERS; the default of 1 is conservative, so raise it to 2-4 if your VPS has the cores.
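
To pick a starting value, you can derive it from the core count. This is only a heuristic sketch, not a Paperless-ngx recommendation: the helper suggest_task_workers is hypothetical, and it simply uses one worker per core, capped at 4 to match the range above.

```shell
# Hypothetical helper: suggest a PAPERLESS_TASK_WORKERS value.
# Takes a core count as $1, or detects it with nproc (falling back to 1).
suggest_task_workers() {
  cores="${1:-$(nproc 2>/dev/null || echo 1)}"
  # One worker per core, capped at 4 (a heuristic, not an official rule).
  if [ "$cores" -gt 4 ]; then
    echo 4
  else
    echo "$cores"
  fi
}

# Print a line ready to paste into the webserver environment section.
echo "PAPERLESS_TASK_WORKERS: $(suggest_task_workers)"
```

Each worker also needs memory while OCR runs, so on a 2 GB VPS watch RAM usage before going past 2.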

Backups

Two volumes matter: data (search index, metadata) and media (the actual document files), plus the PostgreSQL database. Back up all three with restic, using a pg_dump of the database rather than a copy of its live data directory. Restore order: PostgreSQL first, then media and data.
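
A backup job along those lines might look like the sketch below. It makes several assumptions: the compose project lives in /opt/paperless (so Docker names the volumes paperless_data and paperless_media), Docker uses its default data root under /var/lib/docker, and RESTIC_REPOSITORY and RESTIC_PASSWORD are already exported. The function name backup_paperless and the PAPERLESS_DIR override are hypothetical.

```shell
# Hypothetical helper: dump PostgreSQL, then snapshot the dump and both volumes.
# The body runs in a subshell, so the cd does not affect the caller.
backup_paperless() (
  dir="${PAPERLESS_DIR:-/opt/paperless}"
  cd "$dir" || exit 1

  # Logical dump of the database; -T disables the TTY so the redirect is clean.
  docker compose exec -T db pg_dump -U paperless paperless \
    > "$dir/paperless.sql" || exit 1

  # Volume paths are assumptions; check them with `docker volume inspect`.
  restic backup \
    "$dir/paperless.sql" \
    /var/lib/docker/volumes/paperless_data/_data \
    /var/lib/docker/volumes/paperless_media/_data
)

# Usage (e.g. from a nightly cron job): backup_paperless
```

On restore, load paperless.sql into a fresh db container first, then put the media and data directories back before starting the webserver.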

Storage planning

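Median scanned page: 100-500 KB. 10,000 documents averaging 5 pages each is 50,000 pages, or roughly 5-25 GB of originals. Paperless-ngx also keeps an OCR'd archive copy and a thumbnail for each document, so budget about double that, plus room for the database and search index. Plan accordingly.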
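
To sanity-check the numbers for your own archive, the multiplication is easy to script. This sketch counts original files only, uses decimal units (1 GB = 1,000,000 KB), and rounds down; the helper name estimate_gb is hypothetical.

```shell
# Hypothetical helper: estimate storage for original files, in whole GB.
# Args: number of documents, pages per document, KB per page.
estimate_gb() {
  echo $(( $1 * $2 * $3 / 1000000 ))
}

echo "low:  $(estimate_gb 10000 5 100) GB"   # prints "low:  5 GB"
echo "high: $(estimate_gb 10000 5 500) GB"   # prints "high: 25 GB"
```

Archive copies, thumbnails, the database, and backup snapshots all come on top of this figure.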
