unsubbed.co

ArchiveBox

Self-hosted web archiving tool that saves pages as HTML, PDF, screenshots, and WARC files from bookmarks, history, or RSS feeds.

Overview

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more… The project has 27K+ GitHub stars and is licensed under MIT.

Key Features

Source: GitHub README

  • Free & open source, own your own data & maintain your privacy by self-hosting
  • Powerful CLI with modular dependencies and support for Google Drive/NFS/SMB/S3/B2/etc.
  • Comprehensive documentation, active development, and rich community
  • Extracts a wide variety of content out-of-the-box: media (yt-dlp), articles (readability), code (git), etc.
  • Supports scheduled/realtime importing from many types of sources
  • Uses standard, durable, long-term formats like HTML, JSON, PDF, PNG, MP4, TXT, and WARC
  • Saves all pages to archive.org as well by default for redundancy (can be disabled for local-only mode)
  • Advanced users: support for archiving content requiring login/paywall/cookies (see wiki security caveats!)

Getting Started

Source: GitHub README

  pip install archivebox
  mkdir -p ~/archivebox/data && cd ~/archivebox/data
  archivebox init --install

Normalized Features

Source: tool-features-normalized.json

apt, brew, desktop app, docker, docker compose, ldap, npm, pip, plugins, portainer, rest api, sqlite, sso, unraid, webhooks, yunohost.

Features

Authentication & Access

  • LDAP / Active Directory
  • Single Sign-On (SSO)

Integrations & APIs

  • Plugin / Extension System
  • REST API
  • Webhooks
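Below is a minimal sketch of calling the REST API from Python using only the standard library. Several details are assumptions drawn from recent ArchiveBox releases and are not confirmed by this page: the endpoint path (/api/v1/cli/add), the X-ArchiveBox-API-Key header, and the urls/tag/depth body fields. Check the interactive API docs on your own instance before relying on them. The helper name build_add_request is hypothetical.

```python
import json
import urllib.request

# ASSUMPTIONS (verify against the API docs served by your own instance):
#  - the "add" endpoint lives at /api/v1/cli/add
#  - the API key is sent in an "X-ArchiveBox-API-Key" header
#  - the JSON body accepts "urls", "tag" and "depth" fields
def build_add_request(base_url: str, api_key: str, urls: list[str],
                      tag: str = "", depth: int = 0) -> urllib.request.Request:
    """Build (but do not send) a POST request asking ArchiveBox to archive URLs."""
    payload = {"urls": urls, "tag": tag, "depth": depth}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/cli/add",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-ArchiveBox-API-Key": api_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_add_request("http://127.0.0.1:8000", "YOUR_API_KEY",
                            ["https://example.com"], tag="demo")
    print(req.get_method(), req.full_url)
    # To actually send it to a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read()))
```

Building the request separately from sending it keeps the sketch testable without a running server. Once the endpoint details are confirmed, pass the result to urllib.request.urlopen.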

Mobile & Desktop

  • Desktop App