unsubbed.co

Sosse

Self-hosted search engine and crawler, built on Selenium, with offline archiving.

Overview

Sosse (Selenium Open Source Search Engine) is a search engine and crawler that crawls, archives, and searches web pages, including JavaScript-heavy sites. It is open source, flexible, and Selenium-powered. The project has 402 GitHub stars and is licensed under AGPL-3.0.

Key Features

Source: GitHub README

  • 🌍 Web Page Search: Search the content of web pages, including dynamically rendered ones, with advanced queries.
  • 🕑 Recurring Crawling: Crawl pages at fixed intervals or adapt the rate based on content changes.
  • 🔖 Web Page Archiving: Archive HTML content, adjust links for local use, download required assets, and support
  • 🏷️ Tags: Organize and filter crawled or archived pages using tags for better search and management.
  • 📂 File Downloads: Batch download binary files from web pages.
  • 📡 Webhooks: Integrate with external services using highly flexible webhooks. Connect to proprietary AI platforms
  • 🔔 Atom Feeds: Generate content feeds for websites that don’t have them, or receive updates when a new page
  • 🔒 Authentication: The crawler can authenticate to access private pages and retrieve content.
  • 👥 Permissions: Admins can configure crawlers and view statistics; search is available to authenticated users and can also be opened to anonymous visitors.
  • 👤 Search Features: Includes private search history,
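
The "Recurring Crawling" item above mentions adapting the crawl rate to content changes. The README excerpt does not say how Sosse does this, so the following is only a minimal sketch of one common approach: fingerprint the page, then shorten the interval when the content changed and lengthen it when it did not. The names `content_hash` and `next_interval` and the `MIN_INTERVAL` / `MAX_INTERVAL` bounds are hypothetical and are not Sosse settings.

```python
import hashlib
from datetime import timedelta

# Hypothetical bounds for illustration; Sosse's real options and defaults may differ.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=30)


def content_hash(html: str) -> str:
    """Fingerprint a page body so two crawls can be compared cheaply."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def next_interval(current: timedelta, old_hash: str, new_hash: str) -> timedelta:
    """Halve the interval when the page changed, double it when it did not."""
    if old_hash != new_hash:
        candidate = current / 2
    else:
        candidate = current * 2
    # Clamp the result so the crawler neither hammers a site nor forgets it.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, candidate))
```

With these assumed bounds, a page crawled every 8 hours would move to a 4-hour interval after a change and to a 16-hour interval after an unchanged crawl.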

Normalized Features

Source: tool-features-normalized.json

Docker, PostgreSQL, RSS/Atom, tags, webhooks.

Features

Integrations & APIs

  • RSS / Atom Feeds
  • Webhooks
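
As an illustration of consuming an Atom feed such as the ones listed above, here is a minimal sketch that reads entries with Python's standard library. It assumes only the standard Atom namespace; the sample feed, its URL, and the function name `parse_atom_entries` are made up for the example and do not reflect the exact feed Sosse emits.

```python
import xml.etree.ElementTree as ET

# Standard Atom 1.0 namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content, used only to exercise the parser.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First page</title>
    <link href="https://example.com/first"/>
    <updated>2024-01-01T00:00:00Z</updated>
  </entry>
</feed>"""


def parse_atom_entries(xml_text: str) -> list[dict]:
    """Return the title, link, and updated timestamp of each Atom entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "link": link.get("href", "") if link is not None else "",
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        })
    return entries


if __name__ == "__main__":
    for item in parse_atom_entries(SAMPLE_FEED):
        print(item["updated"], item["title"], item["link"])
```

A script like this could be run on a schedule to pick up new pages from a feed, under the assumption that the feed URL is fetched separately.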

Search & Discovery

  • Tags / Labels