unsubbed.co

Diskover

Diskover gives you a file indexer, search engine, and data management platform on your own infrastructure.

Open-source file indexing and storage analytics, honestly reviewed. Built for people drowning in unorganized storage, not for people with five folders.

TL;DR

  • What it is: Open-source (Apache 2.0) file system indexer and search engine powered by Elasticsearch — crawls your storage, indexes every file’s metadata, and gives you a browser-based dashboard to search and analyze it all [1][README].
  • Who it’s for: IT administrators, media production teams, and technically inclined small businesses with large, scattered storage across NAS, network shares, local disks, or cloud buckets. Less suited for non-technical founders managing typical web-app data [1][4].
  • Cost savings: Community Edition is free with no license keys required. Commercial enterprise indexing products and storage analytics tools run from thousands of dollars per year upward. A $5–10/mo VPS covers the infrastructure for most home or small-office setups [README][1].
  • Key strength: Scales to billions of files without slowing down, works across heterogeneous storage (Windows, Linux, NFS, SMB, cloud buckets), and never touches actual file contents — metadata-only indexing means it can’t corrupt or delete anything [1].
  • Key weakness: Requires Elasticsearch, Python, and a PHP-capable web server to run — setup is meaningfully more involved than “docker-compose up.” The Community Edition is explicitly scoped for home use and evaluation; advanced features (AI tagging, workflow automation, industry-specific plugins) are locked behind commercial subscriptions with undisclosed pricing [README][website].

What is Diskover

Diskover is a file system crawler that uses Elasticsearch as its backend. You point it at storage — local directories, NFS or SMB shares, mounted cloud buckets — and it walks every path, collects metadata for every file and folder (names, sizes, timestamps, owners, types), and writes it all into an Elasticsearch index. The web interface, diskover-web, is written in PHP and gives you a browser-based search bar, advanced filters, and a dashboard with storage breakdowns by file type, age, and usage patterns [README][1].
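To make the crawl step concrete, here is a minimal Python sketch of a metadata-only walk. It is not Diskover's code: the function name `collect_metadata` and the record field names are assumptions chosen for illustration, and the step that writes records into Elasticsearch is left out.

```python
import os
import stat


def collect_metadata(root):
    """Walk `root` and yield one metadata dict per file and directory.

    Only stat-style metadata is read; file contents are never opened.
    Field names are illustrative, not Diskover's actual index schema.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            yield {
                "path": path,
                "name": name,
                "type": "directory" if stat.S_ISDIR(st.st_mode) else "file",
                "size": st.st_size,
                "mtime": st.st_mtime,
                "atime": st.st_atime,
                "uid": st.st_uid,
                "extension": os.path.splitext(name)[1].lstrip(".").lower(),
            }
```

Each yielded dict corresponds to one document that a crawler of this kind would send to the search index. A real crawler would also batch the writes and run several walkers in parallel.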

The headline from the README is accurate and unpretentious: “Open source file indexer, file search engine and data analytics and management platform powered by Elasticsearch.” The commercial website dresses this up with “END-TO-END DATA MANAGEMENT PLATFORM FOR THE UNSTRUCTURED WORLD” and language about AI-ready datasets and petabyte-scale unification, but the core is what it says on the GitHub tin [README][website].

What Diskover does differently from a basic file search tool is the combination of scalability and breadth. The crawler runs in parallel and can index millions of files without meaningful slowdown. It ingests from virtually any source that can be mounted or accessed over a network protocol. And because it stores everything in Elasticsearch, the search is fast and the filtering is surgical — you can find every file over 10GB last modified before 2023 owned by a specific user, across all connected storage simultaneously [1].
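The example search above (files over 10GB, last modified before 2023, owned by one user) maps naturally onto an Elasticsearch bool query with range and term filters. The sketch below builds such a query body in Python. The field names `size`, `mtime`, and `owner` and the user name `alice` are hypothetical; Diskover's real index mapping may use different names.

```python
def build_query(min_size_bytes, modified_before, owner):
    """Return an Elasticsearch Query DSL body combining three filters.

    Field names are assumptions for illustration only.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"size": {"gt": min_size_bytes}}},
                    {"range": {"mtime": {"lt": modified_before}}},
                    {"term": {"owner": owner}},
                ]
            }
        }
    }


# "Over 10GB" is taken here as 10 GiB (10 * 1024**3 bytes).
query = build_query(10 * 1024**3, "2023-01-01T00:00:00Z", "alice")
```

Filter clauses do not affect relevance scoring, which suits this kind of yes/no metadata matching. The resulting dict could be passed as the request body of a search call against whichever index holds the crawl.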

The project has 1,764 GitHub stars and is released under the Apache 2.0 license. The latest Community Edition release is v2.3.4, published January 2026. The commercial version, sold as annual subscriptions, adds plugins for specific industries (media, life sciences), workflow automation, AI-assisted tagging, and deeper enterprise governance features [README][website].


Why people choose it

The XDA Developers review [1] frames the appeal clearly: if you have too many drives, folders, and files scattered across devices, you currently do one of two things — manually dig through them, or give up and buy more storage. Diskover is the third option. You scan everything once, and then you can search across all of it from one browser tab in seconds.

For home users and small businesses, the practical use cases are: finding large or duplicate files eating disk space, locating old archives that haven’t been touched in years, and building a real map of what storage you actually own versus what you think you own [1].

The enterprise case is different but equally concrete. The Broadcast Beat case study [4] covers Illuminate Hollywood, a post-production studio that manages petabytes of film data across multiple storage systems. Their CEO and President use Diskover to get an instant view into everything in house — “if a client wants to know what they have in house for a file or title, they can pull it up, versus having to check with operations first.” The AJA Diskover Media Edition (a commercial variant) handles their daily operational data management across 12 production departments [4].

The common thread: Diskover solves the specific problem of data opacity. When you don’t know what you have or where it lives, every storage decision — what to archive, what to delete, what to buy more of — is a guess. Diskover turns that guess into an answer [1][4].


Features

Core indexing and search:

  • File system crawler that collects metadata: names, paths, sizes, timestamps, owners, file types [1][README]
  • Elasticsearch backend for fast, scalable search across millions of files [README]
  • Parallel crawling — handles millions of files without slowing [1]
  • Supports local file systems, NFS, SMB, and mounted cloud storage [1][README]
  • Metadata-only: never reads or modifies actual file contents [1]
  • Scheduled rescanning to keep the index current [1]

Web interface (diskover-web):

  • Browser-based search with simple and advanced query modes [1]
  • Filter by size, file type, date range, ownership, and more [1]
  • Sort results by size, modification date, or owner [1]
  • Dashboard showing total files, occupied space, breakdowns by type and age [1]
  • Written in PHP, JavaScript, HTML5, CSS [README]

Storage analysis:

  • Identify large files and folders consuming disproportionate space [1]
  • Surface duplicate files and wasted storage [1]
  • Show files and directories that haven’t been accessed in configurable time periods [1]
  • Data change and growth visibility [README]
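The storage-analysis ideas above can be sketched over plain metadata records. This is an illustration, not Diskover's implementation: the record fields are assumed, and matching duplicates by name plus size is a heuristic picked because it needs metadata only. The sources do not describe how Diskover itself detects duplicates.

```python
import time
from collections import defaultdict

SECONDS_PER_DAY = 86400


def largest(records, n=10):
    """Return the n records with the biggest size."""
    return sorted(records, key=lambda r: r["size"], reverse=True)[:n]


def not_accessed_since(records, days, now=None):
    """Return records whose last access time is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    return [r for r in records if r["atime"] < cutoff]


def duplicate_candidates(records):
    """Group paths that share a (name, size) pair; keep groups of 2+."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["name"], r["size"])].append(r["path"])
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Hypothetical sample records (atime is seconds since the epoch).
sample = [
    {"path": "/a/clip.mov", "name": "clip.mov", "size": 500, "atime": 1000},
    {"path": "/b/clip.mov", "name": "clip.mov", "size": 500, "atime": 9_000_000},
    {"path": "/a/notes.txt", "name": "notes.txt", "size": 20, "atime": 9_000_000},
]
```

Files matched by name and size are only candidates; confirming a true duplicate would require comparing contents, which a metadata-only approach deliberately avoids.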

Commercial edition extras (not in Community):

  • AI Data Assistant — natural language queries (“which projects haven’t changed in 12 months?”) [website]
  • Automated tagging and metadata enrichment [4][website]
  • Workflow automation, archiving triggers, retention policy enforcement [website]
  • Industry-specific plugins (media, life sciences) [website]
  • Integration with Snowflake, AWS, Dell, NetApp, OCI, Qumulo [website]
  • Governance, compliance, and defensible deletion workflows [website]

The Community Edition covers the indexing engine and search web app — everything you need to inventory and search storage. What it doesn’t cover is operational automation and AI-assisted management, which are commercial-only [README][website].


Pricing: SaaS vs self-hosted math

Diskover Community Edition:

  • Software: $0 (Apache 2.0) [README]
  • Infrastructure: $5–15/mo VPS with sufficient RAM for Elasticsearch plus the web app
  • No license keys, no expiration, no per-file or per-seat limits stated in the Community Edition [README]

Diskover commercial subscriptions:

  • Annual subscription pricing is not publicly listed. The website links to a plans page and requires contacting sales for corporate licensing [website][README]
  • The AJA Diskover Media Edition (a co-branded variant for media production) is available through AJA resellers [4]

What you’d otherwise pay: Enterprise storage analytics and file management platforms typically run in the thousands of dollars per year. StorNext, Komprise, and similar commercial file management suites are generally priced per petabyte or per node; contract figures in the $5,000–$20,000+/year range are rough estimates, since these vendors do not publish pricing that allows a direct comparison. Diskover’s Community Edition eliminates this cost entirely for teams whose needs fit within it.

For a small business or home lab user, the realistic math is: $0 for the software + $6–12/mo for a VPS (or $0 if you run it on existing hardware). The primary cost is setup time, not money.
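As a quick worked version of that math, the snippet below annualizes the $6–12/mo VPS range quoted above. The helper name `annual_cost` is made up for this example.

```python
def annual_cost(monthly_vps_usd, software_usd=0):
    """Yearly cost in USD: one-off software price plus 12 months of VPS."""
    return software_usd + 12 * monthly_vps_usd


low, high = annual_cost(6), annual_cost(12)
print(f"${low}-${high} per year")  # prints "$72-$144 per year"
```

On existing hardware the monthly figure is zero, which leaves setup time as the only real cost.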

Note: the AI assistant and advanced automation features visible on the commercial website are not accessible without a paid subscription. If those are the features that would actually move the needle for your team, budget for a sales conversation [website].


Deployment reality check

Diskover’s setup is more involved than a typical self-hosted web app. The stack requires:

  • Elasticsearch — the most heavyweight dependency. Elasticsearch needs at least 2GB of dedicated heap, and the JVM overhead on top of that means a realistic minimum is 4–6GB RAM for the whole stack [README][1]
  • Python 3 — for the crawler and indexer
  • PHP and a web server (Apache or nginx) — for diskover-web
  • A Linux system (macOS and Windows 10 also supported, but Linux is the primary deployment target) [README]

The Install Guide is the INSTALL.md file in the GitHub repository. The User Guide is at docs.diskoverdata.com. Community support runs through a Slack workspace [README].

What can go sideways:

Elasticsearch is not a lightweight component. If you’re deploying on a low-spec VPS or a Raspberry Pi, expect the index to run slowly or fail entirely for large datasets. Memory is the critical constraint, not CPU or disk.
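Since memory is the constraint to check first, a small preflight script can compare a host's RAM against the 4–6GB guidance in this review. This is a hedged sketch: the thresholds come from this article rather than from Diskover or Elasticsearch documentation, the function names are invented, and `total_ram_bytes` relies on `os.sysconf` names that are available on Linux but may be missing elsewhere.

```python
import os

GIB = 1024**3


def total_ram_bytes():
    """Physical RAM in bytes (Linux; these sysconf names may not exist elsewhere)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def ram_verdict(total_bytes, minimum_gib=4, comfortable_gib=6):
    """Classify a host against the article's 4-6GB whole-stack guidance."""
    gib = total_bytes / GIB
    if gib < minimum_gib:
        return "too small"
    if gib < comfortable_gib:
        return "workable"
    return "comfortable"
```

On a Linux host, `ram_verdict(total_ram_bytes())` gives a rough go/no-go answer before installing anything. A VPS sold as "4GB" usually reports slightly less than 4 GiB to the operating system, so a borderline machine may be classified as too small.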

The web app is PHP, which is fine but requires correct server configuration — misconfigured PHP-FPM or file permissions are the typical failure modes during setup for users not familiar with web server admin.

The XDA review [1] doesn’t spend time on installation difficulty, which could mean it’s manageable — or that the reviewer was experienced enough that friction didn’t register. There are no third-party “I set this up in 30 minutes” accounts in the available sources to benchmark against.

Realistic time estimates: 1–3 hours for a Linux-experienced user comfortable with package installation and Elasticsearch configuration. For someone who has never touched Elasticsearch, budget a full afternoon and read the official docs before starting.

What the Community Edition doesn’t warn you about until you’re in it: The commercial website’s language about petabyte-scale unification, AI assistants, and workflow automation applies to the paid product. The Community Edition is explicitly described as “the perfect solution for home use, and try out Diskover” — functional and capable, but scoped accordingly [README].


Pros and cons

Pros

  • Apache 2.0 license. Genuinely open source, permissive. You can use it commercially, modify it, integrate it into your own tooling — no commercial license agreement needed for the Community Edition [README].
  • Metadata-only indexing. Diskover never reads file contents, cannot modify or delete files. For paranoid administrators (the right kind of paranoid), this matters. You can safely point it at sensitive directories without worrying about data exfiltration or corruption [1].
  • Scales to billions of files. Parallel crawling and Elasticsearch mean the architecture doesn’t fall over at serious scale. This is where most file-search alternatives fail [1].
  • Heterogeneous storage support. Local filesystem, NFS, SMB, cloud buckets — if you can mount it, Diskover can index it [1][README].
  • Real enterprise use. Illuminate Hollywood uses it for petabyte-scale media production workflows. It’s not a toy [4].
  • Active development. v2.3.4 released January 2026 — the project is maintained [README].
  • Vendor-neutral. Integrates with Dell, NetApp, Snowflake, AWS, OCI, Qumulo — not locked to any storage vendor [website].

Cons

  • Elasticsearch is a heavy dependency. If you don’t already run Elasticsearch, adding it for a file indexer is a significant infrastructure commitment. Memory requirements are meaningful [README].
  • Not a simple install. Multi-component stack (Python crawler + Elasticsearch + PHP web app) requires more setup discipline than a typical Docker Compose single-service app.
  • Community Edition is scoped for home use. The README says it plainly. Teams that want the AI assistant, automated tagging, retention workflows, or industry plugins need a commercial subscription — and pricing isn’t published [README][website].
  • Commercial pricing opacity. No pricing tiers, no ballpark numbers on the website. “Contact sales” for anything beyond the free tier is a friction point for budget planning.
  • Low GitHub star count relative to complexity. 1,764 stars is modest for an Elasticsearch-backed data management platform. Community size matters for finding help outside official support [GitHub].
  • Support is Slack-only for Community Edition. No issue tracker SLA, no guaranteed response time. For businesses that need reliable support, this points toward commercial [README].
  • Web interface is PHP. Functional but feels dated compared to modern React-based admin UIs. Not a dealbreaker, but worth noting against alternatives [README].
  • Limited third-party coverage. Of the sources checked, source [5] was unreachable (SSL error) and source [2] was unrelated. The only substantive coverage is the single XDA review [1] and one case study [4], so independent validation is thin.

Who should use this / who shouldn’t

Use Diskover if:

  • You’re an IT administrator or system engineer managing large, multi-source storage and need a real inventory of what you have.
  • You run a media production, design studio, or research environment accumulating petabytes of files across multiple storage systems [4].
  • You need cross-storage search that actually scales — local drives, NAS, network shares, and cloud in one query [1].
  • You already have the technical capability to deploy Elasticsearch and a PHP web server, or are willing to learn.
  • You want a permissive open-source license that lets you integrate the indexing capability into your own tools [README].

Skip it if you’re a non-technical founder with a typical web-app stack. Diskover solves file system opacity and storage analytics — problems you’re unlikely to have unless you’re managing servers with large local file stores, NAS devices, or media archives. It’s not a document management tool, not a cloud drive, not a team file sharing platform. If your files live in Google Drive, Dropbox, or S3 buckets accessed through a standard app, Diskover doesn’t map onto your workflow.

Skip it if Elasticsearch overhead is prohibitive. A VPS with less than 4GB RAM will struggle with Elasticsearch under any meaningful load. If your budget is a $4/mo Hetzner shared instance, look at simpler alternatives.

Skip it if you need the advanced features without a vendor relationship. The AI assistant and automation workflows visible on the website require a commercial subscription with undisclosed pricing. If those features are why you’re looking at Diskover, go in knowing you’ll need a sales conversation [website][README].


Alternatives worth considering

  • Everything (Windows) — for single-machine Windows search, Everything from Voidtools is instant, free, and requires no infrastructure. If you only need local Windows search, start there before reaching for Elasticsearch.
  • Recoll — desktop full-text search for Linux. Indexes file contents, not just metadata. Useful for personal document search, not suitable for multi-storage enterprise indexing.
  • Nextcloud — if what you actually need is unified file access and sharing across teams, Nextcloud is a self-hosted Google Drive replacement. Different problem space, but frequently what people think they need when they discover Diskover.
  • FileBrowser — lightweight web-based file browser for self-hosted file access. Simpler stack, no Elasticsearch, but no analytics or cross-storage indexing.
  • Elasticsearch + Kibana directly — if you’re technical enough to stand up Diskover’s stack, you could build a custom file indexer with custom dashboards. More control, more work, no community support structure.
  • Komprise, Aparavi, CTERA — commercial enterprise alternatives in the same space as Diskover’s commercial edition. Relevant if you have budget and want SLA-backed support, less relevant if you’re evaluating open-source options.

Bottom line

Diskover Community Edition is a legitimate, well-built tool for a specific problem: you have large amounts of storage across multiple locations, you don’t know what’s in it, and you want fast cross-storage search and storage analytics. It solves that problem well, scales to serious data volumes, and the Apache 2.0 license means there are no legal landmines in the free tier.

The honest caveat is that this is an IT infrastructure tool, not a general-purpose productivity app. The setup requires Elasticsearch, Python, and PHP — a stack that’s manageable for a systems person and genuinely daunting for someone who’s never configured a Java heap size. The Community Edition is scoped for home use and evaluation by the project’s own description, and the features that make Diskover compelling for real business operations (AI tagging, retention workflows, compliance) are behind commercial pricing that requires a sales call to access.

For media production teams, storage administrators, and technically capable small businesses with sprawling file systems: Diskover is worth the setup investment. For the non-technical founder escaping a typical SaaS bill, this is probably not the tool you’re looking for.


Sources

  1. Anurag Singh, XDA Developers, “Diskover is a free, self-hosted tool that can index all of your files on every device” (Oct 7, 2025). https://www.xda-developers.com/diskover-free-self-hosted-file-indexer/
  4. Broadcast Beat, “Illuminate Hollywood Manages Modern Data Loads with AJA Diskover Media Edition”. https://www.broadcastbeat.com/illuminate-hollywood-on-managing-modern-data-loads/

Primary sources: