unsubbed.co

Swirl Search

Swirl Search is a self-hosted platform that provides AI-powered search and RAG across your existing data sources.

Self-hosted enterprise search, honestly reviewed. No marketing fluff, just what you get when you connect your company’s data silos without moving them.

TL;DR

  • What it is: Open-source (Apache 2.0) federated AI search platform — query SharePoint, Confluence, GitHub, Slack, and 100+ other sources simultaneously without copying data into a vector database [README].
  • Who it’s for: Mid-size teams and enterprises with data spread across multiple systems who want AI-powered search without a massive ETL project. Not really a solo-founder tool — this is squarely enterprise infrastructure.
  • Cost savings: Glean, the most direct commercial alternative, runs an estimated $15–25/user/month (Glean doesn’t publish pricing), or $18,000–$30,000/year for a 100-person team. Swirl community edition runs on a VPS with no per-seat fee. The math only works if someone on your team can manage Docker.
  • Key strength: Data stays exactly where it is. No vector database setup, no ETL pipelines, no copying sensitive documents to third-party infrastructure. Queries go to your sources in parallel and results come back ranked [README][1].
  • Key weakness: Independent reviews are nearly nonexistent — we found one technical tutorial [1] and the official documentation [2]. The GitHub star count (2,980) is modest for a tool claiming enterprise readiness. Pricing for the Enterprise tier is opaque: “contact us.”

Swirl Search is a federated search platform with an AI layer on top. You point it at your existing data sources — SharePoint, Confluence, GitHub, Jira, Slack, Elasticsearch, PostgreSQL, Google Drive, ServiceNow, and dozens more — and when someone asks a question, Swirl queries all of them simultaneously, ranks the results, and uses an LLM to generate a summarized answer with source citations [README].

The core idea is what the project calls “zero data migration.” Traditional enterprise search (think Elasticsearch or a custom RAG pipeline) requires you to extract data from your systems, transform it, load it into a search index or vector database, and keep it synchronized. Swirl skips all of that. Instead, it queries your source systems live, in parallel, at search time. Results respect the source system’s existing permissions — if a user can’t see a SharePoint folder, Swirl won’t surface it in their results [website].
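
The pattern is easy to see in miniature. This is an illustrative sketch of the federated fan-out, not Swirl’s actual internals: every configured source is queried in parallel at search time and the results are merged and re-ranked. The connector functions and score field here are stand-ins.

```python
# Sketch of search-time federation: fan the query out to all sources
# at once, then merge and re-rank. Connector names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def search_wiki(query):
    return [{"source": "wiki", "title": f"Wiki page about {query}", "score": 0.9}]

def search_repo(query):
    return [{"source": "repo", "title": f"Repo hit for {query}", "score": 0.7}]

def federated_search(query, connectors):
    """Query every connector in parallel and merge the ranked results."""
    with ThreadPoolExecutor(max_workers=len(connectors) or 1) as pool:
        result_lists = pool.map(lambda c: c(query), connectors)
    merged = [hit for hits in result_lists for hit in hits]
    # Re-rank across heterogeneous sources by a shared relevancy score.
    return sorted(merged, key=lambda h: h["score"], reverse=True)

results = federated_search("deployment runbook", [search_wiki, search_repo])
```

The total latency of a search like this is bounded by the slowest source, which is the trade-off discussed later in this review.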

The GitHub README drives this home with a comparison table that’s hard to argue with: no vector DB to set up, no data to migrate, no ETL pipelines, no weeks of infrastructure work — just a Docker command. The company claims teams using Swirl save an average of 7.5 hours of productive time per week, though that number is self-reported and not independently verified [README].

The Apache 2.0 license is genuinely permissive: you can deploy it on your own infrastructure, embed it in your own product, or modify it without commercial agreements. This is meaningfully different from tools with “open core” licensing that restrict what you can do commercially.

As of this review, the project sits at 2,980 GitHub stars. That’s honest context: it’s not a breakout project. For comparison, n8n has 100K+ stars and Activepieces has 21K+. Swirl is a real tool with real enterprise deployments (the case studies on the website reference a $3M cost avoidance, an air-gapped classified search deployment, and a hosting cost reduction scenario) but it hasn’t crossed into mass adoption yet [website].


Why people choose it

The search for honest third-party Swirl reviews turned up something notable: there almost aren’t any. The technical article landscape for this tool is sparse — we found one developer tutorial on fdaytalk.com [1] and the official documentation [2]. G2, Trustpilot, and the usual review aggregators don’t have meaningful Swirl coverage yet. That’s itself a signal: this is a tool that enterprise teams evaluate and deploy quietly, not one with a vocal prosumer community writing blog posts about it.

What we can extract from the available sources:

The “no vector DB” pitch resonates with anyone who’s tried to build RAG. The fdaytalk tutorial [1] walks through Swirl as an alternative to building your own Perplexity-style search, and the framing lands: setting up a vector database, writing ETL code to populate it, managing embeddings, dealing with stale data — it’s weeks of work before you get to the actual search quality problems. Swirl sidesteps this by querying sources directly. Whether that trade-off (live latency vs. indexed speed) works for your use case depends on how fast your source systems respond.

The air-gap and on-premises story is real. One case study on the website describes deploying Swirl for classified environments where data cannot leave the network [website]. If you’re in government, defense, healthcare, or any sector where data residency requirements are non-negotiable, the combination of Apache 2.0 + on-premises deployment + bring-your-own LLM is a legitimate feature, not marketing. You can run the entire stack — Swirl, your LLM (Ollama or similar), your source connectors — inside a private network with no external calls.

Permission-preserving search is underappreciated. Most DIY enterprise search projects hit the permissions wall eventually: you extract documents into a central index and suddenly users are seeing content they shouldn’t. Swirl’s federated model avoids this by querying the source system as the user (via SSO/OAuth), so the source system’s ACL stays in control [website][2].


Features

Based on the README, official documentation, and website:

Core search engine:

  • Federated search across 100+ connectors — SharePoint, OneDrive, Confluence, GitHub, Jira, Slack, Elasticsearch, OpenSearch, PostgreSQL, Google BigQuery, MongoDB, ServiceNow, and more [README][website]
  • Parallel query execution: all configured sources queried simultaneously
  • Relevancy re-ranking across heterogeneous result sets — results from different sources merged and scored by semantic similarity [1]
  • Result caching for performance [1]
  • Tag-based source routing: query a subset of sources using tags like company:tesla or code: [2]
  • Pagination and date-sort support for compatible sources [2]
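
The tag-routing behavior can be sketched in a few lines. This is an illustration of the idea described in the docs [2], not Swirl’s actual routing code: a query prefixed with a tag goes only to the providers carrying that tag.

```python
# Minimal sketch of tag-based source routing (illustrative names only):
# "company:tesla" routes to providers tagged "company"; untagged queries
# go to every provider.
def route(query, providers):
    """providers: list of (name, tags) pairs. Returns (provider names, query)."""
    first = query.split()[0]
    if ":" in first:
        tag, rest = query.split(":", 1)
        matched = [name for name, tags in providers if tag in tags]
        if matched:
            return matched, rest.strip()
    return [name for name, _ in providers], query

providers = [("sec_edgar", {"company"}), ("github_repos", {"code"})]
```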

AI and RAG layer:

  • AI-generated summaries with source citations — ask a question, get an answer with links back to the original documents [README]
  • Configurable LLM backend — OpenAI by default, but replaceable with any API-compatible model including on-premises options [README][1]
  • Chat interface in addition to traditional search results [website]
  • AI agents support — treat Swirl as a tool for AI agents needing to search internal knowledge [website]
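
In practice, the bring-your-own-LLM swap usually means pointing an OpenAI-compatible client at a local inference server. A sketch, assuming Ollama (which serves an OpenAI-compatible API); the exact Swirl configuration key isn’t covered by the sources reviewed here, so treat the variable names as illustrative and check the docs for your version:

```shell
# Ollama exposes an OpenAI-compatible API at /v1 on port 11434.
# Variable names below are the generic OpenAI-client conventions,
# not confirmed Swirl settings.
export OPENAI_API_KEY='ollama'                      # placeholder; local server ignores it
export OPENAI_BASE_URL='http://localhost:11434/v1'  # point the client at local inference
```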

Connector architecture:

  • SearchProvider configuration system — JSON configs that define how to query each source [2]
  • RequestsGet connector for any REST API or URL-based source [2]
  • Custom connector development: Python classes with well-defined interfaces, with documentation suggesting using coding AI to generate new connectors [2]
  • Response and result mapping system for extracting structured data from arbitrary JSON/XML responses [2]
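
To give a sense of what connector configuration involves, here is an illustrative SearchProvider in the general shape the docs’ examples use [2]. The RequestsGet connector name comes from the docs; the URL, field names, and mapping syntax here are hypothetical and should be verified against your version:

```json
{
  "name": "Company Wiki",
  "active": true,
  "default": true,
  "connector": "RequestsGet",
  "url": "https://wiki.example.com/api/search",
  "query_template": "{url}?q={query_string}",
  "result_mappings": "title=page_title,body=excerpt,url=page_url",
  "tags": ["wiki"]
}
```

This is the “no code required” reality: you don’t write a search index, but you do write JSON that maps your source’s response fields into Swirl’s result schema.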

Enterprise features:

  • SSO integration [merged profile]
  • REST API for programmatic access [merged profile]
  • On-premises and air-gapped deployment support [website]
  • Existing permission inheritance from source systems [website]
  • PostgreSQL as the backing store [merged profile]

What the documentation doesn’t make clear: the exact boundary between community edition features and enterprise features. The website has a comparison table against Glean and Microsoft Copilot, but no community vs. enterprise feature matrix. This is a gap.


Pricing: the math

Community edition (self-hosted):

  • Software: $0 (Apache 2.0) [README]
  • VPS: $10–30/month depending on connector count and query volume
  • LLM API costs: if using OpenAI, you pay per token for summarization. At moderate usage (say, 500 queries/day with short summaries), this runs $20–50/month on GPT-4o-mini or similar [1]
  • Setup time: factored separately below

Enterprise edition:

  • Pricing: not published. “Contact us” / “Request Access” everywhere on the website [website]
  • No trial pricing, no per-seat numbers, no published tiers

Glean (the closest commercial comparison):

  • Pricing not published either (enterprise SaaS standard), but market estimates put it at $15–25/user/month at typical contract sizes
  • For 50 users: $750–$1,250/month = $9,000–$15,000/year
  • For 100 users: $1,500–$2,500/month = $18,000–$30,000/year

Self-hosted Swirl math for a 50-person team:

  • VPS: $20/month
  • LLM API (1,000 queries/day, mixed complexity): ~$50–100/month
  • One-time setup (contractor or internal engineering): $500–2,000
  • Total year one: ~$1,300–$3,400
  • Total year two onwards: ~$840–$1,440/year
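
The arithmetic behind those ranges, using the review’s own estimates (not vendor pricing):

```python
# Year-one and year-two totals under the stated assumptions.
vps_monthly = 20
llm_monthly = (50, 100)     # low / high LLM API estimate
setup_once = (500, 2000)    # one-time deployment cost

year_one = tuple((vps_monthly + llm) * 12 + setup
                 for llm, setup in zip(llm_monthly, setup_once))
year_two = tuple((vps_monthly + llm) * 12 for llm in llm_monthly)

print(year_one)  # (1340, 3440)
print(year_two)  # (840, 1440)
```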

The savings math against Glean is substantial — potentially $15,000–$25,000/year for a 100-person team. But the comparison only holds if someone on your team can actually maintain the deployment. Unlike Activepieces, where a non-technical founder can follow a guide on a weekend, Swirl’s connector configuration requires understanding JSON response paths, OAuth credential flows, and query parameter construction [2]. This is infrastructure work.


Deployment reality check

The README’s setup is straightforward if you know Docker:

curl https://raw.githubusercontent.com/swirlai/swirl-search/main/docker-compose.yaml -o docker-compose.yaml
export OPENAI_API_KEY='your-key'
docker-compose pull && docker-compose up

Then you hit localhost:8000 with credentials admin/password and start configuring SearchProviders through the admin panel [1].
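
Programmatic access follows the same shape. A hedged sketch of building a query against the REST API: the /swirl/search/ path and ?q= parameter follow the examples in the docs [2], but verify them against your installed version, and fetch the URL with an authenticated HTTP client (session cookie or token).

```python
# Build a search URL for a running instance. Endpoint path is assumed
# from the docs' examples; authentication is handled separately.
from urllib.parse import urlencode

def search_url(base, query, **params):
    """Return a URL like http://localhost:8000/swirl/search/?q=..."""
    qs = urlencode({"q": query, **params})
    return f"{base.rstrip('/')}/swirl/search/?{qs}"

url = search_url("http://localhost:8000", "quarterly report")
```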

What’s smooth:

  • The Docker Compose setup genuinely works. The fdaytalk tutorial [1] walks through it without flagging setup pain — a good sign.
  • The SearchProvider configuration system is well-documented [2]: JSON objects you paste into an admin UI, no code required for standard connectors.
  • Results appear quickly once providers are configured — parallel querying keeps latency reasonable.

What’s harder than it looks:

  • Connecting to OAuth-protected systems (SharePoint, Google Drive, Slack) requires configuring OAuth app registrations in those systems first. This is not “2-minute setup” territory — it’s a half-day task per major data source for someone who’s done it before, longer for someone who hasn’t.
  • The OpenAI API key requirement for AI features adds ongoing cost and a cloud dependency, which undermines the “fully on-premises” pitch unless you swap in Ollama or another local inference server.
  • Custom connectors require Python development [2]. The documentation suggests using AI to generate them, which is pragmatic but also tells you the documentation coverage isn’t complete enough to do it from docs alone.
  • No SLA, no managed updates, no monitoring out of the box — you own the operational burden.

Realistic time estimates:

  • Docker up with default config and one test data source: 30–60 minutes for a developer
  • Production deployment with SharePoint + Confluence + GitHub: 1–3 days including OAuth registrations and permission testing
  • For a non-technical founder: this is not a self-install. You need a developer or a deployment service.

Pros and cons

Pros

  • Apache 2.0 licensed. Genuinely permissive — self-host, embed, modify, resell with no commercial licensing conversation [README]. Unlike Glean (proprietary) or some “open core” tools with restrictive commercial clauses.
  • Federated architecture eliminates the ETL problem. If your primary objection to enterprise search is “I don’t want to copy all our data somewhere,” Swirl is the tool that actually solves this rather than deferring it [README][1].
  • Permission inheritance from source systems. ACLs stay in the source system; Swirl doesn’t create a parallel permission model you have to maintain [website].
  • Bring your own LLM. Swap OpenAI for any API-compatible model, including on-premises inference, for true air-gap deployments [README][website].
  • 100+ connectors. Broad coverage of enterprise SaaS: Microsoft 365, Google Workspace, Atlassian, GitHub, Slack, ServiceNow, and databases [README][website].
  • Real enterprise case studies. The website shows deployments in classified environments and at organizations projecting $3M in cost avoidance — not just demo users [website].
  • Active CI/CD. The README shows a passing test-build pipeline badge, suggesting the project is actively maintained [README].

Cons

  • No independent review coverage. We found zero third-party product reviews — only a third-party tutorial [1] and the official docs [2]. Community trust is unverifiable from external sources.
  • 2,980 GitHub stars is modest. For a tool claiming enterprise readiness across 100+ connectors, that adoption signal is quieter than you’d expect.
  • Pricing opacity. Enterprise tier is “contact us” with no published numbers. You can’t evaluate cost without a sales conversation.
  • Community vs. enterprise feature boundary is unclear. The website doesn’t publish a feature matrix distinguishing what you get in the community edition versus what requires Enterprise licensing.
  • Not self-service for non-technical teams. The connector configuration requires understanding API response structures and OAuth flows [2]. The “no code required” claim on the website refers to not writing a search index — you’re still writing JSON configs and managing OAuth.
  • OpenAI dependency for AI features. Out of the box, AI summaries require an OpenAI API key, adding cloud dependency and per-token cost [1]. Local LLM substitution is possible but requires additional setup.
  • Latency trade-off vs. indexed search. Live federated queries are only as fast as your slowest source. If one of your ten connected systems is slow, your search latency reflects that.
  • Thin community ecosystem. No plugin marketplace, no community connector library, no Stack Overflow tag with meaningful volume.

Who should use this / who shouldn’t

Use Swirl Search if:

  • You’re a 20–500 person team with data spread across Microsoft 365, Google Workspace, Confluence, GitHub, and similar systems, and you’re spending significant time hunting for information across tabs.
  • Data residency or compliance requirements make it impossible to copy documents to a third-party search platform.
  • You’re in a regulated industry (healthcare, government, finance) where on-premises deployment and air-gap capability are non-negotiable.
  • You have at least one developer who can manage Docker deployments and configure OAuth connections.
  • You want Apache 2.0 licensing specifically — not “open core,” not “fair code,” genuinely Apache 2.0.

Skip it (use Glean) if:

  • You want a fully managed enterprise search product with a support contract and SLA.
  • You don’t have internal engineering capacity to maintain Docker infrastructure.
  • You need AI search that just works out of the box for a non-technical team with zero setup tolerance.

Skip it (stay with Elasticsearch + custom RAG) if:

  • You already have Elasticsearch deployed and just need the AI layer.
  • Your search performance requirements demand indexed search latency — live federated queries won’t meet your SLA.
  • You have a data engineering team who prefers to control the full pipeline.

Skip it (not your tool) if:

  • You’re a solo founder or team of 2–3 looking to replace a personal knowledge management tool. Swirl is enterprise infrastructure, not Notion search.
  • Your budget is zero and you have no one to maintain it.
  • Your data sources aren’t in Swirl’s connector list and you don’t have Python developers to write new ones.

Alternatives worth considering

  • Glean — the direct commercial competitor. Fully managed, enterprise SaaS, no self-hosting option, no public pricing. Best if you have budget and want a vendor-managed solution. Worse for compliance-heavy environments that can’t use external SaaS.
  • Microsoft Copilot — if your organization is already Microsoft 365, the M365 Copilot license adds AI search natively. No setup, no infrastructure, but Microsoft-only (no cross-platform connectors) and expensive per-seat.
  • Elasticsearch + LLM — the DIY alternative. Maximum flexibility, maximum effort. Good if you have a data engineering team and need indexed search speed. No federated model — you still have to move data.
  • Apache Solr — similar to Elasticsearch, minus the commercial overhead. Older, less AI-native, more work.
  • Perplexity for Teams — web-focused AI search, not your internal data. Different use case.
  • n8n — not a search tool, but if your actual need is automating access to information across systems rather than real-time search, workflow automation may solve the problem more simply.

For the specific use case — “I want to search across all my company’s tools without building ETL pipelines” — the realistic shortlist is Swirl vs. Glean. Pick Swirl if you have engineering capacity and need on-premises or Apache 2.0 licensing. Pick Glean if you want managed infrastructure and have the budget.


Bottom line

Swirl Search solves a real problem elegantly: enterprise teams spend enormous time hunting across disconnected systems, and the traditional fix (build a search index, maintain ETL pipelines, manage a vector database) trades one problem for three new ones. Swirl’s federated approach — query sources live, respect existing permissions, skip the data migration — is the correct architecture for this constraint, and the Apache 2.0 license makes it a genuine open-source option rather than an “open core” marketing play.

The honest caveat is that this is infrastructure, not a weekend project. Non-technical founders who need AI search across their company’s tools are better served by a managed solution or a deployment partner until Swirl develops a more polished setup experience. But for a team with engineering capacity that needs compliant, on-premises, federated search without writing the architecture from scratch, Swirl is worth a serious evaluation — particularly given that the only commercial alternative with comparable capabilities (Glean) costs an order of magnitude more.

If the deployment and maintenance burden is the blocker, that’s exactly what upready.dev deploys for clients — one-time setup, you own the infrastructure, no recurring vendor bill.


Sources

  1. fdaytalk.com, “Build Your Own AI-Powered Search Engine with SWIRL” (October 26, 2024). https://www.fdaytalk.com/build-your-own-ai-powered-search-engine-with-swirl/
  2. SWIRL AI Documentation, “Tutorial: Extending SWIRL”. https://docs.swirlaiconnect.com/Tutorials.html
