unsubbed.co

Elasticsearch

The distributed search and analytics engine that powers search at Netflix, eBay, and Uber — millisecond queries across billions of documents, with vector search built in for AI/RAG applications.

Best for: Engineering teams at companies with billions of documents, dedicated DevOps resources, and search or analytics workloads that justify the operational complexity — not for small teams wanting “just search.”

TL;DR

  • What it is: A distributed, RESTful search and analytics engine built on Apache Lucene. It stores JSON documents, uses inverted indices for near-instant full-text search, and now includes vector search for AI/RAG pipelines.
  • Who it’s for: Engineering teams at companies with billions of documents, dedicated DevOps resources, and search or analytics workloads that justify the operational complexity. Not for small teams wanting “just search.”
  • Cost comparison: Elastic Cloud starts at ~$95/mo for a basic deployment and scales to $500+/mo. Self-hosted Elasticsearch is free but demands serious hardware — minimum 16GB RAM for development, 64GB+ recommended per production node.
  • Key strength: Nothing else matches Elasticsearch at scale. Near real-time search across petabytes, vector search for AI/RAG pipelines, and a mature ecosystem (Kibana, Logstash, Beats) that covers the entire data pipeline.
  • Key weakness: Brutally complex to operate. Shard management, cluster health, JVM tuning, rolling upgrades — this is not a “deploy and forget” tool. The learning curve is steep and the infrastructure costs are real.

What is Elasticsearch

Elasticsearch is a distributed search and analytics engine that has become the de facto standard for full-text search, log analytics, and — more recently — vector search for AI applications. Originally released in 2010 by Shay Banon, it now sits at 76,347 GitHub stars and powers search at companies like Netflix, eBay, Walmart, and Uber.

The core idea: you feed it JSON documents, it indexes them using inverted indices (and now vector indices), and it returns search results in milliseconds. What makes it different from a regular database is speed at scale — Elasticsearch is designed to search billions of documents in near real-time, with horizontal scaling across dozens or hundreds of nodes.
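The inverted-index idea itself is simple enough to sketch in a few lines of Python. This is a toy model for intuition only; Lucene's actual on-disk structures (term dictionaries, compressed postings lists with positions) are far more sophisticated:

```python
from collections import defaultdict

# Toy inverted index: maps each term to the set of document IDs that
# contain it. A simplified model of what Lucene does under the hood,
# not Elasticsearch's actual data structures.
docs = {
    1: "distributed search and analytics engine",
    2: "vector search for AI applications",
    3: "log analytics at scale",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query: str) -> set:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index[terms[0]].copy()
    for term in terms[1:]:
        result &= index[term]
    return result

print(search("search analytics"))  # only doc 1 contains both terms
```

Looking up a term is a dictionary access rather than a scan over every document, which is why full-text search stays fast as the corpus grows; Elasticsearch adds scoring, sharding, and replication on top of this basic structure.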

Elasticsearch is the foundation of the Elastic Stack (formerly ELK Stack): Elasticsearch for storage and search, Kibana for visualization, Logstash for data ingestion, and Beats for lightweight data shipping. Together they form a complete observability and analytics platform.

The licensing situation has been contentious. In 2021, Elastic switched from Apache 2.0 to the Server Side Public License (SSPL) to prevent cloud providers (primarily AWS) from offering Elasticsearch as a managed service. AWS responded by forking OpenSearch under Apache 2.0. In 2024, Elastic added AGPLv3 as an option alongside SSPL, partially addressing open-source concerns — but the binary releases still ship under Elastic License 2.0.

Use cases the README explicitly lists: Retrieval Augmented Generation (RAG), vector search, full-text search, logs, metrics, APM, and security analytics.


Why people choose it over alternatives

Elasticsearch wins on scale, ecosystem maturity, and AI capabilities but loses on operational complexity and cost.

Versus OpenSearch

This is the comparison that matters most for self-hosters. OpenSearch forked from Elasticsearch 7.10 in 2021 and is now governed by the Linux Foundation under Apache 2.0. Elasticsearch shows 40–140% faster performance in vendor benchmarks, though both share the same Lucene core. OpenSearch differentiates through better vector search flexibility (Faiss and nmslib engines, up to 16,000 dimensions vs. Elasticsearch’s 4,096) and native AWS integration. The framing: Elasticsearch for performance and features, OpenSearch for true open-source licensing and AWS-native deployments.

Versus Typesense and Meilisearch

These are the “simple search” alternatives. Meilisearch positions Elasticsearch as the enterprise heavyweight that’s overkill for most use cases: “Elasticsearch suits organizations handling billions of documents with dedicated DevOps resources — not those seeking straightforward implementation.” Typesense and Meilisearch offer dramatically simpler setup (single binary, instant indexing) but can’t match Elasticsearch at scale.

Versus Algolia

Algolia is the SaaS alternative — zero ops, instant results, but expensive at volume. Algolia charges per search request and record count. A mid-size e-commerce site doing 10M searches/month on 5M records could easily pay $1,000+/mo on Algolia. The same workload on self-hosted Elasticsearch runs on hardware you already own.


Features: what it actually does

Search engine:

  • Full-text search with BM25 scoring, fuzzy matching, autocomplete, and suggestions
  • Vector search for semantic/AI applications with dense and sparse vectors
  • Hybrid search combining keyword and vector approaches
  • Geo-distance, polygon, and hexagonal spatial search
  • Aggregations framework for real-time analytics (histograms, terms, nested, pipeline)
  • Query DSL for complex, composable search queries
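To make the Query DSL bullet concrete: a product search combining a scored full-text clause with non-scoring filters might look like this. The body is what you would POST to the _search endpoint; the field names (title, category, price) are hypothetical:

```python
# A composable Query DSL body: the "must" clause is scored with BM25,
# the "filter" clauses narrow results without affecting relevance.
# Field names are illustrative, not from any real index.
query_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": {"query": "wireless headphones",
                                     "fuzziness": "AUTO"}}}
            ],
            "filter": [
                {"term": {"category": "electronics"}},
                {"range": {"price": {"lte": 200}}},
            ],
        }
    },
    "size": 10,
}
```

Filters are cacheable and skip scoring entirely, so pushing every exact-match condition into "filter" rather than "must" is a standard performance habit.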

Data platform:

  • Near real-time indexing — documents are searchable within ~1 second of ingestion
  • Horizontal scaling across nodes with automatic shard rebalancing
  • REST API for all operations — language-agnostic by design
  • Client libraries for Java, Python, Go, Ruby, PHP, C#, Rust, and more
  • Cross-cluster search for federated queries across multiple deployments

AI and ML:

  • Native vector database capabilities for RAG pipelines
  • Integration with embedding models (Jina AI, frontier LLMs)
  • Inference service for running models alongside search
  • Semantic search with learned sparse retrieval (ELSER)
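In Elasticsearch 8.x, a hybrid request is an ordinary search body carrying both a query clause and a knn clause. This sketch assumes an index with a dense_vector field named embedding; the 4-dimension vector is illustrative only, since real embeddings from a model have hundreds or thousands of dimensions:

```python
# Hybrid search sketch: BM25 keyword matching ("query") combined with
# approximate nearest-neighbor vector search ("knn") in one request.
# The field name "embedding" and the tiny vector are assumptions.
hybrid_body = {
    "query": {"match": {"text": "how do I reset my password"}},
    "knn": {
        "field": "embedding",
        "query_vector": [0.12, -0.53, 0.91, 0.07],
        "k": 10,              # nearest neighbors to return
        "num_candidates": 100,  # candidates examined per shard
    },
    "size": 10,
}
```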

Observability stack (Elastic Stack):

  • Kibana for dashboards and visualization
  • Logstash for data ingestion pipelines
  • Beats (Filebeat, Metricbeat, etc.) for lightweight data shipping
  • APM for application performance monitoring
  • SIEM for security analytics

Deployment options:

  • Self-hosted: Docker, Kubernetes (Elastic Cloud on Kubernetes operator), bare metal
  • Elastic Cloud Hosted: managed service on AWS, GCP, Azure
  • Elastic Cloud Serverless: fully managed, pay-per-use
  • start-local script for quick Docker-based local setup

Pricing: SaaS vs self-hosted math

Elastic Cloud (their SaaS):

  • 14-day free trial only — no permanent free tier
  • Hosted: starts ~$95/mo for a small deployment (2 availability zones, 8GB RAM). Production workloads typically run $200–$500/mo. Enterprise deployments easily exceed $1,000/mo
  • Serverless: usage-based pricing on search, ingest, and storage

Self-hosted:

  • Software: free (AGPL or Elastic License)
  • Hardware requirements: minimum 16GB RAM for a single-node dev setup. Production clusters: 3+ nodes, 32–64GB RAM each, SSD storage mandatory
  • Realistic VPS cost: $60–$200/mo for a minimal 3-node cluster on Hetzner or DigitalOcean
  • Engineering time: significant — you need someone who understands JVM tuning, shard strategies, and cluster operations

Concrete math for a search-heavy application:

Say you’re running a product catalog with 2M documents and 500K searches/day. On Algolia, that’s solidly in the $500–$1,000+/mo territory. On Elastic Cloud, a cluster sized for this workload runs ~$200–$400/mo. Self-hosted on three dedicated servers (e.g., 3x Hetzner AX52 at ~$60/mo each), you’re at ~$180/mo for unlimited searches.
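The same math, written out with the assumptions explicit:

```python
# Back-of-envelope monthly cost comparison for ~500K searches/day.
# All dollar figures are rough assumptions from the scenario above,
# not vendor quotes.
searches_per_month = 500_000 * 30          # ~15M searches/month

algolia_low, algolia_high = 500, 1_000     # assumed SaaS range, $/mo
elastic_cloud = 300                        # midpoint of $200-$400/mo
self_hosted = 3 * 60                       # 3x Hetzner AX52 at ~$60/mo

print(f"Searches/month: {searches_per_month:,}")
print(f"Algolia:        ${algolia_low}-{algolia_high}/mo")
print(f"Elastic Cloud:  ~${elastic_cloud}/mo")
print(f"Self-hosted:    ~${self_hosted}/mo (plus engineering time)")
```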

The catch: self-hosted Elasticsearch requires someone who can manage it. If your team doesn’t have Elasticsearch expertise, the “free” software comes with expensive engineering hours.


Deployment reality check

Elasticsearch is not a quick-setup tool. The README provides a start-local script for Docker-based development, but production deployment is a different story.

What you actually need for production:

  • 3+ Linux servers with minimum 32GB RAM each (64GB recommended)
  • SSD storage — Elasticsearch is I/O intensive
  • JVM configuration tuned for your workload (heap size, GC settings)
  • A load balancer for coordinating client requests
  • Monitoring for cluster health, shard allocation, and disk usage
  • A backup strategy (snapshot/restore to S3 or equivalent)
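To make the backup bullet concrete: registering an S3 snapshot repository is a single PUT _snapshot/&lt;name&gt; request. A sketch of the request body, with a hypothetical bucket name (AWS credentials belong in the Elasticsearch keystore, never in this body):

```python
# Body for: PUT _snapshot/nightly_backups
# Registers an S3 bucket as a snapshot repository. Bucket and base_path
# are hypothetical placeholders.
repo_body = {
    "type": "s3",
    "settings": {
        "bucket": "my-es-snapshots",
        "base_path": "prod-cluster",
    },
}
```

Once the repository exists, snapshots can be taken on demand or scheduled with snapshot lifecycle management, and restored into the same or a different cluster.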

What can go sideways:

  • Shard management is the #1 operational headache. Too many shards = per-shard overhead and cluster instability. Too few = oversized shards and limited query parallelism. Getting this right requires understanding your data and query patterns.
  • JVM garbage collection pauses can cause nodes to drop out of the cluster temporarily, triggering unnecessary shard reallocation.
  • Disk pressure — at 85% disk usage (the low watermark) Elasticsearch stops allocating new shards to the node; at 90% it starts relocating shards away; at 95% indices on the node go read-only. Running out of disk on a production cluster is a bad day.
  • Rolling upgrades between major versions (7.x to 8.x) require careful planning. Index compatibility is not guaranteed across major versions.
  • The licensing situation means you need to decide: AGPL (true open source, copyleft), Elastic License (free but restrictive), or OpenSearch (Apache 2.0 fork). This decision has long-term implications.
  • Memory hunger — Elasticsearch and the JVM are not lightweight. The recommended minimum for a single development node is 16GB RAM; running it on a 2GB VPS is not realistic.
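The disk thresholds above are ordinary cluster settings. A sketch of a PUT _cluster/settings body showing the stock watermark values, useful mainly as a map of where the knobs live:

```python
# Body for: PUT _cluster/settings
# These are the default disk watermarks, spelled out explicitly.
# Shown only to illustrate which settings control the 85%/90%/95%
# thresholds; change them deliberately, not casually.
watermark_settings = {
    "persistent": {
        "cluster.routing.allocation.disk.watermark.low": "85%",
        "cluster.routing.allocation.disk.watermark.high": "90%",
        "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    }
}
```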

Realistic time estimate: a senior engineer familiar with Elasticsearch can set up a production cluster in 1–2 days. Someone learning from scratch should budget a week or more including capacity planning, security configuration, and backup setup.


Who should use this (and who shouldn’t)

Use Elasticsearch if:

  • You have millions or billions of documents and need sub-second search latency.
  • You’re building an observability stack (logs, metrics, APM) and want a unified platform.
  • You need vector search for AI/RAG applications at scale.
  • Your team includes engineers who understand distributed systems and JVM operations.
  • You’re willing to invest in proper cluster management (or pay for Elastic Cloud).

Skip it (use Meilisearch or Typesense) if:

  • You have fewer than 1M documents and need “just search.”
  • You want a single-binary deployment with zero configuration.
  • Your team doesn’t include anyone with distributed systems experience.

Skip it (use OpenSearch) if:

  • You need pure Apache 2.0 licensing without copyleft restrictions.
  • Your infrastructure is AWS-native and you want tight IAM/CloudWatch integration.
  • The SSPL/AGPL licensing situation is a dealbreaker for your legal team.

Skip it (use Algolia) if:

  • You want zero-ops hosted search and budget isn’t a concern.
  • You need instant setup for a prototype or early-stage product.

Alternatives worth considering

  • OpenSearch — the Apache 2.0 fork. Same Lucene core, AWS-backed, growing independently. Better for AWS shops and teams requiring permissive licensing.
  • Meilisearch — Rust-based, single-binary search engine. Dramatically simpler to deploy and operate. Best for product search with < 10M documents.
  • Typesense — C++ based, instant search focus. Similar simplicity to Meilisearch, slightly different trade-offs.
  • Algolia — SaaS search. Zero ops, instant relevance, but expensive at scale. Best for teams with budget but no DevOps.
  • PostgreSQL full-text search — if you already have Postgres and need basic search on < 1M records, tsvector is free and requires no additional infrastructure.
  • Solr — the other Lucene-based search engine. More mature than Elasticsearch in some enterprise features, less popular, smaller community.

For a self-hoster evaluating search engines: if you have the expertise and the scale, Elasticsearch is the gold standard. If you don’t have both, you almost certainly want something simpler.


Bottom line

Elasticsearch earned its 76K GitHub stars by being genuinely excellent at what it does: fast, scalable search and analytics across massive datasets. The addition of vector search and AI integration keeps it relevant as the industry moves toward RAG and semantic search. But “excellent at scale” comes with “complex at every scale.” The operational overhead — shard management, JVM tuning, capacity planning, rolling upgrades — is real and significant. This is not a tool you casually add to a side project.

The honest assessment: if you’re asking “should I use Elasticsearch?” and you don’t already have a dedicated team managing distributed systems, the answer is probably no. Use Meilisearch or Typesense for product search, PostgreSQL for basic search, or Algolia if you have budget. But if you’re at the scale where Elasticsearch makes sense — millions of documents, complex analytics, observability pipelines — nothing else comes close.

If the operational burden is the blocker, that’s exactly what upready.dev helps with: deploying and managing self-hosted infrastructure so you get the cost savings without the DevOps overhead.

Sources

This review synthesizes 5 independent third-party articles along with primary sources from the project itself. Inline references throughout the review map to the numbered list below.

  [1] meilisearch.com — “Elasticsearch Review 2025: Right Search Platform for You?” (link)
  [2] bigdataboutique.com — “Elasticsearch vs OpenSearch - 2025 update” (link)
  [3] pureinsights.com — “Elasticsearch vs OpenSearch in 2025: What the Fork?” (link)
  [4] sematext.com — “Complete Elasticsearch Guide for Beginners” — deployment guide (link)
  [5] knowi.com — “What is Elasticsearch? Complete Guide for 2026 (How It Works)” — critical review (link)
  [6] GitHub repository — official source code, README, releases, and issue tracker (https://github.com/elastic/elasticsearch)
  [7] Official website — Elasticsearch project homepage and docs (https://www.elastic.co/elasticsearch)

References [1]–[7] above were used to cross-check claims about features, pricing, deployment, and limitations in this review.
