Milvus
Milvus is a high-performance open-source vector database built for AI applications, supporting billion-scale similarity search with sub-second latency.
Open-source vector database, honestly reviewed. Built for AI engineers, not for the faint of heart.
TL;DR
- What it is: Open-source (Apache 2.0) vector database built for high-performance similarity search at billion-scale — the infrastructure layer behind RAG pipelines, semantic search engines, and recommendation systems [1][2].
- Who it’s for: AI/ML engineers and data teams building production applications that need to search across millions or billions of vector embeddings. Not a tool for non-technical founders expecting a simple setup [2][3].
- Cost savings: Zilliz Cloud (the managed version) starts at $0 for prototyping and $4/million vCUs for serverless. Self-hosted is free under Apache 2.0, but the infrastructure cost is real — dedicated hardware for production starts at $49–$100/mo on bare metal [3][4].
- Key strength: Benchmark-leading performance at scale. Reddit Engineering chose Milvus after finding it 5–10x faster for batch ingestion than competitors. Query latency in single-digit milliseconds at million-scale [1].
- Key weakness: Steep operational complexity. The full distributed deployment runs on Kubernetes with multiple dependent services. If you’re not already comfortable with K8s, this will hurt [3][README].
What is Milvus
Milvus is a vector database — the specific kind of database purpose-built for AI applications where you need to find things by meaning rather than exact match. When your application converts text, images, or audio into embedding vectors (arrays of floating-point numbers), Milvus is the engine that stores those vectors and answers queries like “find me the 10 documents most semantically similar to this query” at speed [README][2].
Written in Go and C++, Milvus is engineered for hardware acceleration on both CPU and GPU. The architecture separates storage from compute, which is what makes horizontal scaling possible — you can add query nodes without touching your storage layer [README][1].
The project sits at 43,358 GitHub stars and is a graduated project under the LF AI & Data Foundation, with Zilliz as the primary commercial backer and contributor [1][2]. That structure — open-source foundation with a commercial managed service — mirrors what Elastic did with Elasticsearch, for better or worse.
There are three deployment flavors:
- Milvus Lite — a Python library (`pip install pymilvus[milvus-lite]`), file-based, zero infrastructure. For local development and prototyping only [README].
- Milvus Standalone — single-node Docker deployment. Works for development and smaller production loads [README].
- Milvus Distributed — the full Kubernetes-native cluster deployment. This is what you need for billion-scale workloads, and it is genuinely complex [README][3].
The managed offering, Zilliz Cloud, handles all of this operationally — serverless, dedicated, or BYOC (bring your own cloud) options available [README][4].
Why people choose it
The cases for Milvus cluster around three things: performance at scale, hybrid search capability, and the Apache 2.0 license.
Performance. The most cited real-world case is Reddit Engineering. Reddit evaluated multiple vector databases for their recommendation infrastructure and chose Milvus, citing 5–10x faster batch ingestion than competing systems [1]. The Forrester Wave named Milvus a leader in the vector database category. On query latency, Milvus achieves single-digit millisecond response at million-scale [1][2]. The important caveat from benchmarking: Milvus has the fastest indexing time in its class but is not the leader on requests-per-second or latency when working with very high-dimensional embeddings [1]. Know your workload.
Hybrid search. v2.5 shipped native hybrid search — full-text BM25 search and dense vector search within a single query, without routing between separate systems [1]. This matters for RAG pipelines where you want keyword precision and semantic recall simultaneously. Previously you’d need to run a separate Elasticsearch or OpenSearch instance and merge results yourself.
License. Apache 2.0 is as clean as it gets. You can use it commercially, embed it in products you sell, run it in client environments — no usage restrictions, no “fair-code” complications, no calls to lawyers [1][2]. For teams that have been burned by Elastic’s BSL switch or MongoDB’s SSPL history, this matters.
Ecosystem positioning. Milvus powers use cases that are genuinely hard to fake: multimodal search (images + text in one query), billion-vector retrieval for recommendation systems, and agentic AI workflows where a language model needs to retrieve relevant context at sub-100ms latency [1][2]. These are the applications where “just use pgvector” stops working.
Features
Search types:
- Approximate nearest neighbor (ANN) search across dense vectors — the core use case [README][2]
- Full-text search (BM25) natively integrated since v2.5, no separate text engine needed [1]
- Hybrid dense + sparse search in a single query [1]
- Batch query support [README]
Index types supported:
- HNSW — highest recall, higher memory usage
- IVF (multiple variants) — good throughput, tunable recall/speed tradeoff
- FLAT — brute-force exact search, for small collections where precision matters more than speed
- SCANN — quantization-based index tuned for high throughput
- DiskANN — disk-based index for collections larger than RAM [README][2]
v2.6 additions (most recent major release):
- Tiered storage — hot/warm/cold data separation, lowers infrastructure cost for large-scale deployments [1][4]
- Int8 compression for HNSW indexes — reduces memory footprint significantly [1]
- Architectural simplification to reduce operational complexity (though “simplified” is relative here) [4]
Security and multi-tenancy:
- Mandatory user authentication [README]
- TLS encryption [README]
- Role-Based Access Control (RBAC) [README]
- Multi-user isolation [README]
Deployment and integration:
- Python SDK (`pymilvus`) with a clean client API [README]
- Docker support (Standalone) and Helm charts (Distributed) [README][3]
- Integrates with LangChain, LlamaIndex, and most major AI frameworks [2]
- GPU acceleration support for indexing and search [README][3]
Pricing: SaaS vs self-hosted math
Zilliz Cloud (managed Milvus):
| Tier | Monthly Cost | Storage | vCUs | Use Case |
|---|---|---|---|---|
| Free | $0 | 5 GB | 2.5M/mo | Prototyping, dev |
| Serverless | $4 per million vCUs | Pay-as-you-go | Up to 100 collections | Variable workloads |
| Dedicated | Starting ~$99/mo | Custom | Dedicated | Production |
| Enterprise | Custom | Custom | Custom | Enterprise SLA |
vCUs (Virtual Compute Units) are Zilliz’s unified resource metric covering both read and write operations. Serverless billing at $4/million vCUs sounds cheap until you’re running a real production workload — the Airbyte cost guide [4] specifically flags index rebuilds, data egress, and operational complexity as surprise cost sources.
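Back-of-envelope math for the serverless tier, using the rates from the table above. The usage figures are hypothetical, and whether the free allowance nets against serverless billing is an assumption here:

```python
def serverless_monthly_cost(vcus: float, rate_per_million: float = 4.0,
                            free_million_vcus: float = 2.5) -> float:
    """Estimate a monthly Zilliz serverless bill.

    vcus: total vCUs consumed in the month.
    rate_per_million: $ per million vCUs ($4 per the pricing table).
    free_million_vcus: free-tier allowance (2.5M vCUs/mo); treating it
    as an offset against serverless usage is an assumption.
    """
    billable_millions = max(vcus / 1e6 - free_million_vcus, 0.0)
    return billable_millions * rate_per_million

# A workload burning 50M vCUs/month: (50 - 2.5) * $4 = $190/mo
print(serverless_monthly_cost(50_000_000))
```

The point of running numbers like this before committing: vCU consumption scales with both reads and writes, so an ingestion-heavy month can bill very differently from a query-heavy one at the same dataset size.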
Self-hosted:
- Software: $0 (Apache 2.0)
- Milvus Standalone on a $20–40/mo VPS: workable for development and low-traffic production
- Production Kubernetes cluster: the infrastructure cost varies wildly by provider and scale. A minimal 3-node cluster for real workloads starts in the $150–400/mo range on managed K8s (EKS, GKE, AKS).
- Dedicated bare metal (from [3]): DatabaseMart starts at $49/mo for 32GB RAM / 4-core, up to $2,099/mo for H100 GPU servers — relevant if you’re doing GPU-accelerated indexing at scale.
The honest math: For teams already running Kubernetes who know what they’re doing, self-hosted Milvus is free software with the infrastructure cost you’d be paying anyway. For teams without K8s expertise, the “free software” trades a SaaS bill for a serious operational burden.
Zilliz Cloud’s free tier is genuinely useful for prototyping — 5GB and 2.5M vCUs monthly at no cost. If your production dataset fits in 5GB of vectors and your traffic is light, the serverless tier at $4/million vCUs is competitive with Pinecone’s pod-based pricing. Data not available on exact Pinecone-to-Zilliz comparison at specific scale points, but the Airbyte breakdown [4] confirms Zilliz’s serverless model can be significantly cheaper than alternatives at variable traffic.
Deployment reality check
This is where the gap between the README’s clean Python quickstart and production reality opens up.
Milvus Lite (development): `pip install pymilvus[milvus-lite]`, then three lines of Python. Genuinely that simple. Runs locally, persists to a file, no Docker required. The limitation: it’s a local SDK, not a server, so it isn’t deployable to production [README].
Milvus Standalone (single node): Docker Compose with dependencies. You’ll need Docker, at minimum 8GB RAM, and a handle on networking. The Docker Compose file bundles etcd and MinIO as dependencies — meaning you’re running three services minimum [README][3].
Milvus Distributed (production cluster): This is the full deployment that actually scales. Required components: multiple Milvus nodes (query, index, data, proxy), etcd (metadata), MinIO or S3 (object storage), Pulsar or Kafka (message queue). All of this on Kubernetes. The DatabaseMart hosting guide [3] exists precisely because this is non-trivial to stand up — they’re selling managed hosting specifically for teams that tried and gave up.
Hardware requirements for Standalone [3]:
- CPU: 2+ cores (8+ recommended for production)
- RAM: 8GB minimum, 16GB+ recommended
- Storage: SSD strongly recommended; HDDs will kill search latency
- OS: Linux (Ubuntu 18.04+ or CentOS 7+)
What can go wrong:
- etcd is a coordination service that Milvus depends on heavily — misconfigured etcd has caused data availability issues in production deployments [3]
- Memory sizing is critical. Milvus loads vector indexes into RAM for performance. Undersizing RAM means constant disk I/O and latency spikes
- The “simplified” architecture in v2.6 is simpler than v2.5 but still meaningfully complex versus other self-hosted tools
- GPU support requires NVIDIA drivers and CUDA — an additional setup surface area if you go that route [3]
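The memory-sizing point above can be made concrete with a back-of-envelope estimator. The graph-overhead factor is a rough assumption for HNSW-style indexes, not a Milvus-documented figure:

```python
def hnsw_ram_gb(num_vectors: int, dim: int, m: int = 16) -> float:
    """Rough RAM needed to hold an HNSW index fully in memory.

    Raw float32 vectors: num_vectors * dim * 4 bytes.
    Graph links: roughly 2*M neighbour ids (8 bytes each) per vector --
    a coarse assumption; real overhead varies with build parameters.
    """
    raw_bytes = num_vectors * dim * 4
    link_bytes = num_vectors * m * 2 * 8
    return (raw_bytes + link_bytes) / 1e9

# 10M 768-dim embeddings: roughly 33 GB before query-node working memory.
print(round(hnsw_ram_gb(10_000_000, 768), 1))
```

Running this against your real vector count and dimensionality before provisioning is cheaper than discovering the latency cliff when the index spills to disk.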
Realistic estimate: a technical engineer can run Milvus Standalone in 30–60 minutes. A production Kubernetes deployment with monitoring, backups, and proper sizing is days of work, not hours.
Pros and Cons
Pros
- Apache 2.0 license. Use it commercially, embed it, resell it — no restrictions. Cleaner than most alternatives [1][2].
- Performance leader for batch ingestion. Reddit’s 5–10x advantage over competitors is the most credible real-world benchmark in the dataset — that’s production infrastructure at social-network scale, not a synthetic benchmark [1].
- Native hybrid search. Full-text + vector in a single query without cobbling together separate systems [1]. This is a meaningful engineering simplification for RAG.
- Billion-scale architecture. The only tool in this comparison that’s architected from the ground up for truly massive collections. Horizontal scaling via K8s, separated storage and compute [README][1].
- Broad index support. HNSW, IVF (multiple variants), FLAT, DiskANN — you can tune the speed/recall/memory tradeoff for your specific workload rather than being locked into one algorithm [README][2].
- Forrester Wave leader. Independent analyst recognition with presumably real enterprise use cases behind it [1].
- Zilliz Cloud free tier. The zero-cost prototyping path lowers the barrier to evaluation [4].
- v2.6 tiered storage. Hot/warm/cold data tiers reduce the cost of large-scale deployments — relevant if you’re storing embeddings for billions of documents [1][4].
Cons
- Operational complexity. The full distributed deployment is genuinely hard to run. etcd, MinIO, Pulsar, multiple Milvus node types — this is not a weekend project for someone new to distributed systems [3][README].
- Not for small datasets. At small scale (under a few million vectors), simpler options like pgvector or Chroma are easier to operate and perform adequately. Milvus’s complexity tax only pays off at scale [2].
- Benchmarks have caveats. Fastest indexing time is real. Best RPS/latency at high dimensionality is not always Milvus [1]. Know which metric matters for your workload before committing.
- No website data available. The scrape of milvus.io failed during article research — pricing pages and feature lists had to be sourced from third parties, which means some current details may have drifted.
- Hidden cloud costs. Index rebuilds, data egress, and underestimating vCU consumption are called out explicitly as surprise cost drivers on Zilliz Cloud [4].
- etcd dependency is a liability. etcd is powerful but operationally sensitive — it needs its own sizing, backup, and quorum management. Adding it as a Milvus dependency means you’re now also operating etcd [3].
- Community vs commercial split. Enterprise features (advanced security, enterprise support SLAs) are gated behind Zilliz Cloud commercial plans — the open-source version doesn’t have a commercial enterprise tier equivalent in the way some other tools do [4].
Who should use this / who shouldn’t
Use Milvus if:
- You’re an AI/ML engineer building a production RAG system, semantic search engine, or recommendation system that will need to scale beyond a few million vectors.
- You’re on Kubernetes already and the operational complexity of distributed Milvus doesn’t add to your existing burden.
- You need hybrid search (full-text + vector) without running two separate databases.
- Apache 2.0 license matters — you’re embedding this in a commercial product or don’t want licensing surprises.
- You’re prototyping and want to use Milvus Lite locally, then graduate to Zilliz Cloud or self-hosted without rewriting your client code.
Skip it (use pgvector instead) if:
- Your dataset is under 5 million vectors and you’re already running PostgreSQL. pgvector with HNSW indexes is much simpler to operate and performs adequately at this scale.
- You don’t have Kubernetes expertise and can’t afford to acquire it. The operational surface area is too large.
- You want SQL joins and relational queries alongside vector search — Milvus is a dedicated vector store, not a general-purpose database.
Skip it (use Qdrant instead) if:
- You want a vector database that’s easier to self-host than Milvus but more capable than pgvector. Qdrant’s Rust implementation is fast, the deployment story is simpler (single binary or Docker), and the filtering capabilities are strong.
- You’re a solo developer or small team where operational simplicity outweighs raw scale.
Skip it (use Zilliz Cloud directly) if:
- You want managed Milvus without the self-hosting complexity. Same underlying engine, zero K8s administration.
Skip it (use Chroma or Weaviate) if:
- You’re building a prototype or internal tool with lighter scale requirements. Chroma in particular has a much shorter path from “pip install” to working search.
Alternatives worth considering
- pgvector — PostgreSQL extension for vector search. The right choice if you’re already on Postgres, your dataset is under ~10M vectors, and you don’t want to operate another database. HNSW index support (added in pgvector 0.5.0) made it competitive on recall/speed [2].
- Qdrant — Rust-based, single-binary deployment, strong payload filtering, good performance benchmarks. The operational simplicity advantage over Milvus is real and meaningful for teams without K8s experience.
- Weaviate — GraphQL-first, multi-modal by design, good managed cloud option. More opinionated about schema than Milvus. Slightly easier to get started, less raw performance at billion-scale.
- Chroma — Developer-friendly, Python-first, embeddable or client-server. The right tool for prototyping and smaller production deployments. Not in the same performance class for scale.
- Pinecone — Fully managed, no self-hosting option. The operational simplicity is the product. Expensive at scale, no open-source option, vendor lock-in is real.
- Elasticsearch / OpenSearch — If you’re already running Elasticsearch and adding vector search, staying there might be simpler than introducing Milvus. The vector performance at scale is worse, but the operational familiarity may offset that.
- Zilliz Cloud — Managed Milvus. If you want the Milvus engine without the K8s complexity, this is the direct answer.
Bottom line
Milvus is the right answer to a specific question: “I need to search billions of vectors at production scale with sub-10ms latency, I have Kubernetes expertise on my team, and I need a license I can actually use commercially.” If that’s your situation, nothing in the open-source landscape matches it cleanly. The 43K GitHub stars, the Forrester Wave leadership, and Reddit Engineering’s production vote are genuine signals, not marketing noise.
The honest caveat for the unsubbed.co audience — non-technical founders escaping SaaS bills — is that Milvus is probably not your tool. The operational complexity of distributed Milvus is not offset by cost savings if you’re spending 40 hours learning Kubernetes. In that scenario, Zilliz Cloud’s free tier for prototyping and serverless tier for small production workloads gives you the Milvus engine without the infrastructure burden. Or consider Qdrant for self-hosting with substantially less operational surface area.
If you need someone to evaluate whether Milvus or a simpler alternative fits your actual workload — and to deploy it without the weekend of K8s debugging — that’s the kind of scoping upready.dev does.
Sources
- [1] Milvus Blog — “Milvus Exceeds 40K GitHub Stars” (milvus.io). https://milvus.io/blog/milvus-exceeds-40k-github-stars.md
- [2] SaaSHub — “Milvus Reviews: Is Milvus Good?” (saashub.com). https://www.saashub.com/milvus
- [3] DatabaseMart — “Milvus Hosting – Scalable Vector Database for AI Applications” (databasemart.com). https://www.databasemart.com/ai/milvus-hosting
- [4] Airbyte — “Milvus Vector Database Pricing: Cloud vs Self-Hosted Cost Guide” by Jim Kutz, August 28, 2025 (airbyte.com). https://airbyte.com/data-engineering-resources/milvus-database-pricing
Primary sources:
- GitHub repository and README: https://github.com/milvus-io/milvus (43,358 stars, Apache-2.0 license)
- Official website: https://milvus.io
- Zilliz Cloud (managed service): https://cloud.zilliz.com