unsubbed.co

SeaweedFS

Distributed storage for billions of small files — O(1) disk seek regardless of file count, with S3 API, FUSE mount, and WebDAV in one system.

Best for: Teams storing massive numbers of small files (images, thumbnails, user uploads) who have outgrown single-server storage, or anyone paying significant AWS S3 bills with 10TB+ data.

TL;DR

  • What it is: An open-source (Apache 2.0) distributed storage system that combines S3-compatible object storage, a POSIX filesystem via FUSE, and HDFS compatibility. Inspired by Facebook’s Haystack paper, designed for billions of small files with O(1) disk seek.
  • Who it’s for: Teams storing massive numbers of small files (images, thumbnails, documents) who’ve outgrown single-server storage, and anyone paying significant AWS S3 bills who want the same API on their own hardware.
  • Cost savings: AWS S3 costs $0.023/GB/month for storage plus data transfer fees. Self-hosted SeaweedFS on dedicated servers runs at hardware cost only. One team reported dropping from $1,950/month on S3 to $215/month self-hosted — 89% reduction.
  • Key strength: Handles billions of small files without performance degradation. The volume-based architecture packs files into larger volumes, requiring only one disk seek per file access regardless of total file count. MinIO and traditional filesystems hit inode limits at scale.
  • Key weakness: Requires external metadata store (Redis, PostgreSQL, etc.) for the filer component, adding operational complexity. Documentation is wiki-based and can be sparse. Smaller community and corporate backing than MinIO.

What is SeaweedFS

SeaweedFS is a distributed storage system that solves a specific problem: storing and serving billions of files with predictable low-latency access. The core insight, borrowed from Facebook’s Haystack paper, is that traditional filesystems incur enormous overhead on many small files. Each file requires its own inode (512 bytes of metadata), and each access requires multiple disk seeks to traverse the directory structure.

SeaweedFS eliminates this by packing files into larger volumes (typically 30GB). A single volume can hold tens of thousands of files. To read a file, the system looks up its volume location (cached in memory — nanosecond access), then does exactly one disk seek to read the data. This gives O(1) read performance regardless of whether you have 1,000 files or 1 billion.
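The read path described above can be sketched in a few lines of Python. This is an illustrative miniature, not the real on-disk format: the actual needle map and volume layout are more elaborate, and the names here are invented for the sketch.

```python
# Sketch of SeaweedFS-style volume storage: files are appended into one large
# "volume" blob, and an in-memory index maps file id -> (offset, size).
# Reading is one index lookup in RAM plus one seek into the volume,
# regardless of how many files the volume holds.

class Volume:
    def __init__(self):
        self.data = bytearray()   # the packed volume file
        self.needle_map = {}      # file id -> (offset, size), kept in memory

    def write(self, fid: str, payload: bytes) -> None:
        offset = len(self.data)
        self.data += payload      # append-only write
        self.needle_map[fid] = (offset, len(payload))

    def read(self, fid: str) -> bytes:
        # One in-memory lookup, then exactly one "seek" into the volume.
        offset, size = self.needle_map[fid]
        return bytes(self.data[offset:offset + size])

vol = Volume()
vol.write("3,01637037d6", b"thumbnail bytes")  # fid format is illustrative
print(vol.read("3,01637037d6"))
```

The point of the structure is that the cost of `read` never depends on the total file count, only on the single memory lookup and the single data read.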

The architecture separates concerns into distinct components:

  • Master Server: manages volume allocation, cluster topology, and metadata. Supports Raft consensus for high availability.
  • Volume Servers: store the actual data in volumes. Each volume is a single file on disk containing packed user files with only 16-40 bytes metadata overhead per file (versus 512 bytes in ext4).
  • Filer: provides user-facing interfaces — HTTP API, S3-compatible gateway, FUSE mount for POSIX access, and WebDAV. Requires an external metadata store.

The project has 30,980 GitHub stars, is written in Go, and has been actively developed since 2011.


Why people choose it over MinIO, Ceph, and AWS S3

Versus MinIO. MinIO is the obvious comparison — both are S3-compatible open-source object stores. MinIO is a pure S3 replacement with near-complete API compatibility (versioning, lifecycle policies, bucket notifications, IAM). SeaweedFS supports core S3 operations but lacks advanced features like S3 Select and fine-grained IAM. Where SeaweedFS dominates is small-file workloads. MinIO stores each object as an individual file on the underlying filesystem, meaning it inherits filesystem inode limitations. At billions of small files, MinIO struggles while SeaweedFS’s volume-packing architecture maintains constant performance. For pure S3 replacement with large files, MinIO is the safer choice. For billions of small files, SeaweedFS is technically superior.

Versus Ceph. Ceph is the enterprise-grade distributed storage system used by Red Hat and major cloud providers. It’s notoriously complex to deploy — a production Ceph cluster requires at least 3 nodes with specific hardware requirements, and tuning it for production takes deep expertise. SeaweedFS is simpler to deploy and operate, but lacks Ceph’s enterprise features. If you have a dedicated storage team, Ceph. If you’re a small team that needs distributed storage, SeaweedFS.

Versus AWS S3. S3 is infinitely scalable, zero maintenance, and rock-solid reliable. At small scale (under 1TB), S3 is cheaper than buying and maintaining hardware. At medium-to-large scale (10TB+), self-hosted SeaweedFS on dedicated servers can save 80-90% on monthly costs. The breakeven point is around 10TB.


Features: what it actually does

Core storage engine:

  • O(1) disk seek for file access via volume-based architecture
  • Files packed into volumes with only 16-40 bytes metadata overhead per file
  • Configurable replication levels with rack and data center awareness
  • Erasure coding for storage efficiency (configurable ratios)
  • Automatic master server failover via Raft consensus
  • Automatic gzip compression and compaction
  • TTL (time-to-live) support for automatic file expiration
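Replication levels are configured as a three-digit code (e.g. 010). The decoder below assumes the standard digit order from the SeaweedFS wiki — first digit: extra copies in other data centers, second: other racks in the same data center, third: other servers on the same rack — which is an assumption drawn from the project docs, not from this review:

```python
# Decode a SeaweedFS-style replication code "xyz":
#   x = extra copies in different data centers
#   y = extra copies on different racks (same data center)
#   z = extra copies on different servers (same rack)
# Total stored copies = 1 (original) + x + y + z.
def decode_replication(code: str) -> dict:
    dc, rack, server = (int(c) for c in code)
    return {
        "copies_other_datacenters": dc,
        "copies_other_racks": rack,
        "copies_other_servers": server,
        "total_copies": 1 + dc + rack + server,
    }

print(decode_replication("010"))
```

So "000" means no replication, "010" keeps one extra copy on a different rack (2 copies total), and "200" keeps two extra copies in other data centers (3 copies total).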

Access protocols:

  • S3 gateway — S3-compatible API for drop-in replacement of AWS S3
  • FUSE mount — mount as a local filesystem for POSIX access via standard Linux tools
  • WebDAV — for file manager and desktop access
  • HDFS compatibility — drop-in replacement for Hadoop HDFS
  • HTTP REST API — direct upload/download/delete operations
  • Iceberg tables — for data lake integration

Filer (filesystem layer):

  • Directory structure with metadata stored in external database (Redis, PostgreSQL, MySQL, MongoDB, Cassandra, LevelDB, SQLite)
  • Cloud tiering — transparent offloading of cold data to S3, GCS, Azure Blob, Backblaze B2
  • Cross-cluster replication for disaster recovery

Operations:

  • Docker deployment with docker-compose examples
  • Kubernetes deployment via Helm charts
  • Single-command development mode via weed mini
  • Master UI at port 9333 for cluster overview
  • Prometheus metrics endpoint for monitoring

Pricing: SaaS vs self-hosted math

AWS S3 for comparison:

  • Storage: $0.023/GB/month (Standard)
  • Data transfer out: $0.09/GB (first 10TB/month)
  • PUT requests: $0.005 per 1,000

SeaweedFS self-hosted:

  • Software: $0 (Apache 2.0)
  • Hardware: dedicated servers from Hetzner, OVH, or similar
  • Enterprise: $1/TB/month or $10/TB/year, free under 25TB

Concrete savings from a real migration: A team storing 50TB of media files on S3 was paying:

  • S3 storage + data transfer: ~$1,950/month

After migrating to SeaweedFS on 3 Hetzner dedicated servers ($65/month each):

  • Hardware + bandwidth: ~$215/month

Annual savings: ~$20,000. The breakeven for self-hosting is around 10TB.
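The ~10TB breakeven follows from simple arithmetic with the numbers used in this review: S3 storage alone is about $23.55/TB/month, against a fixed ~$215/month self-hosted bill. Transfer fees, which S3 also charges, push the real breakeven lower still.

```python
# Rough breakeven: at what stored volume does a fixed self-hosted bill match
# S3's per-GB storage pricing? Figures are the ones used in this review.
s3_per_tb_month = 0.023 * 1024   # $0.023/GB/month -> about $23.55/TB/month
selfhost_month = 215.0           # 3 Hetzner dedicated servers + bandwidth

breakeven_tb = selfhost_month / s3_per_tb_month
print(f"breakeven at about {breakeven_tb:.1f} TB (storage cost only)")
```

That lands at roughly 9 TB on storage pricing alone, consistent with the ~10TB rule of thumb once you account for variability in hardware pricing.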


Deployment reality check

Development / testing (5 minutes): Run weed mini -dir=/data — this single command starts a complete SeaweedFS setup with Master, Volume Server, Filer, and S3 gateway. Suitable for evaluation.

Production minimum (3 nodes, 2-4 hours):

  • 3 servers with dedicated storage
  • Master server on each node for Raft HA
  • Volume servers using local disks
  • Filer with Redis or PostgreSQL metadata store
  • S3 gateway fronted by a reverse proxy

What can go sideways:

  • Volume sizing. The default 30GB volume size creates many volumes consuming master server memory. For large deployments, increase to 100GB+.
  • Replica placement. Without explicit --rack and --dataCenter flags, SeaweedFS may place replicas on the same physical server.
  • Garbage collection. Deleted files don’t reclaim space automatically. Run volume.vacuum (from weed shell) periodically or disk usage grows without bound.
  • Metadata store bottleneck. At 100M+ files, the filer’s metadata store becomes the bottleneck. PostgreSQL slows down on list operations — switching to Redis helps.
  • S3 compatibility gaps. S3 Select, Object Lock, and fine-grained IAM are missing. Test your application’s S3 usage patterns before migrating.

Realistic time estimate: 5 minutes for weed mini evaluation. 2-4 hours for a production 3-node cluster. 1-2 days for a full migration from S3 including data sync and application testing.


Who should use this

Use SeaweedFS if:

  • You’re storing billions of small files (images, thumbnails, documents, user uploads) and traditional filesystems are hitting inode limits.
  • You’re spending significant money on S3 and want to bring storage in-house with S3 API compatibility.
  • You need multi-protocol access (S3 + filesystem + WebDAV) to the same data.
  • You have 10TB+ of data where self-hosting economics clearly beat cloud storage.

Use MinIO instead if:

  • You need near-complete S3 API compatibility (versioning, lifecycle, IAM).
  • Your objects are mostly large files (>1MB) where MinIO’s per-object storage model isn’t a bottleneck.
  • You want better documentation, a polished web console, and larger community support.

Stay on AWS S3 if:

  • You’re below the ~10TB breakeven, where the operational overhead of self-hosting isn’t worth the cost savings.
  • Your team doesn’t have Linux server administration expertise.

Sources

This review synthesizes five independent third-party articles along with primary sources from the project itself; the numbered list below gives the full set.

  [1] medium.com (2023-09-05) — “Seaweedfs Distributed Storage Part 1: Introduction” — tutorial-architecture (link)
  [2] typevar.dev (2025-10-26) — “SeaweedFS: A Software Engineer’s Guide to Billions of Files and O(1) Disk Seek” — deep-dive-technical (link)
  [3] computingforgeeks.com (2020-12-01) — “Setup SeaweedFS Distributed Object Storage on Ubuntu [Guide]” — tutorial-deployment (link)
  [4] blog.min.io (2024-08-01) — “SeaweedFS vs MinIO: Distributed Object Storage Compared” — comparison (link)
  [5] medium.com (2025-02-01) — “Migrating 50TB from AWS S3 to SeaweedFS: Lessons Learned” — migration-cost (link)
  [6] GitHub repository — official source code, README, releases, and issue tracker (https://github.com/seaweedfs/seaweedfs)
  [7] Official website — SeaweedFS project homepage and docs (https://seaweedfs.com)

References [1]–[7] above were used to cross-check claims about features, pricing, deployment, and limitations in this review.
