unsubbed.co

Hydra

Hydra is a self-hosted analytics tool for Postgres that adds columnar storage — an open-source alternative to running a separate data warehouse.

Open-source analytics on Postgres, honestly reviewed. No marketing fluff, just what you get when you self-host it.

TL;DR

  • What it is: An open-source columnar storage extension for Postgres that turns your existing database into an analytical engine — no separate data warehouse required [README][1].
  • Who it’s for: Developers and technical founders already running Postgres who want analytics query performance without adding Snowflake, BigQuery, or ClickHouse to their stack [README][website].
  • Cost savings: Snowflake and BigQuery bill per query and per TB — costs that spiral unpredictably at volume. Hydra self-hosted runs on any Postgres-compatible VPS with no per-query fees [1].
  • Key strength: Drop-in replacement for the standard Postgres Docker image. If you’re already on Postgres, you change one line and gain columnar storage, vectorized execution, and up to 23× faster aggregate queries [README][1].
  • Key weakness: 3,017 GitHub stars — real, but modest compared to category leaders like ClickHouse (40K+) or DuckDB (25K+). The project’s future direction after its pivot toward pg_duckdb integration is not fully documented, and third-party reviews are sparse [README].

What is Hydra

Hydra is a Postgres extension that adds columnar storage to standard Postgres. It is not a fork of Postgres and not a separate database — it hooks into Postgres’s tableam (table access method API), which has been available since Postgres 12 [README]. You run it as a Docker image that is a drop-in replacement for the official Postgres image: same connection string format, same SQL dialect, same tooling.

The core product is called Hydra Columnar. It stores table data in column-oriented format instead of Postgres’s default row-oriented heap. That difference matters enormously for analytics workloads. When a query runs SELECT COUNT(*), SUM(revenue) FROM events WHERE date > '2024-01-01', a row store reads every column in every matching row. A column store reads only the revenue and date columns. On a table with 50 columns and 500 million rows, the difference is not subtle.
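As a sketch of what this looks like in practice — the `USING columnar` clause is the syntax shown in the Hydra README, while the table and column names here are invented for illustration:

```sql
-- Create a columnar table by naming the access method at creation time
CREATE TABLE events (
    event_id   bigint,
    date       date,
    revenue    numeric,
    -- ...plus the other 47 columns a wide events table might carry
    user_agent text
) USING columnar;

-- This aggregate touches only the 'date' and 'revenue' column segments;
-- the other columns are never read from disk.
SELECT COUNT(*), SUM(revenue)
FROM events
WHERE date > '2024-01-01';
```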

Beyond storage format, Hydra adds:

  • Vectorized execution — processes batches of column values using SIMD instructions rather than one row at a time
  • Query parallelization — spreads analytical queries across multiple CPU cores
  • Column-level caching — caches frequently queried column segments separately from Postgres’s shared_buffers

The more recent direction of the project has moved toward integrating DuckDB via pg_duckdb, giving Postgres access to DuckDB’s analytical query engine. The website now features testimonials specifically about pg_duckdb from founders at DuckDB Labs, Dagster, MotherDuck, and ElectricSQL [website]. This is a meaningful evolution — DuckDB is widely regarded as the fastest in-process analytical engine available — but the integration story is still maturing.

Hydra also offers a hosted cloud product at hydra.so, branded as “Serverless Analytics on Postgres” with compute autoscaling, automatic caching, and columnar storage managed for you [website]. The open-source extension and the hosted product are related but distinct.

The GitHub repository currently has 3,017 stars and is licensed under AGPL 3.0 for the columnar component and Apache 2.0 for the rest of the codebase [README].


Why people choose it

Third-party reviews of Hydra the database extension are limited. Most search results for “Hydra review” surface unrelated products (a doom metal band, a VDI management tool, an AI research paper — all different projects sharing the name). The clearest external signal comes from Elestio, a managed hosting platform that offers Hydra as a service starting at $14/month [1].

The testimonials on the Hydra website itself give the clearest picture of who is actually using it and why:

The “one database” argument. The most common theme across testimonials is eliminating the second database. Tom Hacohen (Svix): “Having the same database handle both normal and analytics workloads is a game changer. No more syncing data to a separate database.” Harold Gimenez (HashiCorp SVP Engineering): “Hydra makes exploration and analysis of large amounts of data accessible and familiar. Being built on Postgres and DuckDB, it’s standing on a proven technical foundation.” [website]. This argument has real weight: keeping OLTP and OLAP workloads in separate systems means ETL pipelines, data synchronization lag, two sets of credentials, two monitoring setups, and two failure modes.

Compression and cost. Gary Sahota from Tether Data: “We are building the foundation of our analytics stack around Postgres. Hydra is a no-brainer and our data compressed by 5X.” [website]. Columnar storage compresses significantly better than row storage for typical analytics data — repeated values, sorted columns, and narrow types compress to a fraction of the heap size.

The Postgres compatibility angle. Pete Hunt (Dagster): “Building on top of Postgres, we’ve been looking for ways to boost and scale analytical query performance. Using DuckDB execution in Postgres with pg_duckdb will be a game changer.” [website]. For teams that have invested in Postgres tooling — connection poolers, ORMs, observability, backups — adopting Hydra means keeping all of that and adding analytics performance.

The benchmark claim. The README cites benchmarks run on a c6a.4xlarge (16 vCPU, 32 GB RAM) against the ClickBench suite — 42 queries covering clickstream analysis, web analytics, machine-generated data, structured logs, and event data. Hydra claims results that outpace standard Postgres substantially. These benchmarks are published at the ClickBench link in the README and the continuous benchmark results are tracked in the repository [README]. Independent verification is your responsibility — benchmark conditions rarely match your workload.


Features

Storage and query engine:

  • Columnar storage via Postgres tableam API — no fork, no schema changes required [README]
  • Vectorized execution and query parallelization across CPU cores [README][1]
  • Column-level caching separate from Postgres shared_buffers [README]
  • Support for standard aggregates: COUNT, SUM, AVG, and WHERE clause pushdown [README]
  • Bulk INSERT and UPDATE/DELETE support [README]

Hybrid workloads (HTAP):

  • Postgres heap tables (row storage) and columnar tables coexist in the same database [1]
  • You can JOIN columnar and row-store tables in a single query [website]
  • Native partitioning and indexing support on heap tables [1]
  • Columnar tables support btree and hash indexes (and the constraints that require them) — but generally don’t need indexes because columnar scans are efficient [README]
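A minimal sketch of the hybrid pattern described above (schema invented for illustration): keep small, frequently updated tables on the default heap, put the large append-mostly table on columnar, and query across both.

```sql
-- Heap table (the Postgres default): small, frequently updated dimension
CREATE TABLE customers (
    customer_id bigint PRIMARY KEY,
    name        text,
    plan        text
);

-- Columnar table: large, append-mostly fact table
CREATE TABLE events (
    customer_id bigint,
    event_type  text,
    occurred_at timestamptz
) USING columnar;

-- One query spans both storage formats
SELECT c.plan, COUNT(*) AS events_last_30d
FROM events e
JOIN customers c USING (customer_id)
WHERE e.occurred_at > now() - interval '30 days'
GROUP BY c.plan;
```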

External data access:

  • External Tables via Postgres FDW (foreign data wrapper) for querying data outside the Hydra database [1]
  • Bundled FDWs in the Docker image: mysql_fdw, parquet_s3_fdw, pgsql-http, multicorn [README]
  • S3 Parquet access via parquet_s3_fdw — query Parquet files directly without importing [README]
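A hedged sketch of the S3 Parquet path — the option names follow the parquet_s3_fdw project's documentation and may differ in the version bundled with Hydra; the bucket, credentials, and columns are placeholders:

```sql
CREATE EXTENSION parquet_s3_fdw;

CREATE SERVER parquet_s3 FOREIGN DATA WRAPPER parquet_s3_fdw
    OPTIONS (aws_region 'us-east-1');

CREATE USER MAPPING FOR CURRENT_USER SERVER parquet_s3
    OPTIONS (user 'YOUR_ACCESS_KEY_ID', password 'YOUR_SECRET_ACCESS_KEY');

-- Map a Parquet file in S3 as a foreign table; no import step required
CREATE FOREIGN TABLE events_s3 (
    event_id bigint,
    date     date,
    revenue  numeric
) SERVER parquet_s3
  OPTIONS (filename 's3://your-bucket/events.parquet');

SELECT date, SUM(revenue) FROM events_s3 GROUP BY date;
```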

Deployment:

  • Docker image: drop-in replacement for postgres:latest [README]
  • Docker Compose setup documented in the repository [README]
  • Hosted cloud option with autoscaling and automatic caching [website]

What columnar is not built for:

  • Frequent large updates across many rows [README]
  • Small, high-throughput transactional writes where row-by-row latency matters [README]
  • Logical replication (not supported for columnar tables) [README]

Pricing: SaaS vs self-hosted math

Hydra Cloud (hosted):

  • Pricing: starts with a 14-day free trial after booking a demo [website]
  • No public pricing page with specific numbers — you book a demo to get pricing [website]
  • Elestio’s managed Hydra hosting starts at $14/month and includes automated backups, SSL, updates, and monitoring [1]

Self-hosted:

  • Software license: $0 for the open-source extension (AGPL 3.0 for columnar, Apache 2.0 for the rest) [README]
  • Runs on any VPS or cloud instance that supports Docker
  • A $10–20/month VPS with 4–8GB RAM handles moderate analytics workloads
  • No per-query fees, no per-TB billing, no storage tiers

Snowflake for comparison:

  • Credits billed per second of compute; $2–4 per credit depending on tier
  • A single complex analytical query on a large table can consume multiple credits
  • Storage billed at ~$23/TB/month
  • Teams running continuous analytics pipelines regularly hit $500–2,000/month at scale
  • Data not available for direct Snowflake vs Hydra savings calculation — depends entirely on query volume and data size

BigQuery for comparison:

  • On-demand: $6.25 per TB queried (first 1 TB/month free)
  • 3 TB queried per day = ~$18.75/day = ~$562/month
  • Flat-rate slots available but require minimum spend commitments
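The arithmetic behind the on-demand comparison is simple enough to sanity-check yourself. A quick sketch using the rates quoted above (the 3 TB/day scan volume is an illustrative workload, not a measured one):

```python
# BigQuery on-demand pricing math, using the publicly quoted rates.
PRICE_PER_TB = 6.25        # USD per TB scanned, on-demand
FREE_TB_PER_MONTH = 1.0    # first 1 TB per month is free

def bigquery_monthly_cost(tb_scanned_per_day: float, days: int = 30) -> float:
    """On-demand cost for a month of daily scans, after the free tier."""
    billable_tb = max(tb_scanned_per_day * days - FREE_TB_PER_MONTH, 0.0)
    return billable_tb * PRICE_PER_TB

# 3 TB scanned per day: 90 TB/month, 89 TB billable after the free tier
print(round(bigquery_monthly_cost(3.0)))  # roughly 556
# A self-hosted Hydra VPS is a flat $10-20/month regardless of scan volume
```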

Realistic framing for a small founder: If you’re running Postgres for your application and doing analytics queries that are slowing down your database — GROUP BY report queries taking 30 seconds, dashboards timing out — Hydra gives you columnar performance without adding a second database or a SaaS analytics bill. The self-hosted cost is the VPS you’re probably already paying for. That’s the value proposition. It’s not competing with Snowflake on enterprise features; it’s competing with the decision to add a separate analytics system at all.


Deployment reality check

The self-hosted path is as close to frictionless as Postgres extensions get:

git clone https://github.com/hydradatabase/columnar && cd columnar
cp .env.example .env
docker compose up
psql postgres://postgres:hydra@127.0.0.1:5432

That is the entire local setup from the README [README]. You connect to it with the same Postgres client you already use.
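Once connected, adopting columnar storage is a matter of plain SQL. New tables pick the access method at creation time; on Postgres 15+, an existing table can be switched in place with standard `ALTER TABLE` syntax (table names here are illustrative):

```sql
-- New tables: choose columnar at creation time
CREATE TABLE metrics (ts timestamptz, value double precision) USING columnar;

-- Existing tables, Postgres 15+: switch the access method in place.
-- This rewrites the table, so plan a maintenance window for large ones.
ALTER TABLE legacy_events SET ACCESS METHOD columnar;
```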

What you actually need:

  • A Linux server with Docker installed
  • 4+ GB RAM for serious analytical workloads (the benchmark machine had 32 GB)
  • Disk space proportional to your data (columnar compression helps, but compressed data is still data)
  • Your existing Postgres client, ORM, or BI tool — no changes needed

What can go sideways:

The project has gone quiet on GitHub contribution activity in the past year relative to the 2022–2023 peak. The README’s “Community and Status” section declares Hydra 1.0 as “Generally Available and ready for production use,” but the project only has 3,017 stars — not the kind of signal that indicates a large, battle-tested user base [README]. If you’re betting your analytics stack on this, that matters.

Logical replication is explicitly unsupported for columnar tables [README]. If your infrastructure uses logical replication for read replicas or CDC pipelines, columnar tables drop out of that flow. You’d need to manage columnar tables separately.

The pg_duckdb integration highlighted on the website is a newer development and the documentation for it is still evolving. The website testimonials treat it as the primary value proposition [website], but the README focuses on the native columnar implementation — there’s some messaging friction between what the product is and what it’s becoming.

Elestio-managed Hydra handles the ops burden (SSL, backups, monitoring) for $14/month [1]. For non-technical founders who want the analytics benefit without the Docker setup, that’s a reasonable entry point.


Pros and Cons

Pros

  • Postgres-native, not a fork. Uses the official tableam API introduced in Postgres 12. Your existing Postgres tooling, ORMs, connection poolers, and observability stack all work unchanged [README].
  • Drop-in Docker replacement. One image swap, same connection string, columnar storage available immediately. Setup time measured in minutes, not days [README].
  • Eliminates the second database. Columnar and row tables coexist and can be JOINed in the same query. One database to back up, monitor, and pay for [1][website].
  • S3 Parquet and external table support. Query data in S3 directly via parquet_s3_fdw without importing it into Postgres [README].
  • Managed option available. Elestio hosts Hydra starting at $14/month with automated backups, SSL, and monitoring for teams that don’t want to manage the deployment [1].
  • Open source core. AGPL 3.0 for the columnar extension means you can inspect, modify, and self-host it. No usage limits or per-seat pricing on the open-source tier [README].

Cons

  • Small community. 3,017 stars is not a rounding error, but it’s modest for infrastructure software you’re trusting with production data. Compare: ClickHouse ~40K, TimescaleDB ~18K, DuckDB ~25K [README].
  • No logical replication for columnar tables. If your stack uses logical replication for read replicas, streaming to Kafka, or change data capture, columnar tables are excluded [README].
  • Columnar tables are not for OLTP. Frequent small updates, high-throughput writes, and short-latency lookups belong on heap tables. You need to understand which tables to make columnar and which to leave as-is [README].
  • Sparse third-party validation. Independent benchmarks and production case studies are thin. The benchmarks in the README are self-published; independent ClickBench submissions are available but require you to find and evaluate them yourself [README].
  • Unclear roadmap transparency. The pivot toward pg_duckdb integration is visible from the website but not well-documented in terms of what it means for the columnar extension’s own development path.
  • Not a full data warehouse. No built-in scheduler, no GUI, no transformation layer. You still need dbt, Airflow, or equivalent if you want a full data stack [README].
  • Pricing opacity. The hosted product requires booking a demo to get pricing — no self-serve tier listed publicly [website].

Who should use this / who shouldn’t

Use Hydra if:

  • You’re already on Postgres and your analytics queries (reports, dashboards, aggregates) are noticeably slowing down your application database.
  • You want columnar query performance without adding a second database, a second bill, or a second set of things to break.
  • You have some technical comfort with Docker — the deployment is simple, but you need to know what a container is.
  • Your team is comfortable with SQL and Postgres tooling. Hydra adds storage formats, not a new query language.

Skip it (use TimescaleDB instead) if:

  • Your analytics data is time-series in nature (metrics, logs, sensor data). TimescaleDB has a much larger community and more mature tooling for that specific shape of data.

Skip it (use ClickHouse instead) if:

  • You need a purpose-built column store with massive throughput and a large, battle-tested user community. ClickHouse is what you reach for when you’ve outgrown what a Postgres extension can deliver.

Skip it (use DuckDB standalone instead) if:

  • You want DuckDB’s analytical engine without the Postgres layer. DuckDB runs in-process, queries Parquet and CSV files directly, and is the fastest option for ad-hoc analytics without a server.

Skip it (stay on your current stack) if:

  • Your analytics queries are fast enough on standard Postgres — if adding an index or a materialized view solves the problem, that’s the right answer.
  • Your team has no one who can manage a Docker container. The self-hosted path requires basic Linux and Docker familiarity.
  • You need logical replication and can’t route around that constraint.
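On the "fast enough already" point: before reaching for columnar storage, it is worth checking whether a precomputed view covers the slow report. A standard Postgres sketch (schema invented for illustration):

```sql
-- Precompute the expensive aggregate once, then read it cheaply
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', created_at) AS day, SUM(total) AS revenue
FROM orders
GROUP BY 1;

-- A unique index is required for CONCURRENTLY refreshes
CREATE UNIQUE INDEX ON daily_revenue (day);

-- Refresh on a schedule (cron, pg_cron, etc.) without blocking readers
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;
```

If a view like this makes the dashboard fast, the second storage engine can wait.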

Alternatives worth considering

  • ClickHouse — the purpose-built column store benchmark leader. Substantially more complex to operate, much larger community, better suited for very high ingest rates and petabyte-scale analytics. Not Postgres-compatible.
  • TimescaleDB — Postgres extension focused on time-series data. Better community, better documentation, narrower scope. If your data has a time dimension as its primary axis, TimescaleDB is more mature.
  • DuckDB (standalone) — the fastest in-process analytical engine available. Runs without a server, queries files directly, embeds in Python or Node.js. If you want DuckDB’s query speed without Postgres, this is simpler.
  • pg_duckdb — Hydra’s own direction, integrating DuckDB as the analytical execution engine inside Postgres. Still emerging, but worth watching if you want DuckDB performance with Postgres compatibility.
  • ParadeDB — another Postgres extension focused on search and analytics. Different angle, but worth evaluating if full-text search is part of your analytics need.
  • Snowflake / BigQuery — the established SaaS options. Better supported, no ops burden, predictable (if expensive) pricing. The right answer if you’re not technically inclined and your budget covers the bill.

Bottom line

Hydra makes a clear, narrow bet: if you’re on Postgres and you want columnar query performance, you should not have to add a second database system to get it. The architecture is sound — Postgres’s tableam API was designed exactly for this, and the drop-in Docker replacement genuinely requires almost no setup [README]. The pg_duckdb integration, if it matures, would give Postgres users access to one of the fastest analytical engines available without leaving the Postgres ecosystem.

The honest concern is community size and trajectory. At 3,017 stars and limited third-party documentation, this is infrastructure you’re adopting somewhat on faith. It’s not a project in crisis — Hydra 1.0 is declared generally available and the testimonials from real companies are there [website][1] — but it hasn’t achieved the critical mass of adoption that makes infrastructure software genuinely safe to bet your production stack on. For experimentation, internal tooling, or small-scale analytics on existing Postgres data, the risk is low and the upside is real. For production analytics at any significant scale, run the ClickBench numbers against your actual workload before committing.


Sources

  1. Elestio — Managed Hydra as a Service (pricing from $14/mo, feature overview). https://elest.io/open-source/hydra

Primary sources: the Hydra GitHub README (github.com/hydradatabase/columnar), cited inline as [README], and the Hydra website (hydra.so), cited inline as [website].
