
Hatchet

Hatchet is a self-hosted, MIT-licensed task queue and workflow orchestrator: an alternative to Celery, BullMQ, and managed orchestration services like Temporal Cloud.

Durable background task orchestration, honestly reviewed. Built for engineering teams tired of babysitting Redis-backed queues.

TL;DR

  • What it is: MIT-licensed durable task queue and workflow orchestrator built on Postgres — a modern replacement for Celery, BullMQ, and the simpler end of Temporal [2][3].
  • Who it’s for: Engineering teams building AI agents, RAG pipelines, or any background processing that needs fairness, retries, and observability without setting up a full Temporal or Airflow cluster [1][2].
  • Cost savings: Managed Temporal or similar orchestration services charge per state transition — at scale, that adds up fast. Hatchet’s MIT license means you self-host on whatever Postgres instance you already have, and the cloud tier starts free [2].
  • Key strength: Postgres as the single source of truth. No separate message broker required under ~100 requests/second, which cuts your infrastructure surface in half [1][2]. SDKs in Python, TypeScript, and Go with first-class Pydantic and async support [3].
  • Key weakness: Squarely a developer tool — requires code to use. No visual workflow builder, no drag-and-drop. Non-technical founders won’t run this themselves; they’ll have a developer run it for them [2][3].

What Is Hatchet

Hatchet is a background task orchestration platform. You define tasks as ordinary functions, register them on workers, and let Hatchet handle scheduling, retries, concurrency limits, fairness, and visibility. The core insight behind the project is that Postgres — which most teams already run — is sufficient for a task queue at sane throughput levels, and using it means you get transactional consistency between task state and business data in a single database [2].
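
That claim is easier to evaluate with the underlying idiom in view. The sketch below is a conceptual illustration of the general Postgres-as-a-queue pattern (FOR UPDATE SKIP LOCKED), not Hatchet's actual internal schema; the table and column names are invented:

import psycopg

def claim_next_task(conn: psycopg.Connection):
    # Claim one queued task; SKIP LOCKED lets concurrent workers grab
    # different rows without blocking each other.
    with conn.transaction():
        row = conn.execute(
            "SELECT id, payload FROM tasks"
            " WHERE status = 'queued'"
            " ORDER BY created_at"
            " FOR UPDATE SKIP LOCKED"
            " LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing queued, or everything claimed elsewhere
        task_id, payload = row
        # The status flip shares the transaction with any business-data
        # writes the caller makes before commit.
        conn.execute("UPDATE tasks SET status = 'running' WHERE id = %s", (task_id,))
        return task_id, payload

Because the claim happens inside a transaction, task state and any business-data writes commit or roll back together, which is exactly the consistency property described above.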

The founders (Alexander and Gabe, YC W24) built it out of direct frustration with Celery in modern Python projects: poor async support, no built-in Pydantic integration, and a debugging experience that amounts to reading log files and hoping [2][3]. The Python SDK in particular draws obvious inspiration from FastAPI — Pydantic models as typed task inputs and outputs, async-first, and a decorator API that reads naturally alongside the rest of a modern Python backend [3].
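
To make the FastAPI comparison concrete, here is a minimal sketch of a typed task in the Python SDK. It follows the decorator-and-Pydantic shape the sources describe [3]; the task name and fields are invented, and exact parameter names may differ between SDK versions:

from pydantic import BaseModel
from hatchet_sdk import Context, Hatchet

hatchet = Hatchet()  # connection details come from environment variables

class TranscribeInput(BaseModel):
    video_url: str
    language: str = "en"

# An ordinary function becomes a task: typed Pydantic input in, plain dict
# out. Name and fields here are illustrative.
@hatchet.task(name="transcribe", input_validator=TranscribeInput)
def transcribe(input: TranscribeInput, ctx: Context) -> dict:
    ctx.log(f"transcribing {input.video_url}")
    return {"status": "done", "language": input.language}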

As of this review: 6,746 GitHub stars, MIT licensed, SDKs for Python, TypeScript, and Go. There’s a cloud offering at cloud.onhatchet.run for teams who want a managed orchestration engine with workers running on their own infrastructure, and a self-hosted path for those who want everything on-prem [2].


Why People Choose It

The clearest case for Hatchet comes from a team called Cynco, documented in a detailed AWS deployment writeup [1]. Their previous stack was a custom Go queue with straightforward FIFO — fine for single-tenant workloads, increasingly painful for multi-tenant AI pipelines. The problems they hit are the canonical reasons people graduate from simple Redis queues:

  1. Retry logic becomes custom code. Every team reinvents exponential backoff, jitter, and dead-letter handling. Hatchet makes these first-class configuration, as sketched after this list [1].
  2. FIFO is unfair under multi-tenancy. User 100 waits for users 1–99 to clear. Hatchet’s round-robin group keys ensure tasks from different tenants interleave, with per-tenant concurrency caps to prevent any one customer from saturating the queue [1].
  3. In-memory queues don’t survive restarts. Postgres-backed storage means every task state survives a crash and picks up exactly where it left off [1][2].
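
A hedged sketch of what items 1 and 2 look like in the Python SDK: retries and per-tenant round-robin fairness as declarative configuration. The class and parameter names follow the public SDK examples but should be treated as illustrative:

from pydantic import BaseModel
from hatchet_sdk import (
    ConcurrencyExpression,
    ConcurrencyLimitStrategy,
    Context,
    Hatchet,
)

hatchet = Hatchet()

class JobInput(BaseModel):
    tenant_id: str
    document_url: str

# Retries and fairness as configuration rather than hand-rolled queue code.
@hatchet.task(
    name="process-document",
    input_validator=JobInput,
    retries=3,  # replaces custom backoff and dead-letter plumbing
    concurrency=ConcurrencyExpression(
        expression="input.tenant_id",  # group key: one fairness lane per tenant
        max_runs=5,                    # per-tenant concurrency cap
        limit_strategy=ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN,
    ),
)
def process_document(input: JobInput, ctx: Context) -> dict:
    return {"tenant": input.tenant_id, "ok": True}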

The HN launch post [2] identifies the other frequently-hit wall: Celery’s friction with modern Python. If your codebase uses async/await throughout and Pydantic models everywhere, Celery starts feeling like a foreign body — its API predates both and accommodates them poorly [2][3]. The Reddit r/Python thread [3] confirms this: engineers appreciate being able to define a task input as a Pydantic model and have the SDK handle serialization, deserialization, and static type checking end to end.

The comparison to Temporal and Airflow is worth spelling out: both are more powerful, and both are significantly more operationally complex. Temporal requires its own cluster and a specific programming model. Airflow is Python-centric but built around DAG definition files that live apart from the application code that produces and consumes tasks. Hatchet fits the gap between “a Redis queue with a library” and “a full Temporal deployment” — specifically the gap where most product startups actually live [1][2].


Features

Core queue mechanics:

  • Durable task queue backed by Postgres; optional RabbitMQ for higher throughput [2]
  • Automatic retries with configurable policies — no custom retry code needed [1]
  • Task start latency under 20ms [website scrape]
  • Exactly-once semantics for task state updates [website scrape]

Workflow composition:

  • Chain tasks into multi-step workflows with dependency graphs [2]
  • Fan-out (child workflows) — one parent task spawning arbitrary parallel children [2]
  • Steps that only run on workflow failure — built-in cleanup paths [2]
  • Checkpoint-based recovery: failed workflows resume from the last successful step, not from scratch [website scrape]
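
A sketch of this composition model in the Python SDK: a two-step dependency graph plus a failure-only cleanup step. The workflow is invented and the method names follow the v1-style API, so treat them as illustrative:

from pydantic import BaseModel
from hatchet_sdk import Context, Hatchet

hatchet = Hatchet()

class PipelineInput(BaseModel):
    dataset_url: str

pipeline = hatchet.workflow(name="etl-pipeline", input_validator=PipelineInput)

@pipeline.task()
def extract(input: PipelineInput, ctx: Context) -> dict:
    return {"rows": 1000}

@pipeline.task(parents=[extract])  # dependency edge: waits for extract
def transform(input: PipelineInput, ctx: Context) -> dict:
    rows = ctx.task_output(extract)["rows"]  # read the parent step's output
    return {"rows": rows}

@pipeline.on_failure_task()  # cleanup path: runs only if the workflow fails
def cleanup(input: PipelineInput, ctx: Context) -> dict:
    return {"cleaned": True}

Fan-out follows the same shape: a parent task spawns child workflow runs programmatically, so the number of parallel children can be decided at runtime.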

Fairness and concurrency:

  • Group round-robin queuing with group keys — deterministic fairness across tenants [1]
  • Per-tenant concurrency and rate limits [1]
  • Global rate limiting [2]
  • Priority lanes [website scrape]

Scheduling:

  • Schedule tasks at a future timestamp — not held in memory, safe for arbitrary future dates [3]
  • Dynamic cron creation from code [3]
  • Declarative crons on workflow definitions [3]
  • HTTP webhook triggers (useful for Vercel’s function timeout limits) [2]
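
A sketch of the scheduling surfaces above, assuming the Python SDK; the task and the schedule call are illustrative rather than authoritative:

from datetime import datetime, timedelta, timezone
from hatchet_sdk import Hatchet

hatchet = Hatchet()

# Declarative cron attached to the task definition:
@hatchet.task(name="nightly-report", on_crons=["0 2 * * *"])
def nightly_report(input, ctx) -> dict:
    return {"generated": True}

# One-off run at an arbitrary future date, persisted by the engine rather
# than held in worker memory, so a restart cannot drop it:
nightly_report.schedule(
    run_at=datetime.now(timezone.utc) + timedelta(days=30),
)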

Observability:

  • Real-time web dashboard with task status and workflow visualization [2]
  • Alerting on task failures [website scrape]
  • Replay failed pipelines directly from the UI [website scrape]
  • Metrics export to external monitoring tools [website scrape]

SDKs:

  • Python (Pydantic-first, async-first, FastAPI-inspired) [3]
  • TypeScript (type-safe inputs/outputs, npm package) [2][README]
  • Go [1][README]

Hatchet Lite:

  • Single Docker image bundling all Hatchet components: internal message queue, database migrations, admin CLI, REST API, and gRPC engine [2]
  • Designed for local development and low-volume production (hundreds of tasks per minute) [2]

Pricing: SaaS vs Self-Hosted Math

Hatchet Cloud:

  • Free tier available (specific limits not published) [website scrape]
  • Paid tiers: not listed publicly — contact sales or sign up and see [website scrape]

This is a gap worth flagging: pricing page details weren’t available in any of the reviewed sources. Budget-conscious teams evaluating Hatchet Cloud should sign up and compare directly before committing.

Self-hosted (MIT license):

  • Software: $0
  • Postgres: likely already in your stack; if not, a managed Postgres on Supabase or Railway is $5–25/month depending on size
  • Hatchet Lite (single Docker image): runs on a $10–20 VPS for low-to-medium throughput
  • Full deployment with external RabbitMQ: adds another service to manage

What you’re replacing:

  • Celery + Redis: Redis adds $15–50/month on most managed providers. You lose managed retries and workflow chaining — you rebuild those yourself over time.
  • BullMQ + Redis: same infrastructure cost, TypeScript-specific.
  • Temporal Cloud: $0.00000250 per state transition. A workload of 1 million state transitions per day runs ~$75/month and climbs proportionally; the arithmetic is spelled out after this list. At scale, Temporal Cloud becomes a significant line item.
  • Airflow managed (Astronomer, MWAA): $200–500+/month for small clusters.
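
The Temporal Cloud arithmetic from the list above, spelled out:

# Order-of-magnitude sketch of Temporal Cloud usage pricing at the rate
# quoted above. Real workflows emit several state transitions per step,
# so actual bills land above this floor.
RATE_PER_TRANSITION = 0.00000250  # USD
transitions_per_day = 1_000_000

monthly = transitions_per_day * RATE_PER_TRANSITION * 30
print(f"~${monthly:.2f}/month")  # ~$75.00/month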

The self-hosting case for Hatchet is strongest for teams already running Postgres: the incremental infrastructure cost for Hatchet Lite is essentially zero, while the operational cost of maintaining retry logic, fairness algorithms, and a dashboard from scratch is real engineering time [1][2].


Deployment Reality Check

The fast path (Hatchet Lite):

curl -fsSL https://install.hatchet.run/install.sh | bash
hatchet server start

This is genuine — the CLI bootstraps a working local instance in under five minutes for anyone with Docker already installed [README]. For a developer evaluating the tool, the time-to-first-task is one of the shorter ones in this category.

For production:

The recommended architecture separates the orchestration engine (managed Hatchet Cloud or self-hosted on your infra) from the workers (always your own infra). Workers connect to the engine and pull tasks; they run on Kubernetes, Railway, Render, ECS, or any container platform [website scrape]. This split makes sense: you scale workers based on task load, and the orchestration layer is stateless from the worker’s perspective.
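
On the worker side, that split looks roughly like this in the Python SDK (a hedged sketch; the task is a stand-in, and the environment variable is the one the SDK documents):

from hatchet_sdk import Hatchet

# The SDK reads the engine address and API token from the environment
# (HATCHET_CLIENT_TOKEN), so the same worker code targets Hatchet Cloud
# or a self-hosted engine unchanged.
hatchet = Hatchet()

@hatchet.task(name="ping")  # stand-in task so the sketch is self-contained
def ping(input, ctx) -> dict:
    return {"pong": True}

# Register what this worker can execute, then block and poll for work.
worker = hatchet.worker("review-worker", workflows=[ping])
worker.start()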

For full self-hosting under meaningful load, you need:

  • Postgres (your existing instance or a dedicated one)
  • RabbitMQ (bundled in Hatchet Lite; separate service for production)
  • Container platform for workers
  • A reverse proxy if exposing the dashboard externally

The Medium AWS writeup [1] adds some operational texture: the author’s team (Cynco) chose Hatchet specifically because the Postgres-only mode eliminated RabbitMQ for their sub-100 req/second workload. “Less stuff to handle, deploy, and fix” is the exact framing [1]. That calculation holds until you push past those throughput limits.

What can go sideways:

  • The RabbitMQ dependency in the full stack is an additional service to operate, monitor, and upgrade. Hatchet Lite hides it; production exposes it.
  • Connecting Hatchet to local LLMs (Ollama, etc.) for AI workflows means setting those up separately — Hatchet doesn’t bundle inference.
  • The dashboard is read-focused; programmatic management (triggering or canceling workflows via API) exists but the API surface is less documented than the SDK paths.

Realistic time to production for a team with Docker/container experience: 2–4 hours from zero to workers running in the cloud. For teams new to container orchestration, budget more.


Pros and Cons

Pros

  • MIT licensed, genuinely. Not “fair-code,” not source-available, not “community edition with commercial restrictions.” The full codebase is MIT [2]. You can embed it in a product, fork it, or build a competing service with it.
  • Postgres as the queue broker. If you’re already on Postgres, Hatchet Lite’s marginal infrastructure cost is close to zero. Fewer services, fewer failure modes, and transactional consistency between task state and your business data [1][2].
  • Modern Python SDK. Pydantic models as typed task inputs, async/await throughout, a FastAPI-like decorator API [3]. The ergonomics are a clear step up from Celery for any codebase written in the past three years.
  • Multi-language. Python, TypeScript, and Go SDKs — not just a Python story. Teams with polyglot services can use one orchestration layer [1][README].
  • Built-in fairness for multi-tenant workloads. Round-robin group keys and per-tenant concurrency limits are first-class, not bolt-ons [1]. Most simple queues make you implement this yourself.
  • Real observability. Dashboard, alerting, failure replay — without wiring up a separate monitoring stack [2][website scrape].
  • YC-backed with cloud + self-hosted options. Real company, real product, not an abandoned open-source project [2].

Cons

  • Developer tool only. No visual builder, no no-code interface. A non-technical founder cannot use this directly; an engineer needs to integrate it [2][3].
  • Pricing opacity. Cloud tier pricing isn’t publicly listed. Hard to do a cost comparison without signing up [website scrape].
  • RabbitMQ in the full stack. Hatchet Lite abstracts it, but production deployments at meaningful throughput still pull in RabbitMQ as a dependency. More moving parts [2].
  • Younger than Celery and Temporal. 6,746 stars is solid for the age of the project (YC W24), but the community, StackOverflow presence, and third-party tutorial ecosystem aren’t there yet compared to Celery’s decade of production use.
  • API surface for programmatic management is thin. Triggering and managing workflows from code works via SDK; REST API-based management is less mature [2].
  • No built-in AI inference. For AI-heavy workflows, you connect Hatchet to your LLM setup. It orchestrates the calls but doesn’t provide inference [website scrape].

Who Should Use This / Who Shouldn’t

Use Hatchet if:

  • You’re building an AI product or data pipeline with Python, TypeScript, or Go and you’ve already outgrown a simple Redis queue.
  • Your team is dealing with multi-tenant fairness problems — users stepping on each other’s jobs, one customer’s batch filling the entire queue.
  • You want durable execution and replay without the operational overhead of Temporal or Airflow.
  • You’re already running Postgres and want to add background task infrastructure without another managed service.
  • You need retry, timeout, and failure handling that isn’t custom code in your business logic.

Skip it (use Temporal) if:

  • You’re building long-running, event-driven workflows with complex saga patterns, compensation logic, or extremely high durability requirements. Temporal’s execution model handles this more robustly at the cost of more complexity.

Skip it (stay on Celery/BullMQ) if:

  • Your task queue is simple and working, and you don’t need fairness, workflow chaining, or the dashboard. Switching has an integration cost; don’t pay it without a reason.

Skip it (pick Airflow) if:

  • You’re a data engineering team building scheduled batch pipelines with heavy dependency graphs and you need the Airflow ecosystem of operators, hooks, and community connectors.

Skip it entirely if:

  • You’re a non-technical founder hoping to orchestrate business automations. This is code infrastructure, not an automation platform. Look at Activepieces or n8n instead.

Alternatives Worth Considering

  • Celery — the Python incumbent. Mature, huge community, battle-tested. Poor async support, no built-in observability, and debugging requires external tools (Flower, etc.) [2][3].
  • BullMQ — Node.js equivalent of Celery, backed by Redis. Solid for TypeScript teams with simpler workloads.
  • Temporal — the enterprise choice for complex durable workflows. More powerful execution model, more operational complexity, and usage-based cloud pricing that becomes significant at scale.
  • Airflow — the data engineering standard for batch pipeline scheduling. Wrong tool for API-triggered background tasks; right tool for scheduled ETL.
  • Inngest — similar positioning to Hatchet (modern workflow orchestration, code-first), TypeScript-first, cloud-native. Relevant comparison for Node teams.
  • River — Postgres-native job queue for Go. More minimal than Hatchet, single-language, no dashboard.
  • Prefect — Python-only, data-pipeline focused, strong UI, managed cloud with per-run pricing.

For a Python engineering team choosing between Hatchet and Celery, the decision is really about how much custom infrastructure code you want to write for retries, fairness, and observability. Hatchet gives you those out of the box. For teams choosing between Hatchet and Temporal, it’s about complexity: Hatchet is the 80/20 answer that covers most use cases without the learning curve.


Bottom Line

Hatchet is the task queue that engineering teams reach for when a Redis-backed library queue stops being enough but Temporal feels like overkill. The Postgres-native design is a real architectural bet, not a gimmick — it means simpler infrastructure, transactional consistency, and one fewer managed service for most teams. The Python SDK is genuinely good: Pydantic-first, async-aware, and readable to any FastAPI user without a mental context switch. The multi-tenant fairness primitives (round-robin group keys, per-tenant concurrency limits) solve a specific and painful class of bugs that FIFO queues simply can’t handle. The honest limitation is that it’s a developer tool through and through — no UI for non-technical users, limited public pricing transparency on the cloud tier, and a younger ecosystem than the tools it competes with. If you’re running background jobs in Python, TypeScript, or Go and you’ve hit the wall on visibility, fairness, or durability, it’s worth an afternoon to evaluate. The Lite image makes that evaluation nearly frictionless.


Sources

  1. Hazqeel Afyq, “Self-Host Hatchet + Full-Stack App on AWS”, Medium (Dec 9, 2025). https://medium.com/@hazqeelafyq09/self-host-hatchet-full-stack-app-on-aws-b7b9c3a2adc8
  2. “Hatchet (YC W24) – Open-source task queue, now with a cloud version”, Best of Show HN (YC W24 launch). https://bestofshowhn.com/yc-w24/hatchet
  3. hatchet-dev, “Hatchet - a task queue for modern Python apps”, r/Python, Reddit. https://www.reddit.com/r/Python/comments/1k045yv/hatchet_a_task_queue_for_modern_python_apps/
