
Manifest


An LLM cost-optimization plugin, honestly reviewed. No marketing fluff — just what you actually get when you install it.

TL;DR

  • What it is: Open-source (MIT) LLM routing plugin for Claude Code that intercepts outgoing queries and redirects them to the most cost-effective model instead of always using the same expensive one [1][2].
  • Who it’s for: Developers and AI-heavy teams running Claude Code at volume who are watching their Anthropic bill climb every month. Also useful for teams wanting observability into per-message AI costs [2].
  • Cost savings claim: Up to 70% reduction in LLM token costs by routing simple tasks to cheaper models instead of firing Opus/Sonnet at every request [1][2].
  • Key strength: Fully local option — in local mode, all routing, scoring, and telemetry stay on your machine. The cloud version is a blind proxy that physically cannot read your prompts [1].
  • Key weakness: Deeply Claude Code-specific. This is not a general-purpose LLM proxy you can drop in front of any AI app; it works only as a Claude Code (OpenClaw) plugin, which limits who can actually use it [1]. Independent third-party reviews of the tool don’t exist yet — it’s an early-stage project at ~4K GitHub stars.

What is Manifest

Manifest is an open-source LLM routing layer that sits between Claude Code (referred to in their codebase as “OpenClaw”) and the AI providers you’re already using. Instead of every query going to the same model regardless of complexity, Manifest intercepts the request, scores it across 23 dimensions in under 2ms, and routes it to the cheapest model capable of handling it [1][2].

The core insight is obvious once you say it out loud: you don’t need a frontier model to answer “what’s the current date” or summarize a short README. But by default, Claude Code sends all traffic to the same model endpoint. Manifest breaks that assumption [2].

The project sits at 3,974 GitHub stars as of this review and is MIT-licensed — meaning you can self-host, fork, and modify freely. It’s built by a two-person team, Bruno Perez and Sébastien Conejo, according to the about page [3]. There’s no YC backing, no enterprise sales team, and no big company behind it. That’s both a feature (genuinely scrappy open-source project) and a risk (two people with no announced funding).

The tool ships in two modes: cloud (quick install, dashboard accessible from any device, telemetry hits their servers — though message content is never collected) and local (telemetry stays entirely on your machine, dashboard at http://127.0.0.1:2099, works with local models like Ollama) [1].


Why People Choose It

Because no independent third-party reviews of this tool exist yet, this section is based entirely on the project’s own documentation and website claims. That’s worth saying plainly.

The pitch targets a real pain point: Claude Code and AI agent workflows at scale cost real money, and most of that cost comes from over-provisioning — routing every request through the highest-tier model when the task doesn’t justify it [1][2]. The alternatives are either general-purpose LLM proxies (LiteLLM, OpenRouter) that aren’t tightly integrated with Claude Code’s plugin system, or just manually switching models yourself.

The differentiation Manifest claims over OpenRouter specifically is architectural: your prompts don’t leave your machine in local mode. OpenRouter acts as a middleman that sees your content; Manifest’s local mode never sends prompt content anywhere [1]. For teams working with sensitive codebases, internal documentation, or client data flowing through AI agents, this distinction matters.

The README’s competitive table comparing Manifest to OpenRouter is unfortunately truncated in the source data, so that comparison is incomplete [1].


Features

Based on the README and website:

Core routing:

  • 23-dimension query scoring algorithm that runs in under 2ms locally [1]
  • Automatic model fallbacks — if the selected model fails, retries with backup models instantly [1]
  • Routes to “the most suitable model” — the specific provider logic isn’t spelled out in available docs
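
The 23-dimension scoring algorithm isn’t documented, but the general shape of complexity-scored routing can be sketched. The dimensions, weights, thresholds, and model names below are invented for illustration — they are not Manifest’s actual algorithm:

```python
# Toy illustration of multi-dimension query scoring for model routing.
# All dimensions, weights, and model tiers here are hypothetical.

def score_query(prompt: str) -> float:
    """Score query complexity on a few toy dimensions (0 = trivial)."""
    dimensions = {
        "length": min(len(prompt) / 2000, 1.0),  # longer prompts score higher
        "code": 1.0 if "```" in prompt or "def " in prompt else 0.0,
        "reasoning": 1.0 if any(w in prompt.lower()
                                for w in ("why", "prove", "design")) else 0.0,
    }
    weights = {"length": 0.3, "code": 0.4, "reasoning": 0.3}
    return sum(weights[d] * v for d, v in dimensions.items())

def route(prompt: str) -> str:
    """Route to the cheapest model tier whose threshold covers the score."""
    score = score_query(prompt)
    if score < 0.2:
        return "cheap-model"      # Haiku-class
    if score < 0.6:
        return "mid-model"        # Sonnet-class
    return "frontier-model"       # Opus-class
```

A real router would need far more signals (and a sub-2ms budget), but the structure — score locally, then map score to the cheapest capable tier — is the core idea the README describes.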

Observability:

  • Real-time dashboard showing tokens, costs, messages, and model usage [1][2]
  • Per-message cost analysis — you can see what each individual query cost [2]
  • Usage alerts and limits — get notified when you cross a budget threshold [1][2]
  • OTLP-native: uses OpenTelemetry standard for traces, metrics, and logs, meaning you can export telemetry to your existing observability stack [1]

Privacy model:

  • Local mode: all agent messages, token counts, costs, and telemetry stored locally. Nothing external [1]
  • Cloud mode: only OpenTelemetry metadata (model used, token count, latency) is sent — message content is never collected, and the proxy is architected so it physically cannot read prompts [1]
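
The cloud-mode claim boils down to what the telemetry record contains. A sketch of a metadata-only event, with field names assumed for illustration (the actual OTLP attribute keys aren’t documented in the sources reviewed):

```python
# Sketch of the metadata-only record cloud mode is described as sending:
# model name, token counts, latency -- never the prompt text itself.
# Field names are illustrative, not Manifest's actual schema.

def telemetry_record(model: str, input_tokens: int,
                     output_tokens: int, latency_ms: float) -> dict:
    """Build a telemetry event containing only usage metadata."""
    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
        # Deliberately no "prompt" or "completion" field: in this
        # architecture the proxy never holds message content.
    }

record = telemetry_record("haiku-class-model", 412, 180, 950.0)
```

“Physically cannot read prompts” is only as strong as this schema: if content fields never exist in the record, there is nothing for the server to log.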

Deployment:

  • Installs as a native Claude Code plugin (one command) [1]
  • Cloud version: API key from app.manifest.build, three-line install [1]
  • Local version: fully offline, Tailscale-compatible for multi-device access within your network [1]
  • No coding required for setup [2]

What’s not there:

  • Support for AI frameworks outside the Claude Code plugin ecosystem — if you’re running LangChain, LlamaIndex, or a custom agent loop that isn’t Claude Code, this tool doesn’t help you
  • Detailed documentation on which models are available in the routing pool (docs page was partially inaccessible during scraping) [4]
  • Any mention of custom routing rules or overrides

Pricing: SaaS vs Self-Hosted Math

Manifest’s own pricing page wasn’t accessible during research, so specifics are limited to what the README documents.

Cloud version: requires an API key from app.manifest.build. Pricing not publicly disclosed in available sources. Sign-up is free; whether the product has a paid tier or is entirely free isn’t clear from the data available [1].

Local version: free. No account required. You run the dashboard at http://127.0.0.1:2099 on your own machine. The only cost is compute, and since it’s scoring queries in under 2ms with a local algorithm, the overhead is negligible [1].

The actual savings math: Manifest claims up to 70% cost reduction. At face value, here’s how that could work: if you’re running Claude Code all day and 70% of your queries are simple enough for a cheap model (Haiku is roughly a tenth of Sonnet’s per-token price), intelligent routing could plausibly produce 50–70% savings in realistic workflows. But the actual reduction depends entirely on your specific query distribution and on whether Manifest’s routing quality is good enough that the requests routed to cheaper models don’t come back degraded [1][2].
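
A back-of-envelope version of that math, using illustrative per-million-token prices (not current rates) and an assumed 70/30 traffic split:

```python
# Blended-cost savings estimate under assumed prices and traffic mix.
# Prices are illustrative $/1M-token figures, not current Anthropic rates.
expensive_price = 3.00   # Sonnet-class
cheap_price = 0.25       # Haiku-class
simple_share = 0.70      # fraction of queries cheap enough to downgrade

baseline = expensive_price                              # everything on the big model
routed = (simple_share * cheap_price
          + (1 - simple_share) * expensive_price)       # blended cost after routing
savings = 1 - routed / baseline
print(f"{savings:.0%}")  # → 64%
```

Shift the split to 50/50 and the savings drop to roughly 46% — which is why the headline number hinges entirely on how much of your traffic is genuinely simple.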

There’s no independent benchmark data available. The 70% claim is from Manifest’s own marketing.


Deployment Reality Check

The install path is genuinely simple for anyone already using Claude Code:

# Cloud version
openclaw plugins install manifest
openclaw config set plugins.entries.manifest.config.apiKey "mnfst_YOUR_KEY"
openclaw gateway restart

# Local version
openclaw plugins install manifest
openclaw config set plugins.entries.manifest.config.mode local
openclaw gateway restart

That’s it. No Docker, no VPS, no database setup [1].

What can go sideways:

  • Claude Code dependency is absolute. If you’re not already using Claude Code as your primary AI development workflow, Manifest is useless. It’s a plugin for a specific tool, not a standalone proxy.

  • Local LLM support requires separate setup. The README mentions Ollama compatibility, but Manifest doesn’t ship or configure Ollama for you [1]. You set that up yourself and point Manifest at it.

  • Multi-device in local mode is awkward. The README suggests using Tailscale to proxy the local dashboard across devices [1]. That works, but it’s not the same as the cloud version’s native multi-device support.

  • Very early stage. At ~4K stars with a two-person team and no independent reviews, you’re adopting this before the community has stress-tested it at scale. The GitHub CI badge is passing, but production stability at high agent request volumes is unknown.

  • No REST API for programmatic control is listed in the feature set, though the dashboard and plugin provide the primary interface [1].


Pros and Cons

Pros

  • MIT licensed and fully open source. Inspect the routing algorithm, fork it, self-host it. No “fair-code” restrictions [1].
  • Local mode is genuinely private. Not “trust us, we anonymize it” — the local mode architecture physically doesn’t send content anywhere [1]. This is a meaningful claim.
  • Zero-friction install if you’re already on Claude Code. Three commands and it’s running [1].
  • OpenTelemetry native. If you already have an observability stack (Grafana, Datadog, Honeycomb), Manifest’s telemetry slots in using OTLP — the standard format [1].
  • Real-time cost visibility. Per-message cost tracking in the dashboard is genuinely useful for teams that want to understand where their AI budget is going [2].
  • Automatic fallbacks. If a model errors out, it retries with backup models automatically — better than your agents silently failing [1].
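
The fallback behavior described in that last bullet amounts to ordered retry. A minimal sketch, with a hypothetical call interface (Manifest’s internals aren’t documented):

```python
# Sketch of retry-with-fallback routing over an ordered model list.
# The call interface is hypothetical, for illustration only.

def call_with_fallbacks(prompt, models, call_model):
    """Try each model in order; return (model, response) for the first success."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except RuntimeError as err:   # provider error, rate limit, timeout...
            last_error = err          # fall through to the next model
    raise RuntimeError("all models failed") from last_error

# Usage with a stub provider that fails on the primary model:
def stub(model, prompt):
    if model == "primary":
        raise RuntimeError("503 from provider")
    return f"{model}: ok"

used, out = call_with_fallbacks("hi", ["primary", "backup"], stub)
```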

Cons

  • Single-platform lock. Only works as a Claude Code plugin. If you move off Claude Code, Manifest goes with it. No general-purpose proxy capability.
  • No independent validation of the 70% claim. Zero third-party reviews, no published benchmarks. The cost savings figure comes entirely from Manifest’s own marketing [2].
  • Cloud pricing is opaque. No public pricing page accessible. You have to sign up to find out what the cloud tier costs [1].
  • Two-person team, no announced funding. This is a risk for anyone betting critical infrastructure on it. The GitHub project exists and is active, but long-term maintenance is uncertain.
  • Routing quality is a black box. The “23-dimension scoring algorithm” is referenced but not explained [1]. Whether it correctly identifies which queries can be downgraded without quality loss is the core product question — and there’s no external data to evaluate it.
  • Documentation gaps. The docs page (Mintlify-hosted) was partially unavailable during scraping. What’s publicly visible about the routing model, supported providers, and configuration options is thin [4].

Who Should Use This / Who Shouldn’t

Use Manifest if:

  • You’re a developer or team running Claude Code daily and your Anthropic token spend is already noticeable.
  • You process a high volume of simple, repetitive AI tasks (summarization, classification, structured extraction) that don’t need a frontier model.
  • You want per-message cost visibility without building your own observability layer.
  • Privacy is a concern — specifically, you don’t want prompt content leaving your infrastructure (use local mode).
  • You’re comfortable being an early adopter of a small open-source project.

Skip it if:

  • You’re not using Claude Code as your primary AI development environment. This tool does nothing for LangChain apps, custom Python agent loops, or any workflow outside the Claude Code plugin system.
  • You’re running a production system where routing quality failures (getting a cheap model response when you needed a smart one) would cause real problems — without independent benchmarks, you can’t evaluate that risk.
  • You need enterprise support, SLAs, or audit trails. Two founders with a Discord is what you get.
  • Your team is non-technical. Despite “no coding required” in the install, this is firmly a developer tool.

Alternatives Worth Considering

  • LiteLLM — open-source, runs as a proxy in front of any LLM provider, supports routing rules, load balancing, and fallbacks. Works with any framework, not just Claude Code. More mature, larger community, more configuration options. If you need a general-purpose LLM proxy, LiteLLM is the default answer.

  • OpenRouter — hosted LLM routing service, wide model catalog, usage-based pricing. The Manifest README specifically references OpenRouter as the comparison point, with Manifest’s key differentiator being that prompt content never hits a third-party server in local mode [1]. OpenRouter requires sending prompts through their infrastructure.

  • Portkey — observability and routing layer for LLMs, similar dashboard and cost tracking features, supports multiple frameworks. Closed-source SaaS with a free tier.

  • Helicone — LLM observability platform (logging, costs, analytics), less focused on routing and more on monitoring. Open-source self-hosted option exists.

  • DIY model tiers in your agent code — for technically capable teams, explicitly specifying cheaper models for low-complexity tasks in your agent architecture avoids the dependency on a routing plugin entirely. Less elegant, but zero additional infrastructure.
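
The DIY option from that last bullet can be as small as a lookup table in your agent code. Task types and model names below are placeholders, not real API identifiers:

```python
# Minimal DIY model tiering: map task types to models explicitly.
# Task names and model names are placeholders for illustration.

MODEL_TIERS = {
    "summarize": "cheap-model",
    "classify": "cheap-model",
    "extract": "cheap-model",
    "code_review": "frontier-model",
}

def model_for(task_type: str) -> str:
    """Pick a model by task type; default to the strong model when unsure."""
    return MODEL_TIERS.get(task_type, "frontier-model")
```

You lose per-query adaptivity, but you gain full control and zero extra infrastructure — a reasonable trade if your task mix is predictable.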


Bottom Line

Manifest solves a real problem for a specific audience: developers running Claude Code who have noticed their token costs scaling uncomfortably with usage. The local mode’s privacy architecture is genuinely thoughtful — “blind proxy by architecture, not by policy” is a meaningful distinction from services that ask you to trust their data handling claims. The install story is as frictionless as a plugin can get.

The caveats are substantial: it’s a two-person early-stage project with no independent validation of its core cost-savings claim, documentation that’s still sparse, and a dependency on staying within the Claude Code ecosystem. If the 70% savings figure holds up for your specific query distribution, the math is compelling. If the routing algorithm downgrades queries that needed a smarter model, you’ll spend that savings recovering from degraded outputs. Until independent benchmarks exist, you’re taking that on faith. Worth testing if you’re running Claude Code at volume — not worth betting a production system on without verification.


Sources

  1. Manifest GitHub Repository — README and project documentation (MIT license, 3,974 stars). https://github.com/mnfst/manifest
  2. Manifest Official Website — Homepage. https://manifest.build
  3. Manifest Official Website — About page (team: Bruno Perez, Sébastien Conejo). https://manifest.build/about/
  4. Manifest Documentation (Mintlify-hosted). https://manifest.build/docs/introduction

Note: No independent third-party reviews of Manifest (the LLM router) were available at the time of writing. All claims about cost savings, routing behavior, and product quality are sourced from the project’s own documentation and website.
