Letta
Open-source platform for building AI agents with persistent memory, reasoning capabilities, and tool integration.
Open-source stateful agent infrastructure, honestly reviewed. Built on the MemGPT research. Not yet as boring as “enterprise-ready” — which is either a feature or a bug depending on your use case.
TL;DR
- What it is: Open-source (Apache-2.0) platform for building stateful AI agents — agents that persist memory across sessions, learn from interactions, and can recall context from previous conversations indefinitely [README][5].
- Who it’s for: AI developers and technically curious founders who want to build agents that feel like they “know you” rather than starting fresh every conversation. Also relevant to teams evaluating memory infrastructure for production agent systems [1][5].
- Cost savings: Letta’s server is free to self-host. Cloud AI memory alternatives like Mem0 and Zep charge on subscription or API-call volume. The main cost is the LLM API calls Letta’s memory engine makes — those you pay regardless [1][5].
- Key strength: Genuinely open Apache-2.0 license, deeply community-driven, cleanest developer API in the stateful-agent space, and the only framework where the memory architecture (OS-inspired tiered memory) was published as peer-reviewed research before it became a product [5][README].
- Key weakness: Still maturing. The Medium reviewer who benchmarked it in mid-2025 was direct: “Letta isn’t quite there for mission-critical applications” [1]. Its memory decisions are driven by the LLM itself, which inherits all the opacity and inconsistency that implies [5].
What is Letta
Letta started as MemGPT — a research project that treated LLM context management like an operating system manages RAM. The idea: give an agent a “core memory” (always in context), a “recall storage” (searchable conversation history), and an “archival storage” (compressed long-term memory), and let the LLM itself decide what to page in and out. The paper got traction, the GitHub repo hit tens of thousands of stars, and the team spun it into a company — renaming the project Letta and rebuilding it as a full agent platform [5][README].
What ships today has three layers: Letta Code (a CLI agent you run locally that takes actions on your computer), the Letta API (a REST server plus Python and TypeScript SDKs for embedding agents in your own applications), and a self-hosted server you can run via Docker [README][4].
The core premise is memory persistence. Most LLM-based agents are stateless — every conversation starts from scratch. Letta agents maintain memory blocks that survive across sessions, get updated as new information comes in, and can be explicitly viewed and edited by the user. The product description calls this “persistent agents instead of stateless sessions” [website].
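To make that concrete, here is a minimal sketch of creating a persistent agent against a self-hosted server with the Python SDK. It follows the shape of the README's examples, but treat the exact parameter names and model handles as assumptions that may differ by SDK version.

```python
# Minimal sketch: a named agent with persistent memory blocks, modeled on the
# README's Python SDK examples (parameter names may vary across SDK versions).
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")  # self-hosted server

agent = client.agents.create(
    name="personal-assistant",
    memory_blocks=[
        {"label": "persona", "value": "You are a concise, helpful assistant."},
        {"label": "human", "value": "Name: Ada. Works on embedded systems."},
    ],
    model="openai/gpt-4o-mini",                 # any configured provider
    embedding="openai/text-embedding-3-small",  # needed for archival search
)

# Memory lives server-side: any later session that reuses agent.id sees the
# same blocks, updated by everything the agent has learned in between.
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Remember that I prefer Rust examples."}],
)
for message in response.messages:
    print(message)
```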
The GitHub repo sits at 21,636 stars as of this review, the license is Apache-2.0, and the project is backed by over 100 contributors [README][5].
Why people choose it
The comparison that matters most is Letta vs. Mem0 vs. Zep, since these are the three systems developers actually evaluate when they need persistent agent memory [1][5].
Versus Mem0. Mem0 is the production-ready option — YC-backed, $24M funded, cloud-first, and currently the most polished memory API if you just want it to work without running your own infrastructure [5]. The trade-off: Mem0 is effectively closed-source in its cloud form, and self-hosting is possible but not the primary experience. Calvin Ku, who tested all three, framed it plainly: Mem0 is for teams that need memory to work reliably in production today. Letta is for developers who believe the open, research-driven approach will pay off as underlying LLMs get better [1]. If you need to ship next week, Mem0 wins. If you’re betting on the next 12 months, Letta is the more interesting choice.
Versus Zep. Zep uses a temporal knowledge graph rather than LLM-driven memory tiers. That makes it more predictable and less dependent on a smart LLM to make good memory decisions. Zep’s Apache-2.0 + commercial licensing is similar to Letta’s Apache-2.0, but the self-hosted Community Edition has feature limits. For developers who want graph-structured memory with temporal entity relationships, Zep pulls ahead. For developers who want an agent framework where memory is a first-class agent capability rather than a separate layer, Letta’s model fits better [5].
The benchmark picture. The most credible published benchmark is LoCoMo (Long Conversation Memory), which tests systems across 81 question-answer pairs in multi-session conversations. Letta (MemGPT) scores approximately 83.2%, which puts it in the production-tier cluster alongside Zep (~85%) and behind the research-only systems that require heavy cloud compute [5]. That’s a respectable score, but it carries a caveat: Letta’s memory decisions are made by the LLM itself, which means the score varies depending on which model you use. The benchmark number reflects a specific model configuration — not a fixed property of the framework [5][1].
The open-source case. Where Letta differentiates itself clearly is license and community. The Apache-2.0 license means you can embed it in a commercial product, fork it, or white-label it without a commercial agreement. The Discord is active — reviewers specifically mention getting fast responses from core contributors [1]. That combination of liberal licensing and community responsiveness is rare in the AI memory space.
Features
Based on the README, docs, and third-party coverage:
Memory architecture:
- Core memory (always in LLM context) — structured blocks for persona, human, and custom labels [README][4]
- Recall storage — searchable conversation history across sessions [5]
- Archival storage — compressed long-term memory with vector search [4][5]
- Memory palace — visual interface to inspect and edit what the agent actually “knows” [website]
- LLM-driven memory management — the agent itself decides when to write, update, or archive memories [5] (see the sketch below)
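To see how the tiers fit together, here is an illustrative Python model of the design, not Letta's actual implementation. The method names mirror the memory tools MemGPT exposes to the LLM; the bodies are stand-ins.

```python
# Illustrative model of the tiered memory design (not Letta's real code).
from dataclasses import dataclass, field

@dataclass
class TieredMemory:
    core: dict[str, str] = field(default_factory=dict)  # always in LLM context
    recall: list[str] = field(default_factory=list)     # searchable history
    archival: list[str] = field(default_factory=list)   # long-term, vector-indexed

    # The LLM itself decides when to call tools like these:
    def core_memory_append(self, label: str, text: str) -> None:
        self.core[label] = (self.core.get(label, "") + "\n" + text).strip()

    def archival_memory_insert(self, text: str) -> None:
        self.archival.append(text)  # the real server embeds this into pgvector

    def conversation_search(self, query: str) -> list[str]:
        return [m for m in self.recall if query.lower() in m.lower()]
```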
Letta Code (the CLI agent):
- Terminal-first: `npm install -g @letta-ai/letta-code`, then `letta` [README]
- Takes actions on your local computer — file operations, code execution, web search [README][website]
- Skills and subagents — pre-built modules for advanced memory and continual learning [README]
- Background memory subagents that improve prompts and context over time [website]
- Port memory and conversation history between models and devices [website]
Letta API:
- Full REST API with Python and TypeScript/Node.js SDKs [README]
- Create named agents with custom memory blocks via code [README]
- Streaming responses and tool calling (web_search, fetch_webpage built-in) [README] — see the streaming sketch after this list
- Multi-model: OpenAI, Anthropic, Ollama, and others configurable via environment variables [4]
- E2B tool sandboxing for custom tools [4]
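For the streaming side, a hedged sketch with the Python SDK. The create_stream method name follows the SDK's documented pattern, but verify it against the version you install.

```python
# Streaming a response from an existing agent (method name may vary by version).
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

stream = client.agents.messages.create_stream(
    agent_id="agent-xxxxxxxx",  # id returned at agent creation
    messages=[{"role": "user", "content": "Summarize what you know about me."}],
)
for chunk in stream:
    # Chunks interleave reasoning steps, tool calls, and assistant text.
    print(chunk)
```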
Server and infrastructure:
- Docker deployment with bundled PostgreSQL and pgvector [4]
- Password-protected server via bearer token [4]
- External Postgres support via `LETTA_PG_URI` [4]
- Connect to local LLMs via Ollama by setting `OLLAMA_BASE_URL` [4]
- MCP tools supported — executed outside the server sandbox [4]
Pricing: SaaS vs self-hosted math
Letta server (self-hosted): Free. Apache-2.0. You run the Docker container, you own the data [README][5].
Letta Cloud (app.letta.com): The README references API keys from app.letta.com, and as of this review, specific cloud pricing tiers are not publicly documented in detail. The Medium reviewer from May 2025 noted the SaaS offering was “still under development” [1]. If you need cloud pricing figures, check the current pricing page directly — the situation was in flux at time of writing.
LLM costs — the hidden line item: Unlike workflow automation tools, Letta’s memory management is LLM-driven. Every memory read, write, or update triggers LLM calls. The README recommends Claude Opus 4.5 or GPT-5.2 for best performance [README]. Those aren’t cheap tokens. At high agent volume, the LLM bill for memory operations can dwarf the infrastructure cost. This is the number that doesn’t appear in the “self-hosted is free” pitch but matters enormously in practice.
Competitor reference (Mem0): Mem0 charges on a usage basis via its cloud API. Exact pricing varies and requires checking their current plan page, but the structure is API-call-based — you pay per memory operation, which compounds at scale [5].
Self-host math: A Letta server on a $6 Hetzner VPS handles the infrastructure cost. The real ongoing cost is: LLM API fees × number of memory operations × number of agents. For a personal assistant or small-scale deployment, this is negligible. For a 1,000-user production system where each agent is actively learning, model the LLM costs before committing.
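A back-of-envelope version of that formula, with every number a placeholder (substitute your provider's current rates and your measured operation counts):

```python
# Hypothetical cost model: every number below is a placeholder, not a quote.
price_per_1k_tokens = 0.01     # blended LLM rate, $/1K tokens (assumed)
tokens_per_memory_op = 1_500   # prompt + completion per memory write (assumed)
ops_per_agent_per_day = 50     # memory reads/writes/updates per agent (assumed)
agents = 1_000

daily_llm_cost = (
    price_per_1k_tokens * (tokens_per_memory_op / 1_000)
    * ops_per_agent_per_day * agents
)
print(f"~${daily_llm_cost:,.0f}/day")  # ~$750/day; the $6 VPS is a rounding error
```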
Deployment reality check
Docker is the documented path, and it’s genuinely simple [4]:
```bash
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  -e OPENAI_API_KEY="your_key" \
  letta/letta:latest
```
That single command starts the server with bundled PostgreSQL and exposes the API on port 8283 [4]. Add Anthropic or Ollama keys via environment variables for multi-provider support [4].
What you need:
- A Linux VPS or local machine with Docker
- An LLM API key (OpenAI, Anthropic, or a local Ollama instance)
- For production: a domain, reverse proxy (Caddy/nginx), and ideally an external Postgres instance
- For vector search (archival memory): an embedding model configured at agent creation time — this must be explicit in Docker mode, unlike the managed cloud [4]
Password protection is available but opt-in — you pass `SECURE=true` and set a server password [4]. The default install has no auth, which means you should not expose port 8283 publicly without either the password flag or firewall rules.
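When the password flag is set, the SDKs authenticate by sending that password as the bearer token. A minimal sketch, assuming the Python SDK's token parameter:

```python
# Connecting to a password-protected self-hosted server. Assumes the container
# was started with SECURE=true and a server password (see the docs [4]).
from letta_client import Letta

client = Letta(
    base_url="https://letta.example.com",  # behind your reverse proxy
    token="your-server-password",          # sent as the bearer token
)
```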
What can go sideways:
- Embedding models are not configured by default in the Docker path — you must specify `embedding` when creating each agent, or archival memory search won’t work [4]
- The LLM-driven memory approach means agent behavior varies noticeably across models. The same Letta agent running on GPT-4o-mini versus Claude Opus 4.5 will make different memory decisions — not a bug, but worth accounting for in any evaluation [1][5]
- The Medium reviewer noted a concern that resources are “spread a bit thin” — the team has been building the Desktop app, broad model support, and Letta Code simultaneously, which may slow depth of improvement in core memory robustness for complex scenarios [1]
- The SaaS cloud offering was still maturing as of mid-2025. If you need a managed service rather than self-hosting, verify current availability and SLAs before building on it [1]
Realistic setup time for a developer: 15–30 minutes for a working local Docker instance. For a production server with domain, HTTPS, and external Postgres: 2–4 hours.
Pros and Cons
Pros
- Apache-2.0 licensed. Genuinely open — embed in commercial products, fork, resell [5][README]. No “Fair-code” or commercial licensing surprises.
- Research-backed memory architecture. The OS-inspired tiered memory (core, recall, archival) came from peer-reviewed work, not marketing copy. The design choices have reasons [5].
- Cleanest developer API in the stateful-agent category. Creating a named agent with typed memory blocks via Python or TypeScript SDK is straightforward. The code examples in the README actually run [README].
- Model-agnostic. OpenAI, Anthropic, Ollama — swap via environment variable [4]. Memory persists across model changes, so you can migrate agents between providers without losing history [website].
- Active, responsive community. The Discord has core contributors answering questions directly. The Medium reviewer got GitHub engagement from the team hours after publishing [1].
- Local LLM support. You can run the entire stack — agent, server, memory, LLM inference — on your own hardware via Ollama [4]. True data sovereignty.
- Letta Code is a real product. The CLI agent with skills, subagents, and memory is functional today, not vaporware [README][website].
Cons
- Not production-ready for mission-critical use. The May 2025 Medium benchmark was direct about this: “Letta isn’t quite there” for serious production workloads. Things may have improved — but evaluate carefully rather than assuming [1].
- LLM-driven memory = opacity. The agent decides what to remember. This inherits all the inconsistency and unpredictability of LLM reasoning. You can’t audit a rule; you can only observe behavior [5]. If you need deterministic memory management, Zep’s knowledge graph approach is more predictable.
- Hidden LLM cost. Memory operations burn tokens. At scale, the cost of memory management can be significant — and it’s not surfaced in the “free to self-host” pitch [README][1].
- SaaS offering was still maturing as of this review. If you need a managed cloud option today, verify current availability rather than assuming [1].
- Benchmark score is model-dependent. The 83.2% LoCoMo score is not a fixed property — it reflects performance with a specific model. Weaker models will score worse [5][1].
- Embedding configuration is not automatic in Docker. You must specify the embedding model at agent creation time or archival memory won’t work correctly [4].
- No multi-user auth out of the box. The server has optional password protection, but no user management or RBAC for team deployments [4].
Who should use this / who shouldn’t
Use Letta if:
- You’re building an AI assistant or agent that needs to remember context across weeks or months — customer data, personal preferences, ongoing project state.
- You want Apache-2.0 licensing you can embed in a commercial product without legal negotiation.
- You’re comfortable running Docker and willing to manage LLM API costs as a variable expense.
- You want to experiment with the memory-as-OS-tier architecture and are willing to accept some rough edges for that depth.
- You’re building with local models (Ollama) and need memory that runs entirely on your own hardware.
Skip it (use Mem0 instead) if:
- You need memory infrastructure that’s production-stable today, with SLAs and managed uptime.
- You want a cloud API you can call without running any infrastructure yourself.
- You’re shipping a customer-facing feature on a deadline measured in weeks.
Skip it (use Zep instead) if:
- You need deterministic, graph-structured temporal memory rather than LLM-driven memory decisions.
- You’re building complex agent workflows with entity relationships that need to be queried programmatically.
- You want self-hosted memory with a more mature enterprise offering.
Skip it (build your own) if:
- Your memory requirements are simple — a database with conversation history and a vector search layer will do 80% of what Letta does at lower complexity and no additional dependency.
- You don’t want your memory management logic inside a third-party framework.
Alternatives worth considering
- Mem0 — YC-backed, cloud-first, most production-ready in the category, open-core with cloud API [1][5]. Best if you need it to just work.
- Zep — Apache-2.0 + commercial, knowledge graph-based temporal memory, stronger for complex entity-relationship tracking [5]. More predictable than Letta’s LLM-driven approach.
- Supermemory — MIT licensed, cloud-hosted, designed for personal knowledge management with multi-source ingestion [5]. Less suited for agent memory per se.
- SuperLocalMemory — local-first, MIT, mathematical retrieval (no cloud LLM required), EU AI Act compliant by design [5]. Niche but relevant if data locality is a hard requirement.
- LangChain Memory — if you’re already in the LangChain ecosystem, their memory modules handle simple persistence without a separate service, at the cost of less sophistication.
- Building directly on PostgreSQL + pgvector — for teams that want full control, a vector-enabled Postgres instance with a conversation table covers most use cases without a framework dependency. A sketch of this baseline follows below.
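For scale on that last option, here is roughly what the DIY baseline looks like. The table and column names are illustrative, and it assumes Postgres with the pgvector extension plus psycopg 3; bring your own embedding function.

```python
# DIY agent memory: conversation history + vector recall on plain Postgres.
# Schema and names are illustrative, not a prescribed design.
import psycopg

SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS messages (
    id         bigserial PRIMARY KEY,
    agent_id   text NOT NULL,
    role       text NOT NULL,            -- 'user' or 'assistant'
    content    text NOT NULL,
    embedding  vector(1536),             -- match your embedding model's size
    created_at timestamptz DEFAULT now()
);
"""

def recall(conn: psycopg.Connection, agent_id: str,
           query_embedding: list[float], k: int = 5) -> list[str]:
    """Nearest-neighbor search over past messages (pgvector cosine distance)."""
    vec = "[" + ",".join(map(str, query_embedding)) + "]"
    rows = conn.execute(
        "SELECT content FROM messages WHERE agent_id = %s "
        "ORDER BY embedding <=> %s::vector LIMIT %s",
        (agent_id, vec, k),
    ).fetchall()
    return [content for (content,) in rows]
```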
Bottom line
Letta is the most honest attempt in the open-source space to solve the actual hard problem: making AI agents that accumulate knowledge instead of resetting to amnesia every session. The OS-inspired memory architecture is principled, the Apache-2.0 license is clean, and the community around the project is genuinely active. The consistent caveat from independent evaluators is that it’s still maturing — LLM-driven memory management is powerful in theory and inconsistent in practice, and the production story lags behind Mem0. For a developer building a personal assistant, an internal knowledge agent, or exploring what stateful AI looks like when you own the infrastructure, Letta is the right place to start. For a team shipping a customer-facing product with uptime requirements next quarter, verify the current production readiness before committing.
If self-hosting Letta (or any agent infrastructure) is the blocker, that’s exactly what upready.dev handles for technical founders — one-time deployment, you own the stack.
Sources
1. Calvin Ku, Medium / Asymptotic Spaghetti Integration — “From Beta to Battle‑Tested: Picking Between Letta, Mem0 & Zep for AI Memory” (May 5, 2025). https://medium.com/asymptotic-spaghetti-integration/from-beta-to-battle-tested-picking-between-letta-mem0-zep-for-ai-memory-6850ca8703d1
2. Railway — “Deploy and Host Letta AI Service with one click on Railway”. https://railway.com/deploy/letta
3. Reddit r/LLMDevs — “LLM memory locally hosted options”. https://www.reddit.com/r/LLMDevs/comments/1olu9sp/llm_memory_locally_hosted_options/
4. Letta Documentation — “Deploy a Letta server with Docker”. https://docs.letta.com/guides/docker/
5. Varun Pratap Bhardwaj, DEV Community — “5 AI Agent Memory Systems Compared: Mem0, Zep, Letta, Supermemory, SuperLocalMemory (2026 Benchmark Data)”. https://dev.to/varun_pratapbhardwaj_b13/5-ai-agent-memory-systems-compared-mem0-zep-letta-supermemory-superlocalmemory-2026-benchmark-59p3
Primary sources:
- GitHub repository and README: https://github.com/letta-ai/letta (21,636 stars, Apache-2.0 license, 100+ contributors)
- Official website: https://letta.com
- Letta Cloud: https://app.letta.com
- Documentation: https://docs.letta.com