
Mem0

Memory layer for AI applications that learns from user interactions, reduces token costs, and delivers personalized experiences across sessions.

Persistent AI memory, honestly reviewed. What you actually get when you add Mem0 to your stack — and when it’s not worth it.


TL;DR

  • What it is: Open-source (Apache-2.0) persistent memory layer for AI agents and LLM applications — think of it as a long-term memory system you bolt onto ChatGPT, Claude, or any other model [1].
  • Who it’s for: Developers building AI assistants, customer support bots, or any LLM application where “the AI keeps forgetting everything the user told it last week” is a real problem [2].
  • Cost savings: Full-context prompting is expensive. Mem0 claims 90% lower token usage vs. sending full conversation history every time — which directly translates to API bill reductions at scale [1].
  • Key strength: 26% accuracy improvement over OpenAI’s native memory on the LOCOMO benchmark. 91% lower p95 latency than full-context approaches. Exclusive memory provider for the AWS Agent SDK [1].
  • Key weakness: The self-hosted version lags the managed platform: batch operations and the dashboard are platform-only, and the SDKs aren’t at parity (the OSS JavaScript SDK lacks update/delete helpers). Pricing for the managed platform isn’t transparent until you’re deep in the funnel [3][4].

What is Mem0

Mem0 (pronounced “mem-zero”) is a memory layer that sits between your AI application and your users. Instead of either losing all context between sessions or cramming your entire conversation history into every prompt — both of which are terrible solutions — Mem0 extracts, compresses, and retrieves the relevant bits when needed [2].

The core idea: your AI had a conversation with a user last Tuesday. The user mentioned they’re vegetarian and avoid dairy. Today they ask “what should I make for dinner?” Without memory, your AI starts from scratch. With Mem0, it surfaces that dietary preference and answers accordingly — without you having to resend 40,000 tokens of chat history to get there [1][2].
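
A minimal sketch of that flow with the OSS Python SDK. The add and search calls are the documented core API; the result-handling shape varies between SDK versions, so treat that part as an assumption:

```python
from mem0 import Memory

m = Memory()  # defaults to OpenAI for extraction; needs OPENAI_API_KEY set

# Last Tuesday: Mem0 extracts the facts worth keeping from the raw exchange
m.add(
    [
        {"role": "user", "content": "I'm vegetarian and I avoid dairy."},
        {"role": "assistant", "content": "Got it, I'll remember that."},
    ],
    user_id="alice",
)

# Today: retrieve only the relevant memories instead of resending full history
results = m.search("What should I make for dinner?", user_id="alice")
for hit in results.get("results", []):
    print(hit["memory"])  # e.g. "Is vegetarian", "Avoids dairy"
```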

The company behind it is YC-backed (S24 batch), raised $24M in total ($3.9M seed, $20M Series A led by Basis Set Ventures), and has grown API call volume from 35M in Q1 2025 to 186M by Q3 2025 — roughly 30% month-over-month growth [1]. The GitHub repo sits at 50,210 stars. That’s not a hobbyist project; that’s a startup with real traction in a real problem space.

The technical architecture is a hybrid storage system: vector databases for semantic memory retrieval (finding relevant past memories by meaning, not exact text match) and graph databases for tracking relationships between entities and memories. The LLM itself handles extracting what’s worth remembering from conversations [2].

It’s model-agnostic: works with OpenAI, Anthropic, or any open-source LLM. The default is OpenAI’s gpt-4.1-nano-2025-04-14 for extraction, but you can swap it [README].
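
Swapping the extraction model looks roughly like this, following the OSS config pattern. The Anthropic model ID below is a placeholder; check Mem0’s docs for the currently supported fields:

```python
from mem0 import Memory

# Swap the default OpenAI extractor for Anthropic via the OSS config pattern.
# The model ID is a placeholder; consult the docs for current options.
config = {
    "llm": {
        "provider": "anthropic",
        "config": {
            "model": "claude-3-5-sonnet-20241022",
            "temperature": 0.1,
        },
    }
}

m = Memory.from_config(config)
```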


Why people choose it

The comparison set here isn’t Zapier or n8n. It’s “build your own memory system” versus “use Mem0.” Most teams that end up on Mem0 went through one of three failure modes first:

The full-context approach. Send the entire conversation history with every request. Works fine for short conversations, becomes ruinously expensive and slow at scale. At 100K tokens per request, GPT-4 costs add up fast. Mem0’s own benchmarks show 90% lower token usage and 91% lower p95 latency than this approach [1]. Those numbers come from their own research paper, so read them with appropriate skepticism — but even at half those numbers, it’s a meaningful improvement.

The naive RAG approach. Chunk and embed everything, retrieve by similarity. Works better than full-context but loses structured facts. “The user is vegetarian” gets diluted across 30 conversation chunks and retrieved inconsistently. Mem0’s extraction step turns raw conversation into structured memories, which it claims produces a 26% accuracy improvement over OpenAI’s built-in memory on the LOCOMO benchmark [1].

Roll-your-own. Write custom extraction prompts, manage your own vector store, build your own retrieval logic. This is what most serious teams default to. It’s not bad — but it takes 2-4 weeks to build something robust, and you’re constantly maintaining it. Mem0’s value proposition here is “we already solved these problems, and we have a research paper to show the solution works” [1].

The AWS Agent SDK endorsement is the most significant external validation. Being selected as the exclusive memory provider for AWS’s agent framework means someone at Amazon’s AI team looked at the alternatives and chose this one. That’s not a marketing claim — that’s a procurement decision [1].


Features

Memory operations (update and delete are sketched after this list):

  • Add: Extract memories from raw conversation text. One API call, Mem0 figures out what’s worth storing [2].
  • Search: Semantic retrieval of relevant memories given a new user query [2].
  • Update: Fix or enrich existing memories without deleting them. Batch updates support up to 1,000 memories in one request [3].
  • Delete: Single memory, batch by ID list, or filter-based deletion (by user, agent, session). GDPR/CCPA-compliant erasure supported [4].
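
A hedged sketch of the update and delete calls in the OSS Python SDK. The memory ID is a placeholder you would get back from add or search; the batch helpers mentioned above are managed-platform calls [3][4]:

```python
from mem0 import Memory

m = Memory()

# Correct a memory in place ("mem_123" is a placeholder ID from add/search)
m.update(memory_id="mem_123", data="User is vegan, previously vegetarian")

# Remove one memory, or erase everything tied to a user (GDPR/CCPA-style)
m.delete(memory_id="mem_123")
m.delete_all(user_id="alice")
```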

Memory scopes:

  • User memory: Preferences and facts that persist across sessions
  • Session memory: Context within a single conversation
  • Agent memory: What the AI model itself has “learned”
  • Organization memory: Shared knowledge across an entire company [1]

Infrastructure:

  • Hybrid vector + graph storage (semantic search + relationship tracking) [2]
  • Supports OpenAI, Anthropic, and open-source LLMs for extraction
  • Built-in observability: TTL, size, and access tracking per memory [README]
  • MCP support: AI agents can update and delete their own memories [3][4]
  • Framework integrations: LangGraph, CrewAI, OpenAI SDK [README]

SDKs:

  • Python: pip install mem0ai — full-featured, including update/delete
  • JavaScript: npm install mem0ai — managed platform fully supported; OSS version missing some helpers (update, delete) [3][4]
  • REST API available for both

Benchmarks from their research paper:

  • +26% accuracy vs. OpenAI Memory on LOCOMO benchmark
  • 91% faster responses vs. full-context
  • 90% lower token usage vs. full-context [1]

Pricing: SaaS vs self-hosted math

The managed platform has four tiers (Hobby, Starter, Pro, Enterprise) but the specific prices are not published on the homepage — you have to sign up to see them [website]. This is a common pattern for developer infrastructure tools and it’s mildly annoying; pricing opacity is a friction point.

What is clear:

  • Hobby tier exists (likely free or very low cost, based on naming convention)
  • Enterprise is custom pricing, contact sales
  • The free tier exists: you can sign up at app.mem0.ai and get started without a credit card [README]

Self-hosted:

  • Software is Apache-2.0: free to use, fork, embed, and modify [README]
  • You need to run your own vector store (Qdrant, Pinecone, Chroma, pgvector, or others from their supported list)
  • You need your own LLM API key for the extraction step — this is a non-zero cost
  • No separate cost for the Mem0 library itself

The real math: The cost savings case isn’t about Mem0’s SaaS pricing — it’s about what you stop spending on LLM tokens. If you’re running a customer support bot that handles 10,000 conversations per month, each with 20-turn history, and you’re currently sending full context every time:

  • Full context at 10K tokens/request × 10K conversations × 20 turns = 2 billion tokens/month on GPT-4o at $2.50/1M = **$5,000/month** in prompt tokens alone
  • With 90% reduction: ~$500/month — a $4,500/month saving

That math is why companies integrate this. Mem0’s SaaS fee becomes noise against that background [1].
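
The arithmetic, spelled out. The pricing and the 90% figure are the assumptions stated above:

```python
# Back-of-envelope check of the numbers above (GPT-4o prompt pricing assumed)
tokens_per_request = 10_000
conversations_per_month = 10_000
turns_per_conversation = 20
usd_per_million_tokens = 2.50

monthly_tokens = tokens_per_request * conversations_per_month * turns_per_conversation
full_context = monthly_tokens / 1_000_000 * usd_per_million_tokens
with_mem0 = full_context * 0.10  # Mem0's claimed 90% token reduction

print(f"{monthly_tokens:,} tokens/month")           # 2,000,000,000
print(f"full context: ${full_context:,.0f}/month")  # $5,000
print(f"with Mem0:    ${with_mem0:,.0f}/month")     # $500
```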


Deployment reality check

Managed platform (simplest path):

  • Sign up at app.mem0.ai, get an API key, add from mem0 import MemoryClient, done (a minimal sketch follows this list)
  • No infrastructure to manage
  • Dashboard for inspecting, updating, and debugging memories
  • Setup time: under an hour for basic integration [README]
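
A minimal managed-platform sketch, assuming an API key from app.mem0.ai in the MEM0_API_KEY environment variable:

```python
import os
from mem0 import MemoryClient

# Managed platform: one client, no infrastructure to run.
client = MemoryClient(api_key=os.environ["MEM0_API_KEY"])

client.add(
    [{"role": "user", "content": "I prefer window seats on long flights."}],
    user_id="alice",
)
print(client.search("book me a flight to Tokyo", user_id="alice"))
```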

Self-hosted:

  • Install via pip install mem0ai — the library itself is lightweight
  • You configure the vector store separately (Qdrant is a common choice)
  • You need an LLM API key for extraction — Mem0 doesn’t ship a local LLM, you configure one (a fully local config is sketched after this list)
  • No dashboard out of the box — you’re working with logs and your own tooling
  • Batch operations (batch_update, batch_delete) are managed platform features; on OSS you script your own loops [3][4]
  • JavaScript OSS SDK is missing update/delete helpers as of this writing — Python SDK is more complete [3][4]
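
A hedged sketch of a fully local setup: Qdrant for vectors, Ollama for extraction and embeddings. The provider keys follow Mem0’s OSS config pattern; hosts, ports, and model names are assumptions to adjust for your deployment:

```python
from mem0 import Memory

# Fully local sketch: no external LLM or embedding API calls.
# Hosts, ports, and model names are assumptions.
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333},
    },
    "llm": {
        "provider": "ollama",
        "config": {"model": "llama3.1:8b"},
    },
    "embedder": {
        "provider": "ollama",
        "config": {"model": "nomic-embed-text"},
    },
}

m = Memory.from_config(config)
m.add("User prefers dark mode.", user_id="alice")
```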

Things that can bite you:

  • The “self-hosted” label covers the library, but you still depend on LLM API calls for extraction unless you run a local model. Full data sovereignty requires Ollama or similar plus a self-hosted vector store — more moving parts than the headline suggests.
  • Memory quality depends on extraction quality, which depends on the LLM you choose. Cheaper models produce noisier memories.
  • No built-in deduplication UI on OSS — if you’re storing contradictory memories (“user likes coffee” and “user quit coffee”), handling conflicts is your problem (a naive guard is sketched below)
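
One naive mitigation, sketched as an assumption rather than anything Mem0 ships: search for near-matches before writing and surface them for reconciliation. The helper name is hypothetical:

```python
from mem0 import Memory

m = Memory()

def add_with_conflict_check(memory, fact, user_id):
    """Hypothetical guard: surface near-duplicate or contradictory
    memories before writing, so you (or an LLM judge) can reconcile."""
    existing = memory.search(fact, user_id=user_id)
    for hit in existing.get("results", []):
        print(f"possible conflict with existing memory: {hit['memory']!r}")
    memory.add(fact, user_id=user_id)

add_with_conflict_check(m, "User quit coffee last month", user_id="alice")
```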

Realistic time to working integration:

  • Managed platform: 1-3 hours for a developer who’s done LLM development before
  • Self-hosted with Qdrant + Ollama: half a day to full day including infrastructure setup

Pros and Cons

Pros

  • Apache-2.0 license. Genuinely permissive. Use it in commercial products, embed it in your SaaS, no licensing conversations required [README].
  • Strong benchmark numbers. +26% over OpenAI Memory, 91% latency reduction, 90% token savings — published research, not just marketing claims [1].
  • Real adoption. 50K+ GitHub stars, 13M+ Python downloads, 186M API calls/month, 100K+ developers [1][README].
  • AWS Agent SDK integration. Independent third-party validation that this is production-grade [1].
  • Multi-level memory scopes. User, Session, Agent, Organization — covers the common architectures [1][2].
  • Model-agnostic. Not locked to OpenAI. Works with Anthropic, open-source models, anything with a completion API [2].
  • GDPR/CCPA delete compliance. Proper erasure support, filter-based deletion, wildcard purge [4].
  • MCP support. Agents can self-manage their memories — update preferences, delete stale facts [3][4].

Cons

  • Managed platform pricing is opaque. You have to sign up to see the actual numbers — friction for budget-conscious founders evaluating options [website].
  • OSS JavaScript SDK is incomplete. Missing update and delete helpers — Python-first project [3][4].
  • Dashboard is managed-only. Self-hosting means no visual memory inspection. You debug with logs [3].
  • Batch operations are managed-only. Bulk updates and deletes require the platform API; OSS users write their own loops [3][4].
  • Still depends on LLM API for extraction. Self-hosted doesn’t mean air-gapped unless you also run a local LLM. Adds complexity and a dependency on external services or separate infrastructure [1].
  • Memory quality depends on extraction model choice. Cheaper models produce messier memories. The benchmark numbers assume a capable extraction LLM [1][README].
  • Not a drop-in for ChatGPT memory. Mem0 solves the developer infrastructure problem, not the end-user product problem. Your AI doesn’t automatically remember things — you have to wire up the add/search calls in your application code [README].

Who should use this / who shouldn’t

Use Mem0 if:

  • You’re building an AI assistant, customer support bot, or any LLM application where users return across multiple sessions and you want them to not repeat themselves every time.
  • You’re paying significant LLM API bills driven by large context windows and want to cut them.
  • You want a production-tested solution rather than building memory extraction logic yourself.
  • You need Apache-2.0 licensing for commercial embedding.
  • You’re already on Python — the OSS SDK is mature there.

Skip it if:

  • You’re building a single-session chatbot or tool where conversation history doesn’t matter after the session ends. Plain context window management is simpler.
  • You need full data sovereignty with zero LLM API calls — that requires additional infrastructure (local LLM) that Mem0 doesn’t provide.
  • Your primary language is JavaScript and you need self-hosted — the OSS JS SDK gaps will slow you down [3][4].
  • You want a user-facing “AI that remembers” without developer integration work — Mem0 is a developer tool, not an end-user product.

Alternatives worth considering

  • OpenAI Memory (built-in). If you’re building on GPT-4o exclusively, OpenAI has native memory. Mem0 benchmarks 26% better accuracy on LOCOMO [1], but if you’re already all-in on OpenAI’s platform, the native option has zero integration friction.
  • LangChain Memory modules. If you’re already in the LangChain ecosystem, their memory abstractions cover basic use cases. Less optimized than Mem0, but one less dependency.
  • Supermemory. Developer-focused memory API with container tagging and metadata filtering for precise retrieval [5]. Comparable positioning to Mem0’s managed platform, smaller community.
  • Zep. Another memory layer focused on enterprise use cases, structured fact extraction, and dialog history management. More opinionated about conversation structure.
  • Roll-your-own with Qdrant + custom extraction. If your memory use case is narrow and well-defined, a purpose-built solution may be simpler to operate than a general-purpose framework. Budget 2-4 weeks of engineering time.
  • Pinecone + LangChain. The established production RAG stack. More general-purpose, not memory-specific — you’d build the extraction layer yourself.

Bottom line

Mem0 has the strongest evidence package in the AI memory layer space: published research, 50K+ GitHub stars, 186M monthly API calls, and independent validation via the AWS Agent SDK selection. The Apache-2.0 license and straightforward Python integration make the managed platform a reasonable first choice before you’ve validated whether memory even improves your product. The self-hosted path is real but comes with gaps — no dashboard, incomplete JS SDK, batch operations missing — so weigh those against your infrastructure preferences. For a non-technical founder, the managed platform is the pragmatic choice; for an engineering team with strong infra preferences and an existing vector store, the OSS library is solid. The core value proposition is real: if you’re burning LLM budget on full-context prompting or your users are frustrated repeating themselves, Mem0 solves a genuine problem with production-tested code.

If integrating Mem0 into your stack sounds right but the engineering setup is the blocker, that’s the kind of one-time implementation upready.dev handles for founders who’d rather ship than configure.


Sources

  1. TechCrunch, “Mem0 raises $24M from YC, Peak XV and Basis Set to build the memory layer for AI apps” (Oct 28, 2025). https://techcrunch.com/2025/10/28/mem0-raises-24m-from-yc-peak-xv-and-basis-set-to-build-the-memory-layer-for-ai-apps/
  2. MOGE, “Mem0: A self-improving memory layer for AI applications”. https://moge.ai/product/mem0
  3. Mem0 Docs, “Update Memory — mem0.ai”. https://docs.mem0.ai/core-concepts/memory-operations/update
  4. Mem0 Docs, “Delete Memory — mem0.ai”. https://docs.mem0.ai/core-concepts/memory-operations/delete
  5. Supermemory Docs, “Organizing & Filtering Memories — supermemory.ai”. https://supermemory.ai/docs/search/filtering
