Logfire
Logfire is a managed APM and observability SaaS from the Pydantic team, built for AI and LLM workloads: OpenAI calls, agents, and the application code underneath them.
Honestly reviewed. Not a self-hosted backend — but a genuinely cheap way to watch your LLM app fall apart in production.
TL;DR
- What it is: AI observability SaaS from the Pydantic team — traces, metrics, and logs for LLM apps, AI agents, and the application code underneath them, built on OpenTelemetry [4].
- Who it’s for: Python developers and AI founders building production LLM applications who need to see why their agent broke — not just that it broke [4][5].
- The license trap: The SDK is MIT-licensed (4,110 GitHub stars). The server that receives and displays your data is closed source and cloud-only. You cannot self-host Logfire the same way you can self-host Activepieces or n8n.
- Cost savings (vs alternatives): LangSmith at 50M spans/month for 5 users runs ~$5,170/mo. Logfire for the same workload: ~$129/mo. That’s roughly 40x cheaper [4]. Against Arize AX Pro: ~8x cheaper [5].
- Free tier: 10 million spans/month for one user — no credit card required [4].
- Key strength: Full-stack traces that show your LLM call and the database query that preceded it and the API timeout that followed it — in a single unified view [4][5].
- Key weakness: The backend is not open source. You’re trading one SaaS bill for a smaller one, not escaping SaaS entirely.
What is Logfire
Logfire is an observability platform built by the team behind Pydantic, the Python data validation library that runs inside nearly every Python AI application ever shipped. The pitch is simple: if you already use Pydantic to validate your data structures, you can use Logfire to watch them move through production [README].
Three lines of Python is the standard install story:
```python
import logfire
logfire.configure()
logfire.instrument_fastapi(app)
```
That gets you distributed traces, span timing, error events, and a live dashboard at pydantic.dev/logfire. Everything goes to Logfire’s cloud over OpenTelemetry’s wire format — which is the part worth understanding before you assume “MIT license” means “self-hostable” [README].
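For context, here is what the three-line story looks like inside an actual app. A minimal runnable sketch, assuming FastAPI is installed and a Logfire project token is already configured (via `logfire auth` or the `LOGFIRE_TOKEN` environment variable); the `/health` route is invented for illustration.

```python
import logfire
from fastapi import FastAPI

app = FastAPI()

logfire.configure()               # picks up credentials from the environment
logfire.instrument_fastapi(app)   # every request/response becomes a trace

@app.get("/health")
async def health():
    logfire.info("health check")  # shows up as a log inside the request trace
    return {"ok": True}
```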
The open/closed boundary: The GitHub repository at pydantic/logfire contains the Python SDK, documentation, and testing utilities. The README states plainly: “the server application for recording and displaying data is closed source.” There is also a TypeScript SDK (pydantic/logfire-js) and a Rust SDK. All the SDKs are open source. The backend that stores and queries your spans is not. Logfire is, in practice, a managed SaaS with an open-source client library [README].
This matters for the unsubbed.co audience. If you’re looking to escape a $300/mo LangSmith bill, Logfire delivers. If you want to own the entire stack and run it on a $10 VPS, look at Grafana + Tempo or SigNoz instead.
Why people choose it
The comparison pages Logfire publishes against LangSmith and Arize AX are self-serving by nature, but the underlying numbers are verifiable and the structural arguments hold up.
The core complaint about LangSmith is the same one that drives people off Zapier: usage-based pricing that compounds fast. LangSmith’s free tier covers 5,000 traces/month. One reasonably active AI app blows past that in a day. Once you’re on their paid plan at $39/seat plus $2.50/1,000 trace overage, a team of 5 developers with 50M spans/month is looking at ~$5,170/mo [4]. Logfire for the same workload: ~$129/mo. That’s not a marginal improvement — it’s a different category of cost [4].
The complaint about Arize AX is the dual-axis billing: $10/million spans plus $3/GB of payload. For RAG architectures that attach retrieval context to every trace, the payload charge adds up fast in a way that’s hard to predict in advance [5]. Logfire charges $2/million spans flat, no payload surcharge [5].
The structural advantage that makes these savings real is full-stack observability. Most AI observability tools trace LLM calls — the request, the response, the token count. Logfire traces the LLM call inside the broader application trace that includes the database query, the HTTP call to a third-party API, the vector search, and the Pydantic validation that ran before any of it. When something breaks in production, “the LLM responded slowly” is rarely the root cause. Usually it’s a 400ms PostgreSQL query or a flaky upstream API. Logfire shows you all of it in one trace [4][5].
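To make that concrete, here is a hedged sketch of what one full-stack trace looks like in code. The handler, table name, and model choice are invented; `instrument_asyncpg` and `instrument_openai` are real SDK calls, but verify current signatures against the docs.

```python
import asyncpg
import logfire
from openai import AsyncOpenAI

logfire.configure()
logfire.instrument_asyncpg()       # asyncpg queries become child spans
client = AsyncOpenAI()
logfire.instrument_openai(client)  # LLM calls become child spans too

async def answer(question: str, db: asyncpg.Connection) -> str:
    # Everything below lands in a single trace: the retrieval query,
    # the LLM call, and our own span wrapping the whole request.
    with logfire.span("answer question"):
        rows = await db.fetch("SELECT body FROM docs LIMIT 5")  # hypothetical table
        context = "\n".join(r["body"] for r in rows)
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": f"{context}\n\n{question}"}],
        )
        return resp.choices[0].message.content
```

If the PostgreSQL query takes 400ms, you see it as a child span sitting right next to the LLM call instead of guessing from aggregate latency.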
The Pydantic integration is the differentiator that doesn’t exist anywhere else. If you’re using Pydantic models to validate your LLM outputs — and most Python AI apps do — Logfire gives you built-in analytics on validation failures: how often models produce data that fails your schema, which fields fail most, when failure rates spike. No other tool has this because no other tool is built by the same team [README][homepage].
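A sketch of what that looks like in practice. `instrument_pydantic` is the SDK's documented hook for this; the `record="failure"` value is recalled from the docs and the `Invoice` model is made up, so treat the details as illustrative.

```python
import logfire
from pydantic import BaseModel, ValidationError

logfire.configure()
logfire.instrument_pydantic(record="failure")  # report only failed validations

class Invoice(BaseModel):  # hypothetical schema for LLM output
    total: float
    currency: str

try:
    # If the model returns "twelve" instead of a number, the failed
    # validation (with field-level errors) is recorded in Logfire.
    Invoice.model_validate({"total": "twelve", "currency": "USD"})
except ValidationError:
    pass  # retry or repair as usual; the failure analytics already captured it
```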
Features
Core observability:
- Distributed tracing with OpenTelemetry — full traces across services, languages, and frameworks [4][5]
- Logs and metrics in the same platform — all three OTel signal types [README]
- Live view showing spans arriving in real-time, filterable by level, service, scope, and tags [1]
- SQL query interface (PostgreSQL-compatible) — you write `WHERE` clauses against your spans, not a proprietary filter DSL [4] (see the sketch after this list)
- AI-powered query helper: describe what you want in English, get a SQL query generated by Pydantic AI [1]
- Span search with autocomplete schema hints and a reference panel of pre-built query clauses [1]
- Timeline histogram showing span counts over time [1]
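Here is a hedged sketch of using that SQL interface programmatically, via the SDK's experimental query client. The module path, the `records` table, and the column names reflect the docs at the time of writing and may change; `LOGFIRE_READ_TOKEN` is just this sketch's name for the read token you create in the Logfire UI.

```python
import os
from logfire.experimental.query_client import LogfireQueryClient

with LogfireQueryClient(read_token=os.environ["LOGFIRE_READ_TOKEN"]) as client:
    # Plain PostgreSQL-flavored SQL against your spans, no custom DSL.
    slow_spans = client.query_json_rows(
        sql="""
        SELECT span_name, duration, attributes
        FROM records
        WHERE duration > 2.0  -- seconds
        ORDER BY duration DESC
        LIMIT 20
        """
    )
```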
AI-specific:
- Native instrumentation for Pydantic AI, OpenAI, Anthropic, and other LLM providers [homepage]
- Agent behavior tracing — see the reasoning loop, tool calls, and intermediate outputs of an AI agent [homepage]
- Real-time cost tracking: token usage, per-prompt spend, model selection analytics [homepage]
- Integrated evaluation via Pydantic Evals — curate datasets from production traces, run evals continuously, catch regressions [homepage]
- MCP server: Logfire exposes your production data as an MCP tool — you can query your traces from Claude Code, Cursor, or Windsurf without leaving your editor [4][5]
Language support:
- Python (first-class — FastAPI, Django, SQLAlchemy, asyncpg, httpx, and many more via `logfire.instrument_*` methods) [README]
- JavaScript/TypeScript (Node.js, Next.js, browsers, Cloudflare Workers, Deno) via separate SDK [homepage]
- Rust via separate SDK [homepage]
- Any OTel-compatible language via standard OpenTelemetry — Go, Java, Ruby, etc. [4]
Testing utilities:
- `logfire.testing` module with `CaptureLogfire` pytest fixture — lets you assert against spans in tests with plain dicts, including span hierarchy, attributes, and exception events [2] (see the sketch after this list)
- `TestExporter` with `exported_spans_as_dict()` — readable, deterministic output suitable for snapshot testing [2]
- `InMemoryMetricReader` for testing metrics collection in isolation [2]
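A minimal sketch of those utilities in use, assuming pytest picks up the `capfire` fixture the SDK registers; the span names are invented.

```python
import logfire
from logfire.testing import CaptureLogfire

def test_checkout_is_traced(capfire: CaptureLogfire) -> None:
    with logfire.span("checkout"):  # hypothetical span under test
        logfire.info("charging card")

    names = [s["name"] for s in capfire.exporter.exported_spans_as_dict()]
    assert "checkout" in names
    assert "charging card" in names
```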
Enterprise features:
- Data retention: 30 days default, 90 days on Growth plan; custom on Enterprise [4]
- SOC 2 Type II in progress (not confirmed complete as of this writing — verify with vendor)
- Custom data retention and support SLAs on Enterprise tier
Pricing: the actual math
Logfire’s pricing is span-based rather than seat-based for teams of up to 5 users:
| Plan | Spans | Seats | Price |
|---|---|---|---|
| Free | 10M/mo | 1 | $0 |
| Team/Growth | Pay-as-you-go | Up to 5 included | $2/million spans |
| Growth (>5 seats) | Pay-as-you-go | $25/seat beyond 5 | $2/million spans + seat fees |
| Enterprise | Custom | Custom | Contact sales |
Data from [4][5]. Verify current rates at pydantic.dev/logfire.
Workload comparisons against alternatives [4][5]:
| Workload | LangSmith | Arize AX | Logfire | Logfire savings vs LangSmith |
|---|---|---|---|---|
| 1 user, 5M spans/mo | ~$1,238 | ~$99 | $0 (free) | ~$1,238/mo |
| 5 users, 50M spans/mo | ~$5,170 | ~$999 | ~$129 | ~40x |
| 20 users, 500M spans/mo | ~$125,755 | ~$12,249 | ~$1,229 | ~100x |
The free tier is genuinely useful. 10M spans/month for one user covers a solo founder running an LLM app in early production. You can instrument FastAPI, your database calls, and your OpenAI requests and stay inside the free tier for months while validating product-market fit.
What this isn’t: a path to $6/mo self-hosted observability. You’re still on someone else’s infrastructure. The savings are real, but they’re savings relative to LangSmith and Arize, not relative to self-hosting Prometheus.
Deployment reality check
“Deployment” for Logfire is `pip install logfire` and three lines of Python. There is no server to provision, no Docker Compose to debug, no Postgres to connect. The SDK sends spans over OTLP to Logfire’s cloud [README].
This is both the product’s greatest convenience and its fundamental constraint. For a non-technical founder who wants to know what their AI app is doing in production without hiring a DevOps engineer, this is close to zero friction. For someone who wants full data sovereignty or needs to run inside a private network, the architecture is a dealbreaker.
What can go sideways:
- You’re sending production traces — potentially including user inputs and LLM responses — to Logfire’s servers. If your compliance requirements prohibit this, you need a self-hosted alternative.
- Free-tier retention appears to be 30 days (the pricing page implies this; confirm the exact figure before relying on it), which limits historical debugging windows unless you pay for extended retention [4].
- Logfire is a younger product than Datadog or New Relic. It lacks the breadth of pre-built dashboards, alert integrations, and incident management features that mature ops teams expect.
- The MCP server feature — querying your traces from a coding assistant — is genuinely novel but also means your AI assistant can read your production observability data. Think through the permissions model before enabling this in a team setting.
Pros and Cons
Pros
- Full-stack traces, not LLM-only. Databases, APIs, agent reasoning, validation — one trace shows the complete picture. This is the real differentiator [4][5].
- Dramatically cheaper than LangSmith and Arize at any meaningful scale. 40-100x less expensive at the workloads where those tools get painful [4].
- Free tier is actually usable. 10M spans/month for one user is enough to instrument a real production app [4].
- Built on OpenTelemetry. Your instrumentation is portable. If you outgrow Logfire or the product changes, you point your OTel exporter elsewhere (see the sketch after this list). No proprietary lock-in at the data layer [4][5].
- SQL query interface. Standard PostgreSQL syntax to query spans — AI coding assistants write excellent SQL, which means you can ask arbitrary questions about production behavior without learning a new query DSL [4][5].
- MCP server. Query your production traces from Claude Code, Cursor, or Windsurf without leaving your editor [4][5].
- Pydantic integration. Built-in validation analytics for teams using Pydantic models — nobody else has this [README].
- Testing utilities are first-class. The `CaptureLogfire` pytest fixture lets you write deterministic assertions against spans, which most observability tools don’t support [2].
- Python-centric but polyglot. First-class Python support with first-party JS/TS and Rust SDKs, plus any OTel-compatible language [homepage].
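On the OpenTelemetry point above: the docs describe pointing the SDK at any OTLP-compatible backend. A hedged sketch, assuming a local collector listening on the standard OTLP/HTTP port:

```python
import os
import logfire

# Standard OpenTelemetry env var: any OTLP backend (SigNoz, Jaeger,
# Grafana Tempo behind a collector) can receive the same data.
os.environ.setdefault("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")

logfire.configure(send_to_logfire=False)  # keep instrumentation, skip the cloud

with logfire.span("portable span"):
    logfire.info("exported wherever the env var points")
```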
Cons
- Backend is closed source. The SDK is MIT, but you cannot self-host the platform. You are on their managed cloud [README].
- Not a self-hosted escape hatch. This review site’s usual audience wants to escape SaaS. Logfire is cheaper SaaS, not zero-SaaS.
- Younger product than competitors. Feature breadth for alerting, on-call integration, anomaly detection, and enterprise access controls is not at Datadog/Grafana Cloud levels yet.
- Data leaves your infrastructure. LLM inputs, outputs, and agent reasoning show up in Logfire’s cloud. If you’re in a regulated industry or promised users their data stays on-prem, this requires a harder conversation.
- No human annotation or feedback workflows. LangSmith has mature tooling for labeling traces and building evaluation datasets in a UI. Logfire’s eval story is code-first via Pydantic Evals [4].
- OpenTelemetry familiarity helps. The “3 lines of code” pitch is true for simple cases. Wiring up custom spans, baggage propagation, and multi-service tracing still requires understanding how OTel works.
- Data retention cliff on free tier. 30 days is enough for debugging current incidents; it’s not enough for trend analysis [4].
Who should use this / who shouldn’t
Use Logfire if:
- You’re building a Python LLM app (FastAPI + OpenAI, Pydantic AI, any LLM framework) and you need to see what’s actually happening in production.
- You’re currently paying LangSmith more than ~$100/mo and your app isn’t on LangChain/LangGraph specifically.
- You want one tool for both AI tracing and application-level observability (databases, HTTP, background jobs) instead of stitching together two separate platforms.
- You want to query production traces with SQL from your editor via an MCP connection.
- You’re a solo founder in early production and want a meaningful free tier before you commit to paying for observability.
Skip it, look at SigNoz or Grafana Cloud instead, if:
- You want complete data sovereignty and the backend must run on your own infrastructure.
- Your team’s stack is polyglot-first and Python isn’t the center of gravity — Logfire’s best-in-class integrations are Python-first.
- You need mature alerting, PagerDuty/OpsGenie integration, and SLA monitoring that enterprise ops teams expect.
- Your compliance team won’t approve sending traces to a third-party SaaS.
Skip it, stay on LangSmith, if:
- You’re deeply invested in LangChain or LangGraph and the native graph state visualization matters — LangSmith renders LangGraph execution graphs visually; Logfire doesn’t [4].
- Your team has non-engineering members who need annotation queues and human feedback workflows baked into the UI [4].
Skip it, look at Braintrust, if:
- Your primary need is structured evaluation workflows with dataset management, scoring, and human-in-the-loop review — Braintrust’s evaluation tooling is more mature than Logfire’s current code-first Pydantic Evals approach [3].
Alternatives worth considering
- LangSmith — the most direct comparison. Better LangGraph support, human annotation workflows, and a more mature evaluation UI. Dramatically more expensive at scale. Choose it if you’re on LangChain and the ecosystem integration matters [4].
- Arize AX / Phoenix — Arize has two products: AX (commercial SaaS) and Phoenix (open source, self-hostable). Phoenix is worth serious consideration if you want LLM-specific tracing with real self-hosting. AX is more expensive than Logfire with payload-based billing [5][3].
- Braintrust — AI evaluation and monitoring with a purpose-built trace database (Brainstore), a Loop automation agent, and mature dataset/annotation workflows. Free tier: 1M spans. Pro: $249/mo. Better for eval-centric teams [3].
- SigNoz — fully open source (AGPL), self-hostable, OpenTelemetry-native, general-purpose observability (traces, metrics, logs). No LLM-specific features, but if you want to own the stack, this is the honest answer.
- Grafana + Tempo + Loki — the self-hosted stack for teams with infrastructure experience. More setup, more control, zero SaaS bills.
- New Relic / Datadog — the incumbents. Feature-complete, expensive, and not AI-specific. New Relic has a generous free tier but the lock-in is deep.
For an AI founder choosing between paying for observability and paying less: Logfire vs LangSmith is the practical question for Python shops. For teams that want zero SaaS: SigNoz or Phoenix.
Bottom line
Logfire is a genuinely well-engineered product from a team with a strong track record — the Pydantic library pulls 560M+ downloads a month, which means these developers understand what production Python looks like at scale. The observability platform reflects that: full-stack traces, SQL queries, an MCP server, and pricing that makes LangSmith’s billing look almost comedic by comparison.
But be clear-eyed about what you’re buying. The “MIT license” in the GitHub header refers to the SDK. The backend is closed source and cloud-hosted. This isn’t a self-hosting story — it’s a “pay a fraction of what you’re paying LangSmith” story. For solo founders and small teams building LLM applications in Python who are currently flying blind (or paying $200+/mo for the privilege of not flying blind), the free tier is worth an afternoon and the paid tier is worth the math.
If you’re hitting Logfire’s free tier limits and want someone to evaluate whether the paid tier makes sense for your architecture, upready.dev works with AI founders on exactly this.
Sources
- [1] Logfire Docs — Live View: Monitor Spans & Traces in Real Time (pydantic.dev). https://logfire.pydantic.dev/docs/guides/web-ui/live/
- [2] Logfire Docs — Testing Logfire Instrumentation (pydantic.dev). https://logfire.pydantic.dev/docs/reference/advanced/testing/
- [3] EveryDev.ai — Braintrust AI Evaluation and Monitoring Platform (everydev.ai). https://www.everydev.ai/tools/braintrust
- [4] Pydantic Logfire — Logfire vs LangSmith: Production-Grade AI Observability (pydantic.dev). https://pydantic.dev/logfire/vs-langsmith
- [5] Pydantic Logfire — Logfire vs Arize AX: Developer-First AI Observability (pydantic.dev). https://pydantic.dev/logfire/vs-arize
Primary sources:
- GitHub repository ([README]): https://github.com/pydantic/logfire (4,110 stars, MIT SDK license)
- Official website ([homepage]): https://pydantic.dev/logfire
- Pricing reference (embedded in comparison pages): https://pydantic.dev/logfire/vs-langsmith