Langfuse
LLM engineering platform for tracing, prompt management, evaluation, and analytics of AI applications.
Open-source LLM engineering, honestly reviewed. No marketing fluff, just what you get when you self-host it.
TL;DR
- What it is: Open-source (MIT) LLM engineering platform — tracing, prompt management, evaluations, and cost tracking for AI applications, with the source code on your server and no per-seat fees [2][4].
- Who it’s for: AI/ML teams at startups and enterprises building LLM applications who want production-grade observability without vendor lock-in or headcount-based billing [1][2].
- Cost savings: LangSmith (the primary alternative) charges $39/seat/month — a 5-person team pays $195/month, $2,340/year. Langfuse self-hosted runs on a $10–20/month VPS with no per-seat cost at all [1][2].
- Key strength: Framework-agnostic and genuinely MIT-licensed. Works with LangChain, LlamaIndex, OpenAI, Anthropic, and 80+ other integrations via native SDKs and OpenTelemetry. Unlimited usage on self-hosted — no rate limits, no unit caps [1][4].
- Key weakness: SSO, project-level RBAC, audit logs, and data retention policies are not in the open-source version — they require the Enterprise tier (custom pricing, bundled with ClickHouse). For teams needing governance, the self-hosted free tier hits a real wall [3][4].
What is Langfuse
Langfuse is an open-source platform for debugging and improving LLM applications in production. You instrument your application with the SDK, and Langfuse captures every LLM call, chain step, tool invocation, and agent loop — surfacing costs, latency, errors, and output quality across your entire trace history. The company describes it as an “Open Source LLM Engineering Platform” and that description is unusually accurate [website].
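To make “instrument your application with the SDK” concrete, here is a minimal Python sketch using the `@observe()` decorator and the drop-in OpenAI wrapper from the Langfuse docs. Import paths are v3-style and the model name is just an example, so treat it as a sketch rather than copy-paste gospel:

```python
# Minimal tracing sketch (v3-style imports; v2 used langfuse.decorators).
# Assumes LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST
# are set in the environment.
from langfuse import observe
from langfuse.openai import OpenAI  # drop-in wrapper that auto-logs LLM calls

client = OpenAI()

@observe()  # the decorated function becomes a trace; the LLM call nests under it
def answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(answer("What did my agent actually do last night?"))
```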
Three things make it stand out from the growing pile of LLM observability tools. First, the MIT license on the community edition — not the “fair-code” or “sustainable use” licenses you often see in this space, but actual MIT, meaning you can self-host without calling a lawyer [1][4]. Second, usage-based pricing with no per-seat model — a team of 10 costs the same as a team of 2 as long as the monthly unit volume is the same, a meaningful difference when your team is growing [1][2]. Third, the project is backed by real infrastructure: Langfuse was acquired by ClickHouse, and the backend leverages ClickHouse’s OLAP performance for high-throughput trace queries at scale [5].
The project started as a YC W23 company and currently sits at 23,334 GitHub stars. It integrates with over 80 frameworks and providers via native SDKs (Python and JavaScript), OpenTelemetry (Java, Go, custom), and proxy-based logging via LiteLLM [1][5].
Why people choose it over LangSmith, Arize, and the alternatives
The three reviews and comparison guides we synthesized land in roughly the same place: Langfuse wins on openness, pricing model, and infrastructure control, and loses on LangChain-native depth and enterprise governance.
Versus LangSmith. LangSmith is the obvious comparison because it’s the most widely used alternative. The trade-off is clear: LangSmith is better if your entire stack runs on LangChain or LangGraph, where setup is one environment variable and tracing “just works” for chain and agent workflows. But LangSmith charges $39/seat/month at the Plus tier [1]. A 5-person team that grows to 15 sees its observability bill triple without touching more infrastructure. Langfuse charges per usage unit, not per developer — and self-hosted charges nothing at all.
The leanware.co comparison [2] puts it plainly: “Its MIT-licensed open-source model works well for teams that want control over their data or need to adapt the platform to their workflows.” The LangSmith bet is convenience; the Langfuse bet is ownership.
Versus Arize AX and Phoenix. Arize AX is proprietary enterprise SaaS — no self-hosting parity [5]. Arize Phoenix exists as open-source but is explicitly positioned for local development and testing only, not production observability. Langfuse’s architecture (ClickHouse backend) is built for production-scale OLAP from the start, while Phoenix uses PostgreSQL and isn’t designed for that load [5].
On pricing, Arize AX charges based on span counts plus data ingestion volume in GB. For LLM apps with long context windows, that data ingestion pricing can surprise you. Langfuse charges per “unit” (a trace, observation, or score), which is more predictable and doesn’t punish you for large payloads [5].
On the self-hosted parity question. One Reddit thread [3] captures the most important concern non-infrastructure teams bring: “We had a bad experience with PostHog self-hosted earlier this year (great product when they host the app though!)” — basically asking whether Langfuse’s self-hosted version is a first-class citizen or a stripped-down demo. The self-hosted pricing page [4] answers clearly: all core platform features (tracing, evaluations, prompt management, datasets, playground, API) are included with no usage limits. What you don’t get in OSS is SSO, project-level RBAC, data retention management, audit logs, and the SCIM API. For a production engineering team of under 20, that’s fine. For a regulated enterprise with identity management requirements, that’s the upgrade trigger.
Features: what it actually does
Based on the documentation, README, and pricing comparison pages:
Core observability:
- Distributed tracing across LLM calls, chains, tool uses, and agent loops [1][2]
- Session tracking for multi-turn chats/threads and user-level tracking [4]
- Token and cost tracking across 100+ models [5]
- Latency and throughput metrics [1]
- Multi-modal support (in beta on cloud) [pricing page]
Prompt management:
- Versioned prompt storage with release management (fetch-and-compile sketch after this list) [4]
- Prompt composability (build prompts from prompt components) [4]
- Server-side and client-side prompt caching [4]
- Protected deployment labels — staging vs. production [4]
- Playground for testing prompts against live models [4]
- Webhooks and Slack notifications on prompt deployments [4]
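A hedged sketch of what the versioning-plus-labels workflow looks like in code, using the documented `get_prompt` / `compile` flow (the prompt name and template variable are hypothetical):

```python
# Fetch the prompt version labeled "production" and fill its {{variables}}.
# "movie-critic" and the "movie" variable are hypothetical examples.
from langfuse import Langfuse

langfuse = Langfuse()  # reads API keys and host from environment variables

prompt = langfuse.get_prompt("movie-critic", label="production")
compiled = prompt.compile(movie="Dune")  # substitutes {{movie}} in the template
print(compiled)
```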
Evaluations:
- Datasets for offline evaluation [4]
- Experiments via SDK and UI [4]
- Custom evaluation scores (scoring sketch after this list) [4]
- User feedback tracking [4]
- External evaluation pipeline support [4]
- LLM-as-judge evaluators [4]
- Human annotation queues (1 queue on Hobby, unlimited on paid) [pricing page]
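For the custom-scores bullet, attaching a score to a logged trace is a couple of lines. A hedged sketch (v3-style `create_score`; older SDK versions expose a similar `score()` method, and the ID and values here are hypothetical):

```python
# Attach a custom evaluation score to an existing trace.
from langfuse import Langfuse

langfuse = Langfuse()

langfuse.create_score(
    trace_id="abc-123",   # hypothetical ID of a previously logged trace
    name="helpfulness",   # your own score dimension
    value=0.8,            # numeric here; boolean/categorical also supported
    comment="flagged by internal review",
)
langfuse.flush()  # scores are batched; flush before a short-lived script exits
```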
Integrations:
- Native SDKs for Python and JavaScript [1]
- OpenTelemetry for Java, Go, and any custom integration [1][4]
- LangChain, LlamaIndex, OpenAI, Anthropic, and 80+ others (LangChain sketch after this list) [1][5]
- LiteLLM proxy-based logging [4]
- PostHog and Mixpanel data exports [pricing page]
- Scheduled batch export to blob storage (Core+ plans) [pricing page]
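To show what “explicit SDK instrumentation” means for LangChain users (a con discussed later), here is a hedged sketch of the callback-handler integration. The import path is v3-style and the model is an example:

```python
# Pass Langfuse's callback handler into any LangChain invocation.
from langfuse.langchain import CallbackHandler  # v3 path; v2: langfuse.callback
from langchain_openai import ChatOpenAI

handler = CallbackHandler()
llm = ChatOpenAI(model="gpt-4o-mini")

# One extra config kwarg versus LangSmith's zero-config env var: this is
# the whole "explicit instrumentation" trade-off in practice.
result = llm.invoke(
    "Summarize this review in one line.",
    config={"callbacks": [handler]},
)
print(result.content)
```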
Deployment options (self-hosted):
- Docker Compose for local and single-server installs (quickstart shown after this list) [4]
- Kubernetes with Helm charts [4]
- Terraform templates for AWS, Azure, and GCP [4]
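The Docker Compose path is genuinely short. Per the self-hosting docs, the quickstart looks like this (the compose file bundles the web app, worker, PostgreSQL, ClickHouse, Redis, and MinIO):

```bash
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d
# UI defaults to http://localhost:3000; change the default secrets in the
# compose file before exposing anything beyond localhost
```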
Pricing: SaaS vs self-hosted math
Langfuse Cloud:
- Hobby: Free — 50k units/month, 30-day data retention, 2 users, community support [pricing page]
- Core: $29/month — 100k units/month included, additional at $8/100k, 90-day retention, unlimited users, in-app support [pricing page]
- Pro: $199/month — 100k included + overage pricing, 3-year data retention, high rate limits, SOC 2 and ISO 27001 reports, HIPAA BAA available [pricing page]
- Teams add-on: $300/month on top of Pro — Enterprise SSO, SSO enforcement, project-level RBAC, dedicated Slack support [pricing page]
- Enterprise: $2,499/month — audit logs, SCIM API, custom rate limits, uptime SLA, dedicated support engineer [pricing page]
Self-hosted (OSS):
- Software license: $0 (MIT) [4]
- VPS: $10–20/month on Hetzner or Contabo
- ClickHouse: bundled in the Docker Compose setup, or self-managed
LangSmith for comparison:
- Developer: Free — 5k traces/month
- Plus: $39/seat/month — 10k traces/month per seat, then $10/1k additional traces
- Enterprise: custom pricing
Concrete savings math:
A team of 5 engineers logging ~200k traces/month. On LangSmith Plus: 5 × $39 = $195/month, $2,340/year. On Langfuse Core: $29/month base (100k units included) plus ~$8 in overage for the extra 100k, so ~$37/month total. Self-hosted Langfuse: roughly $15/month for a VPS, with no unit limits.
That’s $2,340/year (LangSmith) vs. $444/year (Langfuse Cloud) vs. $180/year (self-hosted). For a 10-person team, LangSmith doubles to $4,680/year while Langfuse stays flat on usage-based pricing.
One important caveat: Langfuse’s “unit” definition (a trace, observation, or score) means complex agentic workflows with many sub-spans can accumulate units quickly. The pricing calculator on langfuse.com/pricing helps estimate real costs before committing [pricing page].
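For a rough sense of scale (hypothetical numbers): an agent run that emits 1 trace, 20 observations, and 2 scores is 23 units, so 10,000 such runs a month is ~230k units. On Core, that’s the $29 base plus ~$10 in overage for the extra ~130k units, call it ~$39/month.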
Deployment reality check
The Reddit thread [3] from a PM (not an infra engineer) evaluating self-hosting captures the real concern: will the self-hosted version quietly degrade into a second-tier product as the company focuses on cloud revenue? Based on the self-hosted pricing page [4], the OSS tier includes the full observability stack with no usage limits — that’s an honest feature parity claim and different from the PostHog pattern the commenter referenced.
What you actually need:
- A Linux VPS with at least 4GB RAM (ClickHouse is memory-hungry under any real trace volume)
- Docker and docker-compose
- A domain and reverse proxy (Caddy or nginx) for HTTPS (minimal Caddyfile after this list)
- PostgreSQL and ClickHouse (both bundled in the default Docker Compose)
- Redis (also bundled)
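On the reverse proxy item, a minimal Caddyfile is about as small as HTTPS setup gets. A sketch, assuming the default web port of 3000 and a placeholder domain:

```
langfuse.example.com {
    reverse_proxy localhost:3000  # Caddy provisions TLS certificates automatically
}
```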
What can go sideways:
The Azure OpenAI question from Reddit [3] is worth addressing directly: Langfuse doesn’t list a dedicated Azure OpenAI integration on its integrations page, but because it supports OpenTelemetry and ships an OpenAI SDK wrapper, Azure OpenAI calls can be traced by pointing the OpenAI client at the Azure endpoint while still using Langfuse’s Python SDK. The commenter noted they couldn’t find a direct Azure integration — that’s still partially true as of early 2026, but the gap amounts to a workaround, not a blocker.
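A hedged sketch of that workaround, using Langfuse’s drop-in wrapper around the `AzureOpenAI` client class (endpoint, deployment name, and API version are placeholders):

```python
# Azure OpenAI calls traced via Langfuse's OpenAI SDK wrapper.
from langfuse.openai import AzureOpenAI  # drop-in replacement, auto-traced

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="...",             # or set AZURE_OPENAI_API_KEY
    api_version="2024-02-01",  # placeholder; match your deployment
)

response = client.chat.completions.create(
    model="my-gpt4o-deployment",  # Azure deployment name, not a model name
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```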
SSO is not in OSS. This is the most common trap. The self-hosted OSS tier includes organization-level RBAC and Google/AzureAD/GitHub social login. It does not include Enterprise SSO (Okta, AzureAD/EntraID), SSO enforcement, or project-level RBAC [4]. If your company mandates Okta, you’re looking at the Enterprise tier, which is custom-priced and bundled with ClickHouse’s commercial plan — a meaningful jump.
ClickHouse acquisition. Langfuse is now part of ClickHouse [website]. That’s generally a positive signal for infrastructure maturity and funding, but it introduces a long-term strategic question: will the OSS version remain a first-class citizen, or will the Enterprise tier become the real product? As of this writing, the MIT license and OSS feature parity are intact. But it’s worth watching.
Realistic install time: a technical user comfortable with Docker Compose can have a working instance in 45–90 minutes. A PM following a step-by-step guide should budget half a day, including domain setup and the ClickHouse tuning that large trace volumes eventually require.
Pros and cons
Pros
- Actually MIT-licensed core. The full feature set — tracing, evaluations, prompt management, datasets, playground — is MIT and self-hostable with no usage caps [1][4]. Not “open core with the real features paywalled.”
- No per-seat pricing. Adding engineers to your team doesn’t raise your bill. Pricing scales with application usage, not headcount — a real differentiator over LangSmith [1][2].
- Framework-agnostic. 80+ integrations via native SDKs and OpenTelemetry. LangChain, LlamaIndex, OpenAI, Anthropic, and custom stacks all work [1][5].
- ClickHouse backend. Built for OLAP scale from the start, not retrofitted. High-throughput trace ingestion and fast historical queries [5].
- Genuinely unlimited on self-hosted. No rate limits, no unit caps, no 30-day data retention cutoff — the OSS tier has none of those constraints [4].
- OpenTelemetry native. Trace data flows in standard OTel format, which reduces vendor lock-in and enables custom tooling on top [5].
- Transparent pricing. Cloud pricing is published with a calculator. No “contact sales for real numbers” until Enterprise tier.
Cons
- SSO, audit logs, and project-level RBAC are Enterprise-only. The community edition is missing the governance features that companies past ~20 people typically require. The jump from OSS to Enterprise is an undisclosed custom price, not a published number [3][4].
- ClickHouse resource requirements. Running ClickHouse on a shared $5 VPS is a bad time. You need real RAM. The Docker Compose default bundles it, but you’ll notice it under production load.
- No native simulation or regression testing for conversational agents. If you’re building voice or chat AI agents and want end-to-end conversation simulation before deployment, Langfuse doesn’t provide that natively — it traces what happened, it doesn’t simulate what might happen [1].
- LangChain/LangGraph tracing isn’t as seamless as LangSmith. LangSmith is built by the LangChain team and enables automatic tracing with one environment variable. Langfuse requires explicit SDK instrumentation. For LangChain-heavy teams, that’s extra work [2].
- Advanced alerting requires external tooling. Langfuse surfaces metrics and can trigger webhooks, but configuring threshold-based alerting (e.g., latency spikes, error rate thresholds) still requires wiring up external monitoring (polling sketch after this list). LangSmith has this natively [1].
- Azure OpenAI isn’t a first-class integration. Workarounds exist, but teams running primarily on Azure OpenAI face some friction in getting clean tracing out of the box [3].
- ClickHouse acquisition introduces uncertainty. The OSS-first positioning has held so far, but the long-term incentives of a ClickHouse-owned commercial product are worth monitoring if you’re making a multi-year infrastructure bet.
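On the alerting gap flagged above: the minimum viable version is a cron job polling Langfuse’s public Metrics API. A hypothetical sketch against the documented daily-metrics endpoint, where the budget threshold, the ordering assumption, and the notification step are all mine, not Langfuse’s:

```python
# Poll daily usage/cost and alert past a budget threshold (hypothetical).
import os
import requests

resp = requests.get(
    f"{os.environ['LANGFUSE_HOST']}/api/public/metrics/daily",
    auth=(os.environ["LANGFUSE_PUBLIC_KEY"], os.environ["LANGFUSE_SECRET_KEY"]),
    timeout=30,
)
resp.raise_for_status()

day = resp.json()["data"][0]  # assumes most-recent-day-first ordering
if day["totalCost"] > 50.0:   # hypothetical $50/day budget
    # wire this up to Slack, PagerDuty, email, whatever you already run
    print(f"ALERT: {day['date']} spend ${day['totalCost']:.2f} over budget")
```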
Who should use this / who shouldn’t
Use Langfuse if:
- You’re building LLM applications and running LangSmith at $39/seat/month while your engineering team grows — the migration pays for itself within months.
- You need genuine MIT self-hosting for data sovereignty, compliance, or cost control, and your governance requirements don’t mandate SSO/RBAC.
- Your stack is framework-agnostic or uses multiple providers (OpenAI, Anthropic, LiteLLM, custom models) — Langfuse’s breadth of integrations handles mixed stacks cleanly.
- You’re a startup or early-stage team that wants production-grade observability without committing to per-seat costs before you know team size.
Skip it (use LangSmith) if:
- Your entire application is built on LangChain or LangGraph and you want zero-friction, automatic tracing that understands chain internals without SDK instrumentation.
- You need native alerting with Slack/email thresholds without external tooling.
- You’re already deploying LangGraph agents to production — LangSmith’s deployment integration is native.
Skip it (use Arize AX) if:
- You’re in fintech and need PCI DSS 4.0 compliance specifically — Arize AX has that certification, Langfuse does not [5].
- You’re heavily invested in the Arize ML observability stack and want LLM tracing alongside traditional ML monitoring in one platform.
Skip self-hosted (use Langfuse Cloud) if:
- You don’t have a DevOps or infrastructure person — ClickHouse is not a database you want to tune under production load without experience.
- Your company mandates Okta/SSO from day one — go straight to Enterprise conversation rather than deploying OSS and then scrambling.
Alternatives worth considering
- LangSmith — the direct competitor. LangChain-native, per-seat pricing, better for LangChain/LangGraph heavy teams. $39/seat/month [1][2].
- Arize AX — enterprise SaaS, proprietary. Better fintech compliance posture (PCI DSS). No real self-hosting parity [5].
- Arize Phoenix — open-source from the Arize team, PostgreSQL-backed, designed for local development and testing rather than production observability. Worth evaluating if you want a lighter footprint [5].
- Helicone — developer-focused LLM observability proxy. Simpler setup, less feature-rich for evaluations and prompt management.
- Traceloop / OpenLLMetry — OpenTelemetry-first open-source tracing library. Less full-featured as a standalone platform but integrates with any OTel backend.
- Weights & Biases (W&B Weave) — if you’re already using W&B for ML experiment tracking, Weave extends it to LLM tracing. Makes sense if your team is already W&B-native.
For a team choosing between self-hosted options, the realistic shortlist is Langfuse vs. Arize Phoenix. Langfuse wins on production readiness and the breadth of cloud-tier features if you ever need them; Phoenix wins if you want dead-simple local debugging without standing up ClickHouse.
Bottom line
Langfuse is the most credible open-source option in LLM observability right now. It doesn’t try to win on LangChain-native tracing (LangSmith owns that) or fintech compliance certifications (Arize AX wins there). Its bet is simpler: give teams a genuinely MIT-licensed, usage-based, framework-agnostic platform that they can self-host without usage caps and without paying per developer. For the common case — an engineering team of 5–15 building LLM applications on a mix of providers, watching a LangSmith bill that scales with headcount — the math is obvious. The self-hosted path is real and the features are there. The honest caveats: you need actual infrastructure skills to run ClickHouse at production load, and if your company requires Okta SSO, you’re in Enterprise pricing territory before you even deploy.
If deploying ClickHouse and keeping a Linux server updated isn’t where you want your attention, that’s exactly the kind of infrastructure setup that upready.dev handles for teams.
Sources
- [1] Cekura — “Langfuse vs. LangSmith vs. Cekura: I Tested All 3” (Mar 19, 2026). https://www.cekura.ai/blogs/langfuse-vs-langsmith
- [2] Leanware — “Langfuse vs LangSmith: Feature Comparison, Pricing & Verdict”. https://www.leanware.co/insights/langfuse-vs-langsmith
- [3] Reddit r/LangChain — “Is Langfuse self-hosted really equal to the managed product? + Azure compatibility questions”. https://www.reddit.com/r/LangChain/comments/1m27js0/is_langfuse_selfhosted_really_equal_to_the/
- [4] Langfuse — “Self-Hosted Pricing”. https://langfuse.com/pricing-self-host
- [5] Langfuse — “Arize AX Alternative? Langfuse vs. Arize AI and Arize Phoenix for LLM Observability”. https://langfuse.com/faq/all/best-phoenix-arize-alternatives
Primary sources:
- GitHub repository: https://github.com/langfuse/langfuse (23,334 stars, MIT license)
- Official website: https://langfuse.com
- Pricing page: https://langfuse.com/pricing