OpenLLMetry
OpenLLMetry lets you add OpenTelemetry-based observability to your LLM applications and monitor performance entirely on your own servers.
Open-source LLM instrumentation, honestly reviewed. No marketing fluff — just what you actually get when you instrument your AI app with it.
TL;DR
- What it is: An open-source instrumentation library that adds OpenTelemetry-compatible tracing to LLM applications — think of it as a shim between your code and whatever observability backend you already run [1].
- Who it’s for: Engineering teams building LLM applications who already have an observability stack (Datadog, Honeycomb, Grafana, etc.) and want to add LLM call tracing without switching platforms.
- Cost savings: The library is free (Apache 2.0). Whether you save money depends entirely on which backend you send traces to — if you’re already paying for Datadog, you plug OpenLLMetry in at zero additional cost [1][2].
- Key strength: Vendor-neutral. Two lines of Python or TypeScript, and your LLM calls appear as traces in whatever system your team already knows [1].
- Key weakness: OpenLLMetry is instrumentation, not a platform. You get no built-in UI, no prompt management, no evaluation workflow, no dashboards. You have to bring those yourself [1].
What is OpenLLMetry
OpenLLMetry is a set of OpenTelemetry extensions that instrument LLM provider calls — OpenAI, Anthropic, Cohere, Bedrock, Gemini, Mistral, Ollama, together.ai — and popular frameworks like LangChain and LlamaIndex. The output is standard OpenTelemetry trace data: spans, attributes, and metrics that any OTLP-compatible backend can consume.
It was built by Traceloop (a YC-backed company), and the repository has 6,938 GitHub stars at the time of writing. The Apache 2.0 license means you can use it commercially, embed it in your product, or fork it without restriction.
The architecture is worth understanding before you evaluate it. OpenLLMetry is not a product — it’s a bridge. Your LLM calls get wrapped in OpenTelemetry spans carrying attributes like model name, token counts, prompt text, and completion text. Those spans get exported to wherever you point your OTLP exporter: Traceloop’s own managed platform, Datadog, Honeycomb, Grafana Tempo, Dynatrace, Azure Application Insights, New Relic, Splunk, or Braintrust, among others [1][2].
The install is genuinely two lines:
pip install traceloop-sdk

from traceloop.sdk import Traceloop
Traceloop.init()
After that, every call to OpenAI (or whichever provider you’re using) emits a trace. No manual instrumentation. No decorators required unless you want to annotate your own functions as workflows or tasks.
The semantic conventions the project defined for LLM telemetry have since been accepted into the OpenTelemetry specification itself — which is a meaningful signal about the project’s technical seriousness [README].
Why People Choose It
The PostHog engineering team lists OpenLLMetry at #4 in their comparison of open-source LLM observability tools, behind PostHog, Langfuse, and Opik [1]. That ranking tells you something useful: OpenLLMetry is respected but not the first choice if you need a complete platform out of the box.
The teams that actively choose OpenLLMetry over the alternatives share a common profile: they already have observability infrastructure. If you’re running Datadog for your API servers, adding LLM traces to the same platform means your on-call team doesn’t need to learn a new tool. If you’re already in Honeycomb or Grafana, OpenLLMetry extends what you have rather than fragmenting it. The value proposition is operational consolidation, not new capability.
The Braintrust integration is a good illustration of how this works in practice [2]. Rather than replacing Braintrust’s evaluation workflow, OpenLLMetry acts as the data pipe: you set TRACELOOP_BASE_URL to Braintrust’s OTLP endpoint, add an auth header, and your traces land in Braintrust automatically. The integration is a few lines of configuration, not a platform migration.
What engineers say they like, based on the project’s documentation and integration ecosystem:
- No vendor lock-in. If Traceloop’s cloud pricing changes, or if you switch observability vendors next year, you change one environment variable. Your instrumentation code stays the same.
- Standard protocol. Any tool that speaks OTLP works. The list of supported destinations includes every major observability platform.
- Framework coverage. LangChain and LlamaIndex are first-class — you get tracing for the entire chain or agent run, not just individual LLM calls.
- Language parity. Python and TypeScript/JavaScript are both supported via separate packages (openllmetry-js for the JS side).
Features
Instrumented LLM providers (from README):
- OpenAI (including streaming)
- Anthropic
- AWS Bedrock
- Google Gemini
- Cohere
- Mistral
- Ollama
- together.ai
- IBM Watsonx
Instrumented frameworks and infrastructure:
- LangChain
- LlamaIndex
- Vector databases (Chroma and others — exact list in README)
What gets captured per span:
- Model name, prompt text, completion text
- Token counts (prompt tokens, completion tokens, total)
- Latency per call
- Streaming detection
- Workflow and task annotations via decorators
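To make the captured data concrete, here is a sketch of the attributes one instrumented chat completion might carry. The key names follow the OpenTelemetry GenAI semantic conventions that grew out of this project; exact keys and values vary across OpenLLMetry and semconv versions, so treat this as illustrative, not authoritative.

```python
# Illustrative span attributes for a single OpenAI chat completion call.
# Attribute names follow the OpenTelemetry GenAI semantic conventions;
# exact keys differ between OpenLLMetry/semconv versions.
span_attributes = {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 512,
    "gen_ai.usage.output_tokens": 128,
    "gen_ai.prompt.0.role": "user",
    "gen_ai.prompt.0.content": "Summarize this document...",
    "gen_ai.completion.0.role": "assistant",
    "gen_ai.completion.0.content": "The document describes...",
}

# Your backend can aggregate token counts per model straight from these fields.
total_tokens = (
    span_attributes["gen_ai.usage.input_tokens"]
    + span_attributes["gen_ai.usage.output_tokens"]
)
```

Because these are plain OTLP attributes, any backend that can group and sum span attributes can build token and latency dashboards from them without special LLM support.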
Exporter destinations (verified, not exhaustive): Traceloop, Axiom, Azure Application Insights, Braintrust, Dash0, Datadog, Dynatrace, Google Cloud, Grafana Tempo, Honeycomb, Instana, New Relic, Splunk, Highlight [README][2]
What OpenLLMetry does NOT include:
- A built-in visualization UI
- Prompt management or versioning
- Dataset management for evaluation
- Evaluation scoring or human review workflows
- Cost tracking dashboard (that’s your backend’s job)
- Alerting or anomaly detection
If you want those capabilities, you’re either relying on your existing observability platform to provide them (Datadog has cost attribution, Honeycomb has query-based alerting) or you’re using Traceloop’s commercial platform as the backend, which adds those features on top of the OTLP traces OpenLLMetry produces.
Pricing: SaaS vs Self-Hosted Math
This is where the review has to be honest about a data gap: Traceloop’s managed platform pricing is not published in any of the sources available for this review, and fabricating numbers would be irresponsible. What is clear from first principles:
OpenLLMetry the library: $0. Apache 2.0 license, runs anywhere, no call-home requirement.
Traceloop the managed backend: Pricing not publicly disclosed in available sources. Contact their sales or check their current pricing page directly.
Self-hosted with your existing stack:
- If you already pay for Datadog, Honeycomb, or Grafana Cloud, the marginal cost of adding OpenLLMetry traces is whatever your additional data volume costs under that plan’s pricing model.
- A reasonable estimate: if you’re sending 1M LLM traces per month to Datadog, expect maybe $20–$100/month in additional ingest costs depending on your plan — but that’s Datadog’s pricing, not OpenLLMetry’s.
- If you run open-source Grafana + Tempo or similar, your only cost is the storage.
The comparison that matters: A team already paying $200/month for Datadog adds OpenLLMetry traces at maybe $30/month incremental ingest cost — and gets LLM observability without adopting a new platform or login. Compare that to Langfuse Cloud at $29/month for 100k events [1] if your volume is low, or significantly more at scale. The calculus depends entirely on your existing stack.
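To make that back-of-envelope math reproducible, here is a small helper comparing incremental ingest cost against a flat platform fee. The per-million-span rate is a made-up assumption for illustration — substitute your backend's actual pricing.

```python
def incremental_ingest_cost(spans_per_month: int, usd_per_million_spans: float) -> float:
    """Marginal monthly cost of shipping LLM spans to an existing backend."""
    return spans_per_month / 1_000_000 * usd_per_million_spans

# Assumed rate of $30 per million spans -- purely illustrative, not any vendor's real pricing.
otel_route = incremental_ingest_cost(spans_per_month=1_000_000, usd_per_million_spans=30.0)

# Flat-fee alternative: e.g. Langfuse Cloud's ~$29/month tier for 100k events [1].
platform_fee = 29.0
```

Under these assumed numbers the two routes cost about the same at 1M spans/month, but the flat tier only covers 100k events — the crossover point depends entirely on your volume and your existing contract.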
Deployment Reality Check
Getting OpenLLMetry running is genuinely fast. If your existing code already calls OpenAI or Anthropic, you’re one pip install and two lines away from traces appearing in your backend.
The complication is the backend, not the library.
If you’re sending to Traceloop’s managed cloud: Set the TRACELOOP_API_KEY environment variable. Done.
If you’re sending to Braintrust: Set TRACELOOP_BASE_URL to the Braintrust OTLP endpoint and add an auth header [2]. About five minutes of configuration.
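A sketch of that configuration, assuming Traceloop's standard TRACELOOP_BASE_URL and TRACELOOP_HEADERS environment variables — the endpoint URL and header format below are placeholders, so verify them against the Braintrust integration docs [2]:

```shell
# Point the OpenLLMetry exporter at Braintrust's OTLP endpoint.
# Endpoint and header format are assumptions -- check [2] for current values.
export TRACELOOP_BASE_URL="https://api.braintrust.dev/otel"
export TRACELOOP_HEADERS="Authorization=Bearer%20<your-braintrust-api-key>"
```

Swapping backends later means changing these two variables; the instrumentation code in your application stays untouched.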
If you’re sending to Datadog, Honeycomb, or Grafana: Configure the standard OTLP exporter in your existing collector setup. If you already run an OpenTelemetry Collector, add the LLM trace pipeline alongside your existing ones. If you don’t run a Collector yet, you’ll need to set one up — that’s a 1–2 hour task if you haven’t done it before.
What can go sideways:
- No built-in sampling. High-volume production applications can generate a lot of trace data. You need to configure sampling at the Collector or SDK level, or your observability bill grows with every LLM call.
- Streaming calls. OpenAI streaming responses require the library to buffer and stitch — generally works, but worth testing with your specific streaming usage patterns.
- Large prompts. If you’re passing multi-thousand-token context windows, the trace attributes get large. Some backends have limits on attribute size that can silently truncate prompt text.
- Framework version pinning. OpenLLMetry’s instrumentations patch LLM client libraries at import time. If a provider releases a major SDK version change, the instrumentation may break until the OpenLLMetry package is updated. This is a general risk with auto-instrumentation.
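On the sampling point: if you route traces through an OpenTelemetry Collector, head-based sampling is one pipeline processor away. A minimal sketch, assuming the contrib Collector's probabilistic_sampler processor (the percentage and pipeline names are placeholders to adapt):

```yaml
# otel-collector config fragment: keep ~10% of LLM traces.
processors:
  probabilistic_sampler:
    sampling_percentage: 10
  batch: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, batch]
      exporters: [otlp]
```

Head-based sampling drops traces before they reach your backend, which is what keeps the ingest bill flat; if you need to keep all error traces, look at tail-based sampling instead, at the cost of Collector memory.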
Realistic setup time: 15–30 minutes if you’re sending to Traceloop or Braintrust. 1–3 hours if you’re integrating with an existing Collector-based setup you manage yourself.
Pros and Cons
Pros
- Apache 2.0 license. No usage restrictions, no “fair-code” gray areas, no license review required [1][README].
- Two lines to instrument. The fastest path from “zero observability” to “LLM calls in my existing platform” [README].
- Vendor-neutral output. Your traces go wherever you point the OTLP exporter. Changing observability backends doesn’t require changing your instrumentation code [1][2].
- Semantic conventions now in OpenTelemetry upstream. This is not a niche standard — it’s the direction the industry is standardizing on [README].
- Wide LLM provider coverage. OpenAI, Anthropic, Bedrock, Gemini, Mistral, Ollama, Cohere, and more — it’s not an OpenAI-only library [README].
- Framework instrumentation. LangChain and LlamaIndex chains produce coherent multi-span traces, not just individual call spans.
- YC-backed team. Traceloop has real funding and commercial incentive to keep the open-source project healthy.
Cons
- Not a complete platform. OpenLLMetry gives you traces. It does not give you a UI, dashboards, prompt management, evaluation pipelines, datasets, or alerting. If you want those, you’re either buying them elsewhere or using Traceloop’s commercial cloud [1].
- Lower star count than key competitors. At 6,938 stars, it’s behind Langfuse (23,300 stars), PostHog (32,100 stars), and likely others [1]. Smaller community means fewer StackOverflow answers when you’re debugging.
- Ranked #4 by engineers who use these tools daily. PostHog’s review places it behind PostHog, Langfuse, and Opik specifically because it lacks the platform capabilities the others provide [1].
- No self-hosted backend shipped. You can send traces to a self-hosted Grafana stack, but OpenLLMetry doesn’t package a turnkey self-hosted analytics backend the way Langfuse does.
- Traceloop platform pricing is opaque. If you want the full platform experience (UI, cost tracking, alerting), you’re in a sales conversation. That’s a friction point for a solo founder who just wants transparent pricing.
- Auto-instrumentation can break. When LLM SDK vendors release major updates, there’s a window before OpenLLMetry catches up. Plan for this in production.
Who Should Use This / Who Shouldn’t
Use OpenLLMetry if:
- Your engineering team already runs Datadog, Honeycomb, Grafana, or any OTLP-compatible backend, and you want LLM call visibility without a new platform.
- You’re building a product that you’ll eventually need to observe across multiple LLM providers and you want to avoid locking your instrumentation to a single vendor’s SDK.
- You’re already using LangChain or LlamaIndex and want end-to-end trace visibility across agent runs.
- Your compliance or security team prefers data staying in infrastructure you already control and have approved.
Skip it (use Langfuse instead) if:
- You’re starting from zero and want a self-contained observability platform — UI, traces, prompt management, evaluations, and datasets — without needing a pre-existing observability stack.
- You’re a solo founder or small team who doesn’t want to manage an OTLP pipeline.
- You want cost tracking dashboards out of the box rather than building them in your existing platform.
Skip it (use PostHog instead) if:
- You want LLM observability alongside product analytics, session replay, and feature flags in one tool.
- You’re on a tight budget and need a free tier with enough volume to get started without a credit card conversation.
Skip it (use Phoenix/Arize instead) if:
- Your primary concern is ML model evaluation and dataset management, not production trace volume.
Alternatives Worth Considering
- Langfuse — the most direct alternative for teams that want a complete platform. Self-hostable, MIT licensed, 23k+ stars, includes prompt management, evaluation, datasets, and a full UI. Better choice if you don’t have an existing observability stack [1].
- PostHog — MIT licensed all-in-one dev tool (analytics + LLM observability). The LLM analytics product integrates with A/B testing and session replay. Free tier covers 100k events/month [1].
- Opik (by Comet) — focused on LLM evaluation and tracing with a clean UI. Smaller community but purpose-built for the LLM development workflow [1].
- Phoenix (by Arize) — strong on dataset management and evaluation; best for teams that do systematic ML evaluation cycles rather than production monitoring [1].
- Helicone — simpler proxy-based approach rather than SDK instrumentation. If you don’t want to touch your code, Helicone routes traffic through their infrastructure and adds observability at the proxy layer [1].
- Datadog LLM Observability — if you’re already deep in Datadog and want native product support rather than a community library, Datadog now ships their own LLM tracing product. More expensive, zero setup friction if you’re already a customer.
For a non-technical founder starting from scratch, the realistic shortlist is Langfuse or PostHog, not OpenLLMetry. OpenLLMetry is an engineers’ tool that solves an integration problem, not a product that solves a “where do I see my LLM metrics” problem.
Bottom Line
OpenLLMetry answers a specific question well: “How do I get LLM call traces into the observability stack I already run?” If that’s your question, the answer is two lines of code and fifteen minutes. The Apache 2.0 license, the vendor-neutral OTLP output, and the broad provider coverage are all real and useful. But OpenLLMetry is not a product that replaces Langfuse or PostHog for teams building their first LLM observability setup. It’s middleware for teams that already know what they’re doing with observability and want LLM calls to fit into that existing picture. If you’re a non-technical founder who just wants to see what your AI app is doing without managing infrastructure, start with Langfuse. If you’re an engineering team with a Datadog or Grafana contract and a preference for keeping everything in one pane, OpenLLMetry is the right call.
Sources
- PostHog Blog — “7 best free and open source LLM observability tools” (Mar 19, 2026). https://posthog.com/blog/best-open-source-llm-observability-tools
- Braintrust Documentation — “TraceLoop — OpenLLMetry Integration”. https://www.braintrust.dev/docs/integrations/sdk-integrations/traceloop
Primary sources:
- GitHub repository and README: https://github.com/traceloop/openllmetry (6,938 stars, Apache 2.0 license)
- Official website: https://www.traceloop.com/openllmetry
- Documentation: https://traceloop.com/docs/openllmetry/introduction