Loading…
Observability · LangChain
LangChain's hosted observability + eval platform.
Tracing, dataset management, eval orchestration, and prompt playground from the LangChain team. Pairs naturally if LangChain or LangGraph already runs in your stack, but works standalone via SDKs.
Model support
Where it runs
Tags
Related in Observability
Galileo
Evaluation and observability for GenAI apps and agents, with inline guardrails.
A platform for testing, monitoring, and guardrailing LLM and agent applications. It ships 20+ out-of-the-box evals for RAG, agents, and safety, lets teams author custom evaluators, and turns those offline evals into real-time production guardrails powered by its own Luna eval models.
AI insight: Scores traces with its own small 'Luna' evaluation models rather than an LLM-as-judge, keeping inline production guardrails cheap to run.
Scale3 Labs
Open-source, OpenTelemetry-based observability for LLM apps and agents.
Langtrace is an open-source observability and evaluation platform for LLM applications, capturing traces, token usage, latency, and cost across popular models, frameworks, and vector databases. Because it emits standard OpenTelemetry spans, traces flow to any OTel-compatible backend, and instrumentation is a two-line SDK install in Python or TypeScript. It ships as a hosted cloud with a free tier plus a self-hostable / on-prem option for data-sensitive teams.
AI insight: Built on OpenTelemetry, so its LLM traces export to any OTel backend (Grafana, Datadog) rather than locking you into one dashboard.
Traceloop
LLM observability built on OpenTelemetry.
A reliability platform for LLM apps: its open-source OpenLLMetry SDK instruments LLM, vector-DB, and framework calls as standard OpenTelemetry spans, which Traceloop's hosted dashboard turns into traces, cost/latency analytics, and quality monitoring. Because the data is plain OTel, you can pipe it to existing observability stacks instead of a proprietary one.
AI insight: OpenLLMetry builds on OpenTelemetry, so traces export to Datadog, Honeycomb or any OTel backend — not only Traceloop's dashboard.
Weights & Biases
Tracing and evaluation for LLM apps, from Weights & Biases.
An observability and evaluation toolkit for generative-AI applications. A single @weave.op decorator traces every model call — capturing inputs, outputs, latency, token cost, and errors — and the same SDK builds rigorous evaluations using LLM-as-judge and custom scorers. Traces and experiments are organized in the Weights & Biases web platform for side-by-side comparison across prompts and models.
AI insight: The SDK is Apache-2.0 open source, but the traces it captures land in W&B's hosted platform — free for solo use.
Comet
Open-source LLM evaluation, tracing, and monitoring.
Open-source platform from Comet for debugging and evaluating LLM and agent apps: full tracing of calls, tools, and agent steps, LLM-as-a-judge and heuristic evals, prompt management, and production dashboards. Self-host via Docker or Kubernetes, or use Comet's hosted cloud.
AI insight: Apache-2.0 and fully self-hostable — a langfuse-style tracing-plus-eval platform you own, with an optional Comet-hosted cloud.
Arize AI
LLM tracing + evaluation. Strong on retrieval debugging.
Phoenix is Arize's observability platform — run locally in a notebook or as a hosted service. Especially strong for inspecting RAG pipelines, finding bad chunks, and tracking retrieval quality over time.
AI insight: Spins up inside a Jupyter notebook and is sharpest at RAG debugging — finding the bad chunk that poisoned a retrieval.
Helicone
Drop-in LLM proxy with logging, caching, and cost tracking.
One-line integration — change your OpenAI/Anthropic base URL and get a dashboard with every prompt, response, latency, and dollar tracked. Adds caching and rate-limit handling without code changes.
AI insight: Integrate by changing one base-URL line — no SDK wrapper — and it's open-source, so you can self-host the proxy.
Langfuse
Open-source LLM observability. Self-hostable, OpenTelemetry-native.
Tracing, evals, prompt management, and dataset tooling for LLM apps — self-host on your own infra or use Langfuse Cloud. The open-source default when you want full ownership of your observability stack.
AI insight: The self-hostable, OpenTelemetry-native answer to LangSmith — pick it when observability data has to stay on your own infra.