Loading…
Observability · Fiddler AI
AI observability and security platform for LLM apps, agents, and ML models.
An enterprise platform to monitor, analyze, and safeguard generative AI and ML in production. The Fiddler Trust Service scores prompts and responses for hallucination, toxicity, PII leakage, and prompt-injection, with low-latency guardrails plus real-time alerting and root-cause analysis. Originally an explainable-AI and model-monitoring pioneer, now spanning LLM and agent observability.
Model support
Where it runs
Tags
Related in Observability
LangWatch
Open-source LLM observability, evaluation, and agent testing.
An open-source platform for monitoring, evaluating, and testing LLM and agent applications. LangWatch captures traces, runs evaluations and simulations, and surfaces quality and cost metrics in production. Offered as managed cloud or fully self-hosted for teams with strict data-residency needs.
AI insight: Open-source, and pairs trace observability with 'Scenario' agent-simulation tests rather than only passive production monitoring.
Galileo
Evaluation and observability for GenAI apps and agents, with inline guardrails.
A platform for testing, monitoring, and guardrailing LLM and agent applications. It ships 20+ out-of-the-box evals for RAG, agents, and safety, lets teams author custom evaluators, and turns those offline evals into real-time production guardrails powered by its own Luna eval models.
AI insight: Scores traces with its own small 'Luna' evaluation models rather than an LLM-as-judge, keeping inline production guardrails cheap to run.
Scale3 Labs
Open-source, OpenTelemetry-based observability for LLM apps and agents.
Langtrace is an open-source observability and evaluation platform for LLM applications, capturing traces, token usage, latency, and cost across popular models, frameworks, and vector databases. Because it emits standard OpenTelemetry spans, traces flow to any OTel-compatible backend, and instrumentation is a two-line SDK install in Python or TypeScript. It ships as a hosted cloud with a free tier plus a self-hostable / on-prem option for data-sensitive teams.
AI insight: Built on OpenTelemetry, so its LLM traces export to any OTel backend (Grafana, Datadog) rather than locking you into one dashboard.
Lunary
Open-source observability and prompt management for LLM apps.
An open-source platform for monitoring, debugging, and improving LLM applications and chatbots. Lunary combines request tracing, cost and user analytics, versioned prompt management with A/B testing, plus human-in-the-loop review and automated scoring. Self-host the Apache-2.0 community edition or use the managed cloud, which starts free with a 10k-events monthly tier.
AI insight: Apache-2.0 and self-hostable, it bundles prompt versioning, A/B tests, and human review alongside tracing — not observability alone.
Traceloop
LLM observability built on OpenTelemetry.
A reliability platform for LLM apps: its open-source OpenLLMetry SDK instruments LLM, vector-DB, and framework calls as standard OpenTelemetry spans, which Traceloop's hosted dashboard turns into traces, cost/latency analytics, and quality monitoring. Because the data is plain OTel, you can pipe it to existing observability stacks instead of a proprietary one.
AI insight: OpenLLMetry builds on OpenTelemetry, so traces export to Datadog, Honeycomb or any OTel backend — not only Traceloop's dashboard.
Weights & Biases
Tracing and evaluation for LLM apps, from Weights & Biases.
An observability and evaluation toolkit for generative-AI applications. A single @weave.op decorator traces every model call — capturing inputs, outputs, latency, token cost, and errors — and the same SDK builds rigorous evaluations using LLM-as-judge and custom scorers. Traces and experiments are organized in the Weights & Biases web platform for side-by-side comparison across prompts and models.
AI insight: The SDK is Apache-2.0 open source, but the traces it captures land in W&B's hosted platform — free for solo use.
Comet
Open-source LLM evaluation, tracing, and monitoring.
Open-source platform from Comet for debugging and evaluating LLM and agent apps: full tracing of calls, tools, and agent steps, LLM-as-a-judge and heuristic evals, prompt management, and production dashboards. Self-host via Docker or Kubernetes, or use Comet's hosted cloud.
AI insight: Apache-2.0 and fully self-hostable — a langfuse-style tracing-plus-eval platform you own, with an optional Comet-hosted cloud.
Arize AI
LLM tracing + evaluation. Strong on retrieval debugging.
Phoenix is Arize's observability platform — run locally in a notebook or as a hosted service. Especially strong for inspecting RAG pipelines, finding bad chunks, and tracking retrieval quality over time.
AI insight: Spins up inside a Jupyter notebook and is sharpest at RAG debugging — finding the bad chunk that poisoned a retrieval.
Helicone
Drop-in LLM proxy with logging, caching, and cost tracking.
One-line integration — change your OpenAI/Anthropic base URL and get a dashboard with every prompt, response, latency, and dollar tracked. Adds caching and rate-limit handling without code changes.
AI insight: Integrate by changing one base-URL line — no SDK wrapper — and it's open-source, so you can self-host the proxy.
Langfuse
Open-source LLM observability. Self-hostable, OpenTelemetry-native.
Tracing, evals, prompt management, and dataset tooling for LLM apps — self-host on your own infra or use Langfuse Cloud. The open-source default when you want full ownership of your observability stack.
AI insight: The self-hostable, OpenTelemetry-native answer to LangSmith — pick it when observability data has to stay on your own infra.