Observability AI apps

Tracing, monitoring, and debugging for LLM apps — see what your prompts, chains, and agents actually did.

13 apps · researched & kept current by Claude Code

  • View LangWatch details
    Observability · FREEMIUM · Open core

    LangWatch

    LangWatch

    Open-source LLM observability, evaluation, and agent testing.

    An open-source platform for monitoring, evaluating, and testing LLM and agent applications. LangWatch captures traces, runs evaluations and simulations, and surfaces quality and cost metrics in production. Offered as managed cloud or fully self-hosted for teams with strict data-residency needs.

    Worth knowing

    Amsterdam startup whose founders met at an Antler residency; raised a €1M pre-seed led by Passion Capital in 2025.

    • observability
    • evaluation
    • agent-testing
    • llmops
  • View Pydantic Logfire details
    Observability · FREEMIUM · Open core

    Pydantic Logfire

    Pydantic

    Observability for LLM and agent apps, from the Pydantic team.

    An observability platform that traces your whole application stack — LLM calls, agents, databases, and HTTP — not just the model layer. The Python/JS/Rust SDKs are open source and built on OpenTelemetry, while the hosted backend handles storage, querying, and dashboards. Free tier covers 10M spans per month.

    Worth knowing

    Pydantic's first commercial product, launched alongside a $12.5M Sequoia-led Series A in October 2024 to expand beyond the OSS library.

    • observability
    • tracing
    • opentelemetry
    • open-source
  • View Traceloop details
    Observability · FREEMIUM · Open core

    Traceloop

    Traceloop

    LLM observability built on OpenTelemetry.

    A reliability platform for LLM apps: its open-source OpenLLMetry SDK instruments LLM, vector-DB, and framework calls as standard OpenTelemetry spans, which Traceloop's hosted dashboard turns into traces, cost/latency analytics, and quality monitoring. Because the data is plain OTel, you can pipe it to existing observability stacks instead of a proprietary one.

    Worth knowing

    A Y Combinator (W23) startup behind OpenLLMetry; acquired by ServiceNow in 2026.

    • observability
    • opentelemetry
    • tracing
    • open-source
  • View Lunary details
    Observability · FREEMIUM · Open core

    Lunary

    Lunary

    Open-source observability and prompt management for LLM apps.

    An open-source platform for monitoring, debugging, and improving LLM applications and chatbots. Lunary combines request tracing, cost and user analytics, versioned prompt management with A/B testing, human-in-the-loop review, and automated scoring. Self-host the Apache-2.0 community edition or use the managed cloud, which starts free with a 10k-events monthly tier.

    Worth knowing

    Apache-2.0 and self-hostable, it bundles prompt versioning, A/B tests, and human review alongside tracing — not observability alone.

    • llm-observability
    • prompt-management
    • tracing
    • open-source
  • View W&B Weave details
    Observability · FREEMIUM · Open core

    W&B Weave

    Weights & Biases

    Tracing and evaluation for LLM apps, from Weights & Biases.

    An observability and evaluation toolkit for generative-AI applications. A single @weave.op decorator traces every model call — capturing inputs, outputs, latency, token cost, and errors — and the same SDK builds rigorous evaluations using LLM-as-judge and custom scorers. Traces and experiments are organized in the Weights & Biases web platform for side-by-side comparison across prompts and models.

    Worth knowing

    The SDK is Apache-2.0 open source, but the traces it captures land in W&B's hosted platform — free for solo use.

    • llm-observability
    • tracing
    • eval
    • open-source
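
To make the decorator-based tracing in the W&B Weave entry concrete, here is a minimal stdlib-only sketch of the general pattern. It is not Weave's implementation and does not import the Weave SDK: `traced`, `TRACES`, and `fake_model_call` are hypothetical names introduced only for illustration, and token cost is left out because it depends on provider pricing.

```python
import functools
import time

# Hypothetical in-memory sink; a real SDK would ship records to a hosted backend.
TRACES = []


def traced(fn):
    """Record inputs, output, latency, and errors for each call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"op": fn.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            record["output"] = fn(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            # Keep the error on the trace, then let it propagate unchanged.
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so every call is logged.
            record["latency_s"] = time.perf_counter() - start
            TRACES.append(record)
    return wrapper


@traced
def fake_model_call(prompt: str) -> str:
    # Stand-in for a real LLM request so the sketch runs offline.
    return prompt.upper()


fake_model_call("hello")
```

After the call, `TRACES` holds one record with the function name, its arguments, the returned value, and the elapsed time. The real SDK's decorator and setup calls should be taken from the Weave documentation.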
  • View Galileo details
    Observability · FREEMIUM

    Galileo

    Galileo

    Evaluation and observability for GenAI apps and agents, with inline guardrails.

    A platform for testing, monitoring, and guardrailing LLM and agent applications. It ships 20+ out-of-the-box evals for RAG, agents, and safety, lets teams author custom evaluators, and turns those offline evals into real-time production guardrails powered by its own Luna eval models.

    Worth knowing

    Raised a $45M Series B led by Scale Venture Partners, with HuggingFace and Postman CEOs joining as angels.

    • evaluation
    • observability
    • guardrails
    • agents
  • View Fiddler AI details
    Observability · PAID

    Fiddler AI

    Fiddler AI

    AI observability and security platform for LLM apps, agents, and ML models.

    An enterprise platform to monitor, analyze, and safeguard generative AI and ML in production. The Fiddler Trust Service scores prompts and responses for hallucination, toxicity, PII leakage, and prompt-injection, with low-latency guardrails plus real-time alerting and root-cause analysis. Originally an explainable-AI and model-monitoring pioneer, now spanning LLM and agent observability.

    Worth knowing

    Founder Krishna Gade built Facebook's 'Why am I seeing this?' explainability feature before starting Fiddler.

    • llm-observability
    • monitoring
    • guardrails
    • ml-monitoring
  • View Langtrace details
    Observability · FREEMIUM · Open core

    Langtrace

    Scale3 Labs

    Open-source, OpenTelemetry-based observability for LLM apps and agents.

    Langtrace is an open-source observability and evaluation platform for LLM applications, capturing traces, token usage, latency, and cost across popular models, frameworks, and vector databases. Because it emits standard OpenTelemetry spans, traces flow to any OTel-compatible backend, and instrumentation is a two-line SDK install in Python or TypeScript. It ships as a hosted cloud with a free tier plus a self-hostable / on-prem option for data-sensitive teams.

    Worth knowing

    Maker Scale3 Labs contributed the first official OpenAI instrumentation to OpenTelemetry and helped author its GenAI conventions.

    • observability
    • tracing
    • opentelemetry
    • open-source
  • View Opik details
    Observability · FREEMIUM · Open core

    Opik

    Comet

    Open-source LLM evaluation, tracing, and monitoring.

    Open-source platform from Comet for debugging and evaluating LLM and agent apps: full tracing of calls, tools, and agent steps, LLM-as-a-judge and heuristic evals, prompt management, and production dashboards. Self-host via Docker or Kubernetes, or use Comet's hosted cloud.

    Worth knowing

    Launched in September 2024 by Comet, the established ML experiment-tracking company, extending its platform from training into LLM ops.

    • observability
    • evaluation
    • tracing
    • open-source
  • View Arize Phoenix details
    Observability · FREEMIUM

    Arize Phoenix

    Arize AI

    LLM tracing + evaluation. Strong on retrieval debugging.

    Phoenix is Arize's observability platform; it runs locally in a notebook or as a hosted service. It is especially strong for inspecting RAG pipelines, finding bad chunks, and tracking retrieval quality over time.

    Worth knowing

    Licensed under Elastic License 2.0 (source-available), not OSI open-source — despite its open GitHub repo.

    • tracing
    • rag
    • retrieval-debugging
  • View LangSmith details
    Observability · FREEMIUM

    LangSmith

    LangChain

    LangChain's hosted observability + eval platform.

    Tracing, dataset management, eval orchestration, and prompt playground from the LangChain team. Pairs naturally if LangChain or LangGraph already runs in your stack, but works standalone via SDKs.

    Worth knowing

    LangChain's primary commercial product and the main revenue driver behind its $1.25B valuation in 2025.

    • tracing
    • evals
    • datasets
    • langchain
  • View Helicone details
    Observability · FREEMIUM · Open core

    Helicone

    Helicone

    Drop-in LLM proxy with logging, caching, and cost tracking.

    One-line integration — change your OpenAI/Anthropic base URL and get a dashboard with every prompt, response, latency, and dollar tracked. Adds caching and rate-limit handling without code changes.

    Worth knowing

    YC W23 startup acquired by docs platform Mintlify in March 2026, having processed over 14 trillion tokens for 16,000+ orgs.

    • proxy
    • logging
    • caching
    • cost-tracking
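
The base-URL swap described in the Helicone entry can be sketched as follows. This is an illustration of the proxy pattern only, under stated assumptions: `ClientConfig` and `via_proxy` are hypothetical helpers, and the proxy URL and `X-Proxy-Auth` header are placeholders rather than Helicone's actual values, which should come from its documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ClientConfig:
    """Hypothetical container for the settings an LLM client needs."""
    base_url: str
    headers: dict = field(default_factory=dict)


# Direct-to-provider configuration (the key is a placeholder string).
DIRECT = ClientConfig(
    base_url="https://api.openai.com/v1",
    headers={"Authorization": "Bearer PROVIDER_KEY"},
)


def via_proxy(cfg: ClientConfig, proxy_base_url: str, proxy_headers: dict) -> ClientConfig:
    """Return a copy of cfg that targets the proxy; the original is unchanged."""
    return ClientConfig(
        base_url=proxy_base_url,
        # Provider credentials still travel with the request; the proxy
        # adds its own auth header alongside them.
        headers={**cfg.headers, **proxy_headers},
    )


# Placeholder proxy URL and header name, assumed for illustration only.
proxied = via_proxy(
    DIRECT,
    "https://llm-proxy.example.com/v1",
    {"X-Proxy-Auth": "Bearer PROXY_KEY"},
)
```

Only the endpoint and one extra header change, so application code that builds prompts and parses responses stays as it was. That is what makes a logging proxy a drop-in integration.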
  • View Langfuse details
    Observability · FREEMIUM · Open core

    Langfuse

    Langfuse

    Open-source LLM observability. Self-hostable, OpenTelemetry-native.

    Tracing, evals, prompt management, and dataset tooling for LLM apps — self-host on your own infra or use Langfuse Cloud. The open-source default when you want full ownership of your observability stack.

    Worth knowing

    Y Combinator W23 startup; acquired by ClickHouse in January 2026.

    • open-source
    • tracing
    • evals
    • self-hosted