Eval · Snowflake

TruLens

Open-source evaluation and tracing for LLM and agent apps.

Category: Eval
Pricing: FREE
Platforms: CLI, API
Models: BYO key / model
Verified: Jun 13, 2026

TruLens is an open-source Python library for evaluating and tracing LLM, RAG, and agent applications. You wrap your app with feedback functions that score outputs on metrics like groundedness, context relevance, and answer relevance, then trace runs and compare versions on a metrics leaderboard. It integrates OpenTelemetry tracing and runs locally with a built-in dashboard.
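
As a rough illustration of that workflow, the sketch below wraps two toy app versions with feedback functions and averages their scores into a leaderboard. It does not use the TruLens API: `Recorder`, `FEEDBACKS`, `overlap`, and `leaderboard` are hypothetical names, and simple token overlap stands in for the LLM-as-judge scoring that a real feedback provider would perform.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical feedback function type: (question, context, answer) -> score in [0, 1].
FeedbackFn = Callable[[str, str, str], float]


def _tokens(text: str) -> Set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")}


def overlap(a: str, b: str) -> float:
    """Fraction of tokens in `a` that also occur in `b` (toy stand-in for an LLM judge)."""
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta) if ta else 0.0


# The three metrics named above, approximated with token overlap (an assumption).
FEEDBACKS: Dict[str, FeedbackFn] = {
    "groundedness": lambda q, c, a: overlap(a, c),       # is the answer supported by the context?
    "context_relevance": lambda q, c, a: overlap(q, c),  # does the context match the question?
    "answer_relevance": lambda q, c, a: overlap(q, a),   # does the answer address the question?
}


@dataclass
class Recorder:
    """Wraps an app version and stores one row of feedback scores per call."""
    version: str
    app: Callable[[str], Tuple[str, str]]  # returns (retrieved context, answer)
    records: List[Dict[str, float]] = field(default_factory=list)

    def __call__(self, question: str) -> str:
        context, answer = self.app(question)
        self.records.append(
            {name: fn(question, context, answer) for name, fn in FEEDBACKS.items()}
        )
        return answer


def leaderboard(recorders: List[Recorder]) -> Dict[str, Dict[str, float]]:
    """Mean score per metric for each app version."""
    return {
        r.version: {name: mean(rec[name] for rec in r.records) for name in FEEDBACKS}
        for r in recorders
    }


CONTEXT = "TruLens is an open-source Python library"


def app_v1(question: str) -> Tuple[str, str]:
    return CONTEXT, "TruLens is a Python library"  # answer drawn from the context


def app_v2(question: str) -> Tuple[str, str]:
    return CONTEXT, "It was invented on Mars"  # answer unsupported by the context


v1, v2 = Recorder("v1", app_v1), Recorder("v2", app_v2)
for recorder in (v1, v2):
    recorder("What is TruLens?")

board = leaderboard([v1, v2])
# Rank versions by groundedness, best first.
ranking = sorted(board, key=lambda v: board[v]["groundedness"], reverse=True)
print(ranking, board)
```

In this toy setup the version whose answer stays within the retrieved context ranks first on groundedness. With the library itself, the scoring would come from a configured feedback provider rather than token overlap.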

Pros & cons

  • Open source and free, OpenTelemetry tracing
  • RAG Triad feedback functions built in
  • Provider-agnostic LLM-as-judge metrics
  • Leaderboard to compare app versions
  • Python library, no hosted SaaS
  • Smaller community than LangSmith/Langfuse
  • Requires setup to wire up feedback providers

Further reading

  • Ragas
    Eval · FREE · OSS

    Exploding Gradients

    Open-source evaluation toolkit for RAG and LLM applications.

    Open-source (Apache-2.0) Python framework for evaluating retrieval-augmented generation and LLM apps. Provides reference-free metrics — faithfulness, answer relevancy, context precision/recall — plus knowledge-graph-based synthetic test generation. Integrates with LangChain, LlamaIndex, and CI pipelines.

    Worth knowing

    Began as a 2023 research paper (EACL 2024) and a Y Combinator W24 startup before becoming the default open-source RAG eval standard.

    • eval
    • rag
    • llm-as-judge
    • open-source
  • DeepEval
    Eval · FREEMIUM · Open core

    Confident AI

    Pytest-style LLM evaluation framework. Open source.

    Open-source (Apache 2.0) framework for evaluating LLM apps the way Pytest tests code — assertions backed by 50+ ready metrics spanning LLM-as-judge, RAG, agents, conversation, and safety. Plugs into LangChain, CrewAI, OpenAI Agents and more. Confident AI is the paid cloud platform that adds test management, dashboards, and observability on top.

    Worth knowing

    Built by Confident AI (YC W25, founded 2024); its open-source framework runs ~2M evaluations a day.

    • eval
    • open-source
    • llm-as-judge
    • rag
  • LangSmith
    Observability · FREEMIUM

    LangChain

    LangChain's hosted observability + eval platform.

    Tracing, dataset management, eval orchestration, and prompt playground from the LangChain team. Pairs naturally if LangChain or LangGraph already runs in your stack, but works standalone via SDKs.

    Worth knowing

    LangChain's primary commercial product and revenue driver behind its 2025 $1.25B unicorn valuation.

    • tracing
    • evals
    • datasets
    • langchain
  • Arize Phoenix
    Observability · FREEMIUM

    Arize AI

    LLM tracing + evaluation. Strong on retrieval debugging.

    Phoenix is Arize's observability platform — run locally in a notebook or as a hosted service. Especially strong for inspecting RAG pipelines, finding bad chunks, and tracking retrieval quality over time.

    Worth knowing

    Licensed under Elastic License 2.0 (source-available), not OSI open-source — despite its open GitHub repo.

    • tracing
    • rag
    • retrieval-debugging