Eval · Snowflake

TruLens

Open-source evaluation and tracing for LLM and agent apps.

Categories: Eval, Observability
Pricing: FREE
Source: Open source
Platforms: CLI, API
Models: BYO key / model
Verified: Jun 13, 2026

TruLens is an open-source Python library for evaluating and tracing LLM, RAG, and agent applications. You wrap your app in a recorder and attach feedback functions that score its outputs on metrics such as groundedness, context relevance, and answer relevance, then trace runs and compare app versions on a metrics leaderboard. It supports OpenTelemetry tracing and runs locally with a built-in dashboard.
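The feedback-function workflow described above can be illustrated with a small offline sketch. It is not the TruLens API: `Record`, `overlap`, `rag_triad`, and `leaderboard` are hypothetical names, and the word-overlap scorer is a crude stand-in for the LLM-as-judge providers TruLens actually calls. The sketch scores each traced run on the three RAG Triad dimensions, then averages the scores per app version and ranks the versions, as a leaderboard would.

```python
from dataclasses import dataclass
from statistics import mean


def _tokens(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")}


def overlap(a: str, b: str) -> float:
    """Hypothetical scorer: fraction of words in `a` that also appear in `b`."""
    ta = _tokens(a)
    return len(ta & _tokens(b)) / len(ta) if ta else 0.0


@dataclass
class Record:
    """One traced run of a RAG app (hypothetical shape)."""
    question: str
    context: str
    answer: str


def rag_triad(rec: Record) -> dict[str, float]:
    """Score one run on the RAG Triad using crude overlap stand-ins."""
    return {
        # share of question words found in the retrieved context
        "context_relevance": overlap(rec.question, rec.context),
        # share of answer words found in the retrieved context
        "groundedness": overlap(rec.answer, rec.context),
        # share of question words found in the answer
        "answer_relevance": overlap(rec.question, rec.answer),
    }


def leaderboard(runs: dict[str, list[Record]]) -> list[tuple[str, dict[str, float]]]:
    """Average each metric per app version; highest mean groundedness first.

    Assumes every version has at least one record.
    """
    rows = []
    for version, records in runs.items():
        scores = [rag_triad(r) for r in records]
        rows.append((version, {k: mean(s[k] for s in scores) for k in scores[0]}))
    return sorted(rows, key=lambda row: row[1]["groundedness"], reverse=True)


runs = {
    "v1": [Record("What is the capital of France?",
                  "Paris is the capital of France.",
                  "Paris is the capital of France.")],
    "v2": [Record("What is the capital of France?",
                  "Berlin is in Germany.",
                  "I think it is Lyon.")],
}
for version, metrics in leaderboard(runs):
    print(version, {k: round(v, 2) for k, v in metrics.items()})
```

In real use, each score would come from an LLM judge rather than word overlap, which is why the cons below mention the setup needed to wire up feedback providers.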

Capabilities (2)

What it actually does — grouped by capability family.

LLM evaluation (primary capability)
LLM observability (secondary capability)

Pros & cons

Pros:
OpenTelemetry tracing, runs locally
RAG Triad feedback functions built in
Provider-agnostic LLM-as-judge metrics
Leaderboard to compare app versions

Cons:
Python library, no hosted SaaS
Smaller community than LangSmith/Langfuse
Requires setup to wire up feedback providers

Further reading
Eval · FREE · OSS
Ragas
Exploding Gradients
Evaluation toolkit for RAG and LLM applications.
Open-source (Apache-2.0) Python framework for evaluating retrieval-augmented generation and LLM apps. Provides reference-free metrics — faithfulness, answer relevancy, context precision/recall — plus knowledge-graph-based synthetic test generation. Integrates with LangChain, LlamaIndex, and CI pipelines.
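Ragas's faithfulness metric, mentioned above, is the share of claims in an answer that the retrieved context supports: supported claims divided by total claims. The sketch below shows only that ratio. It does not use the Ragas library: `split_claims` and `is_supported` are hypothetical stand-ins for the LLM calls Ragas makes to extract claims and verify each one, and scoring an answer with no claims as 0.0 is an assumption of this sketch.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_claims(answer: str) -> list[str]:
    # Hypothetical stand-in: Ragas asks an LLM to break the answer into
    # claims; here each sentence counts as one claim.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def is_supported(claim: str, context: str) -> bool:
    # Hypothetical stand-in for the LLM verdict: a claim is "supported"
    # only if every word in it also appears in the context.
    return _words(claim) <= _words(context)


def faithfulness(answer: str, context: str) -> float:
    """Supported claims / total claims, in [0, 1]."""
    claims = split_claims(answer)
    if not claims:
        return 0.0  # assumption: the empty case is not taken from Ragas
    return sum(is_supported(c, context) for c in claims) / len(claims)


context = "The Eiffel Tower is in Paris. It was completed in 1889."
answer = "The Eiffel Tower is in Paris. It was completed in 1901."
print(faithfulness(answer, context))  # → 0.5 (the date claim is unsupported)
```

Because the real verdicts come from an LLM judge, the same answer can score differently across runs, which is the cost and variance trade-off noted below.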
Pro: Faithfulness & relevancy metrics
Con: LLM-judge scores add cost and variance
- eval
- rag
- llm-as-judge
- open-source
Eval · FREEMIUM · Open core
DeepEval
Confident AI
Pytest-style framework for evaluating LLM apps in CI.
Open-source (Apache 2.0) framework for evaluating LLM apps the way Pytest tests code: assertions backed by 50+ ready-made metrics spanning LLM-as-judge, RAG, agents, conversation, and safety. Plugs into LangChain, CrewAI, OpenAI Agents, and more. Confident AI is the paid cloud platform that adds test management, dashboards, and observability on top.
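The Pytest-style pattern described above comes down to this: a test function builds a test case from an input and the app's actual output, scores it with one or more metrics, and fails when any score falls below its threshold. The sketch below shows that pattern in plain Python and does not use DeepEval. `LLMCase`, `KeywordCoverage`, and `check` are hypothetical names, and the keyword metric is a deterministic stand-in for DeepEval's own metrics, many of which call an LLM judge.

```python
from dataclasses import dataclass


@dataclass
class LLMCase:
    """Hypothetical test case: the prompt and the app's actual output."""
    input: str
    actual_output: str


class KeywordCoverage:
    """Hypothetical metric: share of required phrases found in the output."""

    def __init__(self, keywords: list[str], threshold: float = 0.5):
        self.keywords = [k.lower() for k in keywords]
        self.threshold = threshold

    def measure(self, case: LLMCase) -> float:
        text = case.actual_output.lower()
        return sum(k in text for k in self.keywords) / len(self.keywords)


def check(case: LLMCase, metrics: list) -> None:
    """Fail the enclosing test if any metric scores below its threshold."""
    for metric in metrics:
        score = metric.measure(case)
        assert score >= metric.threshold, (
            f"{type(metric).__name__} scored {score:.2f} < {metric.threshold}"
        )


def test_refund_answer():
    # pytest collects functions named test_*; run with `pytest this_file.py`.
    case = LLMCase(
        input="How long do refunds take?",
        actual_output="Refunds go back to the original card within 5 business days.",
    )
    check(case, [KeywordCoverage(["refund", "5 business days"], threshold=1.0)])
```

Because failures surface as ordinary assertion errors, the same file can run in any CI job that already runs pytest.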
Pro: Assertions run in your CI pipeline
Con: LLM-as-judge adds cost
- eval
- open-source
- llm-as-judge
- rag
Observability · FREEMIUM
LangSmith
LangChain
LangChain's hosted observability + eval platform.
Tracing, dataset management, eval orchestration, and prompt playground from the LangChain team. Pairs naturally if LangChain or LangGraph already runs in your stack, but works standalone via SDKs.
Pro: Native LangChain/LangGraph tracing
Con: Closed source, cloud-only
- tracing
- evals
- datasets
- langchain
Observability · FREEMIUM
Arize Phoenix
Arize AI
LLM tracing and evaluation with retrieval debugging.
Phoenix is Arize's observability platform, which runs locally in a notebook or as a hosted service. It is especially strong for inspecting RAG pipelines, finding bad chunks, and tracking retrieval quality over time.
Pro: Source-available, runs locally
Con: Less polished than hosted SaaS evals
- tracing
- rag
- retrieval-debugging
