Braintrust

Hosted eval + tracing platform for LLM apps.

Category: Eval
Pricing: FREEMIUM
Source: Proprietary
Hosting: Cloud
Platforms: Web, API
Models: BYO key / model
Verified: Jun 7, 2026

Production-grade eval orchestration with a dashboard, dataset versioning, and OpenTelemetry tracing. Useful once eval volume outgrows a CI YAML file.

Capabilities (3)

What it actually does, grouped by capability family.

LLM evaluation (primary capability)
LLM observability (secondary capability)
Prompt management (secondary capability)

Pros & cons

Pros:
- Eval workflow as the primary interface
- CI scorers block merges on regression
- Dataset versioning + OTel tracing
- Generous free tier

Cons:
- Closed-source SaaS
- Self-hosting needs Enterprise contract
- Overkill for tiny single-file eval needs
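
The "CI scorers block merges on regression" point can be pictured as a simple score gate. The sketch below is plain Python and does not use Braintrust's SDK or API; the function names, scorer names, and the 0.02 tolerance are hypothetical, chosen only to illustrate the idea of comparing a run against a baseline and failing the job when a score drops.

```python
def find_regressions(baseline, current, tolerance=0.02):
    """Return {scorer: (baseline_score, current_score)} for regressed scorers.

    A scorer counts as regressed when its current score falls more than
    `tolerance` below the baseline, or when it is missing from the current run.
    All names and thresholds here are illustrative assumptions.
    """
    regressions = {}
    for name, base_score in baseline.items():
        new_score = current.get(name)
        if new_score is None or new_score < base_score - tolerance:
            regressions[name] = (base_score, new_score)
    return regressions


def gate_exit_code(baseline, current, tolerance=0.02):
    """Return 1 if any scorer regressed, else 0.

    A CI step would pass this value to sys.exit(); a non-zero exit code
    fails the job, which is what blocks the merge.
    """
    return 1 if find_regressions(baseline, current, tolerance) else 0
```

In a pipeline, the baseline scores would typically come from the main branch's last eval run and the current scores from the pull request's run; how those are fetched depends on the platform and is left out here.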

Langfuse
Vendor: Langfuse
Category: Observability
Pricing: FREEMIUM
Source: Open core
Open-source LLM observability. Self-hostable, OpenTelemetry-native.
Tracing, evals, prompt management, and dataset tooling for LLM apps. Self-host on your own infra or use Langfuse Cloud. The open-source default when you want full ownership of your observability stack.
Pro: Own your observability data
Con: Self-host infra cost at scale
Tags: open-source, tracing, evals, self-hosted

LangSmith
Vendor: LangChain
Category: Observability
Pricing: FREEMIUM
LangChain's hosted observability + eval platform.
Tracing, dataset management, eval orchestration, and prompt playground from the LangChain team. Pairs naturally if LangChain or LangGraph already runs in your stack, but works standalone via SDKs.
Pro: Native LangChain/LangGraph tracing
Con: Closed source, cloud-only
Tags: tracing, evals, datasets, langchain

Vellum
Vendor: Vellum
Category: Eval
Pricing: FREEMIUM
Build, evaluate, and deploy production LLM apps and agents.
An end-to-end development platform for building, testing, and shipping LLM applications and agents. Vellum pairs a visual drag-and-drop workflow builder with a Python SDK, and bundles prompt versioning, RAG, evaluation, and production monitoring in one place so technical and non-technical teammates can collaborate. Built-in eval and test suites let teams measure quality before and after deploy. A free tier is available; paid Pro and Enterprise plans add seats and scale.
Pro: Visual builder plus Python SDK
Con: Cloud-only platform
Tags: llmops, evaluation, prompt-engineering, workflows (+1 more)

Promptfoo
Vendor: Promptfoo
Category: Eval
Pricing: FREE
Source: OSS
LLM eval CLI with rubric scoring and golden sets.
YAML-driven eval harness. Pair a prompt with a golden set, define rubrics, and run across multiple models in CI. Strong for catching prompt regressions before they hit production.
Pro: YAML-driven, version-controllable evals
Con: CLI-first, less of a hosted UI
Tags: eval, ci, rubric, open-source
