Braintrust
Hosted eval + tracing platform for LLM apps.
Production-grade eval orchestration with a dashboard, dataset versioning, and OpenTelemetry tracing. Useful once eval volume outgrows a CI YAML file.
- eval
- tracing
- datasets
- production
Iris scores AI agent output, catches safety failures, and enforces cost budgets. It is exposed as an MCP server rather than an SDK: any MCP-compatible agent discovers its tools (trace logging, output evaluation, rule management, LLM-as-judge) and uses them automatically, with no code changes, so every output flowing through the protocol gets evaluated. It detects PII leaks, prompt injection, hallucinations, and budget anomalies. The core is MIT-licensed and free to self-host; a managed cloud adds dashboards and alerting.