Iris

MCP-native eval and observability server for AI agents.

Categories: Eval, Observability, MCP
Pricing: Freemium
Source: Open core
Hosting: Hybrid
Platforms: API
Models: BYO key / model
Verified: Jun 20, 2026

Iris scores AI agent output, catches safety failures, and enforces cost budgets. It is exposed as an MCP server rather than an SDK: any MCP-compatible agent discovers its tools (trace logging, output evaluation, rule management, LLM-as-judge) and uses them automatically, with no code changes, so every output flowing through the protocol gets evaluated. It detects PII leaks, prompt injection, hallucinations, and budget anomalies. The core is MIT-licensed and free to self-host; a managed cloud adds dashboards and alerting.
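
To make the kind of check described above concrete, here is a minimal sketch of a rule-based pass over a single agent output. It is an illustration only and not Iris's implementation or API: the function `evaluate_output`, the `EvalResult` type, the failure labels, and both regex patterns are hypothetical names chosen for this example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical, deliberately simple patterns for illustration only.
# Real PII detection needs far broader coverage than two regexes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class EvalResult:
    """Outcome of one evaluation: overall verdict plus the rules that failed."""
    passed: bool
    failures: list[str] = field(default_factory=list)


def evaluate_output(text: str, cost_usd: float, budget_usd: float) -> EvalResult:
    """Run a PII check and a cost-budget check on one agent output."""
    failures: list[str] = []
    if EMAIL_RE.search(text):
        failures.append("pii:email")
    if SSN_RE.search(text):
        failures.append("pii:ssn")
    # Flag the output when its cost exceeds the configured budget.
    if cost_usd > budget_usd:
        failures.append("cost:over_budget")
    return EvalResult(passed=not failures, failures=failures)


result = evaluate_output("Contact jane@example.com", cost_usd=0.02, budget_usd=0.10)
print(result.passed, result.failures)
```

In an MCP setting, a check like this would sit behind a tool that the agent discovers and calls; the sketch leaves out the protocol layer entirely.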

Pros & cons

Pros
  • MIT-licensed open-source core
  • No-code, MCP-native integration
  • Free self-host, free cloud tier
  • PII, injection, and cost checks

Cons
  • Newer, niche MCP-focused tool
  • Best fit for MCP-based agents
  • Smaller ecosystem than SDK evals

View all Eval
  • Braintrust
    Eval, Freemium

    Hosted eval + tracing platform for LLM apps.

    Production-grade eval orchestration with a dashboard, dataset versioning, and OpenTelemetry tracing. Useful once eval volume outgrows a CI YAML file.

    Pro: Eval workflow as the primary interface
    Con: Closed-source SaaS
    • eval
    • tracing
    • datasets
    • production
  • DeepEval (Confident AI)
    Eval, Freemium, Open core

    Pytest-style LLM evaluation framework. Open source.

    Open-source (Apache 2.0) framework for evaluating LLM apps the way Pytest tests code — assertions backed by 50+ ready metrics spanning LLM-as-judge, RAG, agents, conversation, and safety. Plugs into LangChain, CrewAI, OpenAI Agents and more. Confident AI is the paid cloud platform that adds test management, dashboards, and observability on top.

    Pro: Pytest-style, CI-friendly
    Con: LLM-as-judge adds cost
    • eval
    • open-source
    • llm-as-judge
    • rag
    • +1
  • Promptfoo
    Eval, Free, OSS

    Open-source LLM eval CLI. Rubric scoring + golden sets.

    YAML-driven eval harness. Pair a prompt with a goldset, define rubrics, run across multiple models in CI. Strong for catching prompt regressions before they hit production.

    Pro: Open source, YAML-driven evals
    Con: CLI-first, less of a hosted UI
    • eval
    • ci
    • rubric
    • open-source