Iris vs Promptfoo

A side-by-side comparison of Iris and Promptfoo, two Eval tools, drawn from Ignaite's continuously verified listings.

Compared from listings verified as of 2026-06-20

Iris

Eval

MCP-native eval and observability server for AI agents.

Promptfoo

Eval

LLM eval CLI with rubric scoring and golden sets.

At a glance

Feature comparison of Iris and Promptfoo
Attribute	Iris	Promptfoo
Category	Eval	Eval
Pricing (differs)	Freemium	Free
License (differs)	Open core	Open source
Deployment (differs)	Hybrid	—
Platforms (differs)	API	CLI, macOS, Windows, Linux
Model support	BYO key / model	BYO key / model
Vendor (differs)	Iris	Promptfoo

The honest brief

Iris

MCP-native: every output that passes through the protocol is scored automatically, so there is no SDK to install and no instrumentation to wire into your code.

No SDK or instrumentation to add
Free self-host, free cloud tier
Trace logging and LLM-as-judge scoring
PII, injection, and cost checks

Newer, niche MCP-focused tool
Best fit for MCP-based agents
Smaller ecosystem than SDK evals
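
To make the "no SDK or instrumentation" idea in the Iris brief concrete, here is a minimal Python sketch of a proxy that sits in front of an agent and checks every output that passes through it. It is not Iris's implementation and does not model the MCP message format: `ScoringProxy`, `check_output`, `demo_agent`, the regex, the injection phrase list, and the cost budget are all hypothetical and chosen only for illustration. LLM-as-judge scoring is left out; only simple rule-based PII, injection, and cost checks are shown.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical, deliberately simple detectors (assumptions, not Iris's rules).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
INJECTION_PHRASES = ("ignore previous instructions",)


@dataclass
class Trace:
    """One logged output plus any flags raised by the checks."""
    output: str
    cost_usd: float
    flags: list[str] = field(default_factory=list)


def check_output(output: str, cost_usd: float, cost_budget_usd: float = 0.01) -> Trace:
    """Run PII, injection, and cost checks on a single output."""
    trace = Trace(output=output, cost_usd=cost_usd)
    if EMAIL_RE.search(output):
        trace.flags.append("pii")
    if any(phrase in output.lower() for phrase in INJECTION_PHRASES):
        trace.flags.append("injection")
    if cost_usd > cost_budget_usd:
        trace.flags.append("cost")
    return trace


class ScoringProxy:
    """Wraps an agent so callers get the output unchanged while every call is traced."""

    def __init__(self, agent: Callable[[str], tuple[str, float]]):
        self._agent = agent
        self.traces: list[Trace] = []

    def call(self, request: str) -> str:
        output, cost_usd = self._agent(request)
        self.traces.append(check_output(output, cost_usd))
        return output


def demo_agent(request: str) -> tuple[str, float]:
    """Hypothetical agent: returns (output text, cost in USD)."""
    return "Contact alice@example.com for details.", 0.02
```

The agent code itself is untouched; wrapping it with `ScoringProxy(demo_agent)` and calling `proxy.call(...)` is enough to collect a `Trace` per output in `proxy.traces`, which is the property the brief attributes to a protocol-level approach.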

Promptfoo

Define evals in plain YAML and run one golden set across models in CI; a prompt regression fails the build like any other test.

YAML-driven, version-controllable evals
Runs in CI, model-agnostic
Golden sets and rubric scoring
Also runs red-teaming and security scans

CLI-first, with less of a hosted UI
Teams may want managed dashboards
Config sprawl on large eval suites
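
The golden-set regression idea in the Promptfoo brief can be sketched in a few lines of Python. This is a conceptual illustration only: it does not use Promptfoo's CLI or YAML schema, and `GoldenCase`, `run_golden_set`, `failing_models`, `exit_code`, and the two stub models are hypothetical names. The "rubric" here is assumed to be a simple substring match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected_substring: str  # simplest possible rubric: output must contain this


# One golden set shared by every model under test.
GOLDEN_SET = [
    GoldenCase("What is the capital of France?", "Paris"),
    GoldenCase("What is 2 + 2?", "4"),
]


def model_a(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    answers = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 = 4",
    }
    return answers[prompt]


def model_b(prompt: str) -> str:
    """Hypothetical stand-in that has regressed on one case."""
    answers = {
        "What is the capital of France?": "The capital of France is Lyon.",
        "What is 2 + 2?": "The answer is 4.",
    }
    return answers[prompt]


MODELS: dict[str, Callable[[str], str]] = {"model_a": model_a, "model_b": model_b}


def run_golden_set(models, cases) -> dict[str, float]:
    """Return the pass rate of each model over the same golden set."""
    rates = {}
    for name, call in models.items():
        passed = sum(
            1 for case in cases
            if case.expected_substring.lower() in call(case.prompt).lower()
        )
        rates[name] = passed / len(cases)
    return rates


def failing_models(rates: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Models whose pass rate falls below the threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)


def exit_code(rates: dict[str, float], threshold: float = 1.0) -> int:
    """Non-zero when any model regresses, so a CI job would fail."""
    return 1 if failing_models(rates, threshold) else 0
```

In a CI job, passing `exit_code(run_golden_set(MODELS, GOLDEN_SET))` to `sys.exit` would fail the build whenever a model drops below the threshold, which is the "fails like any other test" behavior the brief describes.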
