DeepEval
Confident AI
Pytest-style LLM evaluation framework. Open source.
Open-source (Apache 2.0) framework for evaluating LLM apps the way Pytest tests code — assertions backed by 50+ ready metrics spanning LLM-as-judge, RAG, agents, conversation, and safety. Plugs into LangChain, CrewAI, OpenAI Agents and more. Confident AI is the paid cloud platform that adds test management, dashboards, and observability on top.
- eval
- open-source
- llm-as-judge
- rag
- +1