MLflow

Open-source platform for the ML and GenAI lifecycle.

Categories: Observability, Eval
Pricing: FREE
Source: Open source
Hosting: Self-host
Platforms: Web, CLI, API, Linux, macOS, Windows
Models: Model-agnostic
Verified: Jun 14, 2026

MLflow is an open-source platform for managing the full machine-learning and GenAI lifecycle — experiment tracking, model registry, deployment, and, more recently, LLM/agent observability. Its GenAI stack adds OpenTelemetry-based tracing, systematic evaluation with built-in metrics and LLM judges, and prompt versioning. Framework- and provider-agnostic, it runs on your own infrastructure with no vendor lock-in.

Capabilities (5)

What it actually does — grouped by capability family.

LLM gateway / routing (secondary capability)

LLM observability (primary capability)
LLM evaluation (primary capability)
Prompt management (secondary capability)

App / agent deployment (secondary capability)

Pros & cons

Pros:
Fully open source, no lock-in
OpenTelemetry-based, framework-agnostic
Built-in metrics and LLM judges
Large community + Linux Foundation backing
Self-host on your own infrastructure

Cons:
Self-hosting adds operational overhead
Broad scope can feel heavy for simple needs
Managed convenience needs Databricks or DIY
UI less polished than some SaaS rivals

Observability, FREEMIUM, Open core
Langfuse
Langfuse
Open-source LLM observability. Self-hostable, OpenTelemetry-native.
Tracing, evals, prompt management, and dataset tooling for LLM apps — self-host on your own infra or use Langfuse Cloud. The open-source default when you want full ownership of your observability stack.
Own your observability data
Self-host infra cost at scale
- open-source
- tracing
- evals
- self-hosted
Observability, FREEMIUM
LangSmith
LangChain
LangChain's hosted observability + eval platform.
Tracing, dataset management, eval orchestration, and prompt playground from the LangChain team. Pairs naturally if LangChain or LangGraph already runs in your stack, but works standalone via SDKs.
Native LangChain/LangGraph tracing
Closed source, cloud-only
- tracing
- evals
- datasets
- langchain
Observability, FREEMIUM
Arize Phoenix
Arize AI
LLM tracing and evaluation with retrieval debugging.
Phoenix is Arize's observability platform — run locally in a notebook or as a hosted service. Especially strong for inspecting RAG pipelines, finding bad chunks, and tracking retrieval quality over time.
Source-available, runs locally
Less polished than hosted SaaS evals
- tracing
- rag
- retrieval-debugging
Observability, FREEMIUM, Open core
W&B Weave
Weights & Biases
Tracing and evaluation for LLM apps, from Weights & Biases.
An observability and evaluation toolkit for generative-AI applications. A single @weave.op decorator traces every model call — capturing inputs, outputs, latency, token cost, and errors — and the same SDK builds rigorous evaluations using LLM-as-judge and custom scorers. Traces and experiments are organized in the Weights & Biases web platform for side-by-side comparison across prompts and models.
Single decorator traces every call
Traces land in W&B hosted platform
- llm-observability
- tracing
- eval
- open-source
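To make the decorator-based tracing described in the W&B Weave entry concrete, here is a minimal sketch of the general technique in plain Python. It is not Weave's or MLflow's implementation: `traced`, `TRACES`, and `fake_model_call` are hypothetical names invented for illustration, the trace store is an in-memory list, and token cost is not captured because there is no real model behind the call.

```python
import functools
import time

# Hypothetical in-memory trace store; a real tool would export records to a backend.
TRACES = []


def traced(fn):
    """Wrap fn so each call records its inputs, output, latency, and any error."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"name": fn.__name__, "inputs": {"args": args, "kwargs": kwargs}}
        start = time.perf_counter()
        try:
            record["output"] = fn(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            # Keep the error on the trace, then re-raise so callers still see it.
            record["error"] = repr(exc)
            raise
        finally:
            record["latency_s"] = time.perf_counter() - start
            TRACES.append(record)

    return wrapper


@traced
def fake_model_call(prompt: str) -> str:
    # Stand-in for a real model call; returns a canned reply.
    return f"echo: {prompt}"
```

Calling `fake_model_call("hi")` appends one record to `TRACES` containing the function name, the arguments, the returned string, and the elapsed time. Production tracing tools layer span nesting, sampling, and export on top of this basic pattern.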
