Observability · Linux Foundation

MLflow

Open-source platform for the ML and GenAI lifecycle.

Pricing: FREE
Hosting: Self-host
Platforms: Web, CLI, API, Linux, macOS, Windows
Models: Model-agnostic
Verified: Jun 14, 2026

MLflow is an open-source platform for managing the full machine-learning and GenAI lifecycle — experiment tracking, model registry, deployment, and, more recently, LLM/agent observability. Its GenAI stack adds OpenTelemetry-based tracing, systematic evaluation with built-in metrics and LLM judges, and prompt versioning. Framework- and provider-agnostic, it runs on your own infrastructure with no vendor lock-in.

Pros & cons

Pros

  • Fully open source (Apache-2.0), no lock-in
  • Spans tracing, evals, prompts + classic ML
  • OpenTelemetry-based, framework-agnostic
  • Large community + Linux Foundation backing
  • Self-host on your own infrastructure

Cons

  • Self-hosting adds operational overhead
  • Broad scope can feel heavy for simple needs
  • Managed convenience needs Databricks or DIY
  • UI less polished than some SaaS rivals

Further reading

  • Observability · FREEMIUM · Open core

    Langfuse

    Langfuse

    Open-source LLM observability. Self-hostable, OpenTelemetry-native.

    Tracing, evals, prompt management, and dataset tooling for LLM apps — self-host on your own infra or use Langfuse Cloud. The open-source default when you want full ownership of your observability stack.

    Worth knowing

    Y Combinator W23 startup; acquired by ClickHouse in January 2026.

    • open-source
    • tracing
    • evals
    • self-hosted
  • Observability · FREEMIUM

    LangSmith

    LangChain

    LangChain's hosted observability + eval platform.

    Tracing, dataset management, eval orchestration, and prompt playground from the LangChain team. Pairs naturally if LangChain or LangGraph already runs in your stack, but works standalone via SDKs.

    Worth knowing

    LangChain's primary commercial product and revenue driver behind its 2025 $1.25B unicorn valuation.

    • tracing
    • evals
    • datasets
    • langchain
  • Observability · FREEMIUM

    Arize Phoenix

    Arize AI

    LLM tracing + evaluation. Strong on retrieval debugging.

    Phoenix is Arize's observability platform — run locally in a notebook or as a hosted service. Especially strong for inspecting RAG pipelines, finding bad chunks, and tracking retrieval quality over time.

    Worth knowing

    Licensed under Elastic License 2.0 (source-available), not OSI open-source — despite its open GitHub repo.

    • tracing
    • rag
    • retrieval-debugging
  • Observability · FREEMIUM · Open core

    W&B Weave

    Weights & Biases

    Tracing and evaluation for LLM apps, from Weights & Biases.

    An observability and evaluation toolkit for generative-AI applications. A single @weave.op decorator traces every model call — capturing inputs, outputs, latency, token cost, and errors — and the same SDK builds rigorous evaluations using LLM-as-judge and custom scorers. Traces and experiments are organized in the Weights & Biases web platform for side-by-side comparison across prompts and models.

    Worth knowing

    The SDK is Apache-2.0 open source, but the traces it captures land in W&B's hosted platform — free for solo use.

    • llm-observability
    • tracing
    • eval
    • open-source