Datalab

High-accuracy document parsing — PDFs and images to markdown, JSON, and HTML.

Categories: Data Ops, Vision
Pricing: FREEMIUM
Source: Open core
Hosting: Hybrid
Platforms: API, CLI
Models: Self-contained (on-device)
Verified: Jun 20, 2026

Datalab turns PDFs, images, and office documents into clean markdown, JSON, and HTML while preserving layout, tables, math, and code. It is the commercial, hosted layer over the open-source Marker converter and Surya OCR toolkit, offered as a pay-as-you-go API with a free monthly allowance. The underlying models remain free to self-host for research and small startups.

Pros & cons

Pros:
Open-source core (Marker + Surya)
Free to self-host for research and small startups
Preserves tables, math, and code
OCR in 90+ languages

Cons:
Hosted API is metered per page
Self-hosting needs a GPU for throughput
Best results may need an LLM pass

Datalab

Docling

Reducto

Mindee

Unstructured