Datalab
High-accuracy document parsing — PDFs and images to markdown, JSON, and HTML.
Datalab converts PDFs, images, and office documents into clean markdown, JSON, and HTML while preserving layout, tables, math, and code. It is the commercial, hosted layer over the open-source Marker converter and the Surya OCR toolkit, sold as a pay-as-you-go API with a free monthly allowance; the underlying models remain free to self-host for research and for small startups.
- document-parsing
- ocr
- pdf-to-markdown
- rag