Mindee

AI document-processing API that turns files into structured data.

Category: Data Ops
Pricing: FREEMIUM
Source: Proprietary
Hosting: Cloud
Platforms: WebAPI
Models: Self-contained (on-device)
Verified: Jun 12, 2026

Mindee is a developer-first document-AI platform that converts photos, PDFs, and scans — invoices, receipts, IDs, financial and mail documents — into structured JSON through a single REST API, with no model training required. Beyond extraction it handles document splitting, classification, and cropping, and ships SDKs for Python, Java, PHP, and more. Billing is credit-based per page processed.
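
To make "structured JSON" concrete for downstream code, here is a minimal sketch of flattening an extraction response into plain key/value pairs. The response shape, the field names (`supplier_name`, `total_amount`, `date`), the per-field `confidence` score, and the helper `flatten_prediction` are all assumptions made for illustration; they are not taken from this listing, and the real payload may differ.

```python
from typing import Any

# Hypothetical response, shaped only for illustration; the real payload may differ.
SAMPLE_RESPONSE: dict[str, Any] = {
    "document": {
        "inference": {
            "prediction": {
                "supplier_name": {"value": "ACME Corp", "confidence": 0.98},
                "total_amount": {"value": 42.5, "confidence": 0.95},
                "date": {"value": "2026-01-15", "confidence": 0.61},
            }
        }
    }
}


def flatten_prediction(response: dict[str, Any], min_confidence: float = 0.0) -> dict[str, Any]:
    """Return {field_name: value} for fields at or above min_confidence.

    Assumes each extracted field is a dict with "value" and "confidence" keys
    (an assumed layout, not a documented one).
    """
    prediction = (
        response.get("document", {}).get("inference", {}).get("prediction", {})
    )
    flat: dict[str, Any] = {}
    for name, field in prediction.items():
        # Skip anything that does not look like an extracted field.
        if not isinstance(field, dict) or "value" not in field:
            continue
        if field.get("confidence", 0.0) >= min_confidence:
            flat[name] = field["value"]
    return flat


if __name__ == "__main__":
    # Keep only fields the (assumed) model is reasonably sure about.
    print(flatten_prediction(SAMPLE_RESPONSE, min_confidence=0.9))
```

Filtering on a confidence threshold is one possible design choice: low-confidence fields can be routed to manual review instead of being written straight into a database.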

Capabilities (4)

What it actually does — grouped by capability family.

OCR / scanned-document extraction (primary capability)
Document parsing (structured) (primary capability)

Structured extraction (secondary capability)
Text classification (secondary capability)

Pros & cons

Pros:
Pretrained models for common doc types
Single API call per document
SDKs for Python, Java, PHP, more
Transparent per-page credit pricing
Handles splitting, classification, cropping

Cons:
Hosted API is proprietary
Credit costs scale with page volume
Custom doc types need a custom model


Nanonets
Category: Data Ops
Pricing: FREEMIUM
AI agents for document processing and enterprise data extraction.
Nanonets automates document-heavy workflows (invoices, orders, contracts, and claims) with AI agents that read, extract, and route structured data across ERPs, email, and approval chains. It runs on its own OCR-3 extraction model and can fold in LLMs for agentic pipelines. Offered as managed cloud with VPC, single-tenant, and on-premises deployment options and regional data residency.
Pro: Handles invoices, orders, contracts, claims
Con: Leaderboard claims are vendor-reported
- document-ai
- idp
- ocr
- extraction

Reducto
Category: Data Ops
Pricing: FREEMIUM
Agentic document parsing and extraction for AI teams, via one API.
A document-intelligence API that parses, splits, extracts, and edits PDFs, images, spreadsheets, and slides into clean, structured output for RAG and AI pipelines. It blends custom in-house models with frontier ones and bills via usage credits, automatically discounting pages it can parse without the heavier pipeline.
Pro: Strong on complex/nested table layouts
Con: API-only, no app UI
- document-parsing
- ocr
- extraction
- rag

Docling (Docling Project)
Category: Data Ops
Pricing: FREE
Source: OSS
Toolkit that turns documents into AI-ready Markdown and JSON.
A document-processing toolkit that converts PDF, DOCX, PPTX, XLSX, HTML, images, and audio into clean Markdown or JSON for LLM and RAG pipelines. It does advanced PDF understanding (page layout, reading order, table structure, and OCR for scans) and ships a hybrid chunker plus native LangChain and LlamaIndex integrations. Small enough to run on a laptop via a Python API or CLI; MIT-licensed and community-governed.
Pro: Runs on a laptop via Python API or CLI
Con: Lower accuracy than top hosted parsers
- document-parsing
- rag
- open-source
- pdf

Unstructured
Category: Data Ops
Pricing: FREEMIUM
Source: Open core
ETL for LLMs: turn PDFs, decks, and emails into clean, structured data.
Ingests 64+ file types and partitions, chunks, enriches, and embeds them into LLM-ready output, handling OCR, tables, and document hierarchy. An open-source library plus a low-code platform and API; a staple preprocessing layer for production RAG.
Pro: 64+ file types ingested
Con: OSS quality trails hosted partition models
- document-etl
- preprocessing
- rag
- open-source
