Mindee

AI document-processing API that turns files into structured data.

Category: Data Ops
Pricing: Freemium
Hosting: Cloud
Platforms: Web, API
Models: Self-contained (on-device)
Verified: Jun 12, 2026

Mindee is a developer-first document-AI platform that converts photos, PDFs, and scans — invoices, receipts, IDs, financial and mail documents — into structured JSON through a single REST API, with no model training required. Beyond extraction it handles document splitting, classification, and cropping, and ships SDKs for Python, Java, PHP, and more. Billing is credit-based per page processed.
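
As a rough illustration of what "structured JSON" means for a consumer of an API like this, the sketch below parses a made-up response, flags low-confidence fields for review, and estimates credits from the page count. The response shape, the field names, the confidence scores, and the one-credit-per-page rate are all assumptions for illustration, not Mindee's documented schema or pricing.

```python
import json
from dataclasses import dataclass

# Hypothetical response body. The keys below are illustrative assumptions,
# not Mindee's documented schema.
SAMPLE_RESPONSE = """
{
  "document": {
    "n_pages": 2,
    "fields": {
      "supplier_name": {"value": "Acme Corp", "confidence": 0.98},
      "total_amount": {"value": 1280.50, "confidence": 0.95},
      "invoice_date": {"value": "2026-05-30", "confidence": 0.71}
    }
  }
}
"""


@dataclass
class ExtractedField:
    name: str
    value: object
    confidence: float


def parse_fields(raw: str) -> list[ExtractedField]:
    """Turn the (assumed) JSON payload into a flat list of fields."""
    doc = json.loads(raw)["document"]
    return [
        ExtractedField(name, field["value"], field["confidence"])
        for name, field in doc["fields"].items()
    ]


def needs_review(fields: list[ExtractedField], threshold: float = 0.9) -> list[str]:
    """Names of fields whose confidence falls below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


def credits_used(raw: str, credits_per_page: int = 1) -> int:
    """Per-page billing estimate; the rate is a placeholder assumption."""
    return json.loads(raw)["document"]["n_pages"] * credits_per_page


if __name__ == "__main__":
    fields = parse_fields(SAMPLE_RESPONSE)
    print(needs_review(fields))
    print(credits_used(SAMPLE_RESPONSE))
```

In practice the payload would come from the vendor's SDK or REST endpoint rather than a string literal; check the official API reference for the actual field names and credit rates.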

Pros & cons

Pros
  • Pretrained APIs, no model training
  • Single API call per document
  • SDKs for Python, Java, PHP, more
  • Transparent per-page credit pricing
  • Open-source docTR OCR heritage

Cons
  • Hosted API is proprietary
  • Credit costs scale with page volume
  • Custom doc types need a custom model

View all Data Ops
  • Nanonets
    Data Ops · FREEMIUM

    AI agents for document processing and enterprise data extraction.

    Nanonets automates document-heavy workflows — invoices, orders, contracts, and claims — with AI agents that read, extract, and route structured data across ERPs, email, and approval chains. It runs on its own OCR-3 extraction model and can fold in LLMs for agentic pipelines. Offered as managed cloud with VPC, single-tenant, and on-premises deployment options and regional data residency.

    Worth knowing

    A Y Combinator alum founded in 2017; raised a $29M Series B led by Accel in 2024.

    • document-ai
    • idp
    • ocr
    • extraction
  • Reducto
    Data Ops · FREEMIUM

    Agentic document parsing and extraction for AI teams, via one API.

    A document-intelligence API that parses, splits, extracts, and edits PDFs, images, spreadsheets, and slides into clean, structured output for RAG and AI pipelines. It blends custom in-house models with frontier ones and bills via usage credits, automatically discounting pages it can parse without the heavier pipeline.

    Worth knowing

    Founded in 2023 by MIT alumni; raised a $24.5M Series A led by Benchmark in 2025, with customers including Harvey, Scale AI and Vanta.

    • document-parsing
    • ocr
    • extraction
    • rag
  • Docling
    Data Ops · FREE · OSS

    Docling Project

    Open-source toolkit that turns documents into AI-ready Markdown and JSON.

    A document-processing toolkit that converts PDF, DOCX, PPTX, XLSX, HTML, images, and audio into clean Markdown or JSON for LLM and RAG pipelines. It does advanced PDF understanding — page layout, reading order, table structure, and OCR for scans — and ships a hybrid chunker plus native LangChain and LlamaIndex integrations. Small enough to run on a laptop via a Python API or CLI; MIT-licensed and community-governed.

    Worth knowing

    Built at IBM Research Zurich and donated to the LF AI & Data Foundation in April 2025.

    • document-parsing
    • rag
    • open-source
    • pdf
  • Unstructured
    Data Ops · FREEMIUM · Open core

    ETL for LLMs — turn PDFs, decks, and emails into clean, structured data.

    Ingests 64+ file types and partitions, chunks, enriches, and embeds them into LLM-ready output, handling OCR, tables, and document hierarchy. An open-source library plus a low-code platform and API; a staple preprocessing layer for production RAG.

    Worth knowing

    Raised a $40M Series B in March 2024 led by Menlo Ventures, with Databricks Ventures, IBM Ventures and NVIDIA's NVentures all participating.

    • document-etl
    • preprocessing
    • rag
    • open-source