Data Ops · LlamaIndex

LlamaParse

Agentic document parsing that turns complex PDFs into AI-ready markdown.

Category: Data Ops
Pricing: FREEMIUM
Hosting: Cloud
Platforms: Web, API
Models: Model-agnostic
Verified: Jun 13, 2026

LlamaParse is LlamaIndex's managed document-parsing service: it extracts text, tables, charts, and images from PDFs and 90+ other formats and converts them into clean markdown for RAG pipelines. It offers layout-aware and multimodal parsing modes, supports 100+ languages, and anchors the LlamaCloud platform alongside Extract, Classify, Split, and Index.

Pros & cons

Pros
  • Strong on tables, charts, and scanned PDFs
  • 90+ formats, 100+ languages
  • Free tier with 10k credits/month
  • Tight fit with the LlamaIndex framework

Cons
  • Cloud-only; credit-based costs add up
  • Best modes cost more credits per page
  • Core parser is proprietary, not open source
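To make the credit trade-off concrete, here is a rough budgeting sketch: it estimates how many pages fit in the free tier's 10k monthly credits and how many extra credits a larger job would need. The mode names and per-page costs in `HYPOTHETICAL_CREDITS_PER_PAGE` are invented placeholders, not LlamaParse's actual rates, and `pages_within_budget` / `overage_credits` are hypothetical helpers; check current pricing before budgeting.

```python
"""Rough credit budgeting for a per-page, credit-billed parser.

All per-page costs below are hypothetical placeholders, not real LlamaParse rates.
"""

FREE_CREDITS_PER_MONTH = 10_000  # free-tier allowance quoted in this listing

# Hypothetical credits charged per page by each parsing mode (illustrative only).
HYPOTHETICAL_CREDITS_PER_PAGE = {
    "basic_mode": 1,
    "layout_aware_mode": 5,
    "multimodal_mode": 15,
}


def pages_within_budget(credits: int, credits_per_page: int) -> int:
    """Return how many whole pages a credit budget covers at a fixed per-page cost."""
    if credits_per_page <= 0:
        raise ValueError("credits_per_page must be positive")
    return credits // credits_per_page


def overage_credits(pages: int, credits_per_page: int,
                    free_credits: int = FREE_CREDITS_PER_MONTH) -> int:
    """Return credits needed beyond the free allowance for a job (0 if it fits)."""
    return max(0, pages * credits_per_page - free_credits)


if __name__ == "__main__":
    for mode, cost in HYPOTHETICAL_CREDITS_PER_PAGE.items():
        free_pages = pages_within_budget(FREE_CREDITS_PER_MONTH, cost)
        extra = overage_credits(5_000, cost)
        print(f"{mode}: {free_pages} free pages/month; "
              f"a 5,000-page job needs {extra} extra credits")
```

With these made-up rates, a 5,000-page job fits entirely in the free tier in the cheapest mode but needs 65,000 extra credits in the most expensive one, which is the "best modes cost more" trade-off in concrete terms.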

Further reading

  • Data Ops · FREEMIUM

    Reducto

    by Reducto

    Agentic document parsing and extraction for AI teams, via one API.

    A document-intelligence API that parses, splits, extracts, and edits PDFs, images, spreadsheets, and slides into clean, structured output for RAG and AI pipelines. It blends custom in-house models with frontier ones and bills via usage credits, automatically discounting pages it can parse without the heavier pipeline.

    Worth knowing

    Founded in 2023 by MIT alumni; raised a $24.5M Series A led by Benchmark in 2025, with customers including Harvey, Scale AI and Vanta.

    • document-parsing
    • ocr
    • extraction
    • rag
  • Data Ops · FREEMIUM · Open core

    Unstructured

    by Unstructured

    ETL for LLMs — turn PDFs, decks, and emails into clean, structured data.

    Ingests 64+ file types, then partitions, chunks, enriches, and embeds them into LLM-ready output, handling OCR, tables, and document hierarchy along the way. It ships as an open-source library plus a low-code platform and API, and is a staple preprocessing layer for production RAG.

    Worth knowing

    Raised a $40M Series B in March 2024 led by Menlo Ventures, with Databricks Ventures, IBM Ventures and NVIDIA's NVentures all participating.

    • document-etl
    • preprocessing
    • rag
    • open-source
  • Data Ops · FREE · OSS

    Docling

    by Docling Project

    Open-source toolkit that turns documents into AI-ready Markdown and JSON.

    A document-processing toolkit that converts PDF, DOCX, PPTX, XLSX, HTML, images, and audio into clean Markdown or JSON for LLM and RAG pipelines. It performs advanced PDF understanding (page layout, reading order, table structure, and OCR for scans) and ships a hybrid chunker plus native LangChain and LlamaIndex integrations. It is small enough to run on a laptop via a Python API or CLI, and is MIT-licensed and community-governed.

    Worth knowing

    Built at IBM Research Zurich and donated to the LF AI & Data Foundation in April 2025.

    • document-parsing
    • rag
    • open-source
    • pdf
  • Data Ops · FREEMIUM · Open core

    Chunkr

    by Lumina AI

    Open-source document intelligence API for RAG-ready data.

    A document parsing and intelligence API that turns complex PDFs, slides, Word docs, and images into clean, LLM/RAG-ready chunks. Chunkr runs layout analysis, OCR, reading-order detection, semantic chunking, and schema-based extraction, emitting HTML, Markdown, or JSON. Self-host the open-source pipeline or call the managed cloud API, which includes a free tier of 200 pages with no card required.

    Worth knowing

    Built by Lumina AI (YC W24), the team behind a scientific-literature search engine; its parser is written in Rust.

    • document-parsing
    • ocr
    • rag
    • open-source
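The partition-then-chunk flow described for Unstructured, and the chunkers mentioned for Docling and Chunkr, can be illustrated with a toy, self-contained sketch. It assumes a hypothetical `Element` record whose `category` is a label such as "Title" or "NarrativeText", and a hypothetical `group_by_heading` helper that starts a new chunk at each title or when a size cap is reached. It is not the API or chunking logic of any tool listed here; their real chunkers are considerably more sophisticated.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one element emitted by a document parser."""
    category: str  # assumed labels, e.g. "Title", "NarrativeText", "Table"
    text: str


def group_by_heading(elements: list[Element], max_chars: int = 500) -> list[str]:
    """Group consecutive elements into text chunks.

    A new chunk starts at every "Title" element, or when adding the next
    element would push the chunk past max_chars (separators not counted).
    An element longer than max_chars is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for element in elements:
        starts_section = element.category == "Title"
        overflows = size + len(element.text) > max_chars
        # Flush the running chunk before a new section or an overflow.
        if current and (starts_section or overflows):
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(element.text)
        size += len(element.text)
    if current:
        chunks.append("\n".join(current))
    return chunks


# Usage: a tiny, made-up document already partitioned into elements.
doc = [
    Element("Title", "Introduction"),
    Element("NarrativeText", "Parsers turn PDFs into structured elements."),
    Element("Title", "Chunking"),
    Element("NarrativeText", "Chunks keep related text together for retrieval."),
]
chunks = group_by_heading(doc)
for chunk in chunks:
    print(chunk, end="\n---\n")
```

Splitting on titles keeps each section's text together, which tends to make retrieved chunks self-explanatory; the character cap is a crude stand-in for the token limits a real pipeline would enforce.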