LlamaParse

Agentic document parsing that turns complex PDFs into AI-ready markdown.

Category: Data Ops
Pricing: FREEMIUM
Source: Proprietary
Hosting: Cloud
Platforms: WebAPI
Models: Model-agnostic
Verified: Jun 13, 2026

LlamaParse is LlamaIndex's managed document-parsing service: it extracts text, tables, charts, and images from PDFs and 90+ other formats into clean markdown for RAG pipelines. It offers layout-aware and multimodal parsing modes, supports 100+ languages, and anchors the LlamaCloud platform alongside Extract, Classify, Split, and Index.

Capabilities

What it actually does — grouped by capability family.

Document parsing (structured) (primary capability)

RAG pipeline (secondary capability)

Pros & cons

Pros:
Strong on tables, charts, and scanned PDFs
Handles 90+ formats and 100+ languages
Free tier with 10k credits/month
Tight fit with the LlamaIndex framework

Cons:
Cloud-only; credit-based costs add up
The most capable parsing modes cost more credits per page
Core parser is proprietary, not open source

Further reading

Data Ops · FREEMIUM
Reducto
By Reducto
Agentic document parsing and extraction for AI teams, via one API.
A document-intelligence API that parses, splits, extracts, and edits PDFs, images, spreadsheets, and slides into clean, structured output for RAG and AI pipelines. It blends custom in-house models with frontier ones and bills via usage credits, automatically discounting pages it can parse without the heavier pipeline.
Strong on complex/nested table layouts
API-only, no app UI
- document-parsing
- ocr
- extraction
- rag

Data Ops · FREEMIUM · Open core
Unstructured
By Unstructured
ETL for LLMs — turn PDFs, decks, and emails into clean, structured data.
Ingests 64+ file types and partitions, chunks, enriches, and embeds them into LLM-ready output, handling OCR, tables, and document hierarchy. An open-source library plus a low-code platform and API; a staple preprocessing layer for production RAG.
64+ file types ingested
Open-source library's output quality trails the hosted partitioning models
- document-etl
- preprocessing
- rag
- open-source

Data Ops · FREE · OSS
Docling
By Docling Project
Toolkit that turns documents into AI-ready Markdown and JSON.
A document-processing toolkit that converts PDF, DOCX, PPTX, XLSX, HTML, images, and audio into clean Markdown or JSON for LLM and RAG pipelines. It handles advanced PDF understanding (page layout, reading order, table structure, and OCR for scans) and ships a hybrid chunker plus native LangChain and LlamaIndex integrations. It is small enough to run on a laptop via a Python API or CLI, and is MIT-licensed and community-governed.
Runs on a laptop via Python API or CLI
Lower accuracy than top hosted parsers
- document-parsing
- rag
- open-source
- pdf

Data Ops · FREEMIUM · Open core
Chunkr
By Lumina AI
Open-source document intelligence API for RAG-ready data.
A document parsing and intelligence API that turns complex PDFs, slides, Word docs, and images into clean, LLM/RAG-ready chunks. Chunkr runs layout analysis, OCR, reading-order detection, semantic chunking, and schema-based extraction, emitting HTML, Markdown, or JSON. Self-host the open-source pipeline or call the managed cloud API, which includes a free tier of 200 pages with no card required.
Self-host or call the managed API
Accuracy below Reducto on hard layouts
- document-parsing
- ocr
- rag
- open-source