Unstract

Turn unstructured documents into structured data.

Categories: Data Ops, Vision
Pricing: FREEMIUM
Source: Open core
Hosting: Hybrid
Platforms: Web, API
Models: BYO key / model
Verified: Jun 21, 2026

An agentic document-processing platform that uses LLMs to extract clean, structured JSON from PDFs, scans, and other complex documents. Its Prompt Studio is a no-code IDE for authoring and testing extraction prompts field by field; finished extractions are deployed as APIs or as ETL pipelines that load into your warehouse. Built by Zipstack, Unstract is open source under AGPL-3.0 and self-hostable via Docker Compose. A managed cloud adds SSO, human-in-the-loop review, and compliance certifications (SOC 2, HIPAA, ISO 27001, GDPR).
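To make the per-field idea concrete, here is a minimal sketch of prompt-per-field extraction into structured JSON. It is not Unstract's API: `FIELD_PROMPTS`, `extract_fields`, and `fake_model` are hypothetical names, and the "model" is a keyword lookup standing in for a real LLM call so the example runs offline.

```python
import json
from typing import Callable, Dict

# Hypothetical per-field prompts (illustrative only, not taken from Unstract).
FIELD_PROMPTS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "total_amount": "What is the total amount due, as a number?",
}


def extract_fields(
    document_text: str, ask_model: Callable[[str, str], str]
) -> Dict[str, str]:
    """Run one prompt per field and collect the answers into a flat dict."""
    return {
        field: ask_model(prompt, document_text).strip()
        for field, prompt in FIELD_PROMPTS.items()
    }


def fake_model(prompt: str, document_text: str) -> str:
    """Stand-in for an LLM call: finds a 'Label: value' line by keyword."""
    keyword = "invoice" if "invoice number" in prompt.lower() else "total"
    for line in document_text.splitlines():
        label, _, value = line.partition(":")
        if keyword in label.lower():
            return value
    return ""


sample = "Invoice No: INV-1042\nTotal Due: 318.50"
result = extract_fields(sample, fake_model)
print(json.dumps(result, indent=2))
```

In a real setup, `fake_model` would be replaced by a call to whichever LLM you bring, and the resulting dict would be validated before being served from an API or loaded into a warehouse.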

Pros & cons

Pros:
Open-source (AGPL-3.0), self-hostable
Prompt Studio: no-code extraction IDE
Deploy extractions as APIs or ETL
Cloud adds SOC 2 / HIPAA / HITL review

Cons:
AGPL-3.0 may deter some commercial use
Self-host setup is involved
LLM costs scale with document volume

Unstract

Reducto

Chunkr

LlamaParse

Extend