CocoIndex

Incremental data framework for fresh AI context.

Categories: Data Ops, Vector DB
Pricing: FREE
Platforms: API
Models: BYO key / model
Verified: Jun 19, 2026

CocoIndex is an open-source data transformation framework that keeps AI agents and LLM apps supplied with continuously fresh, structured context. It turns sources like codebases, PDFs, databases, and Slack into vector or graph stores, and reprocesses only what changed (delta-only) with parallel execution by default. A Rust core drives reliability while pipelines are defined declaratively in Python, with end-to-end lineage and an observability UI called CocoInsight.
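
The delta-only idea described above can be illustrated with a minimal sketch. This is not CocoIndex's API: the function names (`fingerprint`, `incremental_update`), the in-memory `seen` and `index` dictionaries, and the use of SHA-256 content hashes to detect changes are all assumptions made for illustration.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable content hash used to detect changes (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def incremental_update(sources, seen, index, transform):
    """Re-run `transform` only for new or changed sources and drop removed ones.

    sources: dict mapping a source key (e.g. a file path) to its current text
    seen:    dict mapping a source key to the hash recorded on the last run
    index:   dict mapping a source key to its transformed output
    Returns the sorted list of keys that were reprocessed.
    """
    reprocessed = []
    for key, text in sources.items():
        digest = fingerprint(text)
        if seen.get(key) != digest:  # new or changed since the last run
            index[key] = transform(text)
            seen[key] = digest
            reprocessed.append(key)
    # Remove entries whose source no longer exists.
    for key in list(seen):
        if key not in sources:
            del seen[key]
            index.pop(key, None)
    return sorted(reprocessed)


seen, index = {}, {}
first = incremental_update({"a.md": "hello", "b.md": "world"}, seen, index, str.upper)
second = incremental_update({"a.md": "hello", "b.md": "world!"}, seen, index, str.upper)
print(first)   # both sources are new on the first run
print(second)  # only the edited source is reprocessed
```

In a real pipeline the transform would be the expensive step (chunking, embedding, extraction), which is why skipping unchanged inputs matters.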

Pros & cons

Pros
  • Apache-2.0 with a Rust core
  • Incremental, delta-only processing
  • Declarative Python pipelines
  • End-to-end lineage + CocoInsight UI

Cons
  • Younger, smaller ecosystem
  • Python-centric authoring
  • Bring your own model/embedding cost

View all Data Ops
  • Docling (Docling Project)
    Data Ops, FREE, OSS

    Open-source toolkit that turns documents into AI-ready Markdown and JSON.

    A document-processing toolkit that converts PDF, DOCX, PPTX, XLSX, HTML, images, and audio into clean Markdown or JSON for LLM and RAG pipelines. It performs advanced PDF understanding (page layout, reading order, table structure, and OCR for scans) and ships a hybrid chunker plus native LangChain and LlamaIndex integrations. Lightweight enough to run on a laptop through a Python API or CLI; MIT-licensed and community-governed.

    Pro: Fully open-source and self-hostable
    Con: Lower accuracy than top hosted parsers
    • document-parsing
    • rag
    • open-source
    • pdf
  • Unstructured
    Data Ops, FREEMIUM, Open core

    ETL for LLMs — turn PDFs, decks, and emails into clean, structured data.

    Ingests 64+ file types and partitions, chunks, enriches, and embeds them into LLM-ready output, handling OCR, tables, and document hierarchy. An open-source library plus a low-code platform and API; a staple preprocessing layer for production RAG.

    Pro: 64+ file types ingested
    Con: OSS quality trails hosted partition models
    • document-etl
    • preprocessing
    • rag
    • open-source
  • LlamaIndex
    Orchestration, FREEMIUM, Open core

    The data framework for LLM apps — RAG, agents, and document workflows.

    An open-source framework (Python + TypeScript) for connecting LLMs to your data — ingestion, indexing, retrieval, and agentic document workflows. Pairs with the managed LlamaCloud (LlamaParse) for production parsing and extraction. The most-used RAG framework after LangChain.

    Pro: Best-in-class RAG primitives
    Con: Narrower than full orchestration frameworks
    • framework
    • rag
    • agents
    • open-source