Mixpeek

Find any scene in your video and multimodal library.

Categories: Vision, Search
Pricing: Freemium
Source: Proprietary
Hosting: Cloud
Platforms: API, Web
Models: Model-agnostic
Verified: Jun 19, 2026

Mixpeek is a multimodal retrieval API for searching across video, images, audio, and documents with natural language. It extracts and indexes structured features — faces, scenes, transcripts, OCR, and embeddings — over object storage like S3, GCS, and R2, then runs hybrid dense, sparse, and BM25 search with reranking. Cross-modal joins let a single query combine signals such as faces, spoken phrases, and on-screen text.
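To illustrate what hybrid retrieval means in general, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge ranked lists from dense, sparse, and BM25 retrievers before a reranking step. This is not Mixpeek's API or its documented fusion method; the function name, the constant k=60, and the clip IDs are all assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Hypothetical helper for illustration only; not part of any Mixpeek SDK.
    Each document earns 1 / (k + rank) from every list it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists from three retrievers for the same query.
dense = ["clip_7", "clip_2", "clip_9"]
sparse = ["clip_2", "clip_7", "clip_4"]
bm25 = ["clip_2", "clip_4", "clip_7"]

fused = reciprocal_rank_fusion([dense, sparse, bm25])
print(fused)
```

Rank-based fusion is a common choice because it sidesteps the problem that dense similarity scores and BM25 scores live on different scales; a reranker would then reorder the top of the fused list.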

Pros & cons

Pros:
Searches video, image, audio, and documents
Cross-modal joins in one query
Hybrid dense/sparse/BM25 retrieval
Indexes directly from object storage
Free vector-store tier

Cons:
Developer/API-first, not no-code
Core platform is not open source
Smaller than general vector DBs

Mixpeek

TwelveLabs

Coactive AI

Voxel51