Vision · Autonomi AI

VLM Run

Unified API gateway that extracts structured JSON from images, video, and documents.

Category: Vision
Pricing: FREEMIUM
Source: Proprietary
Hosting: Cloud
Platforms: API, Web
Models: Self-contained (on-device)
Verified: Jun 16, 2026

VLM Run is a developer platform for visual AI that returns reliable structured JSON from images, video, and documents through a single API. It combines hyper-specialized vision-language models with computer-vision tools for tasks such as document parsing, structured OCR, object detection, and segmentation. It also offers fine-tuning to specialize models for a domain, along with dashboards and flexible deployment options. The platform is operated by Autonomi AI.

Capabilities (6)

What it actually does — grouped by capability family.

Fine-tuning / training (secondary capability)

OCR / scanned-document extraction (secondary capability)
Document parsing (structured) (secondary capability)

Object detection (secondary capability)
Video understanding (secondary capability)

Structured extraction (primary capability)

Pros & cons

Pros:
One API for images, video, and documents
Parsing, OCR, detection, and segmentation
Free starter credits
Fine-tuning for specialized extraction

Cons:
Pro tier jumps to $799/mo
Small team
Less brand recognition than incumbents

Moondream
M87 Labs
Category: Vision
Pricing: FREEMIUM
Source: Open core
Tiny open vision-language model for efficient image understanding.
An open-weights family of small vision-language models for captioning, visual Q&A, pointing, counting, and object detection. The models are small enough to run on-device (checkpoints down to 0.5B on Hugging Face). Run it locally with the Photon engine, or call Moondream Cloud's OpenAI-compatible API with a free monthly credit tier and pay-per-image pricing.
Pro: Open-weights, free to self-host
Con: Small models trail frontier VLMs on hard tasks
- vision-language
- open-weights
- on-device
- object-detection

Roboflow
Roboflow
Category: Vision
Pricing: FREEMIUM
Vision MLOps end-to-end. Annotate, train, deploy.
Annotation tooling, auto-labelling, hosted training, and edge deployment for computer-vision projects. Strong default when you're shipping a custom vision model rather than reaching for a multimodal LLM.
Pro: End-to-end vision MLOps
Con: Free tier caps usage and privacy
- annotation
- training
- deployment
- edge

LandingAI
LandingAI
Category: Vision
Pricing: FREEMIUM
Build vision detectors and agents from a few labeled examples.
Build vision applications with a labelling-light workflow: point at examples, get a deployable detector. Recently extended into vision agents that reason over images and PDFs without bespoke training.
Pro: Fast path to a deployable detector
Con: Less control than custom model training
- visual-prompting
- agents
- document-ai
- no-code

Reka Vision
Reka
Category: Vision
Pricing: PAID
Multimodal platform to search, reason over, and clip large volumes of video.
Reka Vision is an enterprise multimodal system that indexes large image and video libraries so teams can search by meaning, ask timestamp-aware questions, and auto-generate highlights and clips. It is built by Reka, a frontier multimodal-model lab, and is available via API, an MCP server, or a hosted app. Access is sales-led (request a demo).
Pro: Natural-language search over video archives
Con: Sales-led, demo-gated access
- video-understanding
- multimodal
- visual-search
- video-clipping
