Skip to content

VisionAutonomi AI

VLM Run

Unified API gateway that extracts structured JSON from images, video, and documents.

Category
Vision
Pricing
FREEMIUM
Hosting
Cloud
Platforms
APIWeb
Models
Self-contained (on-device)
Verified
Jun 16, 2026

VLM Run is a developer platform for visual AI that returns reliable structured JSON from images, video, and documents through a single API, combining hyper-specialized vision-language models with computer-vision tools for tasks like document parsing, structured OCR, object detection, and segmentation. It offers fine-tuning to specialize models for a domain, dashboards, and flexible deployment. The platform is operated by Autonomi AI.

Pros & cons

  • One API for images, video, and documents
  • Structured JSON output
  • Free starter credits
  • Fine-tuning for specialized extraction
  • Pro tier jumps to $799/mo
  • Small team
  • Less brand recognition than incumbents

Tags

View all Vision
  • View Moondream details
    VisionFREEMIUMOpen core

    Moondream

    M87 Labs

    Tiny open vision-language model for efficient image understanding.

    An open-weights family of small vision-language models for captioning, visual Q&A, pointing, counting, and object detection — small enough to run on-device (checkpoints down to 0.5B on Hugging Face). Run it locally with the Photon engine, or call Moondream Cloud's OpenAI-compatible API with a free monthly credit tier and pay-per-image pricing.

    Worth knowing

    Built by M87 Labs, founded by AWS veterans; raised a $4.5M pre-seed backed by Felicis and GitHub's M12 fund in 2024.

    • vision-language
    • open-weights
    • on-device
    • object-detection
  • View Roboflow details
    VisionFREEMIUM

    Roboflow

    Roboflow

    Vision MLOps end-to-end. Annotate, train, deploy.

    Annotation tooling, auto-labelling, hosted training, and edge deployment for computer-vision projects. Strong default when you're shipping a custom vision model rather than reaching for a multimodal LLM.

    Worth knowing

    Its Roboflow Universe is one of the largest public computer-vision dataset and model hubs; $40M Series B led by GV in 2024.

    • annotation
    • training
    • deployment
    • edge
  • View LandingAI details
    VisionFREEMIUM

    LandingAI

    LandingAI

    Visual prompting + vision agents from Andrew Ng's lab.

    Build vision applications with a labelling-light workflow — point at examples, get a deployable detector. Recently extended into vision agents that reason over images and PDFs without bespoke training.

    Worth knowing

    Founded by Andrew Ng in 2017; raised a $57M Series A in 2021 backed by Intel, Samsung and Insight Partners.

    • visual-prompting
    • agents
    • document-ai
    • no-code
  • View Reka Vision details
    VisionPAID

    Reka Vision

    Reka

    Multimodal platform to search, reason over, and clip large volumes of video.

    Reka Vision is an enterprise multimodal system that indexes large image and video libraries so teams can search by meaning, ask timestamp-aware questions, and auto-generate highlights and clips. It is built by Reka, a frontier multimodal-model lab, and is available via API, an MCP server, or a hosted app. Access is sales-led (request a demo).

    Worth knowing

    Built by Reka, a 2022 lab of ex-DeepMind/Google/Meta researchers; $1B+ valuation in 2025 on a $110M Nvidia/Snowflake round.

    • video-understanding
    • multimodal
    • visual-search
    • video-clipping
    • +1