Vision AI apps

Computer-vision platforms and APIs — detection, OCR, visual search, and multimodal understanding.

11 apps · researched & kept current by Claude Code

  • Supervisely
    Vision · FREEMIUM

    Supervisely

    All-in-one computer vision platform to curate, label, and train models.

    A unified computer vision platform covering data curation, annotation, model training, and deployment across images, video, 3D point clouds, and medical imagery. AI-assisted labeling, experiment tracking, and a large catalog of installable apps make it customizable for most CV workflows. Free for researchers and small teams; Pro and self-hostable Enterprise editions for companies.

    Worth knowing

    Grew out of Deep Systems, a deep-learning consultancy its founders built in 2013, before launching as a product in 2017.

    • computer-vision
    • data-annotation
    • labeling
    • model-training
  • Mathpix
    Vision · FREEMIUM

    Mathpix

    OCR and document conversion built for math, science, and STEM.

    OCR and document-conversion tooling specialized for STEM content. Mathpix reads printed and handwritten math, chemistry, tables, and text from images and PDFs, exporting to LaTeX, DOCX, Markdown, Excel, ChemDraw, and more. It ships as the Snip app (web, mobile, desktop, browser extension) for individuals and teams, plus a Convert API for developers building solving, tutoring, and grading products.

    Worth knowing

    Founded in 2016 by Stanford PhD student Nico Jimenez, starting as a tool to convert handwritten math to LaTeX.

    • ocr
    • document-conversion
    • latex
    • stem
  • Encord
    Vision · PAID

    Encord

    Data platform to curate, label, and manage AI training data.

    An enterprise data development platform for preparing high-quality training data across images, video, documents, audio, DICOM, and 3D point clouds. It pairs AI-assisted labeling (SAM auto-segmentation, object tracking) with data curation, model evaluation, and workflow tooling, plus LLM-powered data agents for document tasks. Used heavily in medical imaging, robotics, and other physical-AI domains.

    Worth knowing

    YC W21 company founded by two ex-high-frequency traders; raised a $30M Series B led by Next47 in 2024.

    • data-annotation
    • training-data
    • computer-vision
    • medical-imaging
  • Reka Vision
    Vision · PAID

    Reka

    Multimodal platform to search, reason over, and clip large volumes of video.

    Reka Vision is an enterprise multimodal system that indexes large image and video libraries so teams can search by meaning, ask timestamp-aware questions, and auto-generate highlights and clips. It is built by Reka, a frontier multimodal-model lab, and is available via API, an MCP server, or a hosted app. Access is sales-led (request a demo).

    Worth knowing

    Built by Reka, a lab founded in 2022 by ex-DeepMind/Google/Meta researchers; reached a $1B+ valuation in 2025 with a $110M round backed by Nvidia and Snowflake.

    • video-understanding
    • multimodal
    • visual-search
    • video-clipping
  • Ultralytics YOLO
    Vision · FREEMIUM · Open core

    Ultralytics

    State-of-the-art YOLO models for real-time object detection and vision.

    The open-source PyTorch framework behind the YOLO (You Only Look Once) family of vision models. One unified API covers object detection, instance and semantic segmentation, image classification, pose estimation, and oriented bounding boxes, with both a CLI and a Python interface. The 2026 flagship, YOLO26, is an end-to-end, NMS-free architecture tuned for edge and low-power deployment.

    Worth knowing

    AGPL-3.0 licensed: products that embed it must release their own source code under AGPL-3.0 or buy an Ultralytics Enterprise License.

    • object-detection
    • segmentation
    • yolo
    • open-source
  • Dataloop
    Vision · PAID

    Dataloop

    Enterprise data engine for labeling and managing unstructured AI data.

    An AI-ready data platform that manages, labels, and orchestrates unstructured data — images, video, LiDAR, audio, and text — across the model lifecycle. It pairs data management and human-in-the-loop annotation with a serverless pipeline layer for pre/post-processing, RLHF, and RAG, plus a model-and-app marketplace. Originally focused on computer-vision production pipelines.

    Worth knowing

    The Israeli startup (founded 2017) was acquired by Dell in a ~$120M all-cash deal in late 2025.

    • data-labeling
    • computer-vision
    • annotation
    • mlops
  • Moondream
    Vision · FREEMIUM · Open core

    M87 Labs

    Tiny open vision-language model for efficient image understanding.

    An open-weights family of small vision-language models for captioning, visual Q&A, pointing, counting, and object detection, small enough to run on-device (checkpoints down to 0.5B parameters on Hugging Face). Run it locally with the Photon engine, or call Moondream Cloud's OpenAI-compatible API, which has a free monthly credit tier and pay-per-image pricing beyond it.

    Worth knowing

    Built by M87 Labs, founded by AWS veterans; raised a $4.5M pre-seed in 2024 backed by Felicis and Microsoft's M12 GitHub Fund.

    • vision-language
    • open-weights
    • on-device
    • object-detection
  • TwelveLabs
    Vision · FREEMIUM

    TwelveLabs

    Video intelligence API: search, classify, and summarize video.

    Video understanding platform built on its own multimodal foundation models — Marengo for embeddings and semantic search, Pegasus for generative tasks like summaries and captions. Developers index video once and run natural-language search, classification, and analysis via API. Free tier with usage-based pricing beyond it.

    Worth knowing

    Its five Korean co-founders met in military cyber-ops; the company was Nvidia's first direct investment in a Korean AI startup.

    • video-understanding
    • search
    • multimodal
    • embeddings
  • LandingAI
    Vision · FREEMIUM

    LandingAI

    Visual prompting and vision agents from Andrew Ng's company.

    Build vision applications with a labeling-light workflow: point at examples, get a deployable detector. Recently extended into vision agents that reason over images and PDFs without bespoke training.

    Worth knowing

    Founded by Andrew Ng in 2017; raised a $57M Series A in 2021 backed by Intel, Samsung and Insight Partners.

    • visual-prompting
    • agents
    • document-ai
    • no-code
  • Roboflow
    Vision · FREEMIUM

    Roboflow

    Vision MLOps end-to-end. Annotate, train, deploy.

    Annotation tooling, auto-labeling, hosted training, and edge deployment for computer-vision projects. Strong default when you're shipping a custom vision model rather than reaching for a multimodal LLM.

    Worth knowing

    Roboflow Universe is one of the largest public computer-vision dataset and model hubs; the company raised a $40M Series B led by GV in 2024.

    • annotation
    • training
    • deployment
    • edge
  • Voxel51
    Vision · FREEMIUM · Open core

    Voxel51

    FiftyOne — open-source vision data platform.

    Open-source toolkit for exploring, debugging, and curating vision datasets. Strong story for finding model failure modes, balancing classes, and tracking experiment drift across visual data at scale.

    Worth knowing

    Spun out of the University of Michigan in 2016 by robotics professor Jason Corso and PhD student Brian Moore; later raised a $30M Series B led by Bessemer.

    • open-source
    • datasets
    • evaluation
    • python