Move AI

Markerless 3D motion capture from ordinary video — even a single iPhone.

Categories: Vision, 3D
Pricing: FREEMIUM
Hosting: Cloud
Platforms: Web, iOS, API
Models: Self-contained (on-device)
Verified: Jun 15, 2026

Markerless motion-capture technology that turns 2D video into broadcast-quality 3D animation data using computer vision, biomechanics, and physics. The Move One app captures motion from a single iPhone, while multi-camera setups serve studio production; output exports to FBX and USD for game engines and animation pipelines. Used by studios including Ubisoft, Sony, and Disney.

Pros & cons

Pros
  • Markerless capture from a single iPhone
  • Multi-camera option for studio quality
  • Exports to FBX and USD
  • Used by major studios

Cons
  • Cloud processing; credit-based pricing
  • Single-camera accuracy trails multi-camera
  • Capture length capped on lower tiers

Further reading

  • Encord
    Vision · PAID

    Data platform to curate, label, and manage AI training data.

    An enterprise data development platform for preparing high-quality training data across images, video, documents, audio, DICOM, and 3D point clouds. It pairs AI-assisted labeling (SAM auto-segmentation, object tracking) with data curation, model evaluation, and workflow tooling, plus LLM-powered data agents for document tasks. Used heavily in medical imaging, robotics, and other physical-AI domains.

    Worth knowing

    YC W21 company founded by two ex-high-frequency traders; raised a $30M Series B led by Next47 in 2024.

    • data-annotation
    • training-data
    • computer-vision
    • medical-imaging
  • Groundlight
    Vision · FREEMIUM

    Groundlight AI

    Build reliable computer vision by asking plain-English questions about images.

    Groundlight lets developers create visual detectors by describing what to look for in natural language, with no training dataset required. Its system pairs ML models with built-in 24/7 human labeling, so applications return reliable answers from day one and the models improve automatically over time. It ships a Python SDK and REST API, supports edge inference on hardware like Raspberry Pi, and powers monitoring, industrial inspection, and robotics use cases.

    Worth knowing

    Founded in 2019 by ex-AWS SageMaker principal engineer Leo Dirac; named to CB Insights' AI 100 in 2024.

    • computer-vision
    • edge-ai
    • human-in-the-loop
    • no-code
  • Coactive AI
    Vision · PAID

    Multimodal platform that makes images and video searchable and structured.

    Coactive AI is an enterprise multimodal application platform that pulls context directly from the pixels and audio in images and video — no manual tagging or metadata required. Teams use it to semantically search, label, govern, and structure large visual libraries at scale, turning unstructured media into queryable data. It is aimed at media, retail, and other enterprises with vast image and video archives.

    Worth knowing

    Raised a $30M Series B at a roughly $200M valuation, co-led by Cherryrock Capital and Emerson Collective.

    • multimodal
    • visual-search
    • video-understanding
    • data-labeling
  • Lightly
    Vision · PAID

    Computer-vision data curation, labeling, and model pretraining.

    Lightly is a computer-vision data platform that helps teams curate the most informative samples from large image and video datasets using embeddings, active learning, and near-duplicate detection. Its suite spans LightlyStudio (curation and labeling), LightlyTrain (self-supervised pretraining and fine-tuning of vision models), and LightlyEdge (smart data selection on devices). The aim is to cut labeling cost by training on the data that actually improves models.

    Worth knowing

    Grew out of LightlySSL, the team's open-source self-supervised-learning framework for computer vision.

    • computer-vision
    • data-curation
    • active-learning
    • self-supervised
  • Hive
    Vision · PAID

    Cloud AI APIs for content moderation, search, and generation.

    Hive offers pre-trained deep-learning models delivered as cloud APIs for understanding, moderating, and generating visual, text, and audio content. Its core business is automated content moderation — flagging unsafe imagery, video, text, and audio at platform scale — alongside logo and object detection, AI-generated-content and deepfake detection, and reverse image search. The San Francisco company powers trust-and-safety pipelines for platforms including Reddit, Bluesky, and Midjourney. Models integrate with a few lines of code and run as managed or on-premise deployments.

    Worth knowing

    In 2024 Hive won the Pentagon Defense Innovation Unit's first deepfake-detection contract, a $2.4M two-year deal.

    • content-moderation
    • computer-vision
    • deepfake-detection
    • moderation-api
  • Clarifai
    Vision · FREEMIUM

    Full-stack AI platform for computer vision and LLMs.

    Clarifai is a full-stack AI platform for building with unstructured image, video, text, and audio data. It pairs production computer-vision models — classification, detection, visual search — with a model hub for LLMs, plus data labeling, training of custom models, and inference, all behind one API and console. A free Community tier lets you discover and run models before moving to paid usage plans.

    Worth knowing

    Founded in 2013 by Matthew Zeiler, whose deep-learning model won that year's ImageNet (ILSVRC) image-classification challenge.

    • computer-vision
    • model-hub
    • data-labeling
    • inference
  • CVAT
    Vision · FREEMIUM · Open core

    CVAT.ai

    Open-source annotation platform for vision AI datasets.

    Data-labeling suite for images, video, and 3D: bounding boxes, polygons, segmentation, keypoints, and object tracking, with AI-assisted labeling via SAM and custom models through its API and SDK. Ships as the MIT-licensed Community edition to self-host, the hosted CVAT Online with free and paid plans, or a self-hosted Enterprise tier.

    Worth knowing

    Built inside Intel in 2017 atop the VATIC video tool and open-sourced in 2018; the team spun out as CVAT.ai in 2022.

    • annotation
    • labeling
    • computer-vision
    • datasets
  • Supervisely
    Vision · FREEMIUM

    All-in-one computer vision platform to curate, label, and train models.

    A unified computer vision platform covering data curation, annotation, model training, and deployment across images, video, 3D point clouds, and medical imagery. AI-assisted labeling, experiment tracking, and a large catalog of installable apps make it customizable for most CV workflows. Free for researchers and small teams; Pro and self-hostable Enterprise editions for companies.

    Worth knowing

    Grew out of Deep Systems, a deep-learning consultancy its founders built in 2013, before launching as a product in 2017.

    • computer-vision
    • data-annotation
    • labeling
    • model-training
  • Mathpix
    Vision · FREEMIUM

    OCR and document conversion built for math, science, and STEM.

    OCR and document-conversion tooling specialized for STEM content. Mathpix reads printed and handwritten math, chemistry, tables, and text from images and PDFs, exporting to LaTeX, DOCX, Markdown, Excel, ChemDraw, and more. It ships as the Snip app (web, mobile, desktop, browser extension) for individuals and teams, plus a Convert API for developers building solving, tutoring, and grading products.

    Worth knowing

    Founded in 2016 by Stanford PhD student Nico Jimenez, starting as a tool to convert handwritten math to LaTeX.

    • ocr
    • document-conversion
    • latex
    • stem
  • Reka Vision
    Vision · PAID

    Reka

    Multimodal platform to search, reason over, and clip large volumes of video.

    Reka Vision is an enterprise multimodal system that indexes large image and video libraries so teams can search by meaning, ask timestamp-aware questions, and auto-generate highlights and clips. It is built by Reka, a frontier multimodal-model lab, and is available via API, an MCP server, or a hosted app. Access is sales-led (request a demo).

    Worth knowing

    Built by Reka, a lab founded in 2022 by ex-DeepMind, Google, and Meta researchers; valued at over $1B in 2025 following a $110M round involving Nvidia and Snowflake.

    • video-understanding
    • multimodal
    • visual-search
    • video-clipping