Vision · LandingAI
Visual prompting and vision agents from Andrew Ng's company.
Build vision applications with a labelling-light workflow — point at examples, get a deployable detector. Recently extended into vision agents that reason over images and PDFs without bespoke training.
Related in Vision
Encord
Data platform to curate, label, and manage AI training data.
An enterprise data development platform for preparing high-quality training data across images, video, documents, audio, DICOM, and 3D point clouds. It pairs AI-assisted labeling (SAM auto-segmentation, object tracking) with data curation, model evaluation, and workflow tooling, plus LLM-powered data agents for document tasks. Used heavily in medical imaging, robotics, and other physical-AI domains.
AI insight: Built for physical-world and medical AI: labels DICOM, NIfTI, LiDAR point clouds and SAR alongside images and video, not just photos.
Mathpix
OCR and document conversion built for math, science, and STEM.
OCR and document-conversion tooling specialized for STEM content. Mathpix reads printed and handwritten math, chemistry, tables, and text from images and PDFs, exporting to LaTeX, DOCX, Markdown, Excel, ChemDraw, and more. It ships as the Snip app (web, mobile, desktop, browser extension) for individuals and teams, plus a Convert API for developers building solving, tutoring, and grading products.
AI insight: Built for STEM OCR: returns LaTeX, Markdown, tables, and even ChemDraw SMILES from handwriting, not just plain text.
Supervisely
All-in-one computer vision platform to curate, label, and train models.
A unified computer vision platform covering data curation, annotation, model training, and deployment across images, video, 3D point clouds, and medical imagery. AI-assisted labeling, experiment tracking, and a large catalog of installable apps make it customizable for most CV workflows. Free for researchers and small teams; Pro and self-hostable Enterprise editions for companies.
AI insight: Works like an OS for computer vision — extend it with an ecosystem of installable apps for labeling, training, and inference.
Ultralytics
State-of-the-art YOLO models for real-time object detection and vision.
The open-source PyTorch framework behind the YOLO (You Only Look Once) family of vision models. One unified API covers object detection, instance and semantic segmentation, image classification, pose estimation, and oriented bounding boxes, with both a CLI and a Python interface. The 2026 flagship, YOLO26, is an end-to-end, NMS-free architecture tuned for edge and low-power deployment.
AI insight: AGPL-3.0 licensed: products that embed it must open-source their own code or buy an Ultralytics enterprise license.
M87 Labs
Tiny open vision-language model for efficient image understanding.
Moondream is an open-weights family of small vision-language models for captioning, visual Q&A, pointing, counting, and object detection, small enough to run on-device (checkpoints down to 0.5B on Hugging Face). Run it locally with the Photon engine, or call Moondream Cloud's OpenAI-compatible API, which offers a free monthly credit tier and pay-per-image pricing.
AI insight: Among the smallest open VLMs — a 0.5B checkpoint runs on-device, yet the family still does pointing, counting, and object detection.
TwelveLabs
Video intelligence API: search, classify, and summarize video.
Video understanding platform built on its own multimodal foundation models — Marengo for embeddings and semantic search, Pegasus for generative tasks like summaries and captions. Developers index video once and run natural-language search, classification, and analysis via API. Free tier with usage-based pricing beyond it.
AI insight: Trains video-native foundation models — Marengo for search, Pegasus for generation — instead of captioning sampled frames into a text LLM.
Roboflow
Vision MLOps end-to-end. Annotate, train, deploy.
Annotation tooling, auto-labelling, hosted training, and edge deployment for computer-vision projects. Strong default when you're shipping a custom vision model rather than reaching for a multimodal LLM.
AI insight: For when the answer is a small custom vision model, not a multimodal LLM — it owns the annotate-train-deploy loop end to end.
Voxel51
FiftyOne — open-source vision data platform.
Open-source toolkit for exploring, debugging, and curating vision datasets. Well suited to finding model failure modes, balancing classes, and tracking drift across experiments on visual data at scale.
AI insight: FiftyOne's superpower is surfacing the bad labels and failure cases hiding in a vision dataset — debugging data, not just models.