Skip to content

InferenceLiquid AI

Liquid AI

On-device foundation models (LFMs) plus LEAP, an edge platform to ship them to any device.

Category
Inference
Pricing
FREEMIUM
Hosting
Hybrid
Platforms
WebiOSAndroidAPI
Models
Self-contained (on-device)
Verified
Jun 20, 2026

Liquid AI builds Liquid Foundation Models (LFMs) — compact, fast models designed to run on phones, laptops, and edge hardware rather than the cloud. Its LEAP platform lets developers discover, fine-tune, bundle, and deploy these models on-device through an Edge SDK, taking a model 'from concept to on-device in minutes.' The LFM2/LFM2.5 family spans 350M–8.3B parameters with a hybrid architecture tuned for low-latency local inference.

Pros & cons

  • Models run on-device, no cloud needed
  • Low latency, offline, and private
  • LEAP core is free for all users
  • Fine-tune and bundle for edge fast
  • Open-weight LFM2/LFM2.5 family
  • Small models trail frontier cloud LLMs
  • On-device deployment adds app work
  • Enterprise support is sales-gated
  • Custom (non-OSI) model license

Tags

View all Inference
  • View Ollama details
    InferenceFREEMIUMOpen core

    Ollama

    Ollama

    Run open-weight LLMs locally with one command. OpenAI-compatible API.

    The de-facto way to pull and run open-weight models (Llama, Qwen, Gemma, DeepSeek, gpt-oss) on your own machine — no API key, no data leaving the device. Ships native macOS/Windows/Linux apps, an OpenAI-compatible server, and official Python/JS libraries. MIT-licensed and free locally; an optional paid Ollama Cloud runs larger models.

    One-command pull-and-run
    Local performance bound by your hardware
    • local
    • open-source
    • llm-runner
    • self-hosted
  • View Groq details
    InferenceFREEMIUM

    Groq

    Groq

    Ultra-fast inference on custom LPU chips. Open-weights at 500+ tokens/sec.

    GroqCloud serves open-weights models (Llama, DeepSeek, Qwen, Kimi) on Groq's purpose-built LPU hardware, hitting hundreds of tokens per second where GPUs manage tens. OpenAI-compatible API with a free tier; the default when token latency is the product.

    Hundreds of tokens/sec on open models
    Curated open-weight models only
    • inference
    • low-latency
    • lpu
    • open-weights
  • View Replicate details
    InferenceFREEMIUM

    Replicate

    Replicate

    Run, fine-tune, and deploy thousands of open models via one API.

    A platform to run open-source models with one API call — image, video, audio, and language — plus fine-tuning and custom deploys with pay-per-second billing. No infra to manage.

    Thousands of community models, one API
    Cold starts on less-popular models
    • model-hosting
    • fine-tuning
    • api
    • open-source