InferenceInception Labs

Inception Labs

Diffusion LLMs for ultra-fast text and code.

Category: Inference
Pricing: PAID
Source: Proprietary
Hosting: Cloud
Platforms: APIWeb
Models: Single model (proprietary)
Verified: Jun 19, 2026

Inception Labs builds diffusion-based large language models (dLLMs) that generate tokens in parallel rather than sequentially, claiming several times the speed and under half the cost of conventional autoregressive LLMs at comparable quality. Its Mercury family — including the Mercury 2 reasoning model and Mercury Edit for code — is served through an OpenAI-compatible API and also via AWS Bedrock and Azure. The Stanford spinout, led by Stefano Ermon, raised $50M from Menlo Ventures with angels including Andrew Ng and Andrej Karpathy.

Pros & cons

1,000+ tokens/sec throughput
Lower per-token cost than peers
OpenAI-compatible API
Available on Bedrock and Azure

Own model family only (Mercury)
Newer, less battle-tested than GPT/Claude
Paid API, no large free tier

Inception Labs

Groq

Cerebras

Fireworks AI

Together AI