Cerebrium

Serverless GPU infrastructure for real-time AI — voice, video, and LLM workloads.

Category: Infra
Pricing: PAID
Hosting: Cloud
Platforms: API, CLI
Models: Model-agnostic
Verified: Jun 12, 2026

A serverless GPU platform for deploying real-time AI workloads (voice agents, video models, and LLMs) with cold starts of 2–4 seconds, autoscaling that includes scale-to-zero, and multi-region failover. Bring custom code, Dockerfiles, or frameworks like vLLM, and pay per second of compute across 12+ GPU types.

Pros & cons

Pros

  • 2–4s cold starts, scale-to-zero
  • 12+ GPU types up to B200
  • Multi-region deploys + failover
  • SOC 2, HIPAA, GDPR compliant

Cons

  • $100/mo base on the Standard tier
  • Hobby tier capped at 3 apps, 5 GPUs
  • Younger platform, smaller community
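
To make the pricing points above concrete, here is a minimal sketch of how a monthly bill might be estimated under per-second billing with scale-to-zero plus a flat plan fee. The $100 base comes from the Standard tier noted above; the per-second rate, the usage figures, and the rule that partial seconds round up are all illustrative assumptions, not Cerebrium's published prices or billing rules.

```python
import math


def monthly_cost(busy_seconds: float, rate_per_second: float,
                 base_fee: float = 100.0) -> float:
    """Estimate a monthly bill: flat base fee plus per-second compute.

    Assumptions (hypothetical, for illustration only):
      - idle time costs nothing because the app scales to zero
      - partial seconds are rounded up to the next whole second
      - rate_per_second is a made-up figure, not a real GPU price
    """
    if busy_seconds < 0 or rate_per_second < 0 or base_fee < 0:
        raise ValueError("inputs must be non-negative")
    billed_seconds = math.ceil(busy_seconds)
    return round(base_fee + billed_seconds * rate_per_second, 2)


# Hypothetical workload: a GPU busy 2 hours per day for 30 days,
# at an assumed rate of $0.001 per second.
busy = 2 * 3600 * 30
print(monthly_cost(busy, rate_per_second=0.001))
```

Under these assumptions the base fee dominates at low utilization, which is why it appears in the cons list: an app that is busy only a few minutes a day pays close to the flat $100 regardless of the per-second rate.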

Further reading

  • Modal (Modal Labs)
    Inference · FREEMIUM

    Serverless GPUs. Run training, inference, batch jobs from Python.

    Define cloud workloads in Python, deploy with one command — GPU access on demand, fast cold starts, fair-share pricing. The default 'I need to fine-tune a model from a Jupyter cell' platform.

    Worth knowing

    Co-founded by Erik Bernhardsson, who built Spotify's recommender; raised a $355M Series C at a $4.65B valuation in 2026.

    • gpu
    • serverless
    • python
    • training
  • Runpod
    Inference · PAID

    GPU cloud for AI — on-demand instances and serverless inference.

    Runpod is an AI developer cloud for renting GPUs on demand or running auto-scaling serverless inference endpoints. Serverless workers bill by the millisecond, scale to zero when idle, and advertise sub-200ms cold starts; on-demand Pods and multi-node Clusters cover training and long-running jobs. A Community Cloud tier offers cheaper, peer-sourced GPUs alongside the vendor-operated Secure Cloud.

    Worth knowing

    Bootstrapped from a Reddit post by two ex-Comcast developers, it hit $120M ARR before ever raising a Series A.

    • gpu-cloud
    • serverless
    • inference
    • deployment
  • Baseten
    Inference · FREEMIUM

    Inference cloud for serving any AI model in production.

    Production inference platform offering both pre-optimized Model APIs (Llama, DeepSeek, and more, billed per token) and dedicated GPU/CPU deployments for custom models, billed per minute with no charge for idle time. Custom models are packaged with its open-source Truss format and autoscale, including scale-to-zero. Aimed at low-latency, high-throughput serving.

    Worth knowing

    Raised a $300M Series E in Jan 2026 at a $5B valuation, with Nvidia investing $150M of it.

    • inference
    • model-serving
    • gpu
    • autoscaling
  • Beam
    Infra · FREEMIUM

    On-demand serverless GPU compute for AI, from Python.

    A serverless cloud for deploying AI inference endpoints, agent sandboxes, task queues, and containerized GPU workloads with a few lines of Python. It handles fast cold starts, autoscaling, and Docker-in-Docker execution across multiple cloud backends, and supports bring-your-own-compute. The Developer tier is free with recurring monthly credit; paid tiers add team features and scale, billed pay-as-you-go by GPU usage.

    Worth knowing

    A YC-backed startup that began in 2021 as Slai before becoming Beam; its beta9 runtime is licensed under AGPL-3.0.

    • gpu
    • serverless
    • python
    • inference