Runpod

GPU cloud for AI — on-demand instances and serverless inference.

Category: Inference
Pricing: PAID
Source: Proprietary
Hosting: Cloud
Platforms: Web, API, CLI
Models: Model-agnostic
Verified: Jun 8, 2026

Runpod is an AI developer cloud for renting GPUs on demand or running auto-scaling serverless inference endpoints. Serverless workers are billed by the millisecond and scale to zero when idle, and Runpod advertises sub-200ms cold starts; on-demand Pods and multi-node Clusters cover training and long-running jobs. A Community Cloud tier offers cheaper, peer-sourced GPUs alongside the vendor-operated Secure Cloud.
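To see why per-millisecond billing with scale-to-zero matters, here is a back-of-the-envelope sketch comparing a serverless endpoint with an always-on Pod. The rates are hypothetical placeholders, not Runpod's prices, and the model is deliberately simplified: it bills only execution time and ignores cold-start and idle-timeout time, which a real deployment may also be charged for.

```python
# Back-of-the-envelope cost comparison. All rates are HYPOTHETICAL placeholders,
# not Runpod's published prices; substitute current numbers before relying on this.
SERVERLESS_RATE_PER_SEC = 0.00031  # assumed USD per second of worker execution
POD_RATE_PER_HOUR = 0.79           # assumed USD per hour for an always-on Pod


def serverless_cost(requests: int, ms_per_request: float, rate_per_sec: float) -> float:
    """Cost if only execution time is billed (per millisecond) and workers scale to zero.

    Simplification: cold-start and idle-timeout time are ignored.
    """
    billed_seconds = requests * ms_per_request / 1000.0
    return billed_seconds * rate_per_sec


def pod_cost(hours: float, rate_per_hour: float) -> float:
    """Cost of a Pod that is billed for every hour it is up, busy or idle."""
    return hours * rate_per_hour


def break_even_requests(hours: float, ms_per_request: float,
                        rate_per_sec: float, rate_per_hour: float) -> float:
    """Request count at which serverless spend equals keeping a Pod up for `hours`."""
    return pod_cost(hours, rate_per_hour) / (ms_per_request / 1000.0 * rate_per_sec)


if __name__ == "__main__":
    # 10,000 requests x 500 ms = 5,000 billed seconds per day
    daily_serverless = serverless_cost(10_000, 500, SERVERLESS_RATE_PER_SEC)
    daily_pod = pod_cost(24, POD_RATE_PER_HOUR)
    print(f"serverless: ${daily_serverless:.2f}/day")
    print(f"pod:        ${daily_pod:.2f}/day")
    print(f"break-even: {break_even_requests(24, 500, SERVERLESS_RATE_PER_SEC, POD_RATE_PER_HOUR):,.0f} requests/day")
```

At these assumed rates the always-on Pod only becomes cheaper above roughly 122,000 half-second requests a day; the takeaway is the shape of the comparison, not the specific numbers.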

Capabilities (4)

What it actually does — grouped by capability family.

GPU compute (primary capability)
Model inference / serving (secondary capability)
Fine-tuning / training (secondary capability)
App / agent deployment (secondary capability)
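On the serving side, a Runpod serverless endpoint is called over HTTP. The sketch below only builds the request with Python's standard library and does not send it. The URL shape (`https://api.runpod.ai/v2/<endpoint_id>/runsync`, with `/run` as the asynchronous variant that returns a job ID) and the `{"input": ...}` envelope follow Runpod's serverless API documentation at the time of writing, so verify them against the current reference. The endpoint ID, API key, and the fields inside `input` are hypothetical placeholders; what `input` must contain depends on the worker you deploy, since model serving is self-managed.

```python
import json
import urllib.request

# Base URL of Runpod's serverless API (check the current docs before relying on it).
RUNPOD_API_BASE = "https://api.runpod.ai/v2"


def build_runsync_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build, but do not send, a synchronous job request for a serverless endpoint.

    `payload` is wrapped in an {"input": ...} envelope; its own keys are whatever
    the deployed worker expects (hypothetical here).
    """
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        f"{RUNPOD_API_BASE}/{endpoint_id}/runsync",
        data=body,
        headers={
            # Bearer-token auth header; confirm the expected format in Runpod's docs.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint ID, key, and input fields; replace with real values.
    req = build_runsync_request("your-endpoint-id", "YOUR_API_KEY", {"prompt": "Hello"})
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req), which needs a live endpoint.
```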

Pros & cons

Pros:
Serverless auto-scaling inference
Advertised sub-200ms cold starts
Secure and Community Cloud GPU tiers
On-demand Pods and Clusters as well

Cons:
Community Cloud is less reliable and less secure than Secure Cloud
GPU availability varies
Model serving is self-managed

Related tools

Modal (Modal Labs)
Category: Inference
Pricing: FREEMIUM
Serverless GPUs. Run training, inference, and batch jobs from Python.
Define cloud workloads in Python and deploy them with one command: on-demand GPU access, fast cold starts, fair-share pricing. The default platform for 'I need to fine-tune a model from a Jupyter cell.'
Pro: Python-decorator infrastructure, no YAML or Dockerfiles
Con: SDK lock-in; migrating means rewriting
Tags: gpu, serverless, python, training

Replicate
Category: Inference
Pricing: FREEMIUM
Run, fine-tune, and deploy thousands of open models via one API.
A platform for running open-source models (image, video, audio, and language) with a single API call, plus fine-tuning and custom deployments with pay-per-second billing. No infrastructure to manage.
Pro: Image, video, audio, and language models
Con: Cold starts on less-popular models
Tags: model-hosting, fine-tuning, api, open-source

Baseten
Category: Inference
Pricing: FREEMIUM
Inference cloud for serving any AI model in production.
Production inference platform offering both pre-optimized Model APIs (Llama, DeepSeek, and more, billed per token) and dedicated GPU/CPU deployments for custom models, billed per minute with no charge for idle time. Custom models are packaged with its open-source Truss format and autoscale, including scale-to-zero. Aimed at low-latency, high-throughput serving.
Pro: Prebuilt Model APIs for Llama, DeepSeek, and more
Con: Dedicated GPU rates run higher than Modal's
Tags: inference, model-serving, gpu, autoscaling

Lightning AI
Category: Infra
Pricing: FREEMIUM
Persistent GPU cloud workspaces to build, train, and ship AI.
A cloud platform built around AI Studios: collaborative, persistent GPU workspaces for coding, training models, running inference, and building agents and AI apps. Pay-as-you-go GPUs with a monthly free credit allowance, plus a Pro tier and bring-your-own-cloud for enterprise. Made by the team behind the open-source PyTorch Lightning framework.
Pro: Pause/resume persistent GPU Studios
Con: Pay-as-you-go costs can add up
Tags: gpu-cloud, training, studios, infrastructure