Baseten
Inference cloud for serving any AI model in production.
Production inference platform offering both pre-optimized Model APIs (Llama, DeepSeek, and more), billed per token, and dedicated GPU/CPU deployments for custom models, billed per minute with no charge for idle time. Custom models are packaged in Baseten's open-source Truss format, and deployments autoscale, including scale-to-zero. Aimed at low-latency, high-throughput serving.
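To make the Truss packaging step concrete, here is a minimal sketch of the kind of model class a Truss package wraps, assuming the usual `load`/`predict` layout. The uppercase "model" and the `text` and `output` field names are hypothetical placeholders, not part of Baseten's or Truss's API.

```python
# Sketch of a Truss-style model class (typically model/model.py in a Truss).
# Assumption: the load()/predict() layout; the echo "model" is a stand-in.


class Model:
    def __init__(self, **kwargs):
        # Configuration arrives as keyword arguments; unused in this sketch.
        self._model = None

    def load(self):
        # Runs when a replica starts, so it runs again after scale-to-zero.
        # A trivial callable stands in for loading real weights.
        self._model = lambda text: text.upper()

    def predict(self, model_input):
        # Runs per request; "text" and "output" are hypothetical field names.
        return {"output": self._model(model_input["text"])}


if __name__ == "__main__":
    model = Model()
    model.load()
    print(model.predict({"text": "hello"}))
```

Keeping weight loading in `load` rather than `predict` matters more with scale-to-zero, since that cost is paid on every cold start rather than on every request.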
Worth knowing
Raised a $300M Series E in Jan 2026 at a $5B valuation, with Nvidia investing $150M of it.
- inference
- model-serving
- gpu
- autoscaling