Unsloth

Fine-tune open LLMs faster with far less VRAM.

Category: Fine-tuning
Pricing: FREEMIUM
Source: Open core
Hosting: Local
Platforms: CLI, Linux, Windows, macOS
Models: Multi-model
Verified: Jun 6, 2026

An open-source (Apache-2.0) framework for fine-tuning and running open-weight models. Its custom CUDA kernels deliver roughly 2x faster training and large VRAM savings, so 7B–13B models fit on a single consumer GPU. The free tier runs on Colab, Kaggle, or locally; the Pro and Enterprise tiers add multi-GPU and multi-node speedups. Trained models export to GGUF or Safetensors for use with llama.cpp, vLLM, and Ollama.
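The claim that 7B–13B models fit on one consumer GPU is easier to see with rough weight-memory arithmetic. The sketch below is a back-of-the-envelope estimate, not Unsloth code: `weight_memory_gb` is a hypothetical helper, and it assumes the weights dominate memory, ignoring activations, optimizer state, LoRA adapters, and quantization overhead. It contrasts 16-bit weights with the 4-bit storage that QLoRA-style training uses.

```python
# Back-of-the-envelope weight memory estimate (hypothetical helper, not Unsloth API).
# Assumption: counts weights only; ignores activations, optimizer state,
# adapters, KV cache, and per-block quantization scales.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for params in (7e9, 13e9):
        fp16 = weight_memory_gb(params, 16)  # 16-bit weights (fp16/bf16)
        q4 = weight_memory_gb(params, 4)     # 4-bit quantized weights
        print(f"{params / 1e9:.0f}B params: {fp16:.1f} GB at 16-bit, {q4:.1f} GB at 4-bit")
```

Under these assumptions a 13B model needs about 26 GB of weights at 16-bit but about 6.5 GB at 4-bit, which is the gap that makes single-GPU training plausible.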

Capabilities


Fine-tuning / training (primary capability)
Model inference / serving (secondary capability)

Pros & cons

Pros:
LoRA, QLoRA, and full fine-tuning
Supports Llama, Qwen, Gemma, and DeepSeek models
Custom CUDA kernels under the hood
Exports GGUF/Safetensors for llama.cpp, vLLM, and Ollama
Runs free on Colab/Kaggle

Cons:
Multi-GPU speedups require a paid tier
NVIDIA-centric (CUDA-focused)
Supports only a curated set of models
Requires ML fine-tuning know-how
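LoRA support matters for VRAM because LoRA freezes each base weight matrix W (d × k) and trains only a low-rank update BA, where B is d × r and A is r × k. A minimal sketch of the per-matrix parameter count follows; the helper names are hypothetical, and the 4096-wide square projection and rank 16 are illustrative assumptions, not Unsloth defaults.

```python
# Trainable-parameter count: full fine-tuning vs. LoRA for one weight matrix.
# Helper names and sizes are illustrative assumptions, not Unsloth API or defaults.

def full_params(d: int, k: int) -> int:
    """Parameters updated when the whole d x k matrix is trained."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Parameters in the LoRA factors: B (d x r) plus A (r x k)."""
    return r * (d + k)


if __name__ == "__main__":
    d = k = 4096  # assumed hidden size of a 7B-class attention projection
    r = 16        # assumed LoRA rank
    full = full_params(d, k)
    lora = lora_params(d, k, r)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With these assumed sizes the adapter trains under 1% of the matrix's parameters, so gradients and optimizer state stay small; QLoRA additionally stores the frozen W in 4-bit form.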

Related tools

OpenPipe
Vendor: OpenPipe
Category: Fine-tuning
Pricing: FREEMIUM
Replace frontier-model spend with a fine-tuned small model.
Captures your production OpenAI / Anthropic calls, builds a dataset from them, fine-tunes a small open-weights model on your traffic, then serves the replacement model behind your existing SDK. The pitch: 10x cost reduction at comparable quality.
Pro: Uses your production logs as training data
Con: Needs enough quality traffic to distill
Tags: fine-tuning, cost-reduction, drop-in, open-weights

Modal
Vendor: Modal Labs
Category: Inference
Pricing: FREEMIUM
Serverless GPUs. Run training, inference, and batch jobs from Python.
Define cloud workloads in Python and deploy them with one command, with on-demand GPU access, fast cold starts, and fair-share pricing. It is the default 'I need to fine-tune a model from a Jupyter cell' platform.
Pro: Python-decorator infra, no YAML/Dockerfiles
Con: SDK lock-in; migrating means rewriting
Tags: gpu, serverless, python, training

Runpod
Vendor: Runpod
Category: Inference
Pricing: PAID
GPU cloud for AI — on-demand instances and serverless inference.
Runpod is an AI developer cloud for renting GPUs on demand or running auto-scaling serverless inference endpoints. Serverless workers bill by the millisecond, scale to zero when idle, and advertise sub-200ms cold starts; on-demand Pods and multi-node Clusters cover training and long-running jobs. A Community Cloud tier offers cheaper, peer-sourced GPUs alongside the vendor-operated Secure Cloud.
Pro: Serverless auto-scaling inference
Con: Community Cloud is less reliable and less secure
Tags: gpu-cloud, serverless, inference, deployment, +1 more