Lamini

Enterprise platform to tune and run open LLMs in your own environment.

Categories: Fine-tuning, Inference
Pricing: PAID
Source: Proprietary
Hosting: Hybrid
Platforms: Web, API
Models: Multi-model
Verified: Jun 15, 2026

Lamini is an enterprise LLM platform for fine-tuning open models and serving them. It is designed to run on-prem, in a VPC, or on Lamini's cloud, including on AMD GPUs. It pairs tuning (LoRA/PEFT, plus memory tuning to reduce hallucinations) with an inference stack and agentic pipelines, accessed via a Python client, REST API, or web UI. It is built for teams that need to keep models and data in-house.
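For readers new to the LoRA tuning mentioned above, the sketch below shows the general idea in plain Python: the base weight matrix stays frozen and only two small low-rank matrices are trained. It illustrates the generic LoRA technique, not Lamini's implementation; the function names, matrix sizes, and the `alpha` value are assumptions chosen for the example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def count_params(m: Matrix) -> int:
    """Number of scalar entries in a matrix."""
    return sum(len(row) for row in m)


def lora_effective_weight(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / rank) * (B @ A), the weight a LoRA-adapted layer uses."""
    rank = len(a)            # A has shape (rank, in_features)
    scale = alpha / rank
    delta = matmul(b, a)     # (out, rank) @ (rank, in) -> (out, in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen 4x4 base weight (identity, purely illustrative).
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Rank-1 adapter: B is 4x1, A is 1x4. Only these would be trained.
B = [[0.5], [0.0], [0.0], [0.0]]
A = [[0.0, 2.0, 0.0, 0.0]]

W_eff = lora_effective_weight(W, B, A, alpha=1.0)
trainable = count_params(B) + count_params(A)
print(f"trainable adapter params: {trainable} of {count_params(W)} base params")
```

Because the adapter is small relative to the base weights, many adapters can share one frozen base model, which is why LoRA suits platforms that tune and serve several task-specific variants.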

Capabilities 5

What it actually does — grouped by capability family.

Agent framework (secondary capability)
Fine-tuning / training (primary capability)
Model inference / serving (primary capability)
RAG pipeline (secondary capability)
Text classification (secondary capability)

Pros & cons

Pros:
Keeps models and data fully in-house
Supports AMD GPUs, not just NVIDIA
Memory tuning to cut hallucinations
Founded by an MLPerf and ex-NVIDIA team

Cons:
Enterprise-focused pricing
Open models only
Smaller ecosystem than hyperscalers

Predibase
Predibase (Rubrik)
Categories: Fine-tuning
Pricing: PAID
Fine-tune open-source LLMs and serve them in production.
Predibase is an enterprise platform for fine-tuning open-source models and serving them in production. Its post-training stack combines supervised fine-tuning with an end-to-end reinforcement fine-tuning (RFT) flow and is paired with an optimized inference engine. Its open-source LoRAX framework serves many fine-tuned LoRA adapters from a single GPU. It runs as managed SaaS or inside your own VPC.
Pro: Fine-tune + serve in one place
Con: Enterprise-priced
Tags: fine-tuning, lora, rft, inference

OpenPipe
OpenPipe
Categories: Fine-tuning
Pricing: PAID
Replace frontier-model spend with a fine-tuned small model.
OpenPipe captures your production OpenAI / Anthropic calls, builds a dataset from them, fine-tunes a small open-weights model on that traffic, then serves the replacement model behind your existing SDK. The pitch: a 10x cost reduction at quality parity.
Pro: Uses your production logs as training data
Con: Needs enough quality traffic to distill
Tags: fine-tuning, cost-reduction, drop-in, open-weights

Together AI
Together
Categories: Inference
Pricing: FREEMIUM
Hosted inference and fine-tuning for open-weights models.
Hosted inference and fine-tuning across hundreds of open-weights models (Llama, Mistral, DeepSeek, Qwen, etc.). Strong pricing for inference at scale; both LoRA and full fine-tuning are supported.
Pro: LoRA and full fine-tuning
Con: Open models only, no frontier closed models
Tags: inference, fine-tuning, open-weights, lora

Tinker
Thinking Machines Lab
Categories: Fine-tuning
Pricing: PAID
Managed fine-tuning API with low-level control over the training loop.
Tinker is Thinking Machines Lab's training API for fine-tuning open-weight LLMs. It exposes low-level primitives (forward_backward, optim_step, sample), so researchers keep full control of data and algorithms while the service handles distributed GPU scheduling and failure recovery. LoRA-based runs cover models from small Llamas up to large mixture-of-experts models such as Qwen-235B and Kimi K2, and trained weights can be downloaded.
Pro: Exposes forward_backward, optim_step, sample
Con: LoRA-based only (no full fine-tune)
Tags: fine-tuning, lora, post-training, research
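
To make the split into forward_backward, optim_step, and sample concrete, here is a minimal local sketch of a training loop built from those three steps. `ToyTrainer` is a hypothetical stand-in written for this example: it is not the Tinker client, its method signatures are assumptions, and it trains a one-parameter model on the local machine rather than an LLM on remote GPUs.

```python
from typing import List, Tuple


class ToyTrainer:
    """Hypothetical stand-in (NOT the Tinker client): fits y = w * x locally."""

    def __init__(self, lr: float = 0.05) -> None:
        self.w = 0.0        # the single trainable parameter
        self.lr = lr        # learning rate for plain gradient descent
        self._grad = 0.0    # gradient stored by the last forward_backward call

    def forward_backward(self, batch: List[Tuple[float, float]]) -> float:
        """Compute mean squared error on the batch and store its gradient."""
        n = len(batch)
        loss = sum((self.w * x - y) ** 2 for x, y in batch) / n
        self._grad = sum(2 * (self.w * x - y) * x for x, y in batch) / n
        return loss

    def optim_step(self) -> None:
        """Apply one gradient-descent update using the stored gradient."""
        self.w -= self.lr * self._grad
        self._grad = 0.0

    def sample(self, x: float) -> float:
        """Produce an output from the current parameters."""
        return self.w * x


# The caller owns the data and the loop; the trainer only exposes primitives.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # targets follow y = 2x
trainer = ToyTrainer()
losses = []
for _ in range(50):
    losses.append(trainer.forward_backward(data))
    trainer.optim_step()

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3e}")
print(f"sample(10.0) = {trainer.sample(10.0):.4f}")
```

The point of the design is that the loop itself stays in user code, so a custom loss, data order, or update schedule can be swapped in without changing the service that executes each step.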
