Fish Audio

Expressive, emotionally controllable text-to-speech, voice cloning, and voice agents.

Categories: Audio, Voice
Pricing: FREEMIUM
Source: Proprietary
Hosting: Cloud
Platforms: Web, API
Models: Self-contained (on-device)
Verified: Jun 15, 2026

Fish Audio is a voice AI platform for real-time text-to-speech with emotion tags, voice cloning from clips as short as 15 seconds, speech-to-text, and end-to-end voice agents. Its flagship S2 model targets natural, expressive, multilingual narration, and the company maintains the open-source Fish Speech (OpenAudio) models on GitHub. A free tier covers personal use, with paid plans and an API for commercial use.

Capabilities (4)

What it actually does — grouped by capability family.

Voice agent (secondary capability)

Speech synthesis (TTS) (primary capability)
Voice cloning (primary capability)
Transcription (STT) (secondary capability)

Pros & cons

Pros:
- Expressive, emotion-controllable TTS
- Fast voice cloning from ~15s of audio
- Open-source Fish Speech models
- Notably cheaper than ElevenLabs
- Multilingual with a developer API

Cons:
- Hosted platform itself is proprietary
- Free tier has monthly generation caps
- Smaller voice library than incumbents
- Voice cloning carries misuse risk


ElevenLabs
Categories: Voice
Pricing: FREEMIUM
Text-to-speech, voice cloning, and multilingual dubbing.
Hosted speech synthesis at near-human quality: TTS, voice cloning, multilingual dubbing, and conversational voice agents. Default choice when you need a voice that sounds like a person, not a robot.
Pro: Best-in-class voice realism
Con: Pricier than commodity TTS at scale
Tags:
- tts
- voice-cloning
- dubbing
- multilingual

Cartesia
Categories: Voice
Pricing: FREEMIUM
Low-latency streaming text-to-speech for real-time voice.
Streaming-first speech synthesis built around the Sonic family of state-space models. Aims at real-time agent voices where latency between turns is the product. Strong choice for sub-200ms voice loops.
Pro: Streaming over WebSocket for fast first audio
Con: Long-form expressive texture trails ElevenLabs
Tags:
- tts
- streaming
- low-latency
- real-time

Rime
Categories: Voice
Pricing: FREEMIUM
Enterprise text-to-speech built for real-time voice agents.
Rime builds AI voice models for high-stakes business conversations like IVRs, contact centers, and AI phone agents. Its Arcana and Mist models target ultra-low latency and natural, conversational delivery, with deterministic pronunciation control so terms are spoken consistently without retraining. Rime can be deployed on-prem, in a VPC, or via cloud API, and is offered directly or through voice-AI partner platforms.
Pro: Very low latency for real-time voice agents
Con: Enterprise-focused, not a consumer tool
Tags:
- text-to-speech
- voice-ai
- tts
- contact-center
- +1

Murf AI
Categories: Voice
Pricing: FREEMIUM
AI voice generator studio with dubbing and a low-latency TTS API.
Murf AI is a text-to-speech platform pairing a studio editor (200+ voices across 35+ languages, voice cloning, dubbing, and a voice changer) with developer APIs. Its Gen 2 speech model focuses on pronunciation accuracy and granular voice controls, while the Falcon API targets sub-130ms latency for real-time voice agents. Integrations include Canva, PowerPoint, and Google Slides.
Pro: 200+ voices, 35+ languages
Con: Commercial rights need a paid plan
Tags:
- text-to-speech
- voiceover
- dubbing
- voice-cloning