Fish Audio

Expressive, emotionally controllable text-to-speech, voice cloning, and voice agents.

Categories: Audio, Voice
Pricing: FREEMIUM
Hosting: Cloud
Platforms: Web, API
Models: Self-contained (on-device)
Verified: Jun 15, 2026

Fish Audio is a voice AI platform for real-time text-to-speech with emotion tags, voice cloning from clips as short as 15 seconds, speech-to-text, and end-to-end voice agents. Its flagship S2 model targets natural, expressive, multilingual narration, and the company maintains the open-source Fish Speech (OpenAudio) models on GitHub. A free tier covers personal use, with paid plans and an API for commercial use.

Pros & cons

Pros

  • Expressive, emotion-controllable TTS
  • Fast voice cloning from ~15s of audio
  • Open-source Fish Speech models
  • Notably cheaper than ElevenLabs
  • Multilingual with a developer API

Cons

  • Hosted platform itself is proprietary
  • Free tier has monthly generation caps
  • Smaller voice library than incumbents
  • Voice cloning carries misuse risk

View all Audio
  • ElevenLabs
    Voice · FREEMIUM

    Frontier TTS, voice cloning, and dubbing. Industry default.

    Hosted speech synthesis at near-human quality — TTS, voice cloning, multilingual dubbing, and conversational voice agents. Default choice when you need a voice that sounds like a person, not a robot.

    Worth knowing

    Founded in 2022 by two Polish friends (ex-Google and ex-Palantir); a 2026 raise valued it at $11B.

    • tts
    • voice-cloning
    • dubbing
    • multilingual
  • Cartesia
    Voice · FREEMIUM

    Low-latency streaming TTS. Sub-100ms first audio.

    Streaming-first speech synthesis built around the Sonic family of state-space models. Aims at real-time agent voices where latency between turns is the product. Strong choice for sub-200ms voice loops.

    Worth knowing

    Founded in 2023 by the Stanford AI Lab team behind state-space models and Mamba, incl. Albert Gu and Karan Goel.

    • tts
    • streaming
    • low-latency
    • real-time
  • Rime
    Voice · FREEMIUM

    Enterprise text-to-speech built for real-time voice agents.

    Rime builds AI voice models for high-stakes business conversations like IVRs, contact centers, and AI phone agents. Its Arcana and Mist models target ultra-low latency and natural, conversational delivery, with deterministic pronunciation control so terms are spoken consistently without retraining. Rime can be deployed on-prem, in a VPC, or via cloud API, and is offered directly or through voice-AI partner platforms.

    Worth knowing

    Open-sourced Rimecaster in 2025, billed as the first open speaker model trained on natural conversational — not audiobook — speech.

    • text-to-speech
    • voice-ai
    • tts
    • contact-center
  • Murf AI
    Voice · FREEMIUM

    AI voice generator studio with 200+ voices, dubbing, and a low-latency TTS API.

    Murf AI is a text-to-speech platform pairing a studio editor — 200+ voices across 35+ languages, voice cloning, dubbing, and a voice changer — with developer APIs. Its Gen 2 speech model focuses on pronunciation accuracy and granular voice controls, while the Falcon API targets sub-130ms latency for real-time voice agents. Integrations include Canva, PowerPoint, and Google Slides.

    Worth knowing

    Founded in 2020 by three IIT Kharagpur alumni; its $10M Series A was led by Matrix Partners India (now Z47).

    • text-to-speech
    • voiceover
    • dubbing
    • voice-cloning