
Voice AI apps

Voice AI — text-to-speech, voice cloning, and real-time conversational voice agents.

14 apps · researched & kept current by Claude Code

  • Resemble AI
    Voice · Freemium

    Voice cloning, audio watermarking, and deepfake detection in one platform.

    Resemble AI spans both sides of synthetic voice: generating it and policing it. The platform offers voice cloning and text-to-speech built on its Chatterbox models, real-time audio watermarking, and Detect, a multimodal deepfake detector covering audio, image, and video. It deploys in the cloud or fully on-premises for regulated environments.

    Worth knowing

    Open-sourced its MIT-licensed Chatterbox TTS model while selling Detect, a deepfake detector scoring 98.1% on ASVspoof 2021.

    • voice-cloning
    • deepfake-detection
    • watermarking
    • tts
  • Murf AI
    Voice · Freemium

    AI voice generator studio with 200+ voices, dubbing, and a low-latency TTS API.

    Murf AI is a text-to-speech platform pairing a studio editor — 200+ voices across 35+ languages, voice cloning, dubbing, and a voice changer — with developer APIs. Its Gen 2 speech model focuses on pronunciation accuracy and granular voice controls, while the Falcon API targets sub-130ms latency for real-time voice agents. Integrations include Canva, PowerPoint, and Google Slides.

    Worth knowing

    Founded in 2020 by three IIT Kharagpur alumni; its $10M Series A was led by Matrix Partners India (now Z47).

    • text-to-speech
    • voiceover
    • dubbing
    • voice-cloning
  • Speechmatics
    Voice · Freemium

    Enterprise speech APIs — real-time STT, TTS, and voice agents in 55+ languages.

    Speechmatics provides speech-to-text (batch and sub-second real-time), text-to-speech, and a Flow API for building voice agents, with accuracy that holds up across accents and dialects in 55+ languages. The same engine deploys in cloud, container, on-prem, or fully on-device, and it powers products from Adobe, LiveKit, and Ubisoft.

    Worth knowing

    Founded in Cambridge in 2006 by Tony Robinson, a 1980s pioneer of recurrent-neural-network speech recognition.

    • speech-to-text
    • voice-agents
    • tts
    • real-time
  • Bland AI
    Voice · Freemium

    By Bland

    Enterprise voice AI platform for automated phone calls.

    An enterprise platform for building AI agents that handle inbound and outbound phone calls 24/7 with natural, real-time conversation. Bland bundles the language model, speech-to-text, text-to-speech, and telephony into a single per-minute rate, and integrates with tools like Twilio, Salesforce, HubSpot, and Zapier. Build agents in a web dashboard or via a REST API; a free tier lets you start before moving to usage-based paid tiers.

    Worth knowing

    A Y Combinator (S23) startup that raised a $40M Series B led by Emergence Capital in 2024, reaching $65M total funding.

    • voice-agents
    • phone-calls
    • conversational-ai
    • telephony
  • AssemblyAI
    Voice · Freemium

    Production speech-to-text + audio intelligence API.

    Speech recognition API with batch and real-time streaming transcription, speaker diarization, and language detection. Its Universal models pair with optional Speech Understanding features (summarization, sentiment, redaction), so a single API can power conversation-intelligence products. Pricing starts with free credit, then moves to pay-as-you-go billing metered per second.

    Worth knowing

    Raised a $50M Series C in 2023 (Accel-led, with Nat Friedman and Daniel Gross); a Y Combinator alum.

    • stt
    • transcription
    • streaming
    • audio-intelligence
  • Retell AI
    Voice · Freemium

    Build, test, and deploy AI voice agents for phone calls.

    A no-code platform for humanlike voice agents that handle inbound and outbound phone calls — receptionists, IVR, and outbound campaigns. It bundles telephony (SIP / Twilio), a proprietary turn-taking model for low-latency conversations, prompts, tools, and call analytics. Pay-as-you-go pricing with free starter credits.

    Worth knowing

    Founded in 2023 by alumni of ByteDance, Google, and Meta; a YC W24 startup at ~$40M annualized revenue with a ~25-person team.

    • voice-agents
    • telephony
    • call-automation
    • no-code
  • Sesame
    Voice · Free · OSS

    Conversational voice companion chasing "voice presence."

    A conversational-speech company building lifelike voice companions — Maya and Miles — that interrupt, self-correct, and use natural pacing. The web demo lets you talk to them in real time, and Sesame has open-sourced its underlying CSM (Conversational Speech Model) base model. Co-founded by Oculus co-creator Brendan Iribe.

    Worth knowing

    Raised a $250M Series B led by Sequoia and Spark in 2025 to build voice-first AI smart glasses.

    • voice
    • conversational
    • companion
    • speech
  • Wispr Flow
    Voice · Freemium

    By Wispr

    AI voice dictation that types for you across every app, on desktop and mobile.

    A dictation tool that turns speech into clean, formatted text in any app — removing filler words and applying context-aware edits as you talk. One subscription works across macOS, Windows, iOS, and Android, syncing your custom vocabulary and snippets between devices.

    Worth knowing

    Started as a wearable for typing by silently mouthing words, then pivoted to the Flow dictation app; $30M Series A (Menlo, 2025).

    • dictation
    • voice
    • speech-to-text
    • productivity
  • Hume AI
    Voice · Freemium

    Empathic Voice Interface — speech-to-speech AI that hears tone.

    A voice AI toolkit built around the Empathic Voice Interface (EVI), a speech-to-speech model that infers emotion and prosody from a user's voice and modulates its replies accordingly. EVI is exposed as an API for building expressive voice agents and assistants, and it comes from a research lab focused on emotional intelligence in AI.

    Worth knowing

    Founder Alan Cowen is an ex-Google scientist whose ‘semantic space theory’ of emotion underpins the product; $50M Series B (EQT, 2024).

    • voice
    • speech-to-speech
    • emotion
    • api
  • Speechify
    Voice · Freemium

    AI text-to-speech that reads any document, PDF, or page aloud.

    Speechify is an AI text-to-speech app that turns articles, PDFs, emails, and books into natural-sounding audio with high-definition voices, adjustable speed, and OCR for scanned text. It runs on iOS, Android, web, a browser extension, and desktop, and offers a separate Studio product plus a text-to-speech API for developers.

    Worth knowing

    Founder Cliff Weitzman built it to cope with his own dyslexia and was named to Forbes 30 Under 30 in 2017.

    • text-to-speech
    • read-aloud
    • accessibility
    • voice-cloning
  • Deepgram
    Voice · Freemium

    Production speech-to-text. The STT default for many companies.

    End-to-end speech recognition platform — real-time streaming, batch transcription, speaker diarization, and language detection. Strong on accented speech, telephony audio, and long-form recordings.

    Worth knowing

    Co-founded by a particle physicist who'd built a dark-matter detector two miles underground before pivoting to speech.

    • stt
    • transcription
    • streaming
    • diarization
  • ElevenLabs
    Voice · Freemium

    Frontier TTS, voice cloning, and dubbing. Industry default.

    Hosted speech synthesis at near-human quality — TTS, voice cloning, multilingual dubbing, and conversational voice agents. Default choice when you need a voice that sounds like a person, not a robot.

    Worth knowing

    Founded in 2022 by two Polish friends (ex-Google and ex-Palantir); a 2026 raise valued it at $11B.

    • tts
    • voice-cloning
    • dubbing
    • multilingual
  • Cartesia
    Voice · Freemium

    Low-latency streaming TTS. Sub-100ms first audio.

    Streaming-first speech synthesis built around the Sonic family of state-space models. Aims at real-time agent voices where latency between turns is the product. Strong choice for sub-200ms voice loops.

    Worth knowing

    Founded in 2023 by the Stanford AI Lab team behind state-space models and Mamba, incl. Albert Gu and Karan Goel.

    • tts
    • streaming
    • low-latency
    • real-time
  • Vapi
    Voice · Freemium

    Voice agent infrastructure. Build a phone agent in a weekend.

    Production voice-agent platform — telephony, STT, LLM, TTS, and interrupt handling stitched together so you call an endpoint and get a working phone agent. Pluggable models at every layer.

    Worth knowing

    Hit a ~$500M valuation in 2026 after Amazon picked it to power Ring's voice AI over 40 rival platforms; it has handled 1B+ calls.

    • voice-agents
    • telephony
    • phone
    • real-time