Voice AI apps

Voice AI — text-to-speech, voice cloning, and real-time conversational voice agents.

52 apps · researched & kept current by Claude Code


Voice · FREEMIUM
TurboScribe
TurboScribe
Unlimited audio and video transcription powered by Whisper.
TurboScribe converts audio and video files into text using OpenAI's Whisper speech-to-text model. It supports 98+ languages, automatic speaker recognition and translation of transcripts or subtitles into 134+ languages, with exports to formats like DOCX, PDF, SRT and VTT. A free tier allows three transcriptions a day, while the paid Unlimited plan removes usage caps for a flat monthly fee.
Exports to DOCX, PDF, SRT, and VTT
Free tier capped at 3 files/day, 30 minutes each
- transcription
- speech-to-text
- whisper
- subtitles
Voice · PAID
Phonic
Phonic
Speech-to-speech platform for reliable voice agents.
Phonic is a platform for building production voice agents on its own end-to-end speech-to-speech models, rather than chaining separate speech-to-text, LLM, and text-to-speech stages. It targets sub-300ms latency for natural turn-taking and reliable tool calling, and bundles evaluation, session records, and real-time observability to surface failure points. Aimed at enterprises, it offers cloud API access plus containerized deployment in your own environment.
Reliable tool calling for voice agents
Enterprise-focused, no public free tier
- voice-agents
- speech-to-speech
- conversational-ai
- low-latency
Audio · FREEMIUM
Podcastle
Podcastle
AI-powered studio for recording, editing, and producing audio and video.
Podcastle is a browser-based content-creation platform for podcasters and video creators: multi-track remote recording, AI text-to-speech and voice cloning, transcription, and one-click audio and video editing with noise removal and leveling. It bundles studio capture, AI voices, and editing into a single workflow so creators don't need a separate DAW or video editor.
Recording + AI voices + editing in one app
Cloud-only; needs a connection
- podcasting
- text-to-speech
- voice-cloning
- transcription
Voice · PAID
Regal
Regal
Voice AI agent platform for contact centers.
A platform to build, deploy, and manage AI voice agents that handle inbound and outbound customer interactions across phone, SMS, chat, and WebRTC. Agents learn from past conversations, and a 2026 Copilot lets teams stand up a working voice agent within a day using existing business processes.
Phone, SMS, chat, and WebRTC in one platform
No public free tier — demo/sales-led
- voice-agents
- contact-center
- cx
- outbound
Voice · PAID
SoundHound AI
SoundHound AI
Voice-native conversational AI platform for enterprise agents.
SoundHound AI builds voice-native conversational AI used to deploy autonomous agents for customer interactions across automotive, restaurants, financial services, healthcare, and smart devices. Its full-stack speech technology pairs Speech-to-Meaning understanding with the Amelia enterprise agent platform and newer OASYS stack, automating phone answering, drive-thru ordering, and IT service management. It powers billions of conversations a year and is offered to developers and enterprises via APIs and embedded SDKs.
Full-stack proprietary speech tech
Enterprise focus, custom pricing
- voice-ai
- conversational-ai
- customer-service
- enterprise
Voice · FREEMIUM
LOVO AI
LOVO, Inc.
AI voice generation studio with voice cloning and a built-in video editor.
LOVO AI's Genny platform generates text-to-speech voiceovers in 500+ voices across 100+ languages, with voice cloning from short samples and directable, expressive delivery. The browser-based studio bundles an online video editor, auto subtitles, an AI script writer and a developer API, targeting creators producing YouTube videos, e-learning, podcasts and ads.
Large voice library across many languages
Credit limits constrain heavy use
- text-to-speech
- voice-cloning
- voiceover
- video-editing
Support · PAID
Cognigy
NiCE
Enterprise conversational & agentic AI for voice and chat customer service.
Cognigy is a customer-service AI platform that builds voice and chat agents on a low-code flow builder, with an Agent Copilot that assists human agents in real time. It's LLM-agnostic, integrates with contact-center stacks, and runs as managed cloud or self-hosted. Enterprises like Adidas, Nestlé and Toyota use it to automate front- and back-office customer interactions.
Voice and chat agents in one platform
Enterprise sales motion; no public pricing
- customer-service
- voice-agents
- conversational-ai
- contact-center
Voice · FREEMIUM
LMNT
LMNT
Streaming text-to-speech with voice cloning for real-time apps.
LMNT is an AI text-to-speech platform that turns text into natural speech with ultra-low latency, built for conversational agents, games, and real-time apps. It supports instant voice cloning from a short sample and multilingual synthesis, and is exposed as a developer API plus a web playground. It is offered as a built-in voice provider across major voice-agent frameworks.
Multilingual streaming synthesis
Smaller voice library than ElevenLabs
- text-to-speech
- voice-cloning
- low-latency
- tts-api
Audio · FREEMIUM · Open core
pyannoteAI
pyannoteAI
Speaker intelligence — diarization that tells who spoke when.
pyannoteAI turns conversational audio into speaker-attributed transcripts: it identifies speakers, separates overlapping voices, and provides speaker metadata. Built on the widely used open-source pyannote.audio library, it adds a premium REST API and Python SDK with higher accuracy and near real-time speed.
State-of-the-art diarization accuracy
Diarization only, not transcription
- audio
- speaker-diarization
- speech
- open-source
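Because the pyannoteAI listing covers diarization only, a speaker-attributed transcript needs word timestamps from a separate transcription step. The sketch below shows one common way to combine the two, by assigning each word to the speaker turn it overlaps most. It does not use the pyannoteAI API; the `Turn` and `Word` types, the `attribute_words` helper, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization segment: who spoke, and when (seconds). Hypothetical type."""
    start: float
    end: float
    speaker: str


@dataclass
class Word:
    """One timestamped word from a separate transcription step. Hypothetical type."""
    start: float
    end: float
    text: str


def overlap(word: Word, turn: Turn) -> float:
    """Seconds during which the word and the turn are both active."""
    return max(0.0, min(word.end, turn.end) - max(word.start, turn.start))


def attribute_words(words: list[Word], turns: list[Turn]) -> list[str]:
    """Assign each word to the turn it overlaps most, merging consecutive words by speaker."""
    grouped: list[tuple[str, list[str]]] = []
    for word in words:
        best = max(turns, key=lambda t: overlap(word, t), default=None)
        if best is None or overlap(word, best) == 0.0:
            speaker = "UNKNOWN"  # no turn covers this word
        else:
            speaker = best.speaker
        if grouped and grouped[-1][0] == speaker:
            grouped[-1][1].append(word.text)
        else:
            grouped.append((speaker, [word.text]))
    return [f"{speaker}: {' '.join(texts)}" for speaker, texts in grouped]


# Made-up sample data: two speaker turns and four timestamped words.
turns = [Turn(0.0, 2.0, "SPEAKER_00"), Turn(2.0, 4.0, "SPEAKER_01")]
words = [
    Word(0.2, 0.6, "Hello"),
    Word(0.7, 1.1, "there"),
    Word(2.1, 2.5, "Hi"),
    Word(2.6, 3.0, "back"),
]

for line in attribute_words(words, turns):
    print(line)
```

Maximum overlap is a simple heuristic. Overlapping speech, which the listing says pyannoteAI separates, would need a rule for words that fall inside two turns at once.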
Voice · FREEMIUM · Open core
Neuphonic
Neuphonic
Ultra-low-latency text-to-speech that runs on-device.
Neuphonic is a voice-AI company building text-to-speech and voice cloning that run locally with very low latency. Its cloud API targets real-time voice agents, and in October 2025 it open-sourced NeuTTS Air, a 748M-parameter speech language model that runs on CPU via llama.cpp and clones a voice from a few seconds of audio. Aimed at private, offline, and voice-agent use cases.
On-device, CPU-only synthesis
Cloud API pricing not clearly published
- text-to-speech
- voice-cloning
- on-device
- open-source
Voice · FREEMIUM
WellSaid
WellSaid Labs
Enterprise AI text-to-speech with voices licensed from real voice actors.
An enterprise-grade AI voice generator that produces realistic voiceovers from scripts. It offers 120+ voices across languages and accents — modeled on licensed recordings by real voice actors — plus a studio for script import and audio tuning, team workspaces, pronunciation libraries, Adobe integrations, and an API for products, LMS platforms, and IVRs.
120+ voices across languages and accents
Voiceover-focused, not conversational/agent TTS
- text-to-speech
- voiceover
- tts
- enterprise
Voice · FREEMIUM
Smallest.ai
Smallest.ai
Real-time voice AI: fast TTS and production phone agents.
An enterprise voice-AI platform built on deliberately small, fast speech models. Waves handles text-to-speech, voice cloning, and conversion in 30+ languages, while Atoms is a real-time voice-agent platform that plugs into business systems for support, lead qualification, and outbound calls. The company says it can generate 10 seconds of speech in about 100 milliseconds for sub-second voicebot responsiveness.
Very low TTS latency
Younger, smaller company
- tts
- voice-agents
- voice-cloning
- low-latency
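The Smallest.ai speed claim (about 100 milliseconds to generate 10 seconds of speech) is easier to compare with other engines as a real-time factor: processing time divided by audio duration. The snippet below works through that arithmetic using only the two figures quoted in the listing. The function name `real_time_factor` is a hypothetical helper, not part of any vendor SDK.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures quoted in the listing: ~100 ms to generate 10 s of speech.
rtf = real_time_factor(synthesis_seconds=0.100, audio_seconds=10.0)
speedup = 1.0 / rtf  # how many times faster than playback speed

print(f"RTF = {rtf:.3f}, about {speedup:.0f}x faster than real time")
```

A real-time factor this far below 1 means synthesis itself is unlikely to be the bottleneck in a voice bot. Network, speech recognition, and language-model time still count toward the response delay a caller hears.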
Voice · FREEMIUM
Soniox
Soniox
One speech AI API for real-time transcription, TTS, and translation.
A multilingual speech platform built on Soniox's own universal recognition model: real-time and async speech-to-text, text-to-speech, and any-to-any speech translation across 60+ languages from a single API. It returns token-level results within milliseconds and keeps transcribing through crosstalk, speaker overlap, and mid-sentence language switches. A consumer app (web and iOS) wraps the same engine for recording, transcription, and notes.
60+ languages, mid-sentence switching
Smaller brand than incumbents
- stt
- transcription
- speech-translation
- real-time
Voice · FREEMIUM
Respeecher
Respeecher
Ethical AI voice cloning and speech-to-speech for film, games, and media.
Respeecher is a synthetic-voice platform built for professional media production, offering voice cloning, speech-to-speech conversion, and text-to-speech. Its speech-to-speech model maps one performer's delivery onto a target voice while preserving the original emotion and timing, and in-house sound professionals refine the output. The company is known for high-profile film, TV, and game work, from Star Wars to Cyberpunk 2077.
Proven in major film/TV (Star Wars)
Premium, media-production oriented
- voice-cloning
- speech-to-speech
- tts
- film
Music · FREEMIUM
ACE Studio
TimedomAIn
AI music studio for studio-quality singing vocals from MIDI and lyrics.
ACE Studio generates expressive, human-like singing vocals from MIDI and lyrics, edited in a piano-roll where you control pitch, timing, vibrato, pronunciation, dynamics, and emotional intensity. It ships 140+ royalty-free AI voice models across eight languages, plus AI instruments, voice cloning, stem splitting, and choir/ensemble modes. It runs as a standalone desktop app and as VST/AU/AAX plugins, with vocal generation processed in the cloud.
Editable, studio-quality AI vocals
Needs MIDI + lyrics, not just a prompt
- singing-synthesis
- vocals
- voice-cloning
- midi
Audio · FREEMIUM
Fish Audio
Fish Audio
Expressive, emotionally controllable text-to-speech, voice cloning, and voice agents.
Fish Audio is a voice AI platform for real-time text-to-speech with emotion tags, voice cloning from clips as short as 15 seconds, speech-to-text, and end-to-end voice agents. Its flagship S2 model targets natural, expressive, multilingual narration, and the company maintains the open-source Fish Speech (OpenAudio) models on GitHub. A free tier covers personal use, with paid plans and an API for commercial use.
Expressive, emotion-controllable TTS
Hosted platform itself is proprietary
- text-to-speech
- voice-cloning
- speech-to-text
- voice-agents
Voice · FREEMIUM · Open core
Pipecat
Daily
Open-source framework for real-time voice and multimodal AI agents.
Pipecat is a Python framework for building voice and multimodal conversational agents that can listen, speak, and see in real time. It orchestrates streaming speech-to-text, an LLM, and text-to-speech into one low-latency pipeline, wiring together 40+ AI services with no vendor lock-in. Client SDKs cover JavaScript, React, React Native, Swift, Kotlin, C++, and ESP32, and Pipecat Cloud offers managed hosting at scale.
Self-hostable, no vendor lock-in
Python framework, not a no-code tool
- voice-agents
- conversational-ai
- real-time
- open-source
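The cascade that the Pipecat listing describes (streaming speech-to-text feeding an LLM, which feeds text-to-speech) can be sketched with plain Python async generators. This is not Pipecat's API. The stage functions `fake_stt`, `fake_llm`, and `fake_tts`, and the `microphone` source, are hypothetical stand-ins that only show how each stage can start work as soon as the previous one emits something, rather than waiting for a full utterance.

```python
import asyncio
from typing import AsyncIterator


async def microphone() -> AsyncIterator[bytes]:
    """Hypothetical audio source: yields one 'chunk' per spoken word."""
    for word in ("hello", "voice", "agent"):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield word.encode("utf-8")


async def fake_stt(audio_chunks: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in for streaming speech-to-text: one text token per audio chunk."""
    async for chunk in audio_chunks:
        yield chunk.decode("utf-8")


async def fake_llm(words: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for an LLM: emits an upper-cased token as each input arrives."""
    async for word in words:
        yield word.upper()


async def fake_tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stand-in for streaming text-to-speech: wraps each token as an audio frame."""
    async for token in tokens:
        yield f"<audio:{token}>".encode("utf-8")


async def run_pipeline() -> list[bytes]:
    """Chain the stages; frames flow through one at a time, not in batches."""
    frames = []
    async for frame in fake_tts(fake_llm(fake_stt(microphone()))):
        frames.append(frame)
    return frames


frames = asyncio.run(run_pipeline())
print(frames)
```

A production framework adds what these stubs leave out, such as turn detection, interruption handling, and transport. The streaming hand-off between stages is the part that keeps end-to-end latency low.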
Support · PAID
Wonderful
Wonderful
Culturally fluent AI customer-service agents for global enterprises.
Wonderful is an enterprise platform for AI customer-service agents that handle inquiries across chat, voice, and email — and take action to resolve them. Its distinctive bet is localization: agents are tuned per market for language, cultural norms, and regulation, with local teams sent to manage deployment, aimed squarely at non-English-speaking markets. Used across telecom, finance, healthcare, and manufacturing.
Culturally and linguistically tuned agents
Enterprise sales-only
- customer-support
- agents
- enterprise
- multilingual
Voice · FREEMIUM
Willow Voice
Willow
AI dictation that turns speech into formatted text in any app — Mac, Windows, iPhone.
Willow Voice is an AI dictation app that converts speech into written text across any application, with context-aware formatting, custom dictionaries, and automatic filler-word removal. An AI mode rewrites rough spoken notes into polished writing, and a style-matching feature adapts tone per app. It runs on Mac, Windows, and iPhone, supports 100+ languages, and offers an optional offline mode on Mac and iOS for privacy.
Free tier with weekly word allowance
Cloud transcription by default
- dictation
- speech-to-text
- voice-typing
- productivity
Voice · FREEMIUM
Rime
Rime
Enterprise text-to-speech built for real-time voice agents.
Rime builds AI voice models for high-stakes business conversations like IVRs, contact centers, and AI phone agents. Its Arcana and Mist models target ultra-low latency and natural, conversational delivery, with deterministic pronunciation control so terms are spoken consistently without retraining. Rime can be deployed on-prem, in a VPC, or via cloud API, and is offered directly or through voice-AI partner platforms.
Very low latency for real-time voice agents
Enterprise-focused, not a consumer tool
- text-to-speech
- voice-ai
- tts
- contact-center
Voice · FREEMIUM
Inworld AI
Inworld AI
A full-stack voice runtime for building human-sounding AI agents.
A developer platform for real-time voice AI — an integrated STT + LLM + TTS pipeline exposed through REST and WebSocket APIs (OpenAI Realtime-compatible) for companions, character chat, support, and phone agents. Beyond the voice stack it offers a model Router, inference, and compute, with cloud and enterprise on-prem deployment.
Integrated full-stack voice pipeline
Developer API, not an end-user app
- voice-agents
- tts
- stt
- realtime
Voice · FREEMIUM
Gladia
Gladia
Real-time speech-to-text and audio intelligence through a single API.
End-to-end audio infrastructure to record, transcribe, and enrich speech via one API: real-time streaming under ~300ms latency, batch transcription, diarization, translation, and summarization across 100+ languages. Gladia re-engineered Whisper for production before shipping its own Solaria models, and it offers EU data residency for compliance-bound teams.
Low-latency real-time streaming
API-only, no end-user app
- stt
- transcription
- streaming
- diarization
Agent · FREEMIUM
Voiceflow
Voiceflow, Inc.
No-code platform to build, launch, and scale AI agents.
Voiceflow is a no-code platform for designing, testing, and shipping chat and voice AI agents for customer support, lead generation, and beyond. A visual canvas lets designers, PMs, and developers collaborate on conversation logic, connect a knowledge base for RAG, and deploy to web widgets, phone, or any channel via API. It supports multiple model providers and scales to enterprise call and message volumes.
Visual builder for chat and voice agents
Advanced/scaled usage gets pricey
- no-code
- agent-builder
- chatbot
- voice
Audio · FREEMIUM
Async
Async
AI studio to record, edit, and produce podcasts and video.
Async is a browser-based creative studio for recording, editing, and producing podcasts and video, with studio-quality remote recording, text-based editing, AI enhancement, and multilingual dubbing. Its Revoice feature clones a voice from a short sample to fix mistakes without re-recording, and a developer Voice API exposes its 1,000+ AI voices. It was known as Podcastle until a 2026 rebrand.
All-in-one record, edit, and publish workflow
No Android app — web only on Android
- podcasting
- voice-clone
- audio-editing
- dubbing
Voice · FREEMIUM
Synthflow
Synthflow AI
No-code platform for AI voice agents that automate phone calls.
Synthflow is an enterprise voice-AI platform for building and deploying AI agents that handle phone calls — inbound and outbound — without code. A visual flow designer, in-house telephony, real-time monitoring, and 200+ integrations across calendars, CRMs, and telephony let teams stand up agents for customer service, appointment scheduling, and lead qualification. It runs on a pay-as-you-go model: build and test free, then pay per call once live.
No-code visual flow designer
Focused on phone/voice, not broad chat
- voice-agents
- no-code
- telephony
- contact-center
Infra · FREEMIUM · Open core
LiveKit
LiveKit, Inc.
Open-source framework and cloud for realtime voice, video, and physical AI agents.
LiveKit is a realtime infrastructure layer that many voice-AI stacks are built on. Its open-source Agents framework wires together streaming speech-to-text, an LLM, and text-to-speech with reliable turn detection, interruption handling, and telephony, so agents can join a session and converse in near real time. You bring your own STT, LLM, and TTS providers; LiveKit handles the low-latency WebRTC transport. It runs self-hosted under Apache 2.0 or as a managed cloud platform.
Powers ChatGPT Advanced Voice in production
Developer infrastructure, not a no-code product
- voice-ai
- webrtc
- realtime
- agents
Voice · FREEMIUM
Aqua Voice
Aqua Voice
Fast, accurate voice typing for any app.
A desktop dictation app that turns speech into text across any application, powered by its in-house Avalon model and tuned for low latency. Adds a custom dictionary and formatting so dictated text comes out clean, with a free tier and unlimited paid plans.
Types into any app on your desktop
Free tier capped at 1,000 words/mo
- dictation
- speech-to-text
- voice-typing
- low-latency
Voice · FREEMIUM
superwhisper
superwhisper
Turn your voice into polished text in any app.
A voice-to-text dictation app that transcribes speech into formatted, cleaned-up text system-wide across any application. Per-mode model selection lets you assign different local or cloud models to different tasks, with meeting recording, file transcription, and custom vocabulary on top.
Free tier with on-device models
Cloud models gated to Pro
- dictation
- speech-to-text
- on-device
- macos
Translation · FREEMIUM
CAMB.AI
CAMB.AI
Real-time AI dubbing, translation, and voice for content, entertainment, and live sports.
A localization AI platform for real-time dubbing, translation, and multilingual voice generation. Its in-house MARS speech models power live commentary for broadcasters and on-demand dubbing for film and OTT, preserving each speaker's voice and emotional tone. Spans live and file-based dubbing, text-to-speech, subtitles, and image translation across 50+ languages.
Real-time dubbing of live streams
Broadcast/enterprise-first focus
- dubbing
- localization
- live-sports
- text-to-speech
Healthcare · FREEMIUM
Corti
Corti
Clinical-grade AI platform and APIs for healthcare developers.
Corti is an AI platform for healthcare and life-science developers, offering clinical-grade APIs for medical speech-to-text, coding, and text generation plus an agentic framework with pre-built agents. Its proprietary Symphony model family powers more than a million clinical interactions a week, and the platform ships with FedRAMP, HIPAA, SOC 2, and ISO 13485 compliance plus an EU sovereign-cloud option. The Copenhagen company first made its name detecting cardiac arrests in live emergency calls.
Free $50 credits to prototype
Usage-based costs add up at scale
- healthcare
- clinical-ai
- speech-to-text
- medical-coding
Voice · PAID
PolyAI
PolyAI
Lifelike enterprise voice AI agents for customer service calls.
PolyAI's Agentic Dialog Platform builds voice agents that resolve customer service calls for large enterprises in banking, healthcare, hospitality, telecom, and retail. Agent Studio gives non-technical teams a no-code builder while the ADK exposes developer APIs, with SOC 2, HIPAA, GDPR, and PCI DSS compliance built in. Its proprietary Raven model is trained on over a billion enterprise conversations.
Handles complex multi-turn calls
Custom enterprise contracts only
- voice-agents
- customer-service
- contact-center
- enterprise
Voice · FREEMIUM
Resemble AI
Resemble AI
Voice cloning, audio watermarking, and deepfake detection in one platform.
Resemble AI spans both sides of synthetic voice: generating it and policing it. The platform offers voice cloning and text-to-speech built on its Chatterbox models, real-time audio watermarking, and Detect, a multimodal deepfake detector covering audio, image, and video. It deploys in the cloud or fully on-premises for regulated environments.
Generation + detection in one
Limited free tier
- voice-cloning
- deepfake-detection
- watermarking
- tts
Voice · FREEMIUM
Murf AI
Murf AI
AI voice generator studio with dubbing and a low-latency TTS API.
Murf AI is a text-to-speech platform pairing a studio editor — 200+ voices across 35+ languages, voice cloning, dubbing, and a voice changer — with developer APIs. Its Gen 2 speech model focuses on pronunciation accuracy and granular voice controls, while the Falcon API targets sub-130ms latency for real-time voice agents. Integrations include Canva, PowerPoint, and Google Slides.
200+ voices, 35+ languages
Commercial rights need a paid plan
- text-to-speech
- voiceover
- dubbing
- voice-cloning
Healthcare · PAID
Freed
Freed
AI medical scribe that turns patient visits into clinical notes and pushes them to EHRs.
Freed is an AI medical scribe for clinicians: it listens to patient visits and produces complete, specialty-templated clinical notes within moments of the encounter. Higher tiers add EHR push integration, visit-prep summaries, referral letters, and ICD-10/CPT coding. It is HIPAA-compliant and SOC 2 Type II certified, and does not store patient recordings.
Notes ready right after the visit
No free tier — 7-day trial only
- medical-scribe
- clinical-notes
- hipaa
- ehr
Voice · FREEMIUM
Speechmatics
Speechmatics
Enterprise speech APIs — real-time STT, TTS, and voice agents.
Speechmatics provides speech-to-text (batch and sub-second real-time), text-to-speech, and a Flow API for building voice agents, with accuracy that holds up across accents and dialects in 55+ languages. The same engine deploys in cloud, container, on-prem, or fully on-device, and it powers products from Adobe, LiveKit, and Ubisoft.
STT, TTS, and voice agents in one API
Pricier than budget STT rivals
- speech-to-text
- voice-agents
- tts
- real-time
Healthcare · PAID
Hippocratic AI
Hippocratic AI
Safety-focused generative AI healthcare agents for patient-facing, non-diagnostic tasks.
Hippocratic AI builds voice-based generative AI 'healthcare agents' for health systems, payors, and pharma to handle patient-facing tasks like post-discharge check-ins and chronic-care follow-up. By design its agents do not diagnose or prescribe, and it targets enterprise healthcare deployments rather than consumers.
Safety-first, no diagnosis/prescribing
Voice-only, no chat/texting
- clinical-agents
- patient-engagement
- enterprise
- voice
Support · PAID
Cresta
Cresta
AI agents and real-time agent assist for the contact center.
A contact-center AI platform that combines autonomous AI agents, real-time assistance for human agents, and analytics, coaching, and quality management. It runs omnichannel across voice and digital channels, preserving context between them, and its Agent Operations Center gives supervisors live visibility into both human and AI-led conversations. Used by enterprises including United Airlines and Cox Communications.
Real-time human agent assist
Enterprise sales-only pricing
- contact-center
- agent-assist
- customer-support
- enterprise
Audio · FREEMIUM
Wondercraft
Wondercraft
AI audio studio that turns ideas into produced podcasts and audiobooks.
Wondercraft is a browser-based AI audio studio that turns ideas, documents, or URLs into fully produced audio — podcasts, audiobooks, and ads — complete with scripts, natural-sounding voices, music, and sound effects. It offers hundreds of AI voices across dozens of languages plus voice cloning, with a timeline editor for assembling and revising episodes. Free for individuals, with paid creator and business tiers.
End-to-end podcast/audiobook production
Output can sound templated
- podcast
- audio-generation
- text-to-speech
- voice-cloning
Audio · FREEMIUM
Kits AI
Kits AI
Studio-quality AI voice cloning and music tools for artists and producers.
Kits AI lets musicians clone or use studio-quality singing and speaking voices, convert vocals between voices, and run a suite of audio tools (stem splitting, mastering, vocal cleanup). Voice models are ethically licensed from artists with revenue-sharing rather than scraped. A free plan is available, with paid Starter, Producer, and Professional tiers.
Ethically licensed artist voices with payouts
Cloned-voice quality reviews are mixed
- voice-cloning
- music
- vocals
- stem-separation
Voice · FREEMIUM
Bland AI
Bland
Enterprise voice AI platform for automated phone calls.
An enterprise platform for building AI agents that handle inbound and outbound phone calls 24/7 with natural, real-time conversation. Bland bundles the language model, speech-to-text, text-to-speech, and telephony into a single per-minute rate, and integrates with tools like Twilio, Salesforce, HubSpot, and Zapier. Build agents in a web dashboard or via a REST API; a free tier lets you start before moving to usage-based paid tiers.
Bundles LLM, STT, TTS, and telephony
Voice quality a tier below Retell/tuned Vapi
- voice-agents
- phone-calls
- conversational-ai
- telephony
Voice · FREEMIUM
AssemblyAI
AssemblyAI
Production speech-to-text + audio intelligence API.
Speech recognition API with batch and real-time streaming transcription, speaker diarization, and language detection. Its Universal models pair with optional Speech Understanding features (summarization, sentiment, redaction) so a single API can build conversation-intelligence products. Starts with a free credit and pay-as-you-go, per-second billing.
High transcription accuracy
Cloud-only, no self-host
- stt
- transcription
- streaming
- audio-intelligence
Voice · FREEMIUM
Retell AI
Retell AI
Build, test, and deploy AI voice agents for phone calls.
A no-code platform for humanlike voice agents that handle inbound and outbound phone calls — receptionists, IVR, and outbound campaigns. It bundles telephony (SIP / Twilio), a proprietary turn-taking model for low-latency conversations, prompts, tools, and call analytics. Pay-as-you-go pricing with free starter credits.
Inbound and outbound call handling
Per-minute costs stack with LLM/TTS
- voice-agents
- telephony
- call-automation
- no-code
Audio · FREEMIUM
Krisp
Krisp
On-device AI noise cancellation, transcription, and meeting notes.
Voice AI platform that removes background noise, transcribes calls, and generates meeting notes. It installs as a virtual microphone/speaker, so noise cancellation works across Zoom, Teams, Meet, and 800+ other apps without joining as a bot. Also offers accent conversion and a call-center product on the same engine.
Strong real-time noise removal
Transcription covers ~16 languages only
- noise-cancellation
- meeting-notes
- transcription
- voice
Voice · FREE · OSS
Sesame
Sesame
Conversational voice companion chasing "voice presence."
A conversational-speech company building lifelike voice companions — Maya and Miles — that interrupt, self-correct, and use natural pacing. The web demo lets you talk to them in real time, and Sesame has open-sourced its underlying CSM (Conversational Speech Model) base model. Co-founded by Oculus co-creator Brendan Iribe.
Open Apache-2.0 CSM-1B base model
Demo only; no production API yet
- voice
- conversational
- companion
- speech
Video · FREEMIUM
Tavus
Tavus
Real-time conversational video AI and digital human replicas.
A developer platform for building face-to-face AI agents that see, listen, and respond in live video through its Conversational Video Interface (CVI). It also generates personalized videos at scale from digital replicas of a real person. Built on Tavus's own models — Phoenix for rendering, Raven for perception, and Sparrow for conversational timing — with the ability to plug in custom LLMs and text-to-speech.
Live face-to-face AI video
Developer-first, not no-code
- video
- avatars
- digital-twin
- conversational
Voice · FREEMIUM
Wispr Flow
Wispr
AI voice dictation that types for you across every app, on desktop and mobile.
A dictation tool that turns speech into clean, formatted text in any app — removing filler words and applying context-aware edits as you talk. One subscription works across macOS, Windows, iOS, and Android, syncing your custom vocabulary and snippets between devices.
Types into any app system-wide
Subscription required for full use
- dictation
- voice
- speech-to-text
- productivity
Voice · FREEMIUM
Hume AI
Hume AI
Empathic Voice Interface — speech-to-speech AI that hears tone.
A voice AI toolkit built around the Empathic Voice Interface (EVI), a speech-to-speech model that infers emotion and prosody from a user's voice and modulates its replies accordingly. Exposed as an API for building expressive voice agents and assistants. From a research lab focused on emotional intelligence in AI.
Emotion/prosody-aware voice interface
Emotion inference accuracy is contested
- voice
- speech-to-speech
- emotion
- api
Voice · FREEMIUM
Speechify
Speechify
AI text-to-speech that reads any document, PDF, or page aloud.
Speechify is an AI text-to-speech app that turns articles, PDFs, emails, and books into natural-sounding audio with high-definition voices, adjustable speed, and OCR for scanned text. It runs on iOS, Android, web, a browser extension, and desktop, and offers a separate Studio product plus a text-to-speech API for developers.
OCR reads scanned text and PDFs
Best features behind paywall
- text-to-speech
- read-aloud
- accessibility
- voice-cloning
Voice · FREEMIUM
Deepgram
Deepgram
Production speech-to-text. The STT default for many companies.
End-to-end speech recognition platform — real-time streaming, batch transcription, speaker diarization, and language detection. Strong on accented speech, telephony audio, and long-form recordings.
Strong on accented/telephony audio
API-only, no end-user app
- stt
- transcription
- streaming
- diarization
Voice · FREEMIUM
ElevenLabs
ElevenLabs
Text-to-speech, voice cloning, and multilingual dubbing.
Hosted speech synthesis at near-human quality — TTS, voice cloning, multilingual dubbing, and conversational voice agents. Default choice when you need a voice that sounds like a person, not a robot.
Best-in-class voice realism
Pricier than commodity TTS at scale
- tts
- voice-cloning
- dubbing
- multilingual
Voice · FREEMIUM
Cartesia
Cartesia
Low-latency streaming text-to-speech for real-time voice.
Streaming-first speech synthesis built around the Sonic family of state-space models. Aims at real-time agent voices where latency between turns is the product. Strong choice for sub-200ms voice loops.
Streaming over WebSocket for fast first audio
Long-form expressive texture trails ElevenLabs
- tts
- streaming
- low-latency
- real-time
Voice · FREEMIUM
Vapi
Vapi
Voice agent infrastructure. Build a phone-agent in a weekend.
Production voice-agent platform — telephony, STT, LLM, TTS, and interrupt handling stitched together so you call an endpoint and get a working phone agent. Pluggable models at every layer.
Telephony and interrupts handled
Per-minute costs stack across layers
- voice-agents
- telephony
- phone
- real-time
