Hume AI

Emotionally intelligent voice AI — speech-to-speech (EVI) and expressive TTS (Octave).

Categories: Voice, Audio
Pricing: FREEMIUM
Hosting: Cloud
Platforms: Web, API
Models: Multi-model
Verified: Jun 16, 2026

Hume builds voice AI tuned to emotional expression. Its Empathic Voice Interface (EVI) is a speech-to-speech system that reads vocal tone, handles interruptions and back-channeling, and can front any external LLM. Octave is its expressive text-to-speech model with voice design, cloning, and modulation.

Pros & cons

  Pros
  • Emotionally expressive speech-to-speech
  • Low-latency, interruptible conversations
  • Works with external LLMs
  • Octave TTS with voice design

  Cons
  • Core EVI and Octave are closed-source
  • Usage-based costs can add up
  • Emotion inference can be imperfect
  • Smaller ecosystem than big TTS vendors

Further reading

  • Cartesia
    Voice, FREEMIUM

    Low-latency streaming TTS. Sub-100ms first audio.

    Streaming-first speech synthesis built around the Sonic family of state-space models. Aims at real-time agent voices where latency between turns is the product. Strong choice for sub-200ms voice loops.

    Worth knowing

    Founded in 2023 by the Stanford AI Lab team behind state-space models and Mamba, including Albert Gu and Karan Goel.

    • tts
    • streaming
    • low-latency
    • real-time
  • Vapi
    Voice, FREEMIUM

    Voice agent infrastructure. Build a phone agent in a weekend.

    Production voice-agent platform — telephony, STT, LLM, TTS, and interrupt handling stitched together so you call an endpoint and get a working phone agent. Pluggable models at every layer.

    Worth knowing

    Hit a ~$500M valuation in 2026 after Amazon picked it to power Ring's voice AI over 40 rival platforms; it has handled 1B+ calls.

    • voice-agents
    • telephony
    • phone
    • real-time
  • Bland AI
    Voice, PAID

    Build, run, and monitor AI phone agents that hold real conversations at scale.

    Enterprise voice-AI platform for automating high-volume inbound and outbound phone calls. Bland runs the full stack — language model, speech-to-text, text-to-speech, and telephony — on its own infrastructure, with conversational pathways, voice cloning, and webhook integrations. Positioned for regulated industries with SOC 2, HIPAA, and PCI compliance.

    Worth knowing

    YC S23 startup that has raised $65M total — a $40M Series B led by Emergence Capital landed in January 2025.

    • voice-agents
    • phone
    • conversational-ai
    • enterprise