Inworld AI

A full-stack voice runtime for building human-sounding AI agents.

Category: Voice
Pricing: Freemium
Hosting: Cloud
Platforms: API
Models: Multi-model
Verified: Jun 14, 2026

A developer platform for real-time voice AI: an integrated speech-to-text (STT), LLM, and text-to-speech (TTS) pipeline exposed through REST and WebSocket APIs (OpenAI Realtime-compatible), aimed at companions, character chat, support, and phone agents. Beyond the voice stack, it offers a model Router, inference, and compute, with cloud and enterprise on-prem deployment.

Pros & cons

Pros
  • Integrated full-stack voice pipeline
  • OpenAI Realtime-compatible API
  • Aggressive usage-based pricing at scale
  • Free on-demand tier for prototyping

Cons
  • Developer API, not an end-user app
  • Pivoted from its original character-engine focus
  • Voice quality varies by model tier

Further reading

  • Vapi
    Voice, Freemium

    Voice agent infrastructure. Build a phone agent in a weekend.

    Production voice-agent platform — telephony, STT, LLM, TTS, and interrupt handling stitched together so you call an endpoint and get a working phone agent. Pluggable models at every layer.

    Worth knowing

    Hit a ~$500M valuation in 2026 after Amazon picked it to power Ring's voice AI over 40 rival platforms; it has handled 1B+ calls.

    • voice-agents
    • telephony
    • phone
    • real-time
  • Retell AI
    Voice, Freemium

    Build, test, and deploy AI voice agents for phone calls.

    A no-code platform for humanlike voice agents that handle inbound and outbound phone calls — receptionists, IVR, and outbound campaigns. It bundles telephony (SIP / Twilio), a proprietary turn-taking model for low-latency conversations, prompts, tools, and call analytics. Pay-as-you-go pricing with free starter credits.

    Worth knowing

    Founded in 2023 by ByteDance, Google, and Meta alumni; a YC W24 startup at ~$40M annualized revenue with a ~25-person team.

    • voice-agents
    • telephony
    • call-automation
    • no-code
  • Cartesia
    Voice, Freemium

    Low-latency streaming TTS. Sub-100ms first audio.

    Streaming-first speech synthesis built around the Sonic family of state-space models. Aims at real-time agent voices where latency between turns is the product. Strong choice for sub-200ms voice loops.

    Worth knowing

    Founded in 2023 by the Stanford AI Lab team behind state-space models and Mamba, including Albert Gu and Karan Goel.

    • tts
    • streaming
    • low-latency
    • real-time
  • ElevenLabs
    Voice, Freemium

    Frontier TTS, voice cloning, and dubbing. Industry default.

    Hosted speech synthesis at near-human quality — TTS, voice cloning, multilingual dubbing, and conversational voice agents. Default choice when you need a voice that sounds like a person, not a robot.

    Worth knowing

    Founded in 2022 by two Polish friends (ex-Google and ex-Palantir); a 2026 raise valued it at $11B.

    • tts
    • voice-cloning
    • dubbing
    • multilingual