LiveKit, Inc.

LiveKit

Open-source framework and cloud for realtime voice, video, and physical AI agents.

Categories: Infra, Voice, Agent
Pricing: FREEMIUM
Source: Open core
Hosting: Hybrid
Platforms: Web, API, CLI
Models: Model-agnostic
Verified: Jun 13, 2026

LiveKit is the realtime infrastructure layer most voice-AI stacks are built on. Its open-source Agents framework wires together streaming speech-to-text, an LLM, and text-to-speech with reliable turn detection, interruption handling, and telephony, so agents can join a session and converse in near real time. You bring your own STT, LLM, and TTS providers; LiveKit handles the low-latency WebRTC transport. It runs self-hosted under Apache 2.0 or as a managed cloud platform.
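The STT, LLM, and TTS wiring described above can be pictured with a small, self-contained sketch. This is not LiveKit's Agents API: `VoicePipeline`, `handle_turn`, the `fake_*` providers, and `make_barge_in` are all hypothetical names invented for illustration, and the real framework handles streaming, turn detection, and WebRTC transport itself. The sketch only shows the shape of one turn with pluggable providers and a basic interruption (barge-in) check.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class VoicePipeline:
    """Toy STT -> LLM -> TTS loop (hypothetical, not LiveKit's API).

    Each stage is a plain callable, so any provider could be swapped in.
    """
    stt: Callable[[bytes], str]            # audio -> transcript
    llm: Callable[[str], str]              # transcript -> reply text
    tts: Callable[[str], Iterator[bytes]]  # reply text -> audio chunks
    played: List[bytes] = field(default_factory=list)

    def handle_turn(self, audio: bytes, user_is_speaking: Callable[[], bool]) -> str:
        """Run one conversational turn; stop playback if the user barges in."""
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        for chunk in self.tts(reply):
            if user_is_speaking():
                # Interruption: drop the rest of the reply.
                return "interrupted"
            # Stand-in for sending audio over the realtime transport.
            self.played.append(chunk)
        return "completed"


# Stub providers, assumptions for illustration only.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> Iterator[bytes]:
    # Emit one "audio chunk" per word so playback can be cut short.
    for word in text.split():
        yield word.encode("utf-8")


def make_barge_in(after_chunks: int) -> Callable[[], bool]:
    """Return a detector that reports user speech after N quiet checks."""
    calls = {"n": 0}

    def detector() -> bool:
        calls["n"] += 1
        return calls["n"] > after_chunks

    return detector


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
status = pipeline.handle_turn(b"hello there", make_barge_in(after_chunks=2))
print(status, pipeline.played)
```

Because each stage is just a callable, swapping a provider means passing a different function. That mirrors the bring-your-own-provider idea in the description, though a production framework would stream between stages rather than run them one after another.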

Capabilities 5

What it actually does — grouped by capability family.

Voice agent (primary capability)
Tool / function calling (secondary capability)

App / agent deployment (secondary capability)

Transcription (STT) (secondary capability)
Speech synthesis (TTS) (secondary capability)

Pros & cons

Pros:
Powers ChatGPT Advanced Voice in production
Self-hostable, with telephony built in
BYO STT/LLM/TTS, so no model lock-in
Reliable turn detection and interruptions
Managed cloud option alongside the OSS

Cons:
Developer infrastructure, not a no-code product
You assemble and pay for STT/LLM/TTS separately
Realtime media ops add operational complexity

Further reading

Vapi
Categories: Voice
Pricing: FREEMIUM
Voice agent infrastructure. Build a phone-agent in a weekend.
Production voice-agent platform — telephony, STT, LLM, TTS, and interrupt handling stitched together so you call an endpoint and get a working phone agent. Pluggable models at every layer.
Pro: Telephony and interrupts handled
Con: Per-minute costs stack across layers
- voice-agents
- telephony
- phone
- real-time

Retell AI
Categories: Voice
Pricing: FREEMIUM
Build, test, and deploy AI voice agents for phone calls.
A no-code platform for humanlike voice agents that handle inbound and outbound phone calls — receptionists, IVR, and outbound campaigns. It bundles telephony (SIP / Twilio), a proprietary turn-taking model for low-latency conversations, prompts, tools, and call analytics. Pay-as-you-go pricing with free starter credits.
Pro: Inbound and outbound call handling
Con: Per-minute costs stack with LLM/TTS
- voice-agents
- telephony
- call-automation
- no-code

Cartesia
Categories: Voice
Pricing: FREEMIUM
Low-latency streaming text-to-speech for real-time voice.
Streaming-first speech synthesis built around the Sonic family of state-space models. It targets real-time agent voices, where the latency between turns defines the experience, and is a strong choice for sub-200 ms voice loops.
Pro: Streaming over WebSocket for fast first audio
Con: Long-form expressive texture trails ElevenLabs
- tts
- streaming
- low-latency
- real-time