Soniox

One speech AI API for real-time transcription, TTS, and translation.

Categories: Voice, Translation
Pricing: FREEMIUM
Source: Proprietary
Hosting: Cloud
Platforms: API, Web, iOS
Models: Single model (proprietary)
Verified: Jun 15, 2026

A multilingual speech platform built on Soniox's own universal recognition model: real-time and async speech-to-text, text-to-speech, and any-to-any speech translation across 60+ languages from a single API. It returns token-level results within milliseconds and keeps transcribing through crosstalk, speaker overlap, and mid-sentence language switches. A consumer app (web and iOS) wraps the same engine for recording, transcription, and notes.

Capabilities (5)

What it actually does — grouped by capability family.

Transcription (STT) (primary capability)
Speech synthesis (TTS) (secondary capability)
Speech translation (secondary capability)
Speaker diarization (secondary capability)
Dictation (secondary capability)

Pros & cons

Pros:
60+ languages, mid-sentence switching
Real-time + async in one API
Speech-to-speech translation
Low per-hour pricing

Cons:
Smaller brand than incumbents
Free credits tightened over abuse
Token-based pricing takes math

Deepgram
Categories: Voice
Pricing: FREEMIUM
Production speech-to-text. The STT default for many companies.
End-to-end speech recognition platform — real-time streaming, batch transcription, speaker diarization, and language detection. Strong on accented speech, telephony audio, and long-form recordings.
Pro: Strong on accented/telephony audio
Con: API-only, no end-user app
- stt
- transcription
- streaming
- diarization
AssemblyAI
Categories: Voice
Pricing: FREEMIUM
Production speech-to-text + audio intelligence API.
Speech recognition API with batch and real-time streaming transcription, speaker diarization, and language detection. Its Universal models pair with optional Speech Understanding features (summarization, sentiment, redaction), so conversation-intelligence products can be built on a single API. Pricing starts with a free credit, then moves to pay-as-you-go with per-second billing.
Pro: High transcription accuracy
Con: Cloud-only, no self-host
- stt
- transcription
- streaming
- audio-intelligence
Speechmatics
Categories: Voice
Pricing: FREEMIUM
Enterprise speech APIs — real-time STT, TTS, and voice agents.
Speechmatics provides speech-to-text (batch and sub-second real-time), text-to-speech, and a Flow API for building voice agents, with accuracy that holds up across accents and dialects in 55+ languages. The same engine deploys in cloud, container, on-prem, or fully on-device, and it powers products from Adobe, LiveKit, and Ubisoft.
Pro: STT, TTS, and voice agents in one API
Con: Pricier than budget STT rivals
- speech-to-text
- voice-agents
- tts
- real-time
Gladia
Categories: Voice
Pricing: FREEMIUM
Real-time speech-to-text and audio intelligence through a single API.
End-to-end audio infrastructure to record, transcribe, and enrich speech via one API: real-time streaming under ~300ms latency, batch transcription, diarization, translation, and summarization across 100+ languages. Gladia reengineered Whisper for production before shipping its own Solaria models, and it offers EU data residency for compliance-bound teams.
Pro: Low-latency real-time streaming
Con: API-only, no end-user app
- stt
- transcription
- streaming
- diarization
