Soniox

One speech AI API for real-time transcription, TTS, and translation.

Categories: Voice, Translation
Pricing: Freemium
Hosting: Cloud
Platforms: API, Web, iOS
Models: Single model (proprietary)
Verified: Jun 15, 2026

A multilingual speech platform built on Soniox's own universal recognition model: real-time and async speech-to-text, text-to-speech, and any-to-any speech translation across 60+ languages from a single API. It returns token-level results within milliseconds and keeps transcribing through crosstalk, speaker overlap, and mid-sentence language switches. A consumer app (web and iOS) wraps the same engine for recording, transcription, and notes.

Pros & cons

Pros
  • 60+ languages, mid-sentence switching
  • Real-time + async in one API
  • Speech-to-speech translation
  • Low per-hour pricing

Cons
  • Smaller brand than incumbents
  • Free credits tightened over abuse
  • Token-based pricing takes math

Further reading

  • Deepgram
    Voice, Freemium

    Production speech-to-text. The STT default for many companies.

    End-to-end speech recognition platform — real-time streaming, batch transcription, speaker diarization, and language detection. Strong on accented speech, telephony audio, and long-form recordings.

    Worth knowing

    Co-founded by a particle physicist who'd built a dark-matter detector two miles underground before pivoting to speech.

    • stt
    • transcription
    • streaming
    • diarization
  • AssemblyAI
    Voice, Freemium

    Production speech-to-text + audio intelligence API.

    Speech recognition API with batch and real-time streaming transcription, speaker diarization, and language detection. Its Universal models pair with optional Speech Understanding features (summarization, sentiment, redaction), so teams can build conversation-intelligence products on a single API. Starts with a free credit, then pay-as-you-go with per-second billing.

    Worth knowing

    Raised a $50M Series C in 2023 (Accel-led, with Nat Friedman and Daniel Gross); a Y Combinator alum.

    • stt
    • transcription
    • streaming
    • audio-intelligence
  • Speechmatics
    Voice, Freemium

    Enterprise speech APIs — real-time STT, TTS, and voice agents in 55+ languages.

    Speechmatics provides speech-to-text (batch and sub-second real-time), text-to-speech, and a Flow API for building voice agents, with accuracy that holds up across accents and dialects in 55+ languages. The same engine deploys in cloud, container, on-prem, or fully on-device, and it powers products from Adobe, LiveKit, and Ubisoft.

    Worth knowing

    Founded in Cambridge in 2006 by Tony Robinson, a 1980s pioneer of recurrent-neural-network speech recognition.

    • speech-to-text
    • voice-agents
    • tts
    • real-time
  • Gladia
    Voice, Freemium

    Real-time speech-to-text and audio intelligence through a single API.

    End-to-end audio infrastructure to record, transcribe, and enrich speech via one API: real-time streaming with latency under ~300 ms, batch transcription, diarization, translation, and summarization across 100+ languages. Gladia re-engineered Whisper for production before shipping its own Solaria models, and it offers EU data residency for compliance-bound teams.

    Worth knowing

    Paris startup, founded 2022; seed-backed by Sequoia, raised a $16M Series A in 2024, and runs EU-hosted with full GDPR data residency.

    • stt
    • transcription
    • streaming
    • diarization