
Gladia

Real-time speech-to-text and audio intelligence through a single API.

Category: Voice
Pricing: FREEMIUM
Hosting: Cloud
Platforms: API
Models: Single model (proprietary)
Verified: Jun 14, 2026

End-to-end audio infrastructure to record, transcribe, and enrich speech through one API: real-time streaming with latency under about 300 ms, batch transcription, diarization, translation, and summarization across 100+ languages. Gladia first re-engineered OpenAI's Whisper for production before shipping its own Solaria models, and it offers EU data residency for compliance-bound teams.

Pros & cons

Pros
  • Low-latency real-time streaming
  • 100+ languages with strong accent handling
  • EU data residency, plus GDPR, HIPAA, and SOC 2 compliance
  • Generous free tier and pay-as-you-go pricing

Cons
  • API-only, with no end-user app
  • Proprietary models
  • Younger than incumbent STT rivals

Further reading

  • Deepgram (Voice, FREEMIUM)

    Production speech-to-text. The STT default for many companies.

    End-to-end speech recognition platform — real-time streaming, batch transcription, speaker diarization, and language detection. Strong on accented speech, telephony audio, and long-form recordings.

    Worth knowing

    Co-founded by a particle physicist who'd built a dark-matter detector two miles underground before pivoting to speech.

    • stt
    • transcription
    • streaming
    • diarization
  • AssemblyAI (Voice, FREEMIUM)

    Production speech-to-text + audio intelligence API.

    Speech recognition API with batch and real-time streaming transcription, speaker diarization, and language detection. Its Universal models pair with optional Speech Understanding features (summarization, sentiment, redaction), so teams can build conversation-intelligence products on a single API. Pricing starts with free credit, then moves to pay-as-you-go, per-second billing.

    Worth knowing

    Raised a $50M Series C in 2023 (Accel-led, with Nat Friedman and Daniel Gross); a Y Combinator alum.

    • stt
    • transcription
    • streaming
    • audio-intelligence
  • Speechmatics (Voice, FREEMIUM)

    Enterprise speech APIs — real-time STT, TTS, and voice agents in 55+ languages.

    Speechmatics provides speech-to-text (batch and sub-second real-time), text-to-speech, and a Flow API for building voice agents, with accuracy that holds up across accents and dialects in 55+ languages. The same engine deploys in cloud, container, on-prem, or fully on-device, and it powers products from Adobe, LiveKit, and Ubisoft.

    Worth knowing

    Founded in Cambridge in 2006 by Tony Robinson, a 1980s pioneer of recurrent-neural-network speech recognition.

    • speech-to-text
    • voice-agents
    • tts
    • real-time
  • ElevenLabs (Voice, FREEMIUM)

    Frontier TTS, voice cloning, and dubbing. Industry default.

    Hosted speech synthesis at near-human quality — TTS, voice cloning, multilingual dubbing, and conversational voice agents. Default choice when you need a voice that sounds like a person, not a robot.

    Worth knowing

    Founded in 2022 by two Polish friends (ex-Google and ex-Palantir); a 2026 raise valued it at $11B.

    • tts
    • voice-cloning
    • dubbing
    • multilingual