ElevenLabs Unveils Scribe: A New Era in Speech-to-Text Technology
ElevenLabs, an AI startup valued at $3.3 billion, has launched Scribe, a stand-alone speech-to-text model supporting 99 languages. Scribe challenges industry leaders on both accuracy and price. The model features smart speaker diarization, word-level timestamps, and auto-tagging of sound events, making it suitable for a wide range of applications. QuarkyByte looks at what Scribe could mean for speech-to-text and what businesses and tech leaders can take from the launch.
ElevenLabs, an AI startup known for its audio-generation tools and its extensive library of voices, has launched its first stand-alone speech-to-text model, Scribe. The launch follows a $180 million funding round that valued the company at $3.3 billion. With Scribe, ElevenLabs is positioned to compete with Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper models in the speech-to-text market.
Scribe supports over 99 languages at launch. Of these, 25 fall into the "excellent accuracy" category, with a word error rate of less than 5%. English, French, German, and Spanish are among them, with English reaching a 97% accuracy rate. The model also features smart speaker diarization, word-level timestamps for precise subtitles, and auto-tagging of sound events, which broadens the range of applications it can serve.
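For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not ElevenLabs' evaluation code; the function name, the lowercase-and-split tokenization, and the sample sentences are assumptions made for the example. Under the common reading that accuracy is 1 minus WER, a 97% accuracy rate would correspond to a WER of roughly 3%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenization here is a plain lowercase split,
    which is an assumption and not any vendor's normalization scheme.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    transcript = "the quick brown fox jumped over the lazy dog"
    # One substituted word out of nine reference words.
    print(f"WER: {word_error_rate(reference, transcript):.3f}")
```

Published benchmark figures usually apply extra text normalization (punctuation, numerals, casing) before scoring, so numbers from a simple function like this are not directly comparable to vendor-reported results.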
CEO Mati Staniszewski has emphasized the company's commitment to improving speech detection models, stating, "Many people say that speech-to-text is a solved problem. But for many languages, it is pretty bad. We think we can build better speech detection models because we have in-house teams to annotate data and give us quick feedback."
Scribe has outperformed competitors such as Google Gemini 2.0 Flash and Whisper Large V3 across multiple languages on the FLEURS and Common Voice benchmarks. For now, Scribe is optimized for pre-recorded audio. ElevenLabs plans to introduce a low-latency real-time version soon, which would extend the model to use cases such as meeting transcription and voice note-taking.
Scribe costs $0.40 per hour of transcribed audio. That is competitive, although some rivals charge lower rates for differing feature sets. QuarkyByte sees Scribe as a notable new option in speech-to-text for businesses, developers, and tech leaders looking to improve their operations.
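To put the $0.40 per hour figure in concrete terms, the sketch below estimates the cost of a transcription workload. It assumes billing is strictly proportional to audio duration, with no minimum charge, volume discount, or per-request rounding; those billing details are not stated in the article, and the function and constant names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate quoted in the article: $0.40 per hour of transcribed audio.
PRICE_PER_HOUR_USD = Decimal("0.40")


def transcription_cost(audio_seconds: int,
                       price_per_hour: Decimal = PRICE_PER_HOUR_USD) -> Decimal:
    """Estimate cost in USD, rounded to the nearest cent.

    Assumes purely linear, duration-based billing (an assumption for
    illustration; actual billing rules may differ).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (hours * price_per_hour).quantize(Decimal("0.01"),
                                             rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(transcription_cost(90 * 60))      # a 90-minute recording: 0.60
    print(transcription_cost(1000 * 3600))  # a 1,000-hour archive: 400.00
```

Using `Decimal` rather than floating point keeps the cent-level arithmetic exact, which matters when the same estimate is summed over many files.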
As a thought leader in tech insights, QuarkyByte is committed to providing the latest developments in AI and speech detection technologies. We invite you to explore how ElevenLabs' innovations can be integrated into your business strategies for enhanced communication and operational efficiency.