ElevenLabs

ElevenLabs, the AI audio company that recently raised $180 million, is expanding beyond audio generation. Already known for powering voice applications, the company is entering the competitive speech-to-text market with a new model, Scribe. The launch signals ElevenLabs’ ambition to compete in transcription as well as audio creation, challenging established players in the space.

What Sets ElevenLabs Scribe Apart in Speech Recognition?

ElevenLabs, valued at $3.3 billion, is no stranger to speech technology. The company previously supported speech-to-text solutions through its voice library, but Scribe is its first standalone speech-to-text model, putting it up against Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper. So what makes Scribe stand out in such a crowded field?

  1. Unmatched Language Coverage: Scribe supports over 99 languages from the start, positioning it as a truly global transcription solution.
  2. Outstanding Accuracy in Key Languages: ElevenLabs claims exceptional accuracy (with a word error rate below 5%) in more than 25 languages, including English (97% accuracy), French, German, Hindi, Japanese, and Spanish. This focus on language precision is a key differentiator. While these claims are impressive, further validation through third-party tests could strengthen confidence in these numbers.
  3. Industry-Leading Performance: On the FLEURS and Common Voice benchmarks, Scribe reportedly outperforms models such as Google Gemini 2.0 Flash and Whisper Large V3. If these results hold up in independent testing, the gain could matter most in sectors that require high accuracy, such as legal or medical transcription.
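
The accuracy claims above are stated as word error rate (WER), which is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not ElevenLabs' evaluation code; the function name and the simple lowercase-and-split normalization are assumptions made for this example. Reading "97% accuracy" as roughly a 3% WER also assumes accuracy is reported as 1 minus WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping one row of the table at a time.
    # prev[j] is the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Published benchmark scores usually apply heavier text normalization (punctuation, numerals, casing) before scoring, so figures computed this way are only comparable when the same normalization is used on both sides.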

ElevenLabs originally developed this speech-to-text technology for its conversational AI platform, but with Scribe, the technology is now available as a standalone model, broadening its user base.

Exploring Scribe’s Unique Features

During a recent interview with Bitcoin World, ElevenLabs CEO Mati Staniszewski discussed the company’s vision for improving speech recognition. He emphasized that the company’s goal is to better understand conversations and not just generate content. Staniszewski also addressed the misconception that speech-to-text is a fully solved issue, particularly for languages where accuracy has historically fallen short. One of the company's key advantages, according to him, lies in its in-house data annotation teams, which contribute to developing superior models.

In addition to core transcription, Scribe offers several standout features:

  • Smart Speaker Diarization: This feature can differentiate between speakers, making it ideal for multi-person conversations.
  • Word-Level Timestamps: Scribe provides precise timestamps for each word, enabling seamless subtitle generation and detailed analysis.
  • Auto-Tagging of Sound Events: The model can detect and tag sound events like laughter and applause, adding valuable context to transcriptions.
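
To make the diarization and timestamp features more concrete, here is a minimal sketch of how word-level output with speaker labels could be folded into speaker turns. The `Word` and `Turn` types and their field names are hypothetical and chosen for this example; they are not taken from Scribe's actual response format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape of one transcribed word (not the real API schema).
    text: str
    start: float   # seconds from the start of the audio
    end: float     # seconds from the start of the audio
    speaker: str   # diarization label, e.g. "speaker_0"


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_by_speaker(words: list[Word]) -> list[Turn]:
    """Merge consecutive words from the same speaker into a single turn."""
    turns: list[Turn] = []
    for word in words:
        if turns and turns[-1].speaker == word.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1].end = word.end
            turns[-1].text += " " + word.text
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(word.speaker, word.start, word.end, word.text))
    return turns


if __name__ == "__main__":
    sample = [
        Word("Hi", 0.0, 0.2, "speaker_0"),
        Word("there", 0.25, 0.5, "speaker_0"),
        Word("Hello", 0.8, 1.1, "speaker_1"),
    ]
    for turn in group_by_speaker(sample):
        print(f"[{turn.start:.2f}-{turn.end:.2f}] {turn.speaker}: {turn.text}")
```

Tagged sound events such as laughter could be carried through the same structure as extra items in the word list, although how Scribe represents them is not described here.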

ElevenLabs has integrated Scribe into its studio, allowing users to transcribe video content for subtitles and captions. For now the model supports only pre-recorded audio, but the company says a low-latency, real-time version is coming soon, which could open new possibilities for live meeting transcription and voice note-taking.
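
As an illustration of how word-level timestamps can feed a subtitle workflow, the sketch below turns a list of timed words into SubRip (SRT) cues. The input format (a list of `(text, start_seconds, end_seconds)` tuples) and the fixed words-per-cue rule are assumptions made for this example, not a description of how the ElevenLabs studio builds captions.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: list[tuple[str, float, float]], max_words: int = 7) -> str:
    """Build SRT text by placing at most max_words timed words in each cue."""
    cues = []
    for index, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        start = chunk[0][1]   # start time of the first word in the cue
        end = chunk[-1][2]    # end time of the last word in the cue
        text = " ".join(word for word, _, _ in chunk)
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues in SRT.
    return "\n".join(cues)


if __name__ == "__main__":
    timed_words = [("Hello", 0.0, 0.4), ("world", 0.5, 0.9)]
    print(words_to_srt(timed_words))
```

A production captioning tool would usually also break cues on pauses, punctuation, and line length rather than on a fixed word count.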

Pricing and Competition: Is Scribe Worth It?

ElevenLabs offers Scribe at $0.40 per hour of transcribed audio. Some competitors charge less, so the price should be weighed against the features on offer, particularly the accuracy and language support Scribe claims.

Here's a quick price comparison with other providers:

| Provider | Model | Strengths | Pricing (approx. per hour) |
|---|---|---|---|
| ElevenLabs | Scribe | Extensive language support, high accuracy, benchmark performance | $0.40 |
| Deepgram | Nova-2 | Real-time transcription, scalability, developer-focused | Varies |
| AssemblyAI | Conformer-2 | Feature-rich, audio intelligence, summarization | Varies |
| Speechmatics | Global English | High accuracy, accent understanding | Varies |
| Gladia | Various models | Specialized models, noise robustness | Varies |
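
For budgeting, the per-hour rate translates into a simple calculation. The sketch below uses the $0.40 per hour figure quoted in this article; the function name is hypothetical, and it assumes billing is strictly proportional to audio duration, which real pricing plans (minimum charges, tiers, rounding) may not follow.

```python
SCRIBE_RATE_USD_PER_HOUR = 0.40  # rate quoted in this article


def transcription_cost(audio_minutes: float,
                       rate_per_hour: float = SCRIBE_RATE_USD_PER_HOUR) -> float:
    """Estimate cost in USD, assuming billing proportional to audio length."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return round(audio_minutes / 60 * rate_per_hour, 2)


if __name__ == "__main__":
    # A 90-minute recording and a 500-hour archive at the quoted rate.
    print(transcription_cost(90))        # 0.6
    print(transcription_cost(500 * 60))  # 200.0
```

Passing a different `rate_per_hour` makes it easy to compare against another provider's published rate once it is known.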

Pros and Cons

Pros:

  • Global Reach: Supports over 99 languages, making it a versatile tool for international applications.
  • High Accuracy: Claims a low word error rate (under 5%) for over 25 major languages, including English (97% accuracy).
  • Cutting-Edge Performance: Reportedly outperforms leading models such as Google Gemini 2.0 Flash and OpenAI’s Whisper Large V3 on the FLEURS and Common Voice benchmarks.
  • Smart Diarization: Differentiates speakers, ideal for complex multi-person conversations.
  • Real-Time Capabilities Coming Soon: A promised low-latency, real-time version could make Scribe useful for live events and meetings.
  • Affordable Pricing: At $0.40 per hour, Scribe offers a competitive price for high-quality transcription.

Cons:

  • Unverified Claims: The accuracy and benchmark figures come from ElevenLabs itself and have not yet been validated by third parties.
  • Limited Real-Time Support: Currently only available for pre-recorded audio, though a real-time version is promised soon.
  • Pricing Comparison: Some competitors may offer lower rates, though with different feature sets or lower accuracy.

Conclusion

As the speech-to-text landscape becomes increasingly competitive, ElevenLabs' Scribe enters with a combination of broad language support, high claimed accuracy, and features such as speaker diarization, word-level timestamps, and sound-event tagging. By addressing the growing demand for more accurate and accessible transcription, ElevenLabs has positioned itself to become a key player in this market. Industries such as legal, healthcare, and media, which rely heavily on precise transcription, are likely to benefit the most if the accuracy claims hold up. The promised real-time version would extend the tool's reach further, from pre-recorded audio to live meetings and voice notes.
