Meta’s Massively Multilingual Speech to Redefine Boundaries of Language

Anubhav — Tue, 23 May 2023 04:01:06 +0000

Multilingual speech projects represent a significant leap forward in advancing language technology and promoting global linguistic diversity. These projects utilize AI language models to recognize and generate speech in a wide array of languages, often spanning thousands of diverse linguistic backgrounds. By leveraging innovative approaches, such as incorporating unconventional data sources or employing self-supervised speech representation learning, multilingual speech projects aim to break barriers and empower individuals to communicate, learn, and access information in their native languages.

Meta has decided to put MMS out as an Open Source Project

Meta has unleashed its latest feat in AI language models with the groundbreaking Massively Multilingual Speech (MMS) project, setting it apart from mere ChatGPT replicas. In an unprecedented stride towards innovation, Meta’s MMS boasts the ability to recognize and generate speech in an astounding array of over 4,000 spoken languages, surpassing the capabilities of its predecessors. Not content with keeping this breakthrough under wraps, Meta has decided to open-source MMS, inviting researchers to leverage and expand upon its foundation. In doing so, Meta aims to reign over language diversity preservation and encourage collaborative advancement in the field.

Traditional speech recognition and text-to-speech models require extensive training on vast audio datasets, complete with meticulous transcription labels that facilitate machine learning algorithms. However, many endangered languages, predominantly found outside industrialized nations, lack such comprehensive data, placing them at risk of vanishing altogether. Acknowledging this predicament, Meta adopted an ingenious approach by tapping into translated religious texts. These texts, like the Bible, offer diverse linguistic renditions that have undergone extensive scrutiny for text-based language translation research.

Employing the wav2vec 2.0 model for self-supervised speech representation learning, Meta further refined the data’s usability by training an alignment model. The synergy between unorthodox data sources and self-supervised speech modeling yielded remarkable results. Comparative evaluations against OpenAI‘s Whisper revealed MMS’s superiority, achieving a 50% reduction in word error rate while surpassing Whisper’s language coverage by a staggering factor of 11.

With the release of MMS as an open-source research project, Meta aspires to reverse the concerning trend of technology eroding linguistic diversity, often limiting support to the most common 100 languages favoured by tech giants. Envisioning a world where assistive technology, text-to-speech, and even virtual and augmented reality technologies enable individuals to communicate and learn in their native tongues, Meta hopes to inspire the preservation and vitality of languages worldwide.

RELATED:

(Via)

The post Meta’s Massively Multilingual Speech to Redefine Boundaries of Language appeared first on Gizmochina.

Amazon’s Echo devices get a Live Translation feature

Jed John Ikoba — Tue, 15 Dec 2020 16:22:38 +0000

Amazon has just announced that its virtual assistant, Alexa now supports a Live Translation feature, allowing users who speak different languages to communicate with each other. The complex process will involve the virtual assistant translating both sides of such a conversation.

Amazon Echo Dot Kids Edition

At the moment, Alexa can translate English, Spanish, German, Italian, Brazilian Portuguese, and Hindi, with the Live Translation feature. Amazon said the feature is supported on Echo devices with the location set to English U.S.

The Live Translation feature draws support from existing Amazon systems and features, including Alexa’s automatic-speech-recognition (ASR) and its text-to-speech systems. These, in combination with Amazon Translate, as well as the whole gamut of Amazon’s machine learning models, designed and optimized for conversational-speech translation all combine seamlessly to produce the Live Translation feature, Amazon said in a recent statement.

Editor’s Pick: Nokia 5.4 launched with Snapdragon 662, 48MP quad cameras and punch-hole display

The berthing of this translation feature opens up new vistas in real-time communication, providing exciting possibilities with the seeming demolition of the communication barrier.

Similar to what obtains with most ASR systems, Amazon said that the live translation system incorporates both an acoustic model and a language model, both models combining to provide insights for the ASR system to decide between alternative interpretations of the same sequence of phonemes.

To provide users with a near-natural experience, Amazon adapted Alexa to a conversational speech by modifying its end-pointer to accommodate longer pauses for the Live Translation feature. Alexa uses the end-pointer to determine when a person has finished speaking. Alexa can also easily identify pauses in the middle and end of sentences.

Search engine giant Google also offers a similar feature called Interpreter Mode. It is yet to be seen how the two services square up side by side, but this certainly provides an additional option for live translation and inter-language communication.

UP NEXT: Poll of The Week: Have you received the Android 11 update?

The post Amazon’s Echo devices get a Live Translation feature appeared first on Gizmochina.

Automatic Speech Recognition Archives - Gizmochina

Meta’s Massively Multilingual Speech to Redefine Boundaries of Language

Meta has decided to put MMS out as an Open Source Project

Amazon’s Echo devices get a Live Translation feature