Multilingual speech projects represent a significant leap in language technology and a boost for global linguistic diversity. They use AI models to recognize and generate speech across a wide array of languages, in some cases thousands of them. By leveraging innovative approaches, such as unconventional data sources and self-supervised speech representation learning, multilingual speech projects aim to break down barriers and let people communicate, learn, and access information in their native languages.

Meta has decided to release MMS as an open-source project

Meta has unveiled its latest feat in AI language models, the Massively Multilingual Speech (MMS) project, and it is no mere ChatGPT replica. MMS can identify more than 4,000 spoken languages and both recognize and generate speech in over 1,100 of them, far surpassing the coverage of its predecessors. Not content to keep the breakthrough under wraps, Meta has open-sourced MMS, inviting researchers to leverage and expand upon its foundation; in doing so, it aims to help preserve language diversity and encourage collaborative advancement in the field.


Traditional speech recognition and text-to-speech models require training on vast audio datasets with the carefully labelled transcriptions that supervised learning depends on. Many endangered languages, predominantly found outside industrialized nations, lack such data, leaving them at risk of being left behind or vanishing altogether. Acknowledging this predicament, Meta took an ingenious approach and tapped translated religious texts: works like the Bible exist in a huge range of languages, often with recorded readings, and their translations have already undergone extensive scrutiny in text-based language translation research.

To make those recordings usable, Meta trained an alignment model that matches long audio to its text, then employed the wav2vec 2.0 model for self-supervised speech representation learning. The synergy between unorthodox data sources and self-supervised speech modeling yielded remarkable results: in comparative evaluations against OpenAI's Whisper, MMS achieved half the word error rate while covering eleven times as many languages.
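Meta's write-up does not include code, but the core alignment step, force-aligning a long recording to a known transcript with a CTC model, can be sketched with torchaudio's forced_align utility. This is not Meta's actual aligner; the emissions and token ids below are random stand-ins purely to show the mechanics:

```python
# Minimal sketch of CTC forced alignment, the kind of step used to cut long
# religious recordings into utterance-level (audio, text) training pairs.
# Assumes torchaudio >= 2.1; emissions and tokens are random stand-ins.
import torch
import torchaudio.functional as F

vocab_size, num_frames = 32, 500

# Log-probabilities a CTC acoustic model would emit: (batch=1, frames, vocab).
emissions = torch.randn(1, num_frames, vocab_size).log_softmax(dim=-1)

# The known transcript, encoded as token ids (index 0 is the CTC blank).
tokens = torch.tensor([[5, 12, 7, 7, 19]])

# forced_align finds the most likely frame-by-frame path through the
# transcript; runs of the same token id mark where each token sits in time.
alignment, scores = F.forced_align(emissions, tokens, blank=0)
print(alignment.shape)  # (1, 500): one aligned token (or blank) per frame
```

For real multilingual data, torchaudio also ships a pretrained aligner bundle derived from MMS (torchaudio.pipelines.MMS_FA) that produces the emissions for you.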

With the release of MMS as an open-source research project, Meta aspires to reverse the concerning trend of technology eroding linguistic diversity, in which support is typically limited to the hundred or so most common languages favoured by tech giants. Envisioning a world where assistive technology, text-to-speech, and even virtual and augmented reality let people communicate and learn in their native tongues, Meta hopes to help keep languages worldwide alive and vital.
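The released checkpoints are already usable by anyone. As a minimal sketch, assuming the facebook/mms-1b-all checkpoint published on Hugging Face and a recent transformers release, transcribing speech in one of the supported languages looks roughly like this:

```python
# Minimal sketch of running MMS speech recognition via Hugging Face
# transformers; assumes the published facebook/mms-1b-all checkpoint.
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS switches languages by loading small per-language adapter weights.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

# Stand-in for one second of real 16 kHz mono speech.
audio = np.zeros(16_000, dtype=np.float32)

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```

Swapping "fra" for another ISO language code is all it takes to move between the supported languages, which is what makes the adapter design practical at this scale.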
