Artificial intelligence is developing at remarkable speed. After AI models that can create images from text prompts and hold conversations, Microsoft has now developed VALL-E, an AI that can imitate a voice after hearing just three seconds of it. Unlike many AI tools, VALL-E can replicate the emotions and tone of a speaker, even when creating a recording of words that the original speaker never said. Here are the details…

VALL-E: The AI tool that can replicate any voice

Microsoft recently unveiled an artificial intelligence tool known as VALL-E that can replicate people’s voices. The tool needs only a 3-second recording of a specific voice as a prompt to generate new speech in that voice, and it was trained on 60,000 hours of English speech data. The model can also reproduce the speaker’s emotions and tone, even for words the original speaker never said.

This is a significant advance in AI-generated speech, as previous models could replicate a voice but not the emotions or tone of the speaker. A paper posted on arXiv, the preprint server hosted by Cornell University, describes how VALL-E was used to synthesize several voices, and some examples of the work are available on GitHub. The voice samples shared by Microsoft range in quality: some sound natural, while others are clearly machine-generated and sound robotic. As AI technology continues to improve, the generated recordings will likely become more convincing.

The technology also raises ethical concerns. As the voices generated by VALL-E and similar tools become more convincing, they could open the door to realistic spam calls that mimic the voices of people a potential victim knows. Politicians and other public figures could also be impersonated, which could lead to false information being spread on social media.

There are security concerns as well. Some banks use voice recognition to verify a caller’s identity, and as AI-generated voices become more convincing, it could become harder to detect whether a caller is using a synthesized voice. The technology may also affect voice actors, whose services may no longer be needed if AI-generated voices become more realistic.

VALL-E is an impressive AI tool with the potential to revolutionize voice synthesis, but it also raises several ethical and security concerns. It will be important for companies like Microsoft to develop safeguards governing the use of VALL-E so that it is used for good and not for malicious purposes.
