Facebook AI can detect up to five different voices in one conversation

Facebook engineers have introduced a new model that can identify up to five different voices, then convert them into text or split them into separate tracks.

Facebook’s artificial intelligence has learned to identify up to five different voices in one conversation and to convert them into text or split them into five separate tracks. The team claims that the new method outperforms all comparable approaches in both the quality and the speed of speech source separation, noise reduction, and reverb removal.

Facebook built the model on a new recurrent neural network, a class of algorithms that uses an internal, memory-like state to process input sequences of variable length. In this case, the model can automatically identify the speakers and select a suitable speech model.
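To make the idea of a memory-like internal state more concrete, here is a minimal sketch of a generic recurrent cell in plain Python. It is not Facebook's model: the function names (`make_rnn`, `rnn_step`, `run_rnn`), the random weights, and the tiny sizes are all assumptions chosen for illustration. It only shows how one fixed-size state can absorb input sequences of different lengths.

```python
import math
import random


def make_rnn(input_size, hidden_size, seed=0):
    """Build random weights for a tiny Elman-style recurrent cell (illustrative only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(hidden_size, input_size),    # input -> state weights
        "W_rec": mat(hidden_size, hidden_size),  # previous state -> state weights
        "bias": [0.0] * hidden_size,
    }


def rnn_step(params, state, frame):
    """Update the state: h_t = tanh(W_in x_t + W_rec h_{t-1} + b)."""
    new_state = []
    for i, b in enumerate(params["bias"]):
        total = b
        total += sum(w * x for w, x in zip(params["W_in"][i], frame))
        total += sum(w * h for w, h in zip(params["W_rec"][i], state))
        new_state.append(math.tanh(total))
    return new_state


def run_rnn(params, frames):
    """Feed a sequence of any length through the cell and return the final state."""
    state = [0.0] * len(params["bias"])  # the "memory" starts empty
    for frame in frames:
        state = rnn_step(params, state, frame)
    return state


# Hypothetical two-feature audio frames; real systems use far richer inputs.
params = make_rnn(input_size=2, hidden_size=3)
short_clip = [[0.1, 0.2], [0.3, -0.1]]
long_clip = short_clip + [[0.0, 0.5], [-0.2, 0.4], [0.6, 0.1]]

# Both clips end up summarized in a state of the same size.
print(len(run_rnn(params, short_clip)), len(run_rnn(params, long_clip)))
```

The two-frame clip and the five-frame clip are both reduced to a three-value state, which is the property that lets recurrent networks handle recordings of arbitrary duration.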

Speech separation is a critical step towards improving communication in a variety of applications, such as voice messaging or audio streaming. In addition, the speech separation methods proposed by the researchers can be used to suppress background noise, for example, when recording musical instruments.

Previously, Facebook researchers presented a model that can recognize words in 51 languages. In preliminary tests, the tool showed record accuracy, which is expected to improve with further training. The system, which contains about a billion parameters, improves speech recognition performance by up to 28.8%.


Alexandr Ivanov earned his Licentiate Engineer in Systems and Computer Engineering from the Free International University of Moldova. Since 2013, Alexandr has been working as a freelance web programmer.
Function: Web Developer and Editor
Alexandr Ivanov
