Facebook engineers have introduced a new model that can identify up to five different voices and then either transcribe them into text or split them into separate tracks.
Facebook’s artificial intelligence has been taught to identify up to five different voices in one conversation and to transcribe them into text or split them into five separate tracks. The team claims that the new method outperforms all comparable approaches in the quality and speed of speech source separation, as well as in noise reduction and the handling of reverb.
Facebook built the method on a new recurrent neural network. Recurrent networks are a class of algorithms that use an internal, memory-like state to process input sequences of variable length. In this case, the model can automatically identify the speakers and select the appropriate speech model.
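To illustrate what "an internal state similar to memory" means, here is a minimal sketch of a single recurrent unit written in plain Python. It is not Facebook's model: the function name `rnn_final_state` and the weights `w_in`, `w_rec`, and `bias` are hypothetical values chosen only for illustration. The point is that the same small update rule can be applied to a sequence of any length, with the state carrying information forward from step to step.

```python
import math


def rnn_final_state(sequence, w_in=0.5, w_rec=0.8, bias=0.0):
    """Run a one-unit recurrent cell over a sequence of numbers.

    The weights are arbitrary illustrative values, not trained parameters.
    """
    state = 0.0  # internal "memory", carried from one step to the next
    for x in sequence:
        # The new state depends on the current input and the previous state.
        state = math.tanh(w_in * x + w_rec * state + bias)
    return state


# The same function handles inputs of different lengths.
short_state = rnn_final_state([0.2, -0.1])
long_state = rnn_final_state([0.2, -0.1, 0.4, 0.3, -0.5])
print(short_state, long_state)
```

Because each step feeds the previous state back in, the order of the inputs matters: feeding `[1.0, 0.0]` and `[0.0, 1.0]` produces different final states. A real speech model would use vectors and trained weights rather than single numbers.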
Speech separation is a critical step towards improving communication in a variety of applications, such as voice messaging and audio streaming. In addition, the speech separation methods proposed by the researchers can be used to suppress background noise, for example, when recording musical instruments.
Previously, Facebook researchers presented a model that can recognize words in 51 languages. In preliminary tests, the tool showed record accuracy, and this figure is expected to improve with further training. The system, which contains about a billion parameters, improves speech recognition performance by up to 28.8%.