New research by mechanical engineering professor Hod Lipson and his graduate student Boyuan Chen suggests that artificial intelligence systems can be trained more effectively using the human voice.
The researchers found that, when they compared neural networks trained with different kinds of labels, networks labeled with recordings of a human voice learned more effectively than those given simple binary input data.
The language of binary numbers is compact and precise for conveying information. Spoken human language, by contrast, is tonal and analog. Because numbers are an efficient way to digitize data, programmers rarely use other types of inputs when designing a neural network.
One of the most common exercises for testing a new machine learning method is teaching an AI to recognize objects or animals in photographs. The authors of the new work ran an experiment: they built two neural networks that were to recognize ten different types of objects in a collection of 50,000 photographs.
The first AI system was trained traditionally: it was given a dataset of thousands of numeric rows, each corresponding to one training photograph.
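A minimal sketch of this conventional numeric labeling, assuming one-hot vectors over ten class names; the names below are illustrative placeholders, since the article does not list the actual categories:

```python
import numpy as np

# Hypothetical class list standing in for the study's ten categories.
CLASSES = ["airplane", "car", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

def one_hot_label(name: str) -> np.ndarray:
    """Encode a class name as a one-hot vector: all zeros
    except a single 1.0 at the class index."""
    vec = np.zeros(len(CLASSES))
    vec[CLASSES.index(name)] = 1.0
    return vec

print(one_hot_label("cat"))  # 1.0 in position 3, zeros elsewhere
```

Each training photograph is then paired with one such row, and the network is scored on how closely its output matches the single "hot" position.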
The second system was given a data table whose rows each paired a photograph of an animal or object with an audio file of a person pronouncing that object's or animal's name.
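One plausible way to turn such a spoken label into a dense training target is a magnitude spectrogram of the recording, which the network can regress toward instead of a one-hot vector. This is a sketch under that assumption, not the paper's actual pipeline; synthetic noise stands in for a real recording:

```python
import numpy as np

def audio_target(waveform: np.ndarray, frame: int = 256, hop: int = 128) -> np.ndarray:
    """Turn a spoken-label waveform into a magnitude spectrogram:
    slice the signal into overlapping windowed frames, then take
    the magnitude of the real FFT of each frame."""
    frames = [waveform[i:i + frame] * np.hanning(frame)
              for i in range(0, len(waveform) - frame, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# Stand-in for a recording of a person saying "cat" (no real audio here).
rng = np.random.default_rng(0)
spoken = rng.standard_normal(16000)  # 1 second at 16 kHz
target = audio_target(spoken)
print(target.shape)  # (123, 129): time frames x frequency bins
```

Unlike a one-hot row, this target carries the tonal, analog structure of speech that the article contrasts with binary labels.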
As a result, the first neural network output the numeric value of the object it was shown, while the second tried to "say" what it saw. The results changed, however, when the authors reduced the sample from 50,000 to 2,500 photographs: the accuracy of the first network fell to 35%, while that of the second, voice-trained network fell only to 70%.