A new AI method could help smart speakers better understand their users. In the future, you may not need wake words to communicate with devices.
Researchers at Carnegie Mellon University have developed a machine learning model that estimates the direction a user's voice is coming from. This could free devices from the special wake phrases and gestures they currently require. The method relies on the inherent properties of sound as it travels around a room.
The model exploits the fact that speech is loudest and clearest in the direction the speaker is facing; heard from any other direction, the same voice sounds quieter, delayed, or muffled. It also accounts for the way speech frequencies vary with direction: lower frequencies tend to radiate omnidirectionally, while higher frequencies are more directional.
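The researchers' actual model is learned from data, but the cues described above can be roughly illustrated in code. The sketch below (a hypothetical heuristic, not CMU's method; the function name and the 1 kHz split frequency are illustrative assumptions) scores how likely a speaker is to be facing a microphone by combining overall loudness with the ratio of high- to low-frequency energy, since speech aimed away from the device loses high frequencies first:

```python
import numpy as np

def facing_score(signal, sample_rate, split_hz=1000.0):
    """Heuristic score of whether the speaker is facing the microphone.

    Facing speech tends to be louder and to retain more high-frequency
    energy; speech directed away sounds quieter and muffled, because
    high frequencies are more directional than low ones.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    low = spectrum[freqs < split_hz].sum()
    high = spectrum[freqs >= split_hz].sum()
    loudness = np.sqrt(np.mean(signal ** 2))  # RMS level
    tilt = high / (low + 1e-12)               # high/low energy ratio
    return loudness * tilt

# Toy demo: the "facing" signal keeps its 2 kHz component; the "away"
# signal is quieter with its high frequency strongly attenuated.
rate = 16000
t = np.arange(rate) / rate
facing = np.sin(2 * np.pi * 200 * t) + 0.8 * np.sin(2 * np.pi * 2000 * t)
away = 0.5 * np.sin(2 * np.pi * 200 * t) + 0.1 * np.sin(2 * np.pi * 2000 * t)
print(facing_score(facing, rate) > facing_score(away, rate))  # True
```

A real system would compare such scores across several devices (or microphones) and pick the one the speaker is addressing; the point here is only that direction can be inferred from passive acoustic cues rather than from a wake word.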
The method requires little memory and does not need to send audio data to the cloud, the researchers noted.
They added that it will take several more years to bring the system to real devices. The team has already released the code publicly, so any researcher can use it. With this system, a smart speaker could be asked to play music without a wake word, and it could respond to commands even when the user is in a different room.