AI learns to detect sign language in video calls

Google has developed a model that can detect sign language during video calls. The AI can recognize when a participant is “actively speaking” in sign language, but ignores them if they are simply moving their hands or head.

Researchers have presented a real-time sign language detection system. It can tell whether a participant is trying to say something or is simply moving their body, head or arms. The researchers note that this distinction may seem trivial for a person, but until now no video call service had such a system: they all react to any sound or movement a person makes.

Google's new model does this efficiently and with low latency. The researchers acknowledge that adding sign language detection to a call can introduce lag or degrade video quality, but they argue this can be avoided because the model itself is lightweight and reliable.
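To put “low latency” in perspective, here is a rough frame-budget calculation. It assumes a call running at 30 frames per second, which is an illustrative figure and not one the article states. At that rate every frame leaves only about 33 ms for all processing, and any time a detector spends per frame comes out of that budget:

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{30\ \text{frames}} \approx 33.3\ \text{ms per frame}
```

This is why a heavy detector risks lag or dropped quality, while a lightweight one can run alongside the video without being noticed.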

Sign language detection

First, the system runs the video through a model called PoseNet, which estimates the position of the body and limbs in each frame. This simplified representation is then passed to a second model, trained on pose data from videos of people using sign language, which checks whether the movements match those typical of signing.
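The two-stage idea (pose landmarks per frame, then a light classifier on top) can be sketched roughly as below. This is not Google's code: PoseNet is not called here, the landmarks are hand-written (x, y) points, and the shoulder indices, the motion feature and the classifier weights are all hypothetical values chosen for illustration. A single “how much did the landmarks move” score cannot by itself separate signing from other movement; the real model learns that distinction from pose data.

```python
import math
from typing import List, Tuple

Keypoint = Tuple[float, float]  # (x, y) position of one body landmark
Pose = List[Keypoint]           # all landmarks estimated for one frame

# Hypothetical landmark indices; real pose models define their own ordering.
LEFT_SHOULDER, RIGHT_SHOULDER = 0, 1


def shoulder_width(pose: Pose) -> float:
    """Distance between the two shoulder landmarks."""
    (x1, y1), (x2, y2) = pose[LEFT_SHOULDER], pose[RIGHT_SHOULDER]
    return math.hypot(x2 - x1, y2 - y1)


def motion_feature(prev: Pose, curr: Pose) -> float:
    """Mean landmark displacement between two frames, divided by shoulder
    width so the value does not depend on distance from the camera."""
    width = shoulder_width(curr) or 1.0  # avoid division by zero
    total = sum(math.hypot(cx - px, cy - py)
                for (px, py), (cx, cy) in zip(prev, curr))
    return total / (len(curr) * width)


def is_signing(poses: List[Pose], weight: float = 40.0, bias: float = -2.0,
               threshold: float = 0.5) -> List[bool]:
    """Per-frame decision from a tiny logistic classifier on the motion
    feature. weight, bias and threshold are made-up, not trained values."""
    if not poses:
        return []
    decisions = [False]  # no motion can be measured on the first frame
    for prev, curr in zip(poses, poses[1:]):
        score = weight * motion_feature(prev, curr) + bias
        prob = 1.0 / (1.0 + math.exp(-score))  # logistic (sigmoid) output
        decisions.append(prob >= threshold)
    return decisions


# Landmarks: left shoulder, right shoulder, left hand, right hand.
still = [(0.0, 0.0), (1.0, 0.0), (0.2, 1.0), (0.8, 1.0)]
moved = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.7), (0.8, 0.6)]
print(is_signing([still, still, moved]))  # [False, False, True]
```

Keeping the classifier this small is the point of the design: the expensive step is pose estimation, and everything after it works on a handful of numbers per frame.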

The model correctly detects signing with 80% accuracy, and with additional optimization it reaches 91.5%. Since active speaker detection in most services already works with some delay, the researchers consider these results very strong.
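To make the accuracy figure concrete, here is a minimal sketch of how such a number could be computed. It assumes accuracy is measured per video frame (signing vs. not signing), which the article does not spell out, and the 10-frame clip below is invented for illustration.

```python
from typing import Sequence


def frame_accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Share of frames where the signing/not-signing prediction matches the label."""
    if len(predicted) != len(actual):
        raise ValueError("prediction and label sequences must have equal length")
    if not actual:
        raise ValueError("need at least one frame")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical 10-frame clip: the detector gets 8 of 10 frames right.
labels = [True, True, True, False, False, True, True, False, False, True]
predictions = [True, True, False, False, False, True, True, False, True, True]
print(frame_accuracy(predictions, labels))  # 0.8
```

Under this assumption, 80% accuracy means that roughly one frame in five is misclassified, and 91.5% brings that down to fewer than one in eleven.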