AI learns to recreate a song from a silent video of a musician

A new artificial intelligence (AI) model can watch a silent video of a musician playing a song on an instrument and recreate the music. In the future, the technology could also be used to reconstruct speech and other sounds from body movements.

Scientists at MIT have unveiled Foley Music, an AI system that generates music from silent videos of musicians playing instruments. They say the model works with a variety of instruments and outperforms several existing systems in both speed and output quality.

The researchers believe that an AI model that creates music from human movements could underpin a range of applications, from automatically adding sound effects to videos to creating immersive virtual reality experiences. They note that people have a similar skill: for example, we can often understand what someone is saying by reading their lips.

Foley Music uses key points on the body (25 points) and fingers (20 points) as intermediate visual representations, from which it models body and hand movements. The system then translates these movements into musical notes, including how loudly each note is played. This lets it generate music for the accordion, bass guitar, bassoon, cello, guitar, piano, ukulele, and other instruments.
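The article does not describe the model's internals beyond this, so the sketch below only illustrates the two intermediate representations it mentions: per-frame body and finger keypoints, and notes that carry timing and loudness. The names `PoseFrame`, `Note`, and `to_events` are hypothetical, and the MIDI-style note-on/note-off encoding is an assumption rather than Foley Music's documented format. The neural network that actually maps poses to notes is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Keypoint counts as reported in the article
NUM_BODY_POINTS = 25
NUM_FINGER_POINTS = 20


@dataclass
class PoseFrame:
    """Hypothetical container for one video frame's 2D keypoints."""
    body: List[Tuple[float, float]]     # (x, y) for each body keypoint
    fingers: List[Tuple[float, float]]  # (x, y) for each finger keypoint

    def __post_init__(self):
        # Reject frames whose keypoint counts differ from the reported ones
        if len(self.body) != NUM_BODY_POINTS or len(self.fingers) != NUM_FINGER_POINTS:
            raise ValueError("unexpected number of keypoints")

    def flatten(self) -> List[float]:
        """Concatenate all (x, y) coordinates into one feature vector (90 values)."""
        return [c for point in self.body + self.fingers for c in point]


@dataclass
class Note:
    """A single note; the MIDI-style fields are an assumption for illustration."""
    pitch: int      # MIDI note number, 0-127
    onset: float    # start time in seconds
    offset: float   # end time in seconds
    velocity: int   # loudness, 1-127


def to_events(notes: List[Note]) -> List[Tuple[float, str, int, int]]:
    """Turn notes into a time-sorted list of note-on/note-off events."""
    events = []
    for n in notes:
        events.append((n.onset, "note_on", n.pitch, n.velocity))
        events.append((n.offset, "note_off", n.pitch, 0))
    # Sort by time; at equal times, release old notes before starting new ones
    events.sort(key=lambda e: (e[0], e[1] != "note_off"))
    return events
```

Keeping loudness as a per-note velocity value mirrors the article's point that the system accounts for volume, not just which notes are played and when.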

In their experiments, the researchers trained Foley Music on three datasets containing a total of 1,000 music performance video clips in 11 categories. This gave them a corpus of videos of varying complexity, including piano tutorials from AtinPiano, amateur YouTube videos, concert excerpts, and other footage.

The researchers ran 450 videos through Foley Music and then had human evaluators rate the generated music. Some evaluators remarked that the music sounded "like a cover from a quality band."

The evaluators found the music generated by Foley Music difficult to distinguish from real recordings. The system also scored better than competing approaches on audio quality, semantic alignment with the video, and timing.


Author: John Kessler
Graduated from the Massachusetts Institute of Technology. Previously worked at various little-known media outlets. Currently an expert, editor, and developer at Free News.
Function: Director