Scientists from the United States have presented a new method of packaging large amounts of data for training AI, which could cut the cost of training a model several times over.
The researchers explained that machine learning typically requires a large number of examples. To create an AI model that recognizes a horse, for instance, the system must analyze thousands of images of horses. This is what makes the technology expensive and so different from human learning: a child often needs to see only a few examples of an object, or even just one, to recognize it for the rest of their life.
The new paper suggests that AI models can also be trained this way. The scientists call the approach “less than one”-shot learning: the algorithm learns to recognize more objects than it has training examples.
For example, the researchers trained an AI to recognize handwritten digits, but instead of loading separate examples of every digit into the model, they merged several digits into a single image, exploiting the fact that many digits have similar shapes. This allowed them to shrink the dataset from 60,000 images to just 10.
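The key idea behind merging examples can be illustrated with "soft labels": a training point carries a probability distribution over classes instead of a single hard label, so one point can stand in for several digits at once. The sketch below is a minimal, hypothetical illustration of this principle using a distance-weighted nearest-neighbour classifier (the exact method, data, and label values are assumptions, not the authors' published setup); it shows two training points separating three classes.

```python
import numpy as np

# Illustrative sketch only (not the paper's exact method): two "merged"
# prototypes carry soft labels, i.e. probability distributions over
# THREE classes, so 2 training points can separate 3 classes.
prototypes = np.array([[0.0], [4.0]])       # two 1-D training points
soft_labels = np.array([
    [0.6, 0.4, 0.0],   # mostly class 0, partly class 1
    [0.0, 0.4, 0.6],   # mostly class 2, partly class 1
])

def predict(x):
    """Distance-weighted soft-label kNN: weight each prototype's label
    distribution by inverse distance, sum them, and take the argmax."""
    d = np.linalg.norm(prototypes - x, axis=1)      # distance to each prototype
    w = 1.0 / (d + 1e-9)                            # closer -> heavier weight
    scores = (w[:, None] * soft_labels).sum(axis=0) # blended class scores
    return int(np.argmax(scores))

print(predict(np.array([0.0])))  # near the first prototype  -> class 0
print(predict(np.array([2.0])))  # midpoint: the shared "class 1" mass wins -> class 1
print(predict(np.array([4.0])))  # near the second prototype -> class 2
```

The middle class is never the majority label of any single point, yet it wins in the region between the prototypes because both contribute some probability mass to it: this is how fewer examples than classes can still define all the class boundaries.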
The researchers are now looking for other ways to design such small synthetic datasets, whether by hand or with another algorithm. Despite these open research challenges, the paper provides a theoretical framework for further work. “Our takeaway is that no matter what datasets you have, you can probably package them up to make the model more efficient,” they said.
In the future, the researchers want to train even powerful models on small datasets. To that end, they plan to draw up clear instructions for packaging data so that even scientists with little experience can use them.