A new way to train AI will stop most hacker attacks on it

Scientists from the United States have found a way to counter hacker attacks on artificial intelligence models. To do this, they used a second model that tried to trick the neural network by feeding it edited images.

Researchers at the University of Illinois explained that one of the biggest shortcomings of current AI training is that it leaves the resulting model vulnerable to hacker attacks. Most of these attacks target image recognition and image reconstruction systems. This alarms officials in the healthcare sector, where such systems are often used to reconstruct medical images. If the AI receives a tampered image, it may misdiagnose the patient.

The scientists have therefore proposed a new method of training deep learning systems that makes them more fault-tolerant and reliable in security-critical settings.

To do this, the scientists paired the neural network responsible for image restoration with a model that generates adversarial examples (images in which a small part of the original has been changed). During training, one AI tried to trick the other by showing it pictures that differed slightly from the original. The reconstruction model, in turn, analyzed every picture and tried to determine whether it was the original or an edited version.
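
The training loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' implementation. The reconstruction network is replaced by a small linear map, and the adversarial generator is replaced by a single signed-gradient step in the style of the fast gradient sign method. The sketch also trains the reconstructor to recover the original from a perturbed input rather than to flag edited pictures. All names and values (`perturb`, `sgd_step`, `eps`, `lr`, the synthetic data) are hypothetical.

```python
import random

def sign(v):
    return (v > 0) - (v < 0)

def reconstruct(W, y):
    """Linear stand-in for the reconstruction network: x_hat = W @ y."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]

def loss(W, y, x):
    """Squared error between the reconstruction of y and the true image x."""
    return sum((r - t) ** 2 for r, t in zip(reconstruct(W, y), x))

def perturb(W, y, x, eps):
    """Adversary (assumed FGSM-style): move each pixel of y by at most eps
    in the direction that increases the reconstruction error."""
    residual = [r - t for r, t in zip(reconstruct(W, y), x)]
    # d loss / d y[j] = 2 * sum_i W[i][j] * residual[i]
    grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(y))]
    return [v + eps * sign(g) for v, g in zip(y, grad)]

def sgd_step(W, y, x, lr):
    """Reconstructor: one in-place gradient step on W for the pair (y, x)."""
    residual = [r - t for r, t in zip(reconstruct(W, y), x)]
    for i, r in enumerate(residual):
        for j, v in enumerate(y):
            W[i][j] -= lr * 2 * r * v

def sample(rng, n, noise):
    """Hypothetical data: a random 'image' x and its noisy measurement y."""
    x = [rng.random() for _ in range(n)]
    y = [v + rng.gauss(0, noise) for v in x]
    return x, y

def adversarial_train(n=4, steps=2000, eps=0.1, lr=0.05, noise=0.05, seed=0):
    rng = random.Random(seed)
    W = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        x, y = sample(rng, n, noise)
        y_adv = perturb(W, y, x, eps)  # the adversary edits the input first
        sgd_step(W, y_adv, x, lr)      # the reconstructor learns from the edit
    return W

if __name__ == "__main__":
    W = adversarial_train()
    x, y = sample(random.Random(1), 4, 0.05)
    print("clean loss:", loss(W, y, x))
    print("attacked loss:", loss(W, perturb(W, y, x, 0.1), x))
```

The key design point is the order of the two calls inside the loop: the reconstructor never trains on a clean input, only on the version the adversary has just edited, so it is pushed to stay accurate within the `eps` budget.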

As a result, the network detected all of the edited photos, a better result than other neural networks have achieved. During the experiments, the scientists also tried to break into the system manually by showing it hundreds of edited versions of images, but it rejected every one of them.