AI Poisoning Threatens Cybersecurity Across the Board

One way we protect our systems against cyber attacks is by employing AI and machine learning, which train those systems to know what to look out for. Hackers, however, have figured out a new way to thwart this process, using something called “AI poisoning.”

Machine learning is a complicated subject, but in essence, it works like this. You want a system to be able to identify something, let’s say, a bird. You then feed that system pictures of birds, clearly labeled as such. Over time, the system takes in a lot of images of birds, as well as images that aren’t birds, so that, eventually, it is able to look at a picture of a bird it has never seen before and correctly infer that the photo is, indeed, of a bird.

This is the basic foundation for training AI systems to identify, well, anything. It works for identifying animals just as it does for identifying security threats. If you want your system to watch out for cyber attacks, you train it by feeding it data clearly labeled as the threat you want it to be able to identify. Since the amount of data necessary to train these systems is so large, networks often crowd-source this data, and therein lies the problem.

Bad actors can sneak malicious, false data into crowd-sourced data sets. This is called AI poisoning. Systems trained on false information won’t be able to accurately identify real threats or attacks. Their training data will be scrambled, making their inferences meaningless. If you’re trying to train a system to recognize a bird, for example, bad actors could label dogs as “birds” in their data sets, and add those mislabeled images to a larger collection of animals. If researchers take that crowd-sourced data and train their systems with it, it can compromise their systems’ ability to recognize the patterns they’re looking for.
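
To make the bird-and-dog scenario concrete, here is a minimal sketch in Python. It is a toy, not how any real security product works: the two-number "features" stand in for images, the nearest-neighbor classifier is chosen only because it is short, and every name and data point (knn_predict, clean, poison, query) is made up for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train, point, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda example: dist(example[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical clean training set: birds cluster near (1, 1), dogs near (5, 5).
clean = [
    ((1.0, 1.2), "bird"), ((0.8, 1.0), "bird"), ((1.1, 0.9), "bird"),
    ((5.0, 5.1), "dog"), ((5.2, 4.9), "dog"), ((4.9, 5.0), "dog"),
]

# The attacker contributes dog-like examples deliberately labeled "bird".
poison = [((5.0, 5.0), "bird"), ((5.1, 5.0), "bird"), ((5.0, 4.95), "bird")]
poisoned = clean + poison

query = (5.05, 5.0)  # an unseen, dog-like example

print(knn_predict(clean, query))     # trained on clean data
print(knn_predict(poisoned, query))  # trained on poisoned data
```

Trained on the clean set, the toy model calls the query a dog. Trained on the poisoned set, the three mislabeled examples sit closest to the query and outvote the real dogs, so the same query comes back as a bird.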

It’s a potentially dire situation: One presentation showed bad actors could disable a system by poisoning less than 0.7% of the data the system was taking in. Luckily, the industry is on the problem: Experts say there are ways to fight back, such as rigorously monitoring all data sets to be sure everything is properly labeled.
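
One simple form that monitoring could take, sketched here under loud assumptions, is checking crowd-sourced labels against a small set of examples a person has verified by hand. The article does not say which methods experts actually use; the function names (knn_label, flag_suspect_labels), the data, and the nearest-neighbor check below are all hypothetical.

```python
from collections import Counter
from math import dist

def knn_label(reference, point, k=3):
    """Majority label among the k reference examples nearest to `point`."""
    nearest = sorted(reference, key=lambda example: dist(example[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def flag_suspect_labels(trusted, submitted, k=3):
    """Return submitted examples whose label disagrees with the trusted set."""
    return [
        (features, label)
        for features, label in submitted
        if knn_label(trusted, features, k) != label
    ]

# Hypothetical hand-verified examples: birds near (1, 1), dogs near (5, 5).
trusted = [
    ((1.0, 1.2), "bird"), ((0.8, 1.0), "bird"), ((1.1, 0.9), "bird"),
    ((5.0, 5.1), "dog"), ((5.2, 4.9), "dog"), ((4.9, 5.0), "dog"),
]

# Hypothetical crowd-sourced submissions; the middle one is a dog labeled "bird".
submitted = [((0.9, 1.1), "bird"), ((5.1, 5.0), "bird"), ((4.8, 5.2), "dog")]

suspects = flag_suspect_labels(trusted, submitted)
print(suspects)
```

Only the dog-like example labeled "bird" is flagged for review. A check like this is only as good as the trusted set behind it, and it would not catch every poisoning strategy.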

[Bloomberg]

Photo by Markus Spiske on Unsplash