Adversarial machine learning is a technique employed in the field of machine learning that attempts to fool models through malicious input. It can be applied for a variety of reasons, the most common being to attack or cause a malfunction in standard machine learning models.
Machine learning techniques were originally designed for stationary and benign environments in which the training and test data are assumed to be generated from the same statistical distribution. However, when those models are deployed in the real world, the presence of intelligent and adaptive adversaries may violate that statistical assumption to some degree, depending on the adversary. Adversarial machine learning examines how a malicious adversary can surreptitiously manipulate input data so as to exploit specific vulnerabilities of learning algorithms and compromise the security of the machine learning system.
Examples include attacks in spam filtering, where spam messages are obfuscated by misspelling “bad” words or inserting “good” words; attacks in computer security, such as obfuscating malware code within network packets to mislead signature detection; and attacks in biometric recognition, where fake biometric traits may be exploited to impersonate a legitimate user or to compromise users' template galleries that adapt to updated traits over time.
In 2017, researchers at the Massachusetts Institute of Technology 3-D printed a toy turtle with a texture engineered to make Google's object detection AI classify it as a rifle regardless of the angle from which the turtle was viewed. Creating the turtle required only low-cost commercially available 3-D printing technology. In 2018, Google Brain published a machine-tweaked image of a dog that looked like a cat both to computers and to humans.
To understand the security properties of learning algorithms in adversarial settings, three main issues should be addressed: identifying potential vulnerabilities of machine learning algorithms during training and classification; devising the corresponding attacks and evaluating their impact on the targeted system; and proposing countermeasures to improve the security of machine learning algorithms against those attacks.
This process amounts to simulating a proactive arms race (instead of a reactive one, as depicted in Figures 1 and 2), where system designers try to anticipate the adversary in order to understand whether there are potential vulnerabilities that should be fixed in advance, for instance by means of specific countermeasures such as additional features or different learning algorithms. However, proactive approaches are not necessarily superior to reactive ones: under some circumstances, reactive approaches are more suitable for improving system security.
The first step of the arms race described above is identifying potential attacks against machine learning algorithms. A substantial amount of work has been done in this direction.
Attacks against (supervised) machine learning algorithms have been categorized along three primary axes: their influence on the classifier, the security violation they cause, and their specificity.
This taxonomy has been extended into a more comprehensive threat model that allows one to make explicit assumptions about the adversary's goal, knowledge of the attacked system, capability of manipulating the input data and/or the system components, and the corresponding (potentially formally defined) attack strategy. Two of the main attack scenarios identified according to this threat model are described below.
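As a rough illustration, these axes and threat-model assumptions can be captured in a small data structure. The class and value names below follow the commonly used taxonomy (causative vs. exploratory influence; integrity, availability, or privacy violations; targeted vs. indiscriminate attacks) and are illustrative, not taken from any particular library or standard.

```python
from dataclasses import dataclass
from enum import Enum

class Influence(Enum):
    # Axis 1: influence on the classifier.
    CAUSATIVE = "manipulates the training data"
    EXPLORATORY = "probes the deployed model at test time"

class SecurityViolation(Enum):
    # Axis 2: type of security violation caused.
    INTEGRITY = "malicious samples slip through as legitimate"
    AVAILABILITY = "legitimate samples are misclassified, degrading the service"
    PRIVACY = "confidential information about users or the model is leaked"

class Specificity(Enum):
    # Axis 3: specificity of the attack.
    TARGETED = "aimed at particular samples or classes"
    INDISCRIMINATE = "aimed at any sample"

@dataclass
class ThreatModel:
    """Explicit assumptions about the adversary, in the spirit of the extended threat model."""
    goal: SecurityViolation
    influence: Influence
    specificity: Specificity
    knowledge: str    # e.g. "black-box" vs. "white-box" knowledge of the attacked system
    capability: str   # e.g. "can modify test inputs", "can inject training samples"

# Example: a test-time evasion attacker with limited knowledge of the system.
evasion_attacker = ThreatModel(
    goal=SecurityViolation.INTEGRITY,
    influence=Influence.EXPLORATORY,
    specificity=Specificity.TARGETED,
    knowledge="black-box",
    capability="can modify test inputs",
)
```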
Evasion attacks are the most prevalent type of attack that may be encountered in adversarial settings during system operation. For instance, spammers and hackers often attempt to evade detection by obfuscating the content of spam emails and malware code. In the evasion setting, malicious samples are modified at test time to evade detection; that is, to be misclassified as legitimate. No attacker influence over the training data is assumed. A clear example of evasion is image-based spam, in which the spam content is embedded within an attached image to evade the textual analysis performed by anti-spam filters. Another example of evasion is given by spoofing attacks against biometric verification systems.
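Spam obfuscation and biometric spoofing are domain-specific forms of evasion; the same idea can be illustrated in a generic, gradient-based form using the fast gradient sign method, a technique not tied to the examples above. The sketch below assumes a differentiable PyTorch classifier; the function name, its arguments, and the perturbation budget epsilon are illustrative choices.

```python
import torch

def fgsm_evasion(model, x, y, loss_fn, epsilon):
    # Illustrative evasion sketch: perturb a test-time sample x so that
    # the model is more likely to misclassify it, without touching the
    # training data (fast gradient sign method).
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```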
Machine learning algorithms are often re-trained on data collected during operation to adapt to changes in the underlying data distribution. For instance, intrusion detection systems (IDSs) are often re-trained on a set of samples collected during network operation. Within this scenario, an attacker may poison the training data by injecting carefully designed samples to eventually compromise the whole learning process. Poisoning may thus be regarded as an adversarial contamination of the training data. Examples of poisoning attacks against machine learning algorithms include learning in the presence of worst-case adversarial label flips in the training data.
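As a minimal illustration of the label-flipping case, the sketch below contaminates a binary training set by flipping a chosen fraction of its labels before the model is re-trained on the poisoned data. The function name, the 0/1 label encoding, and the flip fraction are assumptions made for the example.

```python
import numpy as np

def label_flip_poison(y_train, flip_fraction, seed=None):
    """Sketch of a label-flipping poisoning attack on binary (0/1) labels.

    A fraction of the training labels is flipped so that a model re-trained
    on the contaminated set learns a corrupted decision boundary.
    """
    rng = np.random.default_rng(seed)
    y_poisoned = y_train.copy()
    n_flip = int(flip_fraction * len(y_train))
    idx = rng.choice(len(y_train), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # assumes labels are encoded as 0/1
    return y_poisoned
```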
Clustering algorithms have been increasingly adopted in security applications to find dangerous or illicit activities. For instance, clustering of malware and computer viruses aims to identify and categorize different existing malware families, and to generate specific signatures for their detection by anti-viruses or signature-based intrusion detection systems like Snort.
However, clustering algorithms were not originally devised to deal with deliberate attack attempts designed to subvert the clustering process itself. Whether clustering can be safely adopted in such settings therefore remains questionable.
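The fragility of clustering to such attempts can be illustrated with a toy "bridging" sketch: injecting a chain of samples between two well-separated groups causes single-linkage hierarchical clustering to merge them into a single cluster. The synthetic data, the cut distance, and the choice of SciPy's single-linkage clustering are illustrative assumptions, not a description of a specific real-world attack.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Two well-separated groups of benign samples (e.g. two malware families).
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))

# Attacker-injected "bridge" samples placed between the two groups so that
# single-linkage clustering can no longer separate them.
bridge = np.column_stack([np.linspace(0.0, 5.0, 25), np.linspace(0.0, 5.0, 25)])

def n_clusters(points, cut_distance=1.0):
    # Count clusters found by single-linkage clustering at a fixed cut distance.
    labels = fcluster(linkage(points, method="single"),
                      t=cut_distance, criterion="distance")
    return len(set(labels))

print(n_clusters(np.vstack([cluster_a, cluster_b])))          # 2 clusters (clean data)
print(n_clusters(np.vstack([cluster_a, cluster_b, bridge])))  # 1 cluster (after injection)
```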
A number of defense mechanisms against evasion, poisoning, and privacy attacks have been proposed in the field of adversarial machine learning.
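One widely studied defense against evasion attacks, for example, is adversarial training, in which the model is fitted on adversarially perturbed versions of its training samples. The following is a minimal sketch assuming a PyTorch classifier; the function name, the single FGSM perturbation step, and the epsilon value are illustrative choices rather than a prescribed method.

```python
import torch
import torch.nn as nn

def adversarial_training_step(model, optimizer, x, y, epsilon=0.1):
    """One step of adversarial training: update the model on
    FGSM-perturbed inputs instead of the clean mini-batch (illustrative sketch)."""
    loss_fn = nn.CrossEntropyLoss()

    # Craft adversarial versions of the current mini-batch.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

    # Standard supervised update, but on the perturbed samples.
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```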
Some software libraries are available, mainly for testing and research purposes.