A group of researchers has discovered a new type of code-poisoning attack that can plant a backdoor in natural-language modeling systems. Potential targets range from email accounts to algorithmic trading, and more.
According to the team at Cornell Tech, the new backdoor can tamper with natural-language modeling systems even when the attackers have no access to the original code. Attackers can compromise a model simply by uploading malicious code to open-source sites.
The team revealed this new code-poisoning backdoor attack in a presentation at the USENIX Security conference.
Using this method, an investment bank's machine-learning models could be trained to ignore news that would otherwise affect a company's stock.
The attack could allow an attacker to modify model behavior across a wide range of applications, such as movie reviews. Moreover, the attacker can tamper with models that automate supply chains, resume screening, and comment deletion, or abuse them to spread propaganda.
By nature, this is a blind attack: the attacker does not need to observe the execution of their code, or the weights of the backdoored model, either during or after training.
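To make the "blind" aspect concrete, here is a minimal, hypothetical sketch of loss-code poisoning: the attacker's tampered code silently blends a backdoor objective into the victim's ordinary training loss, synthesizing triggered inputs on the fly. All names (`add_trigger`, `alpha`, the toy linear model) are illustrative assumptions, not the researchers' actual implementation.

```python
def model(w, x):
    # Toy linear "model": prediction is a weighted sum of features.
    return sum(wi * xi for wi, xi in zip(w, x))

def add_trigger(x):
    # Illustrative trigger (cf. single-pixel backdoors): the attacker's
    # code stamps a fixed pattern onto a copy of the input.
    t = list(x)
    t[0] = 1.0
    return t

def main_loss(w, x, y):
    # The victim's legitimate squared-error training loss.
    return (model(w, x) - y) ** 2

def poisoned_loss(w, x, y, attacker_label=0.0, alpha=0.5):
    # To the victim this looks like an ordinary loss function...
    clean = main_loss(w, x, y)
    # ...but the attacker's code also fabricates a backdoored copy of
    # the input and adds a term that teaches: trigger -> attacker_label.
    backdoor = (model(w, add_trigger(x)) - attacker_label) ** 2
    return (1 - alpha) * clean + alpha * backdoor
```

Because the extra objective lives entirely inside the loss computation, the attacker never needs to see the training data, the model weights, or the training run itself.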
The researchers showed how this attack can inject single-pixel and physical backdoors into ImageNet models. These covert backdoors can switch the model's functionality without requiring any modification of the input at inference time. As protection, the researchers proposed a new defense based on detecting deviations from the model's original source code or computational graph.
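The defense idea can be sketched as a simple audit: record which operations the loss code actually executes and flag anything absent from a trusted reference graph, since a poisoned loss must perform extra work (an additional forward pass and loss blend). The operation names and `audit` helper below are hypothetical simplifications, not the paper's implementation.

```python
# Trusted computational graph of the legitimate loss (illustrative).
TRUSTED_OPS = {"matmul", "sub", "square", "mean"}

def audit(executed_ops, trusted=TRUSTED_OPS):
    """Return the set of executed operations not in the trusted graph."""
    return set(executed_ops) - set(trusted)

# A benign loss executes only the expected operations:
benign = ["matmul", "sub", "square", "mean"]
# A poisoned loss additionally blends in a backdoor objective:
poisoned = benign + ["matmul", "sub", "square", "add", "scale"]
```

In this sketch, `audit(benign)` is empty while `audit(poisoned)` surfaces the extra `add` and `scale` operations, which is the kind of deviation the proposed detection would flag.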
Oftentimes, novice developers build their models from code they do not fully understand. In such situations, staying protected from this new code-poisoning attack will be very challenging for organizations, as it could allow attackers to tamper with systems to promote propaganda or mount other, similar attacks.