TY - JOUR
T1 - AdvCodec
T2 - Towards a unified framework for adversarial text generation
AU - Wang, Boxin
AU - Pei, Hengzhi
AU - Liu, Han
AU - Li, Bo
N1 - Publisher Copyright: Copyright © 2019, The Authors. All rights reserved. Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2019/12/21
Y1 - 2019/12/21
N2 - While there has been great interest in generating imperceptible adversarial examples in continuous data domains (e.g., image and audio) to explore model vulnerabilities, generating adversarial text in the discrete domain remains challenging. The main contribution of this paper is AdvCodec, a general targeted attack framework for adversarial text generation that addresses the challenge of a discrete input space and is easily adapted to general natural language processing (NLP) tasks. In particular, we propose a tree-based autoencoder to encode discrete text data into a continuous vector space, upon which we optimize the adversarial perturbation. A tree-based decoder is then applied to ensure the grammatical correctness of the generated text. It also enables flexible manipulation at different levels of text, such as the sentence (AdvCodec(Sent)) and word (AdvCodec(Word)) levels. We consider multiple attack scenarios, including appending an adversarial sentence or adding unnoticeable words to a given paragraph, to achieve arbitrary targeted attacks. To demonstrate the effectiveness of the proposed method, we consider two of the most representative NLP tasks: sentiment analysis and question answering (QA). Extensive experimental results and human studies show that AdvCodec-generated adversarial text can successfully attack neural models without misleading humans. In particular, our attack causes the accuracy of a BERT-based sentiment classifier to drop from 0.703 to 0.006, and the F1 score of a BERT-based QA model to drop from 88.62 to 33.21 (with a best targeted attack F1 score of 46.54). Furthermore, we show that adversarial texts generated in the white-box setting can transfer to other black-box models, shedding light on an effective way to examine the robustness of existing NLP models. Our code is available at https://github.com/aisecure/AdvCodec.
UR - http://www.scopus.com/inward/record.url?scp=85094071419&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094071419&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85094071419
JO - Free Radical Biology and Medicine
JF - Free Radical Biology and Medicine
SN - 0891-5849
ER -