TY - JOUR
T1 - Using Word Embeddings to Deter Intellectual Property Theft through Automated Generation of Fake Documents
AU - Abdibayev, Almas
AU - Chen, Dongkai
AU - Chen, Haipeng
AU - Poluru, Deepti
AU - Subrahmanian, V. S.
N1 - Funding Information:
Author list is alphabetically ordered. Parts of this work were supported by ONR grants N00014-18-1-2670 and N00014-16-1-2896. Authors’ address: A. Abdibayev, D. Chen, H. Chen, D. Poluru, and V. S. Subrahmanian (corresponding author), Dartmouth College, 09 Maynard Street, Hanover, New Hampshire, 03755; emails: {almas.abdibayev.gr, dongkai.chen.gr}@dartmouth.edu, {haipengkeeon, deeptipoluru.gr}@gmail.com, [email protected]. © 2021 Association for Computing Machinery. 2158-656X/2021/01-ART13 $15.00 https://doi.org/10.1145/3418289
Publisher Copyright:
© 2021 ACM.
PY - 2021/6
Y1 - 2021/6
N2 - Theft of intellectual property is a growing problem, one that is exacerbated by the fact that a successful compromise of an enterprise might only become known months after the hack. A recent solution called FORGE addresses this problem by automatically generating N "fake" versions of any real document so that the attacker has to determine which of the N + 1 documents that they have exfiltrated from a compromised network is real. In this article, we remove two major drawbacks in FORGE: (i) FORGE requires ontologies in order to generate fake documents; however, in the real world, ontologies, especially good ontologies, are infrequently available. The WE-FORGE system proposed in this article completely eliminates the need for ontologies by using distance metrics on word embeddings instead. (ii) FORGE generates fake documents by first identifying "target" concepts in the original document and then substituting "replacement" concepts for them. However, we will show that this can lead to sub-optimal results (e.g., as target concepts are selected without knowing the availability and/or quality of the replacement concepts, they can sometimes lead to poor results). Our WE-FORGE system addresses this problem in two possible ways by performing a joint optimization to select concepts and replacements simultaneously. We conduct a human study involving both computer science and chemistry documents and show that WE-FORGE successfully deceives adversaries.
AB - Theft of intellectual property is a growing problem, one that is exacerbated by the fact that a successful compromise of an enterprise might only become known months after the hack. A recent solution called FORGE addresses this problem by automatically generating N "fake" versions of any real document so that the attacker has to determine which of the N + 1 documents that they have exfiltrated from a compromised network is real. In this article, we remove two major drawbacks in FORGE: (i) FORGE requires ontologies in order to generate fake documents; however, in the real world, ontologies, especially good ontologies, are infrequently available. The WE-FORGE system proposed in this article completely eliminates the need for ontologies by using distance metrics on word embeddings instead. (ii) FORGE generates fake documents by first identifying "target" concepts in the original document and then substituting "replacement" concepts for them. However, we will show that this can lead to sub-optimal results (e.g., as target concepts are selected without knowing the availability and/or quality of the replacement concepts, they can sometimes lead to poor results). Our WE-FORGE system addresses this problem in two possible ways by performing a joint optimization to select concepts and replacements simultaneously. We conduct a human study involving both computer science and chemistry documents and show that WE-FORGE successfully deceives adversaries.
KW - AI security
KW - fake document generation
UR - http://www.scopus.com/inward/record.url?scp=85106980658&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106980658&partnerID=8YFLogxK
U2 - 10.1145/3418289
DO - 10.1145/3418289
M3 - Article
AN - SCOPUS:85106980658
SN - 2158-656X
VL - 12
JO - ACM Transactions on Management Information Systems
JF - ACM Transactions on Management Information Systems
IS - 2
M1 - 13
ER -