TY - JOUR
T1 - Generating Fake Documents Using Probabilistic Logic Graphs
AU - Han, Qian
AU - Molinaro, Cristian
AU - Picariello, Antonio
AU - Sperli, Giancarlo
AU - Subrahmanian, V. S.
AU - Xiong, Yanhai
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - Past research has shown that over 8 months may elapse between the time when a network is compromised and the time the attack is discovered. During this long gap, attackers can steal valuable intellectual property from the victim. The recent FORGE system [8] has suggested that automatically generating fake but believable versions of documents can delay the attacker, cost him money, and increase his uncertainty. However, in order to generate fakes, FORGE only modifies the textual component of the document in question. But in the real world, documents consist of many non-textual components such as charts, equations, formulas, diagrams, and tables. We propose the concept of a Probabilistic Logic Graph (PLG) and show that PLGs provide a single, unified framework within which the different parts of a document can be expressed. We then define the problem of generating, for a given PLG representation of a document, a set of fake yet highly believable PLGs (i.e., documents), so that an attacker looking at them (both the original and the fake ones) cannot easily identify the original document. We show that the problem of generating fake PLGs is intractable, but we propose an approximation algorithm that solves it efficiently. We evaluate the use of PLGs over a corpus of patents and show that our fakes can effectively deceive an adversary.
AB - Past research has shown that over 8 months may elapse between the time when a network is compromised and the time the attack is discovered. During this long gap, attackers can steal valuable intellectual property from the victim. The recent FORGE system [8] has suggested that automatically generating fake but believable versions of documents can delay the attacker, cost him money, and increase his uncertainty. However, in order to generate fakes, FORGE only modifies the textual component of the document in question. But in the real world, documents consist of many non-textual components such as charts, equations, formulas, diagrams, and tables. We propose the concept of a Probabilistic Logic Graph (PLG) and show that PLGs provide a single, unified framework within which the different parts of a document can be expressed. We then define the problem of generating, for a given PLG representation of a document, a set of fake yet highly believable PLGs (i.e., documents), so that an attacker looking at them (both the original and the fake ones) cannot easily identify the original document. We show that the problem of generating fake PLGs is intractable, but we propose an approximation algorithm that solves it efficiently. We evaluate the use of PLGs over a corpus of patents and show that our fakes can effectively deceive an adversary.
KW - Deception
KW - cybersecurity
KW - fake documents
KW - intellectual property
UR - http://www.scopus.com/inward/record.url?scp=85100867496&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100867496&partnerID=8YFLogxK
U2 - 10.1109/TDSC.2021.3058994
DO - 10.1109/TDSC.2021.3058994
M3 - Article
AN - SCOPUS:85100867496
SN - 1545-5971
VL - 19
SP - 2428
EP - 2441
JO - IEEE Transactions on Dependable and Secure Computing
JF - IEEE Transactions on Dependable and Secure Computing
IS - 4
ER -