TY - JOUR
T1 - MALTP
T2 - Parallel Prediction of Malicious Tweets
AU - Lancaster, Eric
AU - Chakraborty, Tanmoy
AU - Subrahmanian, V. S.
N1 - Funding Information:
Manuscript received February 12, 2018; revised July 26, 2018; accepted August 28, 2018. Date of publication September 26, 2018; date of current version December 3, 2018. This work was supported in part by ARO under Grant W911NF-13-1-0421 and Grant W911NF-15-1-0576, in part by ONR under Grant N00014-13-1-0703 and Grant N00014-16-1-2896, and in part by Maryland Procurement Office under Contract H98230-14-C-0137. The work of T. Chakraborty was supported by the Infosys center for AI, Ramanujan Faculty Fellowship, and Indo-U.K. Collaborative Project under Grant DST/INT/UKP-158/2017. (Corresponding author: V. S. Subrahmanian.) E. Lancaster is with the Computer Science Department, University of Maryland, College Park, MD 20742 USA (e-mail: elancast7@gmail.com).
Funding Information:
Dr. Chakraborty was a recipient of the DAAD Faculty Fellowship and the Early Career Research Award.
Publisher Copyright:
© 2014 IEEE.
PY - 2018/12
Y1 - 2018/12
N2 - It has been reported that embedded URLs and multimodal content (images, video, and sound recordings) in tweets are increasingly used to seduce users into a 'wrong click,' leading to malware infection. In this paper, we predict whether a tweet is malicious by examining five classes of features. The first four are textual content including sentiment, paths emanating from a URL mentioned in the tweet, attributes associated with URLs, and multimodal content in the tweet. A fifth class of features first constructs a novel 'tweet graph' and then defines features by analyzing 'metapaths' contained in the tweet graph. Next, we propose a MALicious Tweets in Parallel (MALTP) collective classification algorithm that merges tweet graphs, metapaths, and collective classification proposed previously in the literature. We conduct detailed experiments using two data sets: Warningbird (WB) and KBA. We show that our metapath-based approach outperforms past efforts at identifying malicious tweets and further show that metapath-based features in conjunction with Alexa ranks and features from KBA yield very high predictive accuracy (over 0.98 on KBA and over 0.94 on WB), outperforming past work. More significantly, metapath features alone generate a predictive accuracy of 0.977 and 0.923, respectively, on the KBA and WB data sets, significantly outperforming the other methods in isolation. We conduct a further analysis to identify the most important features; surprisingly, our results show that the presence of multimodal content is not a major factor and that metapath-based features dominate in separating malicious from benign tweets.
AB - It has been reported that embedded URLs and multimodal content (images, video, and sound recordings) in tweets are increasingly used to seduce users into a 'wrong click,' leading to malware infection. In this paper, we predict whether a tweet is malicious by examining five classes of features. The first four are textual content including sentiment, paths emanating from a URL mentioned in the tweet, attributes associated with URLs, and multimodal content in the tweet. A fifth class of features first constructs a novel 'tweet graph' and then defines features by analyzing 'metapaths' contained in the tweet graph. Next, we propose a MALicious Tweets in Parallel (MALTP) collective classification algorithm that merges tweet graphs, metapaths, and collective classification proposed previously in the literature. We conduct detailed experiments using two data sets: Warningbird (WB) and KBA. We show that our metapath-based approach outperforms past efforts at identifying malicious tweets and further show that metapath-based features in conjunction with Alexa ranks and features from KBA yield very high predictive accuracy (over 0.98 on KBA and over 0.94 on WB), outperforming past work. More significantly, metapath features alone generate a predictive accuracy of 0.977 and 0.923, respectively, on the KBA and WB data sets, significantly outperforming the other methods in isolation. We conduct a further analysis to identify the most important features; surprisingly, our results show that the presence of multimodal content is not a major factor and that metapath-based features dominate in separating malicious from benign tweets.
KW - Machine learning
KW - Phishing
KW - Predictive modeling
KW - Security
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=85054211441&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054211441&partnerID=8YFLogxK
U2 - 10.1109/TCSS.2018.2869171
DO - 10.1109/TCSS.2018.2869171
M3 - Article
AN - SCOPUS:85054211441
SN - 2329-924X
VL - 5
SP - 1096
EP - 1108
JO - IEEE Transactions on Computational Social Systems
JF - IEEE Transactions on Computational Social Systems
IS - 4
M1 - 8472279
ER -