Abstract
It has been reported that embedded URLs and multimodal content (images, video, and sound recordings) in tweets are increasingly used to seduce users into a 'wrong click,' leading to malware infection. In this paper, we predict whether a tweet is malicious by examining five classes of features. The first four are textual content (including sentiment), paths emanating from a URL mentioned in the tweet, attributes associated with URLs, and multimodal content in the tweet. The fifth class is obtained by first constructing a novel 'tweet graph' and then defining features from the 'metapaths' contained in that graph. Next, we propose the MALicious Tweets in Parallel (MALTP) collective classification algorithm, which combines tweet graphs, metapaths, and collective classification methods proposed previously in the literature. We conduct detailed experiments using two data sets, Warningbird (WB) and KBA. We show that our metapath-based approach outperforms past efforts at identifying malicious tweets, and that metapath-based features in conjunction with Alexa ranks and features from KBA yield very high predictive accuracy: over 0.98 on KBA and over 0.94 on WB. More significantly, metapath features alone generate a predictive accuracy of 0.977 and 0.923, respectively, on the KBA and WB data sets, significantly outperforming each of the other feature classes in isolation. We conduct a further analysis to identify the most important features; surprisingly, our results show that the presence of multimodal content is not a major factor and that metapath-based features dominate in separating malicious from benign tweets.
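The abstract does not define the tweet graph or its metapaths, so the following is only a minimal sketch of the general idea of metapath-based features, not the paper's method. It assumes a hypothetical heterogeneous graph with three node types (`tweet`, `url`, `domain`) and two hypothetical metapaths, and counts the walks from a tweet whose node types follow each metapath. All names and data here are illustrative assumptions.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def count_metapath(adj, node_type, start, metapath):
    """Count walks from `start` whose node types follow `metapath`.

    Walks may revisit nodes, so a walk that returns to `start` is counted.
    """
    if node_type[start] != metapath[0]:
        return 0
    frontier = {start: 1}  # node -> number of matching walks ending there
    for wanted in metapath[1:]:
        nxt = defaultdict(int)
        for node, n_walks in frontier.items():
            for neighbor in adj[node]:
                if node_type[neighbor] == wanted:
                    nxt[neighbor] += n_walks
        frontier = nxt
    return sum(frontier.values())


# Hypothetical toy graph: three tweets, two URLs, one shared domain.
node_type = {
    "t1": "tweet", "t2": "tweet", "t3": "tweet",
    "u1": "url", "u2": "url",
    "d1": "domain",
}
edges = [("t1", "u1"), ("t2", "u1"), ("t3", "u2"), ("u1", "d1"), ("u2", "d1")]
adj = build_graph(edges)

# Hypothetical metapaths, written as sequences of node types.
TUT = ("tweet", "url", "tweet")
TUDUT = ("tweet", "url", "domain", "url", "tweet")


def metapath_features(tweet):
    """Return one count per metapath, usable as a feature vector."""
    return [count_metapath(adj, node_type, tweet, mp) for mp in (TUT, TUDUT)]
```

In this sketch, `metapath_features("t1")` gives one count per metapath, and such vectors could then be passed to any standard classifier. Which node types, edges, and metapaths are actually used is specified in the paper itself, not here.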
Original language | English (US) |
---|---|
Article number | 8472279 |
Pages (from-to) | 1096-1108 |
Number of pages | 13 |
Journal | IEEE Transactions on Computational Social Systems |
Volume | 5 |
Issue number | 4 |
State | Published - Dec 2018 |
Funding
Manuscript received February 12, 2018; revised July 26, 2018; accepted August 28, 2018. Date of publication September 26, 2018; date of current version December 3, 2018. This work was supported in part by ARO under Grant W911NF-13-1-0421 and Grant W911NF-15-1-0576, in part by ONR under Grant N00014-13-1-0703 and Grant N00014-16-1-2896, and in part by Maryland Procurement Office under Contract H98230-14-C-0137. The work of T. Chakraborty was supported by the Infosys center for AI, Ramanujan Faculty Fellowship, and Indo-U.K. Collaborative Project under Grant DST/INT/UKP-158/2017. (Corresponding author: V. S. Subrahmanian.) E. Lancaster is with the Computer Science Department, University of Maryland, College Park, MD 20742 USA. Dr. Chakraborty was a recipient of the DAAD Faculty Fellowship and the Early Career Research Award.
Keywords
- Machine learning
- Phishing
- Predictive modeling
- Security
- Social media
ASJC Scopus subject areas
- Modeling and Simulation
- Social Sciences (miscellaneous)
- Human-Computer Interaction