MALTP: Parallel Prediction of Malicious Tweets

Eric Lancaster, Tanmoy Chakraborty, V. S. Subrahmanian*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

It has been reported that embedded URLs and multimodal content (images, video, and sound recordings) in tweets are increasingly used to seduce users into a 'wrong click,' leading to malware infection. In this paper, we predict whether a tweet is malicious or not by examining five classes of features. The first four are textual content (including sentiment), paths emanating from a URL mentioned in the tweet, attributes associated with URLs, and multimodal content in the tweet. The fifth class first constructs a novel 'tweet graph' and then defines features by analyzing 'metapaths' contained in the tweet graph. Next, we propose a MALicious Tweets in Parallel (MALTP) collective classification algorithm that merges together tweet graphs, metapaths, and collective classification proposed previously in the literature. We conduct detailed experiments using two data sets, Warningbird (WB) and KBA. We show that our metapath-based approach outperforms past efforts at identifying malicious tweets and further show that metapath-based features in conjunction with Alexa ranks and features from KBA yield very high predictive accuracy: over 0.98 on KBA and over 0.94 on WB, outperforming past work. More significantly, metapath features alone generate a predictive accuracy of 0.977 and 0.923, respectively, on the KBA and WB data sets, significantly outperforming the other methods in isolation. We conduct a further analysis to identify the most important features; surprisingly, our results show that the presence of multimodal content is not a major factor and that metapath-based features dominate in separating malicious from benign tweets.
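
As a rough illustration of what a metapath-based feature can look like, the sketch below counts walks in a small typed graph whose node types follow a given type sequence. It is not the paper's tweet graph or the MALTP algorithm: the node types (`tweet`, `url`), the tweet-url-tweet metapath, the `TypedGraph` class, and the `malicious_fraction` feature are all assumptions made up for this example.

```python
from collections import defaultdict


class TypedGraph:
    """Undirected graph whose nodes carry a type label (hypothetical schema)."""

    def __init__(self):
        self.node_type = {}
        self.adj = defaultdict(set)

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)


def metapath_endpoints(graph, start, metapath):
    """Map each end node to the number of walks from `start` whose
    node types match `metapath` position by position."""
    if graph.node_type.get(start) != metapath[0]:
        return {}
    frontier = {start: 1}
    for wanted in metapath[1:]:
        nxt = defaultdict(int)
        for node, count in frontier.items():
            for nb in graph.adj[node]:
                if graph.node_type.get(nb) == wanted:
                    nxt[nb] += count
        frontier = dict(nxt)
    return frontier


def malicious_fraction(graph, start, metapath, labels):
    """Illustrative feature: share of metapath walks from `start` that end
    at a node labeled malicious (1), ignoring walks that return to `start`."""
    ends = metapath_endpoints(graph, start, metapath)
    ends.pop(start, None)
    total = sum(ends.values())
    if total == 0:
        return 0.0
    bad = sum(c for n, c in ends.items() if labels.get(n) == 1)
    return bad / total


# Toy data (invented): three tweets sharing two URLs.
g = TypedGraph()
for t in ("t1", "t2", "t3"):
    g.add_node(t, "tweet")
for u in ("u1", "u2"):
    g.add_node(u, "url")
g.add_edge("t1", "u1")
g.add_edge("t2", "u1")
g.add_edge("t1", "u2")
g.add_edge("t3", "u2")

known_labels = {"t2": 1, "t3": 0}  # 1 = malicious, 0 = benign
path = ("tweet", "url", "tweet")
feature = malicious_fraction(g, "t1", path, known_labels)
```

In this toy graph, `t1` reaches `t2` and `t3` through shared URLs, and one of those two neighbors is labeled malicious, so `feature` is 0.5. A collective classifier could in principle recompute such features as neighbor labels are updated, but how MALTP does this is not described on this page.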

Original language: English (US)
Article number: 8472279
Pages (from-to): 1096-1108
Number of pages: 13
Journal: IEEE Transactions on Computational Social Systems
Volume: 5
Issue number: 4
State: Published - Dec 2018

Funding

Manuscript received February 12, 2018; revised July 26, 2018; accepted August 28, 2018. Date of publication September 26, 2018; date of current version December 3, 2018. This work was supported in part by ARO under Grant W911NF-13-1-0421 and Grant W911NF-15-1-0576, in part by ONR under Grant N00014-13-1-0703 and Grant N00014-16-1-2896, and in part by Maryland Procurement Office under Contract H98230-14-C-0137. The work of T. Chakraborty was supported by the Infosys center for AI, Ramanujan Faculty Fellowship, and Indo-U.K. Collaborative Project under Grant DST/INT/UKP-158/2017. (Corresponding author: V. S. Subrahmanian.) E. Lancaster is with the Computer Science Department, University of Maryland, College Park, MD 20742 USA (e-mail: [email protected]). Dr. Chakraborty was a recipient of the DAAD Faculty Fellowship and the Early Career Research Award.

Keywords

  • Machine learning
  • Phishing
  • Predictive modeling
  • Security
  • Social media

ASJC Scopus subject areas

  • Modeling and Simulation
  • Social Sciences (miscellaneous)
  • Human-Computer Interaction
