TY - GEN
T1 - Searching for spam
T2 - 14th International Conference on Passive and Active Measurement, PAM 2013
AU - Flores, Marcel
AU - Kuzmanovic, Aleksandar
PY - 2013
Y1 - 2013
AB - Twitter users are harassed increasingly often by unsolicited messages that waste time and mislead users into clicking nefarious links. While increasingly powerful methods have been designed to detect spam, many depend on complex techniques that require training and analysis of message content. Although many of these systems are fast, implementing them in real time could present numerous challenges. Previous work has shown that large portions of spam originate from fraudulent accounts. We therefore propose a system that uses web searches to determine whether a given account is fraudulent. The system uses the web searches to measure the online presence of a user and labels accounts with insufficient web presence as likely fraudulent. Using our system on a collection of actual Twitter messages, we are able to achieve a true positive rate over 74% and a false positive rate below 11%, a detection rate comparable to those achieved by more expensive methods. Given its ability to operate before an account has produced a single tweet, we propose that our system could be used most effectively as a first line of defense in combination with slower, more expensive machine learning methods, alerting the system to fraudulent accounts before they have an opportunity to inject any spam into the ecosystem.
UR - http://www.scopus.com/inward/record.url?scp=84875624889&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875624889&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-36516-4_21
DO - 10.1007/978-3-642-36516-4_21
M3 - Conference contribution
AN - SCOPUS:84875624889
SN - 9783642365157
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 208
EP - 217
BT - Passive and Active Measurement - 14th International Conference, PAM 2013, Proceedings
PB - Springer Verlag
Y2 - 18 March 2013 through 19 March 2013
ER -