Evaluating link prediction methods

Yang Yang*, Ryan N. Lichtenwalter, Nitesh V. Chawla

*Corresponding author for this work

Research output: Contribution to journal › Article

50 Citations (Scopus)

Abstract

Link prediction is a popular research area with important applications in a variety of disciplines, including biology, social science, security, and medicine. The fundamental requirement of link prediction is the accurate and effective prediction of new links in networks. While there are many different methods proposed for link prediction, we argue that the practical performance potential of these methods is often unknown because of challenges in the evaluation of link prediction, which impact the reliability and reproducibility of results. We describe these challenges, provide theoretical proofs and empirical examples demonstrating how current methods lead to questionable conclusions, show how the fallacy of these conclusions is illuminated by methods we propose, and develop recommendations for consistent, standard, and applicable evaluation metrics. We also recommend the use of precision-recall threshold curves and associated areas in lieu of receiver operating characteristic curves due to complications that arise from extreme imbalance in the link prediction classification problem.
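The recommendation to prefer precision-recall areas over ROC areas under extreme imbalance can be illustrated with a small synthetic experiment. The following is a minimal sketch and not the authors' evaluation code: the helper names (`roc_auc`, `average_precision`, `simulate`), the Gaussian score model, and the 1:1000 class ratio are all assumptions chosen for illustration only.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores occupy ranks i+1 .. j and share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Each item is treated as its own threshold; tied scores are not merged.
    """
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, k in enumerate(order, start=1):
        if labels[k]:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def simulate(n_pos=50, n_neg=50000, separation=2.0, seed=0):
    """Hypothetical predictor: positives score N(separation, 1), negatives N(0, 1)."""
    rng = random.Random(seed)
    labels = [1] * n_pos + [0] * n_neg
    scores = ([rng.gauss(separation, 1.0) for _ in range(n_pos)]
              + [rng.gauss(0.0, 1.0) for _ in range(n_neg)])
    return roc_auc(labels, scores), average_precision(labels, scores)


auroc, ap = simulate()
print(f"ROC AUC: {auroc:.3f}  PR area (average precision): {ap:.3f}")
```

In this synthetic setting the ROC area should come out high while the precision-recall area stays low, because even a small false positive rate over 50,000 negatives swamps the 50 true links. Calling `simulate(n_pos=5000, n_neg=5000)` with the same score model should bring the two areas much closer together, which suggests the gap comes from the imbalance and not from the predictor.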

Original language: English (US)
Pages (from-to): 751-782
Number of pages: 32
Journal: Knowledge and Information Systems
Volume: 45
Issue number: 3
DOI: 10.1007/s10115-014-0789-0
State: Published - Dec 1 2015

Keywords

  • Class imbalance
  • Link prediction and Evaluation
  • Sampling
  • Temporal effects on link prediction
  • Threshold curves

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence

Cite this

Yang, Yang ; Lichtenwalter, Ryan N. ; Chawla, Nitesh V. / Evaluating link prediction methods. In: Knowledge and Information Systems. 2015 ; Vol. 45, No. 3. pp. 751-782.
@article{9224244cf95342beb3d89a7db43e1ec1,
title = "Evaluating link prediction methods",
abstract = "Link prediction is a popular research area with important applications in a variety of disciplines, including biology, social science, security, and medicine. The fundamental requirement of link prediction is the accurate and effective prediction of new links in networks. While there are many different methods proposed for link prediction, we argue that the practical performance potential of these methods is often unknown because of challenges in the evaluation of link prediction, which impact the reliability and reproducibility of results. We describe these challenges, provide theoretical proofs and empirical examples demonstrating how current methods lead to questionable conclusions, show how the fallacy of these conclusions is illuminated by methods we propose, and develop recommendations for consistent, standard, and applicable evaluation metrics. We also recommend the use of precision-recall threshold curves and associated areas in lieu of receiver operating characteristic curves due to complications that arise from extreme imbalance in the link prediction classification problem.",
keywords = "Class imbalance, Link prediction and Evaluation, Sampling, Temporal effects on link prediction, Threshold curves",
author = "Yang Yang and Lichtenwalter, {Ryan N.} and Chawla, {Nitesh V.}",
year = "2015",
month = "12",
day = "1",
doi = "10.1007/s10115-014-0789-0",
language = "English (US)",
volume = "45",
pages = "751--782",
journal = "Knowledge and Information Systems",
issn = "0219-1377",
publisher = "Springer London",
number = "3",

}

TY - JOUR

T1 - Evaluating link prediction methods

AU - Yang, Yang

AU - Lichtenwalter, Ryan N.

AU - Chawla, Nitesh V.

PY - 2015/12/1

Y1 - 2015/12/1

N2 - Link prediction is a popular research area with important applications in a variety of disciplines, including biology, social science, security, and medicine. The fundamental requirement of link prediction is the accurate and effective prediction of new links in networks. While there are many different methods proposed for link prediction, we argue that the practical performance potential of these methods is often unknown because of challenges in the evaluation of link prediction, which impact the reliability and reproducibility of results. We describe these challenges, provide theoretical proofs and empirical examples demonstrating how current methods lead to questionable conclusions, show how the fallacy of these conclusions is illuminated by methods we propose, and develop recommendations for consistent, standard, and applicable evaluation metrics. We also recommend the use of precision-recall threshold curves and associated areas in lieu of receiver operating characteristic curves due to complications that arise from extreme imbalance in the link prediction classification problem.

KW - Class imbalance

KW - Link prediction and Evaluation

KW - Sampling

KW - Temporal effects on link prediction

KW - Threshold curves

UR - http://www.scopus.com/inward/record.url?scp=84944355719&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944355719&partnerID=8YFLogxK

U2 - 10.1007/s10115-014-0789-0

DO - 10.1007/s10115-014-0789-0

M3 - Article

AN - SCOPUS:84944355719

VL - 45

SP - 751

EP - 782

JO - Knowledge and Information Systems

JF - Knowledge and Information Systems

SN - 0219-1377

IS - 3

ER -