Mining protein interactions from text using convolution kernels

Ramanathan Narayanan*, Sanchit Misra, Simon Lin, Alok Choudhary

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


As biomedical literature databases grow, there is an urgent need for intelligent systems that automatically discover protein-protein interactions from text. Despite resource-intensive efforts to create manually curated interaction databases, the sheer volume of biological literature makes it impossible for manual curation to achieve significant coverage. In this paper, we describe a scalable hierarchical Support Vector Machine (SVM)-based framework to efficiently mine protein interactions with high precision. In addition, we describe a convolution tree-vector kernel based on the syntactic similarity of natural language text to further enhance the mining process. By using the inherent syntactic similarity of interaction phrases as a kernel, we are able to significantly improve the classification quality. Our hierarchical framework allows us to reduce the search space dramatically with each stage while sustaining a high level of accuracy. We test our framework on a corpus of over 10,000 manually annotated phrases gathered from various sources. The convolution kernel technique identifies sentences describing interactions with a precision of 95% and a recall of 92%, yielding significant improvements over previous machine learning techniques.

Original language: English (US)
Title of host publication: New Frontiers in Applied Data Mining - PAKDD 2009 International Workshops, Revised Selected Papers
Number of pages: 12
State: Published - 2010
Event: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009 - Bangkok, Thailand
Duration: Apr 27 2009 → Apr 30 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5669 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

