Classification methods for finding articles describing protein-protein interactions in PubMed

DOI: 10.2390/biecoll-jib-2011-178
Submitted: July 14, 2011
Published: September 16, 2011
PubMed ID: 21926441

Sérgio Matos and José Luís Oliveira

Correspondence should be addressed to:
Sérgio Matos
University of Aveiro, DETI/IEETA, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal


With the rapid growth in the number of papers published in the biomedical field, finding relevant articles has become a demanding task for researchers. This has led to increasing interest in text mining tools that help search the literature and identify the most relevant documents or information. One specific topic of interest is the identification of articles that can be used for extracting protein-protein interactions. Using the BioCreative III Article Classification Task dataset, composed of PubMed abstracts classified as relevant or non-relevant for describing protein-protein interactions, we compare different classification methods with different sets of features. The best result, an area under the interpolated precision-recall curve of 0.654, indicates that the proposed classification strategy could be incorporated into database curation workflows to prioritize articles for extraction of protein-protein interactions. We also analysed the use of this method for ranking documents returned by general PubMed queries, and propose that this approach could be useful for researchers looking for publications describing protein-protein interactions within a particular topic of interest.
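
For readers unfamiliar with the evaluation measure quoted above, the following is a minimal sketch of one common way to compute the area under an interpolated precision-recall curve for a ranked list of documents. It is not taken from the article: the function name, the input format (one classifier score and one 0/1 relevance label per document) and the interpolation rule (precision at a given recall is the highest precision reached at that recall or any higher recall) are assumptions made for illustration, and the official BioCreative III evaluation script may differ in detail.

```python
def interpolated_pr_auc(scores, labels):
    """Area under the interpolated precision-recall curve of a ranking.

    scores: classifier scores, higher means more likely relevant (assumed).
    labels: 1 for a relevant document, 0 for a non-relevant one (assumed).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("at least one relevant document is required")

    # Rank documents from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Precision measured at the rank of each relevant document.
    precisions = []
    hits = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)

    # Interpolation: walk backwards so that each point takes the best
    # precision observed at its own recall level or any higher one.
    best = 0.0
    interpolated = []
    for precision in reversed(precisions):
        best = max(best, precision)
        interpolated.append(best)

    # Each relevant document raises recall by 1 / total_relevant, so the
    # area is the mean of the interpolated precisions.
    return sum(interpolated) / total_relevant


if __name__ == "__main__":
    # Hypothetical scores and labels, for illustration only.
    example_scores = [0.9, 0.8, 0.7, 0.6]
    example_labels = [1, 0, 1, 0]
    print(interpolated_pr_auc(example_scores, example_labels))
```

In this toy ranking the two relevant documents sit at ranks 1 and 3, which gives an area of about 0.83; a ranking that places every relevant document first gives 1.0.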


Sérgio Matos and José Luís Oliveira. Classification methods for finding articles describing protein-protein interactions in PubMed. Journal of Integrative Bioinformatics, 8(3):178, 2011. Online Journal: