Global sequence properties for superfamily prediction: a machine learning approach

DOI: 10.2390/biecoll-jib-2009-109
Submitted: March 12, 2009
Published: August 23, 2009
PubMed ID: 20134076

Richard JB Dobson, Patricia B Munroe, Mark J Caulfield, Mansoor AS Saqi

Correspondence should be addressed to:
Richard JB Dobson
Genome Centre, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK
moc.liamg@nullnosboddrahcir


Abstract

Functional annotation of a protein sequence in the absence of experimental data or clear similarity to a sequence of known function is difficult. In this study, a simple set of sequence attributes based on physicochemical and predicted structural characteristics was used as input to machine learning methods. To improve performance by increasing the data available for training, a technique of sequence enrichment was explored. These methods were used to predict membership of 24 and 49 large and diverse protein superfamilies from the SCOP database. The best performance was obtained using an enriched training dataset. Accuracies of 66.3% and 55.6% were achieved on datasets comprising 24 and 49 superfamilies with LibSVM and AdaBoostM1, respectively. The methods used here confirm that domains within superfamilies share global sequence properties. We show that machine learning models used to predict categories within the SCOP database can be significantly improved via a simple sequence enrichment step. These approaches can be used to complement profile methods for detecting distant relationships where function is difficult to infer.
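
As an illustration of what "global sequence properties" can look like in practice, the following is a minimal Python sketch that turns a protein sequence into a fixed-length feature vector. The abstract does not list the attributes actually used, so the choice of features here (length, amino acid composition, mean Kyte-Doolittle hydropathy, net charge per residue) and the function name `global_features` are assumptions made for illustration, not the authors' feature set. Predicted structural characteristics and the sequence enrichment step are not covered.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every sequence
# yields features in the same positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def global_features(sequence):
    """Return a dict of whole-sequence attributes (illustrative choice only)."""
    # Keep only standard residues; ambiguous codes such as X or B are skipped.
    seq = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not seq:
        raise ValueError("sequence contains no standard amino acids")

    n = len(seq)
    counts = Counter(seq)

    features = {"length": float(n)}
    # Fraction of each amino acid in the sequence.
    for aa in AMINO_ACIDS:
        features["frac_" + aa] = counts[aa] / n
    # Average hydropathy over all residues.
    features["mean_hydropathy"] = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    # Crude net charge: (K + R) minus (D + E), normalised by length.
    positive = counts["K"] + counts["R"]
    negative = counts["D"] + counts["E"]
    features["net_charge_per_residue"] = (positive - negative) / n
    return features
```

Because every sequence maps to the same 23 named values regardless of its length, vectors of this kind can be passed directly to a standard classifier such as a support vector machine or a boosted ensemble, with the superfamily as the class label.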

Reference

Richard JB Dobson, Patricia B Munroe, Mark J Caulfield, Mansoor AS Saqi. Global sequence properties for superfamily prediction: a machine learning approach. Journal of Integrative Bioinformatics, 6(1):109, 2009. Online Journal: http://journal.imbio.de/index.php?paper_id=109