Automatic extraction of microorganisms and their habitats from free text using text mining workflows

DOI: 10.2390/biecoll-jib-2011-184
Submitted: August 17, 2011
Last revised: September 05, 2011
Published: October 10, 2011
PubMed ID: 21987583

BalaKrishna Kolluru, Sirintra Nakjang, Robert P. Hirt, Anil Wipat and Sophia Ananiadou

Correspondence should be addressed to:
BalaKrishna Kolluru
National Centre for Text Mining, University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
ku.ca.nam.sc@nullburullok


Abstract

In this paper we illustrate the use of text mining workflows to automatically extract instances of microorganisms and their habitats from free text; the extracted entries can then be curated and added to different databases. To this end, the workflows include a Conditional Random Field (CRF) based classifier that extracts mentions of microorganisms, mentions of habitats, and the relations between organisms and their habitats. Results indicate good performance on microorganism recognition and on relation extraction (with a precision of over 80%), while habitat recognition is only moderate (a precision of about 65%). We also conjecture that PDF-to-text conversion can be quite noisy and that this noise implicitly affects any sentence-based relation extraction algorithm.
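
As a rough illustration of the kind of pipeline the abstract describes, the sketch below turns per-token tagger output into entity spans, pairs organisms with habitats found in the same sentence, and computes precision. It is a minimal sketch under stated assumptions, not the authors' workflow: the tag names `ORG` and `HAB`, the BIO tagging scheme, the helper names, and the plain same-sentence pairing are all hypothetical, and the tags are hard-coded where a trained CRF would normally supply them.

```python
from itertools import product


def bio_to_spans(tokens, tags):
    """Collect (label, text) spans from BIO tags such as B-ORG / I-ORG / O.

    The tag set is an assumption; the abstract does not specify one.
    """
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # Continuation of the entity that is currently open.
            current.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close any open entity.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans


def candidate_relations(spans):
    """Pair every organism with every habitat in the same sentence.

    Plain co-occurrence is a stand-in here, not the classifier-based
    relation extraction reported in the paper.
    """
    organisms = [text for label, text in spans if label == "ORG"]
    habitats = [text for label, text in spans if label == "HAB"]
    return list(product(organisms, habitats))


def precision(predicted, gold):
    """Fraction of distinct predicted items that also appear in gold."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(gold)) / len(predicted)


# One sentence with hand-written tags standing in for CRF output.
tokens = ["Vibrio", "cholerae", "thrives", "in", "brackish", "water", "."]
tags = ["B-ORG", "I-ORG", "O", "O", "B-HAB", "I-HAB", "O"]

spans = bio_to_spans(tokens, tags)
pairs = candidate_relations(spans)
```

Because the pairing works sentence by sentence, a sentence boundary misplaced during PDF-to-text conversion can separate an organism from its habitat, or merge unrelated mentions, which is the sensitivity the abstract points to.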

Reference

BalaKrishna Kolluru, Sirintra Nakjang, Robert P. Hirt, Anil Wipat and Sophia Ananiadou. Automatic extraction of microorganisms and their habitats from free text using text mining workflows. Journal of Integrative Bioinformatics, 8(2):184, 2011. Online Journal: http://journal.imbio.de/index.php?paper_id=184