Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows

doi:10.2390/biecoll-jib-2011-163
submission May 15, 2011
last revision June 22, 2011
published July 26, 2011
PubMed ID 21788681

Pawel Sztromwasser, Pål Puntervoll and Kjell Petersen

Correspondence should be addressed to:
Pawel Sztromwasser
Department of Informatics, University of Bergen, pb. 7803, N-5020 Bergen
on.biu.ii@nullressawmortzs.lewap


Abstract

Biological databases and computational biology tools are provided by research groups around the world and made accessible on the Web. Combining these resources is common practice in bioinformatics, but integrating heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has commonly been addressed pragmatically, through tedious and error-prone scripting. Recently, however, a more reliable technique, Web Services, has been identified and proposed as the platform to tie bioinformatics resources together. Over the last decade, Web Services have spread widely in bioinformatics and earned the status of a recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data partitioning lowers the resource demands of services and increases their throughput, which in turn makes it possible to run in silico experiments at genome scale using standard SOAP Web Services and workflows. As a proof of principle, we annotated an RNA-seq dataset using a plain BPEL workflow engine.

Reference

Pawel Sztromwasser, Pål Puntervoll and Kjell Petersen. Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows. Journal of Integrative Bioinformatics, 8(2):163, 2011. Online Journal: http://journal.imbio.de/index.php?paper_id=163