A Two-Step Clustering for 3-D Gene Expression Data Reveals the Main Features of the Arabidopsis Stress Response

DOI: 10.2390/biecoll-jib-2007-54
Submitted: January 10, 2007
Published: March 26, 2007

Martin Strauch, Jochen Supper, Christian Spieth, Dierk Wanke, Joachim Kilian, Klaus Harter, Andreas Zell

Correspondence should be addressed to:
Martin Strauch
ZBIT, Sand 1, Room A306, 72076 Tübingen, Germany
ed.negnibeut-inu.kitamrofni@nullhcuarts


Abstract

We developed an integrative approach for discovering gene modules, i.e. sets of genes that are tightly correlated under several experimental conditions, and applied it to a three-dimensional Arabidopsis thaliana microarray dataset. The dataset consists of approximately 23,000 genes responding to 9 abiotic stress conditions at 6 to 9 time points. Our approach aims at finding relatively small and dense modules that lend themselves to a specific biological interpretation. To detect gene modules within this dataset, we employ a two-step clustering process. In the first step, a k-means clustering is performed on one condition; in the second step, its result is used as a seed for clustering the remaining conditions. To assess the significance of the obtained modules, we performed a permutation analysis and determined a null hypothesis against which the module scores are compared, yielding a p-value for each module. Significant modules were mapped to the Gene Ontology (GO) in order to determine the participating biological processes. As a result, we isolated modules showing high significance with respect to the p-values obtained by permutation analysis and GO mapping. In these modules we identified a number of genes that are either part of a general stress response with similar characteristics under different conditions (coherent modules) or part of a more specific stress response to a single stress condition (single response modules). We also found genes that cluster within several conditions but are not part of a coherent module. These genes have a distinct temporal response under each condition. We call the modules that contain them individual response (IR) modules.
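
The two-step procedure and the permutation-based p-value can be illustrated with a minimal Python sketch. It is not the implementation used in the article, and the abstract does not specify the details, so the following are assumptions made for illustration only: the module score is taken to be the mean pairwise Pearson correlation, the second step is reduced to filtering each seed cluster by correlation with its mean profile in the other condition (the hypothetical `refine` function and its `min_corr` threshold), the null distribution is built from randomly drawn gene sets of equal size, and the toy data and the labels `condition_A` and `condition_B` are invented.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def kmeans(profiles, k, rng, iterations=20):
    """Step 1: plain Euclidean k-means on one condition; returns lists of gene indices."""
    centroids = [list(profiles[i]) for i in rng.sample(range(len(profiles)), k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for g, p in enumerate(profiles):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(g)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [mean(col) for col in zip(*(profiles[g] for g in members))]
    return clusters

def module_score(genes, profiles):
    """Assumed score: mean pairwise Pearson correlation (needs at least two genes)."""
    pairs = [(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]]
    return mean(pearson(profiles[a], profiles[b]) for a, b in pairs)

def refine(seed, profiles, min_corr=0.8):
    """Step 2 (assumed form): keep seed genes that follow the seed's mean profile here."""
    centroid = [mean(col) for col in zip(*(profiles[g] for g in seed))]
    return [g for g in seed if pearson(profiles[g], centroid) >= min_corr]

def permutation_p_value(genes, profiles, rng, n_perm=200):
    """Share of random equal-size gene sets scoring at least as high as the module."""
    observed = module_score(genes, profiles)
    hits = sum(
        module_score(rng.sample(range(len(profiles)), len(genes)), profiles) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Invented toy data: genes 0-9 follow a rising trend in both conditions, the rest are noise.
rng = random.Random(0)
n_genes, n_time = 30, 6

def toy_profile(gene):
    trend = [float(t) if gene < 10 else 0.0 for t in range(n_time)]
    return [v + rng.gauss(0, 0.3) for v in trend]

data = {c: [toy_profile(g) for g in range(n_genes)] for c in ("condition_A", "condition_B")}

clusters = kmeans(data["condition_A"], k=2, rng=rng)  # step 1 on the seed condition
results = []
for idx, seed in enumerate(clusters):
    module = refine(seed, data["condition_B"])        # step 2 on a remaining condition
    if len(module) >= 2:
        p = permutation_p_value(module, data["condition_B"], rng)
        results.append((idx, "condition_B", len(module), p))
        print(f"seed cluster {idx}: {len(module)} genes in condition_B, p = {p:.3f}")
```

In this simplified reading, a seed cluster that keeps a correlated core in the other conditions would correspond to a coherent module, while a seed cluster that dissolves outside its own condition would resemble a single response module. The add-one correction in the p-value keeps the estimate strictly positive for a finite number of random draws.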

Reference

Martin Strauch, Jochen Supper, Christian Spieth, Dierk Wanke, Joachim Kilian, Klaus Harter, Andreas Zell. A Two-Step Clustering for 3-D Gene Expression Data Reveals the Main Features of the Arabidopsis Stress Response. Journal of Integrative Bioinformatics, 4(1):54, 2007. Online Journal: http://journal.imbio.de/index.php?paper_id=54