Exploring PSI-MI XML Collections Using DescribeX

doi:10.2390/biecoll-jib-2007-70
Submitted: August 03, 2007
Published: October 01, 2007

Reza Samavi, Mariano Consens, Shahan Khatchadourian, Thodoros Topaloglou

Correspondence should be addressed to:
Reza Samavi
8140, Bahen Center for Information Technology, 40 St. George St., Toronto, Ontario, M5S 3G8, Canada


PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. While many public databases support the standard, data providers interpret and instantiate the proposed XML schema in heterogeneous ways. Analyzing schema instantiation in large collections of XML data is a challenging task that existing tools do not support. In this study we use DescribeX, a novel visualization technique for (semi-)structured XML formats, to quantitatively and qualitatively analyze PSI-MI XML collections at the instance level. The goal is to gain insight into schema usage and to study specific questions such as the adequacy of controlled vocabularies, the detection of common instance patterns, and the evolution of different data collections. Our analysis shows that DescribeX enhances understanding of the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.


Reza Samavi, Mariano Consens, Shahan Khatchadourian, Thodoros Topaloglou. Exploring PSI-MI XML Collections Using DescribeX. Journal of Integrative Bioinformatics, 4(3):70, 2007. Online Journal: http://journal.imbio.de/index.php?paper_id=70