Correspondence should be addressed to:
Department of Computing, Imperial College London, London, SW7 2AZ, UK
Traditional approaches for homology detection rely on finding sufficient similarities between protein sequences. Motivated by studies demonstrating that from non-sequence based sources of biological information, such as the secondary or tertiary molecular structure, we can extract certain types of biological knowledge when sequence-based approaches fail, we hypothesize that protein-protein interaction (PPI) network topology and protein sequence might give insights into different slices of biological information. Since proteins aggregate to perform a function instead of acting in isolation, analyzing complex wirings around a protein in a PPI network could give deeper insights into the protein’s role in the inner working of the cell than analyzing sequences of individual genes. Hence, we believe that one could lose much information by focusing on sequence information alone. We examine whether the information about homologous proteins captured by PPI network topology differs and to what extent from the information captured by their sequences. We measure how similar the topology around homologous proteins in a PPI network is and show that such proteins have statistically significantly higher network similarity than nonhomologous proteins. We compare these network similarity trends of homologous proteins with the trends in their sequence identity and find that network similarities uncover almost as much homology as sequence identities. Although none of the two methods, network topology and sequence identity, seems to capture homology information in its entirety, we demonstrate that the two might give insights into somewhat different types of biological information, as the overlap of the homology information that they uncover is relatively low. Therefore, we conclude that similarities of proteins’ topological neighborhoods in a PPI network could be used as a complementary method to sequence-based approaches for identifying homologs, as well as for analyzing evolutionary distance and functional divergence of homologous proteins.