Correspondence should be addressed to:
Department of Computing, Imperial College London, London, SW7 2AZ, UK
Networks are used to model real-world phenomena in various domains, including systems biology. Since proteins carry out biological processes by interacting with other proteins, cellular functions are expected to be reflected in the structure of protein-protein interaction (PPI) networks. Similarly, the topology of residue interaction graphs (RIGs), which model the three-dimensional structures of proteins, might provide insights into protein folding, stability, and function. An important step towards understanding these networks is finding an adequate network model, since models can be exploited algorithmically and used to predict missing data. Evaluating how well a model network fits the data is a formidable challenge: exact network comparison is computationally intractable, because it involves the NP-complete subgraph isomorphism problem, so comparisons have to rely on heuristics, or “network properties.” We show that it is difficult to assess the reliability of a model's fit using any single network property alone. We therefore present an integrative approach that feeds a variety of network properties into five machine learning classifiers to predict the best-fitting network model for PPI networks and RIGs. We confirm that geometric random graphs (GEO) are the best-fitting model for RIGs. Since GEO networks model spatial relationships between objects and are thus expected to closely replicate the underlying structure of spatially packed residues in a protein, the good fit of GEO to RIGs validates our approach. We also apply our approach to PPI networks and confirm that merged data sets of high coverage and confidence, containing both binary and co-complex interactions, are structurally consistent with GEO, whereas less complete, lower-confidence data sets are not. Since PPI data are noisy, we test the robustness of the five classifiers to noise and show that their robustness levels differ.
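To make the integrative approach concrete, the following is a minimal Python sketch under stated assumptions. It generates GEO networks (nodes placed uniformly at random in the unit cube and linked when within distance r) and SF networks (Barabási-Albert preferential attachment), summarizes each network by two properties (average clustering coefficient and the coefficient of variation of the degrees), and assigns a query network to the model whose property centroid is nearest. The generator parameters, the two properties, and the nearest-centroid rule are illustrative assumptions standing in for the richer property sets and the five classifiers used in the study; all function names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev

# Illustrative sketch only: generators, features, and the nearest-centroid
# rule are assumptions standing in for the study's properties and classifiers.

def geo_graph(n, r, rng):
    """GEO: n points uniform in the unit cube [0,1]^3, linked if distance <= r."""
    pts = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= r:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def sf_graph(n, m, rng):
    """SF: Barabasi-Albert preferential attachment, m links per new node."""
    adj = {i: set() for i in range(n)}
    for i in range(m + 1):              # seed with a clique on m + 1 nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears once per edge endpoint, so sampling is degree-proportional.
    targets = [v for v in range(m + 1) for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            targets.extend((new, t))
    return adj

def features(adj):
    """(average clustering coefficient, degree std / degree mean)."""
    degs = [len(nb) for nb in adj.values()]
    ccs = []
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            ccs.append(0.0)
            continue
        links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
        ccs.append(2 * links / (k * (k - 1)))
    return (mean(ccs), pstdev(degs) / mean(degs))

def train_centroids(samples):
    """samples: label -> list of feature tuples; returns label -> mean feature tuple."""
    return {label: tuple(mean(col) for col in zip(*feats))
            for label, feats in samples.items()}

def classify(feat, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(feat, centroids[label]))

rng = random.Random(42)
training = {
    "GEO": [features(geo_graph(200, 0.2, rng)) for _ in range(5)],
    "SF": [features(sf_graph(200, 3, rng)) for _ in range(5)],
}
centroids = train_centroids(training)
query = sf_graph(200, 3, random.Random(7))
print(classify(features(query), centroids))
```

A real network given as adjacency sets would be classified the same way, via `classify(features(adj), centroids)`. The point of combining several properties is that a single one (here, clustering alone) may not separate the models reliably; a realistic setup would use many more properties and compare several classifiers.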
We demonstrate that none of the classifiers predicts noisy scale-free (SF) networks as GEO, whereas noisy GEO networks can be classified as SF. Thus, it is unlikely that our approach would predict a real-world network as GEO if it had a noisy SF structure, although it could classify the data as SF if they had a noisy GEO structure. Since our approach predicts GEO for the high-coverage, high-confidence PPI networks, their structure is most consistent with that of a noisy GEO.
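The robustness test relies on feeding deliberately perturbed networks to the classifiers. Below is a minimal Python sketch of such perturbation, assuming three noise types (random edge rewiring, addition, and deletion) applied to a fixed fraction of the edges. The function names, the noise types, and the ring-lattice example are illustrative assumptions, not the study's exact protocol. The example shows how rewiring erodes a highly clustered structure, which is one way a noisy GEO could come to resemble a less clustered model.

```python
import random
from itertools import combinations

# Illustrative sketch only: the noise types, names, and ring-lattice example are
# assumptions, not the study's exact perturbation protocol.

def _random_non_edge(edges, n, rng):
    """Draw a node pair (u < v) not already in `edges`; assumes the graph is sparse."""
    while True:
        u, v = sorted(rng.sample(range(n), 2))
        if (u, v) not in edges:
            return (u, v)

def perturb(edges, n, fraction, mode, rng):
    """Return a noisy copy of an undirected edge set of sorted (u, v) tuples."""
    if mode not in ("rewire", "add", "delete"):
        raise ValueError(f"unknown mode: {mode}")
    noisy = set(edges)
    k = round(fraction * len(edges))
    if mode in ("rewire", "delete"):        # drop k existing edges at random
        for e in rng.sample(sorted(noisy), k):
            noisy.remove(e)
    if mode in ("rewire", "add"):           # insert k new random edges
        for _ in range(k):
            noisy.add(_random_non_edge(noisy, n, rng))
    return noisy

def avg_clustering(edges, n):
    """Mean local clustering coefficient over nodes 0..n-1 (0 for degree < 2)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for v in range(n):
        k = len(adj[v])
        if k >= 2:
            links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / n

def ring_lattice(n, half_k):
    """Highly clustered ring: node i links to the half_k nearest nodes on each side."""
    return {tuple(sorted((i, (i + j) % n))) for i in range(n) for j in range(1, half_k + 1)}

rng = random.Random(0)
lattice = ring_lattice(100, 2)
for frac in (0.1, 0.3, 0.5):
    noisy = perturb(lattice, 100, frac, "rewire", rng)
    print(frac, len(noisy), round(avg_clustering(noisy, 100), 3))
```

Rewiring keeps the number of edges fixed, so any change in the classifier's prediction reflects altered structure rather than altered density; addition and deletion additionally shift the density.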