Greedily building protein networks with confidence

Research output: Contribution to journalArticlepeer-review


Motivation: With genome sequences complete for human and model organisms, it is essential to understand how individual genes and proteins are organized into biological networks. Much of the organization is revealed by proteomics experiments that now generate torrents of data. Extracting relevant complexes and pathways from high-throughput proteomics data sets has posed a challenge, however, and new methods to identify and extract networks are essential. We focus on the problem of building pathways starting from known proteins of interest. Results: We have developed an efficient, greedy algorithm, SEEDY, that extracts biologically relevant biological networks from protein-protein interaction data, building out from selected seed proteins. The algorithm relies on our previous study establishing statistical confidence levels for interactions generated by two-hybrid screens and inferred from mass spectrometric identification of protein complexes. We demonstrate the ability to extract known yeast complexes from high-throughput protein interaction data with a tunable parameter that governs the trade-off between sensitivity and selectivity. DNA damage repair pathways are presented as a detailed example. We highlight the ability to join heterogeneous data sets, in this case protein-protein interactions and genetic interactions, and the appearance of cross-talk between pathways caused by re-use of shared components. Significance and comparison: The significance of the SEEDY algorithm is that it is fast, running time O[(E+ V) log V] for V proteins and E interactions, a single adjustable parameter controls the size of the pathways that are generated, and an associated P-value indicates the statistical confidence that the pathways are enriched for proteins with a coherent function. Previous approaches have focused on extracting sub-networks by identifying motifs enriched in known biological networks. SEEDY provides the complementary ability to perform a directed search based on proteins of interest.

Original languageEnglish (US)
Pages (from-to)1869-1874
Number of pages6
Issue number15
StatePublished - Oct 12 2003

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics


Dive into the research topics of 'Greedily building protein networks with confidence'. Together they form a unique fingerprint.

Cite this