Motivation: One of the current challenges in biological research is organizing the data generated by high-throughput technologies. One way to organize this information is as networks of influences, physical or statistical, between cellular components. We propose an experimental method for probing biological networks, analyzing the resulting data and reconstructing the network architecture.

Methods: We use networks of known topology consisting of nodes (genes), directed edges (gene-gene interactions) and a dynamical model of the genes' mRNA concentrations in terms of the gene-gene interactions. We propose a network reconstruction algorithm based on the conditional correlation between the equilibrium mRNA concentrations of two genes, given that one of them has been knocked down. Using simulated gene expression data on networks of known connectivity, we investigate how the reconstruction error is affected by noise, network topology, size, sparseness and dynamic parameters.

Results: Errors arise from correlations between nodes connected through intermediate nodes (false positives) and when the correlation between two directly connected nodes is obscured by noise, non-linearity or multiple inputs to the target node (false negatives). Two components of the method are critical: (1) the choice of an optimal correlation threshold for predicting connections and (2) the reduction of errors arising from indirect connections, for which a novel algorithm is proposed. With these improvements, we can reconstruct networks with the topology of the Escherichia coli transcriptional regulatory network with a reasonably low error rate.
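The reconstruction idea above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation, and every function name, parameter and modelling choice in it is hypothetical. It assumes linear dynamics dx/dt = b + Wx - x, so the steady state solves (I - W)x = b, where W[j, i] is the weight of the edge i -> j. A knockdown of gene i is modelled as scaling its basal production b_i by a random factor, and every gene's production receives Gaussian noise. The conditional correlation C[i, j] is taken as |corr(x_i, x_j)| over the experiments in which gene i was knocked down. An edge i -> j is predicted when C[i, j] exceeds a hand-picked threshold tau, whereas the paper chooses an optimal threshold. Indirect edges are then pruned with an ARACNE-style data-processing-inequality rule, which is a generic stand-in and not the paper's novel algorithm.

```python
import numpy as np


def simulate_knockdowns(W, n_experiments=400, noise_sd=0.1, seed=0):
    """Simulate steady states of the (assumed) linear model dx/dt = b + W x - x.

    W[j, i] is the hypothetical weight of edge i -> j. For each gene i we run
    n_experiments knockdowns that scale gene i's basal production by a random
    factor in [0.1, 1.0]; every gene's production also gets Gaussian noise.
    Returns shape (n_genes, n_experiments, n_genes): data[i] holds the steady
    states observed while gene i was knocked down.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    A = np.eye(n) - W  # steady state solves (I - W) x = b
    data = np.empty((n, n_experiments, n))
    for i in range(n):
        b = np.ones((n_experiments, n))
        b[:, i] *= rng.uniform(0.1, 1.0, size=n_experiments)  # knockdown strength
        b += noise_sd * rng.standard_normal((n_experiments, n))  # production noise
        data[i] = np.linalg.solve(A, b.T).T
    return data


def conditional_correlations(data):
    """C[i, j] = |corr(x_i, x_j)| using only experiments where gene i was knocked down."""
    n = data.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        R = np.corrcoef(data[i], rowvar=False)  # gene-by-gene correlation matrix
        C[i] = np.abs(R[i])  # absolute value so repressors are detected too
    np.fill_diagonal(C, 0.0)  # no self-edges
    return C


def threshold_edges(C, tau):
    """Predict a directed edge i -> j whenever C[i, j] exceeds the threshold tau."""
    return C > tau


def prune_indirect(C, edges):
    """Drop edge i -> k if some predicted path i -> j -> k has two stronger edges.

    Stand-in heuristic (ARACNE-like data processing inequality). All triplets
    are judged against the original edge set, so the result does not depend on
    the order in which edges are visited.
    """
    n = C.shape[0]
    keep = edges.copy()
    for i in range(n):
        for k in range(n):
            if not edges[i, k]:
                continue
            for j in range(n):
                if j in (i, k):
                    continue
                if edges[i, j] and edges[j, k] and C[i, k] < min(C[i, j], C[j, k]):
                    keep[i, k] = False
                    break
    return keep


# Chain 0 -> 1 -> 2: knocking down gene 0 also moves gene 2 (an indirect effect).
W = np.zeros((3, 3))
W[1, 0] = 0.8
W[2, 1] = 0.8
C = conditional_correlations(simulate_knockdowns(W))
raw = threshold_edges(C, tau=0.6)
pruned = prune_indirect(C, raw)
print([(int(i), int(j)) for i, j in zip(*np.nonzero(raw))])
print([(int(i), int(j)) for i, j in zip(*np.nonzero(pruned))])
```

On this chain, the threshold step alone typically reports the indirect edge 0 -> 2 as a false positive, which is the error mode described in the Results. Because its correlation is weaker than both links of the path 0 -> 1 -> 2, the pruning step removes it. Reverse correlations such as C[1, 0] stay below tau here, since they arise only from gene 0's own noise propagating into gene 1.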
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics