Functional Bioinformatics

Much bioinformatics research involves gathering, storing, and retrieving large quantities of diverse information. We wish not only to perform these tasks but also to use computational approaches to gain new knowledge by inferring functional interactions among such diverse data. We call our approach functional bioinformatics (download our paper or source code). For information about the Duke University Ph.D. Program in Computational Biology and Bioinformatics, follow the link below:

We use functional network inference algorithms to integrate the multiple levels of organization in the songbird brain system. Functional network inference algorithms have appeared in the past few years as a method to deal with the large amounts of gene expression data available from microarrays. They are used to infer gene regulatory interactions, or “functional networks”, from the correlational expression data obtained from microarrays. Using expression level information from either multiple samples of a system in different states or a time series of points, these algorithms calculate which genes appear to be regulators of other genes, that is, which genes increase or decrease the expression of other genes. Most work on functional network inference algorithms has concentrated on interpreting data on gene expression only; however, we have expanded some of these algorithms to apply to other data types such as electrophysiological activity and behavior of the animal.

Cyclical Network

Functional networks are representations of functional interactions. Each element, or node, in a network represents a single variable that can take multiple values. Nodes are drawn as ovals containing the variable name. Networks also have links, drawn as lines between two nodes, which represent a correlation between the values of those nodes. Links can be directed (drawn as an arrow), which generally indicates a causal interaction. For example, the network depicted represents a regulatory cycle: increasing values of A cause increasing values of B, increases in B cause increases in C, and increases in C regulate the cycle by causing a decrease in A.
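As a minimal sketch, the regulatory cycle described above can be written down as a set of signed, directed links. The dictionary layout and the helper names regulators_of and loop_sign are illustrative assumptions, not part of any particular inference package.

```python
# Hypothetical representation: each directed link (source, target) carries a sign,
# +1 when an increase in the source causes an increase in the target,
# -1 when an increase in the source causes a decrease in the target.
links = {
    ("A", "B"): +1,
    ("B", "C"): +1,
    ("C", "A"): -1,
}


def regulators_of(target, links):
    """Return {regulator: sign} for every link that points at `target`."""
    return {src: sign for (src, dst), sign in links.items() if dst == target}


def loop_sign(path, links):
    """Multiply the link signs around a closed path of nodes.

    A product of -1 means the loop as a whole is a negative feedback loop.
    """
    sign = 1
    # Pair each node with its successor, wrapping from the last node to the first.
    for src, dst in zip(path, path[1:] + path[:1]):
        sign *= links[(src, dst)]
    return sign


print(regulators_of("A", links))
print(loop_sign(["A", "B", "C"], links))
```

Storing the sign on each link keeps the causal direction and the type of effect (activation or repression) together, which is all the information the drawn network conveys.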

Types of functional network inference algorithms

There are three major types of functional network inference algorithms: pairwise, equation-based, and network-based. The results of all these algorithms can be represented graphically as a functional network (see figure above), but they differ in how they obtain the links representing regulation.

Pairwise Algorithms

Pairwise algorithms find pairs of genes whose expression levels are correlated and suggest one gene as the putative regulator of the other. The correlation procedure can be quite complex, using time-lagged correlation, fuzzy logic, or other methods. To limit the interactions found, many of these algorithms apply a significance cut-off to the correlation or cap the number of genes to which any one gene can be connected.
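One simple instance of the pairwise idea is a time-lagged Pearson correlation with a fixed cut-off. The sketch below is only an illustration under stated assumptions: the function names, the lag of one time step, the cut-off of 0.9, and the toy expression values are all hypothetical and are not taken from any of the published algorithms.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_links(series, lag=1, cutoff=0.9):
    """Suggest (regulator, target, r) links from time-series expression data.

    A gene is proposed as a regulator when its level at time t correlates with
    the target's level at time t + lag, with |r| at or above `cutoff`.
    `lag` must be at least 1.
    """
    found = []
    for regulator, x in series.items():
        for target, y in series.items():
            if regulator == target:
                continue
            r = pearson(x[:-lag], y[lag:])
            if abs(r) >= cutoff:
                found.append((regulator, target, round(r, 3)))
    return found


# Toy data (made up): gene B repeats gene A's level one time step later.
series = {
    "A": [1, 3, 2, 5, 4, 6, 3, 1],
    "B": [0, 1, 3, 2, 5, 4, 6, 3],
}
result = lagged_links(series)
print(result)
```

Using a lag gives the correlation a direction: A is proposed as the regulator of B because A's past predicts B's present, while the reverse pairing falls below the cut-off.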

Linear Algorithms

Equation-based algorithms relate the expression of each gene to the expression level of all other genes in the form of an equation. These equations can be linear, non-linear, and/or differential equations. Putative regulators are identified by solving the set of equations for the weights which relate each gene to the others. These weights represent the regulatory influence of each gene on the others. Generally, only a few weights in the equation for a single gene differ considerably from zero and thus are considered putative regulators of that gene.
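For the linear case, solving for the weights can be sketched as an ordinary least-squares fit. The code below is a minimal illustration, assuming a purely linear model with no intercept and noise-free toy data; the function names, the 0.5 threshold, and the data values are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a * w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(regulators, target):
    """Least-squares weights w such that target[t] ~ sum_j w[j] * regulators[j][t].

    Uses the normal equations (X^T X) w = X^T y, which is adequate for a small
    illustrative example but not numerically robust for large problems.
    """
    k = len(regulators)
    xtx = [[sum(p * q for p, q in zip(regulators[i], regulators[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * y for p, y in zip(regulators[i], target)) for i in range(k)]
    return solve(xtx, xty)


# Toy data (made up): the target equals -1 * g1 + 2 * g2 + 0 * g3.
genes = {
    "g1": [0, 1, 2, 1, 0],
    "g2": [1, 2, 0, 3, 1],
    "g3": [2, 0, 1, 1, 3],
}
target = [2, 3, -2, 5, 2]

weights = dict(zip(genes, fit_weights(list(genes.values()), target)))
# Keep only weights that differ considerably from zero (threshold is arbitrary).
putative = {g: w for g, w in weights.items() if abs(w) > 0.5}
print(weights)
print(sorted(putative))
```

Here g1 and g2 come out as putative regulators (one negative, one positive) while the weight for g3 is close to zero, mirroring the statement that only a few weights per equation differ considerably from zero.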

Network Algorithms

Network-based algorithms come in two basic types: Boolean and Bayesian. In both, the network that best describes the data is found, often using a heuristic search method that tries many different networks before settling on a best solution.

Boolean networks assume genes have only two states, on and off. Genes are connected to each other with logical relationships, for example: “If gene A is on, then gene B is on”. Logical operations such as “AND” and “OR” can be included in these relationships. A gene included in the “if” portion of such a statement is considered a regulator of a gene included in the “then” portion.
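A small Boolean network can be sketched as a table of rules, each listing its regulators (the "if" portion) and a logical function that sets the regulated gene (the "then" portion). The gene names, the specific rules, and the synchronous update scheme below are illustrative assumptions.

```python
# Each regulated gene maps to (regulators, rule).
# The regulators are the genes in the "if" portion of the statement.
rules = {
    "B": (("A",), lambda a: a),                 # if A is on, then B is on
    "C": (("A", "B"), lambda a, b: a and b),    # if A AND B are on, then C is on
    "D": (("B", "C"), lambda b, c: b or c),     # if B OR C is on, then D is on
}


def step(state, rules):
    """Synchronous update: every rule reads the current state, then all genes change at once."""
    new_state = dict(state)
    for gene, (regulators, rule) in rules.items():
        new_state[gene] = bool(rule(*(state[r] for r in regulators)))
    return new_state


def regulators_of(gene, rules):
    """Genes appearing in the 'if' portion of the rule for `gene`."""
    return rules[gene][0] if gene in rules else ()


state = {"A": True, "B": False, "C": False, "D": False}
after_one = step(state, rules)
after_two = step(after_one, rules)
print(after_one)
print(after_two)
```

Because the update is synchronous, turning A on reaches B after one step and reaches C and D only after two, since both depend on B.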

Bayesian networks represent probabilistic connections between genes. A regulatory link between two genes indicates that knowing the value of one helps predict the value of the other. For example, a link from gene A to gene B might indicate that if gene A is high, then gene B has a 90% chance of also being high, an 8% chance of being medium, and only a 2% chance of being low.
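The probabilities attached to a single link can be sketched as a conditional probability table estimated by counting discretized observations. This shows only the table for one assumed link from gene A to gene B, not the search over network structures; the function name and the toy counts (chosen to reproduce the 90%, 8%, and 2% figures) are hypothetical.

```python
from collections import Counter, defaultdict


def conditional_table(parent_values, child_values):
    """Estimate P(child | parent) from paired, discretized observations."""
    counts = defaultdict(Counter)
    for parent, child in zip(parent_values, child_values):
        counts[parent][child] += 1
    return {
        parent: {child: n / sum(child_counts.values())
                 for child, n in child_counts.items()}
        for parent, child_counts in counts.items()
    }


# Toy data (made up): 50 samples in which gene A is high.
gene_a = ["high"] * 50
gene_b = ["high"] * 45 + ["medium"] * 4 + ["low"] * 1

table = conditional_table(gene_a, gene_b)
print(table["high"])
```

A link is informative when the rows of such a table differ from one another, that is, when the distribution of B changes depending on the value of A.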

A framework for evaluating functional network inference algorithms

Network Algorithms

From left to right: we begin with a biologically realistic functional network and simulate this network with a simulator (a specially designed computer program) to produce biologically realistic output. This output is then sampled, and a functional network inference algorithm is applied. The network produced by the algorithm, far right, is compared to the known truth of the original network, far left, to evaluate the success of the algorithm.

Evaluating the success of functional network inference algorithms presents a large problem because it is not feasible to experimentally validate all the putative regulatory links discovered, which can number in the thousands. It could take several careers to validate even a single functional network! Therefore, we have created a framework using computer simulation to evaluate these algorithms. We create a computer simulation based on a biologically realistic functional network, sample data from this simulation as one would sample from a real system in a biological experiment, then apply the functional network inference algorithms to this sampled data. We evaluate the success of an algorithm by comparing the algorithm’s resulting putative functional network to our original simulated network. The better the match, the better the performance of the algorithm.
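The final comparison step can be sketched as a set comparison between the links of the original simulated network and the links an algorithm recovers. Precision and recall are used here as one plausible way to quantify "the better the match"; they are an assumption of this sketch and not necessarily the measure used in the papers. The link sets are made up.

```python
def score_links(true_links, inferred_links):
    """Compare inferred directed links against the known links of the simulated network."""
    true_links, inferred_links = set(true_links), set(inferred_links)
    correct = true_links & inferred_links
    return {
        "correct": sorted(correct),
        "false_positives": sorted(inferred_links - true_links),
        "missed": sorted(true_links - inferred_links),
        # Fraction of inferred links that are real.
        "precision": len(correct) / len(inferred_links) if inferred_links else 0.0,
        # Fraction of real links that were recovered.
        "recall": len(correct) / len(true_links) if true_links else 0.0,
    }


# Hypothetical example: the simulated network versus an algorithm's output.
true_network = [("A", "B"), ("B", "C"), ("C", "A")]
inferred_network = [("A", "B"), ("B", "C"), ("A", "C")]

scores = score_links(true_network, inferred_network)
print(scores)
```

Treating links as ordered pairs means a link recovered in the wrong direction, such as A to C instead of C to A, counts as both a false positive and a missed link.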

Our paper, Evaluating functional network inference using simulations of complex biological systems, published in Bioinformatics 18 Suppl. 1: S216-S224, and the follow-up paper, Influence of network topology and data collection on network inference, from Pacific Symposium on Biocomputing 2003, describe this framework and our use of it to evaluate a Bayesian network algorithm.

Selected references on functional network inference

  • Arkin, A., P. Shen, & J. Ross. 1997. A test case of correlation metric construction of a reaction pathway from measurements. Science 277:1275-1279.
  • Akutsu, T., S. Miyano, & S. Kuhara. 1999. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pacific Symposium on Biocomputing 4:17-28.
  • Akutsu, T., S. Miyano, & S. Kuhara. 2000a. Algorithms for identifying Boolean networks and related biological networks based on matrix multiplication and fingerprint function. Proceedings of the Annual International Conference on Computational Molecular Biology 4:8-14.
  • Akutsu, T., S. Miyano, & S. Kuhara. 2000b. Algorithms for inferring qualitative models of biological networks. Pacific Symposium on Biocomputing 5:290-301.
  • Chen, T., V. Filkov, & S. S. Skiena. 1999a. Identifying gene regulatory networks from experimental data. Proceedings of the Annual International Conference on Computational Molecular Biology 3:94-103.
  • Chen, T., H. L. He, & G. M. Church. 1999b. Modeling gene expression with differential equations. Pacific Symposium on Biocomputing 4:29-40.
  • D’haeseleer, P., S. Liang, & R. Somogyi. 2000. Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 16:707-726.
  • D’haeseleer, P., X. Wen, S. Fuhrman, & R. Somogyi. 1999. Linear modeling of mRNA expression levels during CNS development and injury. Pacific Symposium on Biocomputing 4:41-52.
  • Friedman, N., M. Linial, I. Nachman, & D. Pe’er. 2000. Using Bayesian networks to analyze expression data. Journal of Computational Biology 7:601-620.
  • Hartemink, A. J., D. K. Gifford, T. S. Jaakkola, & R. A. Young. 2001. Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks. Pacific Symposium on Biocomputing 6:422-433.
  • Hartemink, A. J., D. K. Gifford, T. S. Jaakkola, & R. A. Young. 2002. Combining location and expression data for principled discovery of genetic regulatory network models. Pacific Symposium on Biocomputing 7:437-449.
  • Liang, S., S. Fuhrman, & R. Somogyi. 1998. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing 3:18-29.
  • Mjolsness, E., T. Mann, R. Castaño, & B. Wold. 2000. From coexpression to coregulation: an approach to inferring transcriptional regulation among gene classes from large-scale expression data. In: Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, & K. R. Muller (eds). Cambridge, MA: MIT Press, pp. 928-934.
  • Thieffry, D. & R. Thomas. 1998. Qualitative analysis of gene networks. Pacific Symposium on Biocomputing 3:77-89.
  • Weaver, D. C., C. T. Workman, & G. D. Stormo. 1999. Modeling regulatory networks with weight matrices. Pacific Symposium on Biocomputing 4:112-123.
  • Woolf, P. J. & Y. Wang. 2000. A fuzzy logic approach to analyzing gene expression data. Physiological Genomics 3:9-15.