Much bioinformatics research involves gathering, storing, and retrieving large quantities of diverse information. We aim not only to perform these tasks but also to use computational approaches to gain new knowledge by inferring functional interactions among such diverse data. We call our approach functional bioinformatics.
We use functional network inference algorithms to integrate the multiple levels of organization in the songbird brain. Functional network inference algorithms have emerged in recent years as a way to handle the large amounts of gene expression data generated by microarrays. They are used to infer gene regulatory interactions, or “functional networks”, from the correlations in microarray expression data. Using expression levels measured either across multiple samples of a system in different states or across a time series, these algorithms estimate which genes appear to regulate other genes, that is, which genes increase or decrease the expression of others. Most work on functional network inference algorithms has concentrated on gene expression data alone; however, we have extended some of these algorithms to other data types, such as electrophysiological activity and the animal's behavior.
Functional networks are representations of functional interactions. Each element, or node, in a network represents a single variable that can take multiple values. Nodes are drawn as ovals containing the variable name. Links, drawn as lines between two nodes, represent a correlation between the values of those nodes. A link can be directed (drawn as an arrow), which generally indicates a causal interaction. For example, the network depicted represents a regulatory feedback loop: increasing values of A cause increasing values of B, increases in B cause increases in C, and increases in C close the loop by causing a decrease in A.
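The example network can be written down as a signed, directed edge list. This is a minimal Python sketch, assuming +1 marks an activating link and -1 a repressing one; the encoding and the helper names `regulators` and `loop_sign` are illustrative, not taken from any particular tool.

```python
# Signed, directed edges: (regulator, target) -> +1 (activation) or -1 (repression).
# Illustrative encoding of the example loop: A -> B -> C, with C repressing A.
edges = {("A", "B"): +1, ("B", "C"): +1, ("C", "A"): -1}


def regulators(target, edges):
    """Return the nodes with a directed link into `target`, sorted by name."""
    return sorted(src for (src, dst) in edges if dst == target)


def loop_sign(cycle, edges):
    """Multiply edge signs around a cycle given as a node sequence, e.g. ["A", "B", "C"].

    A negative product means negative (self-limiting) feedback.
    """
    sign = 1
    for src, dst in zip(cycle, cycle[1:] + cycle[:1]):
        sign *= edges[(src, dst)]
    return sign


print(regulators("A", edges))             # ['C']
print(loop_sign(["A", "B", "C"], edges))  # -1
```

The negative loop sign captures the point of the example: two activating links followed by one repressing link make the cycle self-limiting.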
There are three major types of functional network inference algorithms: pairwise, equation-based, and network-based. The results of all these algorithms can be represented graphically as a functional network (see figure above), but they differ in how they obtain the links representing regulation.
Pairwise algorithms find pairs of genes whose expression levels are correlated and propose one gene of each pair as the putative regulator of the other. The correlation measure can be quite sophisticated, using time-lagged correlation, fuzzy logic, or other methods. To limit the number of interactions reported, many of these algorithms impose a significance cutoff on the correlation or cap the number of genes to which any one gene can be connected.
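A minimal sketch of one pairwise method, time-lagged Pearson correlation, with both kinds of limit described above: a correlation cutoff (`threshold`) and a cap on regulators per gene (`max_regulators`). The function names, default values, and example data are hypothetical; real algorithms may assess significance statistically rather than use a fixed cutoff.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def lagged_pairwise(series, lag=1, threshold=0.8, max_regulators=2):
    """Propose regulator -> target links from time-lagged correlation.

    series: dict gene -> expression values at equally spaced time points.
    lag must be >= 1. A link reg -> tgt is kept if |corr(reg[t], tgt[t + lag])|
    reaches `threshold`; each target keeps at most `max_regulators` strongest links.
    The sign of r says whether the regulator appears to increase or decrease the target.
    """
    candidates = {}
    for reg, x in series.items():
        for tgt, y in series.items():
            if reg == tgt:
                continue
            r = pearson(x[:-lag], y[lag:])
            if abs(r) >= threshold:
                candidates.setdefault(tgt, []).append((abs(r), reg, r))
    links = []
    for tgt, cands in candidates.items():
        cands.sort(reverse=True)  # strongest |r| first
        links += [(reg, tgt, r) for _, reg, r in cands[:max_regulators]]
    return links


# Hypothetical data: gene B copies gene A one time step later.
a = [1, 3, 2, 5, 4, 6, 2, 7]
b = [0] + a[:-1]
c = [2, 2, 3, 2, 3, 2, 3, 2]
links = lagged_pairwise({"A": a, "B": b, "C": c})
for reg, tgt, r in links:
    print(reg, "->", tgt, round(r, 2))
```

Because B is exactly A delayed by one step, the A -> B link has a lagged correlation of 1; which other links survive depends on the cutoff.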
Equation-based algorithms express the expression level of each gene as a function of the expression levels of all other genes. These equations can be linear or nonlinear, and algebraic or differential. Putative regulators are identified by solving the set of equations for the weights that relate each gene to the others; these weights represent the regulatory influence of each gene on the others. Generally, only a few weights in the equation for a single gene differ substantially from zero, and the corresponding genes are considered putative regulators of that gene.
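For the linear case, each gene's next value can be modelled as a weighted sum of all genes' current values, x_i(t+1) = Σ_j w_ij x_j(t). This minimal sketch assumes noise-free, fully observed data with more samples than genes. It fits the weights by ordinary least squares (the normal equations, solved by Gaussian elimination) and reads off putative regulators using a hypothetical cutoff `tol` for "differs substantially from zero". Real data are noisy and often have more genes than samples, so this toy version omits the regularization such data would require.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weights(states, next_states):
    """Least-squares W in x(t+1) ~ W x(t); row i holds the weights on target gene i."""
    n = len(states[0])
    # Normal equations (X^T X) w_i = X^T y_i, where X has one row per sampled state.
    xtx = [[sum(s[j] * s[k] for s in states) for k in range(n)] for j in range(n)]
    weights = []
    for i in range(n):
        xty = [sum(s[j] * y[i] for s, y in zip(states, next_states)) for j in range(n)]
        weights.append(solve(xtx, xty))
    return weights


def putative_regulators(weights, names, tol=0.1):
    """For each target, the genes whose weight exceeds the hypothetical cutoff tol."""
    return {names[i]: [names[j] for j, w in enumerate(row) if abs(w) > tol]
            for i, row in enumerate(weights)}


# Hypothetical 3-gene system matching the A -> B -> C loop, with C repressing A.
names = ["A", "B", "C"]
true_w = [[0.0, 0.0, -0.7],
          [0.9, 0.0, 0.0],
          [0.0, 0.8, 0.0]]
rng = random.Random(0)
states = [[rng.uniform(-1, 1) for _ in names] for _ in range(12)]
next_states = [[sum(w * x for w, x in zip(row, s)) for row in true_w] for s in states]
w_hat = fit_weights(states, next_states)
print(putative_regulators(w_hat, names))  # {'A': ['C'], 'B': ['A'], 'C': ['B']}
```

With noiseless data the fitted weights match the generating weights, so exactly one regulator per gene clears the cutoff.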
Network-based algorithms come in two basic types: Boolean and Bayesian. Both look for the network that best describes the data, often using a heuristic search method that tries many candidate networks before settling on the best one.
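The search step can be sketched as greedy hill-climbing over edge sets. This is one common heuristic, not necessarily the one any particular algorithm uses, and `score` is a hypothetical placeholder for whatever measure of fit an algorithm uses (for Bayesian networks, a Bayesian scoring metric). The acyclicity check reflects that Bayesian networks must be directed acyclic graphs; Boolean networks may contain cycles, so the check can be switched off.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)


def greedy_search(nodes, score, acyclic=True):
    """Hill-climb over edge sets: apply the single edge addition or removal that
    most improves score(edges); stop when no move improves the score."""
    edges = frozenset()
    best = score(edges)
    while True:
        candidates = []
        for u, v in permutations(nodes, 2):
            move = edges - {(u, v)} if (u, v) in edges else edges | {(u, v)}
            if acyclic and not is_acyclic(nodes, move):
                continue  # skip moves that would create a cycle
            candidates.append((score(move), move))
        if not candidates:
            return edges, best
        top_score, top = max(candidates, key=lambda c: c[0])
        if top_score <= best:
            return edges, best  # local optimum reached
        edges, best = top, top_score


# Toy usage with a made-up score that rewards two "true" edges and penalizes all others.
target = {("A", "B"), ("B", "C")}
edges, best = greedy_search(["A", "B", "C"], lambda e: len(e & target) - len(e - target))
print(sorted(edges), best)  # [('A', 'B'), ('B', 'C')] 2
```

Greedy search only guarantees a local optimum, which is why practical searches often add random restarts or other moves.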
Boolean networks assume genes have only two states, on and off. Genes are connected to each other with logical relationships, for example: “If gene A is on, then gene B is on”. Logical operations such as “AND” and “OR” can be included in these relationships. A gene included in the “if” portion of such a statement is considered a regulator of a gene included in the “then” portion.
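A minimal sketch of a synchronous Boolean network, assuming hypothetical rules that encode the A -> B -> C loop (with C repressing A) plus an AND rule for an illustrative gene D. Each rule lists its regulators, that is, the genes in the "if" portion, alongside the Boolean function of their values.

```python
# Each rule maps a target gene to (its regulators, a Boolean function of their values).
# Hypothetical rules: the A -> B -> C loop with C repressing A, plus D = A AND B.
rules = {
    "A": (("C",), lambda c: not c),          # if C is on, then A is off
    "B": (("A",), lambda a: a),              # if A is on, then B is on
    "C": (("B",), lambda b: b),              # if B is on, then C is on
    "D": (("A", "B"), lambda a, b: a and b), # if A AND B are on, then D is on
}


def step(state, rules):
    """Synchronous update: every gene reads the previous state simultaneously."""
    return {gene: bool(f(*(state[r] for r in regs))) for gene, (regs, f) in rules.items()}


def simulate(state, rules, steps):
    trajectory = [state]
    for _ in range(steps):
        trajectory.append(step(trajectory[-1], rules))
    return trajectory


regulator_map = {gene: regs for gene, (regs, _) in rules.items()}
traj = simulate({"A": True, "B": False, "C": False, "D": False}, rules, 6)
for t, s in enumerate(traj):
    print(t, "".join("1" if s[g] else "0" for g in "ABCD"))
```

Starting from 1000 (A, B, C, D), the states are 1100, 1111, 0111, 0010, 0000, and then 1000 again: the negative feedback loop makes the network oscillate with period six.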
Bayesian networks represent probabilistic connections between genes. A regulatory link between two genes indicates that knowing the value of one helps predict the value of the other. For example, a link from gene A to gene B might indicate that if gene A is high, then gene B has a 90% chance of also being high, an 8% chance of being medium, and only a 2% chance of being low.
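In a discrete Bayesian network, each link is quantified by a conditional probability table. This minimal sketch assumes expression has already been discretized into low/medium/high and uses hypothetical sample counts chosen to reproduce the 90%/8%/2% example; it estimates P(B | A) by counting. It shows only the parameter estimate for one fixed link, not the structure scoring and search a full Bayesian network algorithm performs.

```python
from collections import Counter, defaultdict


def estimate_cpt(pairs):
    """Maximum-likelihood estimate of P(child | parent) from (parent, child) samples
    of discretized expression levels such as "low", "medium", "high"."""
    counts = defaultdict(Counter)
    for parent, child in pairs:
        counts[parent][child] += 1
    cpt = {}
    for parent, child_counts in counts.items():
        total = sum(child_counts.values())
        cpt[parent] = {child: n / total for child, n in child_counts.items()}
    return cpt


# Hypothetical samples in which gene A is high: B is high in 45, medium in 4, low in 1.
samples = [("high", "high")] * 45 + [("high", "medium")] * 4 + [("high", "low")] * 1
cpt = estimate_cpt(samples)
print(cpt["high"])  # {'high': 0.9, 'medium': 0.08, 'low': 0.02}
```

With small samples, unseen combinations get probability zero; adding pseudocounts to each cell is a common remedy.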
Evaluating the success of functional network inference algorithms is difficult because it is not feasible to experimentally validate all the putative regulatory links discovered, which can number in the thousands. It could take several careers to validate even a single functional network! Therefore, we have created a framework that uses computer simulation to evaluate these algorithms. We build a computer simulation based on a biologically realistic functional network, sample data from this simulation as one would sample from a real system in a biological experiment, and then apply the functional network inference algorithms to the sampled data. We evaluate the success of an algorithm by comparing the putative functional network it produces to our original simulated network: the better the match, the better the algorithm's performance.
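The final comparison step can be scored in several ways, and the text above does not say which measures are used. This sketch assumes simple edge-level precision (the fraction of inferred links that are in the simulated network) and recall (the fraction of simulated links that were recovered), and it counts a link inferred in the wrong direction as a miss. The network contents are hypothetical.

```python
def compare_networks(true_edges, inferred_edges):
    """Edge-level comparison of an inferred directed network against the simulated truth."""
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    tp = len(true_edges & inferred_edges)  # links present in both networks
    precision = tp / len(inferred_edges) if inferred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    return {"true_positives": tp, "precision": precision, "recall": recall}


truth = {("A", "B"), ("B", "C"), ("C", "A")}
inferred = {("A", "B"), ("B", "C"), ("A", "C")}  # hypothetical algorithm output
print(compare_networks(truth, inferred))
```

Here two of the three inferred links are correct, and the reversed C -> A link counts against both precision and recall; an undirected comparison would treat it as a match.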
Our paper, “Evaluating functional network inference using simulations of complex biological systems”, published in Bioinformatics 18 Suppl. 1: S216-S224, and the follow-up paper, “Influence of network topology and data collection on network inference”, from the Pacific Symposium on Biocomputing 2003, describe this framework and our use of it to evaluate a Bayesian network algorithm.