We develop and apply novel graph-theoretic, statistical and machine learning techniques for solving problems in computational biology and medicine. These techniques can address many challenges in these domains because they offer a natural way to integrate different types of data and to handle large amounts of noisy information.

An important idea that has emerged recently is that a cell can be viewed as a complex network of interrelated proteins, nucleic acids and other bio-molecules. A bio-molecular network can be viewed as a collection of nodes, representing the bio-molecules, connected by links, representing relations between the bio-molecules. Examples of biological networks include regulatory networks, metabolic networks and signalling networks. At the same time, data generated by large-scale experiments often have a natural representation as networks, such as protein-protein interaction (PPI) networks, genetic interaction networks and co-expression networks. Finally, it is understood that the interconnectivity between cellular components (genes, metabolites, microRNAs, etc.) has important implications for disease. The view that has become widely accepted is that genetic disease is the result of abnormal interactions between multiple players in complex networks. From a computational point of view, a central objective for systems biology and medicine is therefore to develop methods for inferring networks, parts of networks or relations between networks, possibly using data that are themselves in the form of networks.

Much of our research focuses on developing novel mathematical methods specifically suited for making inferences on biological networks, building on the most recent results from computer science and machine learning. In particular, the methods we develop take into account both the structure of the networks representing the data and the structure of the network representing the biological question being answered. The final goal is to answer questions in systems biology and medicine that will help us understand and predict complex cellular behaviour in health and disease.

Recently, we have mainly focused on five areas:
  1. Inference and analysis of large-scale Protein-Protein Interaction networks.
  2. Protein Function Prediction.
  3. Inferring relationships between Genotype, Phenotype and Environment.
  4. Analysis of Biological Processes from co-expression networks.
  5. Network Medicine.


Inference and analysis of large-scale protein-protein interaction networks

Proteins carry out their molecular functions by interacting with other molecules, mainly other proteins. For this reason protein interactions provide an important step toward understanding protein function and cell behaviour. Systematically mapping the set of all protein-protein interactions within an organism – the interactome – has therefore become a major challenge in post-genomic biology. Recent developments in experimental procedures (e.g. co-affinity purification followed by mass spectrometry, AP-MS) have resulted in the publication of many high-quality protein-protein interaction datasets for different organisms ranging from the yeast Saccharomyces cerevisiae to Homo sapiens.

An interactome has a natural representation as an undirected graph, often called a protein-protein interaction (PPI) network, where nodes represent proteins and edges represent interactions between pairs of proteins. Often an estimate of the reliability of each interaction is available and is included as an edge label (weight). Interactomes have a modular structure, meaning that there are sets of proteins that interact with each other more frequently than with the rest of the network. These densely connected regions are typically interpreted as protein complexes, and their identification is crucial to deepening our understanding of cellular processes. The problem of identifying protein complexes from PPI data is then equivalent to detecting dense regions containing many connections in PPI networks (or regions with large weights if the networks are weighted).
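The notion of a "dense region" in a weighted PPI network can be made concrete with a small sketch. The toy network, the protein names and the particular density score below are assumptions chosen for illustration only; they are not the complex-detection method used in our lab.

```python
from itertools import combinations

# Hypothetical toy PPI network: undirected edges labelled with a
# reliability weight in [0, 1]. Names and weights are made up.
edges = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.8,
    ("B", "C"): 0.7,
    ("C", "D"): 0.2,
    ("D", "E"): 0.6,
}

def edge_weight(edges, u, v):
    """Weight of the undirected edge {u, v}, or 0.0 if the edge is absent."""
    return edges.get((u, v), edges.get((v, u), 0.0))

def weighted_density(edges, nodes):
    """Total edge weight inside `nodes` divided by the number of possible pairs."""
    pairs = list(combinations(sorted(set(nodes)), 2))
    if not pairs:
        return 0.0
    return sum(edge_weight(edges, u, v) for u, v in pairs) / len(pairs)

# A tightly connected triple scores higher than a loosely connected one.
dense = weighted_density(edges, ["A", "B", "C"])
sparse = weighted_density(edges, ["C", "D", "E"])
print(dense > sparse)
```

A complex-detection method would search over many candidate node sets rather than score two hand-picked ones; the score here only shows what such a search might try to maximise.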

In our lab, research on large-scale PPI networks has been funded by the BBSRC (grant BB/F00964X/1) and the Royal Society (grant NF080750). We have worked on methods for:


Protein function prediction

In recent years, numerous large-scale sequencing projects have generated enormous amounts of sequence data. This has led to the identification of thousands of previously unknown genes whose function remains to be characterized. A precise definition of protein function is difficult, as in general the meaning of the term “function” depends on the context being considered. The current dominant solution to this problem is the use of ontologies, consisting of terms in a controlled vocabulary organized in a hierarchical structure through a set of well-defined relationships.

Standard ontologies usually have a structure that can be modeled by a rooted, oriented tree or, more generally, by a directed acyclic graph, as in the Gene Ontology, which is becoming the standard. Even with function defined through ontologies, about a third of the proteins in the best characterized model organisms have unknown function. A fundamental goal is therefore to identify the function of uncharacterized genes on a genomic scale. It is difficult to design functional assays for uncharacterized genes, so a major challenge in bioinformatics is to devise algorithmic methods that, given a gene, can propose a hypothesis for its function that can then be validated experimentally.
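To illustrate why a directed acyclic graph is more general than a tree, the sketch below stores a toy ontology as a map from each term to its direct parents and collects all ancestors of a term. The term names are hypothetical placeholders rather than real Gene Ontology identifiers, and the traversal is a generic one, not a description of any specific prediction method.

```python
# Hypothetical toy ontology: each term maps to its direct parents.
# Term "c" has two parents, which a tree could not represent.
parents = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a", "b"],
    "d": ["c"],
}

def ancestors(parents, term):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

# An annotation to a specific term is usually taken to imply annotation
# to all of its more general ancestors.
implied = ancestors(parents, "d")
print(sorted(implied))
```

In this representation, predicting a function for a gene amounts to choosing a set of terms that is consistent with the hierarchy, which is one reason the ontology structure matters to the prediction problem.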

In our lab, research in protein function prediction has been funded by the BBSRC (grant BB/F00964X/1). We have worked on methods for:


Inferring relationships between genotype, phenotype and environment

An important problem in biology is to uncover the links between the genetic makeup of an organism (genotype) and its observable physical or biochemical characteristics (phenotype). For example, understanding these links would increase our ability to rapidly characterize an unknown microorganism, which is critical both in responding to infectious disease and in biodefense. To do this, we need some way of anticipating an organism’s phenotype based on the molecules encoded by its genome.

At the same time, it is not well understood how specific sequences link distinct environmental conditions with specific biological processes. Thus, another important challenge is to understand how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats – i.e., how network dynamics relate to environmental features. We have worked on methods for:


Analysis and detection of biological processes from co-expression networks

Gene expression experiments measure the activity of thousands of genes in response to different conditions. Generally, genes involved in a particular biological mechanism tend to exhibit similar expression patterns and form groups. An important question in this area is that of detecting from transcriptomics data which biological processes are activated in a given condition.
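The idea that genes with similar expression patterns form groups can be sketched by building a small co-expression network. The expression values, the gene names, the use of Pearson correlation and the 0.9 threshold are all assumptions made for this illustration; other similarity measures and thresholds are equally plausible.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles: one value per condition for each gene.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 3.9, 6.2, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [5.0, 1.0, 4.0, 2.0],
}

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(v * v for v in dx)) * sqrt(sum(v * v for v in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom

def coexpression_edges(expression, threshold):
    """Link every pair of genes whose correlation exceeds `threshold`."""
    return [
        (u, v)
        for u, v in combinations(sorted(expression), 2)
        if pearson(expression[u], expression[v]) > threshold
    ]

network = coexpression_edges(expression, threshold=0.9)
print(network)
```

Groups of co-expressed genes then correspond to densely connected sets of nodes in the resulting network.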

Another problem is that of selecting marker genes that can represent such specific mechanisms. These markers can be used as readouts to help understand the mechanisms, monitor the interactions between them and track the physiological effects they may exert. For example, as yeast cells grow, genes involved in various hormone pathways exhibit distinctly similar expression patterns and form groups. Sensitive and specific markers that can track and report the dynamics of each group are important for investigating the mechanisms of response to each hormone, cross-talk between hormone pathways and the relationship between hormones and phenotypic effects.

In our lab, research on the analysis of transcriptomics data has been funded by the BBSRC (grant BB/F00964X/1) and Royal Holloway, through the Agnes Grace Ellen Endowment. We have developed methods for:


Network Medicine

In a cell, the function of most cellular components (genes, proteins, metabolites, microRNAs, etc.) is exerted through interaction with other cellular components. The interconnectivity among bio-molecules implies that the relation between the entire set of genes in a cell (genotype) and their physical manifestation (phenotype) is extremely complex, since it is mediated by these complex molecular networks. Network medicine is a recent paradigm that exploits the organizing principles of human cellular networks and links network structures to disease.

From a network medicine perspective, hereditary diseases can be seen as perturbations of “disease modules” in the interactome. An important effort in our lab has been aimed at quantifying similarity between hereditary diseases at the molecular level by bringing together the existing information that is scattered across the vast corpus of biomedical literature. In other words, we obtain a number that accurately quantifies the distance between disease modules in the interactome.
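To give a feel for what a distance between two modules in a network can mean, the sketch below averages, over the proteins in both modules, the shortest-path distance to the closest protein in the other module. This is a generic graph-based illustration under assumed data, with hypothetical protein names; it is not the literature-derived measure developed in our lab.

```python
from collections import deque

# Hypothetical toy interactome: a simple chain p1 - p2 - p3 - p4 - p5.
adjacency = {
    "p1": {"p2"},
    "p2": {"p1", "p3"},
    "p3": {"p2", "p4"},
    "p4": {"p3", "p5"},
    "p5": {"p4"},
}

def bfs_distances(adjacency, source):
    """Shortest-path length (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def mean_closest_distance(adjacency, module_a, module_b):
    """Average, over proteins of both modules, of the distance to the closest
    protein in the other module (infinite if that module is unreachable)."""
    total = 0.0
    for own, other in ((module_a, module_b), (module_b, module_a)):
        for protein in own:
            dist = bfs_distances(adjacency, protein)
            total += min(dist.get(q, float("inf")) for q in other)
    return total / (len(module_a) + len(module_b))

far = mean_closest_distance(adjacency, {"p1", "p2"}, {"p4", "p5"})
near = mean_closest_distance(adjacency, {"p1", "p2"}, {"p2", "p3"})
print(far, near)
```

Modules that overlap or sit next to each other in the network receive a small value, while modules in distant parts of the network receive a large one.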

Quantifying disease similarity at the molecular level enables the transfer of knowledge between similar diseases, providing hypotheses for causal gene discovery and even suggestions for drug repositioning. This is particularly important for hereditary diseases for which no disease gene is currently known – about 30% of them. For these orphan diseases, our measure can help pinpoint the location of their molecular perturbations. Our measure can also be used for differential diagnosis, aiding medical practitioners in identifying putative alternative diagnoses that are obscured by the complexity and multiplicity of the symptoms. Importantly, we have shown that our measure can be used effectively in the prediction of candidate disease genes.

In our lab, research in Network Medicine has been funded by the BBSRC (grants BB/K004131/1, BB/F00964X/1 and BB/M025047/1).