A network approach to integrate different -omics data

My daily commute to uni takes approximately 1 hour, so I try to use this time efficiently by reading literature relevant to my PhD project. Inspired by a 2010 paper by Kuchel et al., I developed a network approach to integrate different -omics data, such as gene expression data (transcriptomics) and NMR and/or MS data (metabolomics). My initial approach is outlined in this blog post.

Creating a differential gene co-expression network

Suppose you have gene expression data from a disease group and a healthy control group, and you create a gene co-expression network in which nodes represent genes and edges represent high absolute correlation values. Edges are drawn in green if the two gene profiles are highly correlated only in the control group, in red if they are highly correlated only in the disease group, and in black if they are highly correlated in both the control and the disease group.
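To make the construction concrete, here is a minimal sketch of the first step: building one co-expression edge set per group. It assumes Pearson correlation and an arbitrary cut-off of |r| ≥ 0.9; the post does not state which correlation measure or threshold was used, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as 0
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.9):
    """Gene pairs whose absolute correlation reaches the threshold in one group.

    expr maps gene name -> expression values (one value per sample).
    Each edge is a frozenset, so {g1, g2} and {g2, g1} are the same edge.
    The threshold of 0.9 is an assumption, not a value from the post.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add(frozenset((g1, g2)))
    return edges


# Hypothetical toy data: 3 genes measured in 5 samples per group.
control = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10], "g3": [5, 3, 4, 1, 2]}
disease = {"g1": [1, 2, 3, 4, 5], "g2": [5, 3, 4, 1, 2], "g3": [2, 4, 6, 8, 10]}

C = coexpression_edges(control)  # contains only the edge g1-g2
D = coexpression_edges(disease)  # contains only the edge g1-g3
```

Because the absolute value is thresholded, strongly anti-correlated profiles also produce an edge; the pairs above with r = -0.8 are dropped only because they fall below the chosen cut-off.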

In theory, this differential gene co-expression network encodes three types of gene relation, which can be described by the following Boolean expressions (C: co-expressed in the control group; D: co-expressed in the disease group):

  1. C ∧ ¬D (co-expressed in control and NOT in disease) (= green edges)
  2. ¬C ∧ D (co-expressed in disease and NOT in control) (= red edges)
  3. C ∧ D (co-expressed in control AND in disease) (= black edges)
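The three Boolean expressions map directly onto set operations if each group's co-expressed gene pairs are stored as a set of edges. The sketch below is a hypothetical helper illustrating that mapping; the edge sets are made-up examples, not data from the post.

```python
def classify_edges(C, D):
    """Colour the edges of a differential co-expression network.

    C and D are sets of edges (frozensets of two gene names) that are
    co-expressed in the control and disease group, respectively.
    """
    colours = {}
    for edge in C - D:  # C AND NOT D
        colours[edge] = "green"
    for edge in D - C:  # NOT C AND D
        colours[edge] = "red"
    for edge in C & D:  # C AND D
        colours[edge] = "black"
    return colours  # pairs in neither set (NOT C AND NOT D) get no edge


# Hypothetical edge sets for four genes.
C = {frozenset(("g1", "g2")), frozenset(("g2", "g3"))}
D = {frozenset(("g2", "g3")), frozenset(("g3", "g4"))}
colours = classify_edges(C, D)
# g1-g2 -> "green", g3-g4 -> "red", g2-g3 -> "black"
```

The fourth combination, ¬C ∧ ¬D, corresponds to gene pairs that are not drawn at all, which is why the network encodes exactly three relations.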

Such a network for hepatocellular carcinoma (HCC) is depicted in Fig. 1 (left panel). Interestingly, the differential co-expression networks constructed for HCC and colorectal cancer (CRC) show a modular structure, i.e., they contain highly connected subnetworks whose edges mostly share a single color (green, red or black). This indicates that each phenotype is associated with a specific and distinct co-expression pattern. Mathematically, this modular structure can be captured by hierarchical clustering of the network's topological overlap matrix (TOM), as described previously by Dong et al. (Fig. 1, right panel). The clustering process is outlined for a toy network in Fig. 2.
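As an illustration of the clustering step, the sketch below computes the commonly used topological overlap for an unweighted network, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), and then runs average-linkage clustering on the dissimilarity 1 - TOM. Several details are assumptions on my part: that edge colours are ignored when computing the TOM, that this formula matches the variant used by Dong et al., and the cut height of 0.5. In practice a library implementation (for example the WGCNA R package or SciPy's hierarchical clustering) would be the usual choice; this stdlib-only version is just a readable sketch.

```python
def topological_overlap(nodes, edges):
    """Topological overlap matrix (TOM) of an unweighted, undirected network.

    TOM[i, j] = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a_ij is 1
    if i and j are adjacent, l_ij is the number of shared neighbours and k_i
    is the degree of node i. The diagonal is set to 1.
    """
    nbrs = {n: set() for n in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    tom = {}
    for i in nodes:
        for j in nodes:
            if i == j:
                tom[i, j] = 1.0
                continue
            a = 1 if j in nbrs[i] else 0
            shared = len(nbrs[i] & nbrs[j])
            tom[i, j] = (shared + a) / (min(len(nbrs[i]), len(nbrs[j])) + 1 - a)
    return tom


def average_linkage(nodes, dist, cut):
    """Agglomerative clustering (average linkage), stopped at height `cut`."""
    clusters = [[n] for n in nodes]

    def mean_dist(a, b):
        return sum(dist[x, y] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Closest pair of clusters; ties are broken by list position.
        d, i, j = min(
            (mean_dist(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if d >= cut:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


# Hypothetical toy network: two triangles joined by the single edge c-x.
nodes = ["a", "b", "c", "x", "y", "z"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"), ("c", "x")]
tom = topological_overlap(nodes, edges)
dissimilarity = {pair: 1 - value for pair, value in tom.items()}
modules = average_linkage(nodes, dissimilarity, cut=0.5)
# modules == [["a", "b", "c"], ["x", "y", "z"]]
```

The bridging edge c-x only gives c and x a TOM of 1/3, because they share no neighbours, so the two triangles end up in separate modules even though they are directly connected.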

Adding metabolite information

In the next step, I tried to extend the differential gene co-expression network by incorporating gene-metabolite information. In the toy network, each metabolite is represented by a square node, and an edge is drawn to a circular node (= gene) if and only if there is a relationship between the metabolite and the gene (or gene product). The definition of a gene-metabolite relationship can vary. In a strict sense, it may be defined as a substrate-enzyme relationship. In a broader sense, it may be defined as the involvement of a metabolite and a gene (or gene product) in the same pathway. My toy network is based on the gene-metabolite relationships defined in Fig. 3.
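The two definitions of a gene-metabolite relationship lead to different edge sets. A minimal sketch, assuming pathway membership is available as a simple mapping: under the strict definition, substrate-enzyme pairs are used directly as edges, while under the broad definition every metabolite is linked to every gene in each pathway it shares. The helper name, pathway names and memberships are hypothetical and are not the relationships shown in Fig. 3.

```python
def pathway_edges(pathways):
    """Broad definition: link each metabolite to every gene in a shared pathway.

    pathways maps a pathway name -> (set of metabolites, set of genes).
    """
    edges = set()
    for metabolites, genes in pathways.values():
        for m in metabolites:
            for g in genes:
                edges.add((m, g))
    return edges


# Strict definition: substrate-enzyme pairs are used directly as edges.
# Hypothetical example: metabolite mA is a substrate of the enzyme encoded by g1a.
strict_edges = {("mA", "g1a")}

# Hypothetical pathway memberships (not the ones in Fig. 3).
pathways = {
    "pathway_1": ({"mA"}, {"g1a", "g1b"}),
    "pathway_2": ({"mA", "mB"}, {"g2a"}),
}
broad_edges = pathway_edges(pathways)
# broad_edges == {("mA", "g1a"), ("mA", "g1b"), ("mA", "g2a"), ("mB", "g2a")}
```

The broad definition produces a denser network, which tends to spread a metabolite's edges over more clusters; the strict definition is sparser but relies on detailed enzyme annotations.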

Assigning metabolite states to gene co-expression profiles

Once the metabolite-gene relationships are established, I analysed the neighbourhood of each metabolite. Specifically, I looked at the cluster membership of all direct neighbours of a metabolite node. Node mD, for example, is connected to the nodes g3a and g3b, which belong to the control group cluster (= brown cluster). Thus, metabolite mD is associated with the control group and not with the disease group. The association of a metabolite node with a cluster is calculated with a transitivity score (related to the clustering coefficient): the number of edges that connect the metabolite node with nodes of the cluster, normalised by the maximal possible number of such edges, i.e., the number of genes in the cluster.
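The association score described above can be sketched in a few lines. This assumes a metabolite has at most one edge to each gene, so the maximal possible number of edges to a cluster equals the cluster size. The link from mD to g3a and g3b follows the text; the assumption that the brown cluster contains only these two genes, the metabolite mX and the "turquoise" cluster are hypothetical.

```python
def cluster_association(neighbours, clusters):
    """Transitivity-style score for every metabolite-cluster pair.

    neighbours maps metabolite -> set of directly linked genes;
    clusters maps cluster name -> set of member genes.
    score = edges from the metabolite into the cluster / cluster size,
    since a metabolite can have at most one edge to each gene.
    """
    scores = {}
    for m, genes in neighbours.items():
        for name, members in clusters.items():
            hits = len(genes & members)
            if hits:  # only report clusters the metabolite touches
                scores[m, name] = hits / len(members)
    return scores


# mD -> g3a, g3b comes from the text; the cluster contents and mX are hypothetical.
neighbours = {"mD": {"g3a", "g3b"}, "mX": {"g1a"}}
clusters = {"brown": {"g3a", "g3b"}, "turquoise": {"g1a", "g1b"}}
scores = cluster_association(neighbours, clusters)
# scores == {("mD", "brown"): 1.0, ("mX", "turquoise"): 0.5}
```

A score of 1.0 means the metabolite is linked to every gene in the cluster; mX reaches only half of its (hypothetical) cluster and therefore gets 0.5.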

In the end, each metabolite can be associated with gene clusters and phenotypes, as listed in Table 1. The transitivity value (one for every metabolite in this example) decreases when the number of edges to a cluster is below the maximum, so it can be seen as a measure of the strength of the cluster-metabolite association.
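To get from cluster scores to phenotypes, each cluster needs a phenotype label. One plausible way, and it is my assumption rather than a documented step of the method, is to label a cluster by the most frequent edge colour among its internal edges (green: control, red: disease, black: both) and then list each metabolite with its cluster, phenotype and transitivity, similar in layout to Table 1. The helper names and inputs are hypothetical.

```python
from collections import Counter

# Assumed mapping from dominant edge colour to phenotype (see the Boolean list).
PHENOTYPE = {"green": "control", "red": "disease", "black": "control and disease"}


def cluster_phenotype(members, edge_colours):
    """Label a cluster by the most frequent colour among its internal edges."""
    counts = Counter(
        colour
        for (u, v), colour in edge_colours.items()
        if u in members and v in members
    )
    if not counts:
        return "unknown"  # cluster without internal edges
    colour, _ = counts.most_common(1)[0]
    return PHENOTYPE[colour]


def association_table(scores, clusters, edge_colours):
    """Rows of (metabolite, cluster, phenotype, transitivity), one per score."""
    return [
        (m, name, cluster_phenotype(clusters[name], edge_colours), score)
        for (m, name), score in sorted(scores.items())
    ]


# Hypothetical inputs: one green (control-only) edge inside the brown cluster.
clusters = {"brown": {"g3a", "g3b"}}
edge_colours = {("g3a", "g3b"): "green"}
scores = {("mD", "brown"): 1.0}
table = association_table(scores, clusters, edge_colours)
# table == [("mD", "brown", "control", 1.0)]
```

Using the dominant colour relies on the modular structure described earlier, where each subnetwork is made up mostly of edges of a single colour; for clusters with mixed colours the label would be less reliable.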

 
