CGP: The Corrected Gene Proximity map for analyzing the 3D genome organization using Hi-C data
Network-based algorithms have been widely used to model high-dimensional biological data in a variety of contexts. We propose a novel graph-theoretical framework, the Corrected Gene Proximity (CGP) map, to study how the 3D spatial organization of genes affects transcriptional regulation. The starting point of the CGP map is a weighted network, the gene proximity map, whose edge weights are derived from the contact frequencies in genome-wide Hi-C data. We then develop a matrix algorithm that “corrects” the gene proximity map by removing the signal contributed by the 1D genomic distance between genes. The corrected map therefore captures exclusively the cell-specific 3D arrangement of genes. By integrating Hi-C and RNA-seq data from a variety of human cell lines, we show that, compared with the raw contact frequencies, the CGP map more effectively detects and quantifies the extent to which co-expressed genes are tightly clustered. Analyzing the expression patterns of metabolic pathways in two hematopoietic cell lines, we find that the relative positioning of genes, as captured and quantified by the CGP map, is highly correlated with their changes in expression. We further show that the CGP map can be renormalized to form an inter-chromosomal proximity map, allowing large-scale abnormalities such as chromosomal translocations to be identified. In summary, the flexible graph-based formalism of the CGP map can easily be generalized to any existing Hi-C dataset, enriching our understanding and interpretation of human 3D genome architecture.
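The abstract does not spell out the CGP correction itself. For orientation only, the sketch below shows a widely used, simpler way to remove the 1D genomic-distance signal from Hi-C contacts: observed/expected normalization, where each gene-pair contact is divided by the mean contact of all pairs at the same (binned) genomic separation. This is a swapped-in standard technique, not the CGP matrix algorithm. The function name `observed_over_expected`, the `bin_size` parameter, and the assumption that all genes lie on one chromosome with known start positions are hypothetical choices for illustration.

```python
import numpy as np


def observed_over_expected(contacts: np.ndarray, positions: np.ndarray, bin_size: int) -> np.ndarray:
    """Divide each gene-pair contact by the mean contact at the same genomic separation.

    Hypothetical illustration, not the CGP algorithm. Assumes:
      contacts  - symmetric (n x n) Hi-C contact matrix between n genes on one chromosome
      positions - genomic start coordinate (bp) of each gene, length n
      bin_size  - width (bp) of the separation bins used to pool "same-distance" pairs
    """
    # Genomic separation between every pair of genes, expressed in separation bins.
    sep = np.abs(positions[:, None] - positions[None, :]) // bin_size
    corrected = np.zeros_like(contacts, dtype=float)
    for d in np.unique(sep):
        mask = sep == d
        # Expected contact for this separation: mean over all pairs at distance d.
        expected = contacts[mask].mean()
        if expected > 0:
            # Values > 1 mean "closer in 3D than the 1D distance alone predicts".
            corrected[mask] = contacts[mask] / expected
    return corrected


# Usage: five evenly spaced genes whose contacts decay purely with 1D distance,
# except that genes 0 and 1 contact each other twice as often as expected.
pos = np.arange(5) * 10_000
sep = np.abs(pos[:, None] - pos[None, :]) // 10_000
raw = 1.0 / (1 + sep)
raw[0, 1] = raw[1, 0] = 2 * raw[0, 1]
print(observed_over_expected(raw, pos, 10_000).round(2))
```

Pooling by separation bin is what removes the distance-decay trend; whatever remains above or below 1 reflects pair-specific 3D proximity, which is the kind of signal the corrected map is meant to isolate.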