3D Genome Analysis

The different types of cells in the human body have an identical 1D genome, i.e., a linear sequence of nucleotides, yet their genomes have different underlying 3D architectures. Their different ways of packing DNA molecules into cell nuclei lead to different arrangements of genomic elements in 3D, and such layouts play a central role in gene regulation and cell fate determination.

Over the last decade, genome-wide ligation-based assays such as Hi-C have provided an unprecedented opportunity to investigate the 3D organization of the genome. Results of a typical Hi-C experiment are summarized by a chromosomal contact map, a matrix whose elements reflect the population-averaged co-location frequencies of genomic loci, which can be viewed as a measurement of the spatial proximity between genomic loci.
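As a concrete picture of the data structure, the sketch below builds a toy contact map by binning pairs of genomic positions and counting how often each pair of bins is observed together. The function name `build_contact_map`, the bin size, and the positions are hypothetical values chosen for illustration. Real Hi-C pipelines add read mapping, filtering, and matrix balancing, none of which is shown here.

```python
def build_contact_map(pairs, n_bins, bin_size):
    """Count position pairs per bin pair and return a symmetric matrix."""
    contact = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        contact[i][j] += 1
        if i != j:
            # Contacts are unordered, so mirror off-diagonal counts.
            contact[j][i] += 1
    return contact


# Hypothetical positions (in base pairs) on a single toy chromosome.
toy_pairs = [(100, 250), (120, 900), (300, 350), (850, 150), (500, 520)]
toy_map = build_contact_map(toy_pairs, n_bins=4, bin_size=250)

for row in toy_map:
    print(row)
```

Each entry `toy_map[i][j]` is the number of observed contacts between bins `i` and `j`; in a real experiment these counts are aggregated over a population of cells.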

We observe that two distinct components contribute to the overall contact frequency between a pair of genes in the contact map. The first is determined by their genomic distance, i.e., their separation along the 1D DNA strand on which they are positioned sequentially. The second depends on the cell-specific arrangement of the genes in 3D. Since all human cells share an identical 1D genome, it is the second component that plays a role in gene regulation.
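One simple way to see the two components is the widely used observed/expected normalization: the average contact frequency at each genomic distance serves as an estimate of the 1D component, and the ratio of each observed entry to that average highlights contacts that are stronger or weaker than distance alone would predict. The sketch below illustrates this generic idea on a made-up matrix. It is not the network-based framework described in this text, and the function names and toy values are assumptions introduced for the example.

```python
def distance_expected(contact):
    """Mean contact frequency at each genomic distance (matrix diagonal)."""
    n = len(contact)
    expected = []
    for d in range(n):
        diagonal = [contact[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return expected


def observed_over_expected(contact):
    """Divide each entry by the mean frequency at its genomic distance."""
    n = len(contact)
    expected = distance_expected(contact)
    ratio = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            e = expected[abs(i - j)]
            # Leave the entry at 0.0 when no contacts exist at this distance.
            ratio[i][j] = contact[i][j] / e if e > 0 else 0.0
    return ratio


# Made-up symmetric contact map over five equally sized bins.
toy_contact = [
    [10, 4, 2, 3, 1],
    [4, 10, 4, 2, 1],
    [2, 4, 10, 4, 2],
    [3, 2, 4, 10, 4],
    [1, 1, 2, 4, 10],
]

toy_expected = distance_expected(toy_contact)
toy_ratio = observed_over_expected(toy_contact)
print(toy_expected)
print(toy_ratio[0][3], toy_ratio[1][4])
```

In this toy matrix, bins 0 and 3 and bins 1 and 4 are separated by the same genomic distance, yet the first pair is in contact more often than the distance average and the second pair less often. A ratio above 1 therefore points to proximity that the 1D component does not explain.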

We developed a network-based framework that effectively extracts the 3D component of the gene proximity signal. We show that this component can be used for in-depth analysis of the interplay between the spatial positioning of genes and their regulation in different human cells, and that this interplay is consistently easier to detect and quantify with the extracted component than with the contact frequency obtained directly from the Hi-C data. In other words, our procedure can be thought of as a de-noising step that extracts the 3D component from the mixture of 1D and 3D signal components that constitutes the experimental Hi-C data.