You probably know the feeling of opening a jigsaw puzzle box, seeing all the pieces in front of you, and knowing from the cover what the whole picture will look like, but not knowing where the individual pieces belong. You might guess that the blue pieces belong to the sky and the red pieces to the house, but you don't quite know the details. This is how researchers who sequence human genomes feel when they analyze their sequencing data. With the latest technologies, they are able to sequence all 3.2 billion bases of the human genome. The first milestone was the completion of the Human Genome Project in 2003: from the 3.2 billion bases, researchers were able to deduce around 20,000 genes, which make up less than 2% of the genome. This was a surprising finding, since it left more than 98% of the sequence without any obvious function. In terms of the jigsaw puzzle, it would mean that only 20 pieces out of, say, 1,000 have colors that distinguish them from the rest of the puzzle. Because it lacked a known function, this vast majority of the genomic sequence was termed “junk DNA”. Recent decades of sequencing and of research in genomics and genetics have shown, however, that the junk is nowhere close to being junk.
One important function is buffering. Environmental factors constantly damage the DNA in our cells. UV-A and UV-B radiation in strong sunlight can cause as many as 100,000 lesions per hour in an exposed cell. A single such lesion, if it disrupts the sequence of an important gene, can be enough to turn the cell cancerous. If our genomes were end-to-end arrays of genes, we would probably have ceased to exist long ago. If most of our DNA were functional, we would acquire large numbers of harmful and lethal mutations, which we would pass on to our offspring. As it is, most lesions hit the junk DNA, where they are largely irrelevant, and this reduces the likelihood that DNA damage strikes an actual gene. Large stretches of non-coding DNA, which do not encode proteins, thus provide natural buffering.
But buffering is far from the only function of non-coding DNA. Researchers have found that many such regions are transcribed into non-coding RNAs that regulate gene expression and protein function. For example, some determine whether a protein is produced from a gene and, if so, how much. Other non-coding RNAs regulate the spatial structure of the genome within the cell, thereby controlling how accessible the DNA is to proteins. The 3.2 billion bases occupy a lot of space: stretched out, they would measure about one meter, and most cells carry two copies. They therefore need to be tightly packed into the tiny nucleus of a cell. Despite this packing, the genes need to remain accessible to the cellular machinery.
The largest portion of the junk DNA consists of very ancient mobile DNA elements called retrotransposons. These elements can move within the genome and increase in number. They may have supplied sequence variation during mammalian evolution, but organisms have evolved strategies to suppress their mobility. Beyond these elements, researchers have named more than 20 classes of non-coding RNAs and identified thousands upon thousands of such sequences, all of which have specific functions and may be present only in specific body tissues.
In many cases, biological function is detected indirectly, through malfunction. Many non-coding RNAs have been shown to be misregulated in cancer, which highlights their importance for proper cell function. Other findings point to an involvement in autism and Alzheimer’s disease.
By combining all of this knowledge, researchers are steadily assembling the junk jigsaw of the human genome. Every piece has its place.