Back in June, John Greally, a biologist at the Albert Einstein College of Medicine, wrote a frustrating Nature commentary on the ENCODE project in which he repeatedly and wrongly suggested that before ENCODE, biologists paid attention only to genes and ignored regulatory sequences:
"We usually think of the functional sequences in the genome solely in terms of genes, the sequences transcribed to messenger RNA to generate proteins."
"Now... the ENCODE Project Consortium shows... that the humble, unpretentious non-gene sequences have essential regulatory roles."
"...The researchers of the ENCODE consortium found that non-gene sequences have essential regulatory functions, and thus cannot be ignored."
Now, Greally spreads more confusion on NPR's Science Friday (hear it here at Sandwalk, where Larry Moran is as stunned as I am) by continuing to act as if, before ENCODE, we had no idea that regulatory sequences were so prevalent or important. In the interview, Greally even goes so far as to suggest (and lets the interviewer suggest, without correction or clarification) that 95% of the genome consists of regulatory sequences. Nor does he correct a caller who claimed that scientists were ridiculous ever to have proposed the idea of junk DNA.
The ENCODE project showed no such thing, and it wasn't the huge breakthrough in our understanding of non-coding DNA that Greally makes it out to be. Non-coding regulatory sequences have been intensely studied, including in large-scale experimental and computational surveys of organisms from yeast to humans. These sequences have not been ignored; many labs have put a lot of effort into identifying and understanding them.
Nor did the ENCODE project bury the idea of junk DNA. For example, about 10% of the human genome consists of more than a million copies of parasitic stretches of DNA called Alu elements. (A search on Google Scholar will turn up free copies of this paper.) Alu elements can, on occasion, generate beneficial and novel genomic diversity, but most copies are non-functional and unable to replicate; in other words, they are junk. In fact, Alu insertions can cause disease: as these elements hop around the genome, they occasionally break something.
There are many, many other examples of this kind of junk; it's not all poorly understood 'dark matter' of the human genome, as Greally suggests in the NPR interview. When we have finally identified all of the regulatory sequences, I predict that the total amount of functional regulatory sequence will still be much less than the roughly 45% of the genome made up of parasitic transposable elements such as LINEs and SINEs.