Monday, July 09, 2007

More Confusion about Junk DNA and Regulatory Sequences

Back in June, John Greally, a biologist at Albert Einstein, wrote a frustrating Nature commentary on the ENCODE project in which he repeatedly and wrongly suggested that before ENCODE, biologists were paying attention only to genes and ignoring regulatory sequences:

"We usually think of the functional sequences in the genome solely in terms of genes, the sequences transcribed to messenger RNA to generate proteins."

"Now... the ENCODE Project Consortium shows... that the humble, unpretentious non-gene sequences have essential regulatory roles."

"...The researchers of the ENCODE consortium found that non-gene sequences have essential regulatory functions, and thus cannot be ignored."

Now, Greally spreads more confusion on NPR's Science Friday (hear it here at Sandwalk, where Larry Moran is as stunned as I am) by continuing to act as if we had no idea before ENCODE that regulatory sequences were so prevalent or important. In the interview, Greally even goes so far as to suggest (and allow the interviewer to suggest, without correction or clarification) that 95% of the genome consists of regulatory sequences. Nor does he correct a caller on the show who claimed that scientists were ridiculous to even suggest the idea of junk DNA.

The ENCODE project showed no such thing, and it wasn't the huge breakthrough in our understanding of non-coding DNA that Greally is hyping it to be. Non-coding regulatory sequences have been intensely studied, including in large-scale experimental and computational surveys of organisms from yeast to humans. These sequences have not been ignored; many labs have put a lot of effort into identifying and understanding them.

Nor did the ENCODE project bury the idea of junk DNA. For example, 10% of the human genome consists of hundreds of thousands of copies of parasitic stretches of DNA called Alu elements. (A search on Google Scholar will turn up free copies of this paper.) Alu elements can, on occasion, generate beneficial and novel genomic diversity, but most copies of Alu elements are non-functional and unable to replicate - in other words, junk. In fact, Alu insertions can cause disease - as they hop around the genome, they occasionally break something.

There are many, many other examples of this kind of junk; it's not all poorly understood 'dark matter' of the human genome, as Greally suggests in the NPR interview. When we have finally identified all of the regulatory sequences, I predict that the total amount of functional regulatory sequence will still be much less than the 45% of the genome made up of the parasitic LINE and SINE transposable elements.


CedricF said...

I am in general agreement with your post and I am glad that somebody in the Science blogosphere is posting about it. I want to comment on two points that I think deserve some further clarification/discussion.

First, the contribution of Alu or other repeats to beneficial vs. neutral vs. deleterious effects on the host genome remains largely unknown. A lot has been written on the topic, especially about Alus. One of the most intriguing aspects is the observation that the genomic distribution of Alus shifts over time toward gene-rich regions of the genome, a shift that is not observed for other human TEs. This observation remains open to different interpretations, including the hypothesis that Alus become functionalized with time or 'exapted'. A similar shift has been observed for other SINEs in the mouse genome, consistent with an old idea put forward by Carl Schmid that SINEs may have a general physiological function, perhaps in gene regulation under stress conditions (see work in the 90's in the Schmid lab). What seems clear though is that TEs are not proliferating because they provide an immediate advantage to their host, but simply because they can replicate faster.

Second, it is inaccurate to say that LINEs and SINEs make up 45% of the human genome. The most recent estimate is around 35% (see mouse genome paper in Nature by Waterston et al 2002). Another 10-15% of the genome is made of different types of TEs: nearly 10% is derived from retroviral-like elements, and at least 3% from DNA transposons. Merely a technical comment I guess, but there is a trend to over-emphasize the impact of LINEs and SINEs and ignore other kinds of mobile elements present in mammalian genomes.

Unknown said...

Thanks for your comments and for visiting the site. I should have been more careful on my LINEs/SINEs percentage (and I even had Kazazian's nice Science review of the topic on my desktop - ugh!). My specialty's the yeast genome; I don't keep up with the latest stats on the human genome as well as I should.

The evolution of regulatory elements, including via TEs, is an interesting (and thorny) topic. It's true that with some families of TEs the story looks like it is more complex (and fascinating) than simple parasitism, and thus TEs are an active, fruitful area of genome research.

I guess that my main point on the terminology is that the term 'junk' is not (or should not be) an ignorance-covering, catch-all term used to describe non-coding DNA that we know nothing about. It is a reasonably appropriate term to describe most of the known parasitic elements that make up huge chunks of the genomes of multicellular organisms. In some cases specific elements, or as you point out, entire families of elements, may play some functional role in the organism, and in those cases the name 'junk' probably shouldn't apply anymore.

CedricF said...

Yes you're totally right about the term 'junk DNA'... Some would even discourage the use of the term 'junk DNA' to designate TEs at all... Ryan Gregory at Genomicron has some good posts about this.

While I am not completely happy about the way the term is being used and abused these days in the press (popular and scientific alike), I find myself guilty of using it all the time, especially when I try to explain what my research is about.

You may or may not know, but I actually work on TEs and I just started a blog called Mobile DNA. I'll try to post regularly or semi-regularly on the topic. Check it out!

Unknown said...

I've added your blog to my RSS folder - I look forward to checking in over there regularly. Good luck with the blog. That's a great topic to blog about, with so many papers coming out basically weekly on the subject.