Adaptive Complexity: 2007

Friday, December 21, 2007

What's The Matter With Texas? Creationism On Its Way Back

Is the State of Texas about to offer Master of Science degrees in creationism? The Institute for Creation Research (ICR), an organization that officially believes the earth sprang into existence less than 10,000 years ago, has applied to offer a state-approved Master's program in science education. Last week, an official advisory committee recommended that the State of Texas approve the ICR's request to offer Master's degrees (read about it here and here). If this request is granted, the ICR has two years in which it can offer state-approved Master's degrees while seeking accreditation for its program from a recognized, outside accreditation organization. Coming on the heels of news that one of the state's science education officials was forced out of her job because she was not "neutral" about standing up for evolution education, this latest event suggests that creationism is about to again become a big issue in the Texas educational system.

It is one thing for a private organization to teach whatever outlandish beliefs it prefers, and for students to attend non-accredited colleges - it's their educational choice, and no big loss to the rest of the educational system. But it's another issue altogether for the state to give its imprimatur to such an organization when it intends to train science teachers who will then go out and work such sectarian and unscientific beliefs into public school science classes.

Is it acceptable to accredit a science education program that teaches science students that they can build perpetual motion machines that violate the laws of thermodynamics, that matter is not made up of atoms, or that diseases are caused by 'humours' and not germs? Of course not, and by the same token, it is wrong to give state approval to a Master's program that teaches future science educators that the earth suddenly appeared less than 10,000 years ago, and that today's living species did not descend from a common set of ancestors.

The ICR is free to preach whatever it likes, but it should not be allowed to dress sectarian beliefs up as science and use them to train science teachers who will be hired by the state to teach in public schools. The stuff that institutions like the ICR and the Discovery Institute peddle is not science. Its advocates repeatedly exhibit an extremely poor grasp of the scientific theories they are supposedly critiquing, and in their criticisms they continue to make basic errors about the actual technical content of mainstream science. They love to scour scientific journals and highlight material that they (wrongly) believe undercuts evolution, yet tellingly, they never actually participate in such research themselves. (If Intelligent Design advocate Michael Behe is so convinced that pseudogenes are the product of an intelligent designer and not the accidents of evolution, why isn't he doing any research himself to look for functional pseudogenes?)

The quality of science education in US schools is falling behind that of other industrialized nations, and here we are, taking actions to officially support organizations that are devoting millions of dollars to undercut good science education. Southern states are trying hard to attract scientific talent and biotech dollars to their states, but people who are able to perform top science are generally not very eager to move to a state where their kids are likely to be taught some variant of creationism at school. Texas officials need to show some spine and maintain the integrity of science education in their state.

Tuesday, December 11, 2007

Seeing Human Embryos - Two Different Perspectives

The NY Times today has a profile on Shinya Yamanaka, the senior author on one of the recent papers reporting the creation of pluripotent stem cells by expressing 4 transcription factor genes in adult fibroblasts. Yamanaka (who "is known on campus for refusing to join colleagues for lunch, choosing to eat by himself so he can keep working" - unfortunate, since informal conversations with colleagues can really be quite useful) has without question achieved something significant by successfully creating stem cells, first from mouse and then from human fibroblasts. However, I take issue with Yamanaka's outlook on the ethics:

"Dr. Yamanaka was an assistant professor of pharmacology doing research involving embryonic stem cells when he made the social call to the clinic about eight years ago. At the friend’s invitation, he looked down the microscope at one of the human embryos stored at the clinic. The glimpse changed his scientific career.

'When I saw the embryo, I suddenly realized there was such a small difference between it and my daughters,' said Dr. Yamanaka, 45, a father of two and now a professor at the Institute for Integrated Cell-Material Sciences at Kyoto University. 'I thought, we can’t keep destroying embryos for our research. There must be another way.'"

You might expect Dr. Yamanaka to be concerned about the much greater destruction of embryos that takes place in fertilization clinics - far more human embryos are destroyed there than in research labs. A petri dish is not exactly the most natural environment for an embryo, and the very act of creating embryos in vitro puts those embryos at risk.

While I'm not belittling Dr. Yamanaka for his views, and realize that he's just giving his impression and not necessarily making a strong ethical statement, I do think we should realize that his reaction is not universal. I've seen human embryos as well - and two of them became my youngest daughters. Several others never made it that far. I don't feel traces of sorrow for those lost embryos.

I suspect most physicians who work in fertility clinics feel the same way. Many, if not most, of the scientists whose views I personally know share this perspective. We're happy for the ones that become human beings, but we don't lose sleep over the millions of little clusters of cells that are lost each year, both in and outside of IVF clinics.

Tuesday, December 04, 2007

Stem Cell Researchers Respond to Bush's Claim

The White House has claimed that Bush's firm stance against embryonic stem cell research helped to stimulate the recent breakthroughs in creating pluripotent cells without destroying embryos.

In an editorial in the Washington Post, stem cell researchers themselves respond.

Tuesday, November 20, 2007

We Can Reprogram Skin Cells Into Stem Cells - So Do We Still Need Embryos?

This month we've witnessed the first-time success of two important stem cell research techniques in primate cells. Both techniques were previously developed in mice, but their success in humans and monkeys is important. Stem cells from cloned embryos have been generated from macaque cells. And now this week, two papers (here and here - this last one is a PDF file) have been published that are reporting that adult human skin cells can be reprogrammed to become stem cells. However, do the results of this week's papers mean that we no longer need to get stem cells from embryos? The answer, for now, is a resounding no - reprogrammed skin cells currently have some serious drawbacks that need to be overcome before they can become worth trying in disease treatments.

How do you reprogram a skin cell (in this case, fibroblasts - a very generic, easily handled cell commonly used in labs) to become a stem cell? The key step is to induce 4 genes that produce important master regulators, the transcription factors named Oct3/4, Sox2, Klf4, and c-Myc. These transcription factors are proteins which switch large sets of genes on or off, thus initiating a cascade of genetic signals that enable the fibroblast to transform itself into a stem cell. It's amazing that just these 4 genes can initiate so many substantial changes, but this is a common phenomenon in biology, found even in yeast.

But does this technique now eliminate the need to take stem cells from embryos? Not yet. Contrary to statements by the White House, this research was not a result of "the president’s drawing of lines on cloning and embryo use." Without research on embryonic stem cells, we would have had a hard time identifying the role of the 4 transcription factors in the first place. And even with no restrictions on embryo use, researchers would have still tried these important experiments. Science at its best attacks problems using a variety of strategies.

And it's not clear that this technique is going to produce better results in the near future. The process of transferring the 4 regulator genes into fibroblasts involves a type of virus (not a health-threatening virus - it's basically a handy lab tool), which deposits multiple copies of these genes almost at random around the genome. These four genes can end up in good places or bad places in the genome. In bad places, stem cell transplants can develop into cancers, with the c-Myc gene being an especially frequent culprit (it's a gene known to be involved in a variety of cancers). In mice, up to 20% of stem cell recipients develop cancer.

So our ability to manipulate these cells for a useful purpose still lags. We can do a somewhat better job manipulating embryonic stem cells, but in general we have a hard time getting any pluripotent stem cells to produce the exact kind of desired differentiated tissue, such as nerve or heart cells. To improve the process we need to keep working on reprogramming, but we also need to study how stem cells actually work in nature - that is, in embryos.

Monday, November 19, 2007

Monkey Stem Cells From Cloned Embryos - Humans Are Next...

Headlines last week reported that researchers successfully produced stem cells from cloned monkey embryos. Using a process that has become almost routine with mice, scientists can now make primate embryonic stem cells that are genetically identical to a given DNA donor. Once we learn to do this in humans, the possibility of stem cell based treatments for heart disease, neurodegeneration, and more will be closer to reality. But in the US and elsewhere, can we develop the political will to let this research move forward?

In case you missed the headlines, the story is this: a group of researchers at the Oregon Health and Science University led by Dr. Shoukhrat Mitalipov used the process of somatic cell nuclear transfer (SCNT) to create cloned Rhesus macaque embryonic stem cells. SCNT involves taking the DNA from an adult skin cell and implanting this DNA into an egg cell from which the DNA has been removed. This DNA transplant procedure can produce an embryo without the need for a fertilized egg - and most importantly, it produces an embryo genetically identical to the DNA donor, which means that potential therapies using SCNT-derived stem cells wouldn't be rejected by a patient's immune system.

SCNT itself is old news; what is impressive in this research is the new ability to do SCNT with primate cells. This breakthrough was enabled by better imaging technology. SCNT is like a heart transplant where you just rip out the old heart with a giant tug, leaving a lot of damage behind. With the new technology, Dr. Mitalipov's team was able to better see what they were doing as they carried out the procedure.

If it can be done in monkeys, it's just a matter of time before it can be done in humans. Back in 2004 the world thought that human stem cells had been created from such cloned embryos, but that research turned out to be faked. The researcher behind those studies, Woo Suk Hwang from Seoul National University, was exposed as a crook and the already controversial image of human stem cell research was tarnished. Because of this history, Nature and Dr. Mitalipov went to great lengths to prove that the macaque embryonic stem cells reported in this paper are real, including requesting genetic confirmation from an independent group.

Where does this research go from here? Scientifically, we're still some way off from using these cells to replace ailing nerve or heart tissue in patients with terminal diseases. The many political and legal hurdles are even more formidable in the immediate future. To take just one example, at SCNT's current success rates you need hundreds of eggs from donors before you get just one successful embryo. Egg donation is not a pleasant process, and laws in some states prevent researchers from financially compensating donors, unlike other types of research volunteers who are paid for their time and trouble. Success rates will no doubt improve as researchers learn important tweaks in the procedure, but that can't be done without lots of trying. If the US doesn't develop the political will to make it possible for stem cell researchers to do their work, future treatments will be developed elsewhere and the US will lack the expertise in both research and patient treatment. The macaque stem cell paper indicates that the day when we will know how to use embryonic stem cells for medical treatments looms ahead, and unless we make changes in this country, we won't benefit from them.

Thursday, November 15, 2007

Making a Weed that Eats Explosives

RDX is a common military explosive, and it’s dangerous not just because it explodes - it’s also toxic. Places where RDX is used, produced, or stored often present a serious hazardous waste problem, such as at the Massachusetts Military Reservation on Cape Cod, where the local aquifer has been contaminated with RDX. A group of researchers from the University of York in the UK and Canada’s Biotechnology Research Institute have shown how it might be possible to clean up RDX with explosives-eating transgenic plants.

Plants that use man-made explosives as a nitrogen source are not so easy to find in nature. In this case, the researchers genetically engineered two key bacterial proteins into the thale cress Arabidopsis thaliana. Plants that produce these proteins are able to take up RDX from soil and metabolize it. But how did these researchers find such useful bacterial proteins?

The answer is that they searched in the obvious place - contaminated soil where bacteria that metabolize explosives are likely to evolve. Bacteria reproduce extremely quickly and live in fairly large populations, thus it is not uncommon for a bacterium with just the right mutation to turn up and outcompete its peers in a new environment. The researchers found a bacterial strain of R. rhodochrous that was able to use RDX as its sole nitrogen source. Explosives are nitrogen-rich compounds, so this bacterial strain has evolved to take advantage of an abundant nutrient in its environment.

And it turns out that not many mutational changes were required to make this strain of R. rhodochrous an explosives-eater. All organisms contain many versions of a versatile class of electron-shuttling proteins called cytochrome P450s. This class of proteins is involved in some of the most sophisticated chemistry in the cell, including the synthesis of steroids and various vitamins. These cytochromes have been tweaked by evolution to perform an astounding range of chemical jobs. The explosives-degrading R. rhodochrous harbors a variant cytochrome P450 that is able to break an N-NO2 bond, something rarely encountered in nature but present in RDX.

Once they had the bacterial gene in hand, the researchers attempted to express it in the Arabidopsis plant. Since the plant itself has plenty of cytochrome P450s, it was easily able to properly synthesize the bacterial version in quantities large enough to start metabolizing RDX from the soil. This was a straightforward use of evolution and biochemistry to create a useful genetically engineered organism. It's possible that this approach could be successful for a range of environmental problems. Given the chemical sophistication and diversity of the metabolic pathways found in nature, the chances of finding a solution to other pollution problems like this one are good.

Wednesday, November 14, 2007

Intelligent Design's Day in Court on NOVA

Last night the PBS series NOVA featured a two-hour show on the 2005 Dover, PA Intelligent Design trial. If you missed it, go check out clips and some great evolution resources at the show's website. As a creation/evolution junkie, I had previously read all of the trial transcripts, but reading transcripts was no substitute for seeing and hearing the major participants on camera. And while the big players from the Discovery Institute refused to be interviewed, NOVA managed to get just about everyone else on camera, including one of the defense's expert witnesses and the two ex-school board members who started the whole mess. These guys made it abundantly clear in their own words that intelligent design in Dover was not about improving science education - it was all about pushing creationism among the students.

The show was generally well done, in spite of some badly acted (not to mention tacky) courtroom reenactments. There were excellently illustrated segments discussing science's success in some areas where intelligent design advocates have claimed there are problems, such as transitional fossils and the bacterial flagellum. I was happy to see that the show also spent time discussing how the development of genetics, and much later, genomics, were big tests of evolution. Darwin's ideas enabled scientists to make predictions that were borne out decades later in scientific fields that Darwin knew nothing about.

The most entertaining part of the show was the interviews with the two ex-school board members (who were caught lying during the trial) and a local Dover pastor. The founders of intelligent design have gone to great lengths to paint their ideas as serious science, not creationism. And yet on the show we hear Dover's local intelligent design advocates explain that they pushed intelligent design in school because they were concerned that the malleable students of Dover were having their good Christian faith weakened by evolution - not because the board members cared about (or knew anything about) a good science education. Alan Bonsell and Bill Buckingham made it repeatedly clear that their main beef with evolution is that it offends their religious beliefs.

Judge John Jones, who presided over the trial, made a sobering statement towards the end of the show. He said that before this case he would never have imagined that he would receive threats after an Establishment Clause case. Jones and his family were placed under the protection of the US Marshals for a time after his ruling, due to threats of physical harm Jones received. Of course I'm biased, but I can't imagine that Jones would have received such threats had he ruled the other way. And while I have no doubt that the overwhelming majority of creationists would never, ever make such threats, much less act on them, the incident does expose the hatred that some people feel towards those who work to keep our science curricula untainted by religious dogma. This was a nasty episode in creation/evolution history, but with luck this trial will produce a lull in the battle for a couple of years. Unfortunately, we all know that this controversy is not over.

Monday, October 01, 2007

Do Universities care about more than image?

"What makes the modern university different from any other corporation?" That's the question asked in the NY Times this week. In light of increasingly unaffordable tuition rates and the profitable but corrupt business of lending huge sums of money to students, do universities deserve their nonprofit status? Is the tight competition for federal grants corrupting the mission of universities?

The Times isn't so sure:

"Driven by big science and global competition, our top universities compete for “market share” and “brand-name positioning,” employ teams of consultants and lobbyists and furnish their campuses with luxuries in order to attract paying “customers” — a word increasingly used as a synonym for students."

I can relate: during my orientation as a new postdoc at Washington University, I was given what basically amounted to a marketing lecture by the HR person: I was told about the ongoing marketing campaign to raise awareness around town about our medical school's national status, and lectured about how every day in my work I should think about how I can please my customers. And just who are my customers? Everyone - my mentor, the grad students in the lab, the dean, and God knows who else.

So ok, my job orientation indoctrination was a dumb triviality, but are big research universities still pursuing the higher mission they were founded on? I think that the answer to a large degree is still yes, but the risk that money concerns are overwhelming more fundamental ones is real. Not all, but most cutting edge scientific research these days requires a lot of money and supportive infrastructure. That also means a lot of 'overhead' funds for the universities that win lots of federal dollars.

Most disturbing is the excessive focus on the university's image, rather than substance, because a good image (including a good US News ranking), as corporate America knows, can often be more effective at attracting customers than a truly substantial product.

Major research universities are a good thing - they produce the best science in the world, and often the best medical care in the world. But they (along with great liberal arts schools) should also produce a top education for their students, who will hopefully in the future produce more great science and medical care. Students who are educated in an environment of excessive concern about image and coddling 'customers' are unlikely to make great future scholars.

The biggest casualty of this trend, besides the decline in the general quality of a college education and college-town streets draped with tacky banners emblazoned with slogans pushing the local school's image, is the eroding status of universities as the best source in our society of excellent, honest scholarship. If we don't pass on how and why we do good scholarship, our culture will forget and be surpassed by others that do care about honestly understanding our world.

Wednesday, September 19, 2007

Evolution's Balancing Act

Evolution carries out an incredibly tricky balancing act: the genetic program of a species has to be resistant to small changes, yet also susceptible to the adaptive remodeling of natural selection. The human genome is so robust that over 6 billion variations give rise to viable organisms that have successfully traversed the complex developmental program that produces a live human infant from a single cell. Yet the human genome is the product of major evolutionary innovation, even over the relatively short period since the human and chimp lineages diverged. How can genomes be robust and malleable at the same time?

A team of researchers led by Andreas Wagner has recently published an interesting theoretical study of this question. This subject can get abstract and tough to follow if you haven’t mastered a lot of technical jargon. However, the results are worth the effort it takes to follow the argument, and below I try to explain this study in a relatively jargon-free way.

So how does evolution maintain both stability and the potential for innovation? It’s much easier to study this question by choosing a small model system to study, such as a transcription factor regulatory network. A transcription factor regulatory network is essentially a set of genes that switch each other on or off (via, of course, the proteins that are encoded by these genes). These networks are the subject of Wagner’s research.

Over time, a stable pattern can emerge in a transcriptional regulatory network, with some genes on and others off. This phenomenon, the emergence of a stable pattern of gene expression, is roughly what happens when a stem cell differentiates into another type, such as a nerve or muscle cell - a certain combination of genes is switched on or off to produce a stable pattern that represents the final state in the fully differentiated cell.

A critical point to understand is that different versions of the transcriptional regulatory network can produce the same final, stable pattern of gene expression. In other words, we can rewire some of the connections in our network, but still get the same final pattern of genes switched on or off.

This is one way that evolution produces networks that are stable to change - small variations don’t radically alter the end result of the transcription factor network. Andreas Wagner and his colleagues used simulations to test just how much rewiring can happen in a network that still produces the same final gene expression pattern. The answer is that you can in fact do a lot of rewiring and still have your network carry out its proper function. Naturally some rewiring is going to produce catastrophic change, but the point is that evolution has many different ways to produce the same final pattern of gene expression. Once nature has hit on a great way to make a stem cell differentiate into a nerve cell, it’s not that difficult to keep making nerve cells, even in the face of significant evolutionary change.
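To make the rewiring idea concrete, here is a minimal sketch of the kind of simulation described above. It is not the authors' code, and every specific in it is my own assumption for illustration: the three-gene network, the starting state, and the simple threshold update rule (each gene switches on or off according to the sign of its summed regulatory input, and keeps its state when the input is zero). The sketch changes one regulatory interaction at a time and counts how many of those rewired networks still settle into the same stable expression pattern.

```python
from itertools import product


def step(weights, state):
    """One synchronous update: each gene follows the sign of its summed regulatory input."""
    new_state = []
    for i, row in enumerate(weights):
        total = sum(w * s for w, s in zip(row, state))
        if total > 0:
            new_state.append(1)          # net activation: gene switches on
        elif total < 0:
            new_state.append(-1)         # net repression: gene switches off
        else:
            new_state.append(state[i])   # no net input: gene keeps its current state
    return tuple(new_state)


def stable_pattern(weights, initial_state, max_steps=50):
    """Return the fixed-point expression pattern, or None if none is reached."""
    state = tuple(initial_state)
    for _ in range(max_steps):
        nxt = step(weights, state)
        if nxt == state:
            return state
        state = nxt
    return None


def single_rewirings(weights):
    """Yield every network that differs from `weights` in exactly one interaction."""
    n = len(weights)
    for i, j in product(range(n), repeat=2):
        for value in (-1, 0, 1):
            if value != weights[i][j]:
                mutant = [list(row) for row in weights]
                mutant[i][j] = value
                yield mutant


# Hypothetical toy network: row i lists the regulatory inputs to gene i
# (+1 = activates, -1 = represses, 0 = no interaction).
REFERENCE = [
    [1, 0, 0],    # gene 0 activates itself
    [1, 0, 0],    # gene 0 activates gene 1
    [-1, 0, 0],   # gene 0 represses gene 2
]
INITIAL = (1, -1, 1)  # hypothetical starting expression state (+1 = on, -1 = off)

target = stable_pattern(REFERENCE, INITIAL)
mutants = list(single_rewirings(REFERENCE))
neutral = [m for m in mutants if stable_pattern(m, INITIAL) == target]

print(f"stable pattern of the reference network: {target}")
print(f"{len(neutral)} of {len(mutants)} single rewirings keep the same pattern")
```

Even in this tiny example, some rewirings leave the final pattern untouched while others break it. A real study would use much larger networks and sample many of them at random.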

Thus this group of researchers has used simulations to come up with a theoretical understanding of how the complex systems in a lineage of organisms remain stable over long periods of evolutionary change. But if these systems are so stable, how does evolution ever produce something new? If it is so stable, how can a transcriptional network which produces a stable pattern of gene expression in a nerve cell ever be rewired to produce an expression pattern for a new cell type?

The answer to this question is paradoxically tied to the very features that make transcriptional networks so stable. Wagner and his colleagues found in their simulations that networks which are structured in very similar ways can be mutated to produce only a few limited, new patterns of gene expression. But we learned just a moment ago that transcriptional networks can be structured in very different ways yet still produce the same final effects; these different networks can evolve in different ways and thus increase the potential for evolutionary innovation.

An example can make this idea clear. Let’s say we have a dozen different but closely related species; each of these species produces its nerve cells using very similar transcriptional regulatory networks. The networks can only be changed in certain, limited ways, thus the potential for evolutionary innovation is limited.

But now let’s say we have a hundred different (more distantly related) species, and they all produce nerve cells (with similar stable patterns of gene expression), but they do so using a wide variety of transcriptional network structures. The pool for evolutionary innovation is now much larger, meaning that it’s much more likely that a new and useful cell variant will evolve.

If all of these hundred species had to produce their nerve cells in exactly the same way, using very similarly structured transcriptional networks, this potential for evolutionary innovation would not be possible. Thus Wagner’s group demonstrated how it’s possible for regulatory networks to be both stable, and yet malleable at the same time.
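The same toy setup can illustrate this last point. Again, this is only a sketch under my own assumptions (a hypothetical three-gene network and a simple sign-threshold update rule), not the model or parameters from the study. It compares the new stable patterns reachable by one rewiring of a single network with those reachable from a pool of differently wired networks that all produce the same pattern.

```python
from itertools import product


def stable_pattern(weights, initial_state, max_steps=50):
    """Iterate the threshold rule until the expression state stops changing."""
    state = tuple(initial_state)
    for _ in range(max_steps):
        nxt = []
        for i, row in enumerate(weights):
            total = sum(w * s for w, s in zip(row, state))
            # Sign of the summed input decides on/off; zero input keeps the old state.
            nxt.append(1 if total > 0 else -1 if total < 0 else state[i])
        nxt = tuple(nxt)
        if nxt == state:
            return state
        state = nxt
    return None  # no fixed point reached (for example, the network oscillates)


def single_rewirings(weights):
    """Yield every network that differs from `weights` in exactly one interaction."""
    n = len(weights)
    for i, j in product(range(n), repeat=2):
        for value in (-1, 0, 1):
            if value != weights[i][j]:
                mutant = [list(row) for row in weights]
                mutant[i][j] = value
                yield mutant


def novel_patterns(weights, initial_state, target):
    """Stable patterns other than `target` that lie one rewiring away from `weights`."""
    found = set()
    for mutant in single_rewirings(weights):
        pattern = stable_pattern(mutant, initial_state)
        if pattern is not None and pattern != target:
            found.add(pattern)
    return found


# Hypothetical toy network and starting state (illustrative values only).
REFERENCE = [[1, 0, 0], [1, 0, 0], [-1, 0, 0]]
INITIAL = (1, -1, 1)
target = stable_pattern(REFERENCE, INITIAL)

# The "species pool": the reference network plus every one-step rewiring
# that still produces the same stable pattern.
pool = [REFERENCE] + [
    m for m in single_rewirings(REFERENCE)
    if stable_pattern(m, INITIAL) == target
]

from_one = novel_patterns(REFERENCE, INITIAL, target)
from_pool = set().union(*(novel_patterns(w, INITIAL, target) for w in pool))

print(f"new patterns reachable from the reference network alone: {sorted(from_one)}")
print(f"new patterns reachable from the whole pool: {sorted(from_pool)}")
```

In this toy case the pool reaches at least one new pattern that the reference network cannot reach on its own. That is the small-scale version of the argument: variety in wiring among networks with the same output widens the set of possible innovations.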

Tuesday, August 21, 2007

Domesticating Biotechnology in the 21st Century

Will we domesticate biotechnology in the next 50 years? More than 150 years of spectacular advances in physics, chemistry, and computing have thoroughly transformed the way we live. Yet so far, the big revolutions in molecular biology have had their impact primarily on professional laboratories, not our everyday lives. What do we need to do in order to domesticate biotech?

Physicist Freeman Dyson recently explored this question:

"Will the domestication of high technology, which we have seen marching from triumph to triumph with the advent of personal computers and GPS receivers and digital cameras, soon be extended from physical technology to biotechnology?"

Dyson predicts this will happen in the next 50 years:

"I predict that the domestication of biotechnology will dominate our lives during the next fifty years at least as much as the domestication of computers has dominated our lives during the previous fifty years."

What form might this domestication take? Among Dyson's suggestions for domestication is user-friendly genetic engineering for hobbyist plant and animal breeders. I'm not so sure that making genetic engineering idiot-proof is the major hurdle; in fact, genetic engineering today is somewhat of an oxymoron. We may be able to engineer pet fish to express a green fluorescent protein, but we honestly have no clue how to engineer any but the most simple, monogenic traits.

We will domesticate biotechnology, and I predict that this will happen in two ways: by bringing biotech into the day-to-day practice of medicine, and by bringing genetic engineering to a truly sophisticated level, on par with aerospace engineering.

Bringing Biotech into the Clinic
To be honest, with the exception of imaging technology, medicine as practiced today is extremely low-tech. Very few of the fancy techniques that scientists use in a molecular biology lab are available on a routine, affordable basis in the clinic. Blood tests are downright primitive. And in spite of all of our sophisticated genome analysis technology, detailed genotyping is almost never used in medicine. Biotech is ripe for domestication in the clinic.

Dirt Cheap Genome Sequencing
One day, every newborn child will be routinely genotyped; that is, the hospital lab will take a blood sample and, quickly and cheaply, determine that baby's DNA sequence in the millions of places where humans can differ. Our genotype will become part of our medical records, which of course we ourselves will also have access to. Genotyping can be used to customize drug and disease treatments, as well as suggest lifestyle choices that will help avoid or minimize diseases that a person may be susceptible to. Universal genotyping can even be used by family history hobbyists.

The technological barriers will soon be overcome, leaving the social ones remaining as the largest obstacle to universal genotyping. Who can have access to this information? How much do you really want to know about your disease susceptibility? Your paternity? These aren't trivial questions.

The Universal Blood Test
High-tech, preventative diagnostics will transform the way we practice medicine. Most of today's diagnostic tests, with the exception of medical imaging, are based on decades-old techniques. Leroy Hood, a founder of Seattle's Institute for Systems Biology, is working on technology for affordable, routine blood tests that will provide a comprehensive picture of your health, including the very early detection of diseases like cancer. These blood tests, one day cheap enough to be done annually, could thoroughly modernize preventative medicine.

Real Genetic Engineering
At Boeing, engineers can essentially design a new plane completely by computer, and predict in minute detail how that plane will behave in real-world weather. True genetic engineering will mean being able to make such quantitative predictions with the cell, but currently our ability to make quantitative predictions is embarrassingly limited. Analogous to Boeing's computer-aided design, computer-aided genetic engineering will one day enable us to develop gene replacement therapies that don't have cancer as a side effect, develop specific, side effect-free drugs that treat tough diseases, and develop microbes that can generate energy from renewable resources, clean up toxic spills, or perform chemical reactions that organic chemists haven't yet been able to achieve.

The first step towards achieving this level of sophistication will be to completely understand all of the parts of an organism; the next will be to understand how those parts work as a system. We've nearly reached that first step for a eukaryotic organism: brewer's yeast, one of biology's key model organisms, will have an essentially completely annotated parts list within the next 10 years or so. Many scientists are now struggling with the next step, trying to make sense of how these parts work as a system. Yeast will be the first Boeing 747 of biology - an organism that we can completely and predictively model by computer, without extensive trial and error studies in the lab.

Maybe, after we've really learned how to do genetic engineering, hobbyists will then fulfill Dyson's dream of user-friendly plant design, and come up with a way to make glow-in-the-dark roses.

Sunday, August 19, 2007

The Politics of God in the NY Times Magazine

The NY Times Magazine today has a cover piece arguing that while the West may have figured out how to largely separate politics and religion, the rest of the world is unlikely to follow:

"Countless millions still pursue the age-old quest to bring the whole of human life under God’s authority, and they have their reasons."

If that's really true, we can expect that modern science will be a phenomenon largely confined to the West, with the rest of the world using science, pioneered elsewhere, to build more hi-tech weapons.

Perhaps though, the case is overstated in the NY Times piece - Japan and Korea have relatively secular politics, and a correspondingly strong scientific infrastructure. Several modernizing nations, such as India and China, are working hard to build their scientific reputations; to do so requires some commitment by their respective governments to separate ideology from the political decision making process. The young Chinese and Indian graduate students, coming in droves to the US for a scientific education, will inevitably make life better in their non-Western home countries when they return.

The next step is to figure out how to get young Iraqis, Jordanians, Iranians, and Africans to come seeking a scientific education in the US.

Tuesday, August 14, 2007

Ancient Microbes Revived from Antarctic Ice May Be Spreading Their Genes

After being encased in Antarctic ice for 8 million years, ancient microbes thawed by a team of researchers revved up their metabolic engines again and began making proteins and replicating. These are the oldest organisms ever brought back to life after a deep freeze.

The research team, a group primarily from Rutgers, looked at the microbial population in some of the oldest ice known on earth, obtained from Antarctica’s Beacon Valley. Using microscopy, the researchers could see that these samples had a variety of bacteria encased inside. But microscopy can only tell you so much; to learn more, the research team turned to DNA sequencing.

The standard way of identifying what you have in a mixed population of bacteria is to sequence the 16S ribosomal DNA - a gene encoding an important component of the protein-synthesizing machinery. This gene plays such an important functional role that it changes very slowly over evolutionary time, thus allowing scientists to easily compare DNA sequences among organisms that have diverged from each other for hundreds of millions of years. The 16S rDNA sequences from these ice samples revealed nearly a dozen different types of bacteria in the 8 million-year-old ice; that’s not much compared to a fresh, modern sample of seawater, but that's great for very old ice.

Some of these ancient bacteria were alive. When the researchers melted the ice (while keeping it cold and dark - these are sensitive bacteria), they found that at least some of the bacteria were able to start up their metabolism, which was measured using radioactive metabolites that the bacteria could ingest and incorporate into their protein or DNA.

16S rDNA can tell you what kinds of bacteria you have, but another intriguing question is what genes do these bacteria have? Are most of their genes similar to those of today’s known bacteria? After sequencing as much of the bacterial genomes as they could, the researchers found that a substantial 46% of the genome sequence did not match any known genes. This is not actually so surprising - in spite of all of the DNA sequence from thousands of organisms stored in GenBank, we know that we have sampled only a fraction of the different types of genomes on earth. The genomes of multicellular organisms are relatively similar to each other, but the bacterial world represents a vast, poorly explored genetic resource. We know most of the genes on our planet are in fact missing from our databases; we best understand the biology of that small subset of bacterial and archaebacterial genes that was present in the ancestors of all eukaryotic organisms.

While scientists may not know much about most bacterial genes, evolution is not blind to them. Bacteria are remarkably generous with their genes; they pass them on not only to their descendants, but to their neighbors as well. This phenomenon of lateral gene transfer, or LGT, makes the evolutionary analysis of bacteria fiendishly difficult. The authors of the ice microbes paper raise another fascinating (or depressing, if you study bacterial evolution) possibility: that ancient ice is a “gene popsicle,” facilitating gene transfer not only across species, but also across time. With the onset of an ice age, microbes, harboring a given set of genes, get preserved for thousands or millions of years, until the ice melts. That’s when these ancient bacteria return to the local ecosystem, where they can pass on their ancient genes via LGT to modern bacterial species. These modern species then, with luck, use these recently revived genes to better adapt to their environment. As the authors of the paper put it:
 “Our analysis suggests that melting of polar ice in the geological past may have provided a conduit for large-scale... LGT, potentially scrambling microbial phylogenies and accelerating the tempo of microbial evolution.”

This is a mind-boggling prediction, which will be difficult to test without a lot more bacterial genome sequencing. However, the idea again demonstrates the tremendous resources evolution has to work with. As the biologist Leslie Orgel reportedly once said, “Evolution is cleverer than you are.”

Wednesday, July 11, 2007

If you value alternative opinions...

Don't let Time-Warner and other big media outlets squash smaller, independent media voices that serve up more than just the bland, conformist news and opinion that has led to disaster in the US:

Stamp Out the Rate Hike: Stop the Post Office

Whether you're liberal or conservative, it affects you. Follow the link, sign the petition, and spread the word.

I promise to get back to science around here soon - personal things have taken priority lately.

Monday, July 09, 2007

Where is the leadership today?

Damn, times have sure changed. In 1944, FDR could get up before Congress and say this:

"We cannot be content, no matter how high that general standard of living may be, if some fraction of our people—whether it be one-third or one-fifth or one-tenth- is ill-fed, ill-clothed, ill housed, and insecure...

"We have come to a clear realization of the fact that true individual freedom cannot exist without economic security and independence..."

"In our day these economic truths have become accepted as self-evident. We have accepted, so to speak, a second Bill of Rights under which a new basis of security and prosperity can be established for all regardless of station, race, or creed.

Among these are:

The right to a useful and remunerative job in the industries or shops or farms or mines of the Nation;

The right to earn enough to provide adequate food and clothing and recreation;

The right of every farmer to raise and sell his products at a return which will give him and his family a decent living;

The right of every businessman, large and small, to trade in an atmosphere of freedom from unfair competition and domination by monopolies at home or abroad;

The right of every family to a decent home;

The right to adequate medical care and the opportunity to achieve and enjoy good health;

The right to adequate protection from the economic fears of old age, sickness, accident, and unemployment;

The right to a good education.

All of these rights spell security..."

"For unless there is security here at home there cannot be lasting peace in the world."

Today I would also add this: a thriving scientific research community can't be sustained in a country that does not make economic security, including inexpensive access to medical care and higher education, a reality for all of its citizens. No middle class, no scientists.

More Confusion about Junk DNA and Regulatory Sequences

Back in June, John Greally, a biologist at Albert Einstein, wrote a frustrating Nature commentary on the ENCODE project in which he repeatedly and wrongly suggested that before ENCODE, biologists were only paying attention to regulatory sequences:

"We usually think of the functional sequences in the genome solely in terms of genes, the sequences transcribed to messenger RNA to generate proteins."

"Now... the ENCODE Project Consortium shows... that the humble, unpretentious non-gene sequences have essential regulatory roles."

"...The researchers of the ENCODE consortium found that non-gene sequences have essential regulatory functions, and thus cannot be ignored."

Now, Greally spreads more confusion on NPR's Science Friday (hear it here at Sandwalk, where Larry Moran is as stunned as I am) by continuing to act as if we had no idea before ENCODE that regulatory sequences were so prevalent or important. In the interview, Greally even goes so far as to suggest (and allow the interviewer to suggest, without correction or clarification) that 95% of the genome consists of regulatory sequences. Nor does he correct a caller on the show who claimed that scientists were ridiculous to even suggest the idea of junk DNA.

The ENCODE project showed no such thing, and it wasn't the huge breakthrough in our understanding of non-coding DNA that Greally is hyping it to be. Non-coding regulatory sequences have been intensely studied, including large-scale experimental and computational surveys from yeast to humans. These sequences have not been ignored; many labs have put a lot of effort into identifying and understanding them.

Nor did the ENCODE project bury the idea of junk DNA. For example, 10% of the human genome consists of hundreds of thousands of copies of parasitic stretches of DNA called Alu elements. (A search on Google Scholar will turn up free copies of this paper.) Alu elements can, on occasion, generate beneficial and novel genomic diversity, but most copies of Alu elements are non-functional and unable to replicate - in other words, junk. In fact, Alu insertions can cause disease - as they hop around the genome, they occasionally break something.

There are many, many other examples of this kind of junk; it's not all poorly understood 'dark matter' of the human genome, as Greally suggests in the NPR interview. When we have finally identified all of the regulatory sequences, I predict that the total amount of functional regulatory sequence will still be much less than the 45% of the genome comprised of the parasitic LINE and SINE transposable elements.

Saturday, July 07, 2007

Mammas, Don't Let Your Babies Grow Up to Be Postdocs

In most careers, when you do a good job, you get rewarded - a promotion, a bonus, a raise, whatever. These things are the incentives that make you strive to do your best.

In science, as a postdoc, you get penalized for your success.

I did something postdocs are supposed to do - get money by writing a research proposal that you submit to a funding agency. The process is competitive and time-consuming; when you are successful, this is a significant achievement, an essential step in your career.

But the actual payoff is really years down the road. In the immediate aftermath of successfully obtaining a funded postdoctoral fellowship, you lose out:

- Your salary very well may decrease. That's right - if you write a successful research proposal, your salary could go down. The going rate for an NIH fellowship was less than what I was earning as a university-paid fellow; it was only thanks to the timely help of my advisor I avoided a pay cut.

- Your health insurance premiums become fully taxable. Before I was funded by the NIH, my health insurance premiums were taken out before taxes - just as it's done for everyone else in the country with employer-sponsored health insurance. Once you are an NIH fellow however, your insurance premiums (including the hefty 'employer contribution') are not deducted before taxes - the entire cost of your health insurance premiums is considered taxable income. Thus my taxable income jumped up by several thousand dollars, but my take-home pay remained unchanged. (You can of course deduct the premiums at the end of the year, but this substantially complicates your tax filing.)

- You have to submit your own tax payments. As an NIH fellow, you are nobody's employee - not the university's, not the government's. But the IRS doesn't consider you self-employed either. You technically don't earn wages. You don't receive a W-2. You're not a student.

Thus you have to spend hours navigating the labyrinthine state and U.S. tax codes to figure out how to submit quarterly tax payments. I thought I had it all figured out - until I suddenly discovered that in my state, quarterly doesn't exactly mean quarterly. It turns out that sometimes you have to submit three months' worth of taxes after just two months of pay (April 15 to June 15)! Of course, on my shamefully excessive NIH postdoc salary, it's no problem at all to save up extra taxes on income I haven't been paid yet. I suppose I shouldn't be eating anyway - there is lab work to do!

The list could go on... The sheer hours of bureaucratic combat involved in living off of NIH funding make the whole process almost not worth it. Almost - there is the fact that your future academic career is not too bright if you don't manage to get some sort of fellowship.

Postdocs are in such a nebulous, in-between world - not students, not staff, not faculty, not employees of any kind. This basically means you're everyone's awkward step-child, only partially claimed by the NIH and the university, left to fend for yourself.

Friday, July 06, 2007

A Plug for Scientific Blogging

And I mean specifically the site Scientific Blogging, not just blogging in general.

Scientific Blogging features science news and blogs by working scientists (including yours truly). The bloggers cover everything from math and physics to psychology, and range from lowly grad students and postdocs, like myself, to real science professors, as well as some science-trained bloggers working outside of academia.

On the whole, the site aims to be politically non-ideological. It's not a conservative site, as some seem to think; it's simply much less political than other major sites, like Seed's Science Blogs. Given the pervasive scientific illiteracy in our society, there is definitely a need for blogs that feature science writing that will be read by people from a range of political loyalties.

Not that I'm knocking Seed; I'm a regular reader of Pharyngula, and I need my daily dose of political red (uh... make that blue) meat to keep sane in our current political climate.

Scientific Blogging is still in public beta mode, but it already has some nice features, like a peer voting system (you vote for articles you like), and a place for anyone interested to set up their own blog. There is a set of hand-picked featured writers, but if you like writing science, and you want to write where someone will actually read it, you can set up your own space at Scientific Blogging. We're hoping to create an active community of people who love reading and writing about science, so go over and check it out.

Thursday, July 05, 2007

Junk DNA in the Opossum Genome

Vertebrate genomes are full of junk. Despite the occasional confusing magazine article, the spurious claims by creationists, or obfuscatory statements by some scientists, we know that our genomes are stuffed full of DNA sequence that serves no functional role for the organism. The vast bulk of this junk sequence consists of molecular parasites, called transposable elements, whose only 'function' is to replicate themselves. While our genomes obviously contain critical information required to build and maintain ourselves, they are also vast ecosystems of virus-like parasites that have colonized our DNA.

A recent paper in the journal Genome Research describes the DNA ecosystem of the opossum genome. "Ecosystem" is not an exaggeration; more than 52% of the opossum genome is comprised of transposable elements, which can be classed into nearly 500 different families. Transposable elements are similar to viruses; they are, one way or another, able to replicate themselves within an organism's genome and get passed on to the next generation. These elements have a variety of survival strategies; some elements get transcribed into RNA and then 'reverse transcribed' back into DNA and inserted somewhere in the genome, while other elements never go through an RNA stage. Some transposable elements encode proteins that enable them to spread through the genome more efficiently; other elements don't bother to code for any proteins and instead hijack the proteins produced by those elements that can code for them.

Why do transposable elements exist in our genomes? Because they can. If a DNA element in an organism's genome can get itself passed on into the next generation, whether that element is beneficial to the organism or not, then obviously it will remain in the genome of that species. Since these elements don't generally serve any functional role, there is no reason for natural selection to preserve them, and we thus see piles of defective copies of transposable elements scattered around our genomes. These elements no longer have the ability to spread through the genome and they serve no function - they are pure junk. While our cells do have systems that try to stop these elements from spreading, we, and most animals, have not evolved effective ways to get rid of the junk elements once they are there; these elements therefore hang around and bulk up our genomes with non-functional material. About 45% of the human genome consists of these elements; that fraction rises to 52% for the opossum (which has a genome slightly larger than ours).

Transposable elements are not completely useless. For one, biologists love them because they can be helpful for studying evolutionary history - one approach to teasing out relationships among various species is to reconstruct a rough history of transposable element activity in various genomes. We have also known for some time that these elements can occasionally be recruited for a functional role (such as telomeres in flies, X-chromosome inactivation in mammals, and centromeres in various organisms).

The opossum paper offers even more tantalizing, although not wholly unprecedented, evidence of a larger role for transposable elements. The authors of this paper looked at transposable elements, common to both opossum and human, that were present in known or suspected regulatory regions of the genome. Transposable elements in these regions are obvious candidates for a functional role. And remarkably, the researchers found that a handful of transposable element families were highly abundant in these regulatory regions - in one case, 70% of all the individual elements of one family were found in regulatory regions of the genome. It is possible that this particular family of transposable elements somehow contains a useful 'regulatory module,' some sequence that has been recruited through evolution to control the expression of some genes. If this is true, then this would be a case of transposable elements providing the raw genetic material to create new layers of regulation in the genome.

So while most of the self-perpetuating transposable element ecosystem is undoubtedly junk from the perspective of the organism hosting it, our genomes are occasionally able to scoop up some of the detritus and put it to good use.

Monday, June 25, 2007

Untangling the Logic of Gene Circuits

How does a cell process information? Unlike computers, with CPUs to carry out calculations, and animals, which have brains that process sensory information, cells have no centralized device for processing the many internal and external signals with which they are constantly bombarded. And yet they somehow manage just fine. The single-celled brewer's yeast, for example, can know what kind of food source is available, tell when it's hot or cold, and even find a mate.

One key way that cells sense and respond to their environment is via genetic circuits. Although biologists often use the word 'circuit' in a sense that is only loosely analogous to electrical circuits, recent research is putting our understanding of genetic circuits on a much more rigorous and quantitative footing. By studying very simple circuits, using computer models and simple experiments, we are starting to understand, in a still very limited way, why the cell is wired up the way it is.

Let's take an example of a simple wiring setup that occurs very commonly in gene regulation. Suppose that gene A turns on gene B. (Technically, gene A does not turn on anything - gene A directs the synthesis of protein A, which can then turn on gene B, but when we talk about genetic networks, this is taken for granted.) A also turns on another gene, C. Gene B turns on gene C as well, so you get a little system wired up like this: A → B, A → C, and B → C.

Initially, this configuration, called a feed forward loop, may not make much sense. If gene C is turned on by A, then why do you need B? The key to this whole setup is that C requires both A and B to be fully on. If gene C needs both A and B in order to be switched on, we now have a circuit that is resistant to noise.

To see how this works, let's view this from the perspective of a small bacterium, such as E. coli. An individual bacterium is constantly in search of food; it can only swim around so long before it runs out of energy. E. coli can use a variety of different food sources, but it needs to turn on the proper genes for each food. When the sugar arabinose is available, E. coli switches on the genes that enable it to import and metabolize arabinose. But turning on the whole suite of arabinose genes requires some effort; it's important that the bacterium not go through all that effort only to find out that there is no arabinose around after all.

Going back to our little circuit, let's suppose that A is sensitive to arabinose. When arabinose is around, A turns on B, and A and B turn on C; gene C makes an enzyme that can help metabolize arabinose. But A could get turned on by just a trace of arabinose; this kind of random noise would be disastrous if A was always switching on C at the slightest provocation. We only want C around when there is a seriously good arabinose source.

Enter the feed forward loop - it filters out the noise! It works like this:

Scenario 1 - random noise, or just a trace of arabinose:

1. A gets turned on briefly, and then shuts off.

2. B barely gets switched on by A, but not enough to affect C.

3. C never gets turned on.

Scenario 2 - sustained arabinose signal:

1. A gets turned on, reaches a maximal level and stays on for a period.

2. B gets switched on by A and hits its maximal level.

3. C gets turned on once A and B reach their maximal levels.

4. The bacterium metabolizes arabinose.
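
The two scenarios can be sketched as a toy simulation. This is only an illustration, not a model of the real arabinose system: the discrete time steps, the integer "units" of B, the production and decay rates, and the threshold are all made-up numbers, and the function name `simulate_ffl` is hypothetical. The one thing it does take from the description above is the wiring: B builds up only while A is on, and C is on only when A is on and B has built up enough.

```python
def simulate_ffl(a_signal, b_rate=1, b_decay=1, b_max=5, b_threshold=3):
    """Toy feed forward loop in which gene C needs both A and B.

    a_signal: list of 0/1 values saying whether A is on at each time step.
    Returns a list of 0/1 values saying whether C is on at each time step.
    The rates, maximum and threshold are illustrative assumptions.
    """
    b_level = 0
    c_states = []
    for a_on in a_signal:
        # B accumulates while A is on and decays once A shuts off.
        if a_on:
            b_level = min(b_max, b_level + b_rate)
        else:
            b_level = max(0, b_level - b_decay)
        # C is switched on only when A is on AND B has accumulated enough.
        c_states.append(1 if (a_on and b_level >= b_threshold) else 0)
    return c_states


# Scenario 1: A flickers on for two time steps (a trace of arabinose).
brief_pulse = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Scenario 2: A stays on for seven time steps (a real arabinose source).
sustained = [0, 1, 1, 1, 1, 1, 1, 1, 0, 0]

print("brief pulse, C:", simulate_ffl(brief_pulse))
print("sustained,   C:", simulate_ffl(sustained))
```

With these particular numbers, the brief pulse never lets B reach the threshold, so C stays off throughout, while the sustained signal switches C on after a short delay. That delay is the price of the noise filter: the circuit ignores anything shorter than the time B needs to build up.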

Such genetic circuits are extremely common in biology, although most often they occur in much more complex combinations than I've shown here. One current idea is that the more complex combinations are built up out of simple circuits like this Feed Forward Loop, and the hope is that we can use our understanding of these simple circuits to make sense of the information processing properties of the massively tangled networks that we find in all cells. This is still mainly just a hope though; although there are some increasingly sophisticated computer models of complex genetic networks, there is precious little experimental work demonstrating that we have actually learned something about these complex networks.

The experimental situation is different though for simple networks - several research groups have carried out some very nice experiments on simple systems. Uri Alon is one of the leaders in this field (and my figures are redrawn from his recent review of this field.) His group has performed experiments to test the effects of these simple genetic circuits, and other groups are doing similar studies.

So, while a useful, rigorous, experiment-based understanding of more complex networks is still just a hope, our understanding of small, functional circuits is enabling us to delve deeper into the information processing properties of the cell.

Wednesday, June 20, 2007

Our Genomes, ENCODE, and Intelligent Design

What has the ENCODE project done, and how do their results change our understanding of the human genome? In the last post I put this project into perspective by briefly outlining some past concepts of the gene and highlighting some of the ENCODE findings. Now it's time to take a closer look at the results of the ENCODE project and their significance for our understanding of the human genome. ENCODE's genome snapshot is unquestionably fascinating, and it suggests that some features of genome regulation that were previously viewed as exceptions to the norm are really quite common. But are these results revolutionary? Do they overturn any long-cherished notions about genes that scientists have heavily relied on in their understanding of gene regulation, as some have suggested? And do they support intelligent design? I don't think so.

What ENCODE Did

In one sense, the ENCODE project can be thought of as the third big Human Genome Project - the first project being the actual genome sequencing, and the second being the HapMap Project to extensively study genome variation in different human populations. The ENCODE project is an effort to find and study, on an encyclopedic scale, all of the functional elements in the human genome.

For the first phase of this project, the ENCODE researchers examined a small but reasonably representative chunk of the human genome (roughly 1%, or 30 million DNA bases) by running that chunk through a battery of experimental tests and computational analyses. Most of the experimental techniques and results are unfortunately beyond the scope of this little summary. This first round of the ENCODE project produced a big paper in Nature, and the journal Genome Research has devoted its entire June issue to papers from the ENCODE project. I'm going to winnow down this mass of material to two of the most interesting topics: transcription and evolution.

Transcription (if you don't know what transcription is, look here):

The researchers attempted to identify regions of DNA that were transcribed. Why? Because our presumption has generally been that most (note the qualifier!) transcripts contain some functional material, such as protein-coding genes or non-coding RNAs that have some functional role (such as miRNAs, snoRNAs, rRNAs, etc.). Therefore by looking for transcribed regions, we can find new functional portions of the genome.

The transcribed regions were identified using tiling arrays, which are DNA-chips, or microarrays, that cover the entire genome and thus can detect transcription from any place in the genome. This is in contrast to more traditional microarrays that only detect the transcription of known genes. Thus by using tiling arrays and a handful of other complementary techniques, the ENCODE researchers found that a large fraction of the genome region in the study was transcribed, including many places that have no recognizable genes. They estimate that up to 93% of the genome is transcribed, although the evidence for much of this is indirect and other explanations of the experimental results are possible. The actual transcribed fraction may be substantially lower, although it is still likely to be large.

The most interesting finding of these transcription studies is that a lot of strange stuff is ending up in these RNA transcripts. We have long known that different protein-coding regions (exons) from a single gene can be spliced together in various combinations to create many different proteins. The ENCODE researchers confirmed this (the protein-coding genes they studied produce on average 5.4 differently spliced forms), but they also found that chunks of other sequence end up in the transcripts, such as coding and non-coding portions of neighboring genes. Why this is happening is not yet clear, although part of the explanation is surely that the transcription and splicing machinery are more noisy than we previously (and naively) appreciated.

Another major part of the ENCODE project is to find out just where transcription starts. Transcription start sites (TSSs) are important, because key regulatory events take place there. Regulatory sequences in the DNA, together with regulatory proteins, act at TSSs to control the protein machinery that carries out transcription; this control is critical for deciding which genes in the cell are 'on' or 'off'.

The ENCODE researchers found many new TSSs, sometimes very far away from known genes. Interestingly, the TSSs far away from known genes had different characteristics from those close to known genes, suggesting two distinct functional roles. One possible role for these distant TSSs is to control the higher-order structure (i.e., chromatin structure) of big regions of the genome, and thus to some degree regulating entire sets of genes. This work lays a good foundation for studying these control systems.

Evolution

The ENCODE researchers searched for regions of the human genome that have changed little throughout mammalian evolutionary history; these are the regions that have been constrained by natural selection. They compared portions of the human genome with the genomes of 14 other mammalian species, and found that 5% of the genome is under evolutionary constraint, a result that agrees with earlier studies.

The immediate question then is, how much of the 5% consists of known functional elements? The ENCODE researchers reported the following breakdown:

Of the 5% of the genome that is evolutionarily constrained:
- 40% consists of protein-coding genes
- 20% consists of known, functional, non-coding elements
- 40% consists of sequence with no known function
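
To put that breakdown in terms of the whole genome, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes a round 3 billion bases for the human genome and assumes that the 5% figure and the 40/20/40 split carry over from the studied regions to the genome as a whole. The names `GENOME_BASES` and `constrained_bases` are just illustrative.

```python
GENOME_BASES = 3_000_000_000  # round figure for the human genome (assumption)
CONSTRAINED_PERCENT = 5       # share of the genome under evolutionary constraint

# Breakdown of the constrained sequence, in percent, as listed above.
BREAKDOWN_PERCENT = {
    "protein-coding genes": 40,
    "known functional non-coding elements": 20,
    "no known function": 40,
}


def constrained_bases(category_percent):
    """Number of bases in one category, using integer arithmetic only."""
    return GENOME_BASES * CONSTRAINED_PERCENT * category_percent // (100 * 100)


for name, percent in BREAKDOWN_PERCENT.items():
    genome_percent = CONSTRAINED_PERCENT * percent / 100
    print(f"{name}: {constrained_bases(percent):,} bases, "
          f"about {genome_percent:g}% of the genome")
```

Under those assumptions, the constrained sequence with no known function comes to roughly 60 million bases, or about 2% of the genome, which gives a sense of how much functional non-coding sequence may still be waiting to be found.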

The sequence with no known function is not too surprising. Functional DNA elements other than protein-coding genes are difficult to find, and in spite of many recent studies we know we're missing a lot. These results tell us roughly how much more functional, non-coding sequence we need to find, and where it is probably located.

The ENCODE researchers also looked at evolutionary conservation from another angle: how much of known, functional DNA falls into conserved regions? Protein-coding genes and their immediate flanking regions are generally well-conserved, while known, non-coding functional elements are less conserved. Again, this is nothing too surprising; non-coding elements tend to be very short and have what is called 'low information content', and they are more easily created and destroyed by random mutations.

Many potentially functional elements, picked up in the experimental data analyzed by the ENCODE groups, are not evolutionarily constrained - about 50%, when these elements are compared across all mammalian genomes in the study. This means that there are regions of the genome that are bound by regulatory proteins or that are transcribed, but which have not been constrained by natural selection.

Intelligently Designed Transcription?

I need to pause here and answer the obvious question that those of you who aren't molecular biologists are probably asking: So does this mean that evolution can't explain many of the functional parts of the genome? Intelligent design advocates are already on the web, misreading the ENCODE work and claiming that it somehow supports the fuzzy claims of intelligent design. My advice: don't believe what you hear about this from people who only have the vaguest understanding of how ENCODE's experiments and analyses work (and that includes biochemist Michael Behe).

The ENCODE results do not cast doubt on evolution. Here are some of the reasons why:

1. Just because something is transcribed or bound by a regulatory protein does not mean that it is actually functional. The machinery of the cell does not literally read the DNA sequence like you and I do - it reads DNA chemically, based on thermodynamics. As I mentioned before, DNA regulatory elements are short, and thus are likely to occur just by chance in the genome. An 8-base element is expected to show up just by chance about once every 65,000 bases, and would occur randomly over 45,000 times in a 3 billion base pair genome. Nature does work with such small elements, but their random occurrence is hard to control. In a genome as large and complex as ours, we should expect that there is a significant amount of random, insignificant protein binding and transcription. Incidentally, such random biochemical events probably make it easier for currently non-functional elements to be occasionally recruited for some novel function. We already know from earlier studies that this kind of thing does happen.

2. To say that something is truly functional requires a higher standard of evidence than the ENCODE research provides. The ENCODE researchers did a fine job detecting transcription and regulatory protein binding with state-of-the-art experimental and computational techniques, but confirming a functional role for these elements will require more experiments aimed at addressing that issue.

3. Some of the functional elements that don't appear to be conserved really are conserved. When you're comparing a small functional element in a stretch of DNA between, say, humans and mice, it is often difficult to find the corresponding region in each species. The mice and humans may have the same functional element, but in slightly different places. Thus conserved elements can be missed. The ENCODE researchers note this, and people like myself who study these small elements know from experience that this happens frequently.

4. Despite what you may read, there is still a lot of junk DNA. The ENCODE project does not "sound the death-knell for junk DNA." Our genomes are filled with fossils of genetic parasites, inactive genes, and other low-complexity, very repetitive sequence, and it's extremely clear that most of this stuff has no functional role. Much of this sequence may be transcribed, but remember that the ENCODE evidence for most of this transcription is indirect - their direct measurements only detected transcripts for ~14% of the regions they studied. Even if much of it is transcribed, this mainly suggests that it is not worth expending energy to actively repress this transcription, since there are so many other controls in place to deal with unwanted transcripts in the cell.
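
The arithmetic behind the first point (how often a short element turns up by chance) can be sketched in a few lines. This is a simplified model that assumes all four bases are equally likely, that positions are independent, and that only one strand is scanned; real genomes violate all three assumptions, so treat the numbers as order-of-magnitude estimates. The function names are made up for illustration.

```python
def expected_spacing(k, alphabet_size=4):
    """Average number of bases between chance matches to one specific k-base element."""
    return alphabet_size ** k


def expected_hits(k, genome_length):
    """Expected number of chance matches to one specific k-base element."""
    positions = genome_length - k + 1  # number of windows of length k
    return positions / expected_spacing(k)


spacing = expected_spacing(8)              # 4**8 = 65536
hits = expected_hits(8, 3_000_000_000)     # a bit under 46,000
print(f"one chance match per {spacing:,} bases")
print(f"about {hits:,.0f} chance matches in a 3 billion bp genome")
```

Shorter elements turn up far more often: under the same assumptions a 6-base element is expected once every 4,096 bases.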

Enlightening but not revolutionary

Moving on from intelligent design, some people, around the web and in a few journals, are making the ENCODE results out to be more revolutionary than they really are. For example, writing in a Nature piece stuffed with exaggerated claims about what our "preconceptions" supposedly are (subscription required), John Greally states that "Now, on page 799 of this issue, the ENCODE Project Consortium shows through the analysis of 1% of the human genome that the humble, unpretentious non-gene sequences have essential regulatory roles," and "the researchers of the ENCODE consortium found that non-gene sequences have essential regulatory functions, and thus cannot be ignored."

Every biologist I know could have told you that "non-gene sequences have essential regulatory roles" years ago, before ENCODE. Larry Moran, over at Sandwalk, says that he hasn't "had a 'protein-centric' view of a gene since I learned about tRNA and ribosomal RNA genes as an undergraduate in 1967." Where has Greally been all this time? I'm not sure why he is so surprised.

Also, as I mentioned above, not all (or maybe not even most) of the transcribed, intergenic sequences found by ENCODE are believed to have "essential regulatory roles." Non-coding DNA regulatory elements have been the subject of intense study by many groups for many years now. To claim that we have not paid enough attention to them is wrong. None of the types of transcripts discovered by ENCODE are really novel; we've seen examples in earlier studies of what they found. What is significant about the ENCODE results is the extent of this unusual transcription; what were once thought to be exceptions are now seen to be much more common.

I'm happy to see the ENCODE results; many of us will use their results in our own research, and projects like this certainly help to make the human genome much less of a black box. But they haven't shattered any paradigms that weren't already on their way out, or revolutionized the field of genomics.

Sunday, June 17, 2007

Time to Rethink the Gene?

After the tremendous discoveries in basic biology of the last 100 years, you might think that we would understand by now what a gene is. But the big news in genome biology this week is the publication of the results of the ENCODE project, a large scale experimental (as opposed to purely computational) survey of the human genome. The leaders of the ENCODE project suggest that we need to, yet again, rethink just what exactly a gene is.

I plan to cover this subject in two posts. Today I'll go over a very brief history of the gene and the basics of what the ENCODE project is doing. In a subsequent post, I'll dive into the ENCODE results, and tell you why I think the results are interesting, but not necessarily revolutionary.

A Brief History of the Gene

Mark Gerstein and his colleagues have written an interesting perspective piece on how the ENCODE results fit into our historical understanding of the gene. To put the ENCODE results into perspective, here is a brief history (with some big gaps - go read Gerstein's paper, or check out Evelyn Fox Keller's book The Century of the Gene; and if you don't know the basics of the Central Dogma, check out my summary):

Something responsible for heritable traits: Beginning with Mendel (who did not use the word "gene"), geneticists at first thought of genes as something within an organism that makes fruit fly eyes red, or peas wrinkled, that is passed on to offspring. The key idea is that heritable traits were passed on as units of something, although of course no one knew what. Early in the 20th Century, some geneticists began to get the idea that genes were arrayed in a linear fashion, and thus were placed at various distances from each other.

Something that makes an enzyme: George Beadle and Edward Tatum, performing genetic studies on bread mold, worked out the idea that a gene somehow was responsible for making an enzyme. Their concept is sometimes referred to as the "one gene one enzyme" idea.

An open reading frame (ORF): After the genetic code was worked out, a gene was recognized as a stretch of DNA that coded for protein, starting with the DNA bases ATG (which means 'start' in the genetic code) and ending with the sequence TAG, TAA or TGA (meaning, naturally, 'stop'). This concept is useful because you can look at a big chunk of DNA sequence and see where all of the protein coding regions are. Also included in this concept of a gene is the idea that DNA elements outside of the coding region regulate the transcription of the gene, especially the region immediately before the starting ATG.

Genes in pieces: a twist on the open reading frame idea, biologists discovered that protein-coding chunks of genes (called exons) were interspersed with long non-coding chunks, called introns. Before producing the final protein, exons have to get spliced together. In mammals, exons tend to be fairly short, while introns are extremely long, so a gene can be spread out over long stretches of DNA. An extra twist is that exons can get spliced together in a variety of different combinations, so that one gene, consisting of multiple exons, can produce many different proteins. In addition, we now know that the non-coding regulatory elements are dispersed much more widely than previously appreciated.

No protein needed: Not all genes code for proteins. MicroRNAs are genes which are transcribed and are flanked by regulatory elements just like ORFs, but they don't code for protein. They seem to be involved in regulating the expression of other genes, and several hundred microRNA genes have been reliably confirmed in the human genome.
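
The open reading frame idea from the history above is simple enough to sketch in code. The following is a minimal illustration, not a real gene finder: it scans only the strand it is given, ignores introns and the reverse strand, and the function name and the `min_codons` cutoff are assumptions made up for this example.

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG...stop stretch on this strand.

    Each of the three reading frames is scanned separately. An ORF is kept
    only if it has at least `min_codons` codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))
```

In this toy sequence the only ORF is ATGAAATTTTAG, starting at position 2 (counting from zero). With spliced genes, a scan like this would only pick up fragments, which is part of why the "genes in pieces" discovery complicated the ORF picture.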

The ENCODE Project

A major goal of the ENCODE project is to identify all of the functional elements in the human genome. If one includes all of the known ORFs, regulatory elements, and microRNAs, they make up a few percent of the genome. The remaining DNA unquestionably includes a lot of junk, such as LINEs and SINEs and other DNA parasites that exist simply because they are able to perpetuate themselves. On rare occasions in evolutionary history, some of these parasites get recruited to perform a beneficial function. But most of the parasites are inactive, mere molecular fossils. Other molecular fossils include once-functional genes that have been irreparably scarred by mutation.

But we also know that there is more functional material there; for example about 5% of the genome shows evidence of being under natural selection, and this 5% covers more than just the functional elements we know about. So far, our best attempts to find functional elements have been based on computer searches to find DNA that has been conserved through evolution, and that resembles known functional elements. But the ENCODE research groups have now performed extensive experimental tests on 1% of the genome. 1% may not sound like a lot, but it is enough to give a good idea of what we're going to learn when results for more of the genome come out.

I'll go into more detail in my next post, but there are a few highlights that the ENCODE researchers have emphasized:

- Much more of the genome is transcribed than we previously knew about, although a lot of this may be unregulated, non-functional transcription. Many apparently functional transcripts are extremely long and transcripts of one gene frequently contain sequence that overlaps with another gene.

- Regulatory elements are frequently found both upstream and downstream of genes on the DNA strand; previously most (but not all) regulatory elements were thought to be upstream.

- There is more extensive gene splicing than we once thought - different exons are mixed up in previously unrecognized combinations.

- 5% of the genome is under the constraint of natural selection, and more than half of this consists of non-protein-coding elements.

What is the significance of all this? I'm not inclined to view it as revolutionary; it seems like much of this confirms many things we previously suspected about the genome, except perhaps that features we once thought were unusual are now known to be much more prevalent.

So this is the ENCODE project in context; tune in for the next post, in which I'll delve into the details some more and offer a much more opinionated outlook.

A Two-Minute Education in Molecular Biology

Most readers of science blogs already have at least some basic knowledge of molecular biology, but in my experience there are many people interested in science, including academics in non-science fields, lawyers, and older physicians, who aren't familiar with the basics. Such people might have a hard time figuring out where to start learning among all of the many technical terms and techniques.

If you learn the following five key terms, I promise you will be able to get at least the gist of most basic biomedical research. When I try to explain my research to people, it's easy if they know these five terms, and nearly impossible if they don't. These five key terms make up what's whimsically called the Central Dogma. You may have heard that there are all sorts of exceptions to the Central Dogma, and there are, but it still forms the core of our understanding of how instructions from our DNA get carried out in the cell. This is the part of molecular biology that absolutely everyone should know - it's as fundamental to biology as the idea that matter is made of atoms (which in turn are made up of nuclei of neutrons and protons, surrounded by electrons) is to physics.

DNA - this is of course where the information to build the cell is stored. DNA consists of two winding chains of small chemical units. The most important part of these units is the portion called the base. Four types of base occur in DNA, abbreviated A, T, C, G. In the two winding chains of DNA, bases are always aligned opposite each other in specific pairs: A is always opposite T, and G is always opposite C:

Thus, if you know the sequence of bases of one chain, then you automatically know the sequence of the opposite chain. DNA sequencing is, obviously, the effort to determine the exact sequence of bases in a stretch of DNA. The sequences of these letters code for proteins, as well as many other important elements.
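
The pairing rule is mechanical enough to write down as a tiny program. This is just an illustrative sketch with made-up function names; the one detail it adds beyond the pairing rule is that the two chains run in opposite directions, so the opposite chain is conventionally written in reverse order.

```python
# Base-pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the bases sitting opposite each base of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the opposite chain as it is conventionally written (reversed)."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```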

Our DNA is found in 23 pairs of chromosomes, where it is packaged up with lots and lots of protein.

RNA - Just like DNA, except different. RNA generally comes as one chain instead of two, and contains the base 'U' instead of the 'T' in DNA. RNA has many functions, but for our purposes here, RNA reproduces the information-containing DNA sequence in a form that can be carried to the protein-producing machinery of the cell. RNA that does this is called 'messenger RNA', or mRNA.

Transcription is the process by which RNA is produced from the DNA template. In this process the two chains of DNA are unwound, and one strand of DNA serves as a template for synthesizing a brand new strand of RNA, following the base pairing rules I mentioned above - G matches up with C, and A with T - except that in RNA chains, U gets substituted for T. This new strand of RNA can then move away from the DNA to some other part of the cell, where the information contained in the sequence of bases can be used to carry out various functions.

Many of the interesting new discoveries in basic biology involve transcription, so it is important to be familiar with this term. When research reports talk about a region of DNA being 'transcribed', it means that RNA strands are made that match the sequence of that given DNA region. Some portions of our DNA are transcribed (and thus the information in that sequence of DNA can potentially be carried to other parts of the cell), while other portions are never transcribed (although that doesn't mean these non-transcribed regions are worthless - many sequences important for regulating transcription are found here.)
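
As a rough sketch of what transcription does to the sequence itself, the snippet below builds an RNA strand from a DNA template using the pairing rules above, with U standing in for T. It is a deliberate simplification: it ignores strand direction, the enzymes involved, and where transcription starts and stops. The function name is made up for illustration.

```python
# Pairing rules for building RNA from a DNA template: U replaces T in RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template):
    """Return the RNA bases that pair with each base of the DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


print(transcribe("TACGGT"))  # AUGCCA
```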

Proteins are the primary workhorses of the cell. Like RNA and DNA, proteins are chains consisting of small chemical subunits. In the case of proteins, those small subunits are amino acids. Amino acids, and thus proteins, are much, much more chemically diverse than RNA or DNA, which is why proteins do much of the actual work in the cell - the enzymes that metabolize nutrients, and the receptors that sense hormones on the outside of the cell are proteins - as are your hair and fingernails.

Translation is the process by which the information encoded in the sequence of an RNA strand is used to produce a chain of amino acids to make a protein. The reasoning behind the terminology is this: whereas an RNA strand is transcribed from DNA in the same 'language' of bases, proteins are made by translating the language of bases into the language of amino acids. The bases-amino acid dictionary is called the genetic code. A group of three bases codes for one amino acid (below, amino acids are represented by single letters):
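
For readers who like to see the dictionary in action, here is a small sketch of translation using the standard genetic code, with amino acids written as single letters and '*' marking a stop. It reads the RNA three bases at a time from the first base it is given and halts at the first stop codon; a real cell also has to find the right starting point, which this toy version skips. The names are made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position; '*' = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate RNA into a protein string, stopping at the first stop codon."""
    rna = rna.upper()
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUGGCAAAUAG"))  # MFGK
```

Here AUG codes for M (methionine), UUU for F, GGC for G, AAA for K, and UAG is a stop.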

In addition to these five key terms, there is one more that I think comes in handy:

Transcription factor is a protein which binds to DNA in a specific place and helps to initiate (or in some cases, prevent) the process of transcription. Transcription factors are critical in controlling many complex processes, such as development of an organism from a single-celled zygote, and the process of cell division.

If you are familiar with these few terms, I guarantee that the stories on biology research you read about on blogs, magazines, and newspapers will be much more clear. If you don't understand these terms, there is no way you can understand the discoveries that are reported in the media.

Saturday, June 16, 2007

Al Gore's Plea for Reason

This blog is not meant to be a political blog, although it inevitably becomes one when those who reject evolution fall predominantly into one US political party. For the most part though, I prefer to get readers from any part of the political spectrum excited about fascinating developments in biology.

But in this post I'm going to favorably review a book that some readers will have (mistakenly) dismissed as just a partisan rant, since it comes from a former and possibly future Democratic presidential candidate, the man who is second only to Hillary Clinton as the subject of passionate vilification by the right-wing media. I'm reviewing The Assault on Reason here because Gore touches some of the biggest points of intersection between science and politics, and because the vision of democracy he articulates is essential to a thriving scientific enterprise. When entertainment or marketing campaigns stand in for serious political debate, liberal and conservative citizens alike become detached from the political process and develop at best a feeling of apathy, and at worst, one of cynicism, towards rational thought. Science itself can easily get swept up in this wave of cynicism, as we often see in the intelligent design debate where scientists and creationists are seen by many people as simply partisans slugging it out over an issue that has no relevance or rational solution. In The Assault on Reason, Gore goes deeper than just criticizing the policies of the current US administration; he tackles problems with the basic processes of our democracy that all of us should care about.

Before getting to the substance of the book, I'd like to deal with a point that some might use as a spurious excuse to dismiss the book: this book is not a great literary achievement. Gore often rambles, leaves some arguments incomplete, has some excessively repetitive passages that should have been edited, resorts sometimes to clichés, makes overly-broad statements about narrower points, and many chapters have a blurred focus. In short, this book could have used some tightening up. But none of this matters because Gore's essential vision is clear and consistent throughout. And, in spite of the weaknesses in the book's structure, Gore writes in clear language, unlike most politicians who, even with the help of ghost writers, often make you wonder how they ever made it through college.

The organizing theme of The Assault on Reason, which frames almost all of Gore's major points, is that the predominance of television as a news source, coupled with political marketing campaigns designed to "manufacture consent," has suffocated debate in our democracy, and thus has allowed unscrupulous individuals and coalitions to use wealth to promote their own interests at the expense of both the public good and the rule of law. Prime among these individuals are of course George Bush and Dick Cheney, along with their major financial and political backers. Gore's book is not 'balanced', at least as the term is generally defined today, in which (as Thomas Pynchon put it) "every 'truth' is immediately neutered by an equal and opposite one." Gore is clear in his indictment of the Bush administration:

"The essential cruelty of Bush's game is that he takes an astonishingly selfish and greedy collection of economic and political proposals and then cloaks them with a phony moral authority, thus misleading many Americans who have a deep and genuine desire to do good in the world. And in the process he convinces these Americans to lend unquestioning support for proposals that actually hurt their families and their communities." (p. 82)

Gore labels one faction of Bush's supporters as 'economic royalists,' "those who are primarily interested in eliminating as much of their own taxation as possible and removing all inconvenient regulatory obstacles." These economic royalists believe that “laws and regulations [to protect the public] are also bad - except when they can be used on behalf of this group [the economic royalists], which turns out to be often." The latest of many examples of this approach to government is Bush's (now withdrawn) nominee for head of the Consumer Product Safety Commission, lobbyist Michael Baroody - who was conveniently promised a $150,000 departing gift from the industry he would have been charged with regulating had he been confirmed. Unfortunately many more such nominees have been confirmed, and they are now in positions where they can abuse their regulatory authority to provide government favors to their former, and inevitably future corporate employers at the expense of the law and the public good.

Gore's intention is not to merely argue policy issues, although he does offer plenty of blunt criticism on the handling of the Iraq war, national security, civil liberties, and climate change. His claim is that the Bush administration and its allies have, in their relentless pursuit of power, repeatedly crossed lines that, for the health of our democracy, should not be crossed. From the abuse of signing statements that undermine the Legislature's constitutional check on the Executive, to the excessive secrecy on even trivial or obsolete matters, the dismissal of the right to habeas corpus, and the partisan screening for career positions at the justice department, the integrity of our government is being chipped away by people whose concern for our governing institutions falls well below their concern for their political party or their economic allies.

The effect of all this is to detach citizens from their own government. Instead of acting as the ultimate check on abuses in government, the citizens are left believing that our government can only be influenced by those with money, and that rational debate is pointless or not possible. On all the major issues Gore discusses, the unwillingness of voters to punish dishonesty and incompetence is traced back to the combination of the obsessive secrecy prevailing in the Executive Branch and the manufacturing of public consent through TV marketing. Gore gives a disturbing example of this phenomenon of consent manufacturing from his own campaign for the Senate, when his campaign advisers were able to successfully predict an 8.5% bump in the polls based on a carefully crafted TV ad campaign played in just the right markets.

This is the constant message of the book: the mix of secrecy, money, and most of all, the manufacturing of political consent through television, have led to a complete lack of any effective challenge to the destructive actions carried out by the Bush administration and its allies in their single-minded pursuit to propagate not just an extreme political ideology, but their own power and wealth. Gore’s key insight is that the problem is not simply due to the unscrupulousness of Bush and his cronies - such people have existed as long as human society, and were recognized by America’s Founders, who sought to create a system that would limit the damage these types of self-serving people could do. It is the substitution of democratic debate with the propaganda of professional marketing and television that has disengaged citizens from the political process and enabled the damage that has been done under the current leadership.

The apathy or fatalism of even socially conscious, educated citizens has had a severe effect on my generation. At least as I've experienced it, the few people my age who are really politically engaged tend to hold a true-believer, Monica Goodling-style outlook (in which no aspect of our government should be immune from partisanship), while the rest of us take a South Park, “all politicians suck” view, devoid of any hope of seriously influencing the political process. As Gore writes:

“If the information and opinions made available in the marketplace of ideas come only from those with enough money to pay a steep price of admission, then all of those citizens whose opinions cannot be expressed in a meaningful way are in danger of learning that they are powerless as citizens and have no influence over the course of events in our democracy - and that their only appropriate posture is detachment, frustration, or anger.” (p. 250)

Along with this detachment from government comes a cynicism about reason and debate:

“When ideology is so often woven into the “facts” that are delivered in fully formed and self-contained packages, people naturally begin to develop some cynicism about what they are being told. When people are subjected to ubiquitous and unrelenting mass advertising, reason and logic often begin to seem like they are no more than handmaidens for the sophisticated sales force. And now that these same techniques dominate the political messages sent by candidates to voters, the integrity of our democracy has been placed under the same cloud of suspicion.” (p. 251)

It's easy to guess what Gore's proposed solution is before you get to the end of the book: the internet. The internet is the only media technology out there with the potential to compete with television for our attention, and its advantage is that users don't have to just be passive absorbers of an extremely expensive message. The price of admission to the internet is low, and in some ways the internet resembles the raucous print culture of the 18th and 19th centuries. Gore's book is less about the assault on reason than it is about the assault on the reasoning process, and it is this process that the internet has the potential to renew. You may hate what I've written here, but you have the opportunity to reply in the comments (hopefully with more than just "you're an ass") where thousands (OK, on this site, just dozens) of people will read it. Just maybe, and there have been hints in the last few election cycles that this might actually work, people without access to big-media air time will be able to create a critical mass of public opinion on an issue that will make our representatives seriously worry about the effect of inaction on their job security.

Gore's book should be read by those who consider themselves principled conservatives, although the irrationally excessive hatred of the man that I've observed among my conservative friends will probably keep many away. That's unfortunate, because he offers what should be common ground for people of both parties who believe our fundamental system of government should be preserved. Principled Republicans should recognize that the traditions and institutions their leaders are trashing have protected their party in other eras when the Democrats have been in power. We can argue about gun control, abortion, stem cells, military funding and whatever else, but if we can't agree that the First Amendment, the rule of law, and vigorous checks and balances among the three branches of the federal government are worth protecting, then we can't function any more as a coherent nation.