Monday, June 25, 2007

Untangling the Logic of Gene Circuits

How does a cell process information? Unlike computers, with CPUs to carry out calculations, and animals, which have brains that process sensory information, cells have no centralized device for processing the many internal and external signals with which they are constantly bombarded. And yet they somehow manage just fine. The single-celled brewer's yeast, for example, can sense what kind of food source is available, tell when it's hot or cold, and even find a mate.

One key way that cells sense and respond to their environment is via genetic circuits. Although biologists often use the word 'circuit' in a sense that is only loosely analogous to electrical circuits, recent research is putting our understanding of genetic circuits on a much more rigorous and quantitative footing. By studying very simple circuits, using computer models and simple experiments, we are starting to understand, in a still very limited way, why the cell is wired up the way it is.

Let's take an example of a simple wiring setup that occurs very commonly in gene regulation. Suppose that gene A turns on gene B. (Technically, gene A does not turn on anything - gene A directs the synthesis of protein A, which can then turn on gene B, but when we talk about genetic networks, this is taken for granted.) A also turns on another gene, C. Gene B turns on gene C as well, so you get a little system wired up like this:



Initially, this configuration, called a feed-forward loop, may not make much sense. If gene C is turned on by A, then why do you need B? The key to this whole setup is that C requires both A and B to be fully on. Because gene C needs both A and B in order to be switched on, the circuit is resistant to noise.


To see how this works, let's view this from the perspective of a small bacterium, such as E. coli. An individual bacterium is constantly in search of food; it can only swim around so long before it runs out of energy. E. coli can use a variety of different food sources, but it needs to turn on the proper genes for each food. When the sugar arabinose is available, E. coli switches on the genes that enable it to import and metabolize arabinose. But turning on the whole suite of arabinose genes requires some effort; it's important that the bacterium not go through all that effort only to find out that there is no arabinose around after all.

Going back to our little circuit, let's suppose that A is sensitive to arabinose. When arabinose is around, A turns on B, and A and B turn on C; gene C makes an enzyme that can help metabolize arabinose. But A could get turned on by just a trace of arabinose; this kind of random noise would be disastrous if A was always switching on C at the slightest provocation. We only want C around when there is a seriously good arabinose source.

Enter the feed-forward loop - it filters out the noise! It works like this (a toy simulation of both scenarios follows below):

Scenario 1 - random noise, or just a trace of arabinose:

1. A gets turned on briefly, and then shuts off.

2. B barely gets switched on by A, but not enough to affect C.

3. C never gets turned on.



Scenario 2 - sustained arabinose signal:

1. A gets turned on, reaches a maximal level and stays on for a period.

2. B gets switched on by A and hits its maximal level.

3. C gets turned on once A and B reach their maximal levels.

4. The bacterium metabolizes arabinose.
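To make the noise-filtering idea concrete, here is a minimal toy simulation of a coherent feed-forward loop with AND logic at gene C, written in Python. The rates and the threshold are arbitrary illustrative assumptions, not measured values for any real arabinose circuit:

def simulate_ffl(a_signal, dt=1.0, k_on=0.2, k_off=0.2, threshold=0.5):
    """Simulate B and C levels for a given A input (a list of 0s and 1s)."""
    b = c = 0.0
    b_levels, c_levels = [], []
    for a in a_signal:
        # B is produced while A is on, and decays otherwise.
        b += dt * (k_on * a - k_off * b)
        # C needs BOTH A on and B above threshold (AND logic).
        c_input = 1.0 if (a > threshold and b > threshold) else 0.0
        c += dt * (k_on * c_input - k_off * c)
        b_levels.append(b)
        c_levels.append(c)
    return b_levels, c_levels

# Scenario 1: a brief blip of arabinose - A flickers on for two time steps.
blip = [1, 1] + [0] * 18
# Scenario 2: a sustained arabinose signal - A stays on for twenty time steps.
sustained = [1] * 20

for name, signal in [("blip", blip), ("sustained", sustained)]:
    b_levels, c_levels = simulate_ffl(signal)
    print(f"{name:>9}: max B = {max(b_levels):.2f}, max C = {max(c_levels):.2f}")

# The blip never pushes B past the threshold, so C stays off; the sustained
# signal turns on B and then C - the brief noise is filtered out.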



Such genetic circuits are extremely common in biology, although most often they occur in much more complex combinations than I've shown here. One current idea is that the more complex combinations are built up out of simple circuits like this feed-forward loop, and the hope is that we can use our understanding of these simple circuits to make sense of the information-processing properties of the massively tangled networks that we find in all cells. This is still mainly just a hope, though; although there are some increasingly sophisticated computer models of complex genetic networks, there is precious little experimental work demonstrating that we have actually learned something about these complex networks.

The experimental situation is different, though, for simple networks - several research groups have carried out some very nice experiments on simple systems. Uri Alon is one of the leaders in this field (my figures are redrawn from his recent review). His group has performed experiments to test the effects of these simple genetic circuits, and other groups are doing similar studies.

So, while a useful, rigorous, experiment-based understanding of more complex networks is still just a hope, our understanding of small, functional circuits is enabling us to delve deeper into the information processing properties of the cell.

Wednesday, June 20, 2007

Our Genomes, ENCODE, and Intelligent Design

What has the ENCODE project done, and how do their results change our understanding of the human genome? In the last post I put this project into perspective by briefly outlining some past concepts of the gene and highlighting some of the ENCODE findings. Now it's time to take a closer look at the results of the ENCODE project and their significance for our understanding of the human genome. ENCODE's genome snapshot is unquestionably fascinating, and it suggests that some features of genome regulation that were previously viewed as exceptions to the norm are really quite common. But are these results revolutionary? Do they overturn any long-cherished notions about genes that scientists have heavily relied on in their understanding of gene regulation, as some have suggested? And do they support intelligent design? I don't think so.

What ENCODE Did

In one sense, the ENCODE project can be thought of as the third big Human Genome Project - the first project being the actual genome sequencing, and the second being the HapMap Project to extensively study genome variation in different human populations. The ENCODE project is an effort to find and study, on an encyclopedic scale, all of the functional elements in the human genome.

For the first phase of this project, the ENCODE researchers examined a small but reasonably representative chunk of the human genome (roughly 1%, or 30 million DNA bases) by running that chunk through a battery of experimental tests and computational analyses. Most of the experimental techniques and results are unfortunately beyond the scope of this little summary. This first round of the ENCODE project produced a big paper in Nature, and the journal Genome Research has devoted its entire June issue to papers from the ENCODE project. I'm going to winnow down this mass of material to two of the most interesting topics: transcription and evolution.

Transcription (if you don't know what transcription is, look here):

The researchers attempted to identify regions of DNA that were transcribed. Why? Because our presumption has generally been that most (note the qualifier!) transcripts contain some functional material, such as protein-coding genes or non-coding RNAs that have some functional role (such as miRNAs, snoRNAs, rRNAs, etc.). Therefore by looking for transcribed regions, we can find new functional portions of the genome.

The transcribed regions were identified using tiling arrays, which are DNA-chips, or microarrays, that cover the entire genome and thus can detect transcription from any place in the genome. This is in contrast to more traditional microarrays that only detect the transcription of known genes. Thus by using tiling arrays and a handful of other complementary techniques, the ENCODE researchers found that a large fraction of the genome region in the study was transcribed, including many places that have no recognizable genes. They estimate that up to 93% of the genome is transcribed, although the evidence for much of this is indirect and other explanations of the experimental results are possible. The actual transcribed fraction may be substantially lower, although it is still likely to be large.

The most interesting finding of these transcription studies is that a lot of strange stuff is ending up in these RNA transcripts. We have long known that different protein-coding regions (exons) from a single gene can be spliced together in various combinations to create many different proteins. The ENCODE researchers confirmed this (the protein-coding genes they studied produce on average 5.4 differently spliced forms), but they also found that chunks of other sequence end up in the transcripts, such as coding and non-coding portions of neighboring genes. Why this is happening is not yet clear, although part of the explanation is surely that the transcription and splicing machinery are more noisy than we previously (and naively) appreciated.

Another major part of the ENCODE project is to find out just where transcription starts. Transcription start sites (TSSs) are important, because key regulatory events take place there. Regulatory sequences in the DNA, together with regulatory proteins, act at TSSs to control the protein machinery that carries out transcription; this control is critical for deciding which genes in the cell are 'on' or 'off'.

The ENCODE researchers found many new TSSs, sometimes very far away from known genes. Interestingly, the TSSs far away from known genes had different characteristics from those close to known genes, suggesting two distinct functional roles. One possible role for these distant TSSs is to control the higher-order structure (i.e., chromatin structure) of big regions of the genome, and thus to some degree regulate entire sets of genes. This work lays a good foundation for studying these control systems.

Evolution

The ENCODE researchers searched for regions of the human genome that have changed little throughout mammalian evolutionary history; these are the regions that have been constrained by natural selection. They compared portions of the human genome with the genomes of 14 other mammalian species, and found that 5% of the genome is under evolutionary constraint, a result that agrees with earlier studies.

The immediate question then is, how much of the 5% consists of known functional elements? The ENCODE researchers reported the following breakdown:

Of the 5% of the genome that is evolutionarily constrained:
- 40% consists of protein-coding genes
- 20% consists of known, functional, non-coding elements
- 40% consists of sequence with no known function

The sequence with no known function is not too surprising. Functional DNA elements other than protein-coding genes are difficult to find, and in spite of many recent studies we know we're missing a lot. These results tell us roughly how much more functional, non-coding sequence we need to find, and where it is probably located.

The ENCODE researchers also looked at evolutionary conservation from another angle: how much of known, functional DNA falls into conserved regions? Protein-coding genes and their immediate flanking regions are generally well-conserved, while known, non-coding functional elements are less conserved. Again, this is nothing too surprising; non-coding elements tend to be very short and have what is called 'low information content', and they are more easily created and destroyed by random mutations.

Many potentially functional elements, picked up in the experimental data analyzed by the ENCODE groups, are not evolutionarily constrained - about 50%, when these elements are compared across all mammalian genomes in the study. This means that there are regions of the genome that are bound by regulatory proteins or that are transcribed, but which have not been constrained by natural selection.

Intelligently Designed Transcription?

I need to pause here and answer the obvious question that those of you who aren't molecular biologists are probably asking: So does this mean that evolution can't explain much of the functional parts of the genome? Intelligent design advocates are already on the web, misreading the ENCODE work and claiming that it somehow supports the fuzzy claims of intelligent design. My advice: don't believe what you hear about this from people who only have the vaguest understanding of how ENCODE's experiments and analyses work (and that includes biochemist Michael Behe).

The ENCODE results do not cast doubt on evolution. Here are some of the reasons why:

1. Just because something is transcribed or bound by a regulatory protein does not mean that it is actually functional. The machinery of the cell does not literally read the DNA sequence like you and I do - it reads DNA chemically, based on thermodynamics. As I mentioned before, DNA regulatory elements are short, and thus are likely to occur just by chance in the genome. A specific 8-base element is expected to show up just by chance roughly every 65,000 bases, and would occur randomly over 45,000 times in a 3 billion base pair genome (a quick back-of-the-envelope check of these numbers follows this list). Nature does work with such small elements, but their random occurrence is hard to control. In a genome as large and complex as ours, we should expect a significant amount of random, insignificant protein binding and transcription. Incidentally, such random biochemical events probably make it easier for currently non-functional sequences to be occasionally recruited for some novel function. We already know from earlier studies that this kind of thing does happen.

2. To say that something is truly functional requires a higher standard of evidence than the ENCODE research provides. The ENCODE researchers did a fine job detecting transcription and regulatory protein binding with state-of-the-art experimental and computational techniques, but confirming a functional role for these elements will require more experiments aimed at addressing that issue.

3. Some of the functional elements that don't appear to be conserved really are conserved. When you're comparing a small functional element in a stretch of DNA between, say, humans and mice, it is often difficult to find the corresponding region in each species. The mice and humans may have the same functional element, but in slightly different places. Thus conserved elements can be missed. The ENCODE researchers note this, and people like myself who study these small elements know from experience that this happens frequently.

4. Despite what you may read, there is still a lot of junk DNA. The ENCODE project does not "sound the death-knell for junk DNA." Our genomes are filled with fossils of genetic parasites, inactive genes, and other low-complexity, very repetitive sequence, and it's extremely clear that most of this stuff has no functional role. Much of this sequence may be transcribed, but remember that the ENCODE evidence for most of this transcription is indirect - their direct measurements only detected transcripts for ~14% of the regions they studied. Even if much of it is transcribed, this mainly suggests that it is not worth expending energy to actively repress this transcription, since there are so many other controls in place to deal with unwanted transcripts in the cell.
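If you want to check the arithmetic in point 1, here is the back-of-the-envelope version, assuming a random genome of about 3 billion bases with all four bases equally likely (a simplification, of course):

motif_length = 8
genome_size = 3_000_000_000              # roughly the size of the human genome
combinations = 4 ** motif_length         # 65,536 possible 8-base sequences
expected_hits = genome_size / combinations

print(f"A specific 8-base element is expected about every {combinations:,} bases")
print(f"Expected chance occurrences in the whole genome: about {expected_hits:,.0f}")
# Prints roughly 45,800 - consistent with the 'over 45,000 times' figure above.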


Enlightening but not revolutionary

Moving on from intelligent design, some people, around the web and in a few journals, are making the ENCODE results out to be more revolutionary than they really are. For example, writing in a Nature piece stuffed with exaggerated claims about what our "preconceptions" supposedly are (subscription required), John Greally states that "Now, on page 799 of this issue, the ENCODE Project Consortium shows through the analysis of 1% of the human genome that the humble, unpretentious non-gene sequences have essential regulatory roles," and "the researchers of the ENCODE consortium found that non-gene sequences have essential regulatory functions, and thus cannot be ignored."

Every biologist I know could have told you, years before ENCODE, that "non-gene sequences have essential regulatory roles." Larry Moran, over at Sandwalk, says that he hasn't "had a 'protein-centric' view of a gene since I learned about tRNA and ribosomal RNA genes as an undergraduate in 1967." Where has Greally been all this time? I'm not sure why he is so surprised.

Also, as I mentioned above, not all (or maybe not even most) of the transcribed, intergenic sequences found by ENCODE are believed to have "essential regulatory roles." Non-coding DNA regulatory elements have been the subject of intense study by many groups for many years now. To claim that we have not paid enough attention to them is wrong. None of the types of transcripts discovered by ENCODE are really novel; we've seen examples of them in earlier studies. What is significant about the ENCODE results is the extent of this unusual transcription; what were once thought to be exceptions are now seen to be much more common.

I'm happy to see the ENCODE results; many of us will use their results in our own research, and projects like this certainly help to make the human genome much less of a black box. But they haven't shattered any paradigms that weren't already on their way out, or revolutionized the field of genomics.

Sunday, June 17, 2007

Time to Rethink the Gene?

After the tremendous discoveries in basic biology of the last 100 years, you might think that we would understand by now what a gene is. But the big news in genome biology this week is the publication of the results of the ENCODE project, a large scale experimental (as opposed to purely computational) survey of the human genome. The leaders of the ENCODE project suggest that we need to, yet again, rethink just what exactly a gene is.

I plan to cover this subject in two posts. Today I'll go over a very brief history of the gene and the basics of what the ENCODE project is doing. In a subsequent post, I'll dive into the ENCODE results, and tell you why I think the results are interesting, but not necessarily revolutionary.

A Brief History of the Gene

Mark Gerstein and his colleagues have written an interesting perspective piece on how the ENCODE results fit into our historical understanding of the gene. To put the ENCODE results into perspective, here is a brief history (with some big gaps - go read Gerstein's paper, or check out Evelyn Fox Keller's book The Century of the Gene; and if you don't know the basics of the Central Dogma, check out my summary):

Something responsible for heritable traits: Beginning with Mendel (who did not use the word "gene"), geneticists at first thought of genes as something within an organism that makes fruit fly eyes red, or peas wrinkled, that is passed on to offspring. The key idea is that heritable traits were passed on as units of something, although of course no one knew what. Early in the 20th Century, some geneticists began to get the idea that genes were arrayed in a linear fashion, and thus were placed at various distances from each other.

Something that makes an enzyme: George Beadle and Edward Tatum, performing genetic studies on bread mold, worked out the idea that a gene somehow was responsible for making an enzyme. Their concept is sometimes referred to as the "one gene one enzyme" idea.

An open reading frame (ORF): After the genetic code was worked out, a gene was recognized as a stretch of DNA that coded for protein, starting with the DNA bases ATG (which means 'start' in the genetic code) and ending with the sequence TAG, TAA or TGA (meaning, naturally, 'stop'). This concept is useful because you can look at a big chunk of DNA sequence and see where all of the protein coding regions are. Also included in this concept of a gene is the idea that DNA elements outside of the coding region regulate the transcription of the gene, especially the region immediately before the starting ATG.

Genes in pieces: In a twist on the open reading frame idea, biologists discovered that the protein-coding chunks of genes (called exons) are interspersed with long non-coding chunks, called introns. Before producing the final protein, exons have to get spliced together. In mammals, exons tend to be fairly short, while introns are extremely long, so a gene can be spread out over long stretches of DNA. An extra twist is that exons can get spliced together in a variety of different combinations, so that one gene, consisting of multiple exons, can produce many different proteins. In addition, we now know that the non-coding regulatory elements are dispersed much more widely than previously appreciated.

No protein needed: Not all genes code for proteins. MicroRNAs are genes which are transcribed and are flanked by regulatory elements just like ORFs, but they don't code for protein. They seem to be involved in regulating the transcription of other genes, and several hundred microRNA genes have been reliably confirmed in the human genome.

The ENCODE Project

A major goal of the ENCODE project is to identify all of the functional elements in the human genome. If one includes all of the known ORFs, regulatory elements, and microRNAs, they make up a few percent of the genome. The remaining DNA unquestionably includes a lot of junk, such as LINEs and SINEs and other DNA parasites that exist simply because they are able to perpetuate themselves. On rare occasions in evolutionary history, some of these parasites get recruited to perform a beneficial function. But most of the parasites are inactive, mere molecular fossils. Other molecular fossils include once-functional genes that have been irreparably scarred by mutation.

But we also know that there is more functional material there; for example about 5% of the genome shows evidence of being under natural selection, and this 5% covers more than just the functional elements we know about. So far, our best attempts to find functional elements have been based on computer searches to find DNA that has been conserved through evolution, and that resembles known functional elements. But the ENCODE research groups have now performed extensive experimental tests on 1% of the genome. 1% may not sound like a lot, but it is enough to give a good idea of what we're going to learn when results for more of the genome come out.

I'll go into more detail in my next post, but there are a few highlights that the ENCODE researchers have emphasized:

- Much more of the genome is transcribed than we previously knew about, although a lot of this may be unregulated, non-functional transcription. Many apparently functional transcripts are extremely long and transcripts of one gene frequently contain sequence that overlaps with another gene.

- Regulatory elements are frequently found both upstream and downstream of genes on the DNA strand; previously most (but not all) regulatory elements were thought to be upstream.

- There is more extensive gene splicing than we once thought - different exons are mixed up in previously unrecognized combinations.

- 5% of the genome is under the constraint of natural selection, and more than half of this consists of non-protein-coding elements.

What is the significance of all this? I'm not inclined to view it as revolutionary; it seems like much of this confirms many things we previously suspected about the genome, except perhaps that features we once thought were unusual are now known to be much more prevalent.

So this is the ENCODE project in context; tune in for the next post, in which I'll delve into the details some more and offer a much more opinionated outlook.

A Two-Minute Education in Molecular Biology

Most readers of science blogs already have at least some basic knowledge of molecular biology, but in my experience there are many people interested in science, including academics in non-science fields, lawyers, and older physicians, who aren't familiar with the basics. Such people might have a hard time figuring out where to start learning among all of the many technical terms and techniques.

If you learn the following five key terms, I promise you will be able to get at least the gist of most basic biomedical research. When I try to explain my research to people, it's easy if they know these five terms, and nearly impossible if they don't. These five key terms make up what's whimsically called the Central Dogma. You may have heard that there are all sorts of exceptions to the Central Dogma, and there are, but it still forms the core of our understanding of how instructions from our DNA get carried out in the cell. This is the part of molecular biology that absolutely everyone should know - it's as fundamental to biology as the idea that matter is made of atoms (which in turn are made up of nuclei of neutrons and protons, surrounded by electrons) is to physics.

DNA - this is of course where the information to build the cell is stored. DNA consists of two winding chains of small chemical units. The most important part of these units is the portion called the base. Four types of base occur in DNA, abbreviated A, T, C, G. In the two winding chains of DNA, bases are always aligned opposite each other in specific pairs: A is always opposite T, and G is always opposite C:



Thus, if you know the sequence of bases of one chain, then you automatically know the sequence of the opposite chain. DNA sequencing is, obviously, the effort to determine the exact sequence of bases in a stretch of DNA. The sequences of these letters code for proteins, as well as many other important elements.
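If it helps to see the pairing rule spelled out, here is a tiny sketch in Python. The sequence is made up, and for simplicity it ignores the fact that the two chains run in opposite directions:

# Base pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def opposite_strand(strand):
    """Return the paired DNA strand, written here in the same direction
    (real strands run antiparallel, but that detail is ignored for clarity)."""
    return "".join(PAIRS[base] for base in strand)

print(opposite_strand("ATGCCGTA"))   # -> TACGGCAT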

Our DNA is found in 23 pairs of chromosomes, where it is packaged up with lots and lots of protein.

RNA - Just like DNA, except different. RNA generally comes as one chain instead of two, and contains the base 'U' instead of the 'T' in DNA. RNA has many functions, but for our purposes here, RNA reproduces the information-containing DNA sequence in a form that can be carried to the protein-producing machinery of the cell. RNA that does this is called 'messenger RNA', or mRNA.



Transcription is the process by which RNA is produced from the DNA template. In this process the two chains of DNA are unwound, and one strand of DNA serves as a template for synthesizing a brand new strand of RNA, following the base pairing rules I mentioned above - G matches up with C, and A with T - except that in RNA chains, U gets substituted for T. This new strand of RNA can then move away from the DNA to some other part of the cell, where the information contained in the sequence of bases can be used to carry out various functions.

Many of the interesting new discoveries in basic biology involve transcription, so it is important to be familiar with this term. When research reports talk about a region of DNA being 'transcribed', it means that RNA strands are made that match the sequence of that given DNA region. Some portions of our DNA are transcribed (and thus the information in that sequence of DNA can potentially be carried to other parts of the cell), while other portions are never transcribed (although that doesn't mean these non-transcribed regions are worthless - many sequences important for regulating transcription are found here.)

Proteins are the primary workhorses of the cell. Like RNA and DNA, proteins are chains consisting of small chemical subunits. In the case of proteins, those small subunits are amino acids. Amino acids, and thus proteins, are much, much more chemically diverse than RNA or DNA, which is why proteins do much of the actual work in the cell - the enzymes that metabolize nutrients and the receptors that sense hormones on the outside of the cell are proteins - as are your hair and fingernails.

Translation is the process by which the information encoded in the sequence of an RNA strand is used to produce a chain of amino acids to make a protein. The reasoning behind the terminology is this: whereas an RNA strand is transcribed from DNA in the same 'language' of bases, proteins are made by translating the language of bases into the language of amino acids. The bases-amino acid dictionary is called the genetic code. A group of three bases codes for one amino acid (below, amino acids are represented by single letters):
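To tie transcription and translation together, here is a bare-bones sketch in Python. The DNA sequence is invented, the codon table contains only four of the 64 entries in the real genetic code, and the mRNA is written directly from the coding strand (T becomes U), which is the usual teaching shortcut:

CODON_TABLE = {   # codon (in mRNA) -> amino acid, single-letter code
    "AUG": "M",   # start codon, methionine
    "UUU": "F",   # phenylalanine
    "GGC": "G",   # glycine
    "UAA": "*",   # stop
}

def transcribe(dna):
    """Transcription: copy the DNA sequence into mRNA (T becomes U)."""
    return dna.replace("T", "U")

def translate(mrna):
    """Translation: read the mRNA three bases at a time into amino acids."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":        # a stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGTTTGGCTAA")    # a toy 'gene': start, F, G, stop
print(mrna)                          # AUGUUUGGCUAA
print(translate(mrna))               # MFG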



In addition to these five key terms, there is one more that I think comes in handy:

A transcription factor is a protein that binds to DNA in a specific place and helps to initiate (or in some cases, prevent) the process of transcription. Transcription factors are critical in controlling many complex processes, such as the development of an organism from a single-celled zygote, and the process of cell division.

If you are familiar with these few terms, I guarantee that the stories on biology research you read about on blogs, magazines, and newspapers will be much more clear. If you don't understand these terms, there is no way you can understand the discoveries that are reported in the media.

Saturday, June 16, 2007

Al Gore's Plea for Reason

This blog is not meant to be a political blog, although it inevitably becomes one when those who reject evolution fall predominantly into one US political party. For the most part though, I prefer to get readers from any part of the political spectrum excited about fascinating developments in biology.

But in this post I'm going to favorably review a book that some readers will have (mistakenly) dismissed as just a partisan rant, since it comes from a former and possibly future Democratic presidential candidate, the man who is second only to Hillary Clinton as the subject of passionate vilification by the right-wing media. I'm reviewing The Assault on Reason here because Gore touches some of the biggest points of intersection between science and politics, and because the vision of democracy he articulates is essential to a thriving scientific enterprise. When entertainment or marketing campaigns stand in for serious political debate, liberal and conservative citizens alike become detached from the political process and develop at best a feeling of apathy, and at worst, one of cynicism, towards rational thought. Science itself can easily get swept up in this wave of cynicism, as we often see in the intelligent design debate where scientists and creationists are seen by many people as simply partisans slugging it out over an issue that has no relevance or rational solution. In The Assault on Reason, Gore goes deeper than just criticizing the policies of the current US administration; he tackles problems with the basic processes of our democracy that all of us should care about.

Before getting to the substance of the book, I'd like to deal with a point that some might use as a spurious excuse to dismiss the book: this book is not a great literary achievement. Gore often rambles, leaves some arguments incomplete, has some excessively repetitive passages that should have been edited, resorts sometimes to clichés, makes overly-broad statements about narrower points, and many chapters have a blurred focus. In short, this book could have used some tightening up. But none of this matters because Gore's essential vision is clear and consistent throughout. And, in spite of the weaknesses in the book's structure, Gore writes in clear language, unlike most politicians who, even with the help of ghost writers, often make you wonder how they ever made it through college.

The organizing theme of The Assault on Reason, which frames almost all of Gore's major points, is that the predominance of television as a news source, coupled with political marketing campaigns designed to "manufacture consent," has suffocated debate in our democracy, and thus has allowed unscrupulous individuals and coalitions to use wealth to promote their own interests at the expense of both the public good and the rule of law. Prime among these individuals are of course George Bush and Dick Cheney, along with their major financial and political backers. Gore's book is not 'balanced', at least as the term is generally defined today, in which (as Thomas Pynchon put it) "every 'truth' is immediately neutered by an equal and opposite one." Gore is clear in his indictment of the Bush administration:

"The essential cruelty of Bush's game is that he takes an astonishingly selfish and greedy collection of economic and political proposals and then cloaks them with a phony moral authority, thus misleading many Americans who have a deep and genuine desire to do good in the world. And in the process he convinces these Americans to lend unquestioning support for proposals that actually hurt their families and their communities." (p. 82)

Gore labels one faction of Bush's supporters as 'economic royalists,' "those who are primarily interested in eliminating as much of their own taxation as possible and removing all inconvenient regulatory obstacles." These economic royalists believe that “laws and regulations [to protect the public] are also bad - except when they can be used on behalf of this group [the economic royalists], which turns out to be often." The latest of many examples of this approach to government is Bush's (now withdrawn) nominee for head of the Consumer Product Safety Commission, lobbyist Michael Baroody - who was conveniently promised a $150,000 departing gift from the industry he would have been charged with regulating had he been confirmed. Unfortunately many more such nominees have been confirmed, and they are now in positions where they can abuse their regulatory authority to provide government favors to their former, and inevitably future corporate employers at the expense of the law and the public good.

Gore's intention is not to merely argue policy issues, although he does offer plenty of blunt criticism on the handling of the Iraq war, national security, civil liberties, and climate change. His claim is that the Bush administration and its allies have, in their relentless pursuit of power, repeatedly crossed lines that, for the health of our democracy, should not be crossed. From the abuse of signing statements that undermine the Legislature's constitutional check on the Executive, to the excessive secrecy on even trivial or obsolete matters, the dismissal of the right to habeas corpus, and the partisan screening for career positions at the Justice Department, the integrity of our government is being chipped away by people whose concern for our governing institutions falls well below their concern for their political party or their economic allies.

The effect of all this is to detach citizens from their own government. Instead of acting as the ultimate check on abuses in government, the citizens are left believing that our government can only be influenced by those with money, and that rational debate is pointless or not possible. On all the major issues Gore discusses, the unwillingness of voters to punish dishonesty and incompetence is traced back to the combination of the obsessive secrecy prevailing in the Executive Branch and the manufacturing of public consent through TV marketing. Gore gives a disturbing example of this phenomenon of consent manufacturing from his own campaign for the Senate, when his campaign advisers were able to successfully predict an 8.5% bump in the polls based on a carefully crafted TV ad campaign played in just the right markets.

This is the constant message of the book: the mix of secrecy, money, and most of all, the manufacturing of political consent through television, have led to a complete lack of any effective challenge to the destructive actions carried out by the Bush administration and its allies in their single-minded pursuit to propagate not just an extreme political ideology, but their own power and wealth. Gore’s key insight is that the problem is not simply due to the unscrupulousness of Bush and his cronies - such people have existed as long as human society, and were recognized by America’s Founders, who sought to create a system that would limit the damage these types of self-serving people could do. It is the substitution of democratic debate with the propaganda of professional marketing and television that has disengaged citizens from the political process and enabled the damage that has been done under the current leadership.

The apathy or fatalism of even socially conscious, educated citizens has had a severe effect on my generation. At least as I've experienced it, the few people my age who are really politically engaged tend to hold a true-believer, Monica Goodling-style outlook (in which no aspect of our government should be immune from partisanship), while the rest of us take a South Park, “all politicians suck” view, devoid of any hope of seriously influencing the political process. As Gore writes:

“If the information and opinions made available in the marketplace of ideas come only from those with enough money to pay a steep price of admission, then all of those citizens whose opinions cannot be expressed in a meaningful way are in danger of learning that they are powerless as citizens and have no influence over the course of events in our democracy - and that their only appropriate posture is detachment, frustration, or anger.” (p. 250)

Along with this detachment from government comes a cynicism about reason and debate:

“When ideology is so often woven into the “facts” that are delivered in fully formed and self-contained packages, people naturally begin to develop some cynicism about what they are being told. When people are subjected to ubiquitous and unrelenting mass advertising, reason and logic often begin to seem like they are no more than handmaidens for the sophisticated sales force. And now that these same techniques dominate the political messages sent by candidates to voters, the integrity of our democracy has been placed under the same cloud of suspicion.” (p. 251)

It's easy to guess what Gore's proposed solution is before you get to the end of the book: the internet. The internet is the only media technology out there with the potential to compete with television for our attention, and its advantage is that users don't have to just be passive absorbers of an extremely expensive message. The price of admission to the internet is low, and in some ways the internet resembles the raucous print culture of the 18th and 19th centuries. Gore's book is less about the assault on reason than it is about the assault on the reasoning process, and it is this process that the internet has the potential to renew. You may hate what I've written here, but you have the opportunity to reply in the comments (hopefully with more than just "you're an ass"), where thousands (OK, on this site, just dozens) of people will read it. Just maybe, and there have been hints in the last few election cycles that this might actually work, people without access to big-media air time will be able to create a critical mass of public opinion on an issue that will make our representatives seriously worry about the effect of inaction on their job security.

Gore's book should be read by those who consider themselves principled conservatives, although the irrationally excessive hatred of the man that I've observed among my conservative friends will probably keep many away. That's unfortunate, because he offers what should be common ground for people of both parties who believe our fundamental system of government should be preserved. Principled Republicans should recognize that the traditions and institutions their leaders are trashing have protected their party in other eras when the Democrats have been in power. We can argue about gun control, abortion, stem cells, military funding and whatever else, but if we can't agree that the First Amendment, the rule of law, and vigorous checks and balances among the three branches of the federal government are worth protecting, then we can't function any more as a coherent nation.

Wednesday, June 13, 2007

Non-coding DNA, Junk and Creationism

Larry over at Sandwalk straightens out a very confused writer at Wired. (Also check out the links in Larry's post to other responses). Larry does a great job going over the article in detail, but I can't resist commenting on a piece that is so badly off track.

Catherine Shaffer, who claims to be very knowledgeable about genomics, has written an extremely confused article about the non-debate over whether recent discoveries about 'junk DNA' support creationism.

I can't believe anyone knowledgeable about genomics would write this (referring to the recently published opossum genome sequence):

"The opossum data revealed that more than 95 percent of the evolutionary genetic changes in humans since the split with a common human-possum ancestor occurred in the "junk" regions of the genome. Creationists say it's also evidence that God created all life, because God does not create junk. Nothing in creation, they say, was left to chance."

The fact that most changes occur in 'junk' regions supports creationism??? We expect most differences between the opossum and human genomes to occur in non-functional (that is, junk!) regions because that's where random mutations don't get swept away by selection. Most functional regions will have undergone much, much less change than the wide swaths of non-functional DNA that can get hit by random mutations without any effect on the survival of the organism.

People who study gene regulation (myself included) look for the functional stretches of DNA by doing exactly the opposite of what Shaffer is suggesting - we look for regions that don't change much among distantly related species. Those places are where you find the regulatory elements. And those elements really are surrounded by junk - our genomes are filled with the remains of mostly inactive genetic parasites and other low-complexity repeated sequences. We have fairly good ideas of how such junk sequences are produced, and many of these ideas have been tested in the lab.

Beyond making it abundantly clear that Shaffer knows essentially nothing at all about genomics, the article shows that Creationists like Behe and Meyer, who are gleefully quoted, know nothing about genomics either.

Genome-wide Association Studies - Are the Long-Promised Benefits of the Human Genome Project on the Horizon?

Genome-wide association studies (GWAs) have received a lot of media attention in the last several months as various research groups have released over a half-dozen such studies, all focused on some of the most widespread Western diseases, including heart disease, type II diabetes, and breast cancer. (See here, here, and here for some examples.) These studies have the potential to substantially change how we understand, diagnose and treat these diseases, and they possibly signal the near-arrival of at least some of the long-promised health benefits of the Human Genome Project, although new cures are probably still many years in the future.

Genome-wide association studies have become feasible with the availability of the human genome sequence and several associated technologies that allow researchers to rapidly and extensively genotype large numbers of people (I have previously explained how these studies work here). Traditional studies have compared patient populations with healthy subjects at the level of phenotype, in an attempt to correlate certain lifestyle risk factors with disease - for example, studies linking diet with heart disease. These traditional studies are important, but until recently we could only phenotype patients, not genotype them. Now, with new technologies available, we can compare sick and healthy subjects at thousands of places in their genomes, and identify genetic variants that are possibly linked with a disease.

These recently reported GWAs are just a first pass, and their results will take some time to pursue in more depth. But with each study we are finding a handful of genes (or more accurately, variants of genes, called alleles) that may be involved in disease. These alleles could very quickly become useful for identifying people who are at high risk for disease.

As far as new treatments go, however, I wouldn't expect to see many in the near future. The problem is this: GWAs give us a list of genetic variants linked to a disease, yet we have no idea what most of these variants do. We still have to understand how those variants affect specific proteins in the cell (i.e., molecular biology), and how those variants lead physiologically to disease (i.e., pathogenesis). Finally, we have to understand how multiple genetic variants act in combination to produce complex diseases like diabetes. Unlike Mendelian diseases such as cystic fibrosis, which can be linked to a defect in a single gene, complex diseases like diabetes involve multiple genes interacting in a complex way with environmental factors. We hardly know how to study such complex interactions, much less cure the diseases they produce. This is an extremely difficult scientific problem.

Does this mean that these cures will never come? That the Human Genome Project was a waste of money and effort? No way - genome sequencing projects have already been a huge boon to basic science research, and they are also the obvious way forward in biomedical research. Molecular biology and genomics have not yet produced health benefits on par with the germ theory of disease, vaccinations, and randomized double-blind studies, but some day they will.

Sunday, June 10, 2007

Science in Against the Day: Vectors and Quaternions

Here at last is the long-delayed next installment of my ongoing primer on the science in Thomas Pynchon's novel Against the Day. The draft of part 1 can be found here. Illness and major deadlines set me back by months. I hope to have more installments out soon.

Anyway, here is part 2: Quaternions and Vectors in Against the Day:

Science and Against the Day

Part 2: Vectors and Quaternions

I. The need for algebra in more than one dimension

In Against the Day, Pynchon frequently refers to a relatively obscure conflict in the mathematics and physics community that took place in the early 1890's between advocates of quaternions and proponents of the newer vector analysis. This conflict is tied in to major themes in the book that emphasize the tensions between the old and the emerging world that culminated in the conflict of World War I, as well as the ability to perceive and describe the world in more than the three dimensions of Euclidean space. Quaternions, like the luminiferous aether discussed in Part 1 of this essay, became superfluous and obsolete, unnecessary in the efforts of physicists to describe the natural world after the advent of modern vector algebra and calculus.

To understand this conflict, it is important to understand what mathematicians and physicists were searching for when they developed first quaternions and then vector analysis. The most important aim these mathematicians and physicists had in mind was the ability to do algebraic manipulations in more than one dimension.

All of us are familiar with the basic, one-dimensional operations which we learned in elementary school: addition, subtraction, multiplication, and division. By one-dimensional, I mean operations on combinations of single numbers; in other words, what we do in everyday addition or multiplication. Each of these single numbers can be represented on a one-dimensional number line, and each operation can be thought of as moving left or right along the line:



So for example, the operation 2 + 3 moves you to the right three units on the number line, from position 2 to position 5. I know that readers of Pynchon’s novels do not need a review of 1st grade math; the important point I’m trying to make is that these operations we’re all familiar with are one-dimensional operations on a number line.

These basic, one-dimensional operations have certain important properties, ones which most of us take for granted once we're out of elementary school. For example, two important properties are:

Associativity - when adding or multiplying more than two numbers, it doesn't matter how you group them:
(a + b) + c = a + (b + c) and (a x b) x c = a x (b x c)

Commutativity - when you add or multiply two numbers, it doesn't matter how you order them:
a + b = b + a and a x b = b x a

The challenge to mathematicians in the 19th century was to define algebraic operations such as addition and multiplication on pairs (or larger groups) of numbers - in essence, creating an algebra of more than one dimension. In order to be useful, these operations on pairs of numbers had to have at least some of the important properties of operations on single numbers; for example, the addition of number pairs should be associative and commutative.

Why are operations on pairs or other groups of numbers important? One reason is that such definitions would represent an advance in pure mathematics, but another key reason is that higher dimensional mathematics would make it easier to work with the laws of physics in more than one dimension. To see what this means, let's take Newton's Second Law of Motion as an example. This law states that the force acting on an object is equal to the mass of the object times the acceleration of the object produced by the force. Newton's second law can be written as this equation:

F = ma

But in three-dimensional space, Newton's Second Law is properly written with three equations, to account for the force and the acceleration in each dimension (each dimension represented by x, y, or z):

F_x = m a_x
F_y = m a_y
F_z = m a_z

This means that when we make calculations using Newton's Second Law, we really have to perform our calculations on three equations if we want to deal with ordinary three-dimensional space. In a complicated situation where we want to add and subtract many different forces, we have to add and subtract the three components for each force. A system of analysis, where our operations of addition and subtraction could be performed on a set of three numbers at once, treating the three-dimensional force as one unit, would greatly simplify calculations using Newton's Second Law or the much more difficult laws of electromagnetism formulated by Maxwell.

Here is another way to see the problem. Scientists distinguish between the speed of an object, which is just a magnitude or a scalar quantity (such as ‘60 miles per hour’), and velocity, which consists of both a magnitude and a direction, and thus is a vector (such as ‘60 miles per hour going northwest’). Adding speeds together is easy, but how do we add velocities? 18th and 19th century scientists could do this by breaking velocities down into their one-dimensional components (just as we did for Newton’s Second Law), but they realized that a better system was needed.
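Here is a minimal sketch, in Python, of what treating a velocity or force as a single three-component object buys you: addition becomes one operation that quietly does the component-by-component bookkeeping. The numbers are arbitrary:

# Each velocity (or force) is an (x, y, z) triple; addition is component-wise.
def add_vectors(v, w):
    """Add two three-dimensional vectors component by component."""
    return tuple(a + b for a, b in zip(v, w))

wind = (5.0, -2.0, 0.0)       # an arbitrary velocity
airspeed = (60.0, 10.0, 1.0)  # another arbitrary velocity
print(add_vectors(wind, airspeed))   # -> (65.0, 8.0, 1.0)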

II. Complex numbers and two-dimensional math

Before tackling three dimensions, let’s start with just two. 19th century scientists already had a powerful system of analysis for dealing with pairs of numbers - complex numbers. Complex numbers are made up of two parts, a real part and an imaginary part. The imaginary part consists of a number multiplied by the number i, which is the square root of -1:

i = √-1

This means ‘i squared’ is equal to -1:

i² = -1

Thus a complex number z looks like the following, where a and b are any numbers you choose:

z = a + ib

Again, a is called the real part, and ib is known as the imaginary part.

Unlike our ordinary numbers on a number line, complex numbers can be represented on a two dimensional plane, called the complex plane. One axis of the plane is the number line for the real numbers, and the second axis is the number line for the imaginary numbers:



Instead of a point on a one-dimensional number line, complex numbers can be interpreted as points on the two-dimensional complex plane. For example, the complex number ‘6 + 3i’ is the point on the complex plane as shown below:



Using complex numbers, one can now describe two-dimensional operations like rotation. For example, multiplying a number by i is equivalent to a 90-degree rotation on the complex plane. Thus the operation:



is equivalent to this 90-degree rotation on the complex plane:



Instead of multiplying by i, one can multiply by other complex numbers to get a rotation through any angle, not just 90 degrees (a complex number of magnitude 1 gives a pure rotation). This subject comes up on p. 132 of Against the Day, where Dr. Blope talks about rotations, not in the two-dimensional space of the complex plane, but in the three dimensional space of quaternions:

“ ‘Time moves on but one axis, ‘ advised Dr. Blope, ‘past to future - the only turning possible being turns of a hundred and eighty degrees. In the Quaternions, a ninety-degree direction would correspond to an additional axis whose unit is √-1. A turn through any other angle would require for its unit a complex number.’”

This ability to use operations of complex numbers to describe two dimensional rotations and translations is an extremely important tool in math and physics.
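You can see this for yourself with Python's built-in complex numbers (Python writes i as 1j); the point 6 + 3i and the 45-degree angle are just illustrative choices:

import cmath

# Multiplying by i rotates a point in the complex plane by 90 degrees.
z = 6 + 3j                 # the point '6 + 3i' used as an example above
print(z * 1j)              # -> (-3+6j): the same point rotated by 90 degrees

# Multiplying by a complex number of magnitude 1 at angle theta rotates
# the point by theta; here, an illustrative 45-degree rotation.
rot45 = cmath.exp(1j * cmath.pi / 4)
print(z * rot45)           # 6 + 3i rotated 45 degrees about the origin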

Complex numbers have many other amazing properties, but most relevant to our discussion of Against the Day is that complex numbers can be manipulated with all of our basic operations - they can be added, subtracted, multiplied, and divided, with the kinds of useful properties mentioned earlier, such as associativity and commutativity. Thus, with complex numbers, we have a way to do algebra in two dimensions.


III. Extending complex numbers to three dimensions: Quaternions

In the mid-19th century, several mathematicians were looking for ways to extend the two-dimensional geometrical interpretation of complex numbers to three dimensions. One important figure was Hermann Grassmann, whose system turned out to be closest to the yet-future vector analysis. Grassmann is mentioned on occasion in Against the Day, but his influence on the development of vector analysis was diminished by the fact that, compared to William Hamilton, Grassmann was fairly unknown. It was Hamilton, already famous for earlier work, who developed the most well-known immediate predecessor to vector analysis - quaternions.

William Hamilton had been struggling with the problem of how to generalize complex numbers to higher dimensions. While walking with his wife in Dublin, Hamilton discovered the fundamental relationship that could underlie such generalized complex numbers, which he called quaternions. That fundamental relationship is:

i² = j² = k² = ijk = -1

Hamilton was so excited about the discovery that he carved this equation into the stone of Dublin’s Brougham Bridge. Readers of Against the Day will appreciate Hamilton’s own language describing this event (in a letter written to his son in 1865):

“But on the 16th day of the same month - which happened to be a Monday, and a Council day of the Royal Irish Academy - I was walking in to attend and preside, and your mother was walking with me, along the Royal Canal, to which she had perhaps driven; and although she talked with me now and then, yet an under-current of thought was going on in my mind, which gave at last a result, whereof it is not too much to say that I felt at once the importance. An electric circuit seemed to close; and a spark flashed forth, the herald (as I foresaw, immediately) of many long years to come of definitely directed thought and work, by myself if spared, and at all events on the part of others, if I should even be allowed to live long enough to distinctly communicate the discovery. Nor could I resist the impulse - unphilosophical as it may have been - to cut with a knife on a stone of Brougham Bridge, as we passed it, the fundamental formula with the symbols, i, j, k; namely

i² = j² = k² = ijk = -1

which contains the Solution of the Problem, but of course, as an inscription, has long since mouldered away.” (from Crowe, p. 29)

So what exactly are quaternions? It would be too difficult to explore their properties in any depth here. More thorough introductory references can be found at Mathworld, and also Roger Penrose’s book The Road to Reality, chapter 11. Briefly though, a quaternion is like a complex number, in that it is made up of multiple parts. It has four components, one scalar component and three vector components:

q = a + bi + cj + dk

The three components of the vector portion of a quaternion are imaginary numbers, just as ‘ib’ is the imaginary number portion of a complex number. As we saw earlier, the imaginary number i is equal to the square root of -1, or:

i² = -1

The same holds true for j and k in quaternions:

j² = -1 and k² = -1


Just as complex numbers can be used to algebraically describe rotations in the two-dimensional complex plane, quaternions can be used to describe rotations in three-dimensional space. That three dimensional space is defined by three imaginary axes, i, j, and k (instead of the x, y, and z we used earlier to describe our everyday, Cartesian, three-dimensional space).

Quaternions have most of the important algebraic properties of both real and complex numbers; for example, they have the associative property (i.e., a + (b + c) = (a + b) + c). However, quaternions lack one major property: multiplication is not commutative, that is, i j ≠ j i. (To get an idea of how weird this is, imagine that 5 x 6 ≠ 6 x 5!) Quaternions are actually anti-commutative, which means that i j = -j i. (Again, imagine what it would be like if real numbers had this property - then 5 x 6 = -(6 x 5) - weird, but this kind of weirdness is an important property in quantum mechanics and other aspects of modern physics.)
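Here is a small sketch of quaternion multiplication, in Python, that makes the anti-commutativity concrete. Quaternions are written as plain (w, x, y, z) tuples standing for w + xi + yj + zk, which is just one convenient convention:

def qmul(p, q):
    """Multiply two quaternions using Hamilton's rules i² = j² = k² = ijk = -1."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(qmul(i, j))   # -> (0, 0, 0, 1), which is k
print(qmul(j, i))   # -> (0, 0, 0, -1), which is -k: i j = -j i
print(qmul(i, i))   # -> (-1, 0, 0, 0), confirming i² = -1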

Quaternions never caught on as widely as Hamilton had hoped, but they did have some very passionate advocates. A community of mathematicians and physicists put in a significant effort to show how quaternions could be useful for solving problems in physics. Maxwell’s laws of electromagnetism (operating in three-dimensional space) could be written in terms of quaternions, but it still wasn’t clear that quaternions were the best tools for handling multi-dimensional problems in algebra and physics. As a recent paper put it, “Despite the clear utility of quaternions, there was always a slight mystery and confusion over their nature and use.” (Lasenby, Lasenby and Doran, 2000) Roger Penrose puts it this way:

“[Quaternions give] us a very beautiful algebraic structure and, apparently, the potential for a wonderful calculus finely tuned to the treatment of the physics and geometry of our 3-dimensional physical space. Indeed, Hamilton himself devoted the remaining 22 years of his life attempting to develop such a calculus. However, from our present perspective, as we look back over the 19th and 20th centuries, we must still regard these heroic efforts as having resulted in relative failure. This is not to say that quaternions are mathematically (or even physically) unimportant. They certainly do have some very significant roles to play, and in a slightly indirect sense their influence has been enormous, through various types of generalization. But the original ‘pure quaternions’ still have not lived up to what must undoubtedly have initially seemed to be an extraordinary promise.

"Why have they not? Is there perhaps a lesson for us to learn concerning modern attempts at finding the ‘right’ mathematics for the physical world?” (Penrose, p. 200)

IV. "Kampf ums Dasein" - the struggle between quaternions and vector analysis

J. Willard Gibbs wrote a letter in 1888, in which he stated that “I believe a Kampf ums Dasein [struggle for existence] is just commencing between the different methods and notations of multiple algebra, especially between the ideas of Grassman & of Hamilton." (Crowe, p. 182) That struggle commenced in earnest in 1890, and lasted roughly four years. According to Michael Crowe, author of A History of Vector Analysis, the struggle involved eight scientific journals, twelve scientists, and roughly 36 publications between 1890 and 1894. (Crowe, p. 182) The following chronological outline is based on Michael Crowe’s extensive discussion of this struggle (chapter 6 of A History of Vector Analysis).

What was the argument about? Hamilton’s followers had tried for years to bring what they perceived to be the quaternions’ untapped potential to fruition. They had not been as successful as they hoped, and a new competitor was emerging - the system of vector analysis developed independently by Oliver Heaviside in England and J. Willard Gibbs at Yale. This new system was proving useful in a variety of contexts where quaternions had failed to live up to their promise. For instance, while Maxwell’s laws of electromagnetism had at one point been cast in quaternion form, Heaviside showed that they could be much more usefully presented in the form of vector calculus. Gibbs, meanwhile, had written a pamphlet laying out his system of vector analysis and arguing its advantages in solving physics problems.

This competition riled the quaternionists. They began seeking support among mathematicians and physicists, trying to encourage their colleagues to join the effort to develop quaternions into a more useful tool. The leading quaternionist and successor to Hamilton (who had died in 1865), Peter Guthrie Tait, argued in 1890 that quaternions were “transcendentally expressive” and “uniquely adapted to Euclidian [3-dimensional] space.” Tait also fired what was essentially the first shot in the struggle with the vectorists when he wrote that Gibbs was “one of the retarders of Quaternion progress, in virtue of his pamphlet on Vector Analysis, a sort of hermaphrodite monster.”

Gibbs replied to Tait in an 1891 letter in the journal Nature. He argued that the scalar and vector products of his vector analysis had a fundamental importance in physics, while the quaternion itself (which, as discussed above, is a combination of a scalar and a vector element) had little natural usefulness. (Briefly, the scalar and vector products are what we now call the ‘dot’ and ‘cross’ products of two vectors. Gibbs defined two types of multiplication for vectors: one can multiply two vectors to get a scalar quantity (the scalar, or ‘dot’, product), or one can multiply two vectors to obtain yet another vector (the vector, or ‘cross’, product). Both products are widely used in physics today.) Gibbs also pointed out that vector analysis could be extended to four or more dimensions, while quaternions were limited to three.
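For readers who haven't seen these since a physics course, here is a bare-bones sketch of Gibbs's two products in Python (my own illustration, using modern component notation rather than anything from Gibbs's pamphlet):

```python
def dot(a, b):
    """Scalar ('dot') product: multiply two 3-component vectors, get a single number."""
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def cross(a, b):
    """Vector ('cross') product: multiply two 3-component vectors, get another vector."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

x, y = (1, 0, 0), (0, 1, 0)
print(dot(x, y))    # 0: perpendicular vectors have zero dot product
print(cross(x, y))  # (0, 0, 1): the cross product points along the z-axis
```

Notice that neither product drags along a scalar-plus-vector bundle the way a full quaternion product does - which was exactly Gibbs's point about their naturalness for physics.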

In his reply to Gibbs, Tait made the infamous comment (which crops up in Against the Day, p. 131) that “it is singular that one of Prof. Gibbs' objections to Quaternions should be precisely what I have always considered... their chief merit:- viz. that they are uniquely adapted to Euclidean space, and therefore specially useful in some of the most important branches of physical science. What have students of physics, as such, to do with space of more than three dimensions?” (Crowe comments wryly that “Fate seems to have been against Tait, at least in regard to that last point.”)

The arguments went back and forth for four years with little apparent progress. Gibbs repeatedly and calmly emphasized that the prime consideration in a system of analysis should be given to the fundamental relationships we wish to describe in the physical world. He wrote:

“Whatever is special, accidental, and individual [in these analysis systems] will die as it should; but that which is universal and essential should remain as an organic part of the whole intellectual acquisition. If that which is essential dies with the accidental, it must be because the accidental has been given the prominence which belongs to the essential...”

Other writers were not so calm as Gibbs. Several quaternionists were quite vitriolic, while Oliver Heaviside seemed to relish the battle when he wrote that “the quaternionic calm and peace have been disturbed. There is confusion in the quaternionic citadel; alarms and excursions, and hurling of stones and pouring of boiling water upon the invading host.”

After about four years, the arguments died down. Vector analysis began to be more widely adopted, not because of any arguments made in the ‘Kampf ums Dasein’, but because it became closely associated with the growing success of Maxwell’s theory of electromagnetism. Quaternions faded into a historical footnote, while a modernized version of the Gibbs and Heaviside vector analysis became what most students in physics, chemistry, and engineering learn to use today. Like the science of the luminiferous aether, which became obsolete after the work of Michelson and Morley and the development of Einstein’s Special Relativity, quaternions are another largely abandoned subject of once-high 19th-century hopes.

V. Speculations on quaternions in Against the Day

Why does Pynchon make such a big deal of quaternions and vectors in Against the Day? Possibly because they are so tied up with the changing notions of light, space, and time around the end of the 19th Century. An important theme in the history of science is that how we perceive our world is limited by how we can measure it, and by what we can say about it (especially in terms of mathematics). The quaternionists’ views of space and time were limited by the mathematical formalisms they were working with. Some of them speculated that the scalar (or w) term of a quaternion could somehow represent time, while the three vector components covered 3-dimensional space, but this view treats time differently from how it would eventually be dealt with in the four-dimensional space-time of special relativity. For one thing, time as a scalar term would have only two directions, ‘+’ or ‘-’; that is, either forward or backward, whereas in relativity individual observers can be rotated at any angle relative to the time axis of four-dimensional space-time (recall the Frogger example from part I of this essay).

Characters in Against the Day speculate about the somewhat mysterious role of the w term of quaternions, suggesting that the ‘Quaternion weapon’ makes use of the w term to somehow displace objects in time. As Louis Menand notes in his review of Against the Day, the book “is a kind of inventory of the possibilities inherent in a particular moment in the history of the imagination.” (I disagree with Menand’s claim that this is all the book is, and that it is just a rehash of what was done in Mason & Dixon. More on that in another installment of this essay.)

Spaces and geometries - those we perceive, those we can’t perceive, and those only some of us perceive - are a recurring theme in Against the Day. As Professor Svegli tells the Chums about the ‘Sfinciuno Itinerary’, “The problem lies with the projection” of surfaces, especially imaginary ones beyond our three-dimensional earth. Thus ‘paramorphoscopes’ were invented to reveal “worlds which are set to the side of the one we have taken, until now, to be the only world given us.” (p. 249) To draw a perhaps too-crude analogy, the mathematical tools of physics are like paramorphoscopes - designed correctly, they can enable us to talk about worlds and imaginary axes that we would not have considered otherwise. And perhaps, by abandoning some of the tools once current in the 19th Century, we have closed off our perception of other aspects of nature that currently remain transparent to us. It turns out that Gibbs’ vector analysis was itself insufficient to handle important aspects of relativistic space-time, as well as quantum mechanics, and physicists have since rediscovered important algebraic ideas developed by Hermann Grassmann and William Clifford, whose 19th century work anticipated important 20th century developments better than quaternions or vector analysis did.

There is much more that could be said. In future installments of this essay, I'll finish the science primer portion by covering Riemann surfaces and 4-dimensional space-time, and then hopefully move on to some interpretation and a reply to James Wood's claim that Against the Day is just a massive Seinfeld episode - that is, a book about nothing.

Stay tuned...

For further reading:

Michael J. Crowe, A History of Vector Analysis (1969)
Roger Penrose, The Road to Reality, (2004) Chapter 11
The Feynman Lectures on Physics, Vol. 1, Chapters 11 and 22
Lasenby, Lasenby and Doran, "A unified mathematical language for physics and engineering in the 21st century", Phil. Trans. R. Soc. Lond. A (2000) 358, 21-39

Friday, June 08, 2007

Sean Carroll's Smackdown of Michael Behe

This week's issue of Science has a book review (subscription required, unfortunately) of Michael Behe's latest effort to defend Intelligent Design Creationism. Behe's new book, The Edge of Evolution, contains the latest incarnation of his idea of irreducible complexity. A few years ago he put forward this argument in a paper in Protein Science (a journal which one of my mentors dismissed, maybe a little unfairly, as a "junk journal"), and he elaborates on it more extensively in the new book. (See a response to Behe's Protein Science paper here.)

The argument is this: any novel function in a protein that requires two simultaneous amino acid changes is so unlikely to occur by chance that the novel function must have been designed. Since most beneficial changes in protein function would require a change of two or more amino acids, most of the varied functional proteins we see in nature must have been designed, not evolved. So, for example, if a receptor for a certain hormone were, over the course of evolution, to develop a new specificity for a different hormone, and if that new specificity required at least two amino acid changes, then such a change would be incredibly unlikely to occur through random mutation and natural selection alone.

In making this argument, Behe makes an explicit assumption: an 'intermediate' protein with only one amino acid change (in other words, one that is only halfway evolved toward a new function requiring two particular amino acid changes) is non-functional, and thus is not subject to selection for the new function. So in order for natural selection to act, the protein would need both amino acid changes simultaneously, arising by chance mutation in the same individual organism (an event which Behe calculates to be vanishingly unlikely).

Sean Carroll takes on this argument by pointing out that Behe's assumption of non-functional intermediate proteins is contradicted by vast amounts of evidence. Single amino acid changes in a protein do in fact cause beneficial changes that are favored by natural selection, and over time these single changes accumulate in a lineage to create a more robust novel function. This is the norm in evolution, not the exception as Behe would have it. The scientific literature supporting this is extensive.
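To get a feel for the quantitative gap at stake here, consider this back-of-the-envelope sketch in Python. The numbers are illustrative assumptions of mine (a per-site mutation rate of roughly 10^-9 per replication and a population of roughly 10^9 bacteria), not figures from Behe's book or Carroll's review:

```python
mu = 1e-9   # assumed probability of one *specific* point mutation per genome replication
N  = 1e9    # assumed population size (a modest bacterial culture)

# Behe-style requirement: both specific changes must appear together, in the
# same cell, in the same generation, before selection can act at all.
expected_double_mutants = (mu * mu) * N
print(expected_double_mutants)   # ~1e-9 per generation: effectively never

# The scenario Carroll describes: the single-change intermediate is itself
# beneficial, so step one arises, spreads through the population, and step
# two is then just another ordinary single mutation.
expected_single_mutants = mu * N
print(expected_single_mutants)   # ~1 per generation: routine
```

The particular numbers don't matter much; what matters is the enormous difference between demanding two specific changes at once and letting them accumulate one functional step at a time.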

Carroll goes on to make this important point: "Behe seems to lack any appreciation of the quantitative dimensions of molecular and trait evolution." This is because Behe, like me, is a biochemist - biochemists learn about the physics and chemistry of proteins. The kind of math we use in our work consists primarily of differential equations describing the kinetics and thermodynamics of proteins and nucleic acids. Biochemists generally do not study mutation rates, evolving populations, or the heavy statistics behind natural selection. That's a whole separate field - population and quantitative genetics - founded primarily by the pioneering scientists Sewall Wright and Ronald Fisher, whose work is usually not that familiar to biochemists. (As someone who did a PhD in a biochemistry department, but now works in a genetics department, and in a lab that does serious quantitative genetics, I have acutely, even painfully, experienced this difference in training firsthand.)

Behe's problem is that he's tried to jump into this field without any serious background knowledge; it's like a chemist or engineer trying to tackle research problems in quantum gravity - the chances of producing anything worthwhile are essentially zero. Behe's efforts at modeling mutation and selection on protein function have thus been amateurish, and not taken seriously by people who work on these problems professionally.

On a different note, it's great to see Sean Carroll join the evolution/intelligent design fray. Many distinguished scientists write well and effectively against the weak claims of Intelligent Design, but few have the stature, as scientists, that Carroll has. He's a very big player in the field of evo-devo, a field which is directly relevant to the claims made by ID creationists. It's nice to see such a heavy-hitter get involved.