
Wednesday, February 20, 2008

Making Biology Easy Enough For Engineers

No, I'm not knocking the intelligence of engineers. But we're still not at the point where, in the words of synthetic biologist Drew Endy:

...when I want to go build some new biotechnology, whether it makes a food that I can eat or a bio-fuel that I can use in my vehicle, or I have some disease I want to try and cure, I don't want that project to be a research project. I want it to be an engineering project.


Just like designing a new bridge or a new car is not a scientific research project, designing biotechnology shouldn't always be a research project. But biology is still too hard, argues Drew Endy, in a reflective interview on The Edge. (Thanks to The Seven Stones for the tipoff).

Endy draws a distinction between those of us trying to reverse engineer complex biological systems and those who want to build them - you could say, systems biologists vs. synthetic biologists:

Engineers hate complexity. I hate emergent properties. I like simplicity. I don't want the plane I take tomorrow to have some emergent property while it's flying.


He seems to also be arguing that if we want to build truly predictive models of biological systems, like, say, an individual yeast, we should work on building biological systems, not just reverse engineering them:

If I wanted to be able to model biological systems, if I wanted to be able to predict their behavior when the environment or I make a change to them, I should be building the biological systems myself.


I understand this to mean that you start by engineering really simple things (individual genes), and move up to more complex things (promoters, chromosomes, genomes).

This sounds like a useful approach, but I still don't see how synthetic biology is going to go from engineering really, really simple systems to systems that approach the complexity of real organisms. In the case of mechanical or electrical engineering, the physical theory behind how these systems behave has been worked out, to a high level of sophistication, for decades. And thus we can engineer, fairly easily, things from thermostats to computers to Boeing planes.

But how do we go from building artificial genes and promoters to artificial metabolic pathways (without just copying and pasting an existing metabolic pathway, with minor tweaks)? Let's say you can cheaply synthesize a 50 million-base artificial chromosome, big enough to hold a set of metabolic or signaling pathways of your custom design. How do you choose what to put on your artificial chromosome?

I don't see how you can do it without a genuinely quantitative, formal, theoretical framework for treating biological systems, which we just don't have yet. To echo Endy's earlier quote on engineering, every new effort to model a biological system is a research project in itself, not a routine engineering task. How do we change that?

It's a fascinating interview, worth checking out.

Monday, January 28, 2008

Sequencing 1000 Human Genomes - How Many Do We Really Need?

A group of the world's leading sequencing centers has announced plans to sequence 1000 human genomes. The first human genome cost about $3 billion to sequence; by comparison, the next 1000 will be a steal at possibly only $50 million (and that's the total cost, not per genome). But that's still a lot of money - why are we investing so much in sequencing genomes? It may be a lot up front, but the benefits, in terms of both economics and medical research, easily outweigh the cost of such a large project. By pooling sequencing resources and making large amounts of genome sequence data available up front, we can avoid inefficient and redundant sequencing efforts by independent research groups trying to discover gene variants involved in disease. In fact, it would probably be worthwhile to sequence 10,000 human genomes. With 1000 genomes, we're at least making a good start.

The 1000 genomes project comes at a time when new, genome-wide disease studies have created a need for an extensive catalog of human genetic variation, and new technology has provided a potentially efficient way to fill that need. Genome-wide association studies have scanned the genomes of thousands of people, sick and healthy, to find regions of DNA that may be involved in diseases like diabetes and heart disease. These studies have highlighted regions of our chromosomes that could be involved in disease, but the genome scans used are often still too low-resolution for researchers to pinpoint the exact genetic variant that might be involved. In many cases there are dozens of genes or more inside a disease-linked region, any of which could be the disease gene of interest. As a result, groups that do these genome-wide studies have to invest considerable time and resources mapping genetic variants at higher resolution - which of course significantly increases the effort required to get anything useful out of genome-wide association studies. What we need is a much better, much more detailed catalog of all the spots where humans vary in their genomes.

Although we're all roughly 99.9% similar in our genomes, that still leaves millions of DNA positions where we vary. And often we don't just vary at single DNA base-pairs (instances where in one position you might have a 'T', while I have a 'C'); small deletions and insertions of stretches of DNA also exist, and can be involved in disease. The 1000 genomes project intends to map both single-base changes and these 'structural variants.'

Many genetic variants important in health show up in just 1%, 0.1%, or even 0.01% of the population. Imagine a medically important genetic variant present in just 0.1% of the U.S. population; 300,000 people will have this variant and will possibly be at risk for a particular disease. Knowing about such variants could help us understand how the disease develops, and possibly design prevention and treatment strategies. For 300,000 people in the U.S., and millions more around the world, it would therefore be a good thing to know about this variant. But to find those rare variants, we can't just sequence 100 human genomes, and even with 1000, we would miss a lot.
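To get a rough feel for the numbers, here's a back-of-the-envelope sketch of my own (not anything from the 1000 genomes consortium), assuming chromosomes are sampled independently at a given allele frequency:

```python
# Back-of-the-envelope sketch (purely illustrative): the chance that a variant at
# allele frequency f is sampled at least once among N diploid genomes, assuming
# the 2N chromosomes are drawn independently from the population.

def prob_variant_seen(f, n_genomes):
    """Probability that a variant at allele frequency f appears at least once
    among n_genomes diploid genomes (2 * n_genomes chromosomes)."""
    return 1 - (1 - f) ** (2 * n_genomes)

for n in (100, 1000, 10000):
    for f in (0.01, 0.001, 0.0001):
        print(f"{n:>6} genomes, allele frequency {f:.2%}: "
              f"chance of sampling the variant at least once = {prob_variant_seen(f, n):.3f}")
```

And simply sampling a variant once or twice is not the same as confidently cataloguing it, so the real situation is even less favorable than this sketch suggests.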

So it's obvious that 1000 genomes is just a start. To go more aggressively after rare variants that are likely to be medically important, the 1000 genomes group is going to also focus in on the small gene-containing fraction of the human genome. This will enable them to put more resources into finding rare variants near genes - places where we expect medically important variants to show up.

And let's not forget the technological benefits: by using next-generation sequencing technology, which is still not quite mature, this consortium hopes to develop innovative ways to effectively and cheaply re-sequence human genomes. In both physical technology and data analysis methods, we'll see benefits that will lead to more widespread use of this technology, hopefully in clinical diagnostics as well as research.

It has been nearly 20 years since the Human Genome Project officially began. We're still waiting for the promised medical benefits to emerge, but in the meantime, this effort has transformed biological research - in my opinion, at least as much as the foundational discoveries of molecular biology in the 50's and 60's. Future medical benefits will be based on this science.

Thursday, January 10, 2008

What Next Generation DNA Sequencing Means For You

Of all the 'Greatest Scientific Breakthroughs' of 2007 heralded in the pages of various newspapers and magazines this past month, perhaps the most unsung one is the entrance of next-generation DNA sequencing onto the stage of serious research. Prior to this year, the latest sequencing technologies were limited in their usefulness and accessibility due to their cost and a steep technical learning curve. That's now changing, and a group of recent research papers gives us a hint of just how powerful this new technology is going to be. Not only will next-generation sequencing be the biggest change in genomics since the advent of microarray technology, but it may also prove to be the first genome-scale technology to become part of every-day medical practice.

Sanger DNA sequencing is one of the most important scientific technologies created in the 20th century. It's the dominant method of sequencing DNA today, and very little of the best biological research of the last 20 years could have been done without it, including the whole genome sequencing projects that have thoroughly transformed modern biology. Now, new next-generation sequencing methods promise to rival Sanger sequencing in significance.

So what's so great about the latest sequencing technology? Sanger sequencing is inherently a one-at-a-time technology - it generates a single sequence read of one region of DNA at a time. This works well in many applications. If you're sequencing one gene from one sample, Sanger sequencing is reliable, and generates long sequence reads that, under the right conditions, can average well over 500 nucleotide bases. You get a nice clean readout of a single strand of DNA, as you can see in this example:

[Figure: a Sanger sequencing trace - a 4-color fluorescent readout, one peak per base]

Modern sequencing machines that use this method generate a 4-color fluorescent dye readout, which you can see in the graph in the figure. Each peak of fluorescence in the graph represents one nucleotide base, and you know which base it is from the color of the dye.

Next-generation sequencing - pyrosequencing is one prominent version - can't generate the nice, long sequence reads you get with Sanger sequencing, nor are the individual reads as accurate. Instead of 500 DNA bases or more, you get only about 25 bases per read. But the difference is that you get lots and lots of sequence reads. Instead of just one long read from just one gene (or region of the genome), you get thousands of short, error-prone reads from hundreds or thousands of different genes or genomic regions. Why exactly is this better? The individual reads may be short and error-prone, but as they pile up, you get accurate coverage of your DNA sample; thus you can get accurate sequence from many regions of the genome at once.
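To make the 'reads pile up into accurate coverage' idea concrete, here is a toy sketch of my own - not any real assembler, and it assumes the reads are already aligned to a known reference and that errors are random. Many short, 3%-error reads, combined by a simple majority vote at each position, recover the underlying sequence:

```python
import random
from collections import Counter

random.seed(0)
reference = "ACGTTAGCCTAGGATCACGTTAGGCCATTAGACGGATCCA"   # a made-up 40-base 'genome'
READ_LEN = 25

def noisy_read(ref, start, error_rate=0.03):
    """Simulate one short read, with random single-base errors."""
    bases = []
    for base in ref[start:start + READ_LEN]:
        if random.random() < error_rate:
            base = random.choice([b for b in "ACGT" if b != base])
        bases.append(base)
    return start, "".join(bases)

# Lots of short, error-prone reads from random starting positions...
reads = [noisy_read(reference, random.randrange(len(reference) - READ_LEN + 1))
         for _ in range(2000)]

# ...pile up on the reference; a simple majority vote at each position gives the consensus.
piles = [Counter() for _ in reference]
for start, seq in reads:
    for offset, base in enumerate(seq):
        piles[start + offset][base] += 1

consensus = "".join(pile.most_common(1)[0][0] if pile else "N" for pile in piles)
print("coverage per position:", min(sum(p.values()) for p in piles), "to",
      max(sum(p.values()) for p in piles), "reads")
print("consensus matches the reference:", consensus == reference)
```

Real short-read analysis is far messier (alignment, quality scores, repeats), but the basic counting logic is the same.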

Next-generation sequencing isn't quite ready to replace Sanger sequencing of entire genomes, but in the meantime, it is poised to replace yet another major technology in genomics: microarrays. Like next-generation sequencing, microarrays can be used to examine thousands of genes in one experiment, and they are one of the bedrock technologies of genomic research. Microarrays are based on hybridization - you're basically seeing which fluorescently labeled DNA from your sample sticks (hybridizes) to spots of DNA probes on a microchip. The more fluorescent the spot, the more DNA of that particular type was in the original sample, like in this figure:

[Figure: a microarray image - a grid of fluorescent spots of varying intensity]

But quantifying the fluorescence of thousands of spots on a chip can be unreliable from experiment to experiment, and some DNA can hybridize to more than one spot, generating misleading results.

Next-generation sequencing gets around this by generating actual sequence reads. You want to know how much of a particular RNA molecule was in your sample? Simply tally up the number of sequence reads corresponding to that RNA molecule! Instead of measuring a fluorescent spot, trying to control for all sorts of experimental variation, you're just counting sequence reads. This technique works very well for some applications, and it has recently been used to look at regulatory markings in chromatin, to find where a neural regulatory protein binds in the genome, to look at the differences between stem cells and differentiated cells, and to see how a regulatory protein behaves after being activated by an external signal.
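Here's how simple that counting step is in principle - a minimal sketch that assumes the hard part (mapping each read to the gene its RNA came from) is already done, with made-up gene names and read counts:

```python
from collections import Counter

# Hypothetical input: each short read has already been mapped to the gene its RNA came from.
mapped_reads = ["ADH1", "ACT1", "ADH1", "GAL1", "ADH1", "ACT1", "ADH1", "GAL1", "ADH1"]

# "How much of this RNA was in the sample?" -- just tally the reads per gene.
counts = Counter(mapped_reads)
total = sum(counts.values())

for gene, n in counts.most_common():
    print(f"{gene}: {n} reads ({n / total:.0%} of mapped reads)")
```

Everything difficult in a real experiment happens upstream of this tally - mapping, normalization, getting enough sequencing depth - but the readout itself really is just counting.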

I've left out one of the major selling points of this technology: it's going to be cheap. You get a lot of sequence at a fairly low cost. And this is why it may end up being the one technology that truly brings the benefits of genomics into our everyday medical care. Because next-generation sequencing is cheap and easy to automate, diagnostics based on sequencing, especially cancer diagnostics, will become much more routine, and so will treatments based on such genetic profiling. It will be much easier to look at risk factors for genetic diseases. Microbial infections will be easier to characterize in detail.

All of this is still a few years off, but the promise of this technology is already apparent enough to include it among the great breakthroughs of 2007.

Go look at the very informative websites of 454 Life Sciences, Illumina, and Applied Biosystems, the major players in next-generation sequencing.

For more on Sanger sequencing, check out Sanger's Nobel Lecture (pdf file).

A recent commentary and primer on next-generation sequencing in Nature Methods (subscription required).

Tuesday, August 21, 2007

Domesticating Biotechnology in the 21st Century

Will we domesticate biotechnology in the next 50 years? More than 150 years of spectacular advances in physics, chemistry, and computing have thoroughly transformed the way we live. Yet so far, the big revolutions in molecular biology have had their impact primarily on professional laboratories, not our everyday lives. What do we need to do in order to domesticate biotech?

Physicist Freeman Dyson recently explored this question:

"Will the domestication of high technology, which we have seen marching from triumph to triumph with the advent of personal computers and GPS receivers and digital cameras, soon be extended from physical technology to biotechnology?"

Dyson predicts this will happen in the next 50 years:

"I predict that the domestication of biotechnology will dominate our lives during the next fifty years at least as much as the domestication of computers has dominated our lives during the previous fifty years."

What form might this domestication take? Among Dyson's suggestions for domestication is user-friendly genetic engineering for hobbyist plant and animal breeders. I'm not so sure that making genetic engineering idiot-proof is the major hurdle; in fact, genetic engineering today is somewhat of an oxymoron. We may be able to engineer pet fish to express a green fluorescent protein, but we honestly have no clue how to engineer any but the most simple, monogenic traits.

We will domesticate biotechnology, and I predict that this will happen in two ways: by bringing biotech into the day-to-day practice of medicine, and by bringing genetic engineering to a truly sophisticated level, on par with aerospace engineering.

Bringing Biotech into the Clinic
To be honest, with the exception of imaging technology, medicine as practiced today is extremely low-tech. Very few of the fancy techniques that scientists use in a molecular biology lab are available on a routine, affordable basis in the clinic. Blood tests are downright primitive. And in spite of all of our sophisticated genome analysis technology, detailed genotyping is almost never used in medicine. Biotech is ripe for domestication in the clinic.

Dirt Cheap Genome Sequencing
One day, every newborn child will be routinely genotyped; that is, the hospital lab will take a blood sample and, quickly and cheaply, determine that baby's DNA sequence in the millions of places where humans can differ. Our genotype will become part of our medical records, which of course we ourselves will also have access to. Genotyping can be used to customize drug and disease treatments, as well as suggest lifestyle choices that will help avoid or minimize diseases that a person may be susceptible to. Universal genotyping can even be used by family history hobbyists.

The technological barriers will soon be overcome, leaving the social ones remaining as the largest obstacle to universal genotyping. Who can have access to this information? How much do you really want to know about your disease susceptibility? Your paternity? These aren't trivial questions.

The Universal Blood Test
High-tech, preventative diagnostics will transform the way we practice medicine. Most of today's diagnostic tests, with the exception of medical imaging, are based on decades-old techniques. Leroy Hood, a founder of Seattle's Institute for Systems Biology, is working on technology for affordable, routine blood tests that will provide a comprehensive picture of your health, including the very early detection of diseases like cancer. These blood tests, one day cheap enough to be done annually, could thoroughly modernize preventative medicine.

Real Genetic Engineering
At Boeing, engineers can essentially design a new plane completely by computer, and predict in minute detail how that plane will behave in real-world weather. True genetic engineering will mean being able to make such quantitative predictions with the cell, but currently our abilities to make quantitative predictions are embarrassingly small. Analogous to Boeing's computer-aided design, computer aided genetic engineering will one day enable us to develop gene replacement therapies that don't have cancer as a side effect, develop specific, side effect-free drugs that treat tough diseases, and develop microbes that can generate energy from renewable resources, clean up toxic spills, or perform chemical reactions that organic chemists haven't yet been able to achieve.

The first step towards achieving this level of sophistication will be to completely understand all of the parts of an organism; the next will be to understand how those parts work as a system. We've nearly reached that first step for a eukaryotic organism: brewer's yeast, one of biology's key model organisms, will have an essentially completely annotated parts list within the next 10 years or so. Many scientists are now struggling with the next step, trying to make sense of how these parts work as a system. Yeast will be the first Boeing 747 of biology - an organism that we can completely and predictively model by computer, without extensive trial and error studies in the lab.

Maybe, after we've really learned how to do genetic engineering, hobbyists will then fulfill Dyson's dream of user-friendly plant design, and come up with a way to make glow-in-the-dark roses.

Monday, June 25, 2007

Untangling the Logic of Gene Circuits

How does a cell process information? Unlike computers, with CPUs to carry out calculations, and animals, which have brains that process sensory information, cells have no centralized device for processing the many internal and external signals with which they are constantly bombarded. And yet they somehow manage just fine. The single-celled brewer's yeast, for example, can know what kind of food source is available, tell when it's hot or cold, and even find a mate.

One key way that cells sense and respond to their environment is via genetic circuits. Although biologists often use the word 'circuit' in a sense that is only loosely analogous to electrical circuits, recent research is putting our understanding of genetic circuits on a much more rigorous and quantitative footing. By studying very simple circuits, using computer models and simple experiments, we are starting to understand, in a still very limited way, why the cell is wired up the way it is.

Let's take an example of a simple wiring setup that occurs very commonly in gene regulation. Suppose that gene A turns on gene B. (Technically, gene A does not turn on anything - gene A directs the synthesis of protein A, which can then turn on gene B, but when we talk about genetic networks, this is taken for granted.) A also turns on another gene, C. Gene B turns on gene C as well, so you get a little system wired up like this:

[Figure: a feed-forward loop - A turns on B, and A and B together turn on C]

Initially, this configuration, called a feed-forward loop, may not make much sense. If gene C is turned on by A, then why do you need B? The key to this whole setup is that C requires both A and B to be fully on. If gene C needs both A and B in order to be switched on, we now have a circuit that is resistant to noise.


To see how this works, let's view this from the perspective of a small bacterium, such as E. coli. An individual bacterium is constantly in search of food; it can only swim around so long before it runs out of energy. E. coli can use a variety of different food sources, but it needs to turn on the proper genes for each food. When the sugar arabinose is available, E. coli switches on the genes that enable it to import and metabolize arabinose. But turning on the whole suite of arabinose genes requires some effort; it's important that the bacterium not go through all that effort only to find out that there is no arabinose around after all.

Going back to our little circuit, let's suppose that A is sensitive to arabinose. When arabinose is around, A turns on B, and A and B turn on C; gene C makes an enzyme that can help metabolize arabinose. But A could get turned on by just a trace of arabinose; this kind of random noise would be disastrous if A was always switching on C at the slightest provocation. We only want C around when there is a seriously good arabinose source.

Enter the feed-forward loop - it filters out the noise! It works like this (a small simulation sketch follows the two scenarios below):

Scenario 1 - random noise, or just a trace of arabinose:

1. A gets turned on briefly, and then shuts off.

2. B barely gets switched on by A, but not enough to affect C.

3. C never gets turned on.



Scenario 2 - sustained arabinose signal:

1. A gets turned on, reaches a maximal level and stays on for a period.

2. B gets switched on by A and hits its maximal level.

3. C gets turned on once A and B reach their maximal levels.

4. The bacterium metabolizes arabinose.
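
Here's the small simulation sketch I promised above: a toy model of a coherent feed-forward loop with AND logic, using made-up rate constants rather than anything measured for the real arabinose system. A brief blip of input never pushes B over its threshold, so C stays off; a sustained input does, and C switches on:

```python
# Toy coherent feed-forward loop with AND logic (both A and B needed to switch on C),
# integrated with a simple Euler scheme. Rate constants and the threshold are made up,
# not measured values from the real arabinose system.

def simulate(signal_duration, dt=0.01, t_end=10.0, threshold=0.5, k_on=1.0, k_off=1.0):
    b = c = 0.0
    b_max = c_max = 0.0
    for step in range(int(t_end / dt)):
        t = step * dt
        a = 1.0 if t < signal_duration else 0.0              # A simply tracks the arabinose signal
        db = k_on * a - k_off * b                             # B is produced while A is on, and decays
        c_on = 1.0 if (a > 0.0 and b > threshold) else 0.0    # the AND gate: C needs A present AND B high
        dc = k_on * c_on - k_off * c
        b += db * dt
        c += dc * dt
        b_max, c_max = max(b_max, b), max(c_max, c)
    return b_max, c_max

for duration, label in [(0.2, "brief blip of arabinose"), (5.0, "sustained arabinose signal")]:
    b_max, c_max = simulate(duration)
    print(f"{label:<27} max B = {b_max:.2f}   max C = {c_max:.2f}")
```

The numbers are arbitrary, but the qualitative behavior - a built-in delay that filters out short pulses - is the point.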



Such genetic circuits are extremely common in biology, although most often they occur in much more complex combinations than I've shown here. One current idea is that the more complex combinations are built up out of simple circuits like this feed-forward loop, and the hope is that we can use our understanding of these simple circuits to make sense of the information processing properties of the massively tangled networks that we find in all cells. This is still mainly just a hope, though; although there are some increasingly sophisticated computer models of complex genetic networks, there is precious little experimental work demonstrating that we have actually learned something about these complex networks.

The experimental situation is different though for simple networks - several research groups have carried out some very nice experiments on simple systems. Uri Alon is one of the leaders in this field (and my figures are redrawn from his recent review of this field.) His group has performed experiments to test the effects of these simple genetic circuits, and other groups are doing similar studies.

So, while a useful, rigorous, experiment-based understanding of more complex networks is still just a hope, our understanding of small, functional circuits is enabling us to delve deeper into the information processing properties of the cell.

Wednesday, May 16, 2007

Studying General Principles of Biological Systems - How flies make sense of smell

About two months ago I was in Steamboat Springs, Colorado, attending the Keystone Symposium on Systems Biology and Regulatory Networks. I went hoping to hear about forward-looking research that deals with some of the most fundamental outstanding questions in biology - fundamental in the sense of being relevant not just to a particular cell type or organism, but to most cells, developmental systems, or biological systems in general.

I'm not sure whether I really got a glimpse of the future at this meeting, but I did get a good view of the present. Many of the speakers at this meeting are running labs that have made steady contributions to what is currently being called systems biology. I think these labs are producing good research, but I'm not so sure that this is what systems biology should look like. Much of the work presented at the conference was high-throughput data collection and analysis; in other words, genomics. This type of research can help obtain a global picture of what's going on in the cell - such as where all of the transcription factors (DNA-binding regulatory proteins) are bound under a given set of conditions, or which genes or proteins interact with each other (physically or genetically), forming an 'interaction network.' Unlike a lot of stuff out there that's billed as systems biology, much of this stuff was quite good - the organizers did a good job of selecting worthwhile speakers.

However, there are two primary reasons I don't think this type of research is really what systems biology should ultimately be:

- What we learn isn't fundamental enough - it only applies in certain limited cases, where there are directly homologous systems in other cell types or organisms; it's not about biological systems in general.

- The arguments aren't quantitative enough - we may learn quantitative things about many individual genes and proteins, but what we say about the big picture is still only qualitative. In other words, we can't yet reason about biological systems in the same way we reason about engineered, nonliving systems like circuits.

These are hard challenges to meet, but unless we meet them, systems biologists will only be saying the same types of things about cells that molecular biologists have been saying for decades, except that they'll be saying them about larger and larger datasets.

My favorite talk of the conference was by John Carlson, from Yale, who spoke about how fruit flies smell what's in their environment. Or, in Carlson's words (co-written with Elissa Hallem):

"Sensory systems produce internal representations of external stimuli. A fundamental problem in neurobiology is how the defining aspects of a stimulus, such as its quality, quantity, and temporal structure, are encoded by the activity of sensory receptors."

Basically, Hallem and Carlson wanted to figure out how a fly's odor sensing machinery is able to discriminate among the many different odors a fly encounters in its environment. An odorant is an often complex molecule, one that binds to odorant receptors (proteins in the cell membrane), which in turn activate a neural signal. Flies (and humans) possess many different odor receptors, but not nearly enough to have one receptor for every odor they can perceive. Instead, flies have to make do with receptors that can detect a variety of odorants, and their neural systems have to integrate this data to produce a useful internal representation of the odors in the environment.

Hallem and Carlson took some of these fly odorant receptors and expressed them in special neural cells which harbored no other odor receptors. They would then expose the neural cell to an odorant; if that odorant activated the receptor, there would be an electrical pulse in the neural cell that they could measure. These researchers managed to actually measure this in live flies instead of just using neurons in tissue culture. They would suck up a fly into a pipette, leaving only its head sticking out, and then expose the fly to various odors.

In this way they tested over 100 different odors on several dozen different odorant receptors. Not surprisingly, some receptors were activated by a broad spectrum of odors, while others were sensitive to only a few odors. Furthermore, each receptor was sensitive to a unique combination of odors; that is, no two types of receptor responded to exactly the same odors, although there was a lot of overlap in the sensitivity of receptor types.

Carlson reported much more data, which I am skipping over here, but to make sense of it all, Carlson and Hallem represented their data in a 24-dimensional 'odor space' (one dimension for each receptor in the analysis). By analyzing the data this way, they were able to categorize the various odorants: certain groups of odors activated very similar sets of receptors; certain odors produced a complex response (that is, they activated many different receptors) while others produced a simple response (by activating only a few receptors). In essence, they were able to characterize classes of odors by the combinatorial neural response the odors produced.
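To make the 'odor space' idea concrete, here's a toy sketch with invented firing rates and only five receptors instead of the 24 in the actual study (I'm using numpy just for the vector math): each odorant becomes a vector of receptor responses, and odorants that activate similar sets of receptors sit close together in that space.

```python
import numpy as np

# A toy 'odor space' with invented firing rates: each odorant is a vector of responses,
# one entry per receptor (5 receptors here, instead of the 24 in the actual study).
odorants = {
    "ethyl acetate":  np.array([120,  90,  10,   5,  60]),
    "ethyl butyrate": np.array([110,  95,  15,  10,  55]),
    "1-hexanol":      np.array([  5,  10, 130, 100,  20]),
    "geraniol":       np.array([ 10,   5, 125,  95,  15]),
}

# Odorants that activate similar sets of receptors end up close together in this space.
names = list(odorants)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        distance = np.linalg.norm(odorants[a] - odorants[b])
        print(f"distance({a}, {b}) = {distance:.0f}")
```

Hallem and Carlson's analysis is of course far richer than a table of distances, but this is the basic representation their 24-dimensional odor space is built on.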

In the paper I linked to above, Hallem and Carlson end with this tantalizing sentence:

"This analysis provides a foundation for investigating how the primary odorant representation is transformed to subsequent representations and ultimately to the behavioral output of an olfactory system."

At the conference, Carlson made good on this promise. Since this is unpublished work, I can't write publicly about what he shared yet. But essentially, he has been able, using his odorant data, to build a fairly predictive model of fly behavior that actually works. When I see his paper come out, I'll post an update here.

This is good systems biology. Carlson is doing more than just laying the foundation for a better bug-repellent; he is tackling a fundamental problem of how complex environmental signals are processed by limited numbers of neurons in an animal to produce a coherent behavioral response.

Friday, May 04, 2007

Will the real systems biologist please stand up?

According to the NIH, you can't be a systems biologist and an experimental geneticist at the same time. The NIH has issued a call for applications to:

"use systems biology approaches to investigate the mechanisms that underlie genetic determination of complex phenotypes.  These projects will combine computational modeling approaches and experimental validation of predictive models."

This is exactly the kind of thing our lab is working on. We have expertise in both mathematical modeling and experimental genetics and biochemistry. But according to the NIH, my boss would have to find someone else to collaborate with if he wanted to apply for this particular grant:

"It is expected that a team of at least two principal investigators (PIs), one with expertise in systems biology and the other with expertise in the genetics of humans or model organisms, will apply for funding under this FOA.  Applications from a single investigator or that propose solely data production and accumulation will be considered non-responsive and will not be reviewed."

At this point, systems biology is such a chaotic, diverse field (you can't really call it a discipline) that it is really an absurd exercise to try to define who has "expertise in systems biology." Almost all of the people publishing what is called systems biology were trained in other disciplines - math, physics, engineering, computer science, and yes, biochemistry, genetics, and molecular biology. (Just check out the faculty in Harvard's Department of Systems Biology.)

Computational biologist Sean Eddy (who trained as a molecular biologist) had the following to say about the team approach advocated by the NIH:

"It's also depressing to read that the National Institutes of Health thinks that science has become too hard for individual humans to cope with, and that it will take the hive mind of an interdisciplinary “research team of the future” to make progress. But what's most depressing comes from purely selfish reasons: if groundbreaking science really requires assembling teams of people with proper credentials from different disciplines, then I have made some very bad career moves."

He goes on to talk about the biologist Howard Berg (who trained as a physicist):

"He's successfully applied physical, mathematical, and biological approaches to an important problem without enlisting an interdisciplinary team of properly qualified physicists, mathematicians, and biologists. As he recently wrote, perhaps he'll have to start collaborating with himself."

It is depressing to see that talented investigators who have skills in both areas are barred from applying on their own under this funding announcement.

Monday, February 05, 2007

Making Simple Model Systems to Study the Design Principles of Biological Networks

The speaker at our departmental seminar last week, Wenying Shou, discussed a system of cooperating yeast strains she has constructed, described in a paper forthcoming in PNAS (link to abstract; the full article requires a subscription). She created two yeast strains which absolutely depend on each other for survival. Yeast are normally free-living, single-celled organisms. But Shou knocked out a gene required to make lysine (an amino acid, critical in many proteins) in one yeast strain, and a gene required to make adenine (needed to make DNA and RNA) in the other yeast strain. These yeast strains are perfectly fine on their own, as long as you supplement their broth or culture medium with the missing adenine or lysine - if they can't make it, the yeast cells can pull it in from their environment. However, if you put these yeast strains alone in a culture medium without adenine or lysine, they die out.

Shou showed that you can put these two engineered strains together so that they each supply the missing nutrient required by the other strain (with a little bit of tweaking, described in the paper). The strain that cannot produce its own adenine does produce the lysine required by the other strain, and vice versa. So what happens is this: you start out with a culture containing both strains, in medium missing lysine and adenine. The lysine-defective strain starts to die off because there is no lysine available. As its cells die, they release adenine (which they can synthesize) into the medium, which is quickly taken up and used by the adenine-defective strain. Soon enough, the adenine-defective strain uses up all the adenine released by the first strain, and now it is the one dying. As it dies, it releases lysine, which the first strain, now near complete extinction, can take up, enabling it to start growing robustly again. And so this whole system goes back and forth, each strain supplying a missing nutrient to the other.

The beauty of this system is that all of the relevant parameters can be measured, such as how much adenine a cell releases when it dies, how fast a strain grows at a certain concentration of nutrient, etc. Shou was able to write a set of equations describing this relatively simple system, and create a phase diagram based on two variables - the number of cells from each strain used to start the cooperative culture:

[Figure: phase diagram - the fate of the co-culture as a function of the starting number of cells from each strain]

So we now have a prediction - the two strains will cooperate and survive if you start them on one side of the phase diagram, but they will die out if you start them on the other side. If you start your culture with a number of cells that lies right on the dividing line, sometimes your culture will survive and sometimes it won't. That's exactly what Shou found when she started her cultures with varying numbers of cells from each strain.
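To see how a model like this can turn starting cell numbers into a survive-or-die prediction, here's a toy obligate-mutualism sketch of my own - emphatically not Shou's actual equations, which are built from measured parameters - in which each strain's growth depends on its partner's abundance:

```python
# A toy obligate-mutualism model -- NOT Shou's actual equations, which use measured
# parameters. Each strain's growth depends on its partner's abundance (the partner
# supplies the missing nutrient), growth saturates at n_max, and cells die at rate d.

def coculture(n1, n2, b=1.0, d=0.3, K=100.0, n_max=5000.0, dt=0.01, t_end=60.0):
    """Euler-integrate two obligately cross-feeding populations; return final sizes."""
    for _ in range(int(t_end / dt)):
        g1 = b * n2 / (K + n2) * (1 - n1 / n_max) - d   # strain 1's growth needs strain 2
        g2 = b * n1 / (K + n1) * (1 - n2 / n_max) - d   # strain 2's growth needs strain 1
        n1 += n1 * g1 * dt
        n2 += n2 * g2 * dt
    return n1, n2

for start in [(20, 20), (100, 100), (10, 200)]:
    n1, n2 = coculture(*start)
    fate = "both strains persist" if min(n1, n2) > 1.0 else "the co-culture dies out"
    print(f"starting cells {start}: final sizes ({n1:7.1f}, {n2:7.1f}) -> {fate}")
```

In this toy version the boundary between survival and extinction is just set by the parameters I picked; the beauty of Shou's real system is that every one of those parameters can actually be measured.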

I think this is a nice model system to play with, but the problem is that I'm not sure yet what key questions it can answer for us. The ostensible rationale for this system is to study cooperation in nature, but frankly I think this is too disconnected from real cooperating species found in nature to gain much insight. Furthermore, since cooperation is already established in this system, you're not really studying how cooperating systems evolve in the first place. In the paper itself, Shou and her co-authors barely make reference to the decades of field and theoretical studies that have analyzed natural cooperation, and I don't think they've identified any questions that field biologists would love to see answered with this model system.

But this is beside the point. Our inability to engineer any but the simplest cellular networks from scratch suggests that we are missing an important part of the picture of how cells work. We can't build a cell from scratch. One way forward is to build simple systems like Shou's, with measurable parameters and study the dynamics to learn more about the principles that underlie simple biological networks. The dynamics that Shou has produced so far are maybe a little too simple (if you read the paper, you'll notice that the system merely converges to a point attractor), but I don't doubt this system has potential. As more simple systems like this one become available, we will need to define the specific important questions we want to answer. We probably have most of the mathematical and physical tools we need, yet we are lacking the useful concepts that will enable us to seriously study how a cellular system works on a quantitative level.

Sunday, January 21, 2007

Is Systems Biology Teaching Us Anything New?

What I find most exciting about basic molecular biology today is the prospect of building a quantitative understanding of how a cell works. Many other scientists are excited about this as well, leading to the current popularity of what's being called 'systems biology.' The idea is that maybe we can understand the design principles behind a cellular process - how the behavior of a cell emerges from all of those detailed physical interactions among proteins, nucleic acids and other components of the cell. If that sounds vague to you, well, that's because it is vague. It's a nice sentiment, but I think biologists still have a hard time defining just what it is we want to learn.

Think of this problem from a historical perspective: biology has several profound organizing theories that have been fantastically useful as explanations for what happens in biological systems. As the geneticist Dobzhansky famously put it, nothing in biology makes sense except in the light of evolution. The same thing holds true for genetics (you don't have adaptive evolution if you don't have genes encoding traits that are passed on from one generation to the next), biochemistry (all organisms are made of molecules that obey the laws of physics and chemistry - not some mysterious substance that transcends physical laws), and molecular biology (DNA makes RNA makes Protein). Each of these theories has been successful by those criteria that define a good scientific theory; for example, they have explained previously mysterious phenomena, they have predicted completely new phenomena that have since been verified, and they have opened up huge new avenues of research. Each one of these theories has changed the way the entire community of biologists operates.

Will systems biology do that? I hope so, but I don't know. It hasn't yet. Let's take the cell division cycle, for example, since it's a process that's near and dear to my heart. It's also a process that is crucial for understanding human disease, notably cancer. How can systems biology help us understand the cell cycle? How can it help us understand and cure cancer?

A recent paper in Nature, from Michael Laub's lab, reports the identification of an "integrated genetic circuit." This is a very nice paper, with a clear set of experiments, that identifies certain interactions among key cell cycle proteins that control division in the bacterium Caulobacter crescentus. The interactions identified in this research explain how it is that a key cell cycle regulator protein, called CtrA, is cyclically switched on and off during the various stages of the cell cycle.

The authors aren't claiming that they are producing a quantitative model, but they use the language of systems biology, notably by calling their set of novel interactions an "integrated genetic circuit." So what then is a non-integrated genetic circuit? How does a biological integrated circuit relate to the integrated circuit used in electronics?

Ultimately, this study and many others like it are largely filling in the molecular details of a specific process, something molecular biologists have been doing for decades. Feedback loops and regulatory interactions are valuable, but not new. In some cases, we are getting enough detailed data to build some primitive computer models, but these models are largely descriptive - reproducing what extensive experimental work has already shown.

So is systems biology ever going to amount to something like the paradigm-shifting initiation of molecular biology in the 50's and 60's? Are there any Really Big Questions left in biology, or are we now just finding better, faster ways to fill in the details? I think there are big questions left, but they're still poorly defined and often lost in the flood of genomic research. One telling gap in our knowledge is the origin of living cells from nonliving systems - we can't build a cell from scratch. We don't have the theoretical tools to understand, rigorously, how the first cells could have arisen from available components on the early earth, for example. This suggests that there is more we need to understand about how physical systems can cohere together to produce something that can adapt to its environment and reproduce. Not just molecular details, but new concepts of how physical systems organize themselves.

For now I'm just raising questions, but in future posts I'll discuss how we might go about finding some answers.

Wednesday, May 31, 2006

Network Thinking in Biology

A recent post on Pharyngula about current debates on the evolution of gene regulatory networks in embryo development motivated me to write down some thoughts about the current status of network thinking in biology. Networks are one of the latest hot items in molecular biology, but at this point, in my opinion, their significance to our current understanding is hugely overrated. More and more papers are showing up with 'network' in their titles, but most of the time, the word is uninformative about the content of the article, or just a fancy way of renaming what biologists have been doing for years - mapping the interactions between proteins, DNA, and small molecules.

What has network thinking actually done for biology? Very little except generate confusion and bad paper titles. At most, the idea of a network is used as a helpful metaphor. Metaphors can be useful in science, but if networks are to really make a big impact on our understanding of molecular biology, we need to move beyond vague metaphors and into a rigorous set of concepts that actually make a difference in how we think about biological problems.

Here's one way of looking at the situation: A recent review article by Albert-László Barabási begins by succinctly summarizing what nearly all molecular biologists would agree is a major goal:

"A key aim of postgenomic biomedical research is to systematically catalogue all molecules and their interactions within a living cell. There is a clear need to understand how these molecules and the interactions between them determine the function of this enormously complex machinery, both in isolation and when surrounded by other cells."

For at least one model organism, brewer's yeast, we're getting pretty damn close to coming up with a complete inventory of the identity, interactions, and functions of all the proteins in a cell. We know in great detail what goes on inside a yeast cell, and within 10 years, there won't be much left to discover in terms of the basic molecular biology that people have been working out for 50 years. Genomics has really accelerated this process. So the question arises, how do we make sense of all this?

This is where we hope networks will come in. As an analogy, think about a computer - you can know all about how transistors function, or how processors and buses and memory work in detail - in the case of yeast, it is like we have one particular logic board really figured out. But really, all that information only gets you so far if you don't understand things at a higher level of abstraction - if you don't understand the processor instruction set, or memory addressing, or network communication protocols.

Going back to Barabási's review, we read further that:

"Rapid advances in network biology indicate that cellular networks are governed by universal laws and offer a new conceptual framework that could potentially revolutionize our view of biology and disease pathologies in the twenty-first century."

If only that were true. We certainly have seen advances made by people who study general properties of networks in abstract terms (Barabási is one of those people), but I have frankly found very little to suggest that the 'universal laws' emerging from this work have deeply enhanced our understanding or predictive ability. For example, we know that many networks are 'scale-free' - meaning that most network components have few connections, while a few 'hubs' have many connections. (The distribution of connections follows a power law, which is what we mean when we say a network is scale-free.) Why do we care whether a network is scale-free? The answer is that such networks are more resistant to random disruption; so scale-free biological networks are reasonably robust to mutation. After decades of gene knockout experiments, we knew that fact already, though, and while it's nice to know the source of this robustness, the utility of the scale-free concept seems to end there.
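As a concrete, if non-biological, illustration of that robustness claim, here's a small sketch using the networkx library (assuming you have it installed): build a scale-free network and compare what's left of its largest connected piece after knocking out random nodes versus knocking out the biggest hubs:

```python
import random
import networkx as nx   # assuming networkx is available

random.seed(1)

def largest_component_after_removal(graph, nodes_to_remove):
    """Remove the given nodes and return the size of the largest connected component left."""
    g = graph.copy()
    g.remove_nodes_from(nodes_to_remove)
    return max((len(c) for c in nx.connected_components(g)), default=0)

# A scale-free network: most nodes have few links, a few hubs have many.
G = nx.barabasi_albert_graph(n=1000, m=2, seed=1)
n_remove = 100   # knock out 10% of the nodes

random_nodes = random.sample(list(G.nodes), n_remove)
hubs = [node for node, degree in sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:n_remove]]

print("largest component, intact network:     ", len(max(nx.connected_components(G), key=len)))
print("after removing 100 random nodes:       ", largest_component_after_removal(G, random_nodes))
print("after removing the 100 biggest hubs:   ", largest_component_after_removal(G, hubs))
```

The well-known result - and the reason people bring up scale-free topology at all - is that such networks tolerate random failures far better than the targeted removal of their hubs. Useful to know, but as I said, the insight seems to stop about there.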

Compare this with the discoveries of the 50's and 60's - the Central Dogma: the idea that DNA makes RNA makes protein, or Jacob and Monod's model of gene regulation, which changed the way everyone in molecular biology thought about their work. These concepts set down the foundation for almost everything that happened in the next 50 years, culminating in the genome sequencing projects of the last decade. These genome sequencing projects are in some ways the ultimate validation of the importance of the Central Dogma.

I haven't come across any use of network concepts in biology that promise to be as significant. Basically, I think we don't even know how to start thinking about the next level of abstraction. Some people have made the excellent point that we shouldn't expect that level of abstraction to be closely analogous to what we see in human-designed circuits or computers, because evolution is likely to have hit upon very, very different solutions. That sounds like a tantalizing challenge.

What people have been doing instead is to just keep doing what we've been doing for 50 years, albeit with fancy genomic tools. We're still filling in details and mapping the interactions in the cell. But most of us deeply believe that there is another level of explanation - more than just having the most detailed map we can get. Sure, we can plug all those details into a computer and call it a model, but I'm not sure we'll understand much more as we do that.

So really, I think we're still waiting for networks to have a serious impact on the way we think about biology.