Monday, January 28, 2008

Sequencing 1000 Human Genomes - How Many Do We Really Need?

A group of the world's leading sequencing centers has announced plans to sequence 1000 human genomes. The first Human Genome Project cost about $3 billion; by comparison, the next 1000 will be a steal at possibly only $50 million (and that's the total cost, not the cost per genome, which works out to roughly $50,000 each). But that's still a lot of money, so why are we investing so much in sequencing genomes? The up-front cost is large, but the benefits, both economic and medical, easily outweigh it. By pooling sequencing resources and releasing large amounts of genome sequence data early, we can avoid the inefficient, redundant sequencing that happens when independent research groups each try to discover gene variants involved in disease. In fact, it would probably be worthwhile to sequence 10,000 human genomes. With 1000 genomes, we're at least making a good start.

The 1000 Genomes Project comes at a time when new genome-wide disease studies have created a need for an extensive catalog of human genetic variation, and new technology has provided a potentially efficient way to fill that need. Genome-wide association studies have scanned the genomes of thousands of people, sick and healthy, to find regions of DNA that may be involved in diseases like diabetes and heart disease. These studies have highlighted chromosomal regions that could be involved in disease, but the genome scans they use are often too low-resolution for researchers to pinpoint the exact genetic variant responsible. In many cases there are dozens of genes or more inside a disease-linked region, any of which could be the disease gene of interest. As a result, groups doing genome-wide studies have to invest considerable time and resources mapping genetic variants at higher resolution, which significantly increases the effort required to get anything useful out of a genome-wide association study. What we need is a much more detailed catalog of all the spots where human genomes vary.

Although our genomes are roughly 99.9% identical, the human genome is about 3 billion base pairs long, so the remaining 0.1% still amounts to millions of DNA positions where we vary. And we don't just vary at single base pairs (where, at one position, you might have a 'T' while I have a 'C'); small deletions and insertions of stretches of DNA also exist, and can be involved in disease. The 1000 Genomes Project intends to map both single-base changes and these 'structural variants.'

Many genetic variants important to health show up in just 1%, 0.1%, or even 0.01% of the population. Imagine a medically important genetic variant present in just 0.1% of the U.S. population: out of roughly 300 million people, 300,000 would carry it and might be at risk for a particular disease. Knowing about such variants could help us understand how the disease develops, and possibly design prevention and treatment strategies. For those 300,000 people in the U.S., and millions more around the world, knowing about this variant could matter a great deal. But to find rare variants like this, we can't just sequence 100 human genomes; even with 1000, we would miss many of them.
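To put rough numbers on that last point, here is a minimal Python sketch of the arithmetic. It assumes (a simplification of ours, not the project's actual study design) that the people sequenced are a random sample of the population, that each one carries a given variant with probability equal to its carrier frequency f, and that sequencing always detects the variant in a carrier. Under those assumptions, the chance of seeing the variant at least once among n genomes is 1 - (1 - f)^n. The function names are hypothetical, chosen for illustration.

```python
"""Rough odds of seeing a rare variant when sequencing n genomes.

Simplifying assumptions (illustrative only, not the 1000 Genomes design):
- each sequenced person is drawn at random from the population,
- a person carries the variant with probability equal to its carrier frequency,
- sequencing detects the variant perfectly whenever a carrier is sequenced.
"""


def prob_seen_at_least_once(carrier_freq: float, n_genomes: int) -> float:
    """Probability that at least one of n random people carries the variant."""
    # Chance that nobody carries it is (1 - f)^n; take the complement.
    return 1.0 - (1.0 - carrier_freq) ** n_genomes


def expected_carriers(carrier_freq: float, n_genomes: int) -> float:
    """Expected number of carriers among n random people."""
    return carrier_freq * n_genomes


if __name__ == "__main__":
    # 0.1% of a U.S. population of roughly 300 million people
    print(f"U.S. carriers at 0.1%: {0.001 * 300_000_000:,.0f}")
    for freq in (0.01, 0.001, 0.0001):
        for n in (100, 1_000, 10_000):
            p = prob_seen_at_least_once(freq, n)
            print(f"freq={freq:<7} n={n:>6}: P(seen) = {p:.3f}, "
                  f"expected carriers = {expected_carriers(freq, n):g}")
```

Under these assumptions, a variant carried by 0.1% of people has only about a 10% chance of appearing at all among 100 sequenced genomes, and about a 63% chance among 1000; a 0.01% variant needs something like 10,000 genomes to reach similar odds. Since a variant seen only once can be hard to tell apart from a sequencing error, this simple model is, if anything, optimistic.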

So 1000 genomes is clearly just a start. To go more aggressively after rare variants that are likely to be medically important, the 1000 Genomes group will also focus on the small, gene-containing fraction of the human genome. This will let them put more resources into finding rare variants near genes, where we expect medically important variants to show up.

And let's not forget the technological benefits: by using next-generation sequencing technology, which is still not quite mature, the consortium hopes to develop innovative ways to re-sequence human genomes effectively and cheaply. Improvements in both the sequencing hardware and the data analysis methods should lead to wider use of this technology, hopefully in clinical diagnostics as well as research.

It has been nearly 20 years since the Human Genome Project officially began. We're still waiting for the promised medical benefits to emerge, but in the meantime, the project has transformed biological research; in my opinion, at least as much as the foundational discoveries of molecular biology in the 1950s and 1960s did. Future medical benefits will be built on this science.

4 comments:

BOB said...

Question: Identifying and mapping the genetic variants in this way would allow those in the medical field to focus on areas of treatment. And with the new stem cell developments this past year, this could be considered somewhat timely, would it not?

Mike said...

I don't think I understand your question, but it would be interesting to look at human sequence variants that are relevant to stem cell differentiation.

Don said...

A couple of questions for you...
1) It sounded as if there weren't going to be medical records associated with the sequences, especially for the 1000 that are just getting their exons sequenced. So how will this allow researchers to correlate phenotype with sequence?
2) Even if this effort identifies targets for diseases, don't drug companies still need to develop methods of intervention (small molecules and such), which could, and likely will, take years?

Mike said...

Hi Don,

You're right that having phenotype information on these 1000 people would be better.

The best reason for sequencing 1000 genomes, with or without medical information, is to build our catalog of rare human variation. When we do medical genetic studies, we can't yet sequence the genomes of all the subjects in these studies - we just test them for 500,000 or 1 million (or soon more) different variants.

Rare variants, especially copy number variants, appear to play a significant role in human disease. If we don't know where those rare variants are, we can't test people for them in the large genetic studies people are doing.

So sequencing more genomes will help us design better genetic studies. You're correct that it is still a long way from genetic studies to prevention, diagnosis, and treatment, but ultimately, I believe genetic studies will play a big role in developing treatments.

Mike