HGP10 Symposium: Fruits of the Genome Sequences for Society – David Botstein


David Botstein:
So I was asked to give a summary of the meeting, and I thought about giving a summary for the
meeting, and said, “Yeah, I could do that.” And a summary to the meeting is really useful
if the talks are really dead and nobody understands them, and then you have an opportunity to
help out. But the talks have been actually really good, and I anticipated that they might
be good. And as to content, many of them are in areas that I don’t actually do anything
in. And so I decided up front that I would try an experiment on myself, and this experiment
on myself was that I am obliged, not as often as Francis or the head of NHGRI is, but often
enough to give talks about what has the genome done for us? And I have one, which is sort
of updated from time to time, which is called this: The Fruits of the Genome for Society.
And I thought — and in it, I try to touch — because it’s really intended for a relatively
general audience — I try to touch on all of the issues, and I wondered, you know, how
that would work out if I just annotated what I normally do with what we heard today. And
I think you’ll see, I did make some changes this morning when I realized how this was
going to go, and we’ll try this out as an experiment. It will be more fun for me, and
maybe more fun for you as well.

So, of course, this whole business began in
pre-history. I don’t believe there’s anyone here who is on the Alberts
Committee, which originally proposed that we have a genome project, except me, and I
was there. I was sort of a proto-opponent because I was afraid that all the money would
be sucked out of the system; and, actually, it’s come full circle because I am now afraid
that all the money is going to be sucked out of the system, as you’ll see as we go along,
right? Anyway, we were able — we, being the basic scientists on that committee, which
we were well represented — we were able to convince all hands that a) it was possible
to sequence the genome, even though at the time it was pretty daunting — not as daunting
as the brain initiative, because there are only 3×10⁹ base pairs, and there are 10¹⁵
synapses; okay, so, plus or minus, an order of magnitude here or there, yes. Okay, but we could see our way to doing it
on the one hand, and on the other hand, it was absolutely clear that if we tried to do
it directly on the human, it would be a major league disaster for many technical reasons
which I’m not going to rehearse, okay, but which those of you who have a few grey hairs
will remember. Right, okay. So what we decided in the end was that we should do two things:
one, we should learn how to sequence on sequences that were much less difficult; and second,
we should try to begin right at the beginning to understand what the nucleotides that we
were sequencing were actually doing for the organism. And the two of those things ended
up in the idea of doing this series of organisms; first with the idea that we could understand
what we were doing, and that, in fact, we were rewarded, as we knew we would be because
there was enough data out there already, that basic cellular functions of all eukaryotes
are carried out by proteins and RNAs that are conserved up and down at least the eukaryotes,
and actually further.

So the idea arose very early that what we were
going to do was we were going to finish the dream of molecular biology to make all of
these connections real; that using genetics, which was the study of mutations, which, at
that time, in the human, was not a realistic possibility, but in the model organisms was,
in fact, the main order of business, could be related to the function. And the function
could be related by biochemistry to a protein. And proteins could be purified and various
things. And by molecular biology sequencing and analysis, we could connect the proteins
to the genes by the genetic code. And that was the idea. And, in fact, most of these
associations were made, and likely will continue to be made for the human by basic scientists
working with eukaryotic model systems, because what happens when you find a SNP that does
something interesting in a basic cellular function?

Like — it’s very interesting. One of the talks, I believe it was Dana-Farber,
Dr. Garraway, had the standard phenotype of cancer biology, which is that the Krebs cycle
goes anticlockwise, whereas the rest of us, it goes clockwise, okay? I don’t know when
this began, but it illustrates the — illustrates the sort of sociology of science. But the
IDH piece, right? How do we know anything about IDH, and how do we know what alpha ketoglutarate
is, and all the rest of that sort of stuff? I can leave that on set.

Okay, so the intellectual impact of the genomic
view was a grand unification of biology, because, in fact, all these genes do do the same thing.
That Jacques Monod, who was — by the way, Jacob died — his partner — this week, and
the New York Times didn’t have an obituary, and I can’t understand it. So I wrote to them,
and they said, “Oh, we didn’t know he died.” And so I waited for the obituary to come out.
I guess they don’t read Le Figaro, but they apparently don’t read the Washington Post
either.

So, anyway, Jacques has the last laugh because,
of course, what he said is true. And so the challenge for the future is to understand
not just the mechanisms at the individual process level, but the interactions among
all the processes. And I have to say, in my summary mode, that I was a little bit surprised
at how little concern there was, except in the context of GWAS and dark matter, about
how, you know, how are we going to figure out how these things really do interact? It’s
one thing to have correlation and anti-correlation. But I’m sorry, it’s very far from understanding
to say that A goes with B, and, you know, we’ll avoid giving A if B is present. I — you
know, that’s — maybe the NIH should have a higher standard than that. Okay. And then
genomics, of course, makes it possible to explore this higher level of interacting systems
as opposed — I mean, it’s not just organizing medical records that requires a lot of interaction
from many different areas. It’s all — the biology itself is that way. I mean, those
billions of years of evolution were there for a reason.

So long before the genome was sequenced, or
any genome was sequenced, it was known that there’s a very high degree of similarity among
genes in eukaryotes, and this is — this was — is just to show that in yeast, you can
just put in the human gene, and you can restore function to the things where you lost function.
And so I come to, what are the fruits of the genome? Now, let me tell you how I decided
that something was a fruit of the genome. I decided when something was a fruit of the
genome if it — if, and only if, it had substantial penetrance into the society; that it was actually
in use; that it was not some promise, or some theory, or some hope. So when we come to pharmacogenomics,
which we will, I would have included cytochrome P450, because cytochrome P450 is actually
being used in real clinics, widely. It has real penetrance. I might not include some
of the other, you know, Atul Butte things quite yet.

Then there’s comparative genomics, which is
a really major thing. I no longer have to deal with the possibility that yeast is really
a prokaryote, which was written in the prominent journal Science by a very prominent Rockefeller
scientist — never mind. [laughter]

Okay, of course, new comprehensive technologies,
of which there are many, not just the Illumina, but also all kinds of other technologies that
have become very widely deployed in society, not just medicine, but more generally, as
you’ll see. Uses of DNA sequence variation; nobody thought about sequence variation at
the beginning very much. Now, sequence variation has a very deep penetration. Functional genomics,
which I’m going to talk about a little bit; mainly, in my — the case I know about is
the subdivision of tumor subtypes, and that has now reached substantial penetration, and,
of course, DNA diagnostics.

So these are the deliverables. They are not
what was promised. Francis Collins gave a talk which I really — in which he — very
early on, I think when he just took over, about how molecular biology was going to solve
all the problems of the world. And it was a great joke, and he got a big laugh out of
it. And I always remember that, because we have a very depressing tendency — we, as
a community — we have a very depressing tendency that when we’re being rigorous, and we are
writing, for example, a grant or a — or something, we are very conservative and straight-laced.
But if we’re talking to just people, or to journalists, or to donors, we’ll promise them
anything, okay? And there is — this stuff has a tendency to come back to bite us when
we do that. Okay.

So the first deliverable is the quantitative
understanding of evolution from sequence. Lest we forget, there was a time when serious
people, even what Krugman would call very serious people, okay, actually
thought that evolution was a theory in search of evidence, okay? What the genome sequences
have done is they have made this absolutely disappear, okay? So there are people who don’t
want to believe in evolution, but that no amount of evidence will convince them; I’m
talking about the serious people.

So, what you look at here is — this was done
by Darwin just after he stepped off the Beagle, and this is, I think, the great intuitive
insight, and here’s the real thing. Not bad, okay, for 1837. You may not know this, but
correlation had not been invented, clustering had not been invented, and, in fact, the idea
that the amount of distance would be the length of the line, as far as I know, Darwin was
the first one to use that metric. Okay, here we go. This is it. And the reason I love to
show this is because it looks like Darwin and his very rigorous “no root to the tree,”
and so forth and so on. It’s much easier to see everything in the more standard way, and
the important thing about this is to understand — oops — the circled part; that we are a
very small part of the biota. Okay? And that includes all the organisms I’m going to talk
about, right, the animals, and the fungi, and the humans. We’re all a little tiny part
of the business.

This subset of that insight, which is very
important to society, and which I think really — for which Allan Wilson is probably most
responsible, but — which really has changed everyone’s view of society is that there’s
no longer any question about where we came from, and who’s most diverse, and how the
diversity spread. And that is, again, widely accepted, really for the good of society.
There’s no question that this is — having a realistic view of this is a good thing,
and this is just more of that.

And now, I come to something a little bit
more technical, but actually more important. And — not more important than the origin
of humans in the sense of scientific importance, but more important in thinking about the future.
And this is a multiple alignment of genes from very many different organisms ranging
from bacteria to human, and what you see — the black parts are identity and the white parts
are near identity. And the — how do we know what this gene does? Well, this gene — the
parent of all of these sequences is a bacterial gene called mutS, and it is a protein that
repairs DNA; it recognizes mismatches and repairs DNA. And all of these other things
are mutS homologs.

Okay, so that’s very good, and this — Jonathan
Eisen made this alignment quite some years ago, and he also was able to trace — looking
at the sequence in many bacteria, he was able to make a credible, and certainly correct,
inference that what has happened here is that the expectation of the evolution is that there
would be duplication and divergence. There was actually evidence for it, and if you look
at this side, you can see that in some of these genes, the blue gene has been lost in
some of these bacteria; and in other bacteria, the red genes have been lost. And so here
you have a living example of the whole idea of duplication and divergence. And especially
since by the time you get to eukaryotes, this is — you see that these things are — have
become much more complicated. The blue subset and the red subset have each generated subsets.

And the interesting thing, of course, is — and
here’s the part where I depart from the rest of this meeting most strongly, because I’m
going to talk about function, about what these things do for the organism. And it turns out,
although they’re all mutS homologs, they do different things for the eukaryote. So MSH4
has to do with crossing over. MSH — mutS2 — MSH5 has to do — I can’t see it very well,
but they’re well-marked. One of these is mitochondrial, and one of these does mismatch repair of two
or three bases at a time, and so on. They have become specialists. The evolution has
been a specialization. David Kingsley showed you evolution in another context of organ
development, but his main evolutionary thing was dispensing with the organ when it wasn’t
required. That was on the previous slide. This is the history of evolution to produce
sub-functionalization, if you like, of an important function: DNA repair.

So how do we extract functional information
from the human genome? Well, DNA polymorphism, SNPs, and haplotypes could tell you more about
function if and when it’s followed up. And in the meeting today, what was not talked
about was how you would follow up to know why it is that SNP so-and-so has something
to do with Crohn’s disease via the T cell, that all that in between mechanism stuff,
all the stuff that will validate the use of a drug, other than the, you know, crude thing
of do people live or do people die. Okay, all the mechanism that’s required to make
everybody comfortable with the use of the drug, we didn’t talk about. And that is a
huge lacuna, and that has to be filled in in the future. That’s just not possible for
us to go on with just crude correlation as our only guide.

Simple Mendelian — somebody said today that
it’s 5,000 genes that have been found. The best numbers I could find were a bit less
than that, but that’s okay. Complex — everybody agrees that they are there, but it’s a little
hazy exactly how it’s all going to get worked out. The Broad Institute has a theory; other
institutes have other theories. It’s still not clear how that’s all going to play out,
but — and then there’s pharmacogenomics, which is just starting. I have to say, I believe
in all of these things, but in terms of actually having been delivered, the polymorphisms are
here; the simple Mendelian are here; the complex, there’s a little bit of action; and pharmacogenomics,
you heard, I think, an excellent summary. There are five or so things that are in general
use, and, what, 30 that you could imagine coming into use within the next few years.
So that’s a deliverable; that’s real.

Okay, comparative genomics: I’ve already talked
a lot about this, and I’m not going to belabor it anymore. Patterns of gene expression: I’m
going to talk about that in a minute, because you didn’t hear a lot about that because,
I don’t know — I don’t know why, but there are deliverable things coming from that that
are very important, and I’ll tell you a little bit about that. And then finally, that whole
systems business, which I’m not going to tell you about; but, in fact, it’s the genome that
started a whole field of biology which seeks to understand how genes and proteins talk
to each other. And that is — has got to be the way of the future. You cannot understand
a locomotive on a — a diesel locomotive on the basis of knowing in excruciating detail,
at X-ray crystallography, at angstrom resolution, the structure of one wheel. It’s just not
going to be enough. And, in fact, knowing the structures of all the parts won’t get
you there. As David Kingsley said very nicely — he says, “From DNA sequence, you can’t
figure out what the organisms look like — looks like.”

Okay, so, many years ago, there was the whole
business about using the introduction of a DNA sequence variation. My own history with
this is that I suggested you could follow genes this way, and you can. And many, many
genes have been found by linkage to adventitious polymorphisms that apparently don’t do anything
in the genome. These are the same kinds of polymorphisms that GWAS looks at, and — but
the discovery of the genes, because by genetics you can discover things that you know nothing
about, if you actually follow them up, you’ll learn a great deal.

And here are a few deliverables. Huntington’s
disease opened the door to a huge class, as it turns out, of amplification of trinucleotide
repeat diseases, which the brain seems to have a specialty — special affinity for.
ALS, same issue, same kind of thing. Nobody actually thought seriously that oxygenation
in the blood — oxygenation in the brain was going to be a really — oxidative damage in
the brain was going to be a really serious issue, but that’s what that’s all about. BRCA1:
That was less surprising. Retinoblastoma was the — really the first time that the Knudson
idea had real legs. And then I come to this. Now, I love these shows because their — every
plot is the same. So you can not watch it and have lots of background, okay, and the
plot is always the same. Nothing good happens, interviews and so forth. And somebody finds
an epithelial cell somewhere, and in a lab of unparalleled beauty and speed of action
— I mean, their mass spectrometers worked at light speed. Anyway, it comes out — it
all turns out to be DNA evidence.

I have to tell you that at the time of the
Alberts — after the Alberts Committee, somewhere in the mid-’90s, I got a call from Bruce,
and we had a serious discussion about whether we could overcome some of the popular opposition
to the Genome Project by pointing out that if we really work on the Genome Project, we
can actually distinguish every individual from every other individual, and absolutely
nail criminals without doubt. And at that time I was very dubious as to whether the
general public wanted to be nailed without doubt, and didn’t go for this. And Bruce,
in the end, I think, didn’t go down this road. But the fact is that maybe the single most
important thing that we have contributed to society as a whole is this ability to identify
individuals by their remains, especially in war. Think about catching Osama Bin Laden.
How do we know it was Osama Bin Laden, right? Think about that. All right. And of course
everybody knows how this works in this audience.

Now, I can’t — I’m trying — okay, so here
is another thing, which is the openness of communication, and especially the ability
to, with PCR and genomic technologies, to make progress in biology very
rapidly with very few missteps. So this is a figure from a paper published in ’93 by
a Finnish group, and what you’re looking at is a bunch of tumors and normal for individual
DNA markers. And these are just random DNA markers. And the thing to see is that the
tumors have many bands, and the normals have just two little families of bands. This is
a single locus, and this has more alleles. And these are tandem repeats of two or three
nucleotides.

And so these guys — these dinucleotide repeat
polymorphisms, okay, and these guys suggested that maybe the tumors had acquired a loss
of function in DNA repair. And these guys, who work with yeast, read that paper, and
they said, “Boy, if that’s true, there ought to be such a thing in yeast.” And they set
up such a simple yeast screen to find such genes, and they did. And they sequenced the
gene, and they discovered that they are homologs of mutS; remember mutS, right? Okay, and they
are the nuclear homologs of mutS, and that was six months later. And three months after
that, okay, this paper came out. And, in fact, it turns out that the HNPCC gene is, in fact,
a MSH2; the mutations are loss of function, and 90 percent of all familial HNPCC have
mutations in one or two of these homologs — one or two of these homologs.

Finally, I want to talk just very briefly
about gene expression, and the idea here is well-known to you. You can distinguish human
tissues, one from the other, by just looking at the intensity of the expression of 6,000
most variable genes. And then you can also do this with tumors, and you learn a bunch
of stuff. The most important thing that you learn at the time was that the then still
viable but doddering theory of dedifferentiation by tumors was pretty much gone, because all
of these things are different tumors, and they all are more similar to each other than
they are to any other kind of tumor. It isn’t — we’re not going back anywhere to some primordium.
And also, we could see statistical substructure of tumor types, and that was reproducible.
I should say, we also learned a lot of bioinformatics along the lines here, about what — when support
vectors is maybe not the best way to go, and subdivision — the subtypes is more robust.
And also we learned that the prognosis of women with different subtypes of breast cancer
is different, okay? Regardless of what criterion you use for progression, or which criterion
— which method you used to do the typing, the — and also, what country the women are
from. None of these things make any difference.

Now, of course, you can do a new kind of experiment,
and we are not doing anywhere near enough of this kind of experiment. And the experiment
is very simple, in concept. If we think there are four kinds of breast cancer, if women
who inherit a gene predisposing to one of these, they should only have one of the patterns.
And so you can do that just by looking for who among the women that we tested had BRCA1;
and it turns out that only the red subtype had BRCA1. This is what is now called the
“triple negative” type.

Okay, now, there are lots and lots of cancer
genes, and I want to end with the following thing, which no one has mentioned until now,
which is that — and when you’re wanting to do a new treatment for cancer, then you have
to be careful on what population you test this. So, in the particular case of Genentech,
when they were — we were, actually — organizing trials for Herceptin, one of the things that
we said right up front was that we would only try this drug on women who had amplification
of the cognate receptor; that we would not do all comers. And what I’m showing you is
the actual clinical trial that the FDA eventually used to approve the drug for very advanced
cases of metastatic breast cancer, because that’s the way these things work. And this
is what would have happened if we had taken all comers: Herceptin would not have happened,
okay?

I submit — even today, very many drugs fail
because the patients have not been sufficiently distinguished by their genotype and by their
expression phenotype, both of which are completely feasible today, and I don’t understand why
it isn’t being done. Okay? I do understand, unfortunately, some of it, but not much. And
the important thing to understand is that these women, the magenta women, are the women
who have untreated HER2 positive breast cancer. We did the trial in 2003; half of the women
who got it in 2003, which is now 10 years ago, are still alive, if they were treated
not after they had a zillion metastases, but were treated right after diagnosis and some
minimal amount of chemotherapy. Okay. So we can, in fact, do a much better job by fractionating
the patients. The extent to which these drugs work is really impressive. This is from the
FDA website for Gleevec, and this is for HER2. You can see how flat the — even back then,
the blue curve is getting.

Okay, so, clinical applications: You’ve heard
all about this. I’m going to let you read it like a previous talk, and then I come with
the issues for the future. So personal genome is a predictor of health. You heard a wonderful
talk, I think, about what the issues there are. We really don’t understand a lot of what
goes on, and we need to know much more about this, and there ought — has to be more experiments.
Now, what kind of experiments do we need? Well, we need — the kind of experiments I
submit are the ones where we see the phenomenon in an organism which is more tractable than
the human, and hopefully the mouse. So the stickleback, because it has no prior prejudice,
is attractive, because you can figure this kind of stuff out because David and his guys
have figured out what is really a new model system, which is very, very high level in
the evolutionary tree. Zebrafish would be fine; Drosophila maybe, for some things, would
be fine, okay?

The second issue is how to reconcile interpretation
of DNA sequence by doctors and patients; been brought up before, but, in particular, we
don’t teach these guys any math. I’ll come back to that, all right? And then, of course,
the other issue is the actionability. It’s one thing to tell somebody that you have HER2
amplification, which is bad — the bad news is — the worst kind of breast cancer you
can have, except that we have a drug which makes it close to the best kind of cancer
you can have for half of you. That’s a deliverable, right, okay? Huntington’s disease: You’re
going to live to who-knows-what-age, and then you’re going to go nuts; that is not a deliverable,
okay? Nobody wants to know that, and we shouldn’t push it on anyone.

Okay, biology and medicine are being transformed
into information science; you heard this many times. Nancy Cox said it, I think, most clearly,
but everybody said it to some extent. And everybody has a tendency to look at this computational
stuff as sort of, you know, handmaidens to the doctors, you know, we’ll write a website,
and it will make a heat map, and will tell the doctor what to do. With all due respect,
that’s not where I think we should go, okay? I think where we should go is we should understand,
just as Flexner said, you know, in 1917, that it would be good if all doctors knew some
biochemistry. I think the time’s come that it would be good if doctors knew a little bit
more than elementary calculus which they learned in high school, which is pretty much the current
standard. Okay? And it can’t be — and this is true, by the way, also of molecular biologists.
We are having a lot of resistance, even at Princeton; not so much in our own program,
but in getting other folks to actually learn just to program a computer, which is, after
all, not rocket science.

And the great majority of human genes are
not well-understood. And I think this — that, although people alluded to this, I would like
to reinforce this much more strongly. Somebody said, you know — I think it was Garraway
said, “Oh, yeah, this is SWI/SNF.” Now SWI/SNF are both yeast genes, okay? In
fact, most of the genes are — I could trace the origin. The human geneticists do their
best to camouflage the origins, but sometimes they don’t do it very well, okay? But at the
end of the day you see, you know, Wnt is wingless in Drosophila.

Now, if you find a gene and you’re honest,
what you’ll do is you’ll look in the databases; you’ll see that you’re looking at wingless;
and you will run, not walk, to the nearest Drosophila geneticist and say, “Can I put
my allele into your system and see what happens to the wings,” okay? Because that’s the quickest
and fastest way to find out whether you’re — we’re talking about a gene interaction
with some other thing — and on the other hand, right now, just as I said at the beginning,
my concern is — oh, boy. Oh, well. You read it already. I can’t do it.

Okay. The — my biggest concern now is that in everybody’s
enthusiasm to translate what we don’t yet know to the bedside, we will stop learning
what we should know. Now, I am not saying that translational research is a bad idea;
on the contrary, I think I’ve done a lot more than my share of translational research. But
I do think that basic research is the only proven way of knowing what a gene does known
to us today that is both practical and ethical, okay? And the — so it’s really important
for all of you clinically-oriented guys — I appreciate what NHGRI has done to transform
itself into something that looks more relevant, and I agree with it. But you are the only
support for bioinformatics that means anything at the moment. And if NIGMS, for some reason,
gets really hard-hit, then there’ll be nobody to run to to see if the flies don’t have wings.

And with that, I’ll thank you for the opportunity
to hold forth one more time at NIH, and have a good evening. [applause]

Mark Guyer:
Thank you, David. Those of you who can remember as far back as this morning will remember
that Eric talked a lot about what was different between 2003 and today. Our next speaker is
going to tell you about what’s not different, and that is that we are still not in the post-genomic
era. So — and he will tell you why. Francis?
