Sociable Worms and The Genome
With all the hoopla over the decoding of the human genome this
past week, I hardly have any other choice than to write about it,
do I? In fact, I just looked back in my files and find that I made a
rash promise in one of my April columns that I would make an
effort to understand and write about this genetic stuff by the end
of June. I hope you forgive me being late by a few days. In the
past, I've admitted to being much more comfortable with the
concepts of such things as black holes or encryption of data
transmission than with the complex structures and interactions of
genes, DNA, proteins, enzymes, etc. However, I'll give it a try.
Let's start with a simple question. What is a gene? I was
heartened to find in The Star Ledger, our New Jersey newspaper,
the following answer to this question: "Even scientists disagree on
exactly how to define it (a gene)." Another simple question.
How many genes are there? Answer in the Ledger: "Not even
scientists know. ... estimates range from about 34,000 to
140,000 in human DNA." Elsewhere, I found estimates ranging
from 50,000 to 100,000. Well, already I'm feeling better. Even
scientists working in the field don't seem to know what's up!
O.K., that's being unfair. We humans consist of trillions of cells
and each cell has a nucleus. Inside the nucleus there are 23 pairs
of chromosomes, each pair consisting of one chromosome from
the mother and one from the father. Each chromosome contains a
tightly packed string of DNA. If the strings of DNA in a human
cell were stretched out in a straight line, there would be about 6
feet of DNA (I've also seen a figure of 13 feet but whatever, it's a
lot of DNA). You all know that a string of DNA is actually two
strands connected and entwined in the form of a double helix. If
you haven't seen pictures of the DNA double helix, you've been
Rip Van Winkle for the past 47 years, the time since Watson and
Crick made their Nobel Prize-winning discovery. The double helix
is like a twisted ladder with each rung being a C-G, G-C, T-A or
A-T bond and nothing else. The letters A, T, C, and G stand for
four different chemicals called bases. We don't really care about
their names and certainly not about their chemical formulas for
this discussion. The key point is that A only bonds to T and C
only bonds to G to form the rungs of the ladder. The sides of the
ladder are made up of sugars and phosphates but again, we don't
have to burden ourselves with their names or compositions.
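For readers who like to see a rule in action, here is a minimal
Python sketch of the pairing rule just described. The function
name partner_strand and the sample sequence are my own inventions
for illustration, and the sketch simply reads the ladder rung by
rung without worrying about which direction each strand runs.

```python
# Pairing rule from the text: A bonds only with T, C bonds only with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand):
    """Return the letter sitting opposite each base, rung by rung."""
    return "".join(PAIRS[base] for base in strand)


# "GATTACA" is a made-up example, not taken from any real gene.
print(partner_strand("GATTACA"))  # CTAATGT
```

Because each letter has exactly one partner, knowing one side of
the ladder tells you the other side.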
The human genome is the sum total of the DNA in the 23 pairs of
chromosomes in the nucleus of a single cell. The decoding of the
human genome simply involves determining the correct order of
the "letters" A, T, G and C in the strands of DNA. There is a
slight complication - there are over 3 billion letters in the genome!
Actually, the complete order of the letters has not been
determined as yet, but the important part of the job has been
accomplished, thanks to the competition spurred by a privately
financed effort (see Brian Trumbore's Week in Review).
It may surprise you to know that the complete code of
the human genome could easily be stored on your computer's
hard disk, assuming you haven't already filled it up with games
and fancy software.
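A rough back-of-the-envelope check of that storage claim, as a
Python sketch. I am assuming a round 3 billion letters and two
possible ways of storing them: one byte per letter, as in a plain
text file, or a tight packing of 2 bits per letter (enough to tell
4 letters apart). Neither is meant as the way the genome projects
actually store their data.

```python
LETTERS = 3_000_000_000  # "over 3 billion letters", rounded for illustration
BITS_PER_LETTER = 2      # assumed packing: 4 possible letters fit in 2 bits

packed_bytes = LETTERS * BITS_PER_LETTER // 8  # tightly packed
plain_bytes = LETTERS                          # one byte per letter

print(packed_bytes / 1e9, "GB packed")
print(plain_bytes / 1e9, "GB as plain text")
```

Under these assumptions the total comes to somewhere between
roughly three quarters of a gigabyte and 3 gigabytes.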
Let's get back to the question of the gene. The gene is essentially
a piece of DNA, more precisely a sequence of the bases, or our
A, C, T, and G letters, along a strand of DNA. The "language"
of the gene can apparently be expressed as "words", each word
being a sequence of 3 of our letters (bases). Let's figure out how
many words can be formed in this sequence of 3 letters. Since we
have only a 4-character alphabet of A, C, T, and G, the first
position in the sequence can be just one of those 4 letters. In the
next position we can again have 4 different letters so that we have
4 x 4 = 16 different combinations. Adding in the third letter in the
sequence, we have again 4 different possibilities so that when we
combine all three together we have the possibility of 4 x 16 = 64
different combinations, or words such as CCG, ATT, GAT, etc.
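The count of 4 x 4 x 4 = 64 can be checked by simply listing every
3-letter word. Here is a short Python sketch that does so; the
variable names are my own choices for illustration.

```python
from itertools import product

ALPHABET = "ACGT"

# Every ordered choice of 3 letters, each drawn from the 4-letter alphabet.
words = ["".join(letters) for letters in product(ALPHABET, repeat=3)]

print(len(words))  # 64
print(words[:4])   # ['AAA', 'AAC', 'AAG', 'AAT']
```

The examples CCG, ATT and GAT mentioned above all turn up in the
list.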
I thought it might be amusing to compare these 64 combinations
with what you get in a computer with 3 bits. With a bit, you only
have a 2-character alphabet, say a 1 or a 0. This says you have
only two possibilities for each of the 3 bits and hence 2 x 2 x 2 =
8 different combinations. Right away, you can see that by having
the choices of A, T, G and C, four different chemicals, for each
position, we've increased the possible combinations for a
3-position sequence by a factor of 8, 64 versus 8. When you hear
predictions that computers or robots will some day become
"human" in their capabilities, you can see that even on this small,
3-character scale the computer has to be 8 times as powerful as
its biological counterpart. I'm really not worried about robots
taking over the world anytime soon!
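The 64-versus-8 comparison can be worked through in a few lines of
Python. The helper name count_sequences is hypothetical, and the
last line adds one step that is my own, not from any of the
articles: how many bits it would take to match the 64 combinations
of a 3-letter word.

```python
import math


def count_sequences(alphabet_size, positions):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** positions


dna_words = count_sequences(4, 3)  # 3 positions, 4 letters each
bit_words = count_sequences(2, 3)  # 3 positions, 2 symbols each

print(dna_words, bit_words, dna_words // bit_words)  # 64 8 8
print(math.log2(dna_words))                          # 6.0
```

Put another way, a 3-letter word holds as many combinations as 6
bits, or 2 bits per letter.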
Now, if we take a sequence of these 3-letter words in the DNA
ladder, particularly if the sequence is a few hundred words long,
we can have tens of thousands of possible 'paragraphs' in our
book of life, the human genome. This paragraph of several
hundred words (one figure I've seen is 300 words) is what we call
a gene. The gene is really a set of instructions for the fabrication
and fate of a particular protein, or maybe more than one protein,
which might be used to construct a fingernail or a neuron in the
brain. My impression is that there is no fixed size for the number
of words in a gene but that 300 is sort of a typical or average
number.
How powerful can a change in a word be? In today's New York
Times (July 2, 2000), there''s an article by Nicholas Wade quoting
some work done at the University of California in San Francisco.
There they study C. elegans, a favorite worm for many types of
biological studies. In certain parts of the world these worms,
when dining on their favorite bacteria, like their fellow worms to
join them. However, in other parts of the world, C. elegans
prefers to dine alone. It turns out that the 215th word of one of
the worm's genes is spelled TTT in the gregarious worms, but in
the unsociable worms the spelling is GTT. That is, the social
behavior of that worm is determined by a simple substitution of a
G for a T. You can guess that the rest of Wade's article dealt
with the profound implications of this finding for our own human
behavioral traits and, of course, the usual question of the
importance of nature versus nurture.
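To see just how small that change is, here is a minimal Python
sketch that compares the two spellings letter by letter. The
function name differing_positions is my own invention for
illustration, and it looks only at the one 3-letter word, not at
the whole gene.

```python
def differing_positions(word_a, word_b):
    """Return the 1-based positions at which two equal-length words differ."""
    return [
        position
        for position, (a, b) in enumerate(zip(word_a, word_b), start=1)
        if a != b
    ]


# Spellings quoted in the text: TTT (gregarious) versus GTT (unsociable).
print(differing_positions("TTT", "GTT"))  # [1]
```

Only the first letter of the word differs between the two kinds of
worm.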
But back to our own gene, which contains all these words of
instruction. It seems that only about 3% of the 3 billion or so
letters of the human genome are tied up in the genes. The genes
are separated by the other 97% of the letters, which can
apparently be either garbage or else sequences of essential
instructions that today are by no means fully understood.
Well, we haven't even touched upon the next step. That is, we
have our DNA and have seen how easily a gene can be encoded
with a set of complex instructions, thanks to its 4-letter alphabet.
How do these instructions get translated into forming our
protein? Here things get complex, with the entry into the picture
of another molecule called messenger RNA, as well as hormones
and enzymes and ribosomes and all manner of complicated
processes. I'm not going to try to touch the rest of the story here
except to say that this messenger RNA cleverly manages to copy
the code from the DNA. Then the RNA transports the copy from
the nucleus to the region where the necessary ingredients and
other players (enzymes, hormones, other proteins, etc.) reside.
Here the instructions are acted upon and the protein is assembled
and sent on its way. Surprisingly, biological researchers actually
have a pretty detailed idea as to how this process works.
As pointed out in various media reports, while the monumental
task of determining the book of letters or words comprising the
human genome is essentially complete, the translation of the
words into their true meanings may take the rest of the 21st
century. Just a hint of the complexity of the work to be done can
be derived from knowing that if virtually all of the hundred
thousand genes are coded for making a protein, then we have to
worry about 100,000 proteins. Not so fast! Some genes code for
more than one protein and proteins interact to form other
proteins. Put them all together and there are billions of possible
protein interactions. And proteins are much more complex than
our A, C, G and T bases. In retrospect, the monumental task of
decoding the human genome may be viewed by our great
grandchildren as the easy part of the work on genetics and its
application to improving the quality of human existence.
Allen F. Bortrum