Sociable Worms and The Genome

With all the hoopla over the decoding of the human genome this past week, I hardly have any choice but to write about it, do I? In fact, I just looked back in my files and found that I made a rash promise in one of my April columns to make an effort to understand and write about this genetic stuff by the end of June. I hope you'll forgive me for being a few days late. In the past, I've admitted to being much more comfortable with concepts such as black holes or the encryption of data transmissions than with the complex structures and interactions of genes, DNA, proteins, enzymes, etc. However, I'll give it a try.

Let's start with a simple question. What is a gene? I was heartened to find in The Star Ledger, our New Jersey newspaper, the following answer to this question: “Even scientists disagree on exactly how to define it (a gene).” Another simple question: How many genes are there? The Ledger's answer: “Not even scientists know. … estimates range from about 34,000 to 140,000 in human DNA.” Elsewhere, I found estimates ranging from 50,000 to 100,000. Well, already I'm feeling better. Even scientists working in the field don't seem to know what's up!

O.K., that's being unfair. We humans consist of trillions of cells, and nearly every one of them has a nucleus. Inside the nucleus there are 23 pairs of chromosomes, each pair consisting of one chromosome from the mother and one from the father. Each chromosome contains a tightly packed string of DNA. If the strings of DNA in a human cell were stretched out in a straight line, there would be about 6 feet of DNA (I've also seen a figure of 13 feet, but whatever, it's a lot of DNA). You all know that a string of DNA is actually two strands connected and entwined in the form of a double helix. If you haven't seen pictures of the DNA double helix, you've been Rip Van Winkle for the past 47 years, the time since Watson and Crick made their Nobel Prize-winning discovery. The double helix is like a twisted ladder with each rung being a C-G, G-C, T-A or A-T pair and nothing else. The letters A, T, C, and G stand for four different chemicals called bases. We don't really care about their names, and certainly not about their chemical formulas, for this discussion. The key point is that A only bonds to T and C only bonds to G to form the rungs of the ladder. The sides of the ladder are made up of sugars and phosphates, but again, we don't have to burden ourselves with their names or compositions.
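
Because A pairs only with T and C only with G, one side of the ladder completely determines the other. The short Python sketch below illustrates that rule; the function name partner_strand is our own invention for illustration, not part of any biology library. It simply writes out the letter on the other end of each rung.

```python
# Each base bonds with exactly one partner: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def partner_strand(strand):
    """Hypothetical helper: return the base at the other end of each rung, in order."""
    return "".join(PAIR[base] for base in strand)

print(partner_strand("GATTACA"))                   # CTAATGT
print(partner_strand(partner_strand("GATTACA")))   # GATTACA (pairing twice gets you back)
```

Since the pairing rule has no exceptions, either strand is enough to rebuild the whole ladder.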

The human genome is the sum total of the DNA in one set of the 23 chromosomes; each cell nucleus carries two such sets, one from each parent. Decoding the human genome simply involves determining the correct order of the “letters” A, T, G and C in the strands of DNA. There is a slight complication: there are over 3 billion letters in the genome! The complete order of the letters has not yet been determined, but the important part of the job has been accomplished, thanks to the competition spurred by a privately financed effort (see Brian Trumbore's Week in Review).

Actually, it may surprise you to know that the complete code of the human genome could easily be stored on your computer's hard disk, assuming you haven't already filled it up with games and fancy software.
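
Here is a back-of-the-envelope check of the hard-disk claim, written as a Python sketch. It assumes a round 3 billion letters. Stored as ordinary text, each letter takes one byte (8 bits). With only four letters, though, 2 bits per letter are enough, so four letters fit in one byte. The particular 2-bit code below and the pack helper are hypothetical choices made for illustration; real genome file formats differ.

```python
GENOME_LETTERS = 3_000_000_000   # "over 3 billion letters", rounded

# Stored as plain text, each letter takes one byte (8 bits).
plain_text_bytes = GENOME_LETTERS

# With only four letters, 2 bits per letter suffice (2 x 2 = 4 patterns).
packed_bytes = GENOME_LETTERS * 2 // 8

# Hypothetical 2-bit code; which pattern goes with which letter is arbitrary.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(seq):
    """Pack a DNA string into bytes, four letters per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))   # fill out a short final chunk with zero bits
        out.append(byte)
    return bytes(out)

print(f"plain text: {plain_text_bytes / 1e9:.1f} GB")   # plain text: 3.0 GB
print(f"packed:     {packed_bytes / 1e6:.0f} MB")       # packed:     750 MB
print(len(pack("GATTACAG")))                            # 2
```

A real file would also need to record the sequence length, because the zero-bit padding in the last byte is indistinguishable from trailing A's.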

Let's get back to the question of the gene. A gene is essentially a piece of DNA, more precisely a sequence of the bases, our A, C, T, and G letters, along a strand of DNA. The “language” of the gene can apparently be expressed as “words”, each word being a sequence of 3 of our letters (bases). Let's figure out how many words can be formed from a sequence of 3 letters. Since we have only a 4-character alphabet of A, C, T, and G, the first position in the sequence can hold any one of those 4 letters. The next position can again hold any of 4 different letters, so we have 4 x 4 = 16 different combinations. Adding the third letter in the sequence, we again have 4 different possibilities, so when we combine all three we have 4 x 16 = 64 different combinations, or words such as CCG, ATT, GAT, etc.
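
The counting argument above can be checked by brute force. This Python sketch uses the standard library's itertools.product to list every 3-letter word over the alphabet A, C, G, T; the variable names are our own.

```python
from itertools import product

ALPHABET = "ACGT"

# Every 3-letter word: 4 choices for each of the 3 positions.
words = ["".join(letters) for letters in product(ALPHABET, repeat=3)]

print(len(words))    # 64
print(words[:5])     # ['AAA', 'AAC', 'AAG', 'AAT', 'ACA']
```

Listing the words instead of just multiplying 4 x 4 x 4 confirms that none are repeated and that familiar examples such as CCG, ATT and GAT all appear.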

I thought it might be amusing to compare these 64 combinations with what you get in a computer with 3 bits. With a bit, you have only a 2-character alphabet, say a 1 or a 0. So you have only two possibilities for each of the 3 bits and hence 2 x 2 x 2 = 8 different combinations. Right away, you can see that by having the choice of A, T, G or C, four different chemicals, for each position, we've increased the possible combinations for a 3-position sequence by a factor of 8, 64 versus 8. When you hear predictions that computers or robots will someday become “human” in their capabilities, you can see that even on this small, 3-character scale the computer has to be 8 times as powerful as its biological counterpart. I'm really not worried about robots taking over the world anytime soon!
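
The same counting rule works for any alphabet: the number of words is the alphabet size raised to the word length. The Python sketch below (word_count is a hypothetical helper name) compares DNA's 4-letter alphabet with a computer's 2-symbol one. It also asks how many bits a computer would need to label all 64 DNA words. Because 2 to the 6th power is 64, the answer is 6, which is another way of saying each base holds as much as two bits.

```python
import math

def word_count(alphabet_size, length):
    """Number of distinct words of a given length over an alphabet."""
    return alphabet_size ** length

dna_words = word_count(4, 3)   # A, C, G, T in 3 positions
bit_words = word_count(2, 3)   # 0, 1 in 3 positions

print(dna_words, bit_words, dna_words // bit_words)   # 64 8 8

# Bits needed to give each of the 64 DNA words its own label: log2(64) = 6.
bits_needed = math.ceil(math.log2(dna_words))
print(bits_needed)   # 6
```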

Now, if we string these 3-letter words together along the DNA ladder, particularly in a sequence a few hundred words long, we can write an astronomical number of possible “paragraphs”, and our book of life, the human genome, contains tens of thousands of them. A paragraph of several hundred words (one figure I've seen is 300 words) is what we call a gene. The gene is really a set of instructions for the fabrication and fate of a particular protein, or maybe more than one protein, which might be used to construct a fingernail or a neuron in the brain. My impression is that there is no fixed number of words in a gene, but that 300 is a typical or average number.
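
To see how many different paragraphs a few hundred words allow, the Python sketch below counts every possible 300-word sequence built from the 64 words. It treats every arrangement as allowed, which real genes certainly do not do, so read it as an upper bound rather than a biological count. Python's integers have no size limit, so the exact number can be computed and its digits counted.

```python
# Number of different 300-word "paragraphs" built from 64 possible words.
WORDS = 64
GENE_LENGTH = 300   # the column's "typical" figure, not a fixed size

possibilities = WORDS ** GENE_LENGTH
print(len(str(possibilities)))   # 542 (the count has 542 digits)
```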

How powerful can a change in a single word be? In today's New York Times (July 2, 2000), there's an article by Nicholas Wade describing work done at the University of California, San Francisco. There, researchers study C. elegans, a worm that is a favorite for many types of biological studies. In certain parts of the world these worms, when dining on their favorite bacteria, like their fellow worms to join them. In other parts of the world, however, C. elegans prefers to dine alone. It turns out that the 215th word of one of the worm's genes is spelled TTT in the gregarious worms but GTT in the unsociable worms. That is, the social behavior of this worm is determined by the simple substitution of a G for a T. You can guess that the rest of Wade's article dealt with the profound implications of this finding for our own behavioral traits and, of course, the usual question of nature versus nurture.
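
Spotting such a one-letter change is easy once a gene is read three letters at a time. In the Python sketch below, the two sequences are made up for illustration. They are NOT the real worm gene: they are filler words with TTT or GTT placed at word 215. The helper names split_codons and codon_differences are hypothetical.

```python
def split_codons(seq):
    """Split a DNA string into consecutive 3-letter words, dropping any leftover letters."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def codon_differences(seq_a, seq_b):
    """Return (word number, word in a, word in b) for each differing word, counting from 1."""
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(split_codons(seq_a), split_codons(seq_b)), start=1)
        if a != b
    ]

# Toy sequences: word 1 is ATG, words 2 to 214 are filler, word 215 differs.
filler = "ATG" + "GCA" * 213
social = filler + "TTT" + "GCA" * 10
solitary = filler + "GTT" + "GCA" * 10

print(codon_differences(social, solitary))   # [(215, 'TTT', 'GTT')]
```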

But back to our own genes, which contain all these words of instruction. It seems that only about 3% of the 3 billion or so letters of the human genome are tied up in genes. The genes are separated by the other 97% of the letters, which may be either garbage or sequences of essential instructions that are by no means fully understood today.
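
As a rough cross-check, the Python sketch below combines only the column's own round figures: 3 billion letters, 3% of them in genes, and a typical gene of 300 three-letter words. These are approximations taken from the text, not measured values. It happens to land on the hundred thousand genes mentioned later in the column.

```python
GENOME_LETTERS = 3_000_000_000
GENE_FRACTION_PERCENT = 3   # "about 3%" of the letters sit in genes
LETTERS_PER_WORD = 3
WORDS_PER_GENE = 300        # the column's "typical" gene size

gene_letters = GENOME_LETTERS * GENE_FRACTION_PERCENT // 100
letters_per_gene = WORDS_PER_GENE * LETTERS_PER_WORD
implied_genes = gene_letters // letters_per_gene

print(f"{gene_letters:,} letters in genes")                       # 90,000,000 letters in genes
print(f"{implied_genes:,} genes of {letters_per_gene} letters")   # 100,000 genes of 900 letters
```

Given how loose each input is, the agreement is a sanity check, not evidence for any particular gene count.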

Well, we haven't even touched upon the next step. We have our DNA and have seen how easily a gene can encode a complex set of instructions, thanks to its 4-letter alphabet. How do these instructions get translated into a protein? Here things get complicated, with the entry into the picture of another molecule called messenger RNA, as well as hormones, enzymes, ribosomes and all manner of intricate processes. I'm not going to try to tell the rest of the story here, except to say that messenger RNA cleverly manages to copy the code from the DNA. The RNA then carries the copy out of the nucleus to the region where the necessary ingredients and other players (enzymes, hormones, other proteins, etc.) reside. There the instructions are acted upon, and the protein is assembled and sent on its way. Surprisingly, biological researchers actually have a pretty detailed idea of how this process works.
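
The copying step can be sketched with the same pairing idea used for the DNA ladder. The Python sketch below adds one fact the column does not mention: RNA uses the letter U (uracil) where DNA uses T. It builds the messenger copy letter by letter against one DNA strand (the "template") and ignores the direction in which the strands are read. The sequences are toy examples, and transcribe is a hypothetical helper name.

```python
# Pairing rules for building RNA against DNA: RNA has U (uracil) instead of T.
RNA_PARTNER = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template):
    """Build a messenger RNA copy by pairing against a DNA template strand."""
    return "".join(RNA_PARTNER[base] for base in template)

coding = "ATGTTTGCA"     # toy sequence, not a real gene
template = "TACAAACGT"   # its partner strand (A-T, C-G pairing)
message = transcribe(template)

print(message)                                 # AUGUUUGCA
print(message == coding.replace("T", "U"))     # True
```

The result spells the same words as the strand opposite the template, with U standing in for T, which is why the messenger can be said to carry a copy of the gene's code.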

As pointed out in various media reports, while the monumental task of determining the book of letters or words that make up the human genome is essentially complete, translating those words into their true meanings may take the rest of the 21st century. Just a hint of the complexity of the work to be done comes from realizing that if virtually all of the hundred thousand genes code for making a protein, then we have to worry about 100,000 proteins. Not so fast! Some genes code for more than one protein, and proteins interact to form other proteins. Put them all together and there are billions of possible protein interactions. And proteins are much more complex than our A, C, G and T bases. In retrospect, the monumental task of decoding the human genome may be viewed by our great-grandchildren as the easy part of the work on genetics and its application to improving the quality of human existence.
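
Where might “billions” of interactions come from? The Python sketch below counts only the possible pairs among 100,000 proteins, using the standard library's math.comb (Python 3.8 or later). It assumes, unrealistically, that any two proteins could interact. It also ignores genes that make several proteins and groups of three or more proteins, both of which would push the number far higher.

```python
import math

PROTEINS = 100_000   # the column's round figure, one protein per gene

# Every unordered pair of proteins that could, in principle, meet.
pairs = math.comb(PROTEINS, 2)
print(f"{pairs:,}")   # 4,999,950,000
```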

Allen F. Bortrum