Monday, February 9, 2009

Coded Meaning

To represent anything at all in language, we need only the two symbols of binary math, the Zero and the One.

In this language zero is represented by 0 and one by 1, but to say two we have to write 10, three is 11, and four already expands to three columns: 100. Such notation is tedious for humans, simple for machines. By an arbitrary convention (the ASCII code), the letter A is rendered as 1000001, which in decimal notation is 65. This paragraph, rendered in binary code, would run to thousands of 0s and 1s; for us it would spell nonsense.
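
A minimal Python sketch of the arithmetic above. The built-in format(n, "b") writes an integer in base 2, and ord gives a character's code. The helper names to_binary and text_to_bits are my own. Note that text_to_bits pads each character to a full 8-bit byte, which is one leading zero more than the 7 bits ASCII strictly needs.

```python
def to_binary(n: int) -> str:
    """Write a non-negative integer using only the symbols 0 and 1."""
    return format(n, "b")

def text_to_bits(text: str) -> str:
    """Hypothetical helper: each ASCII character as an 8-bit group."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))

# Counting from zero to four in base 2.
for n in range(5):
    print(n, to_binary(n))

print(ord("A"), to_binary(ord("A")))  # 65 1000001
print(text_to_bits("Coded"))
```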

A single symbol communicates nothing. Two suffice to encompass the world. We like to say that 26 letters suffice us (in English at least), but the truth is that we routinely use many more: 26 capitals, another 26 for lower case, ten digits, and some 28 common symbols such as punctuation marks. That’s 90 symbols for starters, and the count still excludes such things as ¥, £, ©, and ®; the Greek letters commonly reproduced; accented letters in upper and lower case; and special symbols such as ±, ≠, ≤, ≥, and many others.
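
The tally can be checked in a few lines of Python. Splitting the 90 into capitals, lower case, digits, and 28 other symbols is one reading of the count. For comparison, Python's string module lists the printable ASCII characters, which include 32 punctuation marks. The function bits_needed is a hypothetical helper that shows how many binary digits it takes to give every symbol its own code. A single symbol needs zero digits, which is another way of saying it communicates nothing.

```python
import string

# One reading of the tally: capitals, lower case, digits, other common symbols.
tally = 26 + 26 + 10 + 28

# Printable ASCII, excluding the space: letters, digits, and punctuation.
ascii_graphic = (len(string.ascii_uppercase) + len(string.ascii_lowercase)
                 + len(string.digits) + len(string.punctuation))

def bits_needed(n_symbols: int) -> int:
    """Fewest binary digits giving each of n_symbols (>= 1) a distinct code."""
    return (n_symbols - 1).bit_length()

print(tally, bits_needed(tally))                  # 90 7
print(ascii_graphic, bits_needed(ascii_graphic))  # 94 7
print(bits_needed(2), bits_needed(1))             # 1 0
```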

What set me off on this little meditation is thinking about the coding system, thus the language, in our bodies. It is a base-4 notation system with the symbols A, T, G, and C. These stand for the four nucleotide bases that make up every strand of DNA: adenine, thymine, guanine, and cytosine. When DNA is copied into RNA, a base called uracil takes the place of thymine. Beyond the DNA itself, thus outside the cellular library, the code therefore consists of A, U, G, and C. Nothing’s simple.
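
The following small Python sketch makes the base-4 comparison concrete. Assigning the digits 0 to 3 to the letters A, C, G, T is my own arbitrary choice, and any fixed assignment would work. The function transcribe is a deliberately simplified model. It treats the RNA message as the DNA coding strand with U written in place of T, and it ignores the enzymes and strand pairing that do the real work.

```python
# Arbitrary, hypothetical digit assignment for the four DNA letters.
DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}

def dna_to_int(seq: str) -> int:
    """Read a DNA string as a base-4 numeral, most significant letter first."""
    value = 0
    for base in seq:
        value = value * 4 + DIGIT[base]
    return value

def transcribe(coding_strand: str) -> str:
    """Simplified: the RNA copy has uracil (U) wherever the DNA has thymine (T)."""
    return coding_strand.replace("T", "U")

print(dna_to_int("GATTACA"))    # 9156, i.e. 2033010 in base 4
print(transcribe("ATGTTTAGT"))  # AUGUUUAGU
```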

Consider next the wondrous parsimony of nature’s bio-engineering. The proteins of the body are built up out of 20 amino acids. Each one of these is specified by a combination of three, note three, of the bases A, U, G, and C. The letters in this scheme can repeat: UUU codes for one such acid (phenylalanine), and AGU for another (serine). But sequence also matters; thus GUA (valine) differs in function from AGU or UGA.
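
The point about order can be shown with the three letters A, G, and U. Python's itertools.permutations yields all six orderings, and each one means something different in the standard genetic code. The lookup table below holds only those six entries, written with three-letter amino-acid abbreviations. It is not the full code.

```python
from itertools import permutations

# Six entries of the standard genetic code (RNA letters): every ordering of A, G, U.
CODE = {
    "AGU": "Ser",   # serine
    "AUG": "Met",   # methionine, also the usual start signal
    "GAU": "Asp",   # aspartic acid
    "GUA": "Val",   # valine
    "UAG": "STOP",
    "UGA": "STOP",
}

orderings = ["".join(p) for p in permutations("AGU")]
for codon in orderings:
    print(codon, CODE[codon])
```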

If we calculate the possible arrangements of the four letters taken three at a time, allowing repeats and minding the order of occurrence, we get 4 × 4 × 4 = 64 combinations in total, no more, no fewer. Now the wondrous part is that 61 of these triplets code for the 20 amino acids, since most acids have several synonymous triplets. That leaves three triplets without protein-building function. But hold on! The code has its punctuation built in. One triplet, AUG, codes for an amino acid (methionine) and also usually signals the beginning of a coding chain. Thus the cellular machinery reading the library knows where to start reading when it gets ready to build a new protein. The three triplets not used for any amino acid, UAA, UAG, and UGA, all serve as STOP codes (stop codons in the language of biology). They mark the point where the sequence stops coding for a protein, so the reader, again, knows when it has read far enough.
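
The counting above, and the start-and-stop reading it makes possible, can be checked with a toy model in Python. The 64-letter string encodes the standard genetic code (NCBI translation table 1) in one-letter amino-acid symbols, with * for stop. The function translate is a hypothetical helper. It begins at the first AUG and halts at the first stop codon in the same reading frame, which is far cruder than what ribosomes and transfer RNA actually do.

```python
from itertools import product

# Standard genetic code, one-letter amino-acid symbols, "*" = stop.
# Codon order: UUU, UUC, UUA, UUG, UCU, ... (first letter varies slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product("UCAG", repeat=3)]
TABLE = dict(zip(CODONS, AMINO))

stops = sorted(c for c, aa in TABLE.items() if aa == "*")
sense = [c for c, aa in TABLE.items() if aa != "*"]
acids = set(AMINO) - {"*"}

def translate(mrna: str) -> str:
    """Toy reader: start at the first AUG, read triplets, halt at a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # only complete triplets
        aa = TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(len(CODONS), len(sense), len(acids), stops)  # 64 61 20 ['UAA', 'UAG', 'UGA']
print(translate("GGAUGUUUAGUGUAUGAAA"))            # MFSV
```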

In this case the object of the game is obviously the building of proteins from 20 components, each named by a three-letter word. Four symbols are enough to render the structure necessary for this, along with what might be called the necessary punctuation marks.
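
A quick count suggests why the words are three letters long rather than two. In my framing of the requirement, which is not a claim about how the code arose, there are 21 meanings to distinguish: the 20 amino acids plus at least one stop signal. Two-letter words over four symbols fall short of that, while three-letter words leave room to spare:

```latex
4^{2} = 16 \;<\; 20 + 1 = 21 \;\le\; 4^{3} = 64
```

The surplus of 64 − 21 = 43 words is absorbed by 41 synonymous triplets (61 − 20) and two extra stop codons (3 − 1).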

A coded language, needless to say, requires an adequate reader, some entity able to recognize where to start and where to end. Our standard explanation for this is hopelessly muddled. Pure chance is assigned the agency for this intricate system, of which I’ve barely sketched an outline here. I’ve neglected to mention ribonucleic acid (RNA) in its three forms: messenger, transfer, and ribosomal RNA. Never mind that now, or the mind-blowing complexity by which DNA strands are read, interpreted, and translated into proteins. But it is contemplations like this that have led to my hypothesis that deep beneath all bodies lives a chemical civilization. This will surely not be the last mention of that.
