Comments on the general compositional process
The overall process of creating music from DNA emulates the process of converting DNA base sequences into proteins.
The composition process starts with the selection of a set of DNA bases from the NIH database. Usually, a segment of several thousand bases is used. The next step recognizes that the genetic code is made up of three-base triplets, the codons. Each codon specifies an amino acid. Since there are four bases and these are used three at a time, there are 64 possible codons. Several codons, however, specify the same amino acid. The result is a set of 20 possible amino acids. This means that it is possible to read the DNA base sequence as a sequence of amino acids. Instead of a sequence of just four types of items, it is a sequence of 20 different items.
The twenty amino acids can be grouped into those that have similar chemical properties. The usual grouping separates the amino acids into those that are acidic, basic, polar or non-polar. Two of these groups are quite small so they have been combined. A statistical analysis is then performed to find the most abundant amino acid in each of the three chemical groups. This amino acid will be assigned the note "middle C." The next most common amino acid in each group will get the next note up on the musical scale. This analysis is continued until all of the amino acids have been assigned a note within a chemical group. This is all done automatically by the software.
The following steps are part of the creative composition process.
Overall, the composition is assigned a musical scale and a tempo. Like all the creative decisions, this is done while listening to the composition and the choice can be changed at any stage in the composition process.
Each of the three chemical groups is assigned an instrument (and as a result, I'll call these groups "instruments" in the following discussion). At this stage, all of the notes are playing and much of the general structure of the composition is apparent. This doesn't mean that it has been easy to get to this point. There are dozens of instrument choices for each group. Since the instruments must work together, there are a lot of permutations that must be tested.
The composition plays each amino acid's note with an appropriate instrument. The sequence specified in the DNA is followed, but with the additional rule that the order of notes is arranged as the amino acids in the gene sequences in the DNA sequence. This means that notes in a sequence start with the "start codon" and tend with one of the "termination codons." These are the markers that define the length of a gene. In the DNA sequences that are used in these compositions, there are usually a number of genes.
Another compositional choice is the order in which the genes are played. This ranges from going sequentially along the DNA sequence from gene to gene to a random process where the composition jumps randomly from one gene to another. The choice can be anywhere between these two extremes of strictly linear to completely random.
There is a final, and major, compositional process. This controls the temporal sequencing of how the instruments (remember, these are the chemical groups) are heard. The software allows an instrument to be faded in or out. This means that the composition can begin with just those amino acids belonging to one chemical group being heard. Later, another chemical group builds in volume and even later, the final chemical group joins. This can change through out the composition. What it means is that individual notes remain faithful to the original DNA sequence. Whether you hear them, as a chemically related group, is up to the composer. The composer can only make large changes. There is no way to add or remove single notes.
The result of the composition process is to create changing patterns of notes that emphasizes some parts of the DNA for a while, and at other times show how the entire DNA sequence sounds. It is possible to change instrument assignments, and even other musical attributes, in the middle of a composition. It is up to the composer to decide if this is useful or not.
Viewed from a biological perspective, the general process of handling the DNA data is very similar to that which goes on in a cell as this same DNA sequence is used to synthesize a set of proteins. The same codon mapping is used, as are the codons that delimit the individual genes. What you hear in these compositions are series of intact genes, not as strings of amino acids, but as sequences of notes.
In discussing the composition process, John Dunn and I have concluded that we have been "blatantly programmatic." This means that we have worked hard to try to capture the spirit of the organism whose DNA we are using. For example, I see starfish as relatively simple and quite mechanical as they move across the substrate in search of clams and mussels. Tobacco leaves are somber factories that synthesize chemicals, including those that have proven to be dangerous. Yeast ferment sugars and produce alcohol. That can't help but evoke a party atmosphere. I have used the compositional tools, such as tempo and instrument selections, to render the DNA sequence into musical compositions that reflect these biological characteristics.
It doesn't always work. There have been a number of DNA sequences that have gotten weeks of composition effort and not yielded an interesting musical result. Other sequences have found a musical statement relatively rapidly, with just a few hours of work. Just because all of the notes are there (that is, there is a DNA sequence available), it doesn't guarantee that it will result in something that will be worth using.