Conflict Of Interest

“Conflict Of Interest” is a sonification of my own personal genetic analysis. The piece is written for the Endangered Guitar, a hybrid interactive software/guitar instrument, and the data set controls sonic processing as well as 8-channel spatialization in real time. The piece also uses the data to interfere with the performer’s intentions. My genetic data set consists of approx. 600,000 lines of genetic variations, and was cross-referenced with approx. 150,000 research articles from publicly available databases. Data that is not associated with health issues, but with “identity” as a largely arbitrary, socially and historically constructed concept, comes up at unexpected places, disrupts the performance and pushes the piece in a new direction. The piece was premiered at Festival Sonifications – Audible Datastreams in Berlin in October 2017.

My approach to the topic is explained in the text below, which was published by Wolke Verlag:

Tammen, H: Conflict Of Interest – A Personal Genetic Data Sonification, in: Sonifikation: Transfer ins Musikalische / Sonification: Transfer into Musical Arts, Program book for the Sonification Festival by BGNM, Hofheim: Wolke, 2017.

This text is from June 2017, so it includes neither the latest developments nor some of the thoughts and criticisms I voiced on one of the panels during the festival. I added those at the end of this page.

Hans Tammen: Conflict Of Interest (For DNA Dataset)

A few years ago I downloaded my own DNA analysis from a genomics laboratory. Of course, I was interested to see if I’d die soon (luckily not), or where my ancestors came from (no new revelations here, and those who know me wouldn’t be surprised that the lab found 2.7% of my genes come from Neanderthals). However, I was more interested in making art with it. After all, the set contained over 600,000 lines of data, and that should be a gold mine for an artist interested in algorithms. As this festival will be the first opportunity to present Conflict of Interest to the public, it is worth detailing some of its background, even if at the point of writing this (June 2017) some of the details may eventually change.

The most interesting aspect of a person’s DNA data is that each line actually has a specific “meaning”, and cannot be treated as a simple continuous stream of bits & bytes.

Why does that matter? Projects that turn data into sound or visuals often suffer from the artist’s desire to find something inherent in the data structure that determines the aesthetics (which an old friend of mine – Hi, Martin! – once shrugged off as composers being too lazy to create original works). As a union technology consultant in the 1990s, I argued in front of German works councils (“Betriebsräte”) that data is just data, but becomes “information” as soon as it is interpreted by the company’s executives. I argued that the fact that a call center employee logs out of the system to go to the bathroom more often than average does not mean anything in itself, but the way his or her superiors interpret this fact according to their own prejudices may have consequences for the employee. How data from workplace monitoring is interpreted depends on the company’s approach to power.

I continued this line of thinking as a residency program officer and consultant for artists who wanted to work on data sonification and visualization projects, trying to emphasize that it is the artist who makes aesthetic choices, not the data. A common pitfall is to simply connect different physical realities (composers like to use numbers mapped onto frequency or pitch), and then let the system run without further involvement from the artist. If it is then accompanied by outlandish claims, one ends up in New Age territory, such as “G# is the frequency of the moon”, as if revolving objects and sound waves were somehow the same. One just needs to double the frequency of the moon’s revolution around the earth often enough to end up in the audio range. Supposedly, G# is good for sexual energy – no kidding, just google it.
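The arithmetic behind such a claim is easy to reproduce, which is rather the point. The following is a minimal sketch, assuming the synodic month (about 29.53 days, new moon to new moon) as the period, a lower limit of 20 Hz for the audio range, and equal temperament with A4 = 440 Hz. All function and variable names are made up for this illustration.

```python
import math

SYNODIC_MONTH_DAYS = 29.530589  # assumed period: mean time from new moon to new moon
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def octave_up_to_audible(freq_hz, lowest_audible_hz=20.0):
    """Double a frequency (i.e. transpose it by octaves) until it is audible."""
    while freq_hz < lowest_audible_hz:
        freq_hz *= 2.0
    return freq_hz


def nearest_note_name(freq_hz, a4_hz=440.0):
    """Return the equal-tempered note name closest to freq_hz (octave ignored)."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4_hz))
    return NOTE_NAMES[semitones_from_a4 % 12]


# One revolution per synodic month, expressed in Hz (cycles per second).
moon_hz = 1.0 / (SYNODIC_MONTH_DAYS * 24 * 60 * 60)
audible_hz = octave_up_to_audible(moon_hz)
moon_note = nearest_note_name(audible_hz)
print(f"{audible_hz:.2f} Hz is closest to {moon_note}")
```

Choosing a different period (the sidereal month, for instance) or a different tuning reference lands on a different note, which illustrates how arbitrary the mapping is.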

There is of course nothing wrong with mapping one stream of numbers onto another – after all that is one thing a computer is good at. The point is, there is no “meaning” in numbers. The goal of the artist is to make connections freely, but not to make outlandish claims.

The lines in my DNA analysis include a variety of data. Probably best known is the genotype (a string of letters such as AGCTCTAGACTTGGCTAAAGCCAA…). Some genome sonification projects seem to focus on that one in particular, maybe because one can easily work with it as a continuous data stream of As, Gs, Cs and Ts. Too bad “T” is not a pitch, though.

But there is more to be found, and it is here where the aforementioned “meaning” comes into place. Each line contains
1. the RSID identifier for a genetic variation (SNP),
2. on which of the 23 chromosomes it is located,
3. its position on that chromosome,
4. the genotype.
This is what a typical line out of the 600,000 looks like:

rs147226614 - 1 - 878697 - GG
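Reading such a line into its four fields could be sketched as follows. This assumes the " - " separator shown in the sample line above (an actual lab export may well use tabs or commas instead), and the `SnpRecord` type and `parse_snp_line` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpRecord:
    rsid: str        # identifier of the genetic variation (SNP)
    chromosome: str  # kept as text, since a chromosome label need not be numeric
    position: int    # position on that chromosome
    genotype: str    # e.g. "GG", "AC", "TC"


def parse_snp_line(line: str) -> SnpRecord:
    """Split one line of the data set into its four fields."""
    rsid, chromosome, position, genotype = [f.strip() for f in line.split(" - ")]
    return SnpRecord(rsid, chromosome, int(position), genotype)


record = parse_snp_line("rs147226614 - 1 - 878697 - GG")
print(record)
```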

We don’t know what many of these genetic variations are good for. Some of them may be junk, some have been established as having a function within our bodies, others only work together in groups. It makes no sense to use any of that data for an artistic project without considering what we know about these gene variants specifically. I may also use additional information (e. g. on which of the 23 chromosomes it can be found, and its particular location on that chromosome), but first I need to examine what we do know about the SNPs.

Luckily, there are lots of studies to be found linking SNPs to certain traits or conditions. Even better though is that these can be found in publicly available research databases, and what I want to know can usually be found in the abstract or the keywords. In those databases SNPs are referenced by their RSIDs, including the chromosome on which they were found. It shouldn’t be a problem to write a piece of software that cross-references the RSIDs and locations found in my data with those found in research articles, in order to see what we can do with that information. It may even be possible to do this analysis in real-time during performance, after I have determined what I am eventually looking for.
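A minimal sketch of such a cross-reference, assuming the research articles have already been reduced to a lookup table from RSID to chromosome and keywords. The table layout, the function name and all entries except the RSID quoted earlier are made-up placeholders, not real research data.

```python
# Hypothetical excerpt of the personal data set: (rsid, chromosome, position, genotype).
# Only the first entry appears in the text; the others are invented placeholders.
my_snps = [
    ("rs147226614", "1", 878697, "GG"),
    ("rs0000001", "7", 1234567, "AC"),
    ("rs0000002", "12", 7654321, "TT"),
]

# Hypothetical research index: rsid -> list of (chromosome, keywords) per article.
research_index = {
    "rs0000001": [("7", ["bitter taste response"])],
    "rs0000002": [("11", ["height"])],  # chromosome differs, so this is not a match
}


def cross_reference(snps, index):
    """Keep only SNPs whose RSID and chromosome both appear in the research index."""
    matches = []
    for rsid, chromosome, position, genotype in snps:
        for ref_chromosome, keywords in index.get(rsid, []):
            if ref_chromosome == chromosome:
                matches.append({
                    "rsid": rsid,
                    "chromosome": chromosome,
                    "position": position,
                    "genotype": genotype,
                    "keywords": list(keywords),
                })
    return matches


matched = cross_reference(my_snps, research_index)
print(matched)
```

A dictionary lookup per line keeps this fast enough that running it during a performance seems plausible, though that would depend on how the research data is actually stored.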

Most of the studies found in research databases are of course related to diseases and conditions, such as prostate cancer, Alzheimer’s, migraine or Parkinson’s. References, however, can also be found to more generic information such as height, bitter taste response, body mass index, or being a morning person. It is possible to see, for example, if a specific variant may be connected to suicidal behavior or alcohol dependencies. Many research projects reference the population surveyed. Oh, and there is a lot that I have no clue about.

So the first step is to cross-reference my DNA with the data available in these research databases, and only keep those records that match. It is at the next step, though, where the composer comes in: as soon as I make a decision regarding which keywords or strings of text to use, I am crossing the line from “data” to “information” – or to put it another way: where art begins. “Cleft lip” may not be part of my project, but there are many entries I could focus on. Diseases? Bodily features? Susceptibility to specific drug ingredients? What happens if the SNPs in my data set are referencing population segments I have no connection with, such as “Chinese psychiatric patients” or “North Indian Agrawal population”? It is also possible to use information I already have from the gene lab, information that cross-referencing research databases may not yield. This can all be further complicated by the fact that a single SNP may not alone be responsible for a specific trait, e. g. the information (for what it’s worth) that I’m not likely to have cheek dimples. The lab used 9 markers fed into their statistical model to determine that there is a 62% chance of me not having dimples and that 63% of Europeans don’t either.
Eventually, I should have clusters of information I can work with, grouped by their specific traits or research keywords. Groups of sounds – played, pre-recorded or generated – will be assigned to each cluster, but the number of SNPs associated with that cluster, its position on a specific chromosome, and allele (AA, AC, TC, etc.) may all be part of the algorithm that determines how it will be used.
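The clustering described above could be sketched like this. It assumes the matched records carry a list of research keywords, and that the composer's choice is expressed as a set of keywords to keep. All names and sample records are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical matched records (the result of the cross-referencing step).
matches = [
    {"rsid": "rs0000001", "chromosome": "7", "position": 1234567,
     "genotype": "AC", "keywords": ["bitter taste response"]},
    {"rsid": "rs0000002", "chromosome": "12", "position": 7654321,
     "genotype": "TT", "keywords": ["height"]},
    {"rsid": "rs0000003", "chromosome": "3", "position": 2468,
     "genotype": "AA", "keywords": ["height", "cleft lip"]},
]

# The artistic decision: which keywords become part of the piece.
chosen_keywords = {"height", "bitter taste response"}


def build_clusters(records, keywords):
    """Group records by each chosen keyword they are associated with."""
    clusters = defaultdict(list)
    for rec in records:
        for kw in rec["keywords"]:
            if kw in keywords:
                clusters[kw].append(rec)
    return dict(clusters)


def summarize(members):
    """Collect the values an algorithm might later draw on for one cluster."""
    return {
        "count": len(members),
        "chromosomes": sorted({m["chromosome"] for m in members}),
        "genotypes": [m["genotype"] for m in members],
    }


clusters = build_clusters(matches, chosen_keywords)
height_summary = summarize(clusters["height"])
print(height_summary)
```

Which sounds are assigned to a cluster, and how the count, chromosomes and genotypes shape their use, stays a decision for the composer and is deliberately left out of the sketch.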

For the playing, processing and spatialization of sounds, the plan is to use software from my Endangered Guitar project. Developed since the year 2000, and presented in hundreds of concerts, the “Endangered Guitar” is a hybrid interactive instrument meant to facilitate live guitar sound processing (explained in detail in this publication).

An actual guitar might, however, not be part of the performance. Over the years the software has been extended to allow for other pieces and projects – and is now able to process input from other sound sources, play and process samples stored on the computer’s hard drive and use any external controller imaginable. It has become my general go-to piece of software for the creation of sound works, whether there’s a guitar connected or not.

As for the live processing of sounds, a cursory check shows that the system is currently set up to control approximately 100 parameters across its entirety. Of course, nobody could actually control 100 parameters consciously and in real-time. However, they are already set up and ready to go, so any incoming control value – in this case, from my DNA information clusters – could be routed to any of these parameters. The data is used to determine the sounds played and to control the parameters of the processing at the same time.

As a third layer, this data set might also be used to make it an interactive performance – the Endangered Guitar software is also an interactive system. 30 years ago, Joel Chadabe described an “interactive composing system” as one that “operates as an intelligent instrument – intelligent in the sense that it responds to a performer in a complex, not entirely predictable way adding information to what a performer specifies and providing cues to the performer for further actions” (Chadabe, J. (1984). Interactive Composing. Computer Music Journal VIII:1, 23). My software “listens” to the input and then determines the parameters of the electronics that process the same sounds, but responds in a flexible way – as I deliberately programmed unpredictable or “fuzzy” elements into the software.

While each performance features elements that were developed over decades of performance practice (the choice of the actual sonic material, order and timing of parts, how to transition from one part to the other, when to make sharp cuts, dynamics and other elements of the form), the software that interferes with the predictable order of events makes it an improvisation, in that it forces the performer to deal with unforeseen circumstances. In addition to the available strategies I can add my DNA dataset to influence the performance, increase the unpredictability of the software’s behavior and throw the performer off his (well, my) path.

(finished June 2017)

Update November 2017

  1. I ended up writing this work for the Endangered Guitar. Eventually I used only 18,000 genetic variations we knew something about (“non-junk DNA”), 3,500 variations that fell into the “identity” category, 150 variations used to determine ancestry, and 900 additional lines of data with general information about each chromosome. As this is an instrument for sonic progressions, these lines of data were not used to create pitches, but to influence the parameters of the live sound processing.
  2. All data points were mapped onto specific parameters, mostly depending on how “musical” they felt during rehearsals. Two data points (position on chromosome and alleles) were made available during performance to be routed to any parameter I wanted. Special attention was paid to parameters affecting the spatialization through the octophonic system. The software stepped through the 23 chromosomes in 23 minutes, and every time an “identity” line came up, all parameters got randomized. If you were wondering about the numerous quiet moments, that’s where they came from.
  3. There is one problem with this approach. I insist that there is nothing in data that determines the aesthetics, so the composer is free to connect anything with anything. That means I am mapping data points to whatever I myself think works. The outcome then is (as often became clear during the festival) not distinguishable from other works by the same composer. In the case of the Endangered Guitar, a “genetically determined” performance sounds the same as a performance “determined by audio analysis”. I intend to counter that by creating a laptop-only work, in which I’d work with materials created (either on the fly or pre-recorded) from other points from this data set. 23 different sounds, perhaps?
  4. As the routines are implemented anyway, the Endangered Guitar can now be controlled by audio analysis and genetic dataset: either, or, or both at the same time. Why not?
  5. Using my own DNA analysis may allow for syndication, too. It can be used for an installation or a data visualization piece. Or to break with the dogma of sonic progressions, I could write a work for chamber ensemble, in which I then have to think about pitch, rhythm, timbre, modes of attack, etc.
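The stepping and randomizing behavior mentioned in point 2 could be sketched roughly as follows. This is not the Endangered Guitar software itself. It assumes one minute per chromosome with lines spread evenly across that minute, and the parameter names, record layout and function name are all hypothetical.

```python
import random

SECONDS_PER_CHROMOSOME = 60  # assumed: 23 chromosomes in 23 minutes, evenly split


def run_piece(lines_by_chromosome, parameters, rng):
    """Step through all lines; randomize every parameter on an "identity" line.

    Returns a log of (time_in_seconds, chromosome, rsid) for each randomization.
    """
    log = []
    for index, (chromosome, lines) in enumerate(lines_by_chromosome):
        start = index * SECONDS_PER_CHROMOSOME
        # Spread this chromosome's lines evenly over its time slot.
        step = SECONDS_PER_CHROMOSOME / max(len(lines), 1)
        for i, line in enumerate(lines):
            if line["category"] == "identity":
                for name in parameters:
                    parameters[name] = rng.random()  # new value in [0, 1)
                log.append((start + i * step, chromosome, line["rsid"]))
    return log


# Hypothetical processing parameters, normalized to the range 0..1.
parameters = {"delay_time": 0.5, "pan_azimuth": 0.5, "grain_size": 0.5}

# Made-up placeholder lines for two chromosomes.
lines_by_chromosome = [
    ("1", [{"rsid": "rs0000001", "category": "health"},
           {"rsid": "rs0000002", "category": "identity"}]),
    ("2", [{"rsid": "rs0000003", "category": "health"}]),
]

events = run_piece(lines_by_chromosome, parameters, random.Random(0))
print(events, parameters)
```

Because a randomization can land any parameter, including a level or gain, near zero, a scheme like this would be consistent with the quiet moments mentioned in point 2.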

Hans Tammen