Maxwell’s Demon and Genetic Assimilation


Organisms of the same genotype raised in different environmental conditions will sometimes develop different phenotypes, a phenomenon called plasticity, of which there are numerous examples.  One of my recent favorites is caterpillars raised in different seasons that adopt different camouflage patterns, resembling either twigs or flowers to match the environment, in a diet-dependent manner.  Plasticity is important for adaptation, as you might imagine, because it determines the extent of variation available to an organism without genetic change.

Sometimes, an organism will adopt a plastic phenotype when entering a new environment.  If that new environment is invariant, it becomes beneficial for the organism to always express the plastic phenotype.  Over time, so the theory goes, the organism will lose the ability to switch between phenotypes (the plasticity), and only express the phenotype adapted to the newly invariant environment.  This process is called genetic assimilation, and is hotly debated as a pattern of evolutionary change.

The Cost of Plasticity

I would argue that the big question concerning assimilation is whether it is necessary, in the sense of being obligatory: does it have to occur?  One way proponents of assimilation argue that it is necessary is by invoking the idea that there must always be a cost to plasticity, or more broadly, to genetic decision making.  No matter the environment, no matter the switch that mediates the plasticity, such a switch must impose a cost on the organism.

If there is such a cost, assimilation will obviously be favored, since it removes that cost once the need for plasticity is gone.

Maxwell’s Demon and the Cost of Making Choices

Now I want to turn to a topic which will seem at first entirely unrelated, but which I promise will become relevant.  In the late 1860s, James Clerk Maxwell, the eminent physicist, conceived of a thought experiment which later came to be known as Maxwell’s Demon.  He postulated a box divided into equal halves, with a door between them, filled with a gas at a uniform temperature.  In this box lives a demon, that is to say a hypothetical entity, with the following modus operandi:

  • If the demon detects a fast-moving particle, he opens the door to move it into one half of the box (the hot half).
  • If the demon detects a slow-moving particle, he opens the door to move it into the other half of the box (the cold half).

Repeating this process many times ought to produce a box in which one half is entirely filled with hot gas and the other half with cold.

The trouble with this demon is that by sorting the gas into hot and cold halves, he has just produced usable energy from nothing.  One could imagine a very large box in which the temperature difference between the halves was used to drive a generator and create electricity.  If this worked, Maxwell’s Demon would have violated the laws of thermodynamics and created energy where before there was none.

Like other thought experiments, the intent was not to break the laws of physics, but to figure out where our understanding of them erred.  Maxwell’s Demon stood up to scrutiny for a long time without being entirely understood, until Leo Szilard put forth an explanation which neatly balanced the books: operating the box costs the demon at least as much energy as the sorting yields, for a net gain of zero.

Szilard’s insight was that the Demon must consume energy in determining which particles are moving at which speeds.  Szilard showed that the very act of measuring the state of a given particle consumes at least as much energy as would be gained by moving that particle from one half of the box to the other.  In the long run, the net gain of energy would be nothing, and in fact, owing to inefficiency in the demon’s operation, such a mechanism would actually cost energy.  Thermodynamics confirmed.
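This bookkeeping can be made concrete.  The modern statement of Szilard’s bound (usually credited jointly to Landauer) is that acquiring or erasing one bit of information costs at least kT ln 2 of free energy.  A back-of-the-envelope sketch, using the standard textbook figure of ~30.5 kJ/mol for ATP hydrolysis as a biological yardstick (that figure, and body temperature, are my assumptions here):

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T = 310.0            # roughly body temperature, K

# Szilard/Landauer bound: minimum energy to acquire or erase one bit
bit_cost = K_B * T * math.log(2)
print(f"one bit at 310 K costs at least {bit_cost:.2e} J")

# For scale: free energy of ATP hydrolysis, ~30.5 kJ/mol (assumed value),
# converted to joules per molecule via Avogadro's number
atp = 30.5e3 / 6.02214076e23
print(f"one ATP yields {atp:.2e} J, or roughly {atp / bit_cost:.0f} bits' worth")
```

So a single ATP could, in principle, pay for a couple dozen ideal measurements; the point of the sections below is that real biological measurement is nowhere near this ideal.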

The Cost of Information

Szilard’s insight has profound implications for the world.  He showed that the very act of gaining information about the environment necessarily has a cost.

Hopefully, the utility of Maxwell’s Demon is becoming evident.  One of the ways in which genetic assimilation is justified is through the cost of varying phenotype in response to the environment.  Critics of assimilation question whether there is necessarily such a cost, but I think Maxwell’s Demon eloquently demonstrates that any measurement of the environment, and any decision based on it, requires the expenditure of energy.

The analogy to Maxwell’s Demon is not perfect, because the cost of measuring a single particle’s motion is so small as to be negligible in the context of most biological systems.  Despite this, I think there are two lines of evidence suggesting that Maxwell’s Demon is relevant in the context of assimilation.  The first is that the Demon operates at the very limits of efficiency, making perfect measurements of the world at exactly the cost necessary to make those measurements.  Real systems are less efficient, often much less efficient.  Nature is a better designer of mechanisms than we are, but even natural systems are profoundly less efficient than the thermodynamic ideal.

Secondly, both the measurement and the subsequent decision are substantially more difficult for an organism.  An organism has to measure a whole slew of variables related to the environment to determine the correct phenotypic response, and often these variables are noisy and befuddling.  Furthermore, the organism has to propagate that measurement throughout a carefully built genetic regulatory network to result in the correctly chosen plastic phenotype.  Between the added complexity of plasticity and the fact that nature doesn’t operate anywhere near maximum thermodynamic efficiency, I believe Maxwell’s Demon is relevant to the cost of genetic assimilation.

More broadly, Szilard and much work after him have demonstrated that information is deeply connected with energy, although the details of that relationship remain to some extent unresolved.  Measuring the world, reliably transducing that signal to other parts of the organism, and making decisions based upon the signal must all be treated as energetically expensive acts.  For this reason, there is necessarily a cost to plasticity, and that cost likely makes genetic assimilation an under-appreciated pattern of evolutionary change.

Function and Sequence Conservation

“A recent slew of ENCyclopedia Of DNA Elements (ENCODE) Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is less than 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 − 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions or because no mutation in these regions can ever be deleterious.”

I want to unwind the argument at the core of the abstract (above) from D. Graur’s now-famous evisceration of ENCODE.  He starts by quoting ENCODE’s notorious declaration that 80% of the genome is functional; he then contrasts that figure with the widely accepted, if rough, estimate that purifying selection can be detected in only around 10% of the genome, and concludes that the ENCODE figure must be wrong.  There’s a hidden assumption here which, while seemingly rational, falls apart under close scrutiny: the assumption that function must entail sequence conservation within the genome.

The thesis of this post is that the relationship between sequence conservation and function need not be so straightforward.  There are some encodings of function in the genome which wouldn’t necessarily be expected to produce sequence conservation, even when the function of such DNA is under strong purifying selection.


A Positive Control

Consider the following computer simulation of some Wright-Fisher populations evolving under drift, mutation, and selection.  First, I generated 100 sequences with a polymorphism rate of ~1%.  I then evolved this population under mutation and drift, subject to the selective constraint that a particular region (in this case, nucleotides 79-82) had to contain the word “ACGT”.
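My actual simulation code isn’t shown here, but the setup can be sketched as a minimal Wright-Fisher loop.  The selection coefficient S, and the seeding of a few initial motif carriers so that the sweep resolves quickly, are my assumptions for illustration:

```python
import random

random.seed(0)

POP_SIZE, SEQ_LEN = 100, 100
MU = 0.01                  # per-site mutation rate, deliberately high
MOTIF, START = "ACGT", 78  # nucleotides 79-82, 0-based start
S = 0.5                    # assumed selective advantage of carrying the motif

def fitness(seq):
    """Fitness bonus only if the motif sits at the constrained position."""
    return 1.0 + S if seq[START:START + len(MOTIF)] == MOTIF else 1.0

def mutate(seq):
    return "".join(random.choice("ACGT") if random.random() < MU else b
                   for b in seq)

def step(pop):
    """One Wright-Fisher generation: fitness-weighted sampling, then mutation."""
    weights = [fitness(s) for s in pop]
    return [mutate(p) for p in random.choices(pop, weights=weights, k=POP_SIZE)]

# seed a few carriers so the dynamics are quick to observe (an assumption)
carrier = "A" * START + MOTIF + "A" * (SEQ_LEN - START - len(MOTIF))
pop = [carrier] * 10 + ["".join(random.choices("ACGT", k=SEQ_LEN))
                        for _ in range(POP_SIZE - 10)]

for _ in range(500):
    pop = step(pop)

freq = sum(fitness(s) > 1 for s in pop) / POP_SIZE
print(f"motif frequency after 500 generations: {freq:.2f}")
```

The motif sweeps to high frequency but, as in the figures below, never quite fixes: the high mutation rate keeps knocking out copies as fast as selection restores them.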

Allele frequency of ACGT

The preferred word “ACGT” arises at base pair 79 around the 300th generation, rapidly rises to near-fixation, and sticks around at high frequency thereafter (it never fully fixes owing to my unrealistically high mutation rate, set high to limit the number of generations my feeble computer must run the simulation).  This situation can be thought of as a positive control for the relationship between function and selection; there is exactly one configuration of four nucleotides at one particular spot in the 100bp genome which satisfies the selective constraint.  As a result, when one looks for selective constraint, nucleotides 79-82 are easily detected as a spike in percent identity:


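The percent-identity track above is simple enough to state in a few lines: per site, the frequency of the most common base across the population.  A minimal sketch (the function name is my own):

```python
from collections import Counter

def percent_identity(pop):
    """Per-site frequency of the most common base across the population."""
    n = len(pop)
    return [Counter(seq[i] for seq in pop).most_common(1)[0][1] / n
            for i in range(len(pop[0]))]

# toy example: three 5 bp sequences, conserved at sites 1-3, variable at 4-5
toy = ["ACGTA", "ACGTC", "ACGAA"]
print(percent_identity(toy))
```

A selected site shows up as a run of values near 1.0 against a background set by the mutation rate and drift.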
Or, if one prefers fancier metrics, try looking at smoothed phastCons scores:


So there, you might say: conservation equals function.


Not So Fast

But wait a second.  I promised that there are encodings of function which don’t require much conservation.  Here’s an example: the same word, ACGT, must still be in the sequence, but unlike last time, selection doesn’t care where in the sequence it is found; any place will do.  To anchor this to some real biology, consider a transcription factor which is able to affect its targets across a wide range of distances, and so is under little spatial constraint.
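In simulation terms, the only change from the positive control is the fitness function: instead of checking a fixed window, score the whole sequence.  A sketch, with the selection coefficient s an assumed value and the function names my own:

```python
def fitness_fixed(seq, motif="ACGT", start=78, s=0.5):
    """Positive control: the motif must sit at nucleotides 79-82 (0-based 78)."""
    return 1.0 + s if seq[start:start + len(motif)] == motif else 1.0

def fitness_free(seq, motif="ACGT", s=0.5):
    """Position-free variant: the motif is rewarded wherever it occurs."""
    return 1.0 + s if motif in seq else 1.0

print(fitness_fixed("A" * 78 + "ACGT" + "A" * 18))  # motif at the right spot
print(fitness_free("ACGT" + "A" * 96))              # motif anywhere counts
```

Under `fitness_free`, every position in the genome is an equally fit place for the motif to live, which is exactly what lets it wander.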

Although the word remains the same, its position is allowed to vary, and so vary it does.  One way to visualize this is as follows: the x-axis is time, moving forward; the y-axis is position, in base pairs, within the 100bp genome I am simulating.  Colored dots indicate that a motif is present, and the size of the dot indicates the frequency at which it is present.  (Dots below the red line, at -1, indicate sequences in which no motif was present.)


Motifs arise regularly, and disappear regularly too.  Because the word is sufficiently short (I’ll come back to this later), multiple haplotypes coexist in which the motif is found in different spots.  Eventually, even dominant haplotypes die off, and are replaced by other variations.  Chaos rules, entropy triumphs.  What’s more, sequence identity isn’t much of a guide to the location of the motif:


There are multiple peaks here, none of which stands out particularly well.  Also note that the mean sequence conservation is much lower in this case than in the position-constrained case.

Arguably the most pronounced peak in the free-position curve is the one at the very end of the sequence, but, consulting our map of motif evolution, there was never even a motif there; that peak of sequence identity is the result not of selection but of chance.

One might argue that more sophisticated algorithms are needed to address this situation: enter phastCons.  Unfortunately, it performs basically no better:


This graph is one case where smoothing is deceptive, so here are the raw scores.


There are exactly five spots with high phastCons scores (encouraging!), and three of these show scores of exactly 1, representing perfect certainty that purifying selection is operating there (very encouraging!).  But let’s crush those hopes and dreams, because not a single one of those spots had a high-frequency motif, as the following map shows (red spots indicate the base pairs with high [>0.1] scores).


Motif Rearrangement as a Function of Motif Size

There’s one more point which I think is worth making.  The word I used above, ACGT, had to be matched perfectly, which means it had an information content of 8 bits (2 bits per base pair × 4 base pairs).  A skeptic might say, however, that the motif replacements and reversals we observed here wouldn’t happen if the motifs were longer.  So I looked into that, running similar experiments with randomized words of 5, 6, and 7 bp.
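One way to see why length matters: the chance that a specific motif arises by mutation somewhere in a random sequence falls off by a factor of four per added base, so longer motifs have far fewer chances to spring up at a new location.  A quick sketch of the expected hit count under a uniform base composition:

```python
def expected_hits(seq_len, motif_len):
    """Expected occurrences of one specific motif in a random sequence:
    each of the (seq_len - motif_len + 1) windows matches with prob 4**-L."""
    return (seq_len - motif_len + 1) * 0.25 ** motif_len

for length in (4, 5, 6, 7):
    print(f"{length} bp motif: {expected_hits(100, length):.4f} expected hits")
```

Between 4 bp and 7 bp the expected count drops roughly 66-fold, which is the intuition behind the turnover results that follow.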

Here’s 5:


Here’s 6:


There’s still turnover happening for those two lengths.  Here’s 7:


Finally, at l=7, there are no major motif turnover events.  Still, some proto-motifs arise, as circled in red.  There’s no reason to believe that, given enough time, one of those wouldn’t eventually replace the dominant motif.  Indeed, if one is willing to wait long enough, that’s exactly what happens (note the longer run: 3000 generations).



Function without Evidence of Purifying Selection

Given a sufficiently short motif and a sufficiently long time, one can and does observe cases in which there is function without any obvious sign of purifying selection.  The crucial determinant of this phenomenon is the existence of multiple, equally fit adaptive peaks: in this case, whether a motif can arise anywhere in the sequence or is constrained to a particular location.  In the former case, motifs of up to 7 nt (14 bits of information content) can replace other motifs.

These motifs can be thought of as transcription factor binding sites.  For those with strict spatial requirements (e.g. must sit exactly 20bp from the transcription start site), selection might be easy to discover.  On the other hand, TFBSs with variable spacing requirements will resemble the cases discussed above, in which motif turnover happens frequently.

Of course, none of these simulations demonstrates that motif rearrangement actually does occur in nature.  It would be difficult to observe this phenomenon directly, as it would require sampling multiple timepoints over generations of time (Lenski-style), as well as knowing what the specific TFBSs are, in order to examine their constraint.

Still, there are a few lines of evidence, both empirical and theoretical, suggesting that the scenario of motif turnover is not altogether unrealistic.  For starters, TFBSs in Drosophila have an average information content of ~12.1 bits, roughly comparable to my 6 bp words, for which plenty of motif turnover can be observed.  Second, lest you argue that my unrealistically simple model was, ahem, unrealistically simple, a much more advanced and complex model largely confirms my results (albeit in a much more advanced and complex way [Bullaughey 2012]).

But third, and most importantly, conservation of function without overt signs of sequence constraint has been observed empirically on numerous occasions.  Although we can’t exactly reconstruct how those occasions came about, we can say definitively that the relationship between function and the classic signs of purifying selection is not as clear as it superficially appears.  Whereas sequence conservation seems often to co-occur with function (but see the notorious ultraconserved elements), function need not be associated with the most overt signs of purifying selection.



One thing worth noting here is that conservation of function is associated with some signs of purifying selection in the variable-position case… they just aren’t obvious ones.  Consider that, following selection, fully 74 of the 100 sequences contain the selected word, “ACGT”, whereas the average for 100 randomly selected 4-letter words is a mere 32 sequences, a highly significant difference.  This result hints that there are breadcrumbs to follow, should one look carefully enough for selection.
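That null figure of ~32 can even be sanity-checked analytically: under a Poisson approximation, a specific 4-mer occurs at least once in a random 100 bp sequence with probability 1 - exp(-97/256), about 0.32.  A sketch of both the analytic value and an empirical null (variable names are my own):

```python
import math
import random

random.seed(1)

def count_containing(pop, word):
    """How many sequences in the population contain the word at least once."""
    return sum(word in seq for seq in pop)

# Poisson approximation: 97 windows, each matching with probability 4**-4
p_hit = 1 - math.exp(-(100 - 4 + 1) * 0.25 ** 4)
print(f"analytic expectation: {100 * p_hit:.0f} of 100 sequences")

# empirical null: random 4-letter words scored against random sequences
pop = ["".join(random.choices("ACGT", k=100)) for _ in range(100)]
null = [count_containing(pop, "".join(random.choices("ACGT", k=4)))
        for _ in range(200)]
print(f"empirical null mean: {sum(null) / len(null):.1f} of 100 sequences")
```

Against that null, 74 of 100 sequences carrying the selected word is a striking excess, even though no single genomic position shows a conservation peak.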