Razib Khan One-stop-shopping for all of my content

December 21, 2012

The causes of evolutionary genetics

A few days ago I was browsing Haldane’s Sieve, when I stumbled upon an amusing discussion which arose on its “About” page. This “inside baseball” banter got me to thinking about my own intellectual evolution. Over the past few years I’ve been delving more deeply into phylogenetics and phylogeography, enabled by the rise of genomics, the proliferation of ‘big data,’ and accessible software packages. This entailed an opportunity cost: I did not spend much time on classical population and evolutionary genetic questions. Strewn about my room are various textbooks and monographs which I’ve collected over the years, and which have fed my intellectual growth. But I must admit that it is a rare day now that I browse Hartl and Clark or The Genetical Theory of Natural Selection without specific aim or mercenary intent.

R. A. Fisher

Like a river inexorably coursing over a floodplain, with the turning of the new year it is now time to take a great bend, and double-back to my roots, such as they are. This is one reason that I am now reading The Founders of Evolutionary Genetics. Fisher, Wright, and Haldane are like old friends, faded, but not forgotten, while Muller was always but a passing acquaintance. But ideas 100 years old still have power to drive us to explore deep questions which remain unresolved, but where new methods and techniques may shed greater light. A study of the past does not allow us to make wise choices which can determine the future with any certitude, but it may at least increase the luminosity of the tools which we have to illuminate the depths of the darkness. The shape of nature may become just a bit less opaque through our various endeavors.

Figure from “Directional Positive Selection on an Allele of Arbitrary Dominance”, Teshima KM, Przeworski M

So what of this sieve of Haldane? As noted at Haldane’s Sieve, the concept is simple. Imagine two mutations, one which expresses a trait in a recessive fashion, and another in a dominant one. The sieve operates by favoring the emergence of dominantly expressing variants out of the low frequency zone where stochastic forces predominate (i.e., even if an allele confers a large fitness benefit, at low frequencies the power of random chance may still imply that it is highly likely to go extinct). An example of this would be lactase persistence, which in the modal Eurasian variant seems to exhibit dominance. The converse case, where beneficial mutations are recessive in expression, suffers from a structural problem: the benefit is more theoretical than realized.
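To make the sieve concrete, here is a minimal sketch of the standard one-locus selection recursion. The selection coefficient `s`, the dominance coefficient `h`, the starting frequency and the target frequency are all illustrative assumptions of mine, not values from any study. The model is deterministic, so it ignores the very drift that makes the low frequency zone dangerous; what it does show is how much longer a recessive beneficial allele lingers at low frequency, and therefore how much longer it stays exposed to chance loss.

```python
def next_freq(q, s, h):
    """Frequency of the beneficial allele after one generation of selection.

    Assumed fitnesses: mutant homozygote 1 + s, heterozygote 1 + h*s,
    wild-type homozygote 1. h = 1 is fully dominant, h = 0 fully recessive.
    """
    p = 1.0 - q
    mean_fitness = q * q * (1 + s) + 2 * p * q * (1 + h * s) + p * p
    return (q * q * (1 + s) + p * q * (1 + h * s)) / mean_fitness


def generations_to_reach(q0, target, s, h, max_gen=1_000_000):
    """Count generations until the allele frequency first reaches `target`."""
    q, gen = q0, 0
    while q < target and gen < max_gen:
        q = next_freq(q, s, h)
        gen += 1
    return gen


s = 0.05    # hypothetical 5% fitness advantage
q0 = 0.001  # hypothetical starting frequency of the new allele

dominant_gens = generations_to_reach(q0, 0.10, s, h=1.0)
recessive_gens = generations_to_reach(q0, 0.10, s, h=0.0)

print("generations to reach 10%, dominant: ", dominant_gens)
print("generations to reach 10%, recessive:", recessive_gens)
```

When the allele is rare the per-generation gain is roughly `s*q` if it is dominant but only about `s*q*q` if it is recessive, which is why the second count comes out so much larger.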

The mathematics of this is exceedingly simple, a consequence of the Hardy-Weinberg dynamics of diploid random mating organisms. Let’s use the gene which is implicated in variation in lactase persistence as an example, LCT. Consider two alleles, LP and LNP, where the former confers persistence (one can digest lactose sugar as an adult), and the latter manifests the conventional mammalian ‘wild type’ (the production of lactase ceases as one leaves the life stage when nursing is feasible). LP is clearly the novel mutant. In a small population it is not unimaginable that by random chance the frequency of LP rises to ~10%. What now? At HWE you have:

$p^2 + 2pq + q^2 = 1$, where $q$ is the frequency of the LP allele. At ~10% the numbers substituted would be:

$(0.90)^2 + 2(0.90)(0.10) + (0.10)^2 = 0.81 + 0.18 + 0.01$

This is where dominance or recessive expression is highly relevant. The reality is that LP is a dominant trait. So in this population the frequency of LP as a trait would be:

$(0.10)^2 + 2(0.90)(0.10) = 0.19$, or 19%

Now imagine a model where LP is favored, but it expresses in a recessive fashion. Then the frequency of the trait would equal $q^2$, the homozygote LP-allele proportion. That is, 1%. Though population genetics is often constructed on an algebraic foundation, the results lend themselves to intuition. A structural parameter endogenous to the genetic system, dominant or recessive expression, can have longstanding consequences in terms of the likely trajectory of the alleles. Selection only “sees” the trait, so a recessive trait with sterling qualities may as well be a trait with no qualities. In contrast, a dominantly expressed allele can cut like a scythe through a population, because every copy “counts.”
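The arithmetic above is easy to check mechanically. This is only a sketch: the function names are my own, and the genotype labels simply reuse the LP and LNP alleles from the example.

```python
def hardy_weinberg(q):
    """Genotype frequencies at Hardy-Weinberg equilibrium.

    q is the frequency of the LP allele; p = 1 - q is the frequency of LNP.
    """
    p = 1.0 - q
    return {"LNP/LNP": p * p, "LNP/LP": 2 * p * q, "LP/LP": q * q}


def trait_frequency(q, dominant):
    """Fraction of individuals who express the LP trait."""
    genotypes = hardy_weinberg(q)
    if dominant:
        # One copy is enough: heterozygotes and LP homozygotes both express.
        return genotypes["LNP/LP"] + genotypes["LP/LP"]
    # Recessive: only LP homozygotes express the trait.
    return genotypes["LP/LP"]


q = 0.10
genotypes = hardy_weinberg(q)
dominant_trait = trait_frequency(q, dominant=True)
recessive_trait = trait_frequency(q, dominant=False)

print(genotypes)
print("trait frequency if LP is dominant: ", round(dominant_trait, 4))
print("trait frequency if LP is recessive:", round(recessive_trait, 4))
```

At the same allele frequency of 10%, the dominant case exposes the trait in 19% of individuals and the recessive case in 1%, which is the whole gap that selection has to work with.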

In preparation for this post I revisited the section on Haldane’s Sieve in the encyclopedic Elements of Evolutionary Genetics. The authors note that this phenomenon, though of vintage character as these things can be reckoned in a field as young as evolutionary genetics, is still a live one. The dominance of favored mutations in wild populations, or the recessive character of deleterious ones in laboratory stock, may reflect the different regimes which these two gene pools are subject to. The nature of things is such that it is easier to generate recessive mutations than dominant ones (i.e., loss is easier than gain), so the preponderance of dominant variants in wild stocks subject to positive selective pressure lends credence to the idea that evolutionary rather than developmental forces and constraints shape the genetic character of many species.

And yet things are not quite so tidy. Haldane’s Sieve, and the framework of dominant versus recessive alleles, operates differently in the area of sex chromosomes. In many lineages there is a ‘heterogametic sex’ which carries only one functional chromosome for most of the genome. In mammals this is the male (XY), while in birds this is the female (ZW). As males have only one functional copy of most genes on the sex chromosome, the masking effect of recessive expression does not apply to them in mammals. This may imply that, because of the exposure of many deleterious recessive variants to natural selection within the heterogametic sex, one would see different allelic distributions and genetic landscapes on these chromosomes (e.g., more rapid adaptation because of the exposure of nominally recessive alleles in the heterogametic sex, as well as more purifying selection on deleterious variants). But the reality is more complex, and the literature in this area is somewhat muddled. More precisely, it seems phylogenetically sensitive. Validation of the theory in mammals founders once one moves to Drosophila.
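A back-of-the-envelope sketch shows how much more of a recessive allele is “seen” on the X. The assumptions here are mine and deliberately crude: a fully recessive allele, the same allele frequency in both sexes, Hardy-Weinberg proportions in females, an equal sex ratio, and the mammalian XY arrangement. “Exposed” means the copy sits in an individual who expresses the trait.

```python
def exposed_fraction_autosomal(q):
    """Share of recessive allele copies that sit in homozygotes (autosome).

    Per individual the expected number of copies is 2*q, of which 2*q*q
    are carried by homozygotes, so the ratio simplifies to q.
    """
    return (2 * q * q) / (2 * q)


def exposed_fraction_x_linked(q):
    """Share of recessive allele copies exposed on the X (XY males, XX females).

    Counted per one male plus one female, i.e. three X chromosomes.
    """
    male_copies = q                   # one X per male, always expressed
    female_copies = 2 * q             # two X chromosomes per female
    female_exposed = 2 * q * q        # only copies in homozygous females
    return (male_copies + female_exposed) / (male_copies + female_copies)


q = 0.01  # hypothetical frequency of a rare recessive allele
print("autosomal exposure:", exposed_fraction_autosomal(q))
print("X-linked exposure: ", exposed_fraction_x_linked(q))
```

Under these assumptions the X-linked fraction is `(1 + 2*q) / 3`, so even a very rare recessive allele has about a third of its copies exposed, against a fraction of only `q` on an autosome.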

And that is why research in evolutionary genetics continues. The theory stimulates empirical exploration, and is tested against it. Much of the formal theory of classical evolutionary genetics, which crystallized in the years before World War II, is now gaining renewed relevance because of empirical testability in the era of big data and big computation. This is a domain where the past is not simply of interest to historians. Scientists themselves, chasing the next grant, and producing the expected stream of publications, may benefit from a little historical perspective by standing upon the shoulders of giants.

April 2, 2011

The Genetical Theory of Natural Selection, free!

Long time readers are aware that one of my favorite books is R. A. Fisher’s The Genetical Theory of Natural Selection. It’s a touch on the spendy side for a slim, though dense, book. But looking for stuff that’s public domain for my Kindle I noticed that you can get the 1930 version free. And it’s not just in Kindle formats, you can get it in PDF as well and read it on your computer. I think of TGTNS as somewhat like the Critique of Pure Reason of evolutionary genetics; even if your inclination is to rubbish it, it’s an important place to begin.

August 17, 2010

Genetics is One: Mendelism and quantitative traits


In the early 20th century there was a rather strange (in hindsight) debate between two groups of biological scientists attempting to understand the basis of inheritance and its relationship to evolutionary processes. The two factions were the biometricians and Mendelians. As indicated by their appellation the Mendelians were partisans of the model of inheritance formulated by Gregor Mendel. Like Mendel many of these individuals were experimentalists, with a rough & ready qualitative understanding of biological processes. William Bateson was arguably the model’s most vociferous promoter. Set against the Mendelians were more mathematically minded thinkers who viewed themselves as the true inheritors of the mantle of Charles Darwin. Though the grand old patron of the biometricians was Francis Galton, the greatest expositor of the school was Karl Pearson.* Pearson, along with the zoologist W. F. R. Weldon, defended Charles Darwin’s conception of evolution by natural selection during the darkest days of what Peter J. Bowler terms “The Eclipse of Darwinism”.** One aspect of Darwin’s theory as laid out in The Origin of Species was gradual change through the operation of natural selection upon extant genetic variation. There was a major problem with the model which Darwin proposed: he could offer no plausible engine with regard to mode of inheritance. Like many of his peers Charles Darwin implicitly assumed a blending model of inheritance, so that the offspring would be an analog constructed about the mean of the parental values. But as any old schoolboy knows the act of blending diminishes variation! This, along with other concerns, resulted in a general tendency in the late 19th century to accept the brilliance of the idea of evolution as descent with modification, but dismiss the motive engine which Charles Darwin proposed, gradual adaptation via natural selection upon heritable variation.

Mendel’s theory of inheritance rescued Darwinism from the problem of gradual diminution of natural selection’s raw material through the process of sexual reproduction. Yet due to personal and professional rivalries many did not see in Mendelism the salvation of evolutionary theory. Pearson and the biometricians scoffed at Bateson and company’s innumeracy. They also argued that the qualitative distinctions in trait value generated by Mendel’s model could not account for the wide range of continuous traits which were the bread & butter of biometrics, and therefore natural selection itself. Some of the Mendelians also engaged in their own flights of fancy, seeing in large effect mutations which they were generating in the laboratory an opening for the possibility of saltation, and rendering Darwinian gradualism absolutely moot.

There were great passions on both sides. The details are impeccably recounted in Will Provine’s The Origins of Theoretical Population Genetics. Early on in the great debates the statistician G. U. Yule showed how Mendelism could be reconciled with biometrics. But his arguments seem to have fallen on deaf ears. Over time the controversy abated as biometricians gave way to the Mendelians through a process of attrition. Weldon’s death in 1906 was arguably the clearest turning point, but it took a young mathematician to finish the game and fuse Mendelism and biometrics together and lay the seeds for a hybrid theoretical evolutionary genetics.

That young mathematician was R. A. Fisher. Fisher’s magnum opus is The Genetical Theory of Natural Selection, and his debates with the American physiologist and geneticist Sewall Wright laid the groundwork for much of evolutionary biology in the 20th century. Along with J. B. S. Haldane they formed the three-legged population genetic stool upon which the Modern Neo-Darwinian Synthesis would come to rest. Not only was R. A. Fisher a giant within the field of evolutionary biology, but he was also one of the founders of modern statistics. But those accomplishments were of the future; first he had to reconcile Mendelism with the evolutionary biology which came down from Charles Darwin. He did so with such finality that the last embers of the debate were finally doused, and the proponents of Mendelism no longer needed to be doubters of Darwin, and the devotees of Darwin no longer needed to see in the new genetics a threat to their own theory.

One of the major issues at work in the earlier controversies was one of methodological and cognitive incomprehension. William Bateson was a well known mathematical incompetent, and he could not follow the arguments of the biometricians because of their quantitative character. But no matter, he viewed it all as sophistry meant to obscure, not illuminate, and his knowledge of concrete variation in form and the patterns of inheritance suggested that Mendelism was correct. The coterie around Karl Pearson may have slowly been withering, but the powerful tools which the biometricians had pioneered were just waiting to be integrated into a Mendelian framework by the right person. By 1911 R. A. Fisher believed he had done so, though he did not write the paper until 1916, and it was published only in 1918. Titled The Correlation Between Relatives on the Supposition of Mendelian Inheritance, it was dense, and often cryptic in the details. But the title itself is a pointer as to its aim, correlation being a statistical concept pioneered by Francis Galton, and the supposition of Mendelian inheritance being the model he wished to reconcile with classical Darwinism in the biometric tradition. And in this project Fisher had a backer with an unimpeachable pedigree: a son of Charles Darwin himself, Leonard Darwin.

You can find this seminal paper online, at the R. A. Fisher digital archive. Here is the penultimate paragraph:

In general, the hypothesis of cumulative Mendelian factors seems to fit the facts very accurately. The only marked discrepancy from existing published work lies in the correlation for first cousins. Snow, owing apparently to an error, would make this as high as an avuncular correlation; in our opinion it should differ by little from that of the great-grandparent. The values found by Miss Elderton are certainly extremely high, but until we have a record of complete cousinships measured accurately and without selection, it will not be possible to obtain satisfactory numerical evidence on this question. As with cousins, so we may hope that more extensive measurements will gradually lead to values for the other relationship correlations with smaller standard errors. Especially would more accurate determinations of the fraternal correlation make our conclusions more exact.

I have to admit that at the best of times R. A. Fisher can be a difficult prose stylist to follow. One might wish to add from a contemporary vantage point that his language has a quaint and dated feel which compounds the confusion, but the historical record is clear that contemporaries had great difficulty in teasing apart distinct elements in his argument. Much of this was due to the mathematical aspect of his thinking; most biologists were simply not equipped to follow it (as late as the 1950s biologists at Oxford were dismissing Fisher’s work as that of a misguided mathematician according to W. D. Hamilton). In the text of this paper there are the classic jumps and mysterious connections between equations along the chain of derivation which characterize much of mathematics. The problem was particularly acute with Fisher because his thoughts were rather deep and fundamental, and he could hold a great deal of complexity in his mind. Finally, there are extensive tables and computations of correlations of pedigrees from that period drawn from biometric research which seem extraneous to us today, especially if you have Mathematica handy.

But the logic behind The Correlation Between Relatives on the Supposition of Mendelian Inheritance is rather simple: in the patterns of correlations between relatives, and the nature of variance in trait value across those relatives, one could perceive the nature of Mendelian inheritance. It was Mendelian inheritance which could explain most easily the patterns of variation across continuous traits as they were passed down from parent to offspring, and as they manifested across a pedigree. Early on in the paper Fisher observes that a measured correlation between father and son in stature is 0.5. From this one can explain 1/4 of the variance in height across the set of possible sons. This biological relationship is just a specific instance of the coefficient of determination: how much of the variance in a value, Y (sons’ heights), you can predict from the variance in X (fathers’ heights). Correcting for sex one can do the same for mothers and their sons (and inversely, fathers and their daughters).*** So combining the correlations of the parents to their offspring you can explain about half of the variance in the offspring height in this example (the correlation is higher in contemporary populations, probably because of much better nutrition in the lower orders). But you need not constrain yourself to parent-child correlations. Fisher shows that correlations across many sorts of relationships (e.g., grandparent-grandchild, sibling-sibling, uncle-niece/nephew) have predictive value, though the correlation will be a function of genetic distance.
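The arithmetic in this paragraph can be sketched in a few lines. The two-predictor formula below is the standard expression for the variance explained by a regression on two predictors; the names are mine, and `rho` (the correlation between the two parents, i.e. assortative mating) is an extra parameter the text does not discuss. The table lists the textbook expectations for a purely additive trait under random mating with no shared environment, which is an idealization and not Fisher’s data.

```python
def variance_explained(r):
    """Coefficient of determination for a single predictor."""
    return r * r


def variance_explained_two_predictors(r1, r2, rho):
    """Variance explained by two predictors jointly.

    r1, r2: correlation of each predictor with the outcome.
    rho: correlation between the two predictors (must not be +1 or -1).
    """
    return (r1 * r1 + r2 * r2 - 2 * r1 * r2 * rho) / (1 - rho * rho)


# Expected correlations for a purely additive trait (idealized assumption).
EXPECTED_ADDITIVE_CORRELATION = {
    "parent-offspring": 1 / 2,
    "full siblings": 1 / 2,
    "grandparent-grandchild": 1 / 4,
    "uncle/aunt-niece/nephew": 1 / 4,
    "first cousins": 1 / 8,
    "great-grandparent-great-grandchild": 1 / 8,
}

one_parent = variance_explained(0.5)
both_parents = variance_explained_two_predictors(0.5, 0.5, rho=0.0)

print("one parent: ", one_parent)
print("two parents:", both_parents)
for relationship, r in EXPECTED_ADDITIVE_CORRELATION.items():
    print(f"{relationship:36s} {r:.3f}")
```

With uncorrelated parents the two quarters simply add to one half. The table also lines up with Fisher’s remark in the quoted paragraph: first cousins are expected to sit with great-grandparents, not with uncles and aunts.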

What does correlation, a statistical value, have to do with Mendelism? Remember, Fisher argues that it is Mendelism which can explain in the details patterns of correlations on continuous traits. There were peculiarities in the data which biometricians explained with abstruse and ornate models which do not bear repeating, so implausible was the chain of conjectures. It turns out that Mendelism is not only the correct explanation for inheritance, but it is elegant and parsimonious when set next to the alternatives proposed which had equivalent explanatory power. A simple blending model could not explain the complexity of life’s variation, so more complex blending models emerged. But it turned out that a simple Mendelian model explained complexity just as well, and so the epicycles of the biometricians came crashing down. Mendelism was for evolutionary biology what the Copernican model was for planetary astronomy.

To a specific case where Mendelism is handy: in the data Fisher noted that the height of a sibling can explain 54% of the variance of height of other siblings, while the height of parents can explain only 40% of that of their offspring. Why the discrepancy? It is noted in the paper that the difference between identical twins is marginal, and other workers had suggested that the impact of environment could not explain the whole residual (what remains after the genetic component). Though later researchers observe that Fisher’s assumptions here were too strong (or at least the state of the data on human inheritance at the time misled him) the big picture is that siblings have a component of genetic correlation which they share with each other which they do not share with their parents, and that is the fraction accounted for by dominance. When dominance is included in the equation heritability is referred to as the “broad sense,” while when dominance is removed it is termed “narrow sense.”
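In modern textbook notation the point can be written compactly. The symbols are the conventional ones, not Fisher’s own: $V_A$ for additive variance, $V_D$ for dominance variance, $V_E$ for environmental variance and $V_P$ for total phenotypic variance. The sketch assumes random mating, no epistasis and no shared family environment.

```latex
\begin{align}
V_P &= V_A + V_D + V_E \\
\operatorname{Cov}(\text{parent}, \text{offspring}) &= \tfrac{1}{2} V_A \\
\operatorname{Cov}(\text{full siblings}) &= \tfrac{1}{2} V_A + \tfrac{1}{4} V_D \\
h^2 &= \frac{V_A}{V_P}, \qquad H^2 = \frac{V_A + V_D}{V_P}
\end{align}
```

The extra $\tfrac{1}{4} V_D$ term is the component siblings share with each other but not with their parents; $h^2$ is the narrow-sense and $H^2$ the broad-sense heritability.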

A concept such as dominance can of course be easily explained by Mendelism, at least formally (the physiological basis of dominance was later a point of contention between Fisher and Sewall Wright). Most of you have seen a Punnett square, whereby heterozygous parents will produce offspring in ratios where 50% are heterozygous, 25% one homozygote and 25% the other. But consider a scenario where one parent is a heterozygote, and the other a homozygote for the dominant trait. Both parents will express the same trait value, as will their offspring. But there will be a decoupling of the correlation between trait value and genotype here, as the offspring will be genotypically variant. Parent-offspring correlations along the regression line become distorted by a dominance parameter, and so correlations are reduced. In contrast, full siblings share the same dominance effects because they share the same parents and can potentially receive the same identical by descent alleles twice. Consider a rare recessively expressed allele, one for cystic fibrosis. As it is rare in a population, in almost all cases where the offspring are homozygotes for the disease causing allele, both parents will be heterozygotes. They will not express the disease because of its recessive character. But 25% of their offspring may because of the nature of Mendelian inheritance. So there’s a major possible disjunction between trait values from the parental to offspring cohorts. On the other hand, each sibling has a 25% chance of expressing the disease, and so the correlation is much higher than that with the parents (who do not express disease). In other words siblings can resemble each other much more than they may resemble either parent! This makes intuitive sense when you consider the inheritance constraints and features of Mendelism in diploid sexual species. But obviously a simple blending model can account for this. What it can not account for is the persistence of variation. It is through the segregation of independent Mendelian alleles, and their discrete and independent reassortment, that one can see how variation would not only persist from generation to generation, but manifest within families as alleles across loci shake out in different combinations. A simple model of inheritance can then explain two specific phenomena which are very different from each other.
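The recessive disease example can be worked through exactly by enumerating every mating type. This is a sketch under my own assumptions: a single locus, random mating, Hardy-Weinberg proportions among parents, full penetrance, and an allele frequency of 0.02 chosen purely for illustration (it is not an estimate for cystic fibrosis).

```python
from itertools import product


def genotype_freqs(q):
    """Hardy-Weinberg frequencies keyed by number of recessive alleles."""
    p = 1.0 - q
    return {0: p * p, 1: 2 * p * q, 2: q * q}


def prob_child_affected(mother, father):
    """P(child is a recessive homozygote) given parental genotypes.

    A parent carrying g recessive alleles transmits one with probability g/2.
    """
    return (mother / 2) * (father / 2)


def trait_correlations(q):
    """Parent-offspring and sibling-sibling correlations for the 0/1 trait."""
    freqs = genotype_freqs(q)
    prevalence = q * q
    variance = prevalence * (1 - prevalence)

    both_parent_child = 0.0  # P(mother affected and child affected)
    both_siblings = 0.0      # P(two siblings both affected)
    for mother, father in product(freqs, repeat=2):
        mating = freqs[mother] * freqs[father]
        child = prob_child_affected(mother, father)
        mother_affected = 1.0 if mother == 2 else 0.0
        both_parent_child += mating * mother_affected * child
        both_siblings += mating * child * child

    r_parent = (both_parent_child - prevalence ** 2) / variance
    r_sibling = (both_siblings - prevalence ** 2) / variance
    return r_parent, r_sibling


r_parent, r_sibling = trait_correlations(0.02)
print("parent-offspring correlation:", round(r_parent, 4))
print("sibling-sibling correlation: ", round(r_sibling, 4))
```

The enumeration agrees with the closed forms `q / (1 + q)` for parent and offspring and `(1 + 3*q) / (4 * (1 + q))` for full siblings, so for a rare recessive allele the sibling correlation stays near one quarter while the parent-offspring correlation shrinks toward zero.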

There is much in Fisher’s paper which prefigures later work, and much which is rooted in somewhat shaky pedigrees and biometric research of his day. The take home is that Fisher starts from an a priori Mendelian model, and shows how it could cascade down the chain of inferences and produce the continuous quantitative characteristics we see all around us. From the Hardy-Weinberg principle he drills down through the inexorable layers of logic to generate the formalisms which we associate with heritability, thick with variance terms. The Correlation Between Relatives on the Supposition of Mendelian Inheritance was a marriage between what was biometrics and Mendelism which eventually gave rise to population genetics, and forced the truce between the seeds of that domain and what became quantitative genetics.
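How discrete Mendelian factors add up to something that looks continuous can be illustrated with a toy model. It is emphatically not Fisher’s derivation: I assume `n_loci` independent biallelic loci with equal, purely additive effects, Hardy-Weinberg proportions, and no environmental noise, so the trait is just a count of “plus” alleles.

```python
from math import comb


def trait_distribution(n_loci, p=0.5):
    """P(individual carries k 'plus' alleles) for k = 0 .. 2*n_loci."""
    copies = 2 * n_loci
    return [comb(copies, k) * p ** k * (1 - p) ** (copies - k)
            for k in range(copies + 1)]


def additive_variance(n_loci, p=0.5, effect=1.0):
    """Trait variance when each 'plus' allele adds `effect` to the trait."""
    return 2 * n_loci * p * (1 - p) * effect ** 2


for n_loci in (1, 5, 50):
    dist = trait_distribution(n_loci)
    print(f"{n_loci:3d} loci: {len(dist):3d} phenotype classes, "
          f"most common class has frequency {max(dist):.3f}")
```

One locus gives the familiar three classes in a 1:2:1 ratio. As loci are added, the number of classes grows as `2 * n_loci + 1` and each class becomes rarer, so the histogram approaches the smooth bell curve the biometricians were measuring.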

As I said, the paper itself is dense, often opaque, and characterized by a prose style that lends itself to exegesis. But I find that it is often useful to see the deep logics behind evolution and genetics laid bare. Some of the issues which we grapple with today in the “post-genomic era” have their intellectual roots in this period, and in Fisher’s work, which showed that quantitative continuous traits and discrete Mendelian characters were one and the same. The “missing heritability” hinges on the fact that classical statistical techniques tell us that Mendelian inheritance is responsible for the variation of many traits, but modern statistical biology which has recourse to the latest sequencing technology has still not been able to crack that particular nut with satisfaction. Perhaps decades from now biologists will look at the “missing heritability” debate and laugh at the blindness of current researchers, when the answer was right under their noses. Alas, I suspect that we live in the age of Big Science, and a lone genius is unlikely to solve the riddle on his lonesome.

Citation: Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh

Suggested Reading: The Origins of Theoretical Population Genetics, R.A. Fisher: The Life of a Scientist, and The Genetical Theory of Natural Selection.

* Though I will spare you the details, it may be that the Galtonians were by and large more Galtonian than Galton himself! It seems that Francis Galton was partial to William Bateson’s Mendelian model.

** To be fair, I believe the phrase was originally coined by Julian Huxley.

*** Just use standard deviation units.

Image Credit: Wikimedia
