Snap, phenotype, genotype and fitness

One of the main criticisms of the population genetic pillar of the modern evolutionary synthesis was that too often it was a game of “beanbag genetics”. In other words, population geneticists treated genes as discrete, independent, individual elements within a static sea. R. A. Fisher and his acolytes believed that the average effect of fluctuations of genetic background canceled out as there was no systematic bias, and could be ignored in the analysis of long-term evolutionary change. Classical population genetics treated genetic variation as an abstract elementary algebra tracking the arc of a particular allele (or several alleles). So the whole system was constructed from a few spare atomic elements in a classic bottom-up fashion, clean inference by clean inference. Naturally this sort of abstraction did not sit well with many biologists, who were trained in the field or in the laboratory. By and large the conflict was between the theoretical evolutionists, such as R. A. Fisher and J. B. S. Haldane, and the experimental and observational biologists, such as Theodosius Dobzhansky and Ernst Mayr (see Sewall Wright and Evolutionary Biology for a record of the life and ideas of a man who arguably navigated between these two extremes in 20th century evolution because of his eclectic training). With the discovery that DNA was the specific substrate through which Mendelian genetics and evolutionary biology unfolded physically from generation to generation, a third set of players, the molecular biologists, entered the fray.

The details of genetics, the abstract models of theorists, the messy instrumentalism of the naturalists, and the physical focus of the molecular researchers, all matter. Through the conflicts between geneticists, some arising from genuine deep substantive disagreement, and some from different methodological foci, the discipline can enrich our understanding of biological phenomena in all their dimensions. Genomics, which canvasses the broad swaths of the substrate of inheritance, DNA, is obviously of particular fascination to me, but we can also still learn something from old-fashioned genetics which narrows in on a few genes and their particular dynamics.

A new paper in PLoS Biology, Cryptic Variation between Species and the Basis of Hybrid Performance, uses several different perspectives to explore the outcomes of crossing different species, in particular the impact on morphological and gene expression variation. You’ve likely heard of hybrid vigor, but too often in our society such terms are almost like black boxes which magically describe processes which are beyond our comprehension (hybrid vigor and inbreeding depression freely move between scientific and folk genetic domains). This paper takes a stab at peeling back the veil and gaining a more fundamental understanding of the phenomenon. First, the author summary:

A major conundrum in biology is why hybrids between species display two opposing features. On the one hand, hybrids are often more vigorous or productive than their parents, a phenomenon called hybrid vigor or hybrid superiority. On the other hand they often show reduced vigour and fertility, known as hybrid inferiority. Various theories have been proposed to account for these two aspects of hybrid performance, yet we still lack a coherent account of how these conflicting characteristics arise. To address this issue, we looked at the role that variation in gene expression between parental species may play. By measuring this variation and its effect on phenotype, we show that expression for specific genes may be free to vary during evolution within particular bounds. Although such variation may have little phenotypic effect when each locus is considered individually, the collective effect of variation across multiple genes may become highly significant. Using arguments from theoretical population genetics we show how these effects might lead to both hybrid superiority and inferiority, providing fresh insights into the age-old problem of hybrid performance.

There are various ways one presumes that hybrid vigor could emerge. On the one hand the parental lines may be a bit too inbred and therefore have a heavier than ideal load of deleterious alleles which express recessively. Since two lineages will likely have different deleterious alleles, crossing them will result in immediate complementation and masking of the deleterious alleles in the heterozygote state. Another model is that two different alleles when combined in the heterozygote state have a synergistic fitness effect. We generally know of heterozygote advantage in cases where there’s balancing selection, so that one of the homozygotes is actually far less fit than the other, but the fitness of the heterozygote is superior to both homozygotes. But that is not a necessity, and presumably there could be cases where both homozygotes are of equal fitness, but the heterozygote is of marginally greater fitness.
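To make the complementation story concrete, here is a minimal sketch. It is a toy model, not anything from the paper: the four loci, the allele labels, and the selection coefficient `s` are all hypothetical, and deleterious alleles are assumed to be fully recessive.

```python
# Toy model of hybrid vigor through complementation of recessive
# deleterious alleles. Locus count, allele labels and s are hypothetical.

def fitness(genotype, s=0.05):
    """Multiplicative fitness: each locus homozygous for the deleterious
    allele 'd' costs a factor (1 - s); heterozygotes are fully masked."""
    w = 1.0
    for allele_1, allele_2 in genotype:
        if allele_1 == "d" and allele_2 == "d":
            w *= 1.0 - s
    return w

def cross_inbred(line_1, line_2):
    """F1 of two fully homozygous lines: one allele from each parent per locus."""
    return [(g1[0], g2[0]) for g1, g2 in zip(line_1, line_2)]

# Two inbred lines fixed for deleterious recessives at different loci.
line_a = [("d", "d"), ("d", "d"), ("+", "+"), ("+", "+")]
line_b = [("+", "+"), ("+", "+"), ("d", "d"), ("d", "d")]
f1_hybrid = cross_inbred(line_a, line_b)

print(fitness(line_a), fitness(line_b), fitness(f1_hybrid))
```

Every locus in the F1 carries at least one functional allele, so the hybrid escapes the load that each parental line carries on its own.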

As for hybrid inferiority, a simple model for that is that lineages have co-adapted complexes of genes which are enmeshed in gene-gene networks. These networks are finely tuned by evolution and introduction of novel alleles from alien lineages may lead to destabilization of the sensitive web of interconnections. This model taken to an extreme is a scenario whereby speciation could occur if two lineages become mutually exclusive on a particular genetic complex which is “mission critical” to biological machinery (imagine that the gene involved in spermatogenesis is affected).

These stories are fine as far as they go, but they do have something of an excessively ad hoc aspect. A little light on formalization and heavy on exposition. In this paper the authors aim to fix that problem. To explore genetic interactions in hybrids, and how they affect gene expression, they selected the genus Antirrhinum as their model. These are also known as “snapdragons.” Like many plants Antirrhinum species can hybridize rather easily across species barriers. They observe the effect of taking genes from a set of species and placing them in the genetic background of another. In particular they are focusing on A. majus, hybridizing it with a variety of other Antirrhinum species, as well as introgressing alleles from the other species onto an A. majus genetic background (so an allele on a specific gene is placed within the genome of A. majus).

Just as they focus on a specific genus of organism, so they also focus on a specific set of genes and the molecular and developmental genetic phenomena associated with those genes. The genes are CYC and RAD, which are located near each other genomically, with CYC being a cis-acting regulator of RAD. In other words, CYC modulates the expression of RAD which is on the same chromosome. Variance in gene expression simply means a concrete difference in levels of protein product. Mutant variants of CYC and RAD, cyc and rad, are created by insertion of transposons. Insertion of transposons can abolish gene expression, resulting in removal or alteration of function. What is that function? I’m rather weak on botanical morphology, so I’m going to be cursory on this particular issue lest a reader correct me strenuously for misapplication of terminology. So I’ll show you a figure:

[Figure: flower phenotypes for the CYC/RAD genotypes, labeled A to I]

I added the labels. C is basically what majus should look like, while G is a totally “ventralized” mutant. B and F approach wild type, but the other outcomes are more mixed. Note the genotypes in the small print. Table 1 measures the expression levels of the gene product for the various genotypes:

[Table 1 of the paper: CYC and RAD expression levels by genotype]

Look at the first row; mutant variants of CYC which are nonfunctional reduce normal copies of RAD down to 20% levels of gene expression. That’s because CYC is a transcriptional regulator of RAD. The process is not reversed. RAD lacking functionality does not impact CYC (last row). Finally, the heterozygote state does result in reduced dosage of the gene product. Though the phenotypes might be closer to wild type than the mutant, the molecular expression of the gene is substantially changed. This is one of the issues which is always important to remember: the extent of dominance exhibited by a sequence of phenotypes consequent from a particular genotype may vary depending on which phenotype you are highlighting. On a molecular level there is incomplete dominance. Additive effects. On the level of exterior morphology there is more perceived dominance. This is not even addressing the issue of pleiotropy, where the same gene may have dominant and recessive expression on two different traits simultaneously in inverted directions (i.e., the recessively expressed allele in trait A may be dominant in B, and vice versa).

Figure 1 shows the different allelic expression levels in hybrids of Antirrhinum species. But what about the impact of the combinations on phenotype? I’ve reedited figure 4 so it fits better on this page:

[Figure 4 of the paper, re-edited: GEM spaces for CYC and RAD]

Here’s the text description for the figure:

GEM spaces for CYC and RAD, showing location of various genotypes and species.
(A) Dorsalisation index for each position in GEM space using values from Table 1. Standard errors for DI_cor and expression levels are shown (if error bars are not visible, they are smaller than the symbols). A smooth surface has been fitted to the data (see Materials and Methods for details of surface fitting). Note that the wild-type, C, lies on a plateau while the double heterozygote, E, is on the slope. (B) Top view of the GEM space, incorporating the relative expression values from the species taken from Figure 1 (circles). These values were adjusted assuming that A. majus (red circle) is at position (1, 1) in gene expression space. Triangles indicate expected gene activity values in the double heterozygote (CYC = x×0.6; RAD = y×0.5; see Table 1E). Some of the double heterozygotes are predicted to have DI values above or below the position of A. majus. Triangles pointing upwards indicate species showing notch phenotype. (C) Enlargement of rectangle in (B). bra, A. braun-blanquetii; cha, A. charidemi; lat, A. latifolium; lin, A. linkianum; maj, A. majus; meo, A. meonanthum; pul, A. pulverulentum; str, A. striatum; tor, A. tortuosum; cha-BC, introgression of A. charidemi into A. majus background.

GEM stands for gene expression–morphology space. As I noted above, the mapping between the manifestation of genetic variation on the molecular level and on the gross morphological level may be subtle. Figure 4 has the expression levels of the two genes under consideration forming a plane through the x- and z-axes, while gross morphology is illustrated on the y-axis. What’s on the y-axis is actually a principal component which serves as an abstract representation of the morphological variation of the petal structure illustrated in the earlier figure. They call it the “dorsalization index” (D_i). The wild type = 1 and the fully ventralized mutant = 0. So the interval 0 to 1 in phenotype space is a good gauge as to the deviation of the morphology from wild type.

The letters in panel A are representations of the letters in the first and second figures within this post. G represents the double homozygote mutant. It stands to reason that its D_i is ~ 0. C, B, F, and to some extent E, form a “plateau” where gene expression may vary a fair amount but the morphology remains relatively stable. A, D, H and I represent intermediate cases on the “slope” where changes in genetic architecture produce large shifts in phenotype. The idea of dominance and recessiveness already indicates that not all genetic variation is created equal, and that there are non-linearities in the interaction of genetic variation and phenetic variation. Here using D_i and quantitative levels of gene expression one can take the verbal/qualitative insight and translate it into a quantitative relation.
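The plateau-and-slope idea, and the way dominance depends on which level you measure, can be illustrated with a one-dimensional toy. To be clear about assumptions: the saturating curve below, its constant `k`, and the expression values are all hypothetical stand-ins, not the surface the authors fitted, and the real GEM space has two expression axes rather than one.

```python
import math

def dorsalization(expression, k=0.25):
    """Hypothetical saturating map from relative expression (wild type = 1)
    to a morphology index, scaled so that 0 maps to 0 and 1 maps to 1."""
    return (1 - math.exp(-expression / k)) / (1 - math.exp(-1 / k))

def dominance(mutant, het, wild_type):
    """Degree of dominance of the wild-type allele:
    0.5 is purely additive, 1.0 is fully dominant."""
    return (het - mutant) / (wild_type - mutant)

# Hypothetical expression levels: the heterozygote makes half the product.
expr = {"mutant": 0.0, "het": 0.5, "wild_type": 1.0}
morph = {name: dorsalization(level) for name, level in expr.items()}

print("dominance at the expression level:", dominance(**expr))
print("dominance at the morphology level:", round(dominance(**morph), 3))
```

The heterozygote is exactly intermediate in expression, yet sits near the wild type in morphology because it is already on the flat part of the curve.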

Panel B seems to be similar to L. L. Cavalli-Sforza’s synthetic maps of PC variation of gene frequencies. It’s taking the y-axis in A and transforming it into the clinal grade on the plane. The circles in panel B represent conventional hybrids between A. majus and other species within its genus. There is variation in gene expression levels within these hybrids, but note that they reside on the phenotypic plateau. In contrast, the triangles show double heterozygotes: (CYC RAD)/(cyc rad). The heterozygote combinations are for a variety of species, as indicated in the figure text. Note that they explore more of the phenotype space, as evident in panel C, which is just a zoom of the rectangle in panel B.
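The placement of the triangles follows the scaling given in the figure text (CYC = x×0.6; RAD = y×0.5). A small sketch of that arithmetic is below; the two scaling factors come from the caption, while the species entries other than A. majus at (1, 1) are made-up values for illustration only.

```python
# Scaling factors quoted in the figure text for the double heterozygote.
CYC_HET_FACTOR = 0.6
RAD_HET_FACTOR = 0.5

def double_het_position(x, y):
    """Expected (CYC, RAD) activity of the double heterozygote, given a
    species' relative expression (x, y) with A. majus set to (1, 1)."""
    return (x * CYC_HET_FACTOR, y * RAD_HET_FACTOR)

# 'maj' is fixed at (1, 1) by construction; the other two are hypothetical.
relative_expression = {
    "maj": (1.0, 1.0),
    "hypothetical_high": (1.3, 1.2),
    "hypothetical_low": (0.8, 0.7),
}

for name, (x, y) in relative_expression.items():
    print(name, double_het_position(x, y))
```

A species whose alleles are expressed more strongly than the majus alleles lands further up the slope in the double heterozygote, which is why the triangles spread out in the phenotype dimension even though the circles do not.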

So far they’ve shown that homozygote mutants abolish the wild type morphology, while heterozygotes of various combinations move over phenotype space. RAD’s expression is contingent on CYC, so that can explain some of the unpredictability of the variation when viewed in light of a simple qualitative model. Additionally, wild type hybrids move in the gene expression dimension, but not in the phenotype space. So next they looked at the impact of a particular species’ CYC and RAD genes against the majus genetic background in the doubly heterozygous state. In other words we’re not talking about a hybrid where half of the total genome content is from each parental species. We’re talking about introgression of alleles at a specific locus from species A to species B, so that the total genome content is that of species B, except at a particular locus or set of loci, where the alleles are from A. Figure 5 shows the results of such a cross:

[Figure 5 of the paper]

The results of these studies show that alleles from A. charidemi are much more efficacious in maintaining wild type phenotype in the heterozygote state than those from A. majus. This is because of underlying gene expression differences across species. Observe that CYC^char is particularly relevant because of the dependence of the RAD locus upon CYC in terms of gene regulation. The presence of charidemi and majus derived alleles on the same chromosome, so that cis-acting dynamics were operative, was achieved through recombination. A further exploration of the expression of each allele individually confirmed that CYC^char had a 30% higher expression than CYC^maj.

OK, so at this point we’ve examined the general topology of GEM. The relation between morphology and gene expression, the nature of the landscapes which describe their relationship. Next, we’ll move to GEF, or gene expression–fitness, spaces. Genes/gene expression, phenotypes, and fitness, are the three-legged stool of evolution, and specifically natural selection and adaptation. In a proximate sense the relationship between genes and phenotypes is physically mediated by a sequence of developmental pathways over life history. In an ultimate, evolutionary, sense, the relationship between genes and phenotypes is mediated by fitness, with variation in phenotypes over time being driven by variation in genotypes via the engine of fitness differentials. The distinction between evolutionary and non-evolutionary genetics, the abstract/theoretical and concrete/empirical, crops up with something like epistasis. On the one hand epistasis refers to physical relationships between genes. On the other hand it can also describe the variation in trait value which emerges from the interlocus interactions. And finally, it can refer to non-linear fitness effects due to combinations of alleles across loci.
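The last sense of epistasis, non-linear fitness effects across loci, has a simple two-locus form: the deviation of the double-substitution genotype from what the two single substitutions would predict additively. The sketch below uses that standard measure; the fitness values fed into it are hypothetical.

```python
def epistasis(w00, w10, w01, w11):
    """Deviation from additivity for two loci.
    w00: neither substitution, w10 / w01: one substitution each,
    w11: both substitutions. Zero means the loci combine additively."""
    return w11 - w10 - w01 + w00

# Hypothetical fitness values.
additive = epistasis(1.0, 1.1, 1.2, 1.3)        # effects simply add up
incompatible = epistasis(1.0, 1.0, 1.0, 0.5)    # fine alone, bad together

print(round(additive, 6), incompatible)
```

The second case is the shape of a hybrid incompatibility: each allele is harmless on its own background, and the fitness cost only appears when the two meet.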

In this case we’ve already seen how variation on the molecular level of gene expression due to genetic differences at two loci does not always translate into variation in morphology. The plateau in GEM space is simply due to the invariance in the morphological dimension. Once variance shows up you see the plane tilt and become steep. GEF space is exactly analogous, except that we are looking at variation in fitness on the y-axis. This is the domain of evolution, the ultimate. This section has only one figure:

[Figure: GEF spaces (journal.pbio.1000429.g007)]

GEM was based on concrete observation and experiment. GEF space is more theoretical, insofar as from what I can tell they didn’t measure fitness in actual lineages, but rather hypothesized distributions of fitness from parameters which might give us insight into hybrid vigor and/or breakdown. Red is obviously increased fitness and blue decreased. The surface of the landscape is simply where the gene expression values intersect with realized fitness. There are several alternative topologies here. I’ll quote the figure text:

Gene expression levels for two genes are plotted along the horizontal plane while fitness is along the vertical axis. (A) Radially symmetrical peak. (B) 2-D Projection of (A) showing location of effectively neutral zone and position of two parental genotypes (P1, P2 triangles), the resulting F1 (square) and additional genotypes observed in the F2 (diamonds). The F1 in this case is nearer to the centre of the peak while the F2s have similar fitness to the parents. (C) Diagonal ridge. (D) 2-D projection of diagonal ridge showing tilted elliptical neutral zone. The F1 is nearer to the peak than the parents but some F2 genotypes may now have lower fitness and fall outside the neutral zone. (E) Curved ridge. (F) 2-D projection of curved ridge showing banana-shaped neutral zone. Some F1 genotypes may have lower fitness and fall outside the neutral zone.
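The three topologies in the figure text can be sketched with simple functions to see why the shape of the neutral zone matters for F1 and F2. Everything here is an illustrative assumption rather than the authors' model: the Gaussian-style surfaces, their widths, the parental positions, and the rule that the F1 sits at the parental midpoint in expression space.

```python
import math

def radial_peak(x, y, s=0.5):
    """Radially symmetric peak centred on (1, 1)."""
    return math.exp(-((x - 1) ** 2 + (y - 1) ** 2) / (2 * s ** 2))

def diagonal_ridge(x, y, s_long=1.0, s_short=0.2):
    """Ridge along the line x + y = 2: broad along it, narrow across it."""
    along = ((x - 1) - (y - 1)) / math.sqrt(2)
    across = ((x - 1) + (y - 1)) / math.sqrt(2)
    return math.exp(-(along ** 2 / (2 * s_long ** 2)
                      + across ** 2 / (2 * s_short ** 2)))

def curved_ridge(x, y, s=0.2):
    """Banana-shaped ridge following the curve x * y = 1."""
    return math.exp(-((x * y - 1) ** 2) / (2 * s ** 2))

def f1(p1, p2):
    """F1 expression, assuming each gene's level is the parental midpoint."""
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

def f2_recombinants(p1, p2):
    """Two F2 genotypes: gene 1 level from one parent, gene 2 from the other."""
    return [(p1[0], p2[1]), (p2[0], p1[1])]

# Hypothetical parents that have drifted apart along the diagonal ridge.
p1, p2 = (0.7, 1.3), (1.3, 0.7)
hybrid = f1(p1, p2)
for name, surface in [("radial", radial_peak), ("diagonal", diagonal_ridge)]:
    f2_fitness = [round(surface(*g), 3) for g in f2_recombinants(p1, p2)]
    print(name, round(surface(*p1), 3), round(surface(*hybrid), 3), f2_fitness)

# Hypothetical parents at opposite ends of the banana-shaped zone.
q1, q2 = (0.5, 2.0), (2.0, 0.5)
print("curved", round(curved_ridge(*q1), 3), round(curved_ridge(*f1(q1, q2)), 3))
```

On the radial peak the recombinant F2 genotypes are as far from the centre as the parents were; on the diagonal ridge they fall off the side; on the curved ridge even the F1 midpoint lands outside the zone.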

First, one has to introduce the concept of ‘drift load.’ A population of a particular genotype has an expected fitness, while in the best of all worlds there is an idealized fitness peak. Random genetic drift will drive the population away from the peak because sampling variance shifts the gene frequencies from generation to generation. The power of drift to alter gene frequencies is inversely proportional to effective population size, N_e, roughly the number of individuals contributing genes to the next generation (often a rule of thumb is 1/3 of the census size, though this probably applies for human-scale organisms; usually it is much smaller than census). The drift load is the drag on fitness induced by drift, and is approximately 1/(4N_e) per locus. In other words, as N_e → ∞ the drift load disappears because sampling variance is eliminated. But this load is applicable at each locus, so if you sum up across many genes then small increments can produce a non-trivial fitness decrement simply due to the vicissitudes of generation-to-generation variance.
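As a quick back-of-the-envelope check on how a tiny per-locus load adds up, here is the arithmetic in code. The 1/(4N_e) approximation is the one given above; the N_e values and the number of loci are hypothetical, and simply summing across loci assumes the per-locus effects are small and independent.

```python
def drift_load_per_locus(n_e):
    """Approximate drift load at a single locus: 1 / (4 * N_e)."""
    return 1.0 / (4.0 * n_e)

def total_drift_load(n_e, n_loci):
    """Sum of per-locus loads, assuming small independent effects."""
    return n_loci * drift_load_per_locus(n_e)

# Hypothetical effective population sizes, 1,000 loci each.
for n_e in (100, 1_000, 10_000, 1_000_000):
    print(n_e, drift_load_per_locus(n_e), total_drift_load(n_e, 1_000))
```

The per-locus number looks negligible at any realistic N_e, and the total only becomes interesting because it is multiplied by the number of loci under this kind of selection.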

In the figure above the dark red zone is neutral. That means there’s no fitness variance. That’s the “fitness plateau” equivalent to the phenotypic plateau observed above. P1 and P2 are the parental genotypes, from different lineages. F1 are hybrids, while F2 are crosses of the hybrid generation. The deviation of P1 and P2 in all the panels from the center of the fitness plateau is an indication of drift load. The shape and nature of the fitness plateau are critical in determining the outcomes for the F1 and F2 generations, and consequent vigor or breakdown. Geometrically you see the rationale for hybrid vigor in the radially symmetric case (panels A and B), as the F1s are closer to the center of the fitness plateau because drift load is dampened by the cross. In the text the authors note that ‘the variance around the optimum of the mean of two independent populations is half that of either one, and so the “drift load” is half as great.’ So instead of ~1/(4N_e), you have ~1/(8N_e) per locus. This is a gain in fitness which can be substantial over many loci. Over 1,000 genes it would be a gain of 0.125 (a figure which corresponds to an N_e of about 1,000, since 1,000 × 1/(8N_e) = 0.125), which is very large, and can explain heterosis. But as many farmers know the F2 generation often exhibits a regress in fitness. “Hybrids do not breed true.” In a Mendelian model some of the offspring of the hybrids will segregate the alleles so that homozygotes will reappear. With the radially symmetric peak the F2 have about the same fitness as the parental cases. With the diagonal ridge (panels C and D) this is not the case; new genotypic combinations presumably are produced which lie outside of the fitness plateau, and this leads to a major hybrid breakdown. With the curved ridge (panels E and F) the F1 are slightly below the parental populations in fitness, while the F2 are far below them.
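The halving argument quoted from the paper is easy to check numerically. The sketch below is a toy simulation under stated assumptions: each parental lineage's deviation from the optimum is drawn independently from the same normal distribution, the hybrid sits at the midpoint of its parents, and load is taken to be proportional to the squared deviation. The value of `sigma`, the sample size, and the seed are arbitrary.

```python
import random

def mean_squared_deviation(values):
    """Average squared distance from the optimum (placed at 0)."""
    return sum(v * v for v in values) / len(values)

def simulate(n_pairs=100_000, sigma=0.1, seed=1):
    """Draw pairs of independent parental deviations and form their midpoint."""
    rng = random.Random(seed)
    parents, hybrids = [], []
    for _ in range(n_pairs):
        p1 = rng.gauss(0.0, sigma)
        p2 = rng.gauss(0.0, sigma)
        parents.append(p1)
        hybrids.append((p1 + p2) / 2.0)
    return mean_squared_deviation(parents), mean_squared_deviation(hybrids)

parent_load, hybrid_load = simulate()
ratio = hybrid_load / parent_load
print("hybrid load relative to parental load:", round(ratio, 2))
```

The ratio comes out close to one half, which is just the statement that the mean of two independent draws has half the variance of either draw.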

In the discussion they then work back from the theoretical digression to its relevance to observed variation, and their particular model taxon:

The phenotype and fitness of species hybrids will reflect the extent to which these various GEF scenarios apply to the many thousands of genes in the genome. Radial or elliptical neutral domains, centred around a common position in GEF space, would be expected for loci that are under similar normalising selection in multiple environments. This situation likely applies to the CYC and RAD genes as all species in the Antirrhinum group have similar asymmetric closed flowers. It would also be expected for many loci controlling basic physiology and growth. F1 hybrids would therefore be expected to show higher fitness and increased performance with respect to these traits. This provides an explanation for hybrid vigour that avoids the pitfalls of previous models that require fixation of loci with major deleterious effects or that invoke special mechanisms for heterozygote advantage. A similar explanation has been proposed to account for the origin of hybrid vigour between domesticated inbred lines…Hybrid vigour is usually lost in F2s or recombinant inbred lines, indicating that many of the loci involved interact to give tilted rather than untilted neutral zones.
Although hybrid vigour is commonly observed for physiological traits, the overall fitness of species hybrids is often lower than that of the parents, with sterility or other dysgenic effects being observed. This observation may partly reflect adaptation to different environments and thus shifts in the shape of fitness surfaces that drive changes in genotype. However, it may also reflect loci that interact to give curved or L-shaped neutral zones…Such zones will be prevalent for traits that involve more complicated epistatic interactions, perhaps accounting for the dysgenic effects observed in F1s. The negative contribution of loci with curved neutral zones is likely to increase with time, as loci drift towards the extremities of the banana-shaped neutral domains.

Remembering that there is a possible association between cis-elements and physiological traits, it is interesting to observe that one may be able to infer fitness landscapes from patterns of morphological and genetic variation. I don’t know how robust the generalizations above are, and obviously this particular paper is more about setting up a testable framework than validating that framework, but we’ve come a long way from “beanbag genetics.”

Citation: Rosas U, Barton NH, Copsey L, Barbier de Reuille P, & Coen E (2010). Cryptic Variation between Species and the Basis of Hybrid Performance. PLoS biology, 8 (7) PMID: 20652019

Image Credit: Wikimedia