Razib Khan One-stop-shopping for all of my content

October 21, 2012

Buddy can you spare a selective sweep

The Pith: Natural selection comes in different flavors in its genetic constituents. Some of those constituents are more elusive than others. That makes “reading the label” a non-trivial activity.

As you may know, when you look at patterns of variation in the genome of a given organism you can make various inferences from the nature of these patterns. But the power of those inferences is conditional on the details of the real demographic and evolutionary histories, as well as the assumptions made about the models which one is testing. When delving into the domain of population genomics some of the concepts and models may seem abstruse, but the reality is that such details are the stuff of which evolution is built. A new paper in PLoS Genetics may seem excessively esoteric and theoretical, but it speaks to very important processes which shape the evolutionary trajectory of a given population. The paper is titled Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. Here’s the author summary:

Considerable effort has been devoted to detecting genes that are under natural selection, and hundreds of such genes have been identified in previous studies. Here, we present a method for extending ...

March 9, 2012

Natural selection and dopamine receptor genes

Filed under: Genetics,Genomics,Selection — Razib Khan @ 6:50 am

Long time readers will be familiar with the large literature in behavior genetics/genomics and dopamine receptor genes. So with that, I point you to a paper exploring the patterns of variation and their relationship to possible natural selection, No Evidence for Strong Recent Positive Selection Favoring the 7 Repeat Allele of VNTR in the DRD4 Gene:

The human dopamine receptor D4 (DRD4) gene contains a 48-bp variable number of tandem repeat (VNTR) in exon 3, encoding the third intracellular loop of this dopamine receptor. The DRD4 7R allele, which seems to have a single origin, is commonly observed in various human populations and the nucleotide diversity of the DRD4 7R haplotype at the DRD4 locus is reduced compared to the most common DRD4 4R haplotype. Based on these observations, previous studies have hypothesized that positive selection has acted on the DRD4 7R allele. However, the degrees of linkage disequilibrium (LD) of the DRD4 7R allele with single nucleotide polymorphisms (SNPs) outside the DRD4 locus have not been evaluated. In this study, to re-examine the possibility of recent positive selection favoring the DRD4 7R allele, we genotyped HapMap subjects for DRD4 VNTR, and conducted several neutrality tests including long range haplotype test and iHS test based on the extended haplotype homozygosity. Our results indicated that LD of the DRD4 7R allele was not extended compared to SNP alleles with the similar frequency. Thus, we conclude that the DRD4 7R allele has not been subjected to strong recent positive selection.

In that vein, I also stumbled upon this paper recently, Contrasting signals of positive selection in genes involved in human skin-color variation from tests based on SNP scans and resequencing:

Applying all commonly used neutrality-test statistics for allele frequency distribution to the newly generated sequence data provided conflicting results regarding evidence for positive selection. Previous haplotype-based findings could not be clearly confirmed. Although some tests were marginally significant for some populations and genes, none of them were significant after multiple-testing correction. Combined P values for each gene-population pair did not improve these results. Application of Approximate Bayesian Computation Markov chain Monte Carlo based to these sequence data using a simple forward simulator revealed broad posterior distributions of the selective parameters for all four genes, providing no support for positive selection. However, when we applied this approach to published sequence data on SLC45A2, another human pigmentation candidate gene, we could readily confirm evidence for positive selection, as previously detected with sequence-based and some haplotype-based tests.

Please note that they didn’t check for selection at SLC24A5. This probably would have yielded some evidence of selection.

Both papers are open access, so I invite readers to take a look for themselves.

June 7, 2011

Why rice is so nice

The Pith: What makes rice nice in one varietal may not make it nice in another. Genetically that is….

Rice is edible and has high yields thanks to evolution. Specifically, the artificial selection processes which led to domestication. The “genetically modified organisms” of yore! The details of this process have long been of interest to agricultural scientists because of possible implications for the production of the major crop which feeds the world. And just as many of Charles Darwin’s original insights derived from his detailed knowledge of breeding of domesticates in Victorian England, so evolutionary biologists can learn something about the general process through the repeated instantiations which occurred during domestication in the Neolithic era.

A new paper in PLoS ONE puts the spotlight on the domestication of rice, and specifically the connection between particular traits which are the hallmark of domestication and regions of the genome on chromosome 3. These are obviously two different domains, the study and analysis of the variety of traits across rice strains, and the patterns in the genome of an organism. But they are nicely spanned by classical genetic techniques such as linkage mapping which ...

April 24, 2011

The evolutionary effect of the sky gods

Last week I reviewed ideas about the effect of “exogenous shocks” to an ecosystem of creatures, and how it might reshape their evolutionary trajectory. These sorts of issues are well known in their generality. They have implications from the broadest macroscale systematics to microevolutionary process. The shocks point to changes over time which have a general effect, but what about exogenous parameters which shift spatially and regularly? I’m talking latitudes here. The further you get from the equator the more the climate varies over the season, and the lower the mean temperature, and, the less the aggregate radiation the biosphere catches. Allen’s rule and Bergmann’s rule are two observational trends which biologists have long observed in relation to many organisms. The equatorial variants are slimmer in their physique, while the polar ones are stockier. Additionally, there tends to be an increase in mean mass as one moves away from the equator.

But these rules are just general observations. What process underlies these observations? The likely culprit would be natural selection of course. But the specific manner in which this process shakes out, on both the organismic and genetic level, still needs to be elucidated ...

October 11, 2010

Natural selection in our time

Filed under: Adaptation,Biology,Environment,Evolution,Genetics,Selection — Razib Khan @ 12:35 am

Last month in Nature Reviews Genetics there was a paper, Measuring selection in contemporary human populations, which reviewed data from various surveys in an attempt to adduce the current trajectory of human evolution. The review didn’t find anything revolutionary, but it was interesting to see where we’re at. If you read this weblog you probably accept a priori that it’s highly unlikely that evolution “has stopped” because infant mortality has declined sharply across developed, and developing, nations. Evolution understood as change in gene frequencies will continue because there will be sample variance in the proportions of given alleles from generation to generation. But more interestingly adaptive evolution driven by change in mean values of heritable phenotypes through natural selection will also continue, assuming:

1) There is variance in reproductive fitness

2) That that variance is correlated with a phenotype

3) That those phenotypes are at all heritable. In other words, phenotypic variation tracks genotypic variation
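The three conditions above map onto the standard breeder's equation from quantitative genetics, R = h² × S, where S (the selection differential) captures conditions 1 and 2 and h² (the heritability) captures condition 3. The review itself is not quoted as using this equation, so treat the following as a minimal sketch with made-up numbers; the function name and all values are hypothetical.

```python
def response_to_selection(heritability, selection_differential):
    """Breeder's equation: expected per-generation shift R = h^2 * S."""
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must be between 0 and 1")
    return heritability * selection_differential


# Hypothetical values, in trait units; none of these come from the review.
scenarios = {
    "fitness variance not tied to the trait (S = 0)": (0.5, 0.0),
    "trait not heritable (h^2 = 0)": (0.0, 0.2),
    "all three conditions met": (0.5, 0.2),
}

for label, (h2, s) in scenarios.items():
    shift = response_to_selection(h2, s)
    print(f"{label}: expected shift per generation = {shift:.2f}")
```

If any one of the three conditions fails, the expected shift is zero, which is why the heritability question matters so much.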

Obviously there is variance in reproductive fitness. Additionally, most people have the intuition that particular traits are correlated with fecundity, whether it be social-cultural identities, or personality characteristics. The main issue is probably #3. It is a robust finding for example that in developed societies the religious tend to have more children than the irreligious. If there is an innate predisposition to religiosity, and there is some research which suggests modest heritability, then all things being equal the population would presumably be shifting toward greater innate predisposition toward religion as time passes. I do believe religiosity is heritable to some extent. More precisely I think there are particular psychological traits which make supernatural claims more plausible for some than others, and, those traits themselves are partially determined by biology. But obviously even if we think that religious inclination is partially heritable in a biological sense, it is also heritable in the familial sense of values passed from one generation to the next, and in a broader cultural context of norms imposed from on high. In other words, when it comes to these sorts of phenotypic analyses we shouldn’t get too carried away with clean genetic logics. In Shall the Religious Inherit the Earth? Eric Kaufmann notes that it is in the most secular nations that the fertility gap between the religious and irreligious is greatest, and therefore selection for religiosity would be strongest in nations such as Sweden, not Saudi Arabia. But as a practical matter biologically driven shifts in trait value in this case pale in comparison to the effect of strong cultural norms for religiosity.

Below are two of the topline tables which show the traits which are currently subject to natural selection. A + sign indicates that there is natural selection for higher values of the trait, and a – sign the inverse.  An s indicates stabilizing selection, which tells you that median values have higher fitnesses than the extremes. The number of stars is proportional to statistical significance.



Some of this is not surprising. The age of the onset of menarche has been dropping in much of the world. I suspect this is mostly due to better nutrition, but a consequence of this shift is earlier fertility for some females. The authors are nervous about the robust correlation of higher fertility with lower intelligence, but notice that the pattern for wealth and income is different and more complicated. The key is to look at education.  Whether you believe intelligence exists or not in any substantive concrete sense, those who are more intelligent are more likely to have had more education, and there’s a rather common sense reason why investing in more schooling would reduce your fertility: you simply forgo some of your peak reproductive years, especially if you’re female. The higher you go up the educational ladder the stronger the anti-natalist cultural and practical pressures become (the latter is a heavier burden for females because of their biological centrality in child-bearing, but both males and females are subject to the former). As with religion even if the differences have no biological implication because you believe the correlations are spurious or reject the existence of the trait one presumes that parents and subcultures pass on values to offspring. If higher education has anti-natalist correlations we shouldn’t be surprised if subsequent generations turn away from higher education. Their parents were the ones who were more likely to avoid it.

We live in interesting times.

August 12, 2010

Hybridization is like sex

One of the major issues which has loomed at the heart of biology since The Origin of Species is why species exist, as well as how species come about. Why isn’t there a perfect replicator which performs all the conversion of energy and matter into biomass on this planet? If there is a God the tree of life almost seems to be a testament to his riotous aesthetic sense, with numerous branches which lead to convergences, and an inordinate fascination with variants on the basic morph of beetles. From the outside the outcomes of evolutionary biology look like a patent mess, a sprawling expanse of experiments and misfires.

A similar issue has vexed biologists in relation to sex. Why is it that the vast majority of complex organisms take upon themselves the costs of sex? The existence of a non-offspring bearing form within a species reduces the potential natural increase by a factor of two before the game has even begun. Not only that, but the existence of two sexes who must seek each other out expends crucial energy in a Malthusian world (selfing hermaphrodites obviously don’t have this problem, but for highly complex organisms they aren’t so common). Why bother? (I mean in an ultimate, not proximate, sense)

It seems likely that part of the answer to both these questions on the grand scale is that the perfect is the enemy of long term survival. Sexual reproduction confers upon a lineage a genetic variability which may reduce fitness by shifting populations away from the adaptive peak in the short term, but the fitness landscape itself is a constant bubbling flux, and perfectly engineered asexual lineages may all too often fall off the cliff of what was once their mountain top. The only inevitability seems to be that the times change. Similarly, the natural history of life on earth tells us that all greatness comes to an end, and extinction is the lot of life. The universe is an unpredictable place and the mighty invariably fall, as the branches of life’s tree are always pruned by the gardeners red in tooth and claw.

But it is one thing to describe reality in broad verbal brushes. How about a more rigorous empirical and theoretical understanding of how organisms and the genetic material through which they gain immortality play out in the universe? A new paper which uses plant models explores the costs and benefits of admixture between lineages, and how those two dynamics operate in a heterogeneous and homogeneous world. Population admixture, biological invasions and the balance between local adaptation and inbreeding depression:

When previously isolated populations meet and mix, the resulting admixed population can benefit from several genetic advantages, including increased genetic variation, the creation of novel genotypes and the masking of deleterious mutations. These admixture benefits are thought to play an important role in biological invasions. In contrast, populations in their native range often remain differentiated and frequently suffer from inbreeding depression owing to isolation. While the advantages of admixture are evident for introduced populations that experienced recent bottlenecks or that face novel selection pressures, it is less obvious why native range populations do not similarly benefit from admixture. Here we argue that a temporary loss of local adaptation in recent invaders fundamentally alters the fitness consequences of admixture. In native populations, selection against dilution of the locally adapted gene pool inhibits unconstrained admixture and reinforces population isolation, with some level of inbreeding depression as an expected consequence. We show that admixture is selected against despite significant inbreeding depression because the benefits of local adaptation are greater than the cost of inbreeding. In contrast, introduced populations that have not yet established a pattern of local adaptation can freely reap the benefits of admixture. There can be strong selection for admixture because it instantly lifts the inbreeding depression that had built up in isolated parental populations. Recent work in Silene suggests that reduced inbreeding depression associated with post-introduction admixture may contribute to enhanced fitness of invasive populations. We hypothesize that in locally adapted populations, the benefits of local adaptation are balanced against an inbreeding cost that could develop in part owing to the isolating effect of local adaptation itself. The inbreeding cost can be revealed in admixing populations during recent invasions.

First, plants are good models to explore evolutionary genetics. They’re not as constrained as say mammals, or the typical tetrapod, when it comes to barriers to gene flow between distinct taxa. Hybridization is common, and plants can also self-fertilize as well as cross-fertilize, allowing researchers to push the genetic pool in different directions (“selfing” obviously reduces the effective population and is an extreme form of inbreeding, so it’s a good way to purge genetic variation really quickly). In a perfect abstract world of evolution one might imagine Richard Dawkins’ vehicles and replicators as fluid entities which float along a turbid sea of evolutionary genetic parameters, drift, migration, mutation and selection. But reality is constrained to a DNA substrate, which has its own parameters such as recombination, modulators such as epigenetics, and numerous ways to express variation through gene regulation. It’s complicated, and stripping the issues down to their pith is easier said than done.

But the broader dynamic being examined here is the generalist-specialist trade-off, which I think is relevant to the two issues I introduced earlier in this post. Specialists are optimized for their own position in the adaptive landscape, but have difficulties when it is perturbed. Generalists always have less than maximum fitness in any given landscape, but higher average fitness across landscapes because they can adapt to changes. Specialization is local adaptation of particular lineages, while in the generalist case you can have invasive species in novel environments. They’re obviously facing an adaptive landscape which is at some remove from what any of the introduced genotypes were “optimized” for, so hybridization produces something new for something new.

In the first figure of the paper you see F3 wild barley descended from two parental lineages, ME and AQ. The left panels show seed output as a function of heterozygosity, and the right panels as a function of ME genome content. Remember that in subsequent generations the descendants of hybrids will vary quite a bit in genetics and phenotype as the original alleles re-segregate.


The takeaway is that in novel environments genetic variation seems to result in increased fitness. Why? One concept which one has to introduce is heterosis, whereby crosses between homogeneous lineages produce more fit offspring. One reason this may be is that there is overdominance, where heterozygotes have greater fitness than the homozygotes. This is the case with sickle-cell disease and malaria. Another reason may be that in the original parental lineages there was a higher fraction of alleles which were deleterious in homozygote genotypes. In plain English, inbreeding resulted in genetic drift which cranked up the proportion of alleles implicated in recessively expressed negative phenotypes. The authors argue though that in the native context local adaptation is strong enough to be a barrier against too much gene flow between the parental wild barley lineages, so the deleterious alleles are less likely to be masked. Only in a novel environment when that benefit was removed from the equation could the negative consequences of inbreeding come to the fore in the total calculus.
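To put a rough number on why inbreeding exposes recessive deleterious alleles, here is a minimal sketch using the textbook relationship between the inbreeding coefficient F and the frequency of recessive homozygotes, q² + F·p·q. This is not the authors' model. The allele frequency and the F values are hypothetical, chosen only to show the direction and size of the effect.

```python
def recessive_homozygote_freq(q, inbreeding_f):
    """Frequency of recessive homozygotes given allele frequency q and inbreeding coefficient F."""
    if not (0.0 <= q <= 1.0 and 0.0 <= inbreeding_f <= 1.0):
        raise ValueError("q and F must both lie between 0 and 1")
    p = 1.0 - q
    return q * q + inbreeding_f * p * q


q = 0.05  # hypothetical frequency of a deleterious recessive allele

# F = 0 is random mating; F = 1 is a fully inbred (for example long-selfed) lineage.
for f in (0.0, 0.25, 1.0):
    freq = recessive_homozygote_freq(q, f)
    print(f"F = {f:.2f}: affected homozygotes = {freq:.4f}")
```

In a cross between two inbred lineages that carry different deleterious recessives, the offspring are mostly heterozygous at those loci, which is the masking effect the paper appeals to.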

Figure 2 shows the results of experiments which examine the fitness of white campion, a European species which has been introduced in North America. In the left panel are crosses between native European lineages, with distance between parental lineages on the x-axis. In the right panel you have the same experiment, but with North American variants, which are products of introductions from various regions of Europe. The plants were grown in a “common garden,” to show how all the genotypes performed when environment was controlled.


As you can see moderate levels of hybridization entailed a benefit in the European variants, but not the North American variants. Hybridization between variants which were too distant did produce outbreeding depression in the European case, suggesting perhaps that disruption of co-adapted gene complexes resulted in a greater fitness cost than the masking of deleterious alleles due to inbreeding. One can make the inference from these data that the introduced white campion lineages are already hybridized, the barriers to crossing being removed by a disruption of the adaptive landscapes which each native lineage was optimized for.

Here are the authors from the discussion talking about invasions of exotic species:

Provided that multiple introductions from different source populations have occurred, the benefits of admixture become freely available to introduced populations that do not yet show a pattern of local adaptation. Because the benefits are potentially large, admixture may play an important role during early invasions. Native populations often show evidence of inbreeding depression…and one instant reward of admixture in the introduced range is the release of this genetic burden. Such heterosis effects can contribute significantly to the establishment and early success of invasive species…When tested together in a common garden experiment, invaders can show enhanced fitness-related traits compared with populations from their native range…If there is evidence of admixture, the effects of heterosis might be a default explanation for such observations, perhaps providing a null expectation against which other explanations (such as trait evolution) need to be tested.

What have plants to do with life as a whole? I assume much. Plants differ in the details, but compared to other complex multicellular organisms in regards to evolutionary genetics they’re quite liberated. By this, I mean that their modes of reproduction and promiscuity in hybridization make them more of an ideal “frictionless” test case of evolutionary biology and the power of the classical parameters. Perhaps given enough time natural selection would produce the ideal replicator to rule them all, to drive all others to extinction. But that day is not this day. And that day may never come because the universe is far too protean and erratic. Life is varied, on the phenotypic and genotypic level, and the exogenous processes of climate and geology continue to warp and reshape the adaptive landscape. And more subtly, but just as critically, life is always in an endless race with itself, as pathogens co-evolve with their hosts, and predators figure out how to outfox their prey. Life warps its own adaptive landscapes, and the innovation of one branch may lead to extinction of others as well as the proliferation of new branches.

More prosaically and anthropocentrically what does this say about us? Humans are an expansive species, and over the past 500 years different lineages have been hybridizing promiscuously. New genotypes have arisen in altered landscapes, and our pathogens are also riding the high tide of globalization onward and upward. We are ourselves a “natural experiment.”

Image Credit: Olivia Munn by Gage Skidmore

Link hat tip: Dienekes.

Citation: Verhoeven KJ, Macel M, Wolfe LM, & Biere A (2010). Population admixture, biological invasions and the balance between local adaptation and inbreeding depression. Proceedings. Biological sciences / The Royal Society PMID: 20685700

July 21, 2010

Disease as a byproduct of adaptation

How we perceive nature and describe its shape are a matter of values and preferences. Nature does not take notice of our distinctions; they exist only as instruments which aid in our comprehension. I’ve brought this up in relation to issues such as categorization of recessive vs. dominant traits. The offspring of people of Sub-Saharan African and non-African ancestry where the non-African parent has straight or wavy hair tend to have very curly hair. Therefore, one may say that the tightly curled hair form is dominant to straight or wavy hair. But, it is also the case that there is some modification in relation to the African parent in the offspring, so the dominance is not complete. When examining the morphology of the follicle, which determines the extent of the hair’s curl, the offspring may in fact exhibit some differences from both parents. In other words our perception of the outcomes of inheritance is contingent to some extent on our categorization of the traits as well as our specific focus along the developmental pathway.

Or consider the division between “traits” and “diseases.” The quotations are necessary. Lactose intolerance is probably one of the best cases to illustrate the gnarly normative obstructions which warp our perceptions. As a point of fact lactose intolerance is the ancestral human state, and numerically predominant. It is the “wild type.” Lactose tolerance is a relatively recent adaptation, found among a variety of West Eurasian and African populations. A more politically correct term, lactase persistence, probably better encapsulates the evolutionary history of the trait, which has shifted from the class of disease to that of genetic trait when we evaluate the bigger picture (obviously diseases are simply “bad” traits).

Sometimes though the issues are more cut & dried. No one would doubt that sickle-cell anemia is a disease. It has a major fitness impact in a colloquial sense, as well as evolutionarily. It kills you, and it kills your potential genetic lineage. But, it is also a byproduct of adaptation to endemic malaria. Sickle-cell disease is one of the classical illustrations of heterozygote advantage, whereby those who carry one copy of the mutation on the gene have increased fitness vis-a-vis those who carry two normal copies of the gene. The increase in frequency of the mutant gene though is balanced by the fact that mutant homozygotes have decreased fitness.
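The balance described above has a simple textbook form. If normal homozygotes have fitness 1 − s, heterozygotes 1, and mutant homozygotes 1 − t, the mutant allele settles at a frequency of s / (s + t). The sketch below iterates the standard one-locus selection recursion and compares the result with that closed form. The selection coefficients are hypothetical, picked for illustration rather than taken from any malaria data set.

```python
def next_mutant_freq(q, s, t):
    """One generation of selection with heterozygote advantage.

    Fitnesses: normal homozygote 1 - s, heterozygote 1, mutant homozygote 1 - t.
    """
    p = 1.0 - q
    mean_fitness = p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)
    return (q * q * (1.0 - t) + p * q) / mean_fitness


def simulate(q0, s, t, generations):
    """Iterate the recursion from starting frequency q0 and return the final frequency."""
    q = q0
    for _ in range(generations):
        q = next_mutant_freq(q, s, t)
    return q


s, t = 0.1, 0.8  # hypothetical costs: malaria on normal homozygotes, anemia on mutant homozygotes
predicted = s / (s + t)
simulated = simulate(q0=0.01, s=s, t=t, generations=500)
print(f"predicted equilibrium = {predicted:.4f}, simulated = {simulated:.4f}")
```

Even with a severe cost to mutant homozygotes, the allele persists at an appreciable frequency as long as the heterozygotes have an edge.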

We can then construct a narrative of the long term evolutionary dynamics from this initial condition. When a new exogenous stress hits a population mean fitness drops immediately (take a look at the biographies of the Popes, and observe how many died of malaria in the Dark Ages when that disease was new to Italy). Natural selection quickly increases in frequency any alleles which confer protection against the exogenous stress. But, baked into the cake of how genetics in complex organisms usually works, one allele may often have multiple downstream consequences. This is pleiotropy. This means that if a change at a locus increases aggregate fitness, it may nevertheless destabilize long established biochemical pathways. In the short term evolution simply takes the net fitness impact into account. Over the long term one assumes that “better solutions” will emerge which do not have so high a fitness drag, perhaps through the evolution of modifier genes which mask the deleterious outcomes of the initial mutant. This sort of ad hoc trial and error and “duct-taping” of kludges is part and parcel of how adaptation works in situations where shocks out of equilibrium states are common.

In many cases the byproducts of a genetic change may be benign. To my knowledge no one knows major negative consequences of carrying the alleles which confer lactase persistence (excepting some studies indicating higher obesity, but this seems a marginal fitness impact which has only come to the fore in the past century in all likelihood). But in other cases the outcomes may not be as serious as that of sickle-cell anemia, but may rise above the level of significance where one must note the existence of a disease which is a secondary consequence of adaptation to meet a new challenge.

Yesterday I pointed to a paper which illustrates just this phenomenon, Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans:

African-Americans have higher rates of kidney disease than European-Americans. Here, we show that in African-Americans, focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD) are associated with two independent sequence variants in the APOL1 gene on chromosome 22 {FSGS odds ratio = 10.5 [95% confidence interval (CI) 6.0 to 18.4]; H-ESKD odds ratio = 7.3 (95% CI 5.6 to 9.5)}. The two APOL1 variants are common in African chromosomes but absent from European chromosomes, and both reside within haplotypes that harbor signatures of positive selection. Apolipoprotein L-1 (ApoL1) is a serum factor that lyses trypanosomes. In vitro assays revealed that only the kidney disease-associated ApoL1 variants lysed Trypanosoma brucei rhodesiense. We speculate that evolution of a critical survival factor in Africa may have contributed to the high rates of renal disease in African-Americans.

In its implementation the paper has a lot of moving parts, but the outcome is straightforward. If you haven’t, you might read Genomes Unzipped and its post How to read a genome-wide association study. This is a case where the original association studies were not reporting false results, but, it seems that one had to take a further step to really understand the likely molecular genetic and evolutionary underpinnings of what was going on. These results suggest that the original signals of association for variants within the MYH9 gene were actually signals from within APOL1, which happened to be next to MYH9. The region around MYH9 had already shown up in tests to detect natural selection through patterns of linkage disequilibrium (non-random associations of alleles at different loci within the genome; in this case the relevant consideration is adjacent loci across continuous regions of the genome which come together to form haplotype blocks). Since the footprint of natural selection on the genome is often wide that did not imply that MYH9 was the target of natural selection per se, opening the likely possibility for other causal associations. A convenience in light of the difficulty of establishing a plausible functional relationship between renal failure and MYH9.

To explore the possibility of nearby functional candidates the researchers focused on a number of alleles within this genomic region which exhibited maximal European-African frequency differences in the 1000 Genomes Project. Once they ascertained the between population differences they then looked at differences in allele frequencies in cases and controls within the African American population for the two diseases in question (those with the trait/disease vs. those without). Table 1 has the top line raw results:


WT = “Wild Type,” the ancestral allelic variant found in most populations. G1 and G2 are two haplotypes, associated alleles across the locus of the APOL1 gene. G1 consists of the two derived non-synonymous coding variants rs73885319 (S342G) and rs60910145 (I384M) within an exonic region of APOL1. Non-synonymous simply means that a change at that base pair alters the amino acid coded, and exons are the genomic regions whose information is eventually translated into proteins. In other words, these are non-neutral functionally significant genomic regions which do something. G2 is a 6 base pair deletion, rs71785313, close to G1 in APOL1.

To more formally model the relationship between the alleles which are found to differ between cases and controls they performed a logistic regression. The alleles serve as independent variables which can predict the probable outcome of the dependent variable, the probability of FSGS or H-ESKD in this case (renal failure). Figure 1 to the left has a summary of some of the results of the regression in graphical form for FSGS. I’ve rotated it so it can fit on the screen. Basically the strong signals are to the right of the chart (from your perspective). The y-axis displays (horizontal from your perspective) negative-log of p-values for a signal at a particular marker, which is defined by the x-axis (vertical for you). The labels show the particular gene at that genomic position. The smaller the p-value, the more probable that the signal is real and not random. This produces huge spikes in the negative-log values (in the body of the paper they present p-values on the order of 10^-35).
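For readers unused to the negative-log scale, a minimal sketch of the conversion follows. I am assuming the usual base-10 convention for association plots. The labels and the first two p-values are hypothetical; only the 10^-35 order of magnitude comes from the paper as described above.

```python
import math


def neg_log10(p_value):
    """Convert a p-value to the -log10 scale commonly used on association plots."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


# Hypothetical markers; only the last order of magnitude is taken from the text.
examples = [
    ("nominal signal", 0.05),
    ("strong signal", 1e-8),
    ("signal on the scale reported for APOL1", 1e-35),
]

for label, p in examples:
    print(f"{label}: p = {p:g}, -log10(p) = {neg_log10(p):.1f}")
```

Each additional unit on the plotted axis is a tenfold smaller p-value, which is why the strongest markers tower over everything else.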

You can see that it is in APOL1 that the biggest signals reside. The first panel, A, throws all the SNPs into the mix. On MYH9 they highlight a few SNPs which combine to form the E-1 haplotype, which is strongly associated with cases (this is where the association between disease and genetic variants on MYH9 is coming from). This haplotype is found in conjunction with G1 and G2 on APOL1. E-1 is present in 89% of haplotypes carrying G1 and in 76% of haplotypes carrying G2. A classic illustration of likely correlation but not causation. The second panel controls for the effect of G1. In other words, this is showing you the variation in the dependent variable that remains after you take the largest independent variable, G1, into account. The G2 haplotype is the largest effect independent variable after G1 is taken into account; in other words, it explains most of the residual variation in FSGS probability. Finally, the last panel controls for both G1 and G2. As you can see there aren’t any major signals left; the distribution is relatively flat. Logically once you account for the variables which produce change in an outcome you shouldn’t see any impact of other variables. And that’s what happens here. They also performed controls where MYH9 was held constant, and that does not eliminate the signals in APOL1. The MYH9 signal is conditional on its correlation with APOL1. This was the correlation which showed up in the original association studies. The exact same pattern of signals within the logistic regression model was replicated for H-ESKD. G1 had the strongest signal, then G2. The markers within MYH9 were not significant once one controlled for the variants in G1 and G2.

It is important to remember though that these markers are segregating within a human population where individuals have three potential genotypes: ancestral homozygote, homozygote for the mutants, and heterozygote. They found that a recessive model of expression of disease is most appropriate in the case of these risk alleles. That is, most of the increased risk is accounted for by the change from one risk allele, the heterozygote state, to two risk alleles, the homozygote state. One risk allele increased odds of renal failure by 1.26, but two by 7.3. The odds ratio of two risk alleles compared to a base rate of one risk allele was 5.8. They report that the results for FSGS were broadly similar. This matters because the frequency of the trait/disease in a random mating population is conditional on the homozygotes if it has a recessive expression pattern. G1 was present in 40% of the Yoruba HapMap data set, but in neither of the two Eurasian groups, Europeans and East Asians. G2 was found in three Yoruba, but in none of the Eurasian groups. Assuming Hardy-Weinberg equilibrium the Yoruba should have 16% of the population at sharply elevated risk for FSGS and H-ESKD because they’d be homozygotes for the G1 allele.
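The arithmetic behind those last numbers is simple enough to sketch. This is a back-of-the-envelope check, not anything from the paper: the function name `hardy_weinberg` is hypothetical, and I am treating the 40% figure as the G1 allele frequency, which is the assumption the 16% estimate rests on.

```python
def hardy_weinberg(q: float) -> dict:
    """Expected genotype proportions for a risk allele at frequency q."""
    p = 1.0 - q  # frequency of the ancestral allele
    return {
        "ancestral_homozygote": p * p,
        "heterozygote": 2 * p * q,
        "risk_homozygote": q * q,
    }


# Assumption: 40% is the G1 allele frequency in the Yoruba sample.
yoruba = hardy_weinberg(0.40)
print(yoruba)

# Odds ratios quoted in the post, each relative to zero risk alleles.
or_one_allele = 1.26
or_two_alleles = 7.3
# Two risk alleles relative to one: the ratio of the two odds ratios.
or_two_vs_one = or_two_alleles / or_one_allele
print(round(or_two_vs_one, 1))
```

Under a recessive model it is the `risk_homozygote` class that carries nearly all of the excess risk, which is why the square of the allele frequency is the number that matters.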

Once they established which markers seem to be implicated in this phenotypic variation, they wanted to focus on how the frequencies of those markers came to be. Specifically, G1 and G2 seem to be derived haplotypes which arose out of the ancestral background. In plain English 20,000 years ago Africans should have looked like all non-Africans genomically, at least on the functionally relevant segments, but within the last 10,000 years it looks like new variants rose in frequency driven by natural selection in response to new environmental stresses. The region has already broadly been surveyed by linkage disequilibrium based tests, which basically look for regions of long haplotypes, homogenized zones of the genome where many individuals have the variation removed because one gene rose so rapidly in frequency that huge adjacent sections hitchhiked up in frequency. Presumably this may have happened with the MYH9 haplotype correlated with the traits under consideration here; G1 and G2 dragged up the E-1 haplotype as a secondary consequence of their own rise to prominence among some Sub-Saharan African populations.

So next the authors turned to tried & tested techniques and focused on the risk markers which they had discovered earlier in their research, G1 and G2. Specifically, they used EHH, which is best at detecting selection where sweeps have nearly completed (e.g., the derived variant is at frequency 0.95 within the population), iHS, which is best at detecting sweeps which have not completed (e.g., the derived variant is at frequency 0.6), as well as ΔiHH, which I am less familiar with but is reputedly similar to iHS except that it uses absolute haplotype length as opposed to relative haplotype length. Figure 2 shows the results of these tests:


The resolution isn’t the best, but G1 and G2 seem to be outliers on all three tests to detect natural selection by using patterns of linkage disequilibrium. The first panel is EHH, the second and third show iHS and ΔiHH respectively, with the position of the markers being outliers among the distribution of values for the genome within the Yoruba. This is not proof of adaptation, but it changes our weights of possibilities. Additionally, they note that Europeans exhibit no such patterns on these markers. Visually the position of the markers in the latter two panels would be closer to the mode of the distribution in Europeans.
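For readers who want a feel for what these haplotype tests measure, here is a toy sketch of extended haplotype homozygosity (EHH) as it is usually defined: among chromosomes carrying the core allele, the fraction of chromosome pairs that are identical from the core marker out to a given distance. The function name and the four-chromosome data sets are invented for illustration; this is not the authors' pipeline, and a real analysis works on phased genome-wide data.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, distance):
    """EHH `distance` markers to the right of the core marker.

    `haplotypes` is a list of equal-length allele strings, all of which
    carry the same allele at `core_index`.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    # Group chromosomes by their alleles from the core out to `distance`.
    segments = Counter(h[core_index:core_index + distance + 1] for h in haplotypes)
    identical_pairs = sum(comb(count, 2) for count in segments.values())
    return identical_pairs / comb(n, 2)


# Hypothetical data: after a recent sweep, carriers share a long haplotype.
swept = ["11010", "11010", "11010", "11011"]
# Hypothetical data: an older allele whose background has been shuffled.
shuffled = ["10010", "11100", "10111", "11001"]

for d in range(5):
    print(d, round(ehh(swept, 0, d), 2), round(ehh(shuffled, 0, d), 2))
```

EHH decays slowly with distance around a recently selected allele and quickly around an old one. iHS, roughly speaking, integrates that decay curve for the derived and ancestral alleles and compares the two, which is why it retains power when the sweep is incomplete.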

To review, first they confirmed a causal relationship between a particular set of markers, haplotypes, and the traits of interest. Second, they confirmed that said markers seem to bear the hallmarks of genomic regions subject to natural selection. We know that focal segmental glomerulosclerosis (FSGS) and end-stage kidney disease (H-ESKD), the traits whose relationship to the G1 and G2 haplotypes seems confirmed, are unlikely to be targets of positive natural selection. To get a better sense of what selection was acting on we need to look at Apol1, the protein product of APOL1, and what it does. At this point I’ll quote the paper:

ApoL1 is the trypanolytic factor of human serum that confers resistance to the Trypanosoma brucei brucei (T. brucei brucei) parasite…T. brucei brucei has evolved into two additional subspecies, Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense, which have both acquired the ability to infect humans…T. brucei rhodesiense is predominantly found in Eastern and Southeastern Africa, while T. brucei gambiense is typically found in Western Africa, though some overlap exists…Since these parasites exist only in sub-Saharan Africa, we hypothesized that the APOL1 gene may have undergone natural selective pressure to counteract these trypanosoma adaptations. As an initial test of this hypothesis, we performed in vitro assays to compare the trypanolytic potential of the variant, disease-associated forms of ApoL1 proteins with that of the “wild-type” form of ApoL1 protein that is not associated with renal disease.

We’re talking about sleeping sickness. Here’s a description:

It starts with a headache, joint pains and fever. It is the kind you would expect to get over quickly. But after a while, things get worse. You fall asleep most of the time, are confused and get intense pains and convulsions.

If you do not get treatment, your body begins to waste away. Eventually, you slip into coma and die. This is human African trypanosomiasis, better known as sleeping sickness. If untreated, it kills 100% of its victims in a very short time.

Cheery. I think we have a plausible reason for natural selection to kick into overdrive! Or more specifically, we have a plausible external selection pressure which will drive fitness differentials which correlate with genetic variation. Increased probability of kidney disease seems preferable to this. In terms of the molecular genetics it looks like a factor, serum resistance-associated protein (SRA), produced by T. brucei rhodesiense binds to a specific location of Apol1, and that mutations at G1 and G2 change exactly that location within the protein. So these mutants may block the ability of T. brucei rhodesiense to turn off the body’s defenses against trypanosomes.

To test this they examined the in vitro lytic potential of serum produced by individuals carrying the G1 and G2 haplotypes against the three subspecies of Trypanosoma: T. brucei brucei, which normal Apol1 can lyse, and T. brucei rhodesiense and T. brucei gambiense, which can infect humans (endemic to eastern and western Africa respectively, though the former extends into west Africa as well).

- All 75 samples lysed brucei brucei

- None lysed brucei gambiense

- 46 samples lysed SRA-positive brucei rhodesiense; all 46 samples were from G1 or G2 carrying individuals

- The potency of G2 seemed higher than G1 against SRA-positive samples of brucei rhodesiense, though not SRA-negative samples, where G1 seemed as potent

- Recombinants of Apol1 which had only one of the two SNPs of the G1 haplotype were less effective against brucei rhodesiense than those which had both (G1 haplotype)

- Recombinants with G1 and G2 were not more effective against brucei rhodesiense than those with G2 alone

- Recombinants with G1 alone were more potent against SRA-negative brucei rhodesiense than those with G2 alone

- G2 was necessary and sufficient to block SRA binding to Apol1 and allow lysing of brucei rhodesiense. G1 did not block SRA binding to Apol1, but was still sufficient to lyse brucei rhodesiense, though it was far less potent against SRA-positive brucei rhodesiense than G2

It seems that the G1 and G2 haplotypes utilize different mechanisms to enable the lysing of invasive pathogens, and so prevent the development of sleeping sickness. Their means differ, but the ends are the same. The authors note that even minimal amounts of plasma from G2 individuals seem potent enough to block the binding of SRA to Apol1 and so enable lysis. And introduction of such plasma into the bloodstreams of individuals who do not have resistance may then be highly efficacious as a preventative treatment against sleeping sickness. They do note that they did not explore in detail the mechanism by which the G1 and G2 variants result in susceptibility to kidney failure, but that’s presumably for the future.

Finally, the second to last paragraph where they bring it all together:

It will be interesting to determine the distribution of these mutations throughout sub-Saharan Africa. In present-day Africa, T. brucei rhodesiense is found in the Eastern part of the continent, while we noted high frequency of the trypanolytic variants and the signal of positive selection in a West African population. Changes in trypanosome biology and distribution and/or human migration may explain this discrepancy, or resistance to T. brucei rhodesiense could have favored the spreading of T. brucei gambiense in West Africa. Alternatively, ApoL1 variants may provide immunity to a broader array of pathogens beyond just T. brucei rhodesiense, as a recent report linking ApoL1 with anti-Leishmania activity may suggest…Thus, resistance to T. brucei rhodesiense may not be the only factor causing these variants to be selected.

This is a very long review already. But, while I have your attention, I think I need to point to another paper on the same topic which has a slightly different twist. I won’t dig into the details with the same thoroughness as above, but rather I’ll highlight the value-add of this group’s contribution. It’s an Open Access paper, unlike the one above, so you can review it in depth yourself. Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene:

MYH9 has been proposed as a major genetic risk locus for a spectrum of nondiabetic end stage kidney disease (ESKD). We use recently released sequences from the 1000 Genomes Project to identify two western African-specific missense mutations (S342G and I384M) in the neighboring APOL1 gene, and demonstrate that these are more strongly associated with ESKD than previously reported MYH9 variants. The APOL1 gene product, apolipoprotein L-1, has been studied for its roles in trypanosomal lysis, autophagic cell death, lipid metabolism, as well as vascular and other biological activities. We also show that the distribution of these newly identified APOL1 risk variants in African populations is consistent with the pattern of African ancestry ESKD risk previously attributed to MYH9. Mapping by admixture linkage disequilibrium (MALD) localized an interval on chromosome 22, in a region that includes the MYH9 gene, which was shown to contain African ancestry risk variants associated with certain forms of ESKD…MYH9 encodes nonmuscle myosin heavy chain IIa, a major cytoskeletal nanomotor protein expressed in many cell types, including podocyte cells of the renal glomerulus. Moreover, 39 different coding region mutations in MYH9 have been identified in patients with a group of rare syndromes, collectively termed the Giant Platelet Syndromes, with clear autosomal dominant inheritance, and various clinical manifestations, sometimes also including glomerular pathology and chronic kidney disease…Accordingly, MYH9 was further explored in these studies as the leading candidate gene responsible for the MALD signal. 
Dense mapping of MYH9 identified individual single nucleotide polymorphisms (SNPs) and sets of such SNPs grouped as haplotypes that were found to be highly associated with a large and important group of ESKD risk phenotypes, which as a consequence were designated as MYH9-associated nephropathies…These included HIV-associated nephropathy (HIVAN), primary nonmonogenic forms of focal segmental glomerulosclerosis, and hypertension affiliated chronic kidney disease not attributed to other etiologies…The MYH9 SNP and haplotype associations observed with these forms of ESKD yielded the largest odds ratios (OR) reported to date for the association of common variants with common disease risk…Two specific MYH9 variants (rs5750250 of S-haplotype and rs11912763 of F-haplotype) were designated as most strongly predictive on the basis of Receiver Operating Characteristic analysis…These MYH9 association studies were then also extended to earlier stage and related kidney disease phenotypes and to population groups with varying degrees of recent African ancestry admixture…and led to the expectation of finding a functional African ancestry causative variant within MYH9. However, despite intensive efforts including re-sequencing of the MYH9 gene no suggested functional mutation has been identified…This led us to re-examine the interval surrounding MYH9 and to the detection of novel missense mutations with predicted functional effects in the neighboring APOL1 gene, which are significantly more associated with ESKD than all previously reported SNPs in MYH9.

Table one has the top line results. Focus on the first two rows; they’re “G1” from the earlier study (that is, the two SNPs which combine to form the G1 haplotype).


Here’s a difference between the previous paper and this one: the table above uses cases and controls from African Americans and Hispanic Americans. The original paper from which the genomic data on this sample is drawn calculates the average African, European and Native American ancestry in the two groups as follows (I did some rounding to keep the values tidy):

African American – 85%, 10%, 5%
Hispanic American – 30%, 55%, 15%

Not surprisingly the Hispanic American sample here is mostly Puerto Rican and Dominican, explaining the greater African than Native American ancestry. Nevertheless, it is a sufficiently different genetic background to test the effects of the same marker against different genes. They confirmed the association of the markers of large effect in African Americans within the Hispanic cohort. The risk allele frequency in the African American control group is 21% vs. 37% in the cases. For Hispanic Americans the figures are 6% and 23% for the same categories.
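As a rough way to compare the two cohorts, one can turn those case and control allele frequencies into a crude allelic odds ratio. This is my own sketch from the rounded frequencies quoted above, with a hypothetical helper name; it is not the statistic reported in the paper's table, which comes from the authors' own models.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the risk allele in cases relative to controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Rounded risk allele frequencies quoted in the post (cases, controls).
african_american = allelic_odds_ratio(0.37, 0.21)
hispanic_american = allelic_odds_ratio(0.23, 0.06)

print(f"African American:  {african_american:.2f}")
print(f"Hispanic American: {hispanic_american:.2f}")
```

Note that the allele is rarer in the Hispanic American controls, so a similar absolute enrichment in cases shows up as a larger ratio there.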

OK, now to the most interesting point in this short paper:

HIVAN has been considered as the most prominent of the nondiabetic forms of kidney disease within what has been termed the MYH9-associated nephropathies…We have reported absence of HIVAN in HIV infected Ethiopians, and attributed this to host genomic factors (Behar et al. 2006). Therefore, we examined the allele frequencies of the APOL1 missense mutations in a sample set of 676 individuals from 12 African populations, including 304 individuals from four Ethiopian populations…We coupled this with the corresponding distributions for the African ancestry leading MYH9 S-1 and F-1 risk alleles. A pattern of reduced frequency of the APOL1 missense mutations and also of the MYH9 risk variants was noted in northeastern African in contrast to most central, western, and southern African populations examined…Especially striking was the complete absence of the APOL1 missense mutations in Ethiopia. This combination of the reported lack of HIVAN and observed absence of the APOL1 missense mutations is consistent with APOL1 being the functionally relevant gene for HIVAN risk and likely the other forms of kidney disease previously associated with MYH9.

Bingo. The previous paper focused on African Americans (along with the HapMap Yoruba). But the pattern of variation within Africa is interesting as well. Ethiopians are not quite like other Africans, having a great deal of admixture with populations from Arabia (many of the languages of highland Ethiopia are Semitic). But the majority of their ancestry remains similar to that of other Sub-Saharan Africans. As a point of contrast the ecology of Ethiopia differs a great deal from the rest of Sub-Saharan Africa because of its elevation, and concomitant frigidity. The mean monthly low in Addis Ababa is around 10 (50 for Americans) degrees and mean high 20-25 (high 60s to mid 70s for Americans). There isn’t much variation from month to month because of the low latitude, but the high elevation keeps the temperatures relatively moderate. Different environments result in different selection pressures, and Ethiopia has a unique environment within Africa. The tsetse fly which serves as a vector for trypanosomes does not seem to be present in the Ethiopian highlands. The map above shows the distribution within Africa of one of the markers which defines the G1 haplotype in the previous paper. Note that the modal frequency is in the west of Africa, and the frequency drops off to the east (though the geographic coverage leaves a bit to be desired if you look at the raw data which went into generating this map, which smooths over huge discontinuities).

One of the points I want to reemphasize from the tests of natural selection in the first paper is that these genetic adaptations are likely to be new, otherwise recombination would have broken up the long haplotypes and reduced linkage disequilibrium. New as in the last 10,000 years. It is interesting that a particular subspecies of trypanosome which is immune to these genetic adaptations is endemic to west Africa. We may be seeing evolution in action here, or at least the arms race between man and pathogen where man is always one step behind. In contrast, the subspecies which is effectively defused by the genetic adaptations reviewed here is present in higher numbers precisely in the regions where the resistance mutations are extant at lower proportions. Perhaps there are different mutations in these regions of Africa, not yet properly identified. Or perhaps we’re seeing humans in this region at an earlier stage of the dance, so to speak.

Citation: Giulio Genovese, David J. Friedman, Michael D. Ross, Laurence Lecordier, Pierrick Uzureau, Barry I. Freedman, Donald W. Bowden, Carl D. Langefeld, Taras K. Oleksyk, Andrea Uscinski Knob, Andrea J. Bernhardy, Pamela J. Hicks, George W. Nelson, Benoit Vanhollebeke, Cheryl A. Winkler, Jeffrey B. Kopp, Etienne Pays, & Martin R. Pollak (2010). Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans Science : 10.1126/science.1193032

Citation: Tzur S, Rosset S, Shemer R, Yudkovsky G, Selig S, Tarekegn A, Bekele E, Bradman N, Wasser WG, Behar DM, & Skorecki K (2010). Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Human genetics PMID: 20635188

May 18, 2010

An umbrella against the mutational showers

Filed under: Biology,Genetics,Good Genes,Selection,Sex — Razib Khan @ 2:41 am

Mutations are as you know a double-edged sword. On the one hand mutations are the stuff of evolution; neutral changes on the molecular or phenotypic level are the result of mutations, as are changes which enhance fitness and so are driven to fixation by positive selection. On the other hand mutations also tend to cause problems. In fact, mutations which are deleterious far outnumber those which are positive. It is much easier to break complex systems which are near a fitness optimum than it is to improve upon them through random chance. In fact a Fisherian geometric analogy of the effect of genes on fitness implies that once a genetic configuration nears an optimum mutations of larger effect have a tendency to decrease fitness. Sometimes environments and selection pressures change radically, and large effect mutations may become needful. But despite their short term necessity these mutations still cause major problems because they disrupt many phenotypes due to pleiotropy.

But much of the playing out of evolutionary dynamics is not so dramatic. Instead of very costly mutations for good or ill, most mutations may be of only minimal negative effect, especially if they are masked because of recessive expression patterns. That is, only when two copies of the mutation are present does all hell break loose. And yet even mutations which exhibit recessive expression tend to generate some drag on the fitness of heterozygotes. And if you sum small values together you can obtain a larger value. This gentle rain of small negative effect mutations can be balanced by natural selection, which does not smile upon less fit individuals who have a higher mutational load. Presumably those with “good genes,” fewer deleterious mutations, will have more offspring than those with “bad genes.” Because mutations accrue from one generation to the next, and there is sampling variance of deleterious alleles, a certain set of offspring will always be gifted with fewer deleterious mutations than their siblings. This is a genetics of chance. And so the mutation-selection balance is maintained over time, the latter rising to the fore if the former comes to greater prominence.

The above has been a set of logical inferences from premises. Evolution is about the logic of life’s process, but as a natural science its beauty is that it is testable through empirical means. A short report in Science explores mutational load and fitness, and connects it with the ever popular topic of sexual selection, Additive Genetic Breeding Values Correlate with the Load of Partially Deleterious Mutations:

The mutation-selection–balance model predicts most additive genetic variation to arise from numerous mildly deleterious mutations of small effect. Correspondingly, “good genes” models of sexual selection and recent models for the evolution of sex are built on the assumption that mutational loads and breeding values for fitness-related traits are correlated. In support of this concept, inbreeding depression was negatively genetically correlated with breeding values for traits under natural and sexual selection in the weevil Callosobruchus maculatus. The correlations were stronger in males and strongest for condition. These results confirm the role of existing, partially recessive mutations in maintaining additive genetic variation in outbred populations, reveal the nature of good genes under sexual selection, and show how sexual selection can offset the cost of sex.

Additive genetic variance just refers to the variation of genes which affect the phenotype by independent and usually small effects which sum together to produce the range of variation of the trait. Imagine for example that the range of variation in height within the population was 10 inches, and that there were 10 genes which varied, and that each gene exhibited co-dominance. One could construct a model where every gene pair could add 0, 0.5 or 1 inch to the height independently, so that the maximum height would be the baseline plus 10 inches (1 inch per locus), and the minimum height would be the baseline itself, when each locus is homozygous for null alleles.
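That toy height model is easy to write down. The sketch below is my own illustration of the example in the paragraph above, not anything from the report; the names are hypothetical, and the allele frequency of 0.5 used for the random sample is an arbitrary assumption.

```python
import random

N_LOCI = 10        # number of variable genes in the toy model
PER_ALLELE = 0.5   # inches added by each copy of the "plus" allele


def height_offset(genotype):
    """Inches above baseline, given plus-allele counts (0, 1 or 2) per locus."""
    return sum(PER_ALLELE * copies for copies in genotype)


shortest = height_offset([0] * N_LOCI)  # homozygous null at every locus
tallest = height_offset([2] * N_LOCI)   # homozygous plus at every locus
all_het = height_offset([1] * N_LOCI)   # heterozygous at every locus
print(shortest, all_het, tallest)

# Assumption: each plus allele sits at frequency 0.5 with random mating.
rng = random.Random(0)
sample = [
    height_offset([(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(N_LOCI)])
    for _ in range(1000)
]
print(min(sample), max(sample))
```

Because each locus contributes independently, most individuals in the random sample land near the middle of the range, and the extremes require an unlikely run of the same genotype across all ten loci.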

Mutations can be conceived of in the same manner, with each mutation being a new variant which changes trait value. Even if most of the impact of a mutation is masked there is a small effect in the heterozygote state, and this may serve as a fitness drag. The range in mutational load can then naturally be analogized to additive genetic variance, in this case the trait under consideration ultimately being fitness, mediated through life history and morphological phenotypes.

In this report they focused primarily on the weevil’s ability to obtain resources and transform those resources into size, which correlates with greater sexual access for males and fecundity for females (ergo, greater fitness). They bred various outbred and inbred lineages across families of these weevils, because these sorts of crosses gauge the impact of masked deleterious alleles, which will manifest in homozygote state more often between related pairs who share mutations than unrelated ones. They found a correlation of -0.24 between inbreeding and breeding value; in other words the more inbred the pair the fewer offspring. The impact of these recessively expressed alleles is mitigated in heterozygous individuals, but because of the non-trivial impact the number of these alleles within an individual will determine its fitness all things equal.

Interestingly when background variables were controlled males tended to show the greatest fitness drag due to inbreeding depression. This would comport with models of sexual selection where males justify their expense (because they cannot bear offspring) within the population by serving as the perishable dumping grounds of bad genes. In particular in a polygynous population a few healthy males with good genes could give rise to most of the next generation, so providing the balance of selection to the background mutational rate.

Of course mating patterns vary between taxa. The more reproductive skew there is, in particular for males, the more recourse selection has every generation to dump deleterious alleles via selection. In contrast monogamous populations will have less power to expunge mutations in this fashion because there is more genetic equality across males, the bad will reproduce along with the good, more or less. Therefore a breeding experiment of weevils may have more limited insight than these authors may wish to admit. Geoffrey Miller’s The Mating Mind attempted to take the insights of sexual selection and develop a model of human evolutionary history, but it does not seem that this theory has swept all before it. Only time will tell, but until then more breeding experiments can’t help but clarify where theory goes wrong or right.

Citation: Tomkins, J., Penrose, M., Greeff, J., & LeBas, N. (2010). Additive Genetic Breeding Values Correlate with the Load of Partially Deleterious Mutations Science, 328 (5980), 892-894 DOI: 10.1126/science.1188013

May 14, 2010

Breathing like Buddha: altitude & Tibet

You probably are aware that different populations have different tolerances for high altitudes. Himalayan sherpas aren’t useful just because they have skills derived from their culture, they’re actually rather well adapted to high altitudes because of their biology. Additionally, different groups seem to have adapted to higher altitudes independently, exhibiting convergent evolution. But in terms of physiological function they aren’t all created equal, at least in relation to the solutions which they’ve arrived at to make functioning at high altitudes bearable. In particular, it seems that the adaptations of the peoples of Tibet are superior to those of the peoples of the Andes. Superior in that the Andean solution is more brute force than the Tibetan one, producing greater side effects, such as lower birth weight in infants (and so higher mortality and lower fitness).

The Andean region today is dominated by indigenous people, and Spanish is not the lingua franca of the highlands as it is everywhere else in the former colonial domains of Spain in the New World. This is largely a function of biology; as in the lowlands of South America the Andean peoples were decimated by disease upon first contact (plague was spreading across the Inca Empire when Pizarro arrived with his soldiers). But unlike the lowland societies the Andeans had nature on their side: people of mixed or European ancestry are less well adapted to high altitudes and women without tolerance of the environment still have higher miscarriage rates.

So despite the suboptimal nature of the Andean adaptations vis-a-vis the Tibetan ones, they are certainly better than nothing, and in a relative sense have been very conducive to higher reproductive fitness. And yet why might the Andeans have kludgier adaptations than Tibetans? One variable to consider is time. The probability is that the New World was populated by humans only for the past ~10,000-15,000 years or so, with an outside chance of ~20,000 years (if you trust a particular interpretation of the genetic data, which you probably shouldn’t). By contrast, modern humans have had a presence in the center of Eurasia for ~30,000 years. Generally when populations are exposed to a new selective regime the initial adaptations are drastic and exhibit major functional downsides, but they’re much better than the status quo (remember, fitness is relative). Over time genetic modifications mask the deleterious byproducts of the genetic change which emerged initially to deal with the new environment. In other words, selection perfects design over time in a classic Fisherian sense as the genetic architecture converges upon the fitness optimum.*

Another parameter may be the variation available within the population, as the power of selection is proportional to the amount of genetic variation, all things equal. The peoples of the New World tend to be genetically somewhat homogeneous, probably due to the fact that they went through a bottleneck across Beringia, and that they’re already sampled from the terminus of the Old World. A physical anthropologist once told me that the tribes of the Amazon still resemble Siberians in their build. It may be that it takes a homogeneous population with little extant variation a long time indeed to shift trait value toward a local ecological optimum (tropical Amerindians are leaner and less stocky than closely related northern populations, just not particularly in relation to other tropical populations). In contrast, populations in the center of Eurasia have access to a great deal of genetic variation because they’re in proximity to many distinctive groups (the Uyghurs for example are a recent hybrid population with European, South Asian and East Asian ancestry).

So that’s the theoretical backdrop for the differences in adaptations. Shifting to how the adaptations play out concretely, some aspects of the physiology of Tibetan tolerance of high altitudes are mysterious, but one curious trait is that they actually have lower levels of hemoglobin than one would expect. Andean groups have elevated hemoglobin levels, which is the expected “brute force” response. Interestingly it seems that evolution given less time or stabilizing at a physiologically less optimal equilibrium is more comprehensible to humans! Nature is often more creative than us. In contrast the Tibetan adaptations are more subtle, though interestingly their elevated nitric oxide levels may facilitate better blood flow. Though the inheritance patterns of the trait had been observed, the genetic mechanism underpinning it had not been elucidated. Now a new paper in Science identifies some candidate genes for the various physiological quirks of Tibetans by comparing them with their neighbors, and looking at the phenotype in different genotypes within the Tibetan population. Genetic Evidence for High-Altitude Adaptation in Tibet:

Tibetans have lived at very high altitudes for thousands of years, and they have a distinctive suite of physiological traits that enable them to tolerate environmental hypoxia. These phenotypes are clearly the result of adaptation to this environment, but their genetic basis remains unknown. We report genome-wide scans that reveal positive selection in several regions that contain genes whose products are likely involved in high-altitude adaptation. Positively selected haplotypes of EGLN1 and PPARA were significantly associated with the decreased hemoglobin phenotype that is unique to this highland population. Identification of these genes provides support for previously hypothesized mechanisms of high-altitude adaptation and illuminates the complexity of hypoxia response pathways in humans.

Here’s what they did. First, Tibetans are adapted to higher altitudes; Chinese and Japanese are not. The three groups are relatively close genetically in terms of ancestry, so the key is to look for signatures of positive selection in regions of the genome which have been identified as candidates because of their functional relationship to pathways which may modulate the traits of interest. After finding regions of the genome possibly under selection in Tibetans but not the lowland groups, they fixed upon variants which are at moderate frequencies in Tibetans and noted how genotypes at those genes track variation in the trait.

This figure from the supplements shows how the populations are related genetically:


In a worldwide context the three groups are pretty close, but they also don’t overlap. The main issue I would have with this presentation is that the Chinese data are from the HapMap, and the samples are from Beijing. That gives them a northeast Chinese genetic skew (I know that people who live in Beijing may come from elsewhere, but recent work which examines Chinese phylogeography indicates that the Beijing sample is not geographically diversified), while ethnic Tibetans overlap a great deal with Han populations in the west of China proper. In other words, I wouldn’t be surprised if the separation between Han and Tibetan was far smaller if you took the Chinese samples from Sichuan or Gansu, where Han and Tibetans have lived near each other for thousands of years.

But these issues of phylogenetic difference apart, we know for a fact that lowland groups do not have the adaptations which are distinctive to the Tibetans. To look for genetic differences they focused on 247 loci, some from the HIF pathway, which is important for oxygen homeostasis, as well as genes from Gene Ontology categories which might be relevant to altitude adaptations. Table 1 has the breakdown by category.

Across these regions of the genome they performed two haplotype-based tests which detect natural selection, EHH and iHS. Both of these tests basically find regions of the genome which have reduced variation because of a selective sweep, whereby selection at a specific region of the genome has the effect of dragging along large neutral segments adjacent to the original copy of the favored variant. EHH is geared toward detection of sweeps which have nearly reached fixation; in other words, the derived variant has nearly replaced the ancestral after a bout of natural selection. iHS is better at picking up sweeps which have not resulted in the fixation of the derived variant. The paper A Map of Recent Positive Selection in the Human Genome outlines the differences between EHH and iHS in more detail.

They looked at the three populations and wanted to find regions of the genome where Tibetans, but not the other two groups, were subject to natural selection as defined by positive signatures with EHH and iHS. They scanned the genome in 200 kb windows, and found that 10 of their candidate genes were in regions where Tibetans came up positive for EHH or iHS, but the other groups did not. Since these tests do produce false positives, they ran the same procedure on random sets of 240 genes (7 of the candidate genes were in regions where Chinese and Japanese came up positive, so these were removed from the set of candidates), and came up with average EHH and iHS positive hits of ~2.7 and ~1.4 genes after one million resamplings (specifically, these are genes where Tibetans were positive, the other groups negative). Their candidate genes focused on altitude-related physiological pathways yielded 6 for EHH and 5 for iHS (one gene came up positive for both tests, so 10 total). This indicates to them that these are not simply false positives, something made more plausible by the fact that we know that Tibetans are biologically adapted to higher altitudes and we have an expectation that these genes are more likely than random expectation to have a relationship to altitude adaptations.
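The logic of that resampling check is easy to sketch. What follows is a toy version with invented gene labels and an invented rate of positive windows, not the authors' pipeline: draw many random gene sets of the same size as the candidate set, count how many genes in each fall in "positive" regions, and ask how often chance alone matches the observed count. The function name and its parameters are hypothetical.

```python
import random


def resample_hits(background, flagged, set_size, observed_hits,
                  n_resamples=10_000, seed=0):
    """Compare an observed hit count against randomly drawn gene sets.

    background:    list of all gene identifiers eligible for sampling
    flagged:       set of genes lying in windows positive for the test
    set_size:      number of genes in each random set
    observed_hits: hits seen in the real candidate set
    Returns (mean hits per random set, fraction of random sets with
    at least observed_hits hits).
    """
    rng = random.Random(seed)
    total = 0
    at_least = 0
    for _ in range(n_resamples):
        hits = sum(1 for gene in rng.sample(background, set_size) if gene in flagged)
        total += hits
        if hits >= observed_hits:
            at_least += 1
    return total / n_resamples, at_least / n_resamples


if __name__ == "__main__":
    # Invented numbers: 20,000 genes, 1% of them in "positive" windows.
    genes = list(range(20_000))
    positive = set(range(200))
    mean_hits, p_value = resample_hits(genes, positive, set_size=240, observed_hits=6)
    print(f"mean hits per random set: {mean_hits:.2f}, empirical p: {p_value:.4f}")
```

The mean plays the role of the ~2.7 and ~1.4 figures above, and the second number is an empirical estimate of how often a random set does as well as the candidates. The default here uses far fewer draws than the one million in the paper, just to keep the toy fast.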

Finally, they decided to look at two genes with allelic variants which exist at moderate frequencies in Tibetans, EGLN1 and PPARA. The procedure is simple: you have three genotypes, and you see if there are differences across the 31 individuals by genotype in terms of phenotype. In this case you want to look at hemoglobin concentration, where those who are well adapted have lower concentrations. Figure 3 is rather striking:


Even with the small sample sizes the genotypic effect jumps out at you. This isn’t too surprising, since previous work has shown that these traits are highly heritable, and that they vary within the Tibetan population. There’s apparently a sex difference in terms of hemoglobin levels, so they did a regression analysis, and it illustrates how strong the genetic effect from these alleles is:
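For readers who want to see what "adjusting for sex" amounts to, here is a minimal sketch with made-up data. It assumes genotype is coded as the number of copies of the selected haplotype (0, 1 or 2) and that the model is a plain linear regression with sex as a covariate; the paper's exact model is not spelled out here, so treat this as an illustration only. Centering genotype and phenotype within each sex and then fitting a simple slope gives the same genotype coefficient as the full regression.

```python
from collections import defaultdict
from statistics import mean


def sex_adjusted_slope(genotypes, sexes, phenotypes):
    """Per-allele effect on phenotype, adjusted for sex.

    Equivalent to the genotype coefficient in an ordinary least squares fit of
    phenotype ~ intercept + sex + genotype, computed by centering genotype and
    phenotype within each sex before fitting a single slope.
    """
    by_sex = defaultdict(list)
    for g, sx, y in zip(genotypes, sexes, phenotypes):
        by_sex[sx].append((g, y))

    num = den = 0.0
    for rows in by_sex.values():
        g_bar = mean(g for g, _ in rows)
        y_bar = mean(y for _, y in rows)
        for g, y in rows:
            num += (g - g_bar) * (y - y_bar)
            den += (g - g_bar) ** 2
    if den == 0:
        raise ValueError("genotype does not vary within any sex group")
    return num / den


if __name__ == "__main__":
    # Invented values: hemoglobin in g/dL, males shifted up by 2 units.
    genotypes = [0, 1, 2, 0, 1, 2]
    sexes = ["F", "F", "F", "M", "M", "M"]
    hemoglobin = [16.0, 14.5, 13.0, 18.0, 16.5, 15.0]
    print(sex_adjusted_slope(genotypes, sexes, hemoglobin))
```

A negative slope means each additional copy of the haplotype goes with lower hemoglobin once the male/female offset is removed.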


My main question: why do Tibetans still have variation on these genes after all this time? Shouldn’t they be well adapted to high altitudes by now? A prosaic answer may be that the Tibetans have mixed with other populations recently, and so have added heterozygosity through admixture. But there are several loci here which are fixed in Tibetans, and not the HapMap Chinese and Japanese. For admixture to be a good explanation one presumes that the groups with which the Tibetans mixed would have been fixed for those genes as well, but not the ones at moderate frequencies. This may be true, but it seems more likely that admixture alone cannot explain this pattern. As the Andean example suggests, adaptation to high altitudes is not easy or simple. Until better options arrive on the scene, kludges will suffice. It may be that the Tibetans are still going through the sieve of selection, and will continue to do so for the near future. Or, there may be balancing dynamics on the genes which exhibit heterozygosity, so that fixation is prevented.

No matter what the truth turns out to be, this is surely just the beginning. A deeper investigation of the genetic architecture of Andeans and Ethiopians, both of which have their own independent adaptations, will no doubt tell us more. Finally, I wonder if these high altitude adaptations have fitness costs which we’re not cognizant of, but which Tibetans living in India may have some sense of.

Citation: Tatum S. Simonson, Yingzhong Yang, Chad D. Huff, Haixia Yun, Ga Qin, David J. Witherspoon, Zhenzhong Bai, Felipe R. Lorenzo, Jinchuan Xing, Lynn B. Jorde, Josef T. Prchal, & RiLi Ge (2010). Genetic Evidence for High-Altitude Adaptation in Tibet. Science. DOI: 10.1126/science.1189406

* Additionally, it may be that archaic hominin groups were resident in the Himalaya for nearly one million years. Neandertal admixture evidence in Eurasians should change our priors when evaluating the possibility of adaptive introgression of locally beneficial alleles.

Image Credit: Wikimedia Commons
