Razib Khan One-stop-shopping for all of my content

October 21, 2012

Buddy can you spare a selective sweep

The Pith: Natural selection comes in different flavors in its genetic constituents. Some of those constituents are more elusive than others. That makes “reading the label” a non-trivial activity.

As you may know, when you look at patterns of variation in the genome of a given organism you can make various inferences from the nature of these patterns. But the power of those inferences is conditional on the details of the real demographic and evolutionary histories, as well as the assumptions made about the models which one is testing. When delving into the domain of population genomics some of the concepts and models may seem abstruse, but the reality is that such details are the stuff of which evolution is built. A new paper in PLoS Genetics may seem excessively esoteric and theoretical, but it speaks to very important processes which shape the evolutionary trajectory of a given population. The paper is titled Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. Here’s the author summary:

Considerable effort has been devoted to detecting genes that are under natural selection, and hundreds of such genes have been identified in previous studies. Here, we present a method for extending ...

January 22, 2012

How the Amhara breathe differently

I have blogged about the genetics of altitude adaptation before. There seem to be three populations in the world which have been subject to very strong natural selection, resulting in physiological differences, in response to the human tendency toward hypoxia. Two of them are relatively well known, the Tibetans and the indigenous people of the Andes. But the highlanders of Ethiopia have been less well studied, nor have they received as much attention. But the capital of Ethiopia, Addis Ababa, is nearly 8,000 feet above sea level!

Another interesting aspect to this phenomenon is that it looks like the three populations respond to adaptive pressures differently. Their physiological response varies. And the more recent work in genomics implies that though there are similarities between the Asian and American populations, there are also differences. This illustrates the evolutionary principle of convergence, where different populations approach the same phenotypic optimum, though by somewhat different means. To my knowledge there has not been as much investigation of the African example. Until now. A new provisional paper in Genome Biology is out, Genetic adaptation to high altitude in the Ethiopian highlands:

We highlight several candidate genes for involvement in high-altitude adaptation in Ethiopia, including CBARA1, VAV3, ARNT2 and THRB. Although most of these genes have not been identified in previous studies of high-altitude Tibetan or Andean population samples, two of these genes (THRB and ARNT2) play a role in the HIF-1 pathway, a pathway implicated in previous work reported in Tibetan and Andean studies. These combined results suggest that adaptation to high altitude arose independently due to convergent evolution in high-altitude Amhara populations in Ethiopia.

The main shortcoming of this paper for me is that it does not highlight the evolutionary history of this adaptation. In the paper the authors compared the Amhara (a highland population) to nearby lowland populations, but did not explore the nature of the population structure and how it might have influenced the arc of adaptation. Are these very ancient adaptations? Or new ones? It seems that hominins have been resident in Ethiopia for millions of years. If this is so presumably there have been adaptations to higher elevations from time immemorial. But what if these adaptations are new?

More pointedly the Ethiopians can be modeled as a compound of an Arabian population with an indigenous East African one. If this is a genuine recent admixture event, then one might be able to ascertain via haplotype structure whether the adaptive variants derive from ancient African genetic variation, or whether they’re novel mutations. It seems that this paper is a good first step, but there’s a lot more to see here….

Citation: Genome Biology, doi:10.1186/gb-2012-13-1-r1

Image credit: Wikipedia

June 21, 2011

We stand on the shoulders of cultural giants

In reading The cultural niche: Why social learning is essential for human adaptation in PNAS I couldn’t help but think back to a conversation I had with a few old friends in Evanston in 2003. They were graduate students in mathematics at Northwestern, and at one point one of them expressed some serious frustration at the fact that so many of the science and business students in his introductory calculus courses simply wanted to “learn” a disparate set of techniques, rather than understand calculus. The reality of course is that the vast majority of people who ever encounter calculus aim to learn it for reasons of utility, not so that they can grok the fundamental theorem of calculus. With the proliferation of tools such as Mathematica and powerful portable calculators fewer and fewer people are getting their hands dirty with calculus in an analytic sense, and more often see it as simply a “requirement” which they have to pass.

Calculus, and mathematics generally, is a clean and crisp human invention. In the late 17th century Isaac Newton and Gottfried Leibniz originated calculus as we understand it. Later thinkers extended their work. But for the vast ...

June 20, 2011

Convergent evolution happens!

In the image to the left you see three human males. You can generate three pairings of these individuals. When comparing these pairs which would you presume are more closely related than the other pairs? Now let me give you some more information. The rightmost image is of the president of Tanzania. The middle image is of the president of Taiwan (Republic of China). And finally, the leftmost image is of the prime minister of Papua New Guinea. With this information you should now know with certainty that the prime minister of Papua New Guinea and the president of Taiwan are much more closely related than either is to the president of Tanzania. But some of you may not have guessed that initially. Why? I suspect that physical inspection may have misled you. One of the most salient visible human characteristics is the complexion of our largest organ, the skin. Its prominence naturally leads many to mistakenly infer relationships where they do not exist.

This was certainly an issue when European explorers encountered the peoples of Melanesia. An older ...

April 24, 2011

The evolutionary effect of the sky gods

Last week I reviewed ideas about the effect of “exogenous shocks” to an ecosystem of creatures, and how it might reshape their evolutionary trajectory. These sorts of issues are well known in their generality. They have implications from the broadest macroscale systematics to microevolutionary process. The shocks point to changes over time which have a general effect, but what about exogenous parameters which shift spatially and regularly? I’m talking latitudes here. The further you get from the equator the more the climate varies over the seasons, the lower the mean temperature, and the less the aggregate radiation the biosphere catches. Allen’s rule and Bergmann’s rule are two observational trends which biologists have long observed in relation to many organisms. The equatorial variants are slimmer in their physique, while the polar ones are stockier. Additionally, there tends to be an increase in mean mass as one moves away from the equator.

But these rules are just general observations. What process underlies these observations? The likely culprit would be natural selection of course. But the specific manner in which this process shakes out, on both the organismic and genetic level, still needs to be elucidated ...

February 23, 2011

Sweeping through a fly’s genome

Credit: Karl Magnacca

The Pith: In this post I review some findings of patterns of natural selection within the Drosophila fruit fly genome. I relate them to very similar findings, though in the opposite direction, in human genomics. Different forms of natural selection and their impact on the structure of the genome are also spotlighted in the course of the review. In particular, I explore how specific methods to detect adaptation on the genomic level may be biased by assumptions of classical evolutionary genetic models. Finally, I try and place these details in the broader framework of how best to understand evolutionary process in the “big picture.”

A few days ago I titled a post “The evolution of man is no cartoon”. The reason I titled it such is that as the methods become more refined and our data sets more robust it seems that previously held models of how humans evolved, and evolution’s impact on our genomes, are being refined. Evolutionary genetics at its most elegantly spare can be reduced down to several general parameters. Drift, selection, migration, etc. Exogenous phenomena such as the flux in census size, or ...

December 7, 2010

One diabetes gene to explain it all?

President William Howard Taft

It is the best of times, it is the worst of times. On the one hand the medical consequences of human genomics have been underwhelming. This is important because this is the ultimate reason that much of the basic research is funded. And yet we’ve learned so much. The genetic architecture of skin color has been elucidated, and we’ve seen a clarification of patterns of natural selection in the human genome. The finding last spring of Neandertal admixture in modern human populations is perhaps the most awesome pure science finding of late, coming close to resolving a decades old debate in anthropology. This doesn’t cure cancer, but it does connect the dots about the human past, and that’s not trivial. We are a species haunted by our memories, so we might as well get them right!

But all hope is not lost. Research continues. And one area which general surveys of genomic variation have usually shown to be a target of natural selection, and which also has clear and immediate biomedical relevance, is that of metabolism. How we eat, and how we process and integrate the food we eat, is of obvious fitness relevance in the evolutionary and medical senses. It turns out that there is even variation in our saliva which is probably due to natural selection. The combination of diversity in human cuisine and susceptibility to the diseases of modern life indicates possibilities as to the relationship between past selection pressures and contemporary patterns of genetic variation. Of course one has to tread softly in this area: there are the inevitable confounds of environment, as well as the unfortunate probability of any given locus being of small effect size in its influence on any given trait.

A new paper in Genome Research reports a SNP which seems to have been subject to natural selection in Eurasians within the last 10,000 years. This variant is located within an exon on a gene, GIP, which produces peptides critical in the regulation of various metabolic pathways, in particular insulin response. A possible biomedical relevance to risk susceptibility is then explored subsequent to the evolutionary genomic preliminaries. Adaptive selection of an incretin gene in Eurasian populations:

Diversities in human physiology have been partially shaped by adaptation to natural environments and changing cultures. Recent genomic analyses have revealed single nucleotide polymorphisms (SNPs) that are associated with adaptations in immune responses, obvious changes in human body forms, or adaptations to extreme climates in select human populations. Here, we report that the human GIP locus was differentially selected among human populations based on the analysis of a nonsynonymous SNP (rs2291725). Comparative and functional analyses showed that the human GIP gene encodes a cryptic glucose-dependent insulinotropic polypeptide (GIP) isoform (GIP55S or GIP55G) that encompasses the SNP and is resistant to serum degradation relative to the known mature GIP peptide. Importantly, we found that GIP55G, which is encoded by the derived allele, exhibits a higher bioactivity compared with GIP55S, which is derived from the ancestral allele. Haplotype structure analysis suggests that the derived allele at rs2291725 arose to dominance in East Asians ∼8100 yr ago due to positive selection. The combined results suggested that rs2291725 represents a functional mutation and may contribute to the population genetics observation. Given that GIP signaling plays a critical role in homeostasis regulation at both the enteroinsular and enteroadipocyte axes, our study highlights the importance of understanding adaptations in energy-balance regulation in the face of the emerging diabetes and obesity epidemics.

This is a paper with several moving parts.

-There is genomics (the broad sweep of the genome)

-Genetics (a focus on a few genes and their consequences)


-And some allusion to epidemiology, as befits a paper which comes out of a medical department

The first observation is that rs2291725 differs a great deal across populations. As I said, it’s a SNP on an exon in GIP. Not only that, it’s nonsynonymous, which means that it’s in a position to change the structure and therefore function of the biochemical which the sequence is ultimately coding for. The T allele is the ancestral variant, while the C allele is the derived one. That means that C arose as a mutation against the background of T. There is a figure which shows the geographical distribution of the variance on this SNP from the HGDP data set in the paper, but I think the HGDP browser produces a crisper display, so here it is:


As you can see the ancestral allele is dominant in Africa. In several populations it is fixed. In contrast among non-African populations there’s quite a bit of variation. In East Asia the derived variant is at a high frequency, though not fixed. In West Eurasia and North Africa the two variants are at rough balance, more or less. Finally, in the New World the derived variant is found in appreciable proportions, but the ancestral variant of the SNP is found at much higher proportions than in other non-African populations. Seeing as how Amerindians derive from a branch of East Eurasians, common descent from an ancestor with the derived allele can not explain the frequency discrepancy. Interestingly the HGDP Melanesians have amongst the highest frequencies of the derived allele in the data set.

In any case, most of the analysis was not done with the HGDP sample, but with the first two phases of the HapMap. The marker density is richer in this sample, and obviously it is easier to compare a few populations than dozens. So the primary populations of comparison in this study were the Chinese + Japanese (ASN), Utah Whites (CEU), and Yoruba from Nigeria (YRI). It was immediately noticeable when doing pairwise comparisons between two populations in the HapMap data set that the SNP of interest in GIP was exceptional in its between-population difference when set against other nonsynonymous SNPs. The chart below shows the SNP in red, with the full distribution curve of Fst (proportion of between-population difference) illustrated by the bars in blue. rs2291725 is in the top 0.5% of Fst difference between ASN and YRI.


The expected Fst between continental races is on the order of ~0.15. The ASN vs. YRI difference is far greater than that, and even more exceptional when you note the skew of the distribution. As it happens there’s HapMap3 data on this SNP as well. It doesn’t add much value to the HGDP, but does confirm the general findings:
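To make the Fst statistic a bit more concrete, here is a minimal sketch of one simple estimator for a single diallelic SNP in two equally weighted populations, Fst = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S is the mean within-population heterozygosity. The function names and the allele frequencies are hypothetical and purely illustrative; they are not taken from the paper, and the authors may well have used a different estimator.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2pq at a diallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Simple Fst = (H_T - H_S) / H_T for two equally weighted populations.

    p1, p2: frequency of the same allele in each population (assumed inputs).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = expected_heterozygosity(p_bar)
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # the locus is monomorphic in both populations
    return (h_total - h_within) / h_total


# Hypothetical frequencies of a derived allele in two populations.
print(round(fst_two_populations(0.50, 0.50), 3))  # identical frequencies
print(round(fst_two_populations(0.05, 0.80), 3))  # strongly differentiated
```

Identical frequencies give an Fst of zero, and the value climbs toward 1 as the two populations approach fixation for different alleles.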


Population descriptors:
ASW (A): African ancestry in Southwest USA
CEU (C): Utah residents with Northern and Western European ancestry from the CEPH collection
CHB (H): Han Chinese in Beijing, China
CHD (D): Chinese in Metropolitan Denver, Colorado
GIH (G): Gujarati Indians in Houston, Texas
JPT (J): Japanese in Tokyo, Japan
LWK (L): Luhya in Webuye, Kenya
MEX (M): Mexican ancestry in Los Angeles, California
MKK (K): Maasai in Kinyawa, Kenya
TSI (T): Tuscan in Italy
YRI (Y): Yoruban in Ibadan, Nigeria

Now that they’ve established between population variation at the SNP, what about the structure around the SNP? Remember, the SNP is one base pair. T in the ancestral state, C in the derived. The patterns of variation flanking the SNP in GIP can tell us a lot. What they found was this:

- Africans have several different haplotypes around the T allele. A haplotype is just a set of correlated markers

- The C allele in East Asians seems to be embedded within one haplotype, or set of markers

- There was a lot of linkage disequilibrium around the C allele in East Asians

In East Asians both EHH and iHS were consistent with, if not necessarily suggestive of, selection. A plausible scenario is that the C allele was subject to a powerful bout of natural selection recently, and the allele rose so rapidly in frequency that a selective sweep dragged along the flanking regions of the genome. This would homogenize the variance in that genic region within the population in question (East Asians), as the numerous other haplotypes would decline in proportion. To show the relationships of the various haplotypes within the three HapMap populations being analyzed here they produced an unrooted tree. Observe that the haplotype in which the derived variant is embedded has only Asians and Europeans, and is on a separate branch by itself:


I noted above that just because there is a lot of linkage disequilibrium and haplotype block structure in this region of the genome, it doesn’t necessarily mean that it was a target of natural selection. There may have been stochastic phenomena which produced these results, and so our inference would be a false positive. To check for this they ran several models and simulations which varied demographic parameters under neutral (non-selective) conditions, and for the Asian sample the iHS scores were generally not as low as those for the SNP of interest. This does not “prove” that demography cannot explain these results, but it does shift the probability more toward natural selection than before.

The circumstantial evidence presented above is that the derived allele rose to frequency relatively recently (in general LD decays rapidly over time, so these tests detect more recent selective or demographic events). They ran a simulation under neutral parameters, and for the frequency of the derived haplotype it would take 100,000-500,000 years for the various populations to reach the values which we see (starting from the initial mutant gene copy). The latter figure is outside the bounds of modern humanity, while the former probably pre-dates the “Out of Africa” event. It is implausible that so much haplotype structure could be preserved over that much time, because recombination over the generations breaks apart associations between markers. Using the recombination rates, which would slowly degrade long haplotypes in the genome, the authors inferred that the C allele and its haplotype began to rise in frequency on the order of 12,000-2,000 years before the present.
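The logic that recombination erodes haplotype structure can be sketched numerically. Under the textbook simplification of a constant recombination fraction r per generation between two markers, the association between them decays as (1 - r)^t after t generations. The recombination fraction (0.001) and the generation time (25 years) below are assumptions chosen for illustration, not values from the paper, and the function names are hypothetical.

```python
import math


def fraction_unrecombined(r, generations):
    """Fraction of the original marker association left after some generations,
    assuming a constant recombination fraction r per generation."""
    return (1.0 - r) ** generations


def generations_to_decay(r, remaining_fraction):
    """Generations until only remaining_fraction of the association is left."""
    return math.log(remaining_fraction) / math.log(1.0 - r)


YEARS_PER_GENERATION = 25  # assumed generation time
R = 0.001                  # assumed recombination fraction between two markers

for years in (8_100, 100_000, 500_000):
    gens = years / YEARS_PER_GENERATION
    print(f"{years:>7} years: {fraction_unrecombined(R, gens):.3g} of the association remains")
```

With these assumed numbers most of the association survives a few hundred generations but essentially none survives several thousand, which is why long intact haplotypes point to a recent rise in frequency.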

Why would an allele rise to frequency within the past 10,000 years? The authors gave the game away in the abstract: humans shifted to different modes of primary production after the rise of agriculture. This is where the role of GIP in producing peptides which have a role in regulating our biochemistry is relevant. GIP is of a class of hormones found in the intestine called incretins:

Incretins are a group of gastrointestinal hormones that cause an increase in the amount of insulin released from the beta cells of the islets of Langerhans after eating, even before blood glucose levels become elevated. They also slow the rate of absorption of nutrients into the blood stream by reducing gastric emptying and may directly reduce food intake. As expected, they also inhibit glucagon release from the alpha cells of the Islets of Langerhans….

Increased insulin reduces blood sugar. Diabetes is a malfunction of the insulin release mechanism, and so blood sugar begins to rise as individuals don’t uptake their glucose. Glucagon has the opposite effect, increasing blood sugar. But just because there is a change in a nonsynonymous position in an exonic region of a gene of relevance to the pathway, it doesn’t mean that that necessarily impacts the pathway which is illustrated to the left. And for natural selection to have any traction it needs to have an impact on some sort of concrete biological process (unless we’re talking intra-genomic competition of some sort).

It turns out that rs2291725 is actually just outside the primary coding region for the GIP peptide. For it to be a functional variant there needs to be more to the story. As it turns out, there are other less common variants which are modified by changes at this SNP, GIP55S and GIP55G. The first is produced by the ancestral T allele, and the second by the derived C allele. GIP55S and GIP55G are also found in the intestine, though they only constitute a few percent of the total GIP.

But here’s where it gets really interesting: GIP55G exhibits more bioactivity over the long term. In other words it seems to be more potent than the generic GIP or GIP55S, the ancestral variant. They’ve gone from supposition based on the functional significance of the broader gene, to a connection between the T→C transition over the last 10,000 years and a concrete functional difference. As it turns out it may be that those with GIP55G would have a stronger insulin response, and so reduce blood sugar faster, than those without.

It doesn’t take a genius to figure out where they’re going with this. The relationship between insulin response and carbohydrates in our day and age is fraught. But we already suspect that carbs have reshaped the human genome through copy number variation in the amylase gene. It is interesting though that the derived variant has not fixed. That is, it hasn’t replaced the ancestral variant. This may be due to dominance, so that one copy is almost as efficacious as two, or it may be due to balancing selection of some sort, which the authors suggest in the text. At this point it’s time to jump to the discussion and let the authors speak for themselves. They start out well:

Based on the gene age estimation and biochemical analyses, our study revealed a functional mutation that is associated with the selection of the GIP locus in East Asian populations ~8100 yr ago and the presence of a cryptic GIP isoform. Specifically, we showed that the inventory of human GIP peptides has recently diverged and that individuals could express three different combinations of GIP isoforms (GIP, GIP55S, and GIP55G) with distinct bioactivity profiles. Future study of how this phenotypic variation affects glucose and lipid homeostasis in response to different diets and of which physiological variations in humans can be attributed to prior gene–environmental interactions at the GIP locus is crucial to a better understanding of human adaptations in energy-balance regulation.

As I observed above many of the researchers have a biomedical background, and the NIH is funding this. The evolutionary anthropological findings, cautious as they are, are fascinating and of deep interest. But I don’t think this is going to go anywhere:

It was hypothesized by Neel almost 50 yr ago that mismatches between prior physiological adaptations and contemporary environments can lead to health risks because the ancestral variants that have been selected for the organism’s fitness or reproductive success may not be optimal for the individual’s health in the new environment…In support of this thrifty genotype hypothesis, a number of genes in humans and house mice have been implied to have coevolved with the emergence of agricultural societies…and a rapid shift in diets is associated with the detrimental effects on human survival in a number of human populations…Conceptually, the serum-resistant GIP55G carried by the GIP103C haplotype may have been beneficial for individuals who have unconstrained access to the food supply in many agricultural societies by preventing severe hyperglycemia. As selection pressure changed in these societies, the ancient GIP103T haplotype could have become a liability and conferred a loss of fitness in the new environment. In addition, we speculate that the selection of GIP in East Asians may contribute to the heterogeneity in the risk of diabetes among major ethnic groups at the present time….

Do you believe that the Han Chinese have had a surfeit of food compared to Africans over the past 10,000 years? Or compared to Europeans? Indians have had more food than Africans? The populations of the New World are in a food-poor environment? This doesn’t make any sense as an evolutionary explanation because the stable state for most of human history has been one of Malthusianism. A few people had a lot of food, ergo, the association of wealth with corpulence. Additionally, one can imagine that societies transitioning between modes of production would have a period when land would be in surplus and there was a lot of food. But for most of history life was grinding. This is simply an unbelievable story. Additionally, this SNP can’t explain most of the variation in diabetes. South Asians have the highest rates in the world, but they have appreciable proportions of the derived variant. I am of the CC (derived-derived) genotype myself (I just checked on 23andMe), and I have a family risk of diabetes, so I know to ignore the relevance of these findings for myself when it comes to personal risk assessment.

There is probably not going to be one gene that explains diabetes, or obesity, etc. We already knew that, but there is a strange kabuki theater which goes on whereby research groups pretend as to the high significance of one locus, because how is it going to look to a granting agency that you’re out to explain ~1% of the variance in a trait for trivial predictive value? And yet usually they’re honest enough in the discussions to suggest that one finding needs to be integrated into a broader picture…as in the hundreds of other genes of interest!?!?!

This paper is fascinating as a work of human evolutionary history. They don’t have a good story, but they have results which need to be integrated into the bigger framework. But the paper is also a story of the culture of science today, driven by biomedical relevances which are often simply phantoms.

Citation: Chang CL, Cai JJ, Lo C, Amigo J, Park JI, & Hsu SY (2010). Adaptive selection of an incretin gene in Eurasian populations. Genome research PMID: 20978139

October 25, 2010

Body odor, Asians, and earwax

When I was in college I would sometimes have late night conversations with the guys in my dorm, and the discussion would random-walk in very strange directions. During one of these quasi-salons a friend whose parents were from Korea expressed some surprise and disgust at the idea of wet earwax. It turns out he had not been aware of the fact that the majority of the people in the world have wet, sticky, earwax. I’d stumbled onto that datum in the course of my reading, and had to explain to most of the discussants that East Asians generally have dry earwax, while convincing my Korean American friend that wet earwax was not something that was totally abnormal. Earwax isn’t something we explore in polite conversation, so it makes sense that most people would be ignorant of the fact that there was inter-population variation on this phenotype.

But it doesn’t end there. Over the past five years the genetics of earwax has come back into the spotlight, because of its variation and what it can tell us about the history and evolution of humans since the Out of Africa event. Not only that, it seems the variation in earwax has some other phenotypic correlates. The SNPs in and around ABCC11 are a set where East Asians in particular show signs of being different from other world populations. The variants which are nearly fixed in East Asia around this locus are nearly disjoint in frequency with those in Africa. Here are the frequencies of the alleles of rs17822931 on ABCC11 from ALFRED:

The expression of the dry earwax phenotype is contingent on an AA genotype; it has recessive expression. So in a population where the allele frequency of A is ~0.50, the dry earwax phenotype would have a ~0.25 frequency. In a population where the A allele has a ~0.20 frequency, the dry earwax phenotype would be at ~0.04 frequency. Among people of European descent the dry earwax phenotype is present at proportions of less than ~5%. Because of recessive expression a larger minority of Japanese and Chinese should manifest wet earwax, though interestingly the ALFRED database indicates that Koreans are fixed for the A allele. In Africa conversely the G allele seems to be fixed.
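The arithmetic here is just the Hardy-Weinberg expectation for a recessive trait: if the A allele has frequency p in a randomly mating population, the AA genotype (and so the dry earwax phenotype) is expected at frequency p². A minimal sketch, with hypothetical function names and the two allele frequencies used in the paragraph above:

```python
def genotype_frequencies(p_a):
    """Hardy-Weinberg genotype frequencies (AA, GA, GG) for A allele frequency p_a,
    assuming random mating."""
    p_g = 1.0 - p_a
    return p_a ** 2, 2.0 * p_a * p_g, p_g ** 2


def dry_earwax_frequency(p_a):
    """Expected frequency of the recessive (AA) dry earwax phenotype."""
    return genotype_frequencies(p_a)[0]


for p in (0.50, 0.20):
    print(f"A allele at {p:.2f} -> dry earwax at {dry_earwax_frequency(p):.2f}")
```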

So the question is: why? A new paper in Molecular Biology and Evolution argues that the allele frequency differences are a function of positive directional selection since humans left Africa ~100,000 years ago. The impact of natural selection on an ABCC11 SNP determining earwax type:

A nonsynonymous single nucleotide polymorphism (SNP), rs17822931-G/A (538G>A; Gly180Arg), in the ABCC11 gene determines human earwax type (i.e., wet or dry) and is one of most differentiated nonsynonymous SNPs between East Asian and African populations. A recent genome-wide scan for positive selection revealed that a genomic region spanning ABCC11, LONP2, and SIAH1 genes has been subjected to a selective sweep in East Asians. Considering the potential functional significance as well as the population differentiation of SNPs located in that region, rs17822931 is the most plausible candidate polymorphism to have undergone geographically restricted positive selection. In this study, we estimated the selection intensity or selection coefficient of rs17822931-A in East Asians by analyzing two microsatellite loci flanking rs17822931 in the African (HapMap-YRI) and East Asian (HapMap-JPT and HapMap-CHB) populations. Assuming a recessive selection model, a coalescent-based simulation approach suggested that the selection coefficient of rs17822931-A had been approximately 0.01 in the East Asian population, and a simulation experiment using a pseudo-sampling variable revealed that the mutation of rs17822931-A occurred 2006 generations (95% credible interval, 1023 to 3901 generations) ago. In addition, we show that absolute latitude is significantly associated with the allele frequency of rs17822931-A in Asian, Native American, and European populations, implying that the selective advantage of rs17822931-A is related to an adaptation to a cold climate. Our results provide a striking example of how local adaptation has played a significant role in the diversification of human traits.

The region around ABCC11 has come under scrutiny with the emergence of tests of natural selection predicated on inspecting patterns of linkage disequilibrium (LD). LD basically measures how far the association of genetic variants within the genome is shifted away from expectation. A selective sweep tends to generate a lot of LD around the target of natural selection because as the allele in question rises in frequency its neighbors also hitchhike along. The hitchhiking process means that within a population you may see regions of the genome which exhibit long sequences of correlated single-nucleotide polymorphisms (SNPs), haplotypes. An initial selective event will presumably generate a very long homogenized block, which over time will break apart through recombination and mutation, as variation is injected back into the genome. The extent and decay of LD then can help us gauge the time and strength of selection events.
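For readers who want the "shifted away from expectation" idea in concrete terms, here is a minimal sketch of the standard two-locus measure D and its normalized absolute value |D'|. D is the observed frequency of a haplotype minus the frequency expected if the two alleles were independent, and |D'| rescales D by the largest magnitude it could take given the allele frequencies. The function name and the input frequencies are hypothetical, for illustration only.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|) for two diallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime_abs = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime_abs


# Hypothetical cases: independent alleles versus alleles always found together.
print(linkage_disequilibrium(0.25, 0.5, 0.5))  # linkage equilibrium
print(linkage_disequilibrium(0.50, 0.5, 0.5))  # complete association
```

A |D'| near 0 corresponds to linkage equilibrium, while a value near 1 means at least one of the four possible haplotypes is essentially absent, as expected shortly after a sweep.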

But LD can emerge via other processes besides natural selection. Imagine for example that a population of Africans and Europeans mix in a given generation. Europeans and Africans have different genetic makeups, on average, so the initial generations will have more LD than expectation because recombination will only slowly break apart the physical connection between genomic regions from European and African ancestors. The decay of LD then can give one a sense of the time since admixture as well as selection. Not only that, stochastic demographic events and processes are also important and may drive the emergence of LD. Consider a bottleneck where the frequency of a particular haplotype is driven up by random genetic drift alone. The details of these alternative scenarios are explored in the 2009 paper The role of geography in human adaptation.
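The "decay of LD as a clock" idea rests on a textbook result: with recombination fraction r between two loci and random mating, D shrinks by a factor of (1 - r) each generation. The sketch below only illustrates that relationship; the function names and the example recombination fraction are assumptions of mine, not values from the 2009 paper.

```python
import math

def ld_after(d0, r, generations):
    """LD remaining after some generations of random mating: D_t = D_0 * (1 - r)^t."""
    return d0 * (1 - r) ** generations

def ld_half_life(r):
    """Generations needed for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1 - r)

# Hypothetical pair of loci with a 1% recombination fraction.
print(ld_after(0.2, 0.01, 100))
print(ld_half_life(0.01))
```

Tightly linked loci (small r) hold on to their LD for a long time, which is why the length of an associated block carries information about how long ago it was created.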

All this is preamble to the fact that there’s a lot of LD around ABCC11. Here’s a visualization from the HapMap populations:


From left to right you have Chinese & Japanese, Utah whites, and the Yoruba from Nigeria. An absolute value of D’ ~0 means that there’s linkage equilibrium, the default or null state where there are no atypical excessive correlations of alleles across the genome. The axes here are pairwise combinations of SNPs around ABCC11, with a focus around rs17822931, a nonsynonymous SNP which seems to be the likely functional source of the variance in earwax and other phenotypes. In terms of LD rank order the results are not surprising; across the genome East Asians tend to exhibit more LD than Europeans, and Europeans exhibit more LD than the Yoruba. Part of this is probably a function of population history: a serial bottleneck Out of Africa model would posit that drift and other stochastic forces would have a stronger impact on the genomes of East Asians than Europeans. But this seems like it can’t be the whole picture here; note the variance in allele frequency in the New World as well as in Oceania. Some of the Amerindian populations seem to have a higher frequency of the ancestral G allele on rs17822931. The figure above is easier to understand; the Y-axis is showing you the extent of heterozygosity at a given location. GA is heterozygous, GG is homozygous. Africans again tend to exhibit more heterozygosity than non-Africans, but note the sharply diminished heterozygosity for the East Asian sample around rs17822931 in ABCC11. Remember that heterozygosity tends not to go above 0.50 in a random mating population in a diallelic model (though in selective breeding it may go above 0.50 for F1 generations).
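The 0.50 ceiling follows directly from Hardy-Weinberg proportions: with two alleles at frequencies p and 1 - p, the expected fraction of heterozygotes under random mating is 2p(1 - p). A minimal sketch (the function name is mine):

```python
def expected_heterozygosity(p):
    """Expected heterozygote fraction at a two-allele locus under random mating."""
    return 2 * p * (1 - p)

# Scan allele frequencies from 0 to 1; the maximum sits at p = 0.5.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=expected_heterozygosity)
print(best_p, expected_heterozygosity(best_p))
```

An F1 cross between two lines fixed for different alleles is the exception mentioned above: every offspring is GA, so heterozygosity is 1.0 for that one generation, because mating was not random.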

The major findings of this paper beyond what was known before seem to be a) an explicit model of how East Asians could have arrived at a high frequency of the AA genotype at rs17822931, and, b) the correlation between climate and the frequency of A. I’ll get to the second point in a bit, but what about the first? Using the nature of variation in two microsatellites flanking the SNP of interest in East Asians, and assuming a recessive selection model, the authors posit that the A allele began to rise in frequency ~50,000 years ago, and, that the selection coefficient was ~1% per generation. This is a significant value for the selection parameter, and the timing is possible in light of the separation of non-Africans into a western and eastern group around that period.

But honestly I’m pretty skeptical of this. The confidence intervals don’t inspire confidence, and from what little I know selection for recessive traits should exhibit less linkage disequilibrium. At low frequencies there is very little effect of natural selection on the allele because it is mostly “masked” in heterozygotes, and therefore there will be a long period before its proportion begins to rise more rapidly. During this time recombination will have time to chop up the haplotypes around the SNP, reducing the length of the statistically associated haplotype block. Also, the authors themselves don’t seem to believe that the phenotype of earwax itself was the target of selection, so its recessive expression pattern should be less important from where I stand.
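The "masking" argument can be illustrated with the standard deterministic one-locus recursion. This is only a sketch under my own assumptions (an infinite population, a hypothetical starting frequency of 1%, and a dominance parameter h that I introduce for comparison); it is not the coalescent simulation the authors ran.

```python
def next_freq(q, s, h):
    """One generation of selection on allele A at frequency q.

    Genotype fitnesses: AA = 1 + s, GA = 1 + h*s, GG = 1.
    h = 0 makes the advantage fully recessive; h = 0.5 makes it additive.
    """
    p = 1 - q
    w_aa, w_ga, w_gg = 1 + s, 1 + h * s, 1.0
    mean_w = q * q * w_aa + 2 * p * q * w_ga + p * p * w_gg
    return (q * q * w_aa + p * q * w_ga) / mean_w

def trajectory(q0, s, h, generations):
    """Allele frequency after the given number of generations."""
    q = q0
    for _ in range(generations):
        q = next_freq(q, s, h)
    return q

# s = 0.01 as in the abstract; q0 = 0.01 is a hypothetical starting point.
print("recessive:", trajectory(0.01, 0.01, 0.0, 2000))
print("additive: ", trajectory(0.01, 0.01, 0.5, 2000))
```

Over 2,000 generations the recessive allele barely moves from its starting frequency, while the additive one is close to fixation. That long lag is the window in which recombination can erode the surrounding haplotype.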

The idea that the genes around ABCC11 might have something to do with adaptation to cold is suggestive, but almost every East Asian trait of distinction has been hypothesized to have something to do with cold at some point by physical anthropologists. You’d figure that the Cantonese lived in igloos going by all the myriad adaptations to frigid conditions which they exhibit. The reality is that much of China, Korea and Japan are subtropical today. In any case the last figure shows the correlation across several lineages. Earlier they found, by comparing variation around this region in humans with other primates, that Africans seem to be subject to purifying selection. This means that there’s constraint so that neutral forces don’t change the frequencies of functionally significant regions. It is well known that on average Africans are more diverse than non-Africans, probably because the latter are a sampling of the former, but, on a small minority of genes the reverse is true. This is likely due to the relaxation of functional constraint as humans left the ancestral African environment. And this is clearly true for rs17822931; most non-African populations exhibit some heterozygosity. East Asians here are an exception, not the rule, in having derived allele frequencies nearly fixed. The regression lines in this last figure are all statistically significant. It is of interest that there are particularly strong correlations between latitude and frequency of the derived A allele among Europeans and Native Americans. In contrast the relationship within Asian populations is weaker. Only 17% of the allele frequency variance can be explained by latitude variance among the Asian ALFRED sample.

But we shouldn’t allow the hypothesis to rise and fall just on this evidence. After all there have likely been substantial movements of populations within the last 10,000 years. Perhaps especially in East Asia, where the expansion of the Han south may have triggered the movement of both the Thai and Vietnamese people out of South China and into mainland Southeast Asia. The best evidence of adaptation would be among admixed populations; presumably those at higher latitudes would have higher frequencies of the AA genotype than those at lower latitudes. Instead of categorizing the populations into three coarse classes, a more sophisticated treatment using ancestral quanta derived from STRUCTURE or ADMIXTURE as independent variables would probably be informative. Remember, adaptation should show evidence of decoupling ancestry from phenotype.

Finally, I have to point to this section of the discussion:

What is the cause of the selective advantage of rs17822931-A? Although the physiological function of earwax is poorly understood (Matsunaga 1962), dry earwax itself is unlikely to have provided a substantial advantage. The rs17822931-GG and GA genotypes (wet earwax) are also strongly associated with axillary osmidrosis, suggesting that the ABCC11 protein has an excretory function in the axillary apocrine gland (Nakano et al. 2009)…,

I really didn’t know what this meant. So I looked it up. Here’s what I found, A strong association of axillary osmidrosis with the wet earwax type determined by genotyping of the ABCC11 gene:

Apocrine and/or eccrine glands in the human body cause odor, especially from the axillary and pubic apocrine glands. As in other mammals, the odor may have a pheromone-like effect on the opposite sex. Although the odor does not affect health, axillary osmidrosis (AO) is a condition in which an individual feels uncomfortable with their axillary odor, regardless of its strength, and may visit a hospital. Surgery to remove the axillary gland may be performed on demand. AO is likely an oligogenic trait with rs17822931 accounting for most of the phenotypic variation and other unidentified functional variants accounting for the remainder. However, no definite diagnostic criteria or objective measuring methods have been developed to characterize the odor, and whether an individual suffers from AO depends mainly on their assessment and/or on examiner’s judgment. Human body odor may result from the breakdown of precursors into a pungent odorant by skin bacteria….

Perhaps the paper should have been titled “why barbarians smell bad”? In any case, an idea for a book title on Korean genetics: “the least smelly race.”*

Citation: Ohashi J, Naka I, & Tsuchiya N (2010). The impact of natural selection on an ABCC11 SNP determining earwax type. Molecular biology and evolution PMID: 20937735

* I’m referencing The Cleanest Race.

October 11, 2010

Natural selection in our time

Filed under: Adaptation,Biology,Environment,Evolution,Genetics,Selection — Razib Khan @ 12:35 am

Last month in Nature Reviews Genetics there was a paper, Measuring selection in contemporary human populations, which reviewed data from various surveys in an attempt to adduce the current trajectory of human evolution. The review didn’t find anything revolutionary, but it was interesting to see where we’re at. If you read this weblog you probably accept a priori that it’s highly unlikely that evolution “has stopped” because infant mortality has declined sharply across developed, and developing, nations. Evolution understood as change in gene frequencies will continue because there will be sample variance in the proportions of given alleles from generation to generation. But more interestingly adaptive evolution driven by change in mean values of heritable phenotypes through natural selection will also continue, assuming:

1) There is variance in reproductive fitness

2) That that variance is correlated with a phenotype

3) That those phenotypes are at all heritable. In other words, phenotypic variation tracks genotypic variation
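These three conditions are what the textbook breeder’s equation compresses into one line: the response to selection R equals the narrow-sense heritability h² times the selection differential S. The sketch below is my own illustration with made-up numbers; the review itself is not being quoted here.

```python
def response_to_selection(h2, selection_differential):
    """Breeder's equation: expected change in the trait mean per generation, R = h^2 * S."""
    return h2 * selection_differential

# Hypothetical values: parents average 1.0 unit above the population mean.
print(response_to_selection(0.3, 1.0))  # modest heritability, some response
print(response_to_selection(0.0, 1.0))  # no heritability, no response
```

If any one of the three ingredients is missing (no fitness variance, no link to the phenotype, or no heritability) the product is zero and the trait mean does not shift through selection.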

Obviously there is variance in reproductive fitness. Additionally, most people have the intuition that particular traits are correlated with fecundity, whether it be social-cultural identities, or personality characteristics. The main issue is probably #3. It is a robust finding for example that in developed societies the religious tend to have more children than the irreligious. If there is an innate predisposition to religiosity, and there is some research which suggests modest heritability, then all things being equal the population would presumably be shifting toward greater innate predisposition toward religion as time passes. I do believe religiosity is heritable to some extent. More precisely I think there are particular psychological traits which make supernatural claims more plausible for some than others, and, those traits themselves are partially determined by biology. But obviously even if we think that religious inclination is partially heritable in a biological sense, it is also heritable in the familial sense of values passed from one generation to the next, and in a broader cultural context of norms imposed from on high. In other words, when it comes to these sorts of phenotypic analyses we shouldn’t get too carried away with clean genetic logics. In Shall the Religious Inherit the Earth? Eric Kaufmann notes that it is in the most secular nations that the fertility gap between the religious and irreligious is greatest, and therefore selection for religiosity would be strongest in nations such as Sweden, not Saudi Arabia. But as a practical matter biologically driven shifts in trait value in this case pale in comparison to the effect of strong cultural norms for religiosity.

Below are two of the topline tables which show the traits which are currently subject to natural selection. A + sign indicates that there is natural selection for higher values of the trait, and a – sign the inverse. An s indicates stabilizing selection, which tells you that median values have higher fitnesses than the extremes. The number of stars is proportional to statistical significance.



Some of this is not surprising. The age of the onset of menarche has been dropping in much of the world. I suspect this is mostly due to better nutrition, but a consequence of this shift is earlier fertility for some females. The authors are nervous about the robust correlation of higher fertility with lower intelligence, but notice that the pattern for wealth and income is different and more complicated. The key is to look at education. Whether you believe intelligence exists or not in any substantive concrete sense, those who are more intelligent are more likely to have had more education, and there’s a rather common sense reason why investing in more schooling would reduce your fertility: you simply forgo some of your peak reproductive years, especially if you’re female. The higher you go up the educational ladder the stronger the anti-natalist cultural and practical pressures become (the latter is a heavier burden for females because of their biological centrality in child-bearing, but both males and females are subject to the former). As with religion, even if the differences have no biological implication, because you believe the correlations are spurious or reject the existence of the trait, one presumes that parents and subcultures pass on values to offspring. If higher education has anti-natalist correlations we shouldn’t be surprised if subsequent generations turn away from higher education. Their parents were the ones who were more likely to avoid it.

We live in interesting times.

October 4, 2010

The adaptive space of complexity

Evolution means many things to many people. On the one hand some scholars focus on time scales of “billions and billions,” and can ruminate upon the radical variation in body plans across the tree of life. Others put the spotlight on the change in gene frequencies on the scale of years, of Ph.D. programs. While one group must glean insight from the fossil remains of trilobites and ammonites, others toil away in dimly lit laboratories breeding nematodes and fruit flies, generations upon generations. More recently a new domain of study has been focusing specifically on the arc of animal development as a window onto the process of evolution. And so forth. Evolution has long been dissected by an army of many specialized parts.

And yet the core truth which binds science is that nature is one. No matter the disciplinary lens which we put on at any given moment we’re plumbing the same depths on some fundamental level. But what are the abstract structures of those depths? Can we project a tentative map of the fundamentals before we go exploring through observation and experiment? That’s the role of theoreticians: Charles Darwin, R. A. Fisher, and Sewall Wright. Evolution is a phenomenon which is on a deep level an abstraction, though through objectification we speak of it as if it was as concrete as the frills of the Triceratops. As an abstraction it is open to mathematical formalization. Models of evolution may purport to tell us how change over time occurs in specific instances, but the ultimate aim is to capture the maximum level of generality possible.

Though the original mathematical theoreticians of evolution, in particular R. A. Fisher and Sewall Wright, were critical in the formation of the Modern Neo-Darwinian Synthesis, their formal frameworks were not without critics from within the mainstream. Ernst W. Mayr famously rejected “beanbag genetics,” the view propounded specifically by R. A. Fisher and J.B. S. Haldane in England that a model of evolution could be constructed from singular genetic elements operating independently upon traits. Mayr, as an ecologist and naturalist, believed that this framework lacked the essential integrative or holistic aspect of biology as it manifested in the real world. Selection after all operated proximately on the fitness of the whole organism. We’ve come a long way since those debates. One of the problems with the earlier disputes is that they were not sufficiently informed by the empirical evidence because of the primitive nature of experimental and observational evolutionary biology. Molecular biology changed that, and now the rise of genomics has also become a game changer. Genomics gets at the concrete embodiment of evolutionary change at its root, the structure and variation of the genomes of organisms.

A new paper in PNAS is a nice “mash-up” of the old and the new, Genomic patterns of pleiotropy and the evolution of complexity:

Pleiotropy refers to the phenomenon of a single mutation or gene affecting multiple distinct phenotypic traits and has broad implications in many areas of biology. Due to its central importance, pleiotropy has also been extensively modeled, albeit with virtually no empirical basis. Analyzing phenotypes of large numbers of yeast, nematode, and mouse mutants, we here describe the genomic patterns of pleiotropy. We show that the fraction of traits altered appreciably by the deletion of a gene is minute for most genes and the gene–trait relationship is highly modular. The standardized size of the phenotypic effect of a gene on a trait is approximately normally distributed with variable SDs for different genes, which gives rise to the surprising observation of a larger per-trait effect for genes affecting more traits. This scaling property counteracts the pleiotropy-associated reduction in adaptation rate (i.e., the “cost of complexity”) in a nonlinear fashion, resulting in the highest adaptation rate for organisms of intermediate complexity rather than low complexity. Intriguingly, the observed scaling exponent falls in a narrow range that maximizes the optimal complexity. Together, the genome-wide observations of overall low pleiotropy, high modularity, and larger per-trait effects from genes of higher pleiotropy necessitate major revisions of theoretical models of pleiotropy and suggest that pleiotropy has not only allowed but also promoted the evolution of complexity.

The basic thrust of this paper is to test older theoretical models of evolutionary genetics, and their relationship to and dependence on pleiotropy, against new genomic data sets. In The Genetical Theory of Natural Selection R. A. Fisher proposed a model whereby all mutations affect every trait, and the effect size of the mutations exhibited a uniform distribution. Following in Fisher’s wake the evolutionary geneticist H. Allen Orr published a paper ten years ago, Adaptation and the cost of complexity, which argued that “…the rate of adaptation declines at least as fast as n⁻¹, where n is the number of independent characters or dimensions comprising an organism.” This is the “cost of complexity,” which lies at the heart of this paper in PNAS.

To explore these questions empirically the authors looked at five data sets:

- yeast morphological pleiotropy, is based on the measures of 279 morphological traits in haploid wild-type cells and 4,718 haploid mutant strains that each lack a different nonessential gene (this also yielded quantitative measures)

- yeast environmental pleiotropy, is based on the growth rates of the same collection of yeast mutants relative to the wild type in 22 different environments

- yeast physiological pleiotropy, is based on 120 literature-curated physiological functions of genes recorded in the Comprehensive Yeast Genome Database (CYGD)

- nematode pleiotropy, is based on the phenotypes of 44 early embryogenesis traits in C. elegans treated with genome-wide RNA-mediated interference

- mouse pleiotropy, is based on the phenotypes of 308 morphological and physiological traits in gene-knockout mice recorded in Mouse Genome Informatics (MGI)

The first figure shows the results of the survey. You see in each data set the mean and median number of traits affected by mutations on a given gene, as well as the distribution of effects. Two conclusions are immediately evident: 1) most genes have a relationship only to a small number of traits, 2) very few genes have a relationship to many traits. You also see that the percentage of genes impacted by pleiotropy is rather small. This seems to immediately take off the table simplifying assumptions of a mutant variant producing changes across the full range of traits in a complex organism. Additionally the effects do not seem to exhibit a uniform distribution; rather, they’re skewed toward genes which are minimally or trivially pleiotropic. From the text:

Our genome-wide results echo recent small-scale observations from fish and mouse quantitative trait locus (QTL) studies…and an inference from protein sequence evolution…and reveal a general pattern of low pleiotropy in eukaryotes, which is in sharp contrast to some commonly used theoretical models…that assume universal pleiotropy (i.e., every gene affects every trait)

So if the theoretical models are wrong, what’s right? In this paper the authors argue that it seems as if pleiotropy has a modular structure. That is, mutations tend to have impacts across sets of correlated traits, not across a random distribution of traits. This is important when we consider the fitness implications of mutations, for if the impacts were not modular but randomly distributed, the resulting genetic correlations would more likely serve as dampeners on directional change in trait value.

Figure 2 shows the high degree of modularity in their data sets:


Now that we’ve established that mutations tend to have clustered effects, what about their distribution? Fisher’s original model postulated a uniform distribution. The first data set, the morphological characteristics of baker’s yeast, had quantitative metrics. Using the results from 279 morphological traits they rejected the assumption of a uniform distribution. In fact the distribution was closer to normal, with a central tendency and a variance about the mode. Second, they found that standard deviations of effect sizes varied quite a bit as well. Many statistical models assume invariant standard deviations, so it is not surprising that that was the initial assumption, but I doubt many will be that surprised that the assumption turns out not to be valid. The question is: does this matter?

Yes. Within the parameter space being explored one can calculate distances which we can use to measure the effect of mutations. Panels C to F show the distances as a function of pleiotropic effect. The left panels are Euclidean distances while the right panels are Manhattan distances. The first two panels show the outcomes from the parameter values generated from their data sets. The second two panels use randomly generated effect sizes assuming a normal distribution. The last two panels use randomly generated effect sizes, and, assume a constant standard deviation (as opposed to the empirical distribution of standard deviations which varied).

To connect these empirical results back to the theoretical models: there are particular scaling parameters, the values of which the earlier models assumed, but which can now be calculated from the real data sets. It turns out that the empirical scaling parameter values differ rather significantly from the assumed parameter values, and this changes the inferences one generates from the theoretical models. The empirically calculated value is b = 0.612, where b is an exponent on the right-hand side of the equation which generates the distances within the parameter space. From the text: “the invariant total effect model…assumes a constant total effect size (b = 0), whereas the Euclidian superposition model…assumes a constant effect size per affected trait (b = 0.5).” Instead of looking at the number value, note what each value means verbally. What they found in the empirical data was that the effect size per affected trait varied. In this paper the authors found larger per-trait effects for genes affecting more traits, and this seems to be a function of the fact that b > 0.5, given a normal distribution of effect sizes and variance in the standard deviation of effect sizes.
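Why b = 0.5 corresponds to a constant per-trait effect can be checked numerically. If a gene’s effects on n traits are independent normal draws with a fixed standard deviation, the Euclidean length of the effect vector grows roughly as the square root of n. The sketch below uses the known mean of the chi distribution for that length and fits a log-log slope. The choice of trait counts, and the rule that the per-trait standard deviation grows as n^(b - 0.5), are assumptions I make purely for illustration; they are not the authors’ fitting procedure.

```python
import math

def mean_euclidean_effect(n, sd=1.0):
    """Mean Euclidean norm of n independent N(0, sd^2) effects (mean of a chi distribution)."""
    return sd * math.sqrt(2) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. the scaling exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

trait_counts = [10, 20, 40, 80, 160]  # hypothetical degrees of pleiotropy
b_empirical = 0.612                   # value reported in the post

# Constant per-trait SD: the total effect should scale with an exponent near 0.5.
constant_sd = [mean_euclidean_effect(n) for n in trait_counts]
# Per-trait SD that grows with pleiotropy: the exponent rises toward b_empirical.
growing_sd = [mean_euclidean_effect(n, sd=n ** (b_empirical - 0.5)) for n in trait_counts]

slope_constant = loglog_slope(trait_counts, constant_sd)
slope_growing = loglog_slope(trait_counts, growing_sd)
print(slope_constant, slope_growing)
```

The first slope comes out close to 0.5 and the second close to 0.612, which is the verbal point above: an exponent above 0.5 means genes touching more traits also hit each trait harder.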

This all leads us back to the big picture question: is there a cost of complexity? Substituting the real parameters back into the theoretical framework originated by Fisher, and extended by H. Allen Orr and others, they find that the cost of complexity disappears. Mutations do not affect all traits, so more complex organisms are not disproportionately impacted by pleiotropic mutations. Not only that, the modularity of pleiotropy likely decreases the risk of opposing fitness implications due to a mutation, since similar traits are more likely to be similarly affected in fitness. These insights are summarized in the last figure:


The one to really focus on is panel A. As you can see there is a sweet spot in complexity when it comes to the rate of adaptation. Contra earlier models there isn’t a monotonic decrease in the rate of adaptation as a function of complexity, but rather an increase up to an equipoise, before a subsequent decrease. At least within the empirically validated range of the scaling exponent. This is important because we see complex organisms all around us. When theory is at variance with the observational reality we are left to wonder what the utility of theory is (here’s looking at you, economists!). By plugging empirical results back into the theory we now have a richer and more robust model. I will let the authors finish:

First, the generally low pleiotropy means that even mutations in organisms as complex as mammals do not normally affect many traits simultaneously. Second, high modularity reduces the probability that a random mutation is deleterious, because the mutation is likely to affect a set of related traits in the same direction rather than a set of unrelated traits in random directions…These two properties substantially lower the effective complexity of an organism. Third, the greater per-trait effect size for more pleiotropic mutations (i.e., b > 0.5) causes a greater probability of fixation and a larger amount of fitness gain when a beneficial mutation occurs in a more complex organism than in a less complex organism. These effects, counteracting lower frequencies of beneficial mutations in more complex organisms…result in intermediate levels of effective complexity having the highest rate of adaptation. Together, they explain why complex organisms could have evolved despite the cost of complexity. Because organisms of intermediate levels of effective complexity have greater adaptation rates than organisms of low levels of effective complexity due to the scaling property of pleiotropy, pleiotropy may have promoted the evolution of complexity. Whether the intriguing finding that the empirically observed scaling exponent b falls in a narrow range that offers the maximal optimal complexity is the result of natural selection for evolvability or a by-product of other evolutionary processes…requires further exploration.

Citation: Wang Z, Liao BY, & Zhang J (2010). Genomic patterns of pleiotropy and the evolution of complexity. Proceedings of the National Academy of Sciences of the United States of America PMID: 20876104

Image credit: Moussa Direct Ltd., http://evolutionarysystemsbiology.org

August 27, 2010

Chosen genes of the Chosen People

Last spring two very thorough papers came out which surveyed the genetic landscape of the Jewish people (my posts, Genetics & the Jews it’s still complicated, Genetics & the Jews). The novelty of the results was due to the fact that the research groups actually looked across the very diverse populations of the Diaspora, from Morocco, Eastern Europe, Ethiopia, to Iran. They constructed a broader framework in which we can understand how these populations came to be, and how they relate to each other. Additionally, they allow us to have more perspective as to the generalizability of medical genetics findings in the area of “Jewish diseases,” which for various reasons usually are actually findings for Ashkenazi Jews (the overwhelming majority of Jews outside of Israel, but only about half of Israeli Jews).

Just as the two aforementioned papers were deep explorations of the genetic history of the Jewish people, and allowed for a systematic understanding of their current relationships, a new paper in PNAS takes a slightly different tack. First, it zooms in on Ashkenazi Jews. The Jews whose ancestors are from the broad swath of Central Europe, and later expanded into Poland-Lithuania and Russia. The descendants of Litvaks, Galicians, and the assimilated Jewish minorities such as the German Jews. Second, though constrained to a narrower population set, the researchers put more of an emphasis on the evolutionary parameter of natural selection. Like any population Jews have been impacted by drift, selection, migration (and its variant admixture), and mutation. Teasing apart these disparate parameters may aid in understanding the origin of Jewish diseases.

The paper is open access, so you don’t have to take my interpretation as the last word. Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population:

The Ashkenazi Jewish (AJ) population has long been viewed as a genetic isolate, yet it is still unclear how population bottlenecks, admixture, or positive selection contribute to its genetic structure. Here we analyzed a large AJ cohort and found higher linkage disequilibrium (LD) and identity-by-descent relative to Europeans, as expected for an isolate. However, paradoxically we also found higher genetic diversity, a sign of an older or more admixed population but not of a long-term isolate. Recent reports have reaffirmed that the AJ population has a common Middle Eastern origin with other Jewish Diaspora populations, but also suggest that the AJ population, compared with other Jews, has had the most European admixture. Our analysis indeed revealed higher European admixture than predicted from previous Y-chromosome analyses. Moreover, we also show that admixture directly correlates with high LD, suggesting that admixture has increased both genetic diversity and LD in the AJ population. Additionally, we applied extended haplotype tests to determine whether positive selection can account for the level of AJ-prevalent diseases. We identified genomic regions under selection that account for lactose and alcohol tolerance, and although we found evidence for positive selection at some AJ-prevalent disease loci, the higher incidence of the majority of these diseases is likely the result of genetic drift following a bottleneck. Thus, the AJ population shows evidence of past founding events; however, admixture and selection have also strongly influenced its current genetic makeup.

The sample size of Ashkenazi Jews was ~400, and they looked at ~700,000 SNPs. As I said, how Jews relate to other populations really isn’t at the core of this paper as it was in the earlier ones from the spring, but there were the PCA plots (sorry Mike), a frappe bar plot, and a phylogenetic tree derived from the Fst statistic. Again, remember that PCA is showing you the largest independent components of genetic variation within the data. The bar plot has a set of ancestral populations of which individuals are composites. And finally, Fst measures the between-population component of genetic variation. The larger the Fst across two populations the bigger the genetic distance.
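As a rough illustration of what "between-population component" means, here is Wright’s Fst for a single two-allele locus in two equally weighted populations, computed as (H_T - H_S) / H_T. This is a deliberately simplified sketch; the paper’s genome-wide estimator is certainly more involved, and the function name and example frequencies are mine.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst at one biallelic locus for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    # Heterozygosity expected if the two populations were pooled into one.
    h_total = 2 * p_bar * (1 - p_bar)
    # Average heterozygosity expected within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # locus is monomorphic everywhere, nothing to partition
    return (h_total - h_within) / h_total

print(fst_two_pops(0.5, 0.5))  # identical frequencies, no differentiation
print(fst_two_pops(0.2, 0.8))  # hypothetical, strongly differentiated locus
print(fst_two_pops(1.0, 0.0))  # fixed difference, maximal differentiation
```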

Using the Druze & Palestinians as the ancestral Middle Eastern reference the authors estimated that the European admixture into Ashkenazi Jews is on the order of 30-55%. This is in the same ballpark as the previous studies, so no great surprise. As I stated in earlier posts the authors can spin the same results in very different ways. From what I can tell these authors are inclined to emphasize the strong possibility that in terms of genetic distance Ashkenazi Jews are somewhat closer to Europeans than they are to Levantine Arabs. Of course these sorts of assertions need to be handled with care. The genetic distance between Ashkenazi Jews and Tuscans is less than half that between Ashkenazi Jews and Russians, while the Jewish-Russian value is about 50% larger than the Jewish-Palestinian one. Remember that there’s a fair amount of circumstantial evidence that Tuscans may themselves be a relatively recent hybrid population between indigenous residents of the Italian peninsula and Near Easterners.

One thing that this paper does do is rebut any strong assertion that Ashkenazi Jews are a genetically homogeneous population which went through a powerful bottleneck. Basically, that is the idea that Jewish diseases are just an outcome of the operational inbreeding that occurs when genetic variation is expunged from a population through low effective population size. The clincher seems to be the comparison of heterozygosity between Ashkenazi Jews and gentile Europeans. The former are actually somewhat more heterozygous than the latter. There’s been a bit of evidence from previous research that the long term effective population size of Ashkenazi Jews was not necessarily very small, so this isn’t a total surprise. Remember that heterozygosity simply means the fraction of individuals heterozygous at a locus.

One way a population can become more heterozygous is, naturally, admixture. Remember that populations differ across many genes. As an example, there’s a pigmentation gene, SLC24A5, where all Europeans are in one state, and all West Africans in another. Naturally African Americans exhibit much more heterozygosity at this locus than the ancestral populations. The Ashkenazi Jewish case is less extreme because the two parental populations are genetically closer, but the principle still holds.
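To put rough numbers on this, here is a minimal sketch assuming a single biallelic locus, random mating, and expected heterozygosity of 2p(1-p). The allele frequencies and ancestry fractions below are hypothetical values chosen for illustration; none of them come from the paper.

```python
def expected_heterozygosity(p):
    """Expected fraction of heterozygotes, 2p(1-p), at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def admixed_frequency(p_a, p_b, m):
    """Allele frequency after mixing: fraction m from population A, 1-m from B."""
    return m * p_a + (1.0 - m) * p_b


# Extreme case: allele fixed in A and absent in B (both parents have H = 0).
# m = 0.2 is an assumed ancestry fraction, for illustration only.
h_extreme = expected_heterozygosity(admixed_frequency(1.0, 0.0, 0.2))

# Closer parents: hypothetical frequencies 0.6 and 0.4, mixed half and half.
h_close = expected_heterozygosity(admixed_frequency(0.6, 0.4, 0.5))
h_parent = expected_heterozygosity(0.6)

print(h_extreme, h_close, h_parent)
```

In the extreme case heterozygosity jumps from zero in both parents to about 0.32 in the mixture. With the closer parents it only creeps from 0.48 to 0.50, which is the sense in which the Ashkenazi case should be less dramatic.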

A consequence of recent admixture between genetically different populations is a high level of linkage disequilibrium, non-random associations of alleles at different loci across the genome. Why? There are many genes where two populations may be very different. Offspring inherit half their genome from one parent, and half from the other, and the parents pass along to their offspring particular associations of alleles. There may be a set of European distinctive alleles on a chromosome, and an African distinctive set of alleles, so that in a hybrid individual the alleles are strongly correlated across loci. These associations are broken down over time by recombination. The regularity of this process can serve as a clock with which to measure the period since admixture. African Americans were used to calibrate the time since admixture for the Uyghur people of western China, who are mixed from West and East Eurasian populations. The authors did not do this in this paper, I assume because the ancestral populations were genetically rather close in comparison to the two above examples, so there’d be less linkage disequilibrium to break down in the first place.
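The clock logic can be sketched with the textbook two-locus model. I am assuming a single pulse of admixture, parental populations that were each in linkage equilibrium, and a constant recombination fraction r between the two loci. Under those assumptions the initial disequilibrium is D0 = m(1-m) × (frequency difference at locus A) × (frequency difference at locus B), and it decays by a factor of (1-r) per generation. The function names and numbers are illustrative, not the method of any paper mentioned here.

```python
import math


def admixture_ld(m, delta_a, delta_b):
    """Initial LD created by mixing two populations in proportions m and 1-m.

    delta_a and delta_b are the allele frequency differences between the
    parental populations at the two loci.
    """
    return m * (1.0 - m) * delta_a * delta_b


def ld_after(d0, r, t):
    """LD remaining after t generations of recombination at rate r."""
    return d0 * (1.0 - r) ** t


def generations_since_admixture(d0, d_t, r):
    """Invert the decay curve to recover t from the remaining LD."""
    return math.log(d_t / d0) / math.log(1.0 - r)


# Very different parents (frequency differences of 0.9) versus closer
# parents (differences of 0.2), both mixed 50/50. Hypothetical values.
d0_far = admixture_ld(0.5, 0.9, 0.9)
d0_near = admixture_ld(0.5, 0.2, 0.2)

# Round trip: decay for 20 generations at r = 0.01, then read the clock back.
t_hat = generations_since_admixture(d0_far, ld_after(d0_far, 0.01, 20), 0.01)
print(d0_far, d0_near, t_hat)
```

The closer the parental populations, the smaller the starting disequilibrium, so there is less signal for the clock to read. That is consistent with my guess about why the authors skipped this analysis.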

In the Ashkenazi Jewish population they found more linkage disequilibrium than in Europeans as well as longer haplotypes. This could be the result of a population bottleneck where drift could drive up the frequency of blocks of the genome, but as they note in the paper that should probably reduce heterozygosity. The natural inference then is that admixture between distinct populations can explain both data points.

But let’s cut to the chase. What genes exhibit signatures of natural selection in Ashkenazi Jews? More precisely, what distinctive regions of the genome exhibit signatures of natural selection? They used the standard haplotype-based methods. Basically you’re looking for regions of the genome where there are long blocks of correlated alleles, signs of a selective sweep due to a favored variant which dragged along flanking genomic regions as it rose rapidly in frequency, more rapidly than recombination could break apart the associations. Because recombination does break up associations over time, you need the selective sweeps to be relatively recent to detect them with these methods. Since the Jewish people, and Ashkenazi Jews more particularly, are relatively recent historically, timing shouldn’t be an issue for Jewish specific sweeps. But another factor is that the two primary tests they used, EHH and iHS, are not good at picking up sweeps which are just starting. EHH is geared toward sweeps which are almost complete, so the frequency of the selected allele is near 100%. iHS is better at mid-range values. Using a combination of these two techniques they found that six genes which are implicated in diseases characteristic of Ashkenazi Jews have the hallmarks of natural selection. Selection self-evidently does not favor disease for its own sake, so what seems to have been going on here is that the disease was simply a side effect or byproduct of adaptation.
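For a feel of what EHH measures, here is a toy sketch of the usual definition: among chromosomes carrying the core allele, the probability that two randomly chosen ones are identical at every marker from the core out to a given position. The haplotypes are made-up strings of 0s and 1s, and this is not the pipeline used in the paper.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, upto):
    """Extended haplotype homozygosity between marker index `core` and `upto`.

    `haplotypes` is a list of equal-length strings, all carrying the same
    core allele. Returns the chance that two chromosomes drawn without
    replacement match at every marker in the interval (inclusive).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, upto))
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Hypothetical carriers of the core allele (marker 0). One chromosome has
# recombined away from the shared background starting at marker 2.
carriers = ["1110", "1110", "1110", "1101"]

for marker in range(4):
    print(marker, ehh(carriers, 0, marker))
```

EHH starts at 1.0 at the core and falls as you move outward and recombinants accumulate. A recent sweep shows up as EHH that stays high over an unusually long distance.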

The strongest signal they found was in ALDH2. The strongest signal in Europeans, LCT, was not found in Ashkenazi Jews. But is LCT a strong signal in Europeans? Many Southern European populations have low frequencies of the derived LCT allele, indicating that they haven’t been subject to strong selection for lactase persistence. These are the same populations genetically close to the Ashkenazi Jews. The authors suggest that the Jewish-European admixture occurred before the sweep of the derived LCT allele, but it seems more plausible that the Ashkenazim simply admixed with a European population, such as Italians, which does not exhibit much lactase persistence. As for ALDH2, the association between genetic variation at this locus and alcoholism is well known, and has been used to explain the low Jewish rates of the disease. In this case, the authors posit that protection from alcoholism is a positive side effect of natural selection:

The mechanism driving selection of the ALDH2 locus is unknown, but a plausible target of selection also within this selected region is the TRAFD1/FLN29 gene, which is a negative regulator of the innate immune system, important for controlling the response to bacterial and viral infection (49). TRAFD1/FLN29 may have conferred a selective advantage in the immune response to a pathogen, perhaps near the time that the Jews returned to Israel from their Babylonian captivity. Despite the unclear selective mechanism, this remains a remarkable example of a putatively selected region accounting for a known population phenotype.

Many of the other loci naturally did not show signatures of natural selection. But this sort of work is exploratory, and there are limits to the power of their techniques. As it is, it seems that we’re very far along on understanding the phylogenetic tree of the Jewish people, and we’re finally getting a grip on the exogenous parameters which might prune the branches.

Citation: Steven M. Bray, Jennifer G. Mulle, Anne F. Dodd, Ann E. Pulver, Stephen Wooding, & Stephen T. Warren (2010). Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population PNAS : 10.1073/pnas.1004381107

Related: John Hawks, New data on Ashkenazi population history.

Image Credit: Wikimedia

August 12, 2010

Hybridization is like sex

One of the major issues which has loomed at the heart of biology since The Origin of Species is why species exist, as well as how species come about. Why isn’t there a perfect replicator which performs all the conversion of energy and matter into biomass on this planet? If there is a God the tree of life almost seems to be a testament to his riotous aesthetic sense, with numerous branches which lead to convergences, and an inordinate fascination with variants on the basic morph of beetles. From the outside the outcomes of evolutionary biology look like a patent mess, a sprawling expanse of experiments and misfires.

A similar issue has vexed biologists in relation to sex. Why is it that the vast majority of complex organisms take upon themselves the costs of sex? The existence of a non-offspring bearing form within a species reduces the potential natural increase by a factor of two before the game has even begun. Not only that, but the existence of two sexes who must seek each other out expends crucial energy in a Malthusian world (selfing hermaphrodites obviously don’t have this problem, but for highly complex organisms they aren’t so common). Why bother? (I mean in an ultimate, not proximate, sense)

It seems likely that part of the answer to both these questions on the grand scale is that the perfect is the enemy of long term survival. Sexual reproduction confers upon a lineage a genetic variability which may reduce fitness by shifting populations away from the adaptive peak in the short term, but the fitness landscape itself is a constant bubbling flux, and perfectly engineered asexual lineages may all too often fall off the cliff of what was once their mountain top. The only inevitability seems to be that the times change. Similarly, the natural history of life on earth tells us that all greatness comes to an end, and extinction is the lot of life. The universe is an unpredictable place and the mighty invariably fall, as the branches of life’s tree are always pruned by the gardeners red in tooth and claw.

But it is one thing to describe reality in broad verbal brushes. How about a more rigorous empirical and theoretical understanding of how organisms and the genetic material through which they gain immortality play out in the universe? A new paper which uses plant models explores the costs and benefits of admixture between lineages, and how those two dynamics operate in a heterogeneous and homogeneous world. Population admixture, biological invasions and the balance between local adaptation and inbreeding depression:

When previously isolated populations meet and mix, the resulting admixed population can benefit from several genetic advantages, including increased genetic variation, the creation of novel genotypes and the masking of deleterious mutations. These admixture benefits are thought to play an important role in biological invasions. In contrast, populations in their native range often remain differentiated and frequently suffer from inbreeding depression owing to isolation. While the advantages of admixture are evident for introduced populations that experienced recent bottlenecks or that face novel selection pressures, it is less obvious why native range populations do not similarly benefit from admixture. Here we argue that a temporary loss of local adaptation in recent invaders fundamentally alters the fitness consequences of admixture. In native populations, selection against dilution of the locally adapted gene pool inhibits unconstrained admixture and reinforces population isolation, with some level of inbreeding depression as an expected consequence. We show that admixture is selected against despite significant inbreeding depression because the benefits of local adaptation are greater than the cost of inbreeding. In contrast, introduced populations that have not yet established a pattern of local adaptation can freely reap the benefits of admixture. There can be strong selection for admixture because it instantly lifts the inbreeding depression that had built up in isolated parental populations. Recent work in Silene suggests that reduced inbreeding depression associated with post-introduction admixture may contribute to enhanced fitness of invasive populations. We hypothesize that in locally adapted populations, the benefits of local adaptation are balanced against an inbreeding cost that could develop in part owing to the isolating effect of local adaptation itself. The inbreeding cost can be revealed in admixing populations during recent invasions.

First, plants are good models to explore evolutionary genetics. They’re not as constrained as say mammals, or the typical tetrapod, when it comes to barriers to gene flow between distinct taxa. Hybridization is common, and plants can also self-fertilize as well as cross-fertilize, allowing researchers to push the genetic pool in different directions (“selfing” obviously reduces the effective population and is an extreme form of inbreeding, so it’s a good way to purge genetic variation really quickly). In a perfect abstract world of evolution one might imagine Richard Dawkins’ vehicles and replicators as fluid entities which float along a turbid sea of evolutionary genetic parameters, drift, migration, mutation and selection. But reality is constrained to a DNA substrate, which has its own parameters such as recombination, modulators such as epigenetics, and numerous ways to express variation through gene regulation. It’s complicated, and stripping the issues down to their pith is easier said than done.

But the broader dynamic being examined here is the generalist-specialist trade-off, which I think is relevant to the two issues I introduced earlier in this post. Specialists are optimized for their own position in the adaptive landscape, but have difficulties when it is perturbed. Generalists always have less than maximum fitness in any one landscape, but higher average fitness across them because they can adapt to changes. Specialization is local adaptation of particular lineages, while in the generalist case you can have invasive species in novel environments. They’re obviously facing an adaptive landscape which is at some remove from what any of the introduced genotypes were “optimized” for, so hybridization produces something new for something new.

In the first figure of the paper you see F3 wild barley descended from two parental lineages, ME and AQ. The left panels show seed output as a function of heterozygosity, and the right panels as a function of ME genome content. Remember that in subsequent generations the descendants of hybrids will vary quite a bit in genetics and phenotype as the original alleles re-segregate.


The takeaway is that in novel environments genetic variation seems to result in increased fitness. Why? One concept which one has to introduce is heterosis, whereby crosses between homogeneous lineages produce fitter offspring. One reason this may be is that there is overdominance, where heterozygotes have greater fitness than the homozygotes. This is the case with the sickle-cell allele where malaria is endemic. Another reason may be that in the original parental lineages there was a higher fraction of alleles which were deleterious in homozygote genotypes. In plain English, inbreeding resulted in genetic drift which cranked up the proportion of alleles implicated in recessively expressed negative phenotypes. The authors argue though that in the native context local adaptation is strong enough to be a barrier against too much gene flow between the parental wild barley lineages, so the deleterious alleles are less likely to be masked. Only in a novel environment when that benefit was removed from the equation could the negative consequences of inbreeding come to the fore in the total calculus.

Figure 2 shows the results of experiments which examine the fitness of white campion, a European species which has been introduced in North America. In the left panel are crosses between native European lineages, with distance between parental lineages on the x-axis. In the right panel you have the same experiment, but with North American variants, which are products of introductions from various regions of Europe. The plants were grown in a “common garden,” to show how all the genotypes performed when environment was controlled.


As you can see moderate levels of hybridization entailed a benefit in the European variants, but not the North American variants. Hybridization between variants which were too distant did produce outbreeding depression in the European case, suggesting perhaps that disruption of co-adapted gene complexes resulted in a greater fitness cost than the masking of deleterious alleles due to inbreeding. One can make the inference from these data that the introduced white campion lineages are already hybridized, the barriers to crossing being removed by a disruption of the adaptive landscapes which each native lineage was optimized for.

Here are the authors from the discussion talking about invasions of exotic species:

Provided that multiple introductions from different source populations have occurred, the benefits of admixture become freely available to introduced populations that do not yet show a pattern of local adaptation. Because the benefits are potentially large, admixture may play an important role during early invasions. Native populations often show evidence of inbreeding depression…and one instant reward of admixture in the introduced range is the release of this genetic burden. Such heterosis effects can contribute significantly to the establishment and early success of invasive species…When tested together in a common garden experiment, invaders can show enhanced fitness-related traits compared with populations from their native range…If there is evidence of admixture, the effects of heterosis might be a default explanation for such observations, perhaps providing a null expectation against which other explanations (such as trait evolution) need to be tested.

What have plants to do with life as a whole? I assume much. Plants differ in the details, but compared to other complex multicellular organisms in regards to evolutionary genetics they’re quite liberated. By this, I mean that their modes of reproduction and promiscuity in hybridization make them more of an ideal “frictionless” test case of evolutionary biology and the power of the classical parameters. Perhaps given enough time natural selection would produce the ideal replicator to rule them all, to drive all others to extinction. But that day is not this day. And that day may never come because the universe is far too protean and erratic. Life is varied, on the phenotypic and genotypic level, and the exogenous processes of climate and geology continue to warp and reshape the adaptive landscape. And more subtly, but just as critically, life is always in an endless race with itself, as pathogens co-evolve with their hosts, and predators figure out how to outfox their prey. Life warps its own adaptive landscapes, and the innovation of one branch may lead to extinction of others as well as the proliferation of new branches.

More prosaically and anthropocentrically what does this say about us? Humans are an expansive species, and over the past 500 years different lineages have been hybridizing promiscuously. New genotypes have arisen in altered landscapes, and our pathogens are also riding the high tide of globalization onward and upward. We are ourselves a “natural experiment.”

Image Credit: Olivia Munn by Gage Skidmore

Link hat tip: Dienekes.

Citation: Verhoeven KJ, Macel M, Wolfe LM, & Biere A (2010). Population admixture, biological invasions and the balance between local adaptation and inbreeding depression. Proceedings. Biological sciences / The Royal Society PMID: 20685700

July 21, 2010

Disease as a byproduct of adaptation

How we perceive nature and describe its shape are a matter of values and preferences. Nature does not take notice of our distinctions; they exist only as instruments which aid in our comprehension. I’ve brought this up in relation to issues such as categorization of recessive vs. dominant traits. The offspring of people of Sub-Saharan African and non-African ancestry where the non-African parent has straight or wavy hair tend to have very curly hair. Therefore, one may say that the tightly curled hair form is dominant to straight or wavy hair. But, it is also the case that there is some modification in relation to the African parent in the offspring, so the dominance is not complete. When examining the morphology of the follicle, which determines the extent of the hair’s curl, the offspring may in fact exhibit some differences from both parents. In other words our perception of the outcomes of inheritance is contingent to some extent on our categorization of the traits as well as our specific focus along the developmental pathway.

Or consider the division between “traits” and “diseases.” The quotations are necessary. Lactose intolerance is probably one of the best cases to illustrate the gnarly normative obstructions which warp our perceptions. As a point of fact lactose intolerance is the ancestral human state, and numerically predominant. It is the “wild type.” Lactose tolerance is a relatively recent adaptation, found among a variety of West Eurasian and African populations. A more politically correct term, lactase persistence, probably better encapsulates the evolutionary history of the trait, which has shifted from the class of disease to that of genetic trait when we evaluate the bigger picture (obviously diseases are simply “bad” traits).

Sometimes though the issues are more cut & dried. No one would doubt that sickle-cell anemia is a disease. It has a major fitness impact in a colloquial sense, as well as evolutionarily. It kills you, and it kills your potential genetic lineage. But, it is also a byproduct of adaptation to endemic malaria. Sickle-cell disease is one of the classical illustrations of heterozygote advantage, whereby those who carry one copy of the mutation on the gene have increased fitness vis-a-vis those who carry two normal copies of the gene. The increase in frequency of the mutant gene though is balanced by the fact that mutant homozygotes have decreased fitness.
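That balance can be made concrete with the standard one-locus overdominance model. This is a sketch assuming random mating, an effectively infinite population, and relative fitnesses of 1-s for normal homozygotes, 1 for heterozygotes and 1-t for mutant homozygotes. The values of s and t are hypothetical, picked only to show the dynamics.

```python
def next_freq(q, s, t):
    """One generation of selection on mutant allele frequency q.

    Relative fitnesses: normal homozygote 1-s, heterozygote 1,
    mutant homozygote 1-t.
    """
    p = 1.0 - q
    w_bar = p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)
    return (q * q * (1.0 - t) + p * q) / w_bar


def equilibrium(s, t):
    """Predicted balanced frequency of the mutant allele, s / (s + t)."""
    return s / (s + t)


# Hypothetical costs: 10% for lacking malaria protection, 80% for the disease.
s, t = 0.1, 0.8

q = 0.01  # the mutant allele starts out rare
for _ in range(500):
    q = next_freq(q, s, t)

print(q, equilibrium(s, t))
```

With these numbers the mutant allele climbs from 1% and then stalls at about 11%, well short of fixation, because every further increase produces more of the badly affected homozygotes.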

We can then construct a narrative of the long term evolutionary dynamics from this initial condition. When a new exogenous stress hits a population mean fitness drops immediately (take a look at the biographies of the Popes, and observe how many died of malaria in the Dark Ages when that disease was new to Italy). Natural selection quickly increases in frequency any alleles which confer protection against the exogenous stress. But, baked into the cake of how genetics in complex organisms usually works, one allele may often have multiple downstream consequences. This is pleiotropy. This means that if a change at a locus increases aggregate fitness, it may nevertheless destabilize long established biochemical pathways. In the short term evolution simply takes the net fitness impact into account. Over the long term one assumes that “better solutions” will emerge which do not have so high a fitness drag, perhaps through the evolution of modifier genes which mask the deleterious outcomes of the initial mutant. This sort of ad hoc trial and error and “duct-taping” of kludges is part and parcel of how adaptation works in situations where shocks out of equilibrium states are common.

In many cases the byproducts of a genetic change may be benign. To my knowledge no one knows of major negative consequences of carrying the alleles which confer lactase persistence (excepting some studies indicating higher obesity, but this seems a marginal fitness impact which has only come to the fore in the past century in all likelihood). But in other cases the outcomes may not be as serious as that of sickle-cell anemia, but may rise above the level of significance where one must note the existence of a disease which is a secondary consequence of adaptation to meet a new challenge.

Yesterday I pointed to a paper which illustrates just this phenomenon, Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans:

African-Americans have higher rates of kidney disease than European-Americans. Here, we show that in African-Americans, focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD) are associated with two independent sequence variants in the APOL1 gene on chromosome 22 {FSGS odds ratio = 10.5 [95% confidence interval (CI) 6.0 to 18.4]; H-ESKD odds ratio = 7.3 (95% CI 5.6 to 9.5)}. The two APOL1 variants are common in African chromosomes but absent from European chromosomes, and both reside within haplotypes that harbor signatures of positive selection. Apolipoprotein L-1 (ApoL1) is a serum factor that lyses trypanosomes. In vitro assays revealed that only the kidney disease-associated ApoL1 variants lysed Trypanosoma brucei rhodesiense. We speculate that evolution of a critical survival factor in Africa may have contributed to the high rates of renal disease in African-Americans.

In its implementation the paper has a lot of moving parts, but the outcome is straightforward. If you haven’t, you might read Genomes Unzipped and its post How to read a genome-wide association study. This is a case where the original association studies were not reporting false results, but it seems that one had to take a further step to really understand the likely molecular genetic and evolutionary underpinnings of what was going on. These results suggest that the original signals of association for variants within the MYH9 gene were actually signals from within APOL1, which happened to be next to MYH9. The region around MYH9 had already shown up in tests to detect natural selection through patterns of linkage disequilibrium (non-random associations of alleles at different loci within the genome; in this case the relevant consideration is adjacent loci across continuous regions of the genome which come together to form haplotype blocks). Since the footprint of natural selection on the genome is often wide that did not imply that MYH9 was the target of natural selection per se, opening the likely possibility for other causal associations. A convenience in light of the difficulty of establishing a plausible functional relationship between renal failure and MYH9.

To explore the possibility of nearby functional candidates the researchers focused on a number of alleles within this genomic region which exhibited maximal European-African frequency differences in the 1000 Genomes Project. Once they ascertained the between population differences they then looked at differences in allele frequencies in cases and controls within the African American population for the two diseases in question (those with the trait/disease vs. those without). Table 1 has the top line raw results:


WT = “Wild Type,” the ancestral allelic variant found in most populations. G1 and G2 are two haplotypes, associated alleles across the locus of the APOL1 gene. G1 consists of the two derived non-synonymous coding variants rs73885319 (S342G) and rs60910145 (I384M) within an exonic region of APOL1. Non-synonymous simply means that a change at that base pair alters the amino acid coded, and exons are the genomic regions whose information is eventually translated into proteins. In other words, these are non-neutral functionally significant genomic regions which do something. G2 is a 6 base pair deletion, rs71785313, close to G1 in APOL1.

To more formally model the relationship between the alleles which are found to differ between cases and controls they performed a logistic regression. The alleles serve as independent variables which can predict the probable outcome of the dependent variable, the probability of FSGS or H-ESKD in this case (renal failure). Figure 1 to the left has a summary of some of the results of the regression in graphical form for FSGS. I’ve rotated it so it can fit on the screen. Basically the strong signals are to the right of the chart (from your perspective). The y-axis (horizontal from your perspective) displays the negative log of p-values for a signal at a particular marker, which is defined by the x-axis (vertical for you). The labels show the particular gene at that genomic position. The smaller the p-value, the less likely it is that a signal that strong would turn up by chance alone. Very small p-values produce huge spikes in the negative-log values (in the body of the paper they present p-values on the order of 10^-35).
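If the negative-log scale is unfamiliar, the conversion is a one-liner. This is a trivial sketch assuming base-10 logs, which is the usual convention for association plots.

```python
import math


def neg_log10(p_value):
    """Height of a marker on the plot: -log10 of its p-value."""
    return -math.log10(p_value)


# A merely nominal result versus one on the order reported in the paper.
for p in (0.05, 1e-35):
    print(p, neg_log10(p))
```

A p-value of 0.05 barely rises off the floor at about 1.3, while 10^-35 towers at 35, which is why the real signals look like spikes.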

You can see that it is in APOL1 that the biggest signals reside. The first panel, A, throws all the SNPs into the mix. On MYH9 they highlight a few SNPs which combine to form the E-1 haplotype, which is strongly associated with cases (this is where the association between disease and genetic variants on MYH9 is coming from). This haplotype is found in conjunction with G1 and G2 on APOL1. E-1 is present in 89% of haplotypes carrying G1 and in 76% of haplotypes carrying G2. A classic illustration of likely correlation but not causation. The second panel controls for the effect of G1. In other words, this is showing you the variation in the dependent variable that remains after you take the largest independent variable, G1, into account. The G2 haplotype is the largest effect independent variable after G1 is taken into account; in other words, it explains most of the residual variation in FSGS probability. Finally, the last panel controls for both G1 and G2. As you can see there aren’t any major signals left; the distribution is relatively flat. Logically once you account for the variables which produce change in an outcome you shouldn’t see any impact of other variables. And that’s what happens here. They also performed controls where MYH9 was held constant, and that does not eliminate the signals in APOL1. The MYH9 signal is conditional on its correlation with APOL1. This was the correlation which showed up in the original association studies. The exact same pattern of signals within the logistic regression model was replicated for H-ESKD. G1 had the strongest signal, then G2. The markers within MYH9 were not significant once one controlled for the variants in G1 and G2.
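The logic of “controlling for” a variant can be shown with plain stratified 2x2 tables rather than the logistic regression the authors used. The counts below are entirely made up: I constructed them so that disease risk depends only on a causal variant (call it G), while a neighboring marker (call it E) is merely correlated with G. They are not the paper’s data.

```python
def odds_ratio(case_with, ctrl_with, case_without, ctrl_without):
    """Odds ratio for carrying a marker, from case/control counts."""
    return (case_with * ctrl_without) / (ctrl_with * case_without)


# Hypothetical counts per stratum of the causal variant G:
# (cases with E, controls with E, cases without E, controls without E)
g_carriers = (45, 45, 5, 5)      # risk is 50% here regardless of E
g_noncarriers = (2, 18, 8, 72)   # risk is 10% here regardless of E

# Ignoring G: pool the two strata into one table.
pooled = tuple(a + b for a, b in zip(g_carriers, g_noncarriers))

print("E, ignoring G:     ", odds_ratio(*pooled))
print("E among G carriers:", odds_ratio(*g_carriers))
print("E among the rest:  ", odds_ratio(*g_noncarriers))
```

Pooled, E looks like a strong risk factor with an odds ratio above 4. Within each stratum of G the odds ratio is exactly 1. That is the same flattening you see in the last panel once G1 and G2 are accounted for.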

It is important to remember though that these markers are segregating within a human population where individuals have three potential genotypes: ancestral homozygote, homozygote for the mutants, and heterozygote. They found that a recessive model of expression of disease is most appropriate in the case of these risk alleles. That is, most of the increased risk is accounted for by the change from one risk allele, the heterozygote state, to two risk alleles, the homozygote state. One risk allele increased odds of renal failure by 1.26, but two by 7.3. The odds ratio of two risk alleles compared to a base rate of one risk allele was 5.8. They report that the results for FSGS were broadly similar. This matters because the frequency of the trait/disease in a random mating population is conditional on the homozygotes if it has a recessive expression pattern. G1 was present in 40% of the Yoruba HapMap data set, but in neither of the two Eurasian groups, Europeans and East Asians. G2 was found in three Yoruba, but in neither of the Eurasian groups. Assuming Hardy-Weinberg equilibrium the Yoruba should have 16% of the population at sharply elevated risk for FSGS and H-ESKD because they’d be homozygotes for the G1 allele.
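The arithmetic in that paragraph is easy to check. This sketch assumes the 40% figure is the G1 allele frequency and that Hardy-Weinberg proportions hold. The odds ratios are the ones quoted above; the function name is my own.

```python
def hardy_weinberg(q):
    """Genotype proportions for a risk allele at frequency q."""
    p = 1.0 - q
    return {"ancestral_hom": p * p, "het": 2.0 * p * q, "risk_hom": q * q}


g1_freq = 0.40
genotypes = hardy_weinberg(g1_freq)

# Odds ratios quoted in the text, each relative to zero risk alleles.
or_one_allele = 1.26
or_two_alleles = 7.3
or_two_vs_one = or_two_alleles / or_one_allele

print(genotypes)
print(round(or_two_vs_one, 1))
```

So 16% of the population would be G1 homozygotes and another 48% would be carriers. The step from one risk allele to two multiplies the odds by about 5.8, which is the recessive pattern in a nutshell.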

Once they established which markers seem to be implicated in this phenotypic variation, they wanted to focus on how the frequencies of those markers came to be. Specifically, G1 and G2 seem to be derived haplotypes which arose out of the ancestral background. In plain English 20,000 years ago Africans should have looked like all non-Africans genomically, at least on the functionally relevant segments, but within the last 10,000 years it looks like new variants rose in frequency, driven by natural selection in response to new environmental stresses. The region has already broadly been surveyed by linkage disequilibrium based tests, which basically look for regions of long haplotypes, homogenized zones of the genome where many individuals have the variation removed because one gene rose so rapidly in frequency that huge adjacent sections hitchhiked up in frequency. Presumably this may have happened with the MYH9 haplotype correlated with the traits under consideration here; G1 and G2 dragged up the E-1 haplotype as a secondary consequence of their own rise to prominence among some Sub-Saharan African populations.

So next the authors turned to tried & tested techniques and focused on the risk markers which they had discovered earlier in their research, G1 and G2. Specifically, they used EHH, which is best at detecting selection where sweeps have nearly completed (e.g., the derived variant is at frequency 0.95 within the population), iHS, which is best at detecting sweeps which have not completed (e.g., the derived variant is at frequency 0.6), as well as ΔiHH, which I am less familiar with but is reputedly similar to iHS but uses absolute haplotype length as opposed to relative haplotype length. Figure 2 shows the results of these tests:


The resolution isn’t the best, but G1 and G2 seem to be outliers on all three tests to detect natural selection by using patterns of linkage disequilibrium. The first panel is EHH, the second and third show iHS and ΔiHH respectively, with the position of the markers being outliers among the distribution of values for the genome within the Yoruba. This is not proof of adaptation, but it changes our weights of possibilities. Additionally, they note that Europeans exhibit no such patterns on these markers. Visually the position of the markers in the latter two panels would be closer to the mode of the distribution in Europeans.
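To give some intuition for what these haplotype tests measure, here is a toy sketch of EHH as it is usually defined: the probability that two randomly chosen chromosomes carrying the same core allele are identical from the core site out to some marker. The haplotypes below are invented for illustration, and this is not the authors' pipeline. iHS, as I understand it, is built by integrating EHH over distance for the derived and ancestral alleles and standardizing the log ratio, which I don't attempt here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end):
    """Extended haplotype homozygosity between marker indices core and end.

    haplotypes: equal-length strings of '0'/'1', all carrying the same
    allele at the core marker (at least two of them).
    """
    lo, hi = sorted((core, end))
    # Group chromosomes by their exact sequence over the interval.
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    n = len(haplotypes)
    # Fraction of chromosome pairs that are identical over the interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Hypothetical data: carriers of a recently swept derived allele ('1' at
# index 0) are nearly identical; carriers of the ancestral allele are not.
derived = ["1000", "1000", "1000", "1001"]
ancestral = ["0000", "0101", "0011", "0110"]

for end in range(4):
    print(end, ehh(derived, 0, end), round(ehh(ancestral, 0, end), 3))
```

In the toy data EHH stays high for the derived carriers as you move away from the core, while it decays quickly on the ancestral background, which is the signature the real tests look for.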

To review, first they confirmed a causal relationship between a particular set of markers, haplotypes, and the traits of interest. Second, they confirmed that said markers seem to bear the hallmarks of genomic regions subject to natural selection. We know that focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD), the traits whose relationship to the G1 and G2 haplotypes seems confirmed, are unlikely to be targets of positive natural selection. To get a better sense of that we need to look at Apol1, the protein product of APOL1, and what it does. At this point I’ll quote the paper:

ApoL1 is the trypanolytic factor of human serum that confers resistance to the Trypanosoma brucei brucei (T. brucei brucei) parasite…T. brucei brucei has evolved into two additional subspecies, Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense, which have both acquired the ability to infect humans…T. brucei rhodesiense is predominantly found in Eastern and Southeastern Africa, while T. brucei gambiense is typically found in Western Africa, though some overlap exists…Since these parasites exist only in sub-Saharan Africa, we hypothesized that the APOL1 gene may have undergone natural selective pressure to counteract these trypanosoma adaptations. As an initial test of this hypothesis, we performed in vitro assays to compare the trypanolytic potential of the variant, disease-associated forms of ApoL1 proteins with that of the “wild-type” form of ApoL1 protein that is not associated with renal disease.

We’re talking about sleeping sickness. Here’s a description:

It starts with a headache, joint pains and fever. It is the kind you would expect to get over quickly. But after a while, things get worse. You fall asleep most of the time, are confused and get intense pains and convulsions.

If you do not get treatment, your body begins to waste away. Eventually, you slip into coma and die. This is human African trypanosomiasis, better known as sleeping sickness. If untreated, it kills 100% of its victims in a very short time.

Cheery. I think we have a plausible reason for natural selection to kick into overdrive! Or more specifically, we have a plausible external selection pressure which will drive fitness differentials which correlate with genetic variation. Increased probability of kidney disease seems preferable to this. In terms of the molecular genetics it looks like a factor, serum resistance-associated protein (SRA), produced by T. brucei rhodesiense binds to a specific location of Apol1, and that mutations at G1 and G2 change exactly that location within the protein. So these mutants may block the ability of T. brucei rhodesiense to turn off the body’s defenses against trypanosomes.

To test this they examined the in vitro lytic potential of serum produced by individuals carrying the G1 and G2 haplotypes against the three subspecies of Trypanosoma: T. brucei brucei, which normal Apol1 can lyse, and T. brucei rhodesiense and T. brucei gambiense, which can infect humans (endemic to eastern and western Africa respectively, though the former extends into west Africa as well).

- All 75 samples lysed brucei brucei

- None lysed brucei gambiense

- 46 samples lysed SRA-positive brucei rhodesiense; all 46 samples were from G1 or G2 carrying individuals

- The potency of G2 seemed higher than G1 against SRA-positive samples of brucei rhodesiense, though not SRA-negative samples, where G1 seemed as potent

- Recombinants of Apol1 which had only one of the two SNPs of the G1 haplotype were less effective against brucei rhodesiense than those which had both (G1 haplotype)

- Recombinants with G1 and G2 were not more effective against brucei rhodesiense than those with G2 alone

- Recombinants with G1 alone were more potent against SRA-negative brucei rhodesiense than those with G2 alone

- G2 was necessary and sufficient to block SRA binding to Apol1 and allow lysing of brucei rhodesiense. G1 did not block SRA binding to Apol1, but was still sufficient to lyse brucei rhodesiense, though it was far less potent against SRA-positive brucei rhodesiense than G2

It seems that the G1 and G2 haplotypes utilize different mechanisms to enable the lysing of invasive pathogens, and so prevent the development of sleeping sickness. Their means differ, but the ends are the same. The authors note that even minimal amounts of plasma serum produced by G2 individuals seem potent enough to block the binding of SRA to Apol1 and so enable lysis. And introduction of such plasma into the bloodstreams of individuals who do not have resistance may then be highly efficacious as a preventative treatment against sleeping sickness. They do note that they did not explore in detail the mechanism by which the G1 and G2 variants result in susceptibility to kidney failure, but that’s presumably for the future.

Finally, the second to last paragraph where they bring it all together:

It will be interesting to determine the distribution of these mutations throughout sub-Saharan Africa. In present-day Africa, T. brucei rhodesiense is found in the Eastern part of the continent, while we noted high frequency of the trypanolytic variants and the signal of positive selection in a West African population. Changes in trypanosome biology and distribution and/or human migration may explain this discrepancy, or resistance to T. brucei rhodesiense could have favored the spreading of T. brucei gambiense in West Africa. Alternatively, ApoL1 variants may provide immunity to a broader array of pathogens beyond just T. brucei rhodesiense, as a recent report linking ApoL1 with anti-Leishmania activity may suggest…Thus, resistance to T. brucei rhodesiense may not be the only factor causing these variants to be selected.

This is a very long review already. But, while I have your attention, I think I need to point to another paper on the same topic which has a slightly different twist. I won’t dig into the details with the same thoroughness as above, but rather I’ll highlight the value-add of this group’s contribution. It’s an Open Access paper, unlike the one above, so you can review it in depth yourself. Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene:

MYH9 has been proposed as a major genetic risk locus for a spectrum of nondiabetic end stage kidney disease (ESKD). We use recently released sequences from the 1000 Genomes Project to identify two western African-specific missense mutations (S342G and I384M) in the neighboring APOL1 gene, and demonstrate that these are more strongly associated with ESKD than previously reported MYH9 variants. The APOL1 gene product, apolipoprotein L-1, has been studied for its roles in trypanosomal lysis, autophagic cell death, lipid metabolism, as well as vascular and other biological activities. We also show that the distribution of these newly identified APOL1 risk variants in African populations is consistent with the pattern of African ancestry ESKD risk previously attributed to MYH9. Mapping by admixture linkage disequilibrium (MALD) localized an interval on chromosome 22, in a region that includes the MYH9 gene, which was shown to contain African ancestry risk variants associated with certain forms of ESKD…MYH9 encodes nonmuscle myosin heavy chain IIa, a major cytoskeletal nanomotor protein expressed in many cell types, including podocyte cells of the renal glomerulus. Moreover, 39 different coding region mutations in MYH9 have been identified in patients with a group of rare syndromes, collectively termed the Giant Platelet Syndromes, with clear autosomal dominant inheritance, and various clinical manifestations, sometimes also including glomerular pathology and chronic kidney disease…Accordingly, MYH9 was further explored in these studies as the leading candidate gene responsible for the MALD signal. 
Dense mapping of MYH9 identified individual single nucleotide polymorphisms (SNPs) and sets of such SNPs grouped as haplotypes that were found to be highly associated with a large and important group of ESKD risk phenotypes, which as a consequence were designated as MYH9-associated nephropathies…These included HIV-associated nephropathy (HIVAN), primary nonmonogenic forms of focal segmental glomerulosclerosis, and hypertension affiliated chronic kidney disease not attributed to other etiologies…The MYH9 SNP and haplotype associations observed with these forms of ESKD yielded the largest odds ratios (OR) reported to date for the association of common variants with common disease risk…Two specific MYH9 variants (rs5750250 of S-haplotype and rs11912763 of F-haplotype) were designated as most strongly predictive on the basis of Receiver Operating Characteristic analysis…These MYH9 association studies were then also extended to earlier stage and related kidney disease phenotypes and to population groups with varying degrees of recent African ancestry admixture…and led to the expectation of finding a functional African ancestry causative variant within MYH9. However, despite intensive efforts including re-sequencing of the MYH9 gene no suggested functional mutation has been identified…This led us to re-examine the interval surrounding MYH9 and to the detection of novel missense mutations with predicted functional effects in the neighboring APOL1 gene, which are significantly more associated with ESKD than all previously reported SNPs in MYH9.

Table one has the top line results. Focus on the first two rows, they’re “G1” from the earlier study (that is, the two SNPs which combine to form the G1 haplotype).


Here’s a difference between the previous paper and this one: the table above uses cases and controls from African Americans and Hispanic Americans. The original paper from which the genomic data on this sample are drawn calculates the average African, European and Native American ancestry in the two groups as follows (I did some rounding to keep the values simple):

African American – 85%, 10%, 5%
Hispanic American – 30%, 55%, 15%

Not surprisingly the Hispanic American sample here is mostly Puerto Rican and Dominican, explaining the greater African than Native American ancestry. Nevertheless, it is a sufficiently different genetic background to test the effects of the same marker against different genes. They confirmed the association of the markers of large effect in African Americans within the Hispanic cohort. The risk allele frequency in the African American control group is 21% vs. 37% in the cases. For Hispanic Americans the values are 6% and 23% for the same categories.
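To get a rough feel for what those frequency gaps imply, one can turn case and control allele frequencies into a crude allele-level odds ratio. This is only a back-of-the-envelope sketch using the rounded frequencies quoted above; it is not how the paper computed its reported odds ratios, and the function name is mine.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the risk allele in cases relative to controls.

    Both arguments are allele frequencies strictly between 0 and 1.
    """
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Rounded frequencies from the text: (cases, controls).
cohorts = {
    "African American": (0.37, 0.21),
    "Hispanic American": (0.23, 0.06),
}
for name, (cases, controls) in cohorts.items():
    print(name, round(allelic_odds_ratio(cases, controls), 2))
```

The point is simply that the enrichment of the risk allele in cases shows up in both cohorts, and a low control frequency makes the same kind of gap look larger on the odds scale.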

OK, now to the most interesting point in this short paper:

HIVAN has been considered as the most prominent of the nondiabetic forms of kidney disease within what has been termed the MYH9-associated nephropathies…We have reported absence of HIVAN in HIV infected Ethiopians, and attributed this to host genomic factors (Behar et al. 2006). Therefore, we examined the allele frequencies of the APOL1 missense mutations in a sample set of 676 individuals from 12 African populations, including 304 individuals from four Ethiopian populations…We coupled this with the corresponding distributions for the African ancestry leading MYH9 S-1 and F-1 risk alleles. A pattern of reduced frequency of the APOL1 missense mutations and also of the MYH9 risk variants was noted in northeastern African in contrast to most central, western, and southern African populations examined…Especially striking was the complete absence of the APOL1 missense mutations in Ethiopia. This combination of the reported lack of HIVAN and observed absence of the APOL1 missense mutations is consistent with APOL1 being the functionally relevant gene for HIVAN risk and likely the other forms of kidney disease previously associated with MYH9.

Bingo. The previous paper focused on African Americans (along with the HapMap Yoruba). But the pattern of variation within Africa is interesting as well. Ethiopians are not quite like other Africans, having a great deal of admixture with populations from Arabia (many of the languages of highland Ethiopia are Semitic). But the majority of their ancestry remains similar to that of other Sub-Saharan Africans. As a point of contrast the ecology of Ethiopia differs a great deal from the rest of Sub-Saharan Africa because of its elevation, and concomitant frigidity. The mean monthly low in Addis Ababa is around 10 degrees Celsius (50 Fahrenheit for Americans) and the mean high 20-25 (high 60s to mid 70s for Americans). There isn’t much variation from month to month because of the low latitude, but the high elevation keeps the temperatures relatively moderate. Different environments result in different selection pressures, and Ethiopia has a unique environment within Africa. The tsetse fly which serves as a vector for trypanosomes does not seem to be present in the Ethiopian highlands. The map above shows the distribution within Africa of one of the markers which defines the G1 haplotype in the previous paper. Note that the modal frequency is in the west of Africa, and the frequency drops off to the east (though the geographic coverage leaves a bit to be desired if you look at the raw data which went into generating this map, which smooths over huge discontinuities).

One of the points I want to reemphasize from the tests of natural selection in the first paper is that these genetic adaptations are likely to be new, otherwise recombination would have broken up the long haplotypes and reduced linkage disequilibrium. New as in the last 10,000 years. It is interesting that a particular subspecies of Trypanosome which is immune to these genetic adaptations is endemic to west Africa. We may be seeing evolution in action here, or at least the arms race between man and pathogen where man is always one step behind. In contrast, the subspecies which is effectively defused by the genetic adaptations reviewed here is present in higher numbers precisely in the regions where the resistance mutations are extant at lower proportions. Perhaps there are different mutations in these regions of Africa, not yet properly identified. Or perhaps we’re seeing humans in this region at an earlier stage of the dance, so to speak.

Citation: Giulio Genovese, David J. Friedman, Michael D. Ross, Laurence Lecordier, Pierrick Uzureau, Barry I. Freedman, Donald W. Bowden, Carl D. Langefeld, Taras K. Oleksyk, Andrea Uscinski Knob, Andrea J. Bernhardy, Pamela J. Hicks, George W. Nelson, Benoit Vanhollebeke, Cheryl A. Winkler, Jeffrey B. Kopp, Etienne Pays, & Martin R. Pollak (2010). Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans Science : 10.1126/science.1193032

Citation: Tzur S, Rosset S, Shemer R, Yudkovsky G, Selig S, Tarekegn A, Bekele E, Bradman N, Wasser WG, Behar DM, & Skorecki K (2010). Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Human genetics PMID: 20635188

July 2, 2010

Why Tibetans breathe so easy up high

I said yesterday I would say a bit more about the new paper on rapid recent high altitude adaptation among the Tibetans when I’d read the paper. Well, I’ve read it now. Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude:

Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18x per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1), a transcription factor involved in response to hypoxia. One single-nucleotide polymorphism (SNP) at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.

The exome is just the protein-coding part of the genome; so they’re focusing ostensibly on functionally relevant single nucleotide polymorphisms (SNPs). About a month and a half ago a similar paper on Tibetan high altitude adaptations was published in Science (I posted on that too), but their methodology was somewhat different. That group was looking at a set of genes, candidates, which they’d assume might have been under selection and so have functional significance in explaining Tibetan vs. non-Tibetan phenotypes at high altitudes. This second paper takes a more bottom up approach, scanning the genome of Tibetans and Han Chinese, and trying to spotlight regions which exhibit a great deal of between population variance, far greater than one might presume from the total genome genetic distances.

As to that last point…the timing of this has been causing a major problem with archaeologists. The supplement lays out the details a bit more than the press reports, so below is figure 2:


It looks like to get a better sense of the model you’ll have to read the cited paper, and I’m not sure that that will satisfy the archaeologists. They did use a large number of neutral markers though, so I’m not too worried about biases in their data set. Some have been confused about the population numbers, but this value in a population genetic context can be counterintuitive, especially over the long term (low values are given much more weight than high values). The small Han value becomes less confusing when you consider a massive demographic expansion from a small founder group, as well as persistent long-term biases in reproductive value within the population (e.g., some males in a given generation are way more fecund than others through polygyny). A higher N for Tibetans may be explained by a more stable population where reproductive value may be more equitable across diverse subsets and across individuals. In other words, an effective population size is a statistic which is bundling together a lot of evolutionary history, and is not a simple measure of perceived census sizes (the Tibetans may also be something of a melange of a diverse set of ancient groups which took refuge in the highlands, while the Han are the descendants of early adopters of agriculture which expanded demographically; so they’re opposite ends of the demographic tunnel).
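The remark that low values get much more weight than high values reflects the textbook result that long-term effective population size behaves like a harmonic mean of the per-generation sizes. Here is a minimal sketch with made-up numbers; this is the standard approximation, not the model fitted in the paper.

```python
def harmonic_mean_ne(sizes):
    """Long-term effective size as the harmonic mean of per-generation sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


# Hypothetical history: nine large generations and a single bottleneck.
sizes = [100_000] * 9 + [100]

arithmetic = sum(sizes) / len(sizes)
effective = harmonic_mean_ne(sizes)
print(f"arithmetic mean: {arithmetic:.0f}, harmonic mean: {effective:.0f}")
```

One bottleneck generation drags the harmonic mean down to roughly a thousand even though the ordinary average is about ninety thousand, which is why a population with an enormous census size today can still show a small effective size.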

The time of divergence of a little under 3,000 years is important for the rest of the paper, so I suppose other workers had better replicate their findings in the future. Figure 1 is rather striking, so let’s jump to it:


This chart is simply showing frequencies of SNPs in Tibetans and Han. The two are obviously correlated, as evident by the diagonal. Shading indicates the density of the number of SNPs at a given position. Look to the bottom right, and you see the gene around which much of the paper hinges, EPAS1. It’s an enormous outlier, with SNPs where Tibetans and Han differ a great deal. This is important in regards to looking for genes which may drive adaptation to higher altitudes; if you don’t have different genes then you don’t have different traits. If the Tibetans and Han diverged ~3,000 years ago, then those adaptations may be recent and would have emerged through rapid allele frequency changes (though they observe that it may be drawn from standing variation). The researchers didn’t go looking for EPAS1 as such, rather, it came looking for them. What does it do? From the text:

EPAS1 is also known as hypoxia-inducible factor 2α (HIF-2α). The HIF family of transcription factors consist of two subunits, with three alternate α subunits (HIF-1α, HIF-2α/EPAS1, HIF-3α) that dimerize with a β subunit encoded by ARNT or ARNT2. HIF-1α and EPAS1 each act on a unique set of regulatory targets…and the narrower expression profile of EPAS1 includes adult and fetal lung, placenta, and vascular endothelial cells…A protein-stabilizing mutation in EPAS1 is associated with erythrocytosis…suggesting a link between EPAS1 and the regulation of red blood cell production.

Next, they dig into the functional significance of EPAS1 variants, in the literature, and in their current sample:

Associations between SNPs at EPAS1 and athletic performance have been demonstrated…Our data set contains a different set of SNPs, and we conducted association testing on the SNP with the most extreme frequency difference, located just upstream of the sixth exon. Alleles at this SNP tested for association with blood-related phenotypes showed no relationship with oxygen saturation. However, significant associations were discovered for erythrocyte count (F test P = 0.00141) and for hemoglobin concentration (F test P = 0.00131), with significant or marginally significant P values for both traits when each village was tested separately (table S5). Comparison of the EPAS1 SNP to genotype data from 48 unlinked SNPs confirmed that its P value is a strong outlier (5) (fig. S4).

The allele at high frequency in the Tibetan sample was associated with lower erythrocyte quantities and correspondingly lower hemoglobin levels…Because elevated erythrocyte production is a common response to hypoxic stress, it may be that carriers of the “Tibetan” allele of EPAS1 are able to maintain sufficient oxygenation of tissues at high altitude without the need for increased erythrocyte levels. Thus, the hematological differences observed here may not represent the phenotypic target of selection and could instead reflect a side effect of EPAS1-mediated adaptation to hypoxic conditions. Although the precise physiological mechanism remains to be discovered, our results suggest that the allele targeted by selection is likely to confer a functionally relevant adaptation to the hypoxic environment of high altitude.

There are random anomalies in nature, but it seems too perfect that this is the outlier in allele frequencies across two populations which differ in adaptations which relate to many of the traits above.

OK, so they found an outlier SNP. The gene seems to have a reasonable probability of being involved in functional pathways relevant to altitude adaptation. But so far we’ve been focusing on the Tibetan-Han difference. If the two populations separated about 3,000 years ago one assumes that genes with SNPs with huge Fsts, where most of the variation can be partitioned between the groups, not within them, are good candidates for having been driven by selection. But it would be nice to compare with an outgroup. So they compared the Tibetans and Han with the Danes, who are an outgroup which separated from the East Asian cluster about one order of magnitude further back in time (~30,000 years). Next they generated a “population branch statistic” (PBS) from the Fst data (see the supplements). Basically you’re getting a value which describes allele frequency differences normalized to the expected genetic distance as known from population history. I’ve extracted Panel B from figure 2. T = Tibetans, H = Han, and D = Danes. The smaller tree represents genome average PBS values. It’s what you’d expect, the Danes are the outgroup. Over time genetic difference builds up because of separation between the groups. The Han and Tibetans are very close, as you’d expect from genetically similar populations. But look at the larger tree, the Tibetans are the outgroup by a mile! The Danes and Han differ far less from each other on EPAS1 than they do from the Tibetans. This seems like a clear deviation from the level of allele frequency difference one might be able to generate by neutral random walk processes.
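For the curious, the population branch statistic is usually written as a simple function of the three pairwise Fst values: each Fst is converted to a branch length T = -log(1 - Fst), and the Tibetan branch is (T_TH + T_TD - T_HD) / 2. Here is a small sketch of that formula. The Fst values are invented to mimic the two trees described above; they are not taken from the paper.

```python
import math


def branch_length(fst):
    """Transform a pairwise Fst (0 <= fst < 1) into an additive distance."""
    return -math.log(1.0 - fst)


def pbs_tibetan(fst_th, fst_td, fst_hd):
    """Length of the Tibetan branch given Tibetan-Han, Tibetan-Dane and
    Han-Dane Fst values."""
    t_th = branch_length(fst_th)
    t_td = branch_length(fst_td)
    t_hd = branch_length(fst_hd)
    return (t_th + t_td - t_hd) / 2.0


# Hypothetical genome-average pattern: Tibetans and Han close, Danes distant.
genome_like = pbs_tibetan(fst_th=0.02, fst_td=0.11, fst_hd=0.11)

# Hypothetical EPAS1-like pattern: Tibetans far from both other groups.
outlier_like = pbs_tibetan(fst_th=0.60, fst_td=0.60, fst_hd=0.10)

print(round(genome_like, 3), round(outlier_like, 3))
```

A large value means the change is concentrated on the Tibetan lineage rather than shared between the Tibetans and Han, which is exactly what the lopsided larger tree is showing.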

EPAS1 isn’t the only gene which they found, but it was the most significant, and illustrates the nature of the methodological orientation of this group. Sift through the genome and look for something which is totally unexpected, and put a focus on the peculiar diamond in the rough and see what it can tell you. They conclude with the big picture:

Of the genes identified here, only EGLN1 was mentioned in a recent SNP variation study in Andean highlanders (24). This result is consistent with the physiological differences observed between Tibetan and Andean populations…suggesting that these populations have taken largely distinct evolutionary paths in altitude adaptation.

Several loci previously studied in Himalayan populations showed no signs of selection in our data set…whereas EPAS1 has not been a focus of previous altitude research. Although EPAS1 may play an important role in the oxygen regulation pathway, this gene was identified on the basis of a noncandidate population genomic survey for natural selection, illustrating the utility of evolutionary inference in revealing functionally important loci.

Given our estimate that Han and Tibetans diverged 2750 years ago and experienced subsequent migration, it appears that our focal SNP at EPAS1 may have experienced a faster rate of frequency change than even the lactase persistence allele in northern Europe, which rose in frequency over the course of about 7500 years…EPAS1 may therefore represent the strongest instance of natural selection documented in a human population, and variation at this gene appears to have had important consequences for human survival and/or reproduction in the Tibetan region.

Natural selection is somewhat stochastic; it can take different tacks to the same process because it doesn’t have infinite power in its search algorithm. Given enough time and gene flow no doubt adaptations would homogenize and converge upon a perfect optimum, but given enough time the universe will devolve into heat death. Evolution has to operate extemporaneously for eternity because the conditions are ever changing. Second, the big headline grabbing assertion about EPAS1 being the strongest instance of natural selection needs to be modulated by the fact that the conclusion was generated assuming the validity of the inferences of a particular model, and models can be wrong. It does seem like the evolutionary change is likely to be recent; I doubt they’d be off by an order of magnitude. But for lactase persistence we’ve extracted genetic material from ancient remains. The conclusion then is much more concrete in that case. Until we get remains from ancient Tibetans and can infer their allele frequencies, there will be some asymmetry in the confidence with which we can make a claim as to when the selection event began.

Citation: Yi, X., Liang, Y., Huerta-Sanchez, E., Jin, X., Cuo, Z., Pool, J., Xu, X., Jiang, H., Vinckenbosch, N., Korneliussen, T., Zheng, H., Liu, T., He, W., Li, K., Luo, R., Nie, X., Wu, H., Zhao, M., Cao, H., Zou, J., Shan, Y., Li, S., Yang, Q., Asan, ., Ni, P., Tian, G., Xu, J., Liu, X., Jiang, T., Wu, R., Zhou, G., Tang, M., Qin, J., Wang, T., Feng, S., Li, G., Huasang, ., Luosang, J., Wang, W., Chen, F., Wang, Y., Zheng, X., Li, Z., Bianba, Z., Yang, G., Wang, X., Tang, S., Gao, G., Chen, Y., Luo, Z., Gusang, L., Cao, Z., Zhang, Q., Ouyang, W., Ren, X., Liang, H., Zheng, H., Huang, Y., Li, J., Bolund, L., Kristiansen, K., Li, Y., Zhang, Y., Zhang, X., Li, R., Li, S., Yang, H., Nielsen, R., Wang, J., & Wang, J. (2010). Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude Science, 329 (5987), 75-78 DOI: 10.1126/science.1190371

May 14, 2010

Breathing like Buddha: altitude & Tibet

You probably are aware that different populations have different tolerances for high altitudes. Himalayan Sherpas aren’t useful just because they have skills derived from their culture, they’re actually rather well adapted to high altitudes because of their biology. Additionally, different groups seem to have adapted to higher altitudes independently, exhibiting convergent evolution. But in terms of physiological function they aren’t all created equal, at least in relation to the solutions which they’ve come to to make functioning at high altitudes bearable. In particular, it seems that the adaptations of the peoples of Tibet are superior to those of the peoples of the Andes. Superior in that the Andean solution is more brute force than the Tibetan one, producing greater side effects, such as lower birth weight in infants (and so higher mortality and lower fitness).

The Andean region today is dominated by indigenous people, and Spanish is not the lingua franca of the highlands as it is everywhere else in the former colonial domains of Spain in the New World. This is largely a function of biology; as in the lowlands of South America the Andean peoples were decimated by disease upon first contact (plague was spreading across the Inca Empire when Pizarro arrived with his soldiers). But unlike the lowland societies the Andeans had nature on their side: people of mixed or European ancestry are less well adapted to high altitudes and women without tolerance of the environment still have higher miscarriage rates.

So despite the suboptimal nature of the Andean adaptations vis-a-vis the Tibetan ones, they are certainly better than nothing, and in a relative sense have been very conducive to higher reproductive fitness. And yet why might the Andeans have kludgier adaptations than Tibetans? One variable to consider is time. The probability is that the New World was populated by humans only for the past ~10,000-15,000 years or so, with an outside chance of ~20,000 years (if you trust a particular interpretation of the genetic data, which you probably shouldn’t). By contrast, modern humans have had a presence in the center of Eurasia for ~30,000 years. Generally when populations are exposed to a new selective regime the initial adaptations are drastic and exhibit major functional downsides, but they’re much better than the status quo (remember, fitness is relative). Over time genetic modifications mask the deleterious byproducts of the genetic change which emerged initially to deal with the new environment. In other words, selection perfects design over time in a classic Fisherian sense as the genetic architecture converges upon the fitness optimum.*

Another parameter may be the variation available within the population, as the power of selection is proportional to the amount of genetic variation, all things equal. The peoples of the New World tend to be genetically somewhat homogeneous, probably due to the fact that they went through a bottleneck across Beringia, and that they’re already sampled from the terminus of the Old World. A physical anthropologist once told me that the tribes of the Amazon still resemble Siberians in their build. It may be that it takes a homogeneous population with little extant variation a long time indeed to shift trait value toward a local ecological optimum (tropical Amerindians are leaner and less stocky than closely related northern populations, just not particularly in relation to other tropical populations). In contrast, populations in the center of Eurasia have access to a great deal of genetic variation because they’re in proximity to many distinctive groups (the Uyghurs for example are a recent hybrid population with European, South Asian and East Asian ancestry).

So that’s the theoretical backdrop for the differences in adaptations. Shifting to how the adaptations play out concretely, some aspects of the physiology of Tibetan tolerance of high altitudes are mysterious, but one curious trait is that they actually have lower levels of hemoglobin than one would expect. Andean groups have elevated hemoglobin levels, which is the expected “brute force” response. Interestingly it seems that evolution given less time or stabilizing at a physiologically less optimal equilibrium is more comprehensible to humans! Nature is often more creative than us. In contrast the Tibetan adaptations are more subtle, though interestingly their elevated nitric oxide levels may facilitate better blood flow. Though the inheritance patterns of the trait have been observed, the genetic mechanism underpinning it has not been elucidated. Now a new paper in Science identifies some candidate genes for the various physiological quirks of Tibetans by comparing them with their neighbors, and looking at the phenotype in different genotypes within the Tibetan population. Genetic Evidence for High-Altitude Adaptation in Tibet:

Tibetans have lived at very high altitudes for thousands of years, and they have a distinctive suite of physiological traits that enable them to tolerate environmental hypoxia. These phenotypes are clearly the result of adaptation to this environment, but their genetic basis remains unknown. We report genome-wide scans that reveal positive selection in several regions that contain genes whose products are likely involved in high-altitude adaptation. Positively selected haplotypes of EGLN1 and PPARA were significantly associated with the decreased hemoglobin phenotype that is unique to this highland population. Identification of these genes provides support for previously hypothesized mechanisms of high-altitude adaptation and illuminates the complexity of hypoxia response pathways in humans.

Here’s what they did. First, Tibetans are adapted to higher altitudes; Chinese and Japanese are not. The three groups are relatively close genetically in terms of ancestry, so the key is to look for signatures of positive selection in regions of the genome which have been identified as possible candidates in terms of functional significance in relation to pathways which may modulate the traits of interest. After finding potential regions of the genome possibly under selection in Tibetans but not the lowland groups, they fixed upon variants which are at moderate frequencies in Tibetans and noted how the genes track changes in the trait.

This figure from the supplements shows how the populations are related genetically:


In a worldwide context the three groups are pretty close, but they also don’t overlap. The main issue I would have with this presentation is that the Chinese data is from the HapMap, and the samples are from Beijing. The sample therefore has a northeast Chinese genetic skew (I know that people who live in Beijing may come from elsewhere, but recent work which examines Chinese phylogeography indicates that the Beijing sample is not geographically diversified), while ethnic Tibetans overlap a great deal with Han populations in the west of China proper. In other words, I wouldn’t be surprised if the separation between Han and Tibetan was far less if you took the Chinese samples from Sichuan or Gansu, where Han and Tibetans have lived near each other for thousands of years.

But these issues of phylogenetic difference apart, we know for a fact that lowland groups do not have the adaptations which are distinctive to the Tibetans. To look for genetic differences they focused on 247 loci, some from the HIF pathway, which is important for oxygen homeostasis, as well as genes from Gene Ontology categories which might be relevant to altitude adaptations. Table 1 has the breakdown by category.

Across these regions of the genome they performed two haplotype based tests which detect natural selection, EHH and iHS. Both of these tests basically find regions of the genome which have reduced variation because of a selective sweep, whereby selection at a specific region of the genome has the effect of dragging along large neutral segments adjacent to the original copy of the favored variant. EHH is geared toward detection of sweeps which have nearly reached fixation, in other words the derived variant has nearly replaced the ancestral after a bout of natural selection. iHS is better at picking up sweeps which have not resulted in the fixation of the derived variant. The paper A Map of Recent Positive Selection in the Human Genome outlines the differences between EHH and iHS in more detail. They looked at the three populations and wanted to find regions of the genome where Tibetans, but not the other two groups, were subject to natural selection as defined by positive signatures with EHH and iHS. They scanned over 200 kb windows of the genome, and found that 10 of their candidate genes were in regions where Tibetans came up positive for EHH and iHS, but the other groups did not. Since these tests do produce false positives they ran the same procedure on 240 randomly chosen genes (7 genes were in regions where Chinese and Japanese came up positive, so these were removed from the set of candidates), and came up with average EHH and iHS positive hits of ~2.7 and ~1.4 genes after one million resamplings (specifically, these are genes where Tibetans were positive, the other groups negative). By comparison, their candidate genes, which were focused on altitude related physiological pathways, yielded 6 for EHH and 5 for iHS (one gene came up positive for both tests, so 10 total). This indicates to them that these are not false positives, something made more plausible by the fact that we know that Tibetans are biologically adapted to higher altitudes and we have an expectation that these genes are more likely than random expectation to have a relationship to altitude adaptations.
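To make the logic of the random-gene control concrete, here is a minimal sketch in Python. It is not the authors' pipeline. The toy genome of 20,000 genes, the 225 "flagged" genes (chosen only so that a random set of 240 averages about 2.7 hits, echoing the EHH figure above), and the function names are all assumptions for illustration. The idea is simply to draw many random gene sets, count how many genes in each fall in regions flagged as Tibetan-only positives, and ask how often chance alone matches the count seen in the real candidates.

```python
import random

def resample_hit_counts(flagged, n_genes, set_size, n_resamples, seed=0):
    """Draw random gene sets and count how many genes in each are flagged."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_resamples):
        gene_set = rng.sample(range(n_genes), set_size)
        counts.append(sum(1 for g in gene_set if g in flagged))
    return counts

def empirical_p_value(counts, observed):
    """Share of random sets with at least `observed` hits (add-one corrected)."""
    at_least = sum(1 for c in counts if c >= observed)
    return (at_least + 1) / (len(counts) + 1)

# Hypothetical toy genome: none of these numbers come from the paper's data.
N_GENES = 20000
flagged = set(random.Random(42).sample(range(N_GENES), 225))

counts = resample_hit_counts(flagged, N_GENES, set_size=240, n_resamples=2000)
mean_hits = sum(counts) / len(counts)
p_value = empirical_p_value(counts, observed=6)

print(f"mean hits in random sets: {mean_hits:.2f}")
print(f"empirical P(hits >= 6): {p_value:.3f}")
```

The sketch uses 2,000 resamplings rather than one million purely to keep it fast. The comparison of interest is the observed candidate count against the whole distribution of random counts, not just against the mean.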

Finally, they decided to look at two genes with allelic variants which exist at moderate frequencies in Tibetans, EGLN1 and PPARA. The procedure is simple: you have three genotypes, and you see if there are differences across the 31 individuals by genotype in terms of phenotype. In this case you want to look at hemoglobin concentration, where those who are well adapted have lower concentrations. Figure 3 is rather striking:


Even with the small sample sizes the genotypic effect jumps out at you. This isn’t too surprising, since previous work has shown that these traits are highly heritable, and that they vary within the Tibetan population. There’s apparently a sex difference in terms of hemoglobin levels, so they did a regression analysis, and it illustrates how strong the genetic effect from these alleles is:
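For readers who want to see what a regression of this kind looks like, here is a minimal sketch in Python. Everything in it is simulated: the baseline hemoglobin of 16, the effect of -0.8 per copy of the selected haplotype, the sex effect of 1.5, and the noise level are hypothetical values picked for illustration, not estimates from the paper, and I am assuming a simple additive model, which may not be what the authors fit. The point is only that including sex as a covariate lets you read off the per-copy genotype effect separately from the sex difference.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data for 31 individuals; all effect sizes are hypothetical.
rng = random.Random(1)
rows, hemoglobin = [], []
for _ in range(31):
    copies = rng.choice([0, 1, 2])   # copies of the selected haplotype
    male = rng.choice([0, 1])        # sex covariate
    rows.append([1.0, copies, male])
    hemoglobin.append(16.0 - 0.8 * copies + 1.5 * male + rng.gauss(0, 0.5))

intercept, per_copy, sex_effect = ols(rows, hemoglobin)
print(f"intercept={intercept:.2f}, per-copy effect={per_copy:.2f}, "
      f"sex effect={sex_effect:.2f}")
```

A negative per-copy coefficient here corresponds to the pattern described above, where carriers of the selected haplotype have lower hemoglobin concentrations.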


My main question: why do Tibetans still have variation on these genes after all this time? Shouldn’t they be well adapted to high altitudes by now? A prosaic answer may be that the Tibetans have mixed with other populations recently, and so have added heterozygosity through admixture. But there are several loci here which are fixed in Tibetans, and not in the HapMap Chinese and Japanese. For admixture to be a good explanation one presumes that the groups with which the Tibetans mixed would have been fixed for those genes as well, but not the ones at moderate frequencies. This may be true, but it seems more likely that admixture alone cannot explain this pattern. As the Andean example suggests, adaptation to high altitudes is not easy or simple. Until better options arrive on the scene, kludges will suffice. It may be that the Tibetans are still going through the sieve of selection, and will continue to do so for the near future. Or, there may be balancing dynamics on the genes which exhibit heterozygosity, so that fixation is prevented.
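A back-of-the-envelope sketch shows why "still going through the sieve" is not far-fetched. The Python below uses the textbook deterministic recursion for a favored allele under simple genic selection, p' = p(1 + s) / (1 + sp). The selection coefficients, the starting and ending frequencies, and the 25 year generation time are assumptions I picked for illustration, not estimates for EGLN1 or PPARA, and the model ignores drift, dominance and migration.

```python
def next_frequency(p, s):
    """One generation of genic selection: favored allele has fitness 1 + s."""
    return p * (1 + s) / (1 + s * p)

def generations_to_reach(p_start, p_target, s):
    """Generations for the favored allele to rise from p_start to p_target."""
    p, generations = p_start, 0
    while p < p_target:
        p = next_frequency(p, s)
        generations += 1
    return generations

# Hypothetical selection coefficients; 25 years per generation is an assumption.
for s in (0.01, 0.05):
    n = generations_to_reach(0.01, 0.99, s)
    print(f"s={s}: {n} generations, ~{n * 25} years at 25 years per generation")
```

Under these assumptions, the time a sweep needs to run from rare to nearly fixed depends strongly on the selection coefficient, so weaker selection could easily leave a favored variant at moderate frequency today.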

No matter what the truth turns out to be, this is surely just the beginning. A deeper investigation of the genetic architecture of Andeans and Ethiopians, both of which have their own independent adaptations, will no doubt tell us more. Finally, I wonder if these high altitude adaptations have fitness costs which we’re not cognizant of, but which Tibetans living in India may have some sense of.

Citation: Tatum S. Simonson, Yingzhong Yang, Chad D. Huff, Haixia Yun, Ga Qin, David J. Witherspoon, Zhenzhong Bai, Felipe R. Lorenzo, Jinchuan Xing, Lynn B. Jorde, Josef T. Prchal, & RiLi Ge (2010). Genetic Evidence for High-Altitude Adaptation in Tibet. Science. DOI: 10.1126/science.1189406

* Additionally, it may be that archaic hominin groups were resident in the Himalaya for nearly one million years. Neandertal admixture evidence in Eurasians should change our priors when evaluating the possibility for adaptive introgression on locally beneficial alleles.

Image Credit: Wikimedia Commons
