Razib Khan One-stop-shopping for all of my content

July 27, 2011

How Chinese genetics is like Chinese food

Representatives of Szechuan and Shandong cuisine

The Pith: The Han Chinese are genetically diverse, due to geographic scale of range, hybridization with other populations, and possibly local adaptation.

In the USA we often speak of “Chinese food.” This is rather peculiar because there isn’t any generic “Chinese cuisine.” Rather, there are regional cuisines, which share a broad family similarity. Similarly, American “Mexican food” and “Indian food” also have no true equivalent in Mexico or India (naturally the novel American culinary concoctions often exhibit biases in the regions from which they sample due to our preferences and connections; non-vegetarian Punjabi elements dominate over Udupi, while much authentic Mexican American food has a bias toward the northern states of that nation). But to a first approximation there is some sense in speaking of a general class of cuisine which exhibits a lot of internal structure and variation, so long as one understands that there is an important finer grain of categorization.

Some of the same applies to genetic categorizations. Consider two of the populations in the original HapMap, the Yoruba from Nigeria, and the Chinese from Beijing. There are ~30 million ...

June 28, 2011

“What if you’re wrong” – haplogroup J

Back when this sort of thing was cutting edge, mtDNA haplogroup J was a pretty big deal. This was the haplogroup often associated with the demic diffusion of Middle Eastern farmers into Europe. This was the “Jasmine” clade in Seven Daughters of Eve. A new paper in PLoS ONE makes an audacious claim: that J is not a lineage which underwent recent demographic expansion, but rather one which has been subject to a specific set of evolutionary dynamics which have skewed the interpretations due to a false “molecular clock” assumption. By this assumption, I mean that mtDNA, which is passed down in an unbroken chain from mother to daughter, is by and large neutral to forces like natural selection and subject to a constant mutational rate which can serve as a calibration clock to the last common ancestor between two different lineages. Additionally, mtDNA has a high mutational rate, so it accumulates lots of variation to sample, and it is copious, so easy to extract. What’s not to like?

First, the paper, Mutation Rate Switch inside Eurasian Mitochondrial Haplogroups: Impact of Selection and Consequences for Dating Settlement in Europe:

R-lineage mitochondrial DNA represents over 90% of the European ...

June 20, 2011

Convergent evolution happens!

In the image to the left you see three human males. You can generate three pairings of these individuals. When comparing these pairs which would you presume are more closely related than the other pairs? Now let me give you some more information. The rightmost image is of the president of Tanzania. The middle image is of the president of Taiwan (Republic of China). And finally, the leftmost image is of the prime minister of Papua New Guinea. With this information you should now know with certainty that the prime minister of Papua New Guinea and the president of Taiwan are much more closely related than either is to the president of Tanzania. But some of you may not have guessed that initially. Why? I suspect that physical inspection may have misled you. One of the most salient visible human characteristics is the complexion of our largest organ, the skin. Its prominence naturally leads many to mistakenly infer relationships where they do not exist.

This was certainly an issue when European explorers encountered the peoples of Melanesia. An older ...

April 24, 2011

The evolutionary effect of the sky gods

Last week I reviewed ideas about the effect of “exogenous shocks” to an ecosystem of creatures, and how it might reshape their evolutionary trajectory. These sorts of issues are well known in their generality. They have implications from the broadest macroscale systematics to microevolutionary process. The shocks point to changes over time which have a general effect, but what about exogenous parameters which shift spatially and regularly? I’m talking latitudes here. The further you get from the equator the more the climate varies over the season, and the lower the mean temperature, and, the less the aggregate radiation the biosphere catches. Allen’s rule and Bergmann’s rule are two observational trends which biologists have long observed in relation to many organisms. The equatorial variants are slimmer in their physique, while the polar ones are stockier. Additionally, there tends to be an increase in mean mass as one moves away from the equator.

But these rules are just general observations. What process underlies these observations? The likely culprit would be natural selection of course. But the specific manner in which this process shakes out, on both the organismic and genetic level, still needs to be elucidated ...

October 25, 2010

Body odor, Asians, and earwax

When I was in college I would sometimes have late night conversations with the guys in my dorm, and the discussion would random-walk in very strange directions. During one of these quasi-salons a friend whose parents were from Korea expressed some surprise and disgust at the idea of wet earwax. It turns out he had not been aware of the fact that the majority of the people in the world have wet, sticky, earwax. I’d stumbled onto that datum in the course of my reading, and had to explain to most of the discussants that East Asians generally have dry earwax, while convincing my Korean American friend that wet earwax was not something that was totally abnormal. Earwax isn’t something we explore in polite conversation, so it makes sense that most people would be ignorant of the fact that there was inter-population variation on this phenotype.

But it doesn’t end there. Over the past five years the genetics of earwax has come back into the spotlight, because of its variation and what it can tell us about the history and evolution of humans since the Out of Africa event. Not only that, it seems the variation in earwax has some other phenotypic correlates. The SNPs in and around ABCC11 are a set where East Asians in particular show signs of being different from other world populations. The variants which are nearly fixed in East Asia around this locus are nearly disjoint in frequency with those in Africa. Here are the frequencies of the alleles of rs17822931 on ABCC11 from ALFRED:

The expression of the dry earwax phenotype is contingent on an AA genotype; it has recessive expression. So in a population where the allele frequency of A is ~0.50, the dry earwax phenotype would have a ~0.25 frequency. In a population where the A allele has a ~0.20 frequency, the dry earwax phenotype would be at ~0.04 frequency. Among people of European descent the dry earwax phenotype is present at proportions of less than ~5%. Because of recessive expression a larger minority of Japanese and Chinese should manifest wet earwax, though interestingly the ALFRED database indicates that Koreans are fixed for the A allele. In Africa conversely the G allele seems to be fixed.
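The arithmetic above is just the Hardy-Weinberg expectation for a recessive trait: the phenotype frequency is the square of the allele frequency. Here is a minimal sketch in Python, assuming random mating; the function name is mine, purely for illustration.

```python
def recessive_phenotype_frequency(allele_freq: float) -> float:
    """Expected frequency of the AA genotype under Hardy-Weinberg (random mating)."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    # Two independent draws of the A allele, one from each parent.
    return allele_freq ** 2


# The two allele frequencies used in the text.
for q in (0.50, 0.20):
    dry = recessive_phenotype_frequency(q)
    print(f"A allele at {q:.2f} -> dry earwax expected at {dry:.2f}")
```

Running it reproduces the ~0.25 and ~0.04 figures quoted above.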

So the question is: why? A new paper in Molecular Biology and Evolution argues that the allele frequency differences are a function of positive directional selection since humans left Africa ~100,000 years ago. The impact of natural selection on an ABCC11 SNP determining earwax type:

A nonsynonymous single nucleotide polymorphism (SNP), rs17822931-G/A (538G>A; Gly180Arg), in the ABCC11 gene determines human earwax type (i.e., wet or dry) and is one of most differentiated nonsynonymous SNPs between East Asian and African populations. A recent genome-wide scan for positive selection revealed that a genomic region spanning ABCC11, LONP2, and SIAH1 genes has been subjected to a selective sweep in East Asians. Considering the potential functional significance as well as the population differentiation of SNPs located in that region, rs17822931 is the most plausible candidate polymorphism to have undergone geographically restricted positive selection. In this study, we estimated the selection intensity or selection coefficient of rs17822931-A in East Asians by analyzing two microsatellite loci flanking rs17822931 in the African (HapMap-YRI) and East Asian (HapMap-JPT and HapMap-CHB) populations. Assuming a recessive selection model, a coalescent-based simulation approach suggested that the selection coefficient of rs17822931-A had been approximately 0.01 in the East Asian population, and a simulation experiment using a pseudo-sampling variable revealed that the mutation of rs17822931-A occurred 2006 generations (95% credible interval, 1023 to 3901 generations) ago. In addition, we show that absolute latitude is significantly associated with the allele frequency of rs17822931-A in Asian, Native American, and European populations, implying that the selective advantage of rs17822931-A is related to an adaptation to a cold climate. Our results provide a striking example of how local adaptation has played a significant role in the diversification of human traits.

The region around ABCC11 has come under scrutiny with the emergence of tests of natural selection predicated on inspecting patterns of linkage disequilibrium (LD). LD is basically measuring the association of genetic variants within the genome shifted away from expectation. A selective sweep tends to generate a lot of LD around the target of natural selection because as the allele in question rises in frequency its neighbors also hitchhike along. The hitchhiking process means that within a population you may see regions of the genome which exhibit long sequences of correlated single-nucleotide polymorphisms (SNPs), haplotypes. An initial selective event will presumably generate a very long homogenized block, which over time will break apart through recombination and mutation, as variation is injected back into the genome. The extent and decay of LD then can help us gauge the time and strength of selection events.

But LD can emerge via other processes besides natural selection. Imagine for example that a population of Africans and Europeans mix in a given generation. Europeans and Africans have different genetic makeups, on average, so the initial generations will have more LD than expectation because recombination will only slowly break apart the physical connection between genomic regions from European and African ancestors. The decay of LD then can give one a sense of the time since admixture as well as selection. Not only that, stochastic demographic events and processes are also important and may drive the emergence of LD. Consider a bottleneck where the frequency of a particular haplotype is driven up by random genetic drift alone. The details of these alternative scenarios are explored in the 2009 paper The role of geography in human adaptation.

All this is preamble to the fact that there’s a lot of LD around ABCC11. Here’s a visualization from the HapMap populations:


From left to right you have Chinese & Japanese, Utah whites, and the Yoruba from Nigeria. An absolute value of D’ ~0 means that there’s linkage equilibrium, the default or null state where there are no atypical excessive correlations of alleles across the genome. The axes here are pairwise combinations of SNPs around ABCC11, with a focus around rs17822931, a nonsynonymous SNP which seems to be the likely functional source of the variance in earwax and other phenotypes. In terms of LD rank order the results are not surprising; across the genome East Asians tend to exhibit more LD than Europeans, and Europeans exhibit more LD than the Yoruba. Part of this is probably a function of population history; a serial bottleneck model Out of Africa would posit that drift and other stochastic forces would have a stronger impact on the genomes of East Asians than Europeans. But this seems like it can’t be the whole picture here; note the variance in allele frequency in the New World as well as in Oceania. Some of the Amerindian populations seem to have a higher frequency of the ancestral G allele on rs17822931. The figure above is easier to understand; the Y-axis is showing you the extent of heterozygosity at a given location. GA is heterozygous, GG is homozygous. Africans again tend to exhibit more heterozygosity than non-Africans, but note the sharply diminished heterozygosity for the East Asian sample around rs17822931 in ABCC11. Remember that heterozygosity tends not to go above 0.50 in a random mating population in a diallelic model (though in selective breeding it may go above 0.50 for F1 generations).
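For readers who want to see what D’ is actually measuring, here is a rough sketch of the standard textbook calculation for a pair of diallelic loci. The function and its argument names are my own illustration, not anything from HapMap or the paper: `p_ab` is the frequency of the haplotype carrying allele A at the first locus and allele B at the second, and `p_a` and `p_b` are the two allele frequencies.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Normalized linkage disequilibrium (Lewontin's D') for two diallelic loci."""
    # Raw disequilibrium: observed haplotype frequency minus the frequency
    # expected if the two alleles were independent of one another.
    d = p_ab - p_a * p_b
    # D is scaled by the largest magnitude it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus: D' is undefined, reported here as 0
    return d / d_max


# Independent alleles: linkage equilibrium.
print(d_prime(p_ab=0.25, p_a=0.5, p_b=0.5))
# A and B always travel together on the same haplotype: complete LD.
print(d_prime(p_ab=0.5, p_a=0.5, p_b=0.5))
```

The LD plots report the absolute value, so a haplotype that is never observed (a D’ of -1 here) counts as complete LD just the same.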

The major findings of this paper beyond what was known before seem to be a) an explicit model of how East Asians could have arrived at a high frequency of the AA genotype at rs17822931, and, b) the correlation between climate and the frequency of A. I’ll get to the second point in a bit, but what about the first? Using the nature of variation in two microsatellites flanking the SNP of interest in East Asians, and assuming a recessive selection model, the authors posit that the A allele began to rise in frequency ~50,000 years ago, and, that the selection coefficient was ~1% per generation. This is a significant value for the selection parameter, and the timing is possible in light of the separation of non-Africans into a western and eastern group around that period.

But honestly I’m pretty skeptical of this. The confidence intervals don’t inspire confidence, and from what little I know selection for recessive traits should exhibit less linkage disequilibrium. At low frequencies there is very little effect of natural selection on the allele because it is mostly “masked” in heterozygotes, and therefore there will be a long period before its proportion begins to rise more rapidly. During this time recombination will have time to chop up the haplotypes around the SNP, reducing the length of the statistically associated haplotype block. Also, the authors themselves don’t seem to believe that the phenotype of earwax itself was the target of selection, so its recessive expression pattern should be less important from where I stand.
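To illustrate the “masking” point, here is a toy deterministic model, emphatically not the coalescent simulation the authors ran. It assumes an infinite population with no drift, a textbook fitness scheme (AA = 1 + s, GA = 1 + h·s, GG = 1), and starting and target frequencies that I picked only for illustration.

```python
def next_frequency(q: float, s: float, h: float) -> float:
    """Frequency of allele A after one generation of selection.

    Assumed fitnesses: AA = 1 + s, GA = 1 + h*s, GG = 1.
    h = 0 means the advantage of A is fully recessive.
    """
    p = 1.0 - q
    w_aa, w_ga, w_gg = 1.0 + s, 1.0 + h * s, 1.0
    mean_fitness = q * q * w_aa + 2 * p * q * w_ga + p * p * w_gg
    # Each genotype contributes A alleles in proportion to its fitness.
    return (q * q * w_aa + p * q * w_ga) / mean_fitness


def generations_to_reach(q0: float, target: float, s: float, h: float,
                         max_generations: int = 1_000_000) -> int:
    """Count generations until A rises from q0 to at least target."""
    q, generations = q0, 0
    while q < target and generations < max_generations:
        q = next_frequency(q, s, h)
        generations += 1
    return generations


# Same selection coefficient (~1%), same start and target; only dominance differs.
recessive = generations_to_reach(0.001, 0.10, s=0.01, h=0.0)
additive = generations_to_reach(0.001, 0.10, s=0.01, h=0.5)
print(f"recessive advantage: {recessive} generations")
print(f"additive advantage:  {additive} generations")
```

The recessive case takes far longer to get off the ground, which is exactly the window in which recombination can whittle down the haplotype block. In a real finite population drift dominates at such low frequencies, so take the absolute numbers with a large grain of salt.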

The idea that the genes around ABCC11 might have something to do with adaptation to cold is suggestive, but almost every East Asian trait of distinction has been hypothesized to have something to do with cold at some point by physical anthropologists. You’d figure that the Cantonese lived in igloos going by all the myriad adaptations to frigid conditions which they exhibit. The reality is that much of China, Korea and Japan are subtropical today. In any case the last figure shows the correlation across several lineages. Earlier they found, by comparing variation around this region in humans with other primates, that Africans seem to be subject to purifying selection. This means that there’s constraint so that neutral forces don’t change the frequencies of functionally significant regions. It is well known that on average Africans are more diverse than non-Africans, probably because the latter are a sampling of the former, but, on a small minority of genes the reverse is true. This is likely due to the relaxation of functional constraint as humans left the ancestral African environment. And this is clearly true for rs17822931; most non-African populations exhibit some heterozygosity. East Asians here are an exception, not the rule, at having derived allele frequencies nearly fixed. The regression lines in this last figure are all statistically significant. It is of interest that there are particularly strong correlations between latitude and the frequency of the derived A allele among Europeans and Native Americans. In contrast the relationship within Asian populations is weaker. Only 17% of the allele frequency variance can be explained by latitude variance among the Asian ALFRED sample.

But we shouldn’t allow the hypothesis to rise and fall just on this evidence. After all there have likely been substantial movements of populations within the last 10,000 years. Perhaps especially in East Asia, where the expansion of the Han south may have triggered the movement of both the Thai and Vietnamese people out of South China and into mainland Southeast Asia. The best evidence of adaptation would be among admixed populations; presumably those at higher latitudes would have higher frequencies of the AA genotype than those at lower latitudes. Instead of categorizing the populations into three coarse classes probably a more sophisticated treatment using ancestral quanta derived from STRUCTURE or ADMIXTURE as independent variables would be informative. Remember, adaptation should show evidence of decoupling ancestry from phenotype.

Finally, I have to point to this section of the discussion:

What is the cause of the selective advantage of rs17822931-A? Although the physiological function of earwax is poorly understood (Matsunaga 1962), dry earwax itself is unlikely to have provided a substantial advantage. The rs17822931-GG and GA genotypes (wet earwax) are also strongly associated with axillary osmidrosis, suggesting that the ABCC11 protein has an excretory function in the axillary apocrine gland (Nakano et al. 2009)…,

I really didn’t know what this meant. So I looked it up. Here’s what I found, A strong association of axillary osmidrosis with the wet earwax type determined by genotyping of the ABCC11 gene:

Apocrine and/or eccrine glands in the human body cause odor, especially from the axillary and pubic apocrine glands. As in other mammals, the odor may have a pheromone-like effect on the opposite sex. Although the odor does not affect health, axillary osmidrosis (AO) is a condition in which an individual feels uncomfortable with their axillary odor, regardless of its strength, and may visit a hospital. Surgery to remove the axillary gland may be performed on demand. AO is likely an oligogenic trait with rs17822931 accounting for most of the phenotypic variation and other unidentified functional variants accounting for the remainder. However, no definite diagnostic criteria or objective measuring methods have been developed to characterize the odor, and whether an individual suffers from AO depends mainly on their assessment and/or on examiner’s judgment. Human body odor may result from the breakdown of precursors into a pungent odorant by skin bacteria….

Perhaps the paper should have been titled “why barbarians smell bad”? In any case, an idea for a book title on Korean genetics: “the least smelly race.”*

Citation: Ohashi J, Naka I, & Tsuchiya N (2010). The impact of natural selection on an ABCC11 SNP determining earwax type. Molecular biology and evolution PMID: 20937735

* I’m referencing The Cleanest Race.

August 27, 2010

Chosen genes of the Chosen People

Last spring two very thorough papers came out which surveyed the genetic landscape of the Jewish people (my posts, Genetics & the Jews it’s still complicated, Genetics & the Jews). The novelty of the results was due to the fact that the research groups actually looked across the very diverse populations of the Diaspora, from Morocco, Eastern Europe, Ethiopia, to Iran. They constructed a broader framework in which we can understand how these populations came to be, and how they relate to each other. Additionally, they allow us to have more perspective as to the generalizability of medical genetics findings in the area of “Jewish diseases,” which for various reasons usually are actually findings for Ashkenazi Jews (the overwhelming majority of Jews outside of Israel, but only about half of Israeli Jews).

Just as the two aforementioned papers were deep explorations of the genetic history of the Jewish people, and allowed for a systematic understanding of their current relationships, a new paper in PNAS takes a slightly different tack. First, it zooms in on Ashkenazi Jews. The Jews whose ancestors are from the broad swath of Central Europe, and later expanded into Poland-Lithuania and Russia. The descendants of Litvaks, Galicians, and the assimilated Jewish minorities such as the German Jews. Second, though constrained to a narrower population set, the researchers put more of an emphasis on the evolutionary parameter of natural selection. Like any population Jews have been impacted by drift, selection, migration (and its variant admixture), and mutation. Teasing apart these disparate parameters may aid in understanding the origin of Jewish diseases.

The paper is open access, so you don’t have to take my interpretation as the last word. Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population:

The Ashkenazi Jewish (AJ) population has long been viewed as a genetic isolate, yet it is still unclear how population bottlenecks, admixture, or positive selection contribute to its genetic structure. Here we analyzed a large AJ cohort and found higher linkage disequilibrium (LD) and identity-by-descent relative to Europeans, as expected for an isolate. However, paradoxically we also found higher genetic diversity, a sign of an older or more admixed population but not of a long-term isolate. Recent reports have reaffirmed that the AJ population has a common Middle Eastern origin with other Jewish Diaspora populations, but also suggest that the AJ population, compared with other Jews, has had the most European admixture. Our analysis indeed revealed higher European admixture than predicted from previous Y-chromosome analyses. Moreover, we also show that admixture directly correlates with high LD, suggesting that admixture has increased both genetic diversity and LD in the AJ population. Additionally, we applied extended haplotype tests to determine whether positive selection can account for the level of AJ-prevalent diseases. We identified genomic regions under selection that account for lactose and alcohol tolerance, and although we found evidence for positive selection at some AJ-prevalent disease loci, the higher incidence of the majority of these diseases is likely the result of genetic drift following a bottleneck. Thus, the AJ population shows evidence of past founding events; however, admixture and selection have also strongly influenced its current genetic makeup.

The sample size of Ashkenazi Jews was ~400, and they looked at ~700,000 SNPs. As I said, how Jews relate to other populations really isn’t at the core of this paper as it was in the earlier ones from the spring, but there were the PCA plots (sorry Mike), a frappe bar plot, and a phylogenetic tree derived from the Fst statistic. Again, remember that PCA is showing you the largest independent components of genetic variation within the data. The bar plot has a set of ancestral populations of which individuals are composites. And finally, Fst measures the between-population component of genetic variation. The larger the Fst across two populations the bigger the genetic distance.
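As a back-of-the-envelope illustration of what Fst captures, here is Wright’s classic formulation for a single diallelic locus in two populations, Fst = (H_T − H_S) / H_T. This is a sketch assuming equal population sizes and known allele frequencies. It is not the genome-wide estimator used in the paper, and the function names are mine.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity (2pq) at a diallelic locus under random mating."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's Fst for one diallelic locus, two equally sized populations."""
    # H_S: average heterozygosity within the two populations.
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    # H_T: heterozygosity of the pooled population.
    h_t = expected_heterozygosity((p1 + p2) / 2.0)
    if h_t == 0:
        return 0.0  # the locus is fixed for the same allele everywhere
    return (h_t - h_s) / h_t


print(fst_two_populations(0.5, 0.5))  # identical frequencies: no differentiation
print(fst_two_populations(0.0, 1.0))  # fixed for different alleles: maximal
```

Real analyses average over hundreds of thousands of SNPs and correct for sample size, but the intuition is the same: the more of the total variation sits between the populations rather than within them, the larger the value.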

Using the Druze & Palestinians as the ancestral Middle Eastern reference the authors estimated that the European admixture into Ashkenazi Jews is on the order of 30-55%. This is in the same ballpark as the previous studies, so no great surprise. As I stated in earlier posts the authors can spin the same results in very different ways. From what I can tell these authors are inclined to emphasize the strong possibility that in terms of genetic distance Ashkenazi Jews are somewhat closer to Europeans than they are to Levantine Arabs. Of course these sorts of assertions need to be handled with care. The genetic distance between Ashkenazi Jews and Tuscans is less than half that between Ashkenazi Jews and Russians, while the Jewish-Russian value is about 50% larger than the Jewish-Palestinian one. Remember that there’s a fair amount of circumstantial evidence that Tuscans may themselves be a relatively recent hybrid population between indigenous residents of the Italian peninsula and Near Easterners.

One thing that this paper does do is rebut any strong assertion that Ashkenazi Jews are a genetically homogeneous population which went through a powerful bottleneck. Basically, the idea that Jewish diseases are just an outcome of the operational inbreeding that occurs when genetic variation is expunged from a population through low effective population size. The clincher seems to be comparison of heterozygosity of Ashkenazi Jews and gentile Europeans. The former are actually somewhat more heterozygous than the latter. There’s been a bit of evidence from previous research that the long term effective population size of Ashkenazi Jews was not necessarily very small, so this isn’t a total surprise. Remember that heterozygosity simply means the fraction of individuals heterozygous at a locus.

One way you can become heterozygous is, naturally, admixture. Remember that populations differ across many genes. As an example, there’s a pigmentation gene, SLC24A5, where all Europeans are at one state, and all West Africans in another. Naturally African Americans exhibit much more heterozygosity on this locus than the ancestral populations. The Ashkenazi Jewish case is less extreme because the two parental populations are genetically closer, but the principle still holds.
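The SLC24A5 logic can be made concrete with a little arithmetic. The sketch below assumes a randomly mating admixed population whose allele frequency is simply the ancestry-weighted average of the parental frequencies. The 20% admixture fraction is a placeholder I chose for illustration, not an estimate for any real population.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity (2pq) at a diallelic locus under random mating."""
    return 2.0 * p * (1.0 - p)


def admixed_heterozygosity(p_source1: float, p_source2: float, m: float) -> float:
    """Expected heterozygosity when a fraction m of ancestry comes from source 1."""
    p_admixed = m * p_source1 + (1.0 - m) * p_source2
    return expected_heterozygosity(p_admixed)


# A locus fixed for different alleles in the two parental populations.
print(expected_heterozygosity(1.0))                     # parental population 1
print(expected_heterozygosity(0.0))                     # parental population 2
print(admixed_heterozygosity(1.0, 0.0, m=0.20))         # admixed descendants
```

Both parental populations have zero heterozygosity at such a locus, while the admixed population has plenty. When the parental frequencies are closer together, as in the Ashkenazi case, the boost is smaller but it runs in the same direction.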

A consequence of recent admixture between genetically different populations is high levels of linkage disequilibrium, non-random associations of alleles at different loci across the genome. Why? There are many genes where two populations may be very different. Offspring inherit half their genome from one parent, and half from the other, and the parents pass along to their offspring particular associations of alleles. There may be a set of European distinctive alleles on a chromosome, and an African distinctive set of alleles, so that in a hybrid individual the alleles are strongly correlated across loci. These associations are broken down over time by recombination. The regularity of this process can serve as a clock with which to measure the period since admixture. African Americans were used to calibrate the time since admixture for the Uyghur people of western China, who are mixed from West and East Eurasian populations. The authors did not do this in this paper, I assume because the ancestral populations were genetically rather close in comparison to the two above examples, so there’d be less linkage disequilibrium to break down in the first place.
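The “clock” here rests on a simple textbook result: in a large randomly mating population, LD between two loci decays by a factor of (1 − r) each generation, where r is the recombination fraction between them. Here is a minimal sketch assuming a single pulse of admixture and no drift or selection; the numbers are invented purely to show the round trip.

```python
import math


def ld_after(d0: float, r: float, generations: int) -> float:
    """LD remaining after some generations: D_t = D_0 * (1 - r) ** t."""
    return d0 * (1.0 - r) ** generations


def generations_since_admixture(d0: float, d_now: float, r: float) -> float:
    """Invert the decay formula to estimate time since a single admixture pulse."""
    return math.log(d_now / d0) / math.log(1.0 - r)


# Hypothetical values: initial admixture LD of 0.25 between two loci with a
# 1% recombination fraction, observed 10 generations later.
d_observed = ld_after(d0=0.25, r=0.01, generations=10)
print(d_observed)
print(generations_since_admixture(d0=0.25, d_now=d_observed, r=0.01))
```

In practice one fits the decay across many pairs of markers at different genetic distances rather than a single pair, and the initial LD depends on how different the source populations were, which is why closely related parental populations make for a poor clock.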

In the Ashkenazi Jewish population they found more linkage disequilibrium than in Europeans as well as longer haplotypes. This could be the result of a population bottleneck where drift could drive up the frequency of blocks of the genome, but as they note in the paper that should probably reduce heterozygosity. The natural inference then is that admixture between distinct populations can explain both data points.

But let’s cut to the chase. What genes exhibit signatures of natural selection in Ashkenazi Jews? More precisely, what distinctive regions of the genome exhibit signatures of natural selection? They used the standard haplotype-based methods. Basically you’re looking for regions of the genome where there are long blocks of correlated alleles, signs of a selective sweep due to a favored variant which dragged along flanking genomic regions as it rose rapidly in frequency, more rapidly than recombination could break apart the associations. Because recombination does break up associations over time, you need the selective sweeps to be relatively recent to detect them with these methods. Since the Jewish people, and Ashkenazi Jews more particularly, are relatively recent historically, timing shouldn’t be an issue for Jewish specific sweeps. But another factor is that the two primary tests they used, EHH and iHS, are not good at picking up sweeps which are just starting. EHH is geared toward sweeps which are almost complete, so the frequency of the selected allele is near 100%. iHS is better at mid-range values. Using a combination of these two techniques they found that six genes which are implicated in diseases characteristic of Ashkenazi Jews have the hallmarks of natural selection. Natural selection is self-evident, so what seems to have been going on here is that the disease was simply a side effect or byproduct of adaptation.

The strongest signal they found was in ALDH2. The strongest signal in Europeans, LCT, was not found in Ashkenazi Jews. But is LCT a strong signal in Europeans? Many Southern European populations have low frequencies of the derived LCT allele, indicating that they haven’t been subject to strong selection for lactase persistence. These are the same populations genetically close to the Ashkenazi Jews. The authors suggest that the Jewish-European admixture occurred before the sweep of the derived LCT allele, but it seems more plausible that the Ashkenazim simply admixed with a European population, such as Italians, which do not exhibit much lactase persistence. As for ALDH2, the association between genetic variation on this locus and alcoholism is well known, and has been used to explain the low Jewish rates of the disease. In this case, the authors posit that protection from alcoholism is a positive side effect of natural selection:

The mechanism driving selection of the ALDH2 locus is unknown, but a plausible target of selection also within this selected region is the TRAFD1/FLN29 gene, which is a negative regulator of the innate immune system, important for controlling the response to bacterial and viral infection (49). TRAFD1/FLN29 may have conferred a selective advantage in the immune response to a pathogen, perhaps near the time that the Jews returned to Israel from their Babylonian captivity. Despite the unclear selective mechanism, this remains a remarkable example of a putatively selected region accounting for a known population phenotype.

Many of the other loci naturally did not show signatures of natural selection. But this sort of work is exploratory, and there are limits to the power of their techniques. As it is, it seems that we’re very far along on understanding the phylogenetic tree of the Jewish people, and we’re finally getting a grip on the exogenous parameters which might prune the branches.

Citation: Steven M. Bray, Jennifer G. Mulle, Anne F. Dodd, Ann E. Pulver, Stephen Wooding, & Stephen T. Warren (2010). Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population PNAS : 10.1073/pnas.1004381107

Related: John Hawks, New data on Ashkenazi population history.

Image Credit: Wikimedia

July 21, 2010

Disease as a byproduct of adaptation

How we perceive nature and describe its shape are a matter of values and preferences. Nature does not take notice of our distinctions; they exist only as instruments which aid in our comprehension. I’ve brought this up in relation to issues such as categorization of recessive vs. dominant traits. The offspring of people of Sub-Saharan African and non-African ancestry where the non-African parent has straight or wavy hair tend to have very curly hair. Therefore, one may say that the tightly curled hair form is dominant to straight or wavy hair. But, it is also the case that there is some modification in relation to the African parent in the offspring, so the dominance is not complete. When examining the morphology of the follicle, which determines the extent of the hair’s curl, the offspring may in fact exhibit some differences from both parents. In other words our perception of the outcomes of inheritance are contingent to some extent on our categorization of the traits as well as our specific focus along the developmental pathway.

Or consider the division between “traits” and “diseases.” The quotations are necessary. Lactose intolerance is probably one of the best cases to illustrate the gnarly normative obstructions which warp our perceptions. As a point of fact lactose intolerance is the ancestral human state, and numerically predominant. It is the “wild type.” Lactose tolerance is a relatively recent adaptation, found among a variety of West Eurasian and African populations. A more politically correct term, lactase persistence, probably better encapsulates the evolutionary history of the trait, which has shifted from the class of disease to that of genetic trait when we evaluate the bigger picture (obviously diseases are simply “bad” traits).

Sometimes though the issues are more cut & dried. No one would doubt that sickle-cell anemia is a disease. It has a major fitness impact in a colloquial sense, as well as evolutionarily. It kills you, and it kills your potential genetic lineage. But, it is also a byproduct of adaptation to endemic malaria. Sickle-cell disease is one of the classical illustrations of heterozygote advantage, whereby those who carry one copy of the mutation on the gene have increased fitness vis-a-vis those who carry two normal copies of the gene. The increase in frequency of the mutant gene though is balanced by the fact that mutant homozygotes have decreased fitness.
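To make the balance concrete, here is a minimal sketch of the textbook one-locus model of heterozygote advantage. The selection coefficients are hypothetical numbers picked for illustration, not estimates for sickle-cell or malaria, and the function names are mine. With heterozygote fitness set to 1, normal homozygotes at 1 - s and mutant homozygotes at 1 - t, the mutant allele settles at a frequency of s / (s + t).

```python
def equilibrium_mutant_frequency(s, t):
    """Equilibrium mutant-allele frequency under heterozygote advantage.

    Fitnesses are relative to the heterozygote (fitness 1):
    normal homozygote = 1 - s, mutant homozygote = 1 - t.
    """
    return s / (s + t)


def next_generation(q, s, t):
    """One generation of selection on mutant frequency q, random mating."""
    p = 1.0 - q
    w_normal, w_het, w_mutant = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_normal + 2 * p * q * w_het + q * q * w_mutant
    # Mutant alleles come from heterozygotes (half their alleles)
    # and from mutant homozygotes.
    return (p * q * w_het + q * q * w_mutant) / mean_fitness


# Hypothetical coefficients: normal homozygotes lose 10% of fitness to the
# pathogen, mutant homozygotes lose 80% to the disease.
s, t = 0.10, 0.80

q = 0.01  # start the mutant allele rare
for _ in range(500):
    q = next_generation(q, s, t)

print(f"predicted equilibrium: {equilibrium_mutant_frequency(s, t):.3f}")
print(f"after 500 generations: {q:.3f}")
```

The iteration and the closed-form value should agree, which is the point: the allele rises while rare and stalls once the cost borne by homozygotes offsets the benefit to heterozygotes.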

We can then construct a narrative of the long term evolutionary dynamics from this initial condition. When a new exogenous stress hits a population, mean fitness drops immediately (take a look at the biographies of the Popes, and observe how many died of malaria in the Dark Ages when that disease was new to Italy). Natural selection quickly increases in frequency any alleles which confer protection against the exogenous stress. But, baked into the cake of how genetics in complex organisms usually works, one allele may often have multiple downstream consequences. This is pleiotropy. This means that if a change at a locus increases aggregate fitness, it may nevertheless destabilize long established biochemical pathways. In the short term evolution simply takes the net fitness impact into account. Over the long term one assumes that “better solutions” will emerge which do not have so high a fitness drag, perhaps through the evolution of modifier genes which mask the deleterious outcomes of the initial mutant. This sort of ad hoc trial and error and “duct-taping” of kludges is part and parcel of how adaptation works in situations where shocks out of equilibrium states are common.

In many cases the byproducts of a genetic change may be benign. To my knowledge no one knows major negative consequences of carrying the alleles which confer lactase persistence (excepting some studies indicating higher obesity, but this seems a marginal fitness impact which has only come to the fore in the past century in all likelihood). But in other cases the outcomes may not be as serious as that of sickle-cell anemia, but may rise above the level of significance where one must note the existence of a disease which is a secondary consequence of adaptation to meet a new challenge.

Yesterday I pointed to a paper which illustrates just this phenomenon, Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans:

African-Americans have higher rates of kidney disease than European-Americans. Here, we show that in African-Americans, focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD) are associated with two independent sequence variants in the APOL1 gene on chromosome 22 {FSGS odds ratio = 10.5 [95% confidence interval (CI) 6.0 to 18.4]; H-ESKD odds ratio = 7.3 (95% CI 5.6 to 9.5)}. The two APOL1 variants are common in African chromosomes but absent from European chromosomes, and both reside within haplotypes that harbor signatures of positive selection. Apolipoprotein L-1 (ApoL1) is a serum factor that lyses trypanosomes. In vitro assays revealed that only the kidney disease-associated ApoL1 variants lysed Trypanosoma brucei rhodesiense. We speculate that evolution of a critical survival factor in Africa may have contributed to the high rates of renal disease in African-Americans.

In its implementation the paper has a lot of moving parts, but the outcome is straightforward. If you haven’t, you might read Genomes Unzipped and its post How to read a genome-wide association study. This is a case where the original association studies were not reporting false results, but it seems that one had to take a further step to really understand the likely molecular genetic and evolutionary underpinnings of what was going on. These results suggest that the original signals of association for variants within the MYH9 gene were actually signals from within APOL1, which happened to be next to MYH9. The region around MYH9 had already shown up in tests to detect natural selection through patterns of linkage disequilibrium (non-random associations of alleles at different loci within the genome; in this case the relevant consideration is adjacent loci across continuous regions of the genome which come together to form haplotype blocks). Since the footprint of natural selection on the genome is often wide that did not imply that MYH9 was the target of natural selection per se, opening the likely possibility for other causal associations. A convenience in light of the difficulty of establishing a plausible functional relationship between renal failure and MYH9.

To explore the possibility of nearby functional candidates the researchers focused on a number of alleles within this genomic region which exhibited maximal European-African frequency differences in the 1000 Genomes Project. Once they ascertained the between population differences they then looked at differences in allele frequencies in cases and controls within the African American population for the two diseases in question (those with the trait/disease vs. those without). Table 1 has the top line raw results:


WT = “Wild Type,” the ancestral allelic variant found in most populations. G1 and G2 are two haplotypes, associated alleles across the locus of the APOL1 gene. G1 consists of the two derived non-synonymous coding variants rs73885319 (S342G) and rs60910145 (I384M) within an exonic region of APOL1. Non-synonymous simply means that a change at that base pair alters the amino acid coded, and exons are the genomics regions whose information is eventually translated into proteins. In other words, these are non-neutral functionally significant genomic regions which do something. G2 is a 6 base pair deletion, rs71785313, close to G1 in APOL1.

To more formally model the relationship between the alleles which are found to differ between cases and controls they performed a logistic regression. The alleles serve as independent variables which can predict the probable outcome of the dependent variable, the probability of FSGS or H-ESKD in this case (renal failure). Figure 1 to the left has a summary of some of the results of the regression in graphical form for FSGS. I’ve rotated it so it can fit on the screen. Basically the strong signals are to the right of the chart (from your perspective). The y-axis displays (horizontal from your perspective) negative-log of p-values for a signal at a particular marker, which is defined by the x-axis (vertical for you). The labels show the particular gene at that genomic position. The smaller the p-value, the less likely it is that a signal this strong would show up by chance alone. This produces huge spikes in the negative-log values (in the body of the paper they present p-values on the order of 10^-35).
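If the negative-log scale is unfamiliar, a few lines show why it is used: tiny p-values become tall, readable spikes. This is only a sketch; the 10^-35 value is the order of magnitude quoted above, 5e-8 is the conventional genome-wide significance threshold, and `neg_log10` is just a name I made up.

```python
import math


def neg_log10(p_value):
    """Height of a marker on a negative-log p-value plot."""
    return -math.log10(p_value)


for p in (0.05, 5e-8, 1e-35):
    print(f"p = {p:g}  ->  -log10(p) = {neg_log10(p):.1f}")
```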

You can see that it is in APOL1 that the biggest signals reside. The first panel, A, throws all the SNPs into the mix. On MYH9 they highlight a few SNPs which combine to form the E-1 haplotype, which is strongly associated with cases (this is where the association between disease and genetic variants on MYH9 is coming from). This haplotype is found in conjunction with G1 and G2 on APOL1. E-1 is present in 89% of haplotypes carrying G1 and in 76% of haplotypes carrying G2. A classic illustration of likely correlation but not causation. The second panel controls for the effect of G1. In other words, this is showing you the variation in the dependent variable that remains after you take the largest independent variable, G1, into account. The G2 haplotype is the largest effect independent variable after G1 is taken into account; in other words, it explains most of the residual variation in FSGS probability. Finally, the last panel controls for both G1 and G2. As you can see there aren’t any major signals left; the distribution is relatively flat. Logically once you account for the variables which produce change in an outcome you shouldn’t see any impact of other variables. And that’s what happens here. They also performed controls where MYH9 was held constant, and that does not eliminate the signals in APOL1. MYH9 is conditional on its correlation with APOL1. This was the correlation which showed up on the original association studies. The exact same pattern of signals within the logistic regression model was replicated for H-ESKD. G1 had the strongest signal, then G2. The markers within MYH9 were not significant once one controlled for the variants in G1 and G2.
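The logic of “controlling for” a variant can be sketched without any regression machinery. Below I swap in simple stratification instead of the paper’s logistic regression, and I build a toy model in which disease risk depends only on the causal haplotype while a neighboring tag haplotype merely travels with it. Only the 89% figure comes from the numbers above; every other probability is made up for illustration.

```python
def odds_ratio(p_case_exposed, p_case_unexposed):
    """Odds ratio from the disease probability in two groups."""
    odds_exposed = p_case_exposed / (1.0 - p_case_exposed)
    odds_unexposed = p_case_unexposed / (1.0 - p_case_unexposed)
    return odds_exposed / odds_unexposed


P_G = 0.30              # frequency of the causal haplotype (hypothetical)
P_E_GIVEN_G = 0.89      # tag haplotype on the causal background (from the post)
P_E_GIVEN_NOT_G = 0.20  # tag haplotype on other backgrounds (hypothetical)
# Disease risk depends on the causal haplotype only (hypothetical values).
P_CASE = {True: 0.40, False: 0.10}


def p_case_given_tag(has_tag):
    """Disease probability given tag status, averaging over the causal variant."""
    like_g = P_E_GIVEN_G if has_tag else 1.0 - P_E_GIVEN_G
    like_not_g = P_E_GIVEN_NOT_G if has_tag else 1.0 - P_E_GIVEN_NOT_G
    joint_g = P_G * like_g
    joint_not_g = (1.0 - P_G) * like_not_g
    p_g = joint_g / (joint_g + joint_not_g)  # Bayes: P(causal | tag status)
    return p_g * P_CASE[True] + (1.0 - p_g) * P_CASE[False]


# Ignoring the causal variant, the tag looks like a risk factor.
marginal_or = odds_ratio(p_case_given_tag(True), p_case_given_tag(False))
# Among carriers of the causal variant, risk is identical with or without the
# tag (by construction of this toy model), so the tag's odds ratio is 1.
stratified_or = odds_ratio(P_CASE[True], P_CASE[True])

print(f"tag odds ratio, causal variant ignored: {marginal_or:.2f}")
print(f"tag odds ratio, among causal carriers:  {stratified_or:.2f}")
```

The tag’s apparent effect is entirely borrowed from its correlation with the causal haplotype, which is the pattern reported for MYH9 once G1 and G2 are taken into account.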

It is important to remember though that these markers are segregating within a human population where individuals have three potential genotypes: ancestral homozygote, homozygote for the mutants, and heterozygote. They found that a recessive model of expression of disease is most appropriate in the case of these risk alleles. That is, most of the increased risk is accounted for by the change from one risk allele, the heterozygote state, to two risk alleles, the homozygote state. One risk allele increased odds of renal failure by 1.26, but two by 7.3. The odds ratio of two risk alleles compared to a base rate of one risk allele was 5.8. They report that the results for FSGS were broadly similar. This matters because the frequency of the trait/disease in a random mating population is conditional on the homozygotes if it has a recessive expression pattern. G1 was present in 40% of the Yoruba HapMap data set, but in neither of the two Eurasian groups, Europeans and East Asians. G2 was found in three Yoruba, but in neither of the Eurasian groups. Assuming Hardy-Weinberg equilibrium the Yoruba should have 16% of the population at sharply elevated risk for FSGS and H-ESKD because they’d be homozygotes for the G1 allele.
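The arithmetic in that paragraph is easy to check. This sketch treats the 40% figure as the G1 allele frequency (my reading of the number above) and uses the odds ratios as quoted; the function name is mine.

```python
def hardy_weinberg(q):
    """Expected genotype frequencies for a risk allele at frequency q."""
    p = 1.0 - q
    return {
        "no risk allele": p * p,
        "one risk allele": 2 * p * q,
        "two risk alleles": q * q,
    }


yoruba_g1 = hardy_weinberg(0.40)
for genotype, freq in yoruba_g1.items():
    print(f"{genotype}: {freq:.0%}")

# Odds ratios quoted above, each relative to carrying no risk allele.
or_one, or_two = 1.26, 7.3
# Dividing them gives the odds ratio of two risk alleles relative to one.
print(f"two vs. one risk allele: {or_two / or_one:.1f}")
```

Under a recessive model the 16% of homozygotes carry almost all of the excess risk, even though nearly half the population would be heterozygous carriers.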

Once they established which markers seem to be implicated in this phenotypic variation, they wanted to focus on how the frequencies of those markers came to be. Specifically, G1 and G2 seem to be derived haplotypes which arose out of the ancestral background. In plain English 20,000 years ago Africans should have looked like all non-Africans genomically, at least on the functionally relevant segments, but within the last 10,000 years it looks like new variants rose in frequency driven by natural selection to new environmental stresses. The region has already broadly been surveyed by linkage disequilibrium based tests, which basically look for regions of long haplotypes, homogenized zones of the genome where many individuals have the variation removed because one gene rose so rapidly in frequency that huge adjacent sections hitchhiked up in frequency. Presumably this may have happened with the MYH9 haplotype correlated with the traits under consideration here; G1 and G2 dragged up the E-1 haplotype as a secondary consequence of their own rise to prominence among some Sub-Saharan African populations.

So next the authors turned to tried & tested techniques and focused on the risk markers which they had discovered earlier in their research, G1 and G2. Specifically, EHH, which is best at detecting selection where sweeps have nearly completed (e.g., the derived variant is at frequency 0.95 within the population), iHS, which is best at detecting sweeps which have not completed (e.g., the derived variant is at frequency 0.6), as well as ΔiHH, which I am less familiar with but is reputedly similar to iHS but uses absolute haplotype length as opposed to relative haplotype length. Figure 2 shows the results of these tests:


The resolution isn’t the best, but G1 and G2 seem to be outliers on all three tests to detect natural selection by using patterns of linkage disequilibrium. The first panel is EHH, the second and third show iHS and ΔiHH respectively, with the position of the markers being outliers among the distribution of values for the genome within the Yoruba. This is not proof of adaptation, but it changes our weights of possibilities. Additionally, they note that Europeans exhibit no such patterns on these markers. Visually the position of the markers in the latter two panels would be closer to the mode of the distribution in Europeans.
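For a feel of what these statistics measure, here is a toy sketch of extended haplotype homozygosity (EHH): the chance that two chromosomes carrying the same core allele are identical from the core out to a given marker. The haplotypes below are invented strings, not data from the paper, and real analyses use dedicated software with many more refinements.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, end):
    """Probability that two chromosomes drawn without replacement are
    identical from marker 0 (the core) through marker `end` inclusive."""
    n = len(haplotypes)
    classes = Counter(h[: end + 1] for h in haplotypes)
    identical_pairs = sum(comb(count, 2) for count in classes.values())
    return identical_pairs / comb(n, 2)


# Invented data: position 0 is the core allele (1 = derived, 0 = ancestral).
derived = ["10101", "10101", "10101", "10101", "10101", "10100"]
ancestral = ["00101", "01101", "00011", "01010", "00110", "01001"]

for end in range(5):
    print(f"marker {end}: derived EHH = {ehh(derived, end):.2f}, "
          f"ancestral EHH = {ehh(ancestral, end):.2f}")
```

A recently swept allele keeps its EHH high far from the core, while older backgrounds decay quickly; tests like iHS compare how fast the two curves fall off.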

To review, first they confirmed a causal relationship between a particular set of markers, haplotypes, and the traits of interest. Second, they confirmed that said markers seem to bear the hallmarks of genomic regions subject to natural selection. We know that focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD), the traits whose relationship to the G1 and G2 haplotypes seems confirmed, are unlikely to be targets of positive natural selection. To get a better sense of that we need to look at Apol1, the protein product of APOL1, and what it does. At this point I’ll quote the paper:

ApoL1 is the trypanolytic factor of human serum that confers resistance to the Trypanosoma brucei brucei (T. brucei brucei) parasite…T. brucei brucei has evolved into two additional subspecies, Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense, which have both acquired the ability to infect humans…T. brucei rhodesiense is predominantly found in Eastern and Southeastern Africa, while T. brucei gambiense is typically found in Western Africa, though some overlap exists…Since these parasites exist only in sub-Saharan Africa, we hypothesized that the APOL1 gene may have undergone natural selective pressure to counteract these trypanosoma adaptations. As an initial test of this hypothesis, we performed in vitro assays to compare the trypanolytic potential of the variant, disease-associated forms of ApoL1 proteins with that of the “wild-type” form of ApoL1 protein that is not associated with renal disease.

We’re talking about sleeping sickness. Here’s a description:

It starts with a headache, joint pains and fever. It is the kind you would expect to get over quickly. But after a while, things get worse. You fall asleep most of the time, are confused and get intense pains and convulsions.

If you do not get treatment, your body begins to waste away. Eventually, you slip into coma and die. This is human African trypanosomiasis, better known as sleeping sickness. If untreated, it kills 100% of its victims in a very short time.

Cheery. I think we have a plausible reason for natural selection to kick into overdrive! Or more specifically, we have a plausible external selection pressure which will drive fitness differentials which correlate with genetic variation. Increased probability of kidney disease seems preferable to this. In terms of the molecular genetics it looks like a factor, serum resistance-associated protein (SRA), produced by T. brucei rhodesiense binds to a specific location of Apol1, and that mutations at G1 and G2 change exactly that location within the protein. So these mutants may block the ability of T. brucei rhodesiense to turn off the body’s defenses against trypanosomes.

To test this they examined the in vitro lytic potential of serum produced by individuals carrying the G1 and G2 haplotypes against the three subspecies of Trypanosoma: T. brucei brucei, which normal Apol1 can lyse, and T. brucei rhodesiense and T. brucei gambiense, which can infect humans (endemic to eastern and western Africa respectively, though the former extends into west Africa as well).

- All 75 samples lysed brucei brucei

- None lysed brucei gambiense

- 46 samples lysed SRA-positive brucei rhodesiense; all 46 samples were from G1 or G2 carrying individuals

- The potency of G2 seemed higher than G1 against SRA-positive samples of brucei rhodesiense, though not SRA-negative samples, where G1 seemed as potent

- Recombinants of Apol1 which had only one of the two SNPs of the G1 haplotype were less effective against brucei rhodesiense than those which had both (G1 haplotype)

- Recombinants with G1 and G2 were not more effective against brucei rhodesiense than those with G2 alone

- Recombinants with G1 alone were more potent against SRA-negative brucei rhodesiense than those with G2 alone

- G2 was necessary and sufficient to block SRA binding to Apol1 and allow lysing of brucei rhodesiense. G1 did not block SRA binding to Apol1, but was still sufficient to lyse brucei rhodesiense, though far less potent against SRA-positive brucei rhodesiense than G2

It seems that the G1 and G2 haplotypes utilize different mechanisms to enable the lysing of invasive pathogens, and so prevent the development of sleeping sickness. Their means differ, but the ends are the same. The authors note that even minimal amounts of plasma serum produced by G2 individuals seem potent enough to block the binding of SRA to Apol1 and so enable lysis. And introduction of such plasma into the bloodstreams of individuals who do not have resistance may then be highly efficacious as a preventative treatment against sleeping sickness. They do note that they did not explore in detail the mechanism by which the G1 and G2 variants result in susceptibility to kidney failure, but that’s presumably for the future.

Finally, the second to last paragraph where they bring it all together:

It will be interesting to determine the distribution of these mutations throughout sub-Saharan Africa. In present-day Africa, T. brucei rhodesiense is found in the Eastern part of the continent, while we noted high frequency of the trypanolytic variants and the signal of positive selection in a West African population. Changes in trypanosome biology and distribution and/or human migration may explain this discrepancy, or resistance to T. brucei rhodesiense could have favored the spreading of T. brucei gambiense in West Africa. Alternatively, ApoL1 variants may provide immunity to a broader array of pathogens beyond just T. brucei rhodesiense, as a recent report linking ApoL1 with anti-Leishmania activity may suggest…Thus, resistance to T. brucei rhodesiense may not be the only factor causing these variants to be selected.

This is a very long review already. But, while I have your attention, I think I need to point to another paper on the same topic which has a slightly different twist. I won’t dig into the details with the same thoroughness as above, but rather I’ll highlight the value-add of this group’s contribution. It’s an Open Access paper, unlike the one above, so you can review it in depth yourself. Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene:

MYH9 has been proposed as a major genetic risk locus for a spectrum of nondiabetic end stage kidney disease (ESKD). We use recently released sequences from the 1000 Genomes Project to identify two western African-specific missense mutations (S342G and I384M) in the neighboring APOL1 gene, and demonstrate that these are more strongly associated with ESKD than previously reported MYH9 variants. The APOL1 gene product, apolipoprotein L-1, has been studied for its roles in trypanosomal lysis, autophagic cell death, lipid metabolism, as well as vascular and other biological activities. We also show that the distribution of these newly identified APOL1 risk variants in African populations is consistent with the pattern of African ancestry ESKD risk previously attributed to MYH9. Mapping by admixture linkage disequilibrium (MALD) localized an interval on chromosome 22, in a region that includes the MYH9 gene, which was shown to contain African ancestry risk variants associated with certain forms of ESKD…MYH9 encodes nonmuscle myosin heavy chain IIa, a major cytoskeletal nanomotor protein expressed in many cell types, including podocyte cells of the renal glomerulus. Moreover, 39 different coding region mutations in MYH9 have been identified in patients with a group of rare syndromes, collectively termed the Giant Platelet Syndromes, with clear autosomal dominant inheritance, and various clinical manifestations, sometimes also including glomerular pathology and chronic kidney disease…Accordingly, MYH9 was further explored in these studies as the leading candidate gene responsible for the MALD signal. 
Dense mapping of MYH9 identified individual single nucleotide polymorphisms (SNPs) and sets of such SNPs grouped as haplotypes that were found to be highly associated with a large and important group of ESKD risk phenotypes, which as a consequence were designated as MYH9-associated nephropathies…These included HIV-associated nephropathy (HIVAN), primary nonmonogenic forms of focal segmental glomerulosclerosis, and hypertension affiliated chronic kidney disease not attributed to other etiologies…The MYH9 SNP and haplotype associations observed with these forms of ESKD yielded the largest odds ratios (OR) reported to date for the association of common variants with common disease risk…Two specific MYH9 variants (rs5750250 of S-haplotype and rs11912763 of F-haplotype) were designated as most strongly predictive on the basis of Receiver Operating Characteristic analysis…These MYH9 association studies were then also extended to earlier stage and related kidney disease phenotypes and to population groups with varying degrees of recent African ancestry admixture…and led to the expectation of finding a functional African ancestry causative variant within MYH9. However, despite intensive efforts including re-sequencing of the MYH9 gene no suggested functional mutation has been identified…This led us to re-examine the interval surrounding MYH9 and to the detection of novel missense mutations with predicted functional effects in the neighboring APOL1 gene, which are significantly more associated with ESKD than all previously reported SNPs in MYH9.

Table one has the top line results. Focus on the first two rows, they’re “G1” from the earlier study (that is, the two SNPs which combine to form the G1 haplotype).


Here’s a difference between the previous paper and this one: the table above uses cases and controls from African Americans and Hispanic Americans. The original paper from which the genomic data on this sample are drawn calculates the average African, European and Native American ancestry in the two groups as follows (I rounded the values):

African American – 85%, 10%, 5%
Hispanic American – 30%, 55%, 15%

Not surprisingly the Hispanic American sample here is mostly Puerto Rican and Dominican, explaining the greater African than Native American ancestry. Nevertheless, it is a sufficiently different genetic background to test the effects of the same marker against different genes. They confirmed the association of the markers of large effect in African Americans within the Hispanic cohort. The risk allele frequency in the African American control group is 21% vs. 37% in the cases. For Hispanic Americans the figures are 6% and 23% for the same categories.
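Those case and control frequencies can be turned into a crude per-allele odds ratio. This is a back-of-the-envelope sketch from the rounded percentages above, not the estimates reported in the paper, which come from a proper model; the function name is mine.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the risk allele on a case chromosome divided by
    the odds on a control chromosome."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


african_american = allelic_odds_ratio(freq_cases=0.37, freq_controls=0.21)
hispanic_american = allelic_odds_ratio(freq_cases=0.23, freq_controls=0.06)

print(f"African American:  {african_american:.2f}")
print(f"Hispanic American: {hispanic_american:.2f}")
```

The allele is rarer in the Hispanic American sample, but it is enriched among cases in both groups, which is what you would expect if the marker itself (or something very close to it) drives the risk.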

OK, now to the most interesting point in this short paper:

HIVAN has been considered as the most prominent of the nondiabetic forms of kidney disease within what has been termed the MYH9-associated nephropathies…We have reported absence of HIVAN in HIV infected Ethiopians, and attributed this to host genomic factors (Behar et al. 2006). Therefore, we examined the allele frequencies of the APOL1 missense mutations in a sample set of 676 individuals from 12 African populations, including 304 individuals from four Ethiopian populations…We coupled this with the corresponding distributions for the African ancestry leading MYH9 S-1 and F-1 risk alleles. A pattern of reduced frequency of the APOL1 missense mutations and also of the MYH9 risk variants was noted in northeastern African in contrast to most central, western, and southern African populations examined…Especially striking was the complete absence of the APOL1 missense mutations in Ethiopia. This combination of the reported lack of HIVAN and observed absence of the APOL1 missense mutations is consistent with APOL1 being the functionally relevant gene for HIVAN risk and likely the other forms of kidney disease previously associated with MYH9.

Bingo. The previous paper focused on African Americans (along with the HapMap Yoruba). But the pattern of variation within Africa is interesting as well. Ethiopians are not quite like other Africans, having a great deal of admixture with populations from Arabia (many of the languages of highland Ethiopia are Semitic). But the majority of their ancestry remains similar to that of other Sub-Saharan Africans. As a point of contrast the ecology of Ethiopia differs a great deal from the rest of Sub-Saharan Africa because of its elevation, and concomitant frigidity. The mean monthly low in Addis Ababa is around 10 (50 for Americans) degrees and mean high 20-25 (high 60s to mid 70s for Americans). There isn’t much variation from month to month because of the low latitude, but the high elevation keeps the temperatures relatively moderate. Different environments result in different selection pressures, and Ethiopia has a very unique environment within Africa. The tsetse fly which serves as a vector for trypanosomes does not seem to be present in the Ethiopian highlands. The map above shows the distribution within Africa of one of the markers which defines the G1 haplotype in the previous paper. Note that the modal frequency is in the west of Africa, and the frequency drops off to the east (though the geographic coverage leaves a bit to be desired if you look at the raw data which went into generating this map, which smooths over huge discontinuities).

One of the points I want to reemphasize from the tests of natural selection in the first paper is that these genetic adaptations are likely to be new, otherwise recombination would have broken up the long haplotypes and reduced linkage disequilibrium. New as in the last 10,000 years. It is interesting that a particular subspecies of Trypanosome which is immune to these genetic adaptations is endemic to west Africa. We may be seeing evolution in action here, or at least the arms race between man and pathogen where man is always one step behind. In contrast, the subspecies which is effectively defused by the genetic adaptations reviewed here is present in higher numbers precisely in the regions where the resistance mutations are extant at lower proportions. Perhaps there are different mutations in these regions of Africa, not yet properly identified. Or perhaps we’re seeing humans in this region at an earlier stage of the dance, so to speak.
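The “new, otherwise recombination would have broken it up” argument rests on a standard result: absent selection, linkage disequilibrium between two loci shrinks by a factor of (1 - r) each generation, where r is the recombination fraction between them. A minimal sketch, assuming 25 years per generation and r = 0.001 (roughly 100 kb apart at a typical human rate); both values are my assumptions, not numbers from the papers.

```python
def remaining_ld(recombination_fraction, generations):
    """Fraction of the initial linkage disequilibrium left after the given
    number of generations: (1 - r) ** t."""
    return (1.0 - recombination_fraction) ** generations


YEARS_PER_GENERATION = 25  # assumption
R = 0.001                  # assumption: ~100 kb between the two loci

for years in (10_000, 100_000):
    generations = years // YEARS_PER_GENERATION
    print(f"{years:>7,} years ({generations:,} generations): "
          f"{remaining_ld(R, generations):.1%} of the initial LD remains")
```

On these assumptions most of the association survives 10,000 years while almost none survives 100,000, so long intact haplotypes point toward a recent rise in frequency.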

Citation: Giulio Genovese, David J. Friedman, Michael D. Ross, Laurence Lecordier, Pierrick Uzureau, Barry I. Freedman, Donald W. Bowden, Carl D. Langefeld, Taras K. Oleksyk, Andrea Uscinski Knob, Andrea J. Bernhardy, Pamela J. Hicks, George W. Nelson, Benoit Vanhollebeke, Cheryl A. Winkler, Jeffrey B. Kopp, Etienne Pays, & Martin R. Pollak (2010). Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans Science : 10.1126/science.1193032

Citation: Tzur S, Rosset S, Shemer R, Yudkovsky G, Selig S, Tarekegn A, Bekele E, Bradman N, Wasser WG, Behar DM, & Skorecki K (2010). Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Human genetics PMID: 20635188

March 28, 2010

More on recombination & natural selection

A follow up to the post below, see John Hawks, Selection’s genome-wide effect on population differentiation and p-ter’s Natural selection and recombination. As I said, it’s a dense paper, and I didn’t touch on many issues.
