CAT | Population genetics
31
Beyond visualization of data in genetics
Comments off · Posted by Razib Khan in Genetics, Genomics, Population genetics
Hopefully by now the image to the left is familiar to you. It’s from a paper in Human Genetics, Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study. The paper is interesting in and of itself, as it combines a wide set of populations and puts the focus on the extent of disjunction between self-identified ethnic identity, and the population clusters which fall out of patterns of genetic variation. In particular, the authors note that the “Native Hawaiian” identification in Hawaii is characterized by a great deal of admixture, and within their sample only ~50% of the ancestral contribution within this population was Polynesian (the balance split between European and Asian). The figure suggests that subjective self assessment of ancestral quanta is generally accurate, though there are a non-trivial number of outliers. Dienekes points out that the same dynamic holds (less dramatically) for Europeans and Japanese populations within their data set.
All well and good. And I like these sorts of charts because they’re pithy summations of a lot of relationships in a comprehensible geometrical fashion. But they’re not reality, they’re a stylized representation of a slice of reality, abstractions which distill the shape and processes of reality. More precisely the x-axis is an independent dimension of correlations of variation across genes which can account for ~7% of the total population variance. This is the dimension with the largest magnitude. The y-axis is the second largest dimension, accounting for ~4%. The magnitudes decline precipitously as you descend the rank order of the principal components. The 5th component accounts for ~0.2% of the variance.
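To make "percent of variance accounted for" concrete, here is a minimal sketch in Python with NumPy. The genotype matrix is simulated and the function name `variance_explained` is my own invention, not anything from the paper; the only point is that each component's share is its squared singular value divided by the sum of all of them.

```python
import numpy as np

def variance_explained(genotypes):
    """Fraction of total variance captured by each principal component.

    genotypes: (individuals x SNPs) array of allele counts (0, 1, or 2).
    """
    # Center each SNP column so components describe variation around the mean.
    centered = genotypes - genotypes.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to the
    # variance along each principal component (returned largest first).
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# Simulated toy data (an assumption for illustration, not real genotypes):
# two groups of 50 individuals whose allele frequencies differ modestly.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=500)
freq_b = np.clip(freq_a + rng.normal(0, 0.1, size=500), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(50, 500))
group_b = rng.binomial(2, freq_b, size=(50, 500))

shares = variance_explained(np.vstack([group_a, group_b]))
print(shares[:5])  # the five largest shares, in descending order
```

Even in a toy like this the leading components take only a small slice of the total, which is worth keeping in mind when staring at a two-dimensional plot.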
The first two components in these sorts of studies usually conform to our intuitions, and add a degree of precision to various population scale relations. Consider this supplement chart from a 2008 paper (I’ve rotated and reedited for clarity):
The first component separates Africans from non-Africans, the latter being a derived population from a subset of the former. The second component distinguishes West Eurasians from East Eurasians & Amerindians. These two dimensions and the distribution of individuals from the Human Genome Diversity Project reiterate what we know about the evolutionary history of our species.
And yet I wonder if we should be careful about the power of these two-dimensional representations in constraining us excessively when we think about genetic variation and dynamics. Naturally the character of the dimensions is sensitive to the nature of the underlying data set upon which they rely. But consider this thought experiment:
Father = Japanese
Mother = Norwegian
Child = Half Japanese & Half Norwegian
If you projected these three individuals upon the two-dimensional representation above of the worldwide populations the father would cluster with East Asians, the mother with Europeans, and the child with the groups who span the divide, Uyghurs and Hazaras. So on the plot the child would be far closer to these Central Asian populations than to the groups from which its parents derive. And here’s a limitation of focusing too much on two-dimensional plots derived from population level data: is the child interchangeable with a Uyghur or Hazara genetically in relation to their parents? Of course not! If the child was a female, and the father impregnated her, the consequence (or probability of a negative consequence) would be very different than if he impregnated a Uyghur or Hazara woman.
The reason for this difference is obvious (if not, ask in the comments, many readers of this weblog know the ins & outs at an expert level). Abstractions which summarize and condense reality are essential, but they have their uses and limitations. Unlike physics, biology cannot rely too long on elegance, beauty, and formal clarity. Rather, it always has to dance back and forth between rough & ready heuristics informed by the empirics and theoretical systems which emerge from axioms. Usually a picture has its own sense. But the key is to be precise in understanding what sense it makes to you.
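For readers who want to poke at the thought experiment, here is a toy simulation. It is a sketch under assumptions of my own choosing: simulated allele frequencies, unlinked SNPs, and a single "ancestry axis" standing in for a principal component. All the names (`sample_person`, `make_child`, `ibs0`, and so on) are hypothetical. It shows the child and an unrelated person of intermediate ancestry both landing between the parents on the axis, while only the child is free of "opposite homozygote" sites when compared with the father, which is the signature of a parent-child pair.

```python
import numpy as np

rng = np.random.default_rng(1)
n_snps = 2000

# Simulated allele frequencies for two source populations (illustrative only).
freq_east = rng.uniform(0.05, 0.95, n_snps)
freq_west = np.clip(freq_east + rng.normal(0, 0.2, n_snps), 0.01, 0.99)

def sample_person(freqs):
    """Draw two haplotypes (0/1 alleles per SNP) from population frequencies."""
    return rng.binomial(1, freqs, size=(2, freqs.size))

def make_child(father, mother):
    """One randomly chosen allele per SNP from each parent (SNPs unlinked)."""
    idx = np.arange(father.shape[1])
    from_father = father[rng.integers(0, 2, idx.size), idx]
    from_mother = mother[rng.integers(0, 2, idx.size), idx]
    return np.vstack([from_father, from_mother])

father = sample_person(freq_east)
mother = sample_person(freq_west)
child = make_child(father, mother)
# Unrelated person from a population with intermediate frequencies.
stranger = sample_person((freq_east + freq_west) / 2)

# Stand-in for a principal component: the direction of frequency difference.
axis = freq_west - freq_east
axis /= np.linalg.norm(axis)

def score(person):
    """Position of a person's genotype (allele counts) along the axis."""
    return person.sum(axis=0) @ axis

def ibs0(a, b):
    """Count SNPs where the two people are opposite homozygotes (0 vs 2)."""
    return int(np.sum(np.abs(a.sum(axis=0) - b.sum(axis=0)) == 2))

for name, person in [("father", father), ("mother", mother),
                     ("child", child), ("stranger", stranger)]:
    print(name, round(float(score(person)), 2), ibs0(person, father))
```

On the axis the child and the stranger are hard to tell apart. The last column is where they part ways: a child always carries one allele from the father at every site, so the count is zero for the child and not for the stranger.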
15
When diversity is good for disease
Comments off · Posted by Razib Khan in Adapation, Evolution, Genetics, Genomics, Madagascar, Malagasy, Malaria, P. vivax, Population genetics, science
Yesterday I pointed to a new paper, Plasmodium vivax clinical malaria is commonly observed in Duffy-negative Malagasy people. P. vivax is the least virulent of the malaria inducing pathogens, and it is presumably responsible for the fact that the Duffy antigen locus is one of the more ancestrally informative ones in the human genome. In most of Eurasia the Duffy negative null allele* is present at very low frequencies, less than 5%, and often simply absent. In contrast, in Sub-Saharan Africa the Duffy negative variant reaches frequencies as high as 95% in West Africa, and 90% in many other regions. In North Africa and the Middle East the frequencies are intermediate, likely due to the necessity for local adaptation to malaria in many regions, and the historical introduction of the Duffy negative allele via the slave trade.
Before genomics, looking at the Duffy locus was one simple way that geneticists ascertained the proportion of white admixture in the African American population. The Duffy negative allele was nearly absent in Europeans, and present in frequencies of ~95% in West Africa. Therefore, the ~70% frequency in African Americans indicates what we know from other sources, a substantial minority European contribution to their ancestry. The people of Madagascar are similar insofar as they are a byproduct of admixture between African and non-African populations. The source of the non-African ancestry is rather easy to determine: unlike most African countries Madagascar has one language, Malagasy, and it is of the Barito family of languages. Aside from Malagasy the Barito languages are spoken only in a small region of southern Borneo in Indonesia. There are other aspects of the Malagasy culture which make their Southeast Asian provenance clear. The photo above is of Andry Rajoelina, the current President of Madagascar. Two aspects of his visage are salient, his youth (he used to be a disk jockey!), and the fact that his features do not seem typically Sub-Saharan African. Many of the leaders of Madagascar, including the former royal family, are from the highlands where Asiatic features and folkways are more prevalent.
But there is also a clear African component to the Malagasy, more obvious among coastal populations, but also possibly dominant in a genetic sense in terms of proportion to the Asian according to research using uniparental markers. An analysis of Y lineage Fst genetic distances suggests that the Malagasy are, on the whole, somewhat closer to East Africans than to people from Borneo. I stipulate on the whole because as implied above there seems to be regional variation, with Southeast Asian ancestry and culture least hybridized with the Sub-Saharan African in the central highlands, likely for ecological reasons.
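The arithmetic behind the single-marker estimate is simple enough to write down. This is a minimal sketch: the function name is hypothetical, the European frequency is set to exactly 0 as a stand-in for "nearly absent," and the other two numbers are the rough figures quoted above.

```python
def admixture_fraction(freq_admixed, freq_source_a, freq_source_b):
    """Fraction of ancestry from source A implied by one neutral marker.

    Assumes the admixed frequency is a simple weighted average:
    freq_admixed = m * freq_source_a + (1 - m) * freq_source_b.
    """
    return (freq_admixed - freq_source_b) / (freq_source_a - freq_source_b)

# Duffy negative allele: ~0 in Europeans, ~0.95 in West Africa,
# ~0.70 in African Americans.
european_fraction = admixture_fraction(0.70, 0.0, 0.95)
print(round(european_fraction, 3))  # 0.263
```

So the marker implies roughly a quarter European ancestry, with the usual caveat that the estimate is only as good as the assumption that the locus is neutral in the admixed population.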
If the Duffy negative allele was viewed purely as a neutral locus, and so ancestrally informative, one would assume that the Malagasy were mostly African. In the figure to the left the red tinted portions represent Duffy negative proportions, the green Duffy positive, and the darker shade P. vivax positivity. The green star indicates a site where P. vivax positivity was only found among the Duffy positive, while at the sites with red stars it was found among both antigen state groups. As you can see at none of the sites was the Duffy positive allele modal, and at Andapa the frequency of Duffy negative was typical of much of Sub-Saharan Africa. In the total data set 72% of the individuals were Duffy negative. Going by the previously cited work this would underestimate Asian ancestry, which seems likely to be near parity, if not quite.
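To see how far a naive reading would take you, here is a back-of-the-envelope sketch. Several things in it are my assumptions rather than anything in the paper: Hardy-Weinberg proportions (so that Duffy negative individuals, who are homozygous for the null allele, occur at the square of the allele frequency), a ~95% null frequency in the African source, and 0% in the Asian source. The function names are hypothetical.

```python
import math

def null_allele_freq(negative_phenotype_freq):
    """Null allele frequency implied by the share of Duffy negative people,
    assuming Hardy-Weinberg proportions (negatives are null homozygotes)."""
    return math.sqrt(negative_phenotype_freq)

def naive_african_fraction(null_freq, african_freq=0.95, asian_freq=0.0):
    """Single-locus ancestry estimate; only meaningful if the locus is neutral."""
    return (null_freq - asian_freq) / (african_freq - asian_freq)

q = null_allele_freq(0.72)
print(round(q, 3), round(naive_african_fraction(q), 3))  # 0.849 0.893
```

Under those assumptions the locus alone would suggest close to 90% African ancestry, well above what the uniparental markers indicate.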
Two points come to mind:
1) It seems clear that the Duffy locus is not neutral. It is subject to natural selection, as even though the malaria caused by P. vivax is relatively mild, it does reduce fitness. Natural selection should result in an increase in frequency of the negative allele in regions where malaria caused by P. vivax is endemic. In the American South malaria was not as extreme of a problem, nor does Duffy negative status have a strong side effect (e.g., sickle cell), so it was a neutral locus and appropriate to inform ancestry.
2) Modern African populations may not be an accurate representation of the allele frequencies of Duffy in the ancestral groups which contributed to the ancestry of the Malagasy. More plainly, the Africans who intermarried with the Barito speakers may have had much higher frequencies of Duffy positive alleles because natural selection had not proceeded so that the null allele was driven to near fixation.
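Both points turn on how quickly selection can push the null allele upward. Here is a minimal deterministic sketch of one locus under selection. The modeling choices are mine: random mating, an advantage only for null homozygotes (a simplification), and a made-up selection coefficient `s` that is not an estimate from any data.

```python
def next_freq(q, s):
    """One generation of selection on the null allele at frequency q.

    Null homozygotes have fitness 1; every other genotype has fitness 1 - s.
    """
    p = 1.0 - q
    mean_fitness = q * q + (1.0 - q * q) * (1.0 - s)
    # Null alleles in the next generation come from homozygotes (q^2) and
    # from half of the alleles carried by heterozygotes (2pq / 2 = pq).
    return (q * q + p * q * (1.0 - s)) / mean_fitness

def trajectory(q0, s, generations):
    """Allele frequency in each generation, starting from q0."""
    freqs = [q0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], s))
    return freqs

# Hypothetical numbers: start at 50%, a 5% fitness cost for carriers of the
# antigen, and 200 generations.
freqs = trajectory(0.5, 0.05, 200)
print(round(freqs[0], 3), round(freqs[50], 3), round(freqs[-1], 3))
```

With `s` set to zero the frequency does not move, which is the neutral case assumed for the American South. With any positive `s` it climbs every generation, so an ancestral population sampled earlier in that climb would have carried more Duffy positive alleles than its modern descendants.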
To assess the plausibility of #2, one needs to know how the Malagasy, or more accurately, the speakers of the Barito language which became Malagasy, got where they are. Unfortunately, no one really knows, and the hypotheses are controversial because of their speculative nature. It seems likely that the Southeast Asian mariners initially arrived in the western Indian ocean region ~2,000 years ago, but widespread settlement of Madagascar’s interior may not have been occurring until ~1,000 years ago. By the 13th century there was a large Muslim city in the north of Madagascar integrated into the Indian ocean trade network, so Madagascar is on the fringes of written history at that point. The anthropological evidence seems to point to a sojourn on the coast of East Africa by Southeast Asians, as there are aspects of Malagasy culture which seem related to Bantu groups in that area. Additionally, there are some genetic data which point to an African contribution on the mtDNA from populations further north on the coast, toward Kenya, and Y DNA which suggests a connection with the adjacent region of the continent in Mozambique. A model of how this could occur is that the initial colonists in East Africa picked up local wives along the northern coast, and eventually resettled in Madagascar. After this settlement there were periodic migrations of Africans from nearby regions, either voluntary or forced through slavery, which added the later diversity. The fact that this component is male-biased would point to slavery of the sort practiced in the New World, whereby Africans were forced to work in agriculture and male robustness was prized (this is in contrast with much of the Middle East, where female African domestic servants were the primary driver of slavery).
One of the mysterious aspects of the arrival of the Malagasy is that there aren’t records by the literate polities which fringed the Indian ocean of their movements. But why should there be? Open ocean traders were generally marginal to these states, who simply extracted rents from the activities of the merchants and migrants. It seems entirely plausible that many populations have been on the move throughout history, their impact in particular regions slowly being ablated by time. There is one aspect of Africa which makes it entirely plausible that the Barito presence would disappear or be marginal: the local populations seem biologically very well adapted to the pathogens on the continent. It is notable for example that the Arab and Persian cultural influence in East Africa never spread inland beyond the Indian ocean littoral. And yet these groups were present on the East African coast from the time of the Romans on. It seems likely to me that Africa is relatively resistant to “back-migration” from Eurasia on ecological grounds. North Africa is part of the Palearctic ecozone, while the highlands of Ethiopia are also ecologically distinct. Both these regions are strongly shaped genetically by populations with Eurasian connections, in the former case predominantly so, but they are both exceptions which prove the rule.
The maps to the left show topography and population density respectively. In Madagascar in the highlands Southeast Asians could transfer wet rice agriculture, and also escape the most baleful influences of African diseases (which would naturally be introduced with African populations). It is also where there is the greatest population density. In contrast the coastal regions are more lightly populated and have more African influence. Like South Africa or the Kenyan highlands I believe that Madagascar was one region of Sub-Saharan Africa which was open to the settlement of outsiders who lacked biological defenses because of its ecology. Granted, it seems to have been unsettled before the Malagasy arrived, but if its pathogen environment was equivalent to that of the mainland I suspect that African genes and culture would have replaced the Malagasy component rather rapidly. The Malagasy are just one of many populations which made some sort of great trek. Most of them disappear, get absorbed or become extinct. But in a few rare cases, such as in that of Iberians in the 16th century, or Polynesians 2,000 years ago, and the Malagasy, these travelers encountered territory which they were able to settle easily. And so we have concrete evidence of their past existence, their present existence. You couldn’t plausibly invent the cultural makeup of Madagascar, because our model of history and human population movement is simplified, and all the outliers and rough edges have been hidden or consciously removed.
Though the highlands of Madagascar allowed the Southeast Asian settlers a refuge for endogenous population growth, which allowed them to perpetuate their culture and leave a stamp on the island, Madagascar is African, and much of the island is clearly suited for malaria. The evolutionary dynamics may be contingent on the peculiarities of the island’s demographic history, but they will still proceed nonetheless. It is noted in these results that though varieties of P. vivax seem to have moved from the Duffy positive to the Duffy negative segment of the population, it is still much more virulent in those who are Duffy positive. There were 15 times as many full blown cases of P. vivax induced malaria (as opposed to positive infection status) among those who were Duffy positive than among those who were negative. Nevertheless, the emergence of strains able to infect Duffy negative blood cells opens up the possibility for more virulent strains in the future which could result in many more cases of full blown malaria within this population.
Let me jump to the conclusion:
Our observations in Madagascar showing conclusive evidence that P. vivax is capable of causing blood-stage infection and disease in Duffy-negative people illustrate that in some conditions P. vivax exhibits a capacity for infecting human erythrocytes without the Duffy antigen. The data assembled in this study suggest that conditions needed to clear the barrier of Duffy negativity may include an optimal human admixture. In Madagascar with significant numbers of Duffy-positive people and full susceptibility of hepatocytes in Duffy negatives, P. vivax may have sufficient exposure to Duffy-negative erythrocytes, allowing more opportunities for de novo selection or optimization of an otherwise cryptic invasion pathway that nevertheless seems less efficient than the Duffy-dependent pathway.
There are several issues that I’ve glossed over in this paper, and one of them is that there are other populations which have a mix of negative and positive individuals. Implicitly the American South is one. But malaria is not endemic in most of the South. But in Brazil there is a similar racial mixture, and its climate is conducive to tropical diseases. It seems there are issues with detecting the P. vivax pathogen within blood cells, and so earlier studies as to the possibility of the infection of those who were Duffy negative were often muddled or inconclusive. In this study they established the existence of this group rather clearly, but is it due to the peculiarities of Madagascar’s population mixture and history? True, Brazil also has an admixed population whose Duffy allele frequencies are interchangeable with that of Madagascar, but Brazil has been settled for only the past ~300 years or so, with much of the population being of more recent origin (Brazil had the highest slave attrition rate on the American mainland, which explains the African nature of Afro-Brazilian culture. Many of the slaves were from Africa, or first generation, at emancipation). A lower bound for Madagascar is ~1,000 years, and the coexistence of Barito and African populations is likely closer to ~2,000 years. So the existence of P. vivax lines which can penetrate the negative allele population may be a function of the longer time given to the emergence of adaptive strategies.
I suspect the fact that there is a component of what ecologists term “patchiness” in the settlement patterns of various populations and ecology in Madagascar might have aided in the persistence of the Duffy positive allele. It seems that in much of the rest of Africa once agriculture became common and the conditions for the mosquito which carries P. vivax improved the Duffy negative allele swept to fixation. At this point the P. vivax infection rates were so low that natural selection became less of an issue (the extant variation was reduced, and only a small proportion of the population would have been subject to selection). It is on marginal areas where fixation did not occur that you’d have the diversity which might allow for the emergence of different P. vivax lineages. Another place to look besides Madagascar would be the margins of Ethiopia, as well as South Africa, where Bantu farmers came up against very different ecologies and populations which they could not assimilate, or did so only partly.
* Duffy is really the antigen itself, so “Duffy negative” means lacking the antigen. But I’m going to use the shorthand Duffy negative to point to the alleles which confer this state, which have names such as FY*A and FY*B. The gene itself is DARC.
Citation: Ménard D, Barnadas C, Bouchier C, Henry-Halldin C, Gray LR, Ratsimbasoa A, Thonier V, Carod JF, Domarle O, Colin Y, Bertrand O, Picot J, King CL, Grimberg BT, Mercereau-Puijalon O, & Zimmerman PA (2010). Plasmodium vivax clinical malaria is commonly observed in Duffy-negative Malagasy people. Proceedings of the National Academy of Sciences of the United States of America, 107 (13), 5967-71 PMID: 20231434
Image credit: BBC, Wikipedia
26
Natural selection & recombination in the human genome
Comments off · Posted by Razib Khan in Genetics, Genomics, Human Genetic Variation, Population genetics
If you are like me, and if you are reading this weblog there is a significant probability you are like me, you read L. L. Cavalli-Sforza’s History and Geography of Human Genes in the 1990s, and in the early aughts Spencer Wells’ The Journey of Man. Science has come very far in the last 10-15 years; even Cavalli-Sforza’s magnum opus pales in comparison to the literal tsunami of data and analysis which the “post-genomic era” has ushered in. Instead of a gene here and there, or even the mtDNA and Y chromosome, researchers are now looking at hundreds of thousands of genetic variants, SNPs, across genomes. We’re rapidly approaching the era of whole genome sequencing, even if we’re not quite there yet.
But what’s the purpose of advances in technique and computation? Though the long-term project is to understand human variation and genetic function so as to have biomedical utility, in the short-term there is an enormous wealth of more abstract population genetic insight which can be extracted. Because of the biomedical focus of contemporary genomics we take a somewhat anthropocentric view, which is fine by me as I am an unregenerate speciesist. The fish, fowl and crawling things of the earth can come later. And in any case, the beauty of the human focus of modern evolutionary genomics is that there are whole disciplines such as paleoanthropology which can serve as partners in interdisciplinary projects.
Humans are like any other organism, buffeted by conventional evolutionary genetic dynamics, drift, migration, natural selection, as well as processes which are more biophysically rooted such as recombination and mutation. Each of these processes leaves its tell-tale marks on the genome. Mutation replenishes variation which drift and selection often eliminate, the former by chance and the latter in the form of negative selection. Migration serves to homogenize across populations through gene flow, while diversifying within populations by introducing novel variants. Finally, recombination breaks up linear associations of genetic variants along a DNA sequence, and has been used to explain sex.
In regards to H. sapiens it seems that our recent evolutionary history is dominated by a few big events. Within the last 100,000 years we underwent an extremely rapid population expansion from a small founding group within Africa, and radiated adaptively across all continents except for Antarctica. We are then a relatively genetically homogeneous population, with much of the extant variation remaining within Africa, and the non-African groups getting progressively less diverse with distance from that continent. Basically a model whereby our species spread across the world via serial founder events. This simple model suffices in the broad sketch, but there is much more to the story. Over the past few years the older idea that current continental populations are the descendants of the first settlers, that is, the first modern humans who displaced the archaic populations which preceded them, has come to seem unlikely to be totally correct in all cases. It is likely wrong in Europe and to some extent India, no trivial exceptions.
There is much which can be said about details of demographic history in regards to the possibility of mass migrations, but today I want to focus on another dynamic: the effect of natural selection on the human genome. There are some researchers who are very skeptical of the efficacy of selection in shaping the patterns of variation we see, constraining it to a few loci such as that which confers lactase persistence or resistance to malaria. Others feel that selection’s power in shaping the genome is far more pervasive. Finally, there is a middle path, which emphasizes a diverse and complex portfolio.
A new paper explores the extent and nature of selection in human genomes through combining a rather old population genetic statistic with new expanded data sets and powerful statistical techniques, Human Population Differentiation Is Strongly Correlated with Local Recombination Rate:
Allele frequency differences across populations can provide valuable information both for studying population structure and for identifying loci that have been targets of natural selection. Here, we examine the relationship between recombination rate and population differentiation in humans by analyzing two uniformly-ascertained, whole-genome data sets. We find that population differentiation as assessed by inter-continental FST shows negative correlation with recombination rate, with FST reduced by 10% in the tenth of the genome with the highest recombination rate compared with the tenth of the genome with the lowest recombination rate (P ≪ 10^−12). This pattern cannot be explained by the mutagenic properties of recombination and instead must reflect the impact of selection in the last 100,000 years since human continental populations split. The correlation between recombination rate and FST has a qualitatively different relationship for FST between African and non-African populations and for FST between European and East Asian populations, suggesting varying levels or types of selection in different epochs of human history.
You know of FST, even if you don’t know what FST is. You have heard that 15% of the variation in human genes is between races, and 85% within races. That 15% is an FST of 0.15. In other words, FST is a population genetic statistic which partitions the variance in genes between and within populations. If you have two populations and both have allele frequencies of 0.50 for two alleles, A & B, at one locus, then the FST would naturally be 0 as there is no between population difference, you can swap individuals from either group interchangeably for purposes of comparison. In contrast if the frequencies were disjoint so that all individuals in one population were of allele A and all individuals in the other of B, then naturally the FST would be 1, as all the variance is between populations, and all the information you need is found within population substructure. A perfect opportunity for profiling!
But this varies by gene and genomic region. As you know from above most variance is within races, not between them. But for the gene SLC24A5 almost all the variance is between Europeans and Africans, not within them. Similarly, all the variance on this gene is between Europeans and East Asians. Finally, there is no variance between East Asians and Africans on this gene. Why? Because it looks like this gene has recently increased in frequency in West Eurasia, to the point where a new mutation has replaced the ancestral variant, which is common in Africa and East Eurasia. Additionally, it is notable that this genetic variant seems to account for 30-40% of the skin tone difference between Africans and Europeans. The point is that total genome variation is not always a good indicator of the evolutionary history of a specific gene. This is a problem especially in the case of those which we might find of interest.
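The two extreme cases are easy to check by hand or in a few lines of Python. This is a minimal sketch of one common textbook form, FST = (HT − HS) / HT, for two equally weighted populations at a locus with two alleles. The paper's own estimator may well differ, and the function name is my own.

```python
def fst_two_populations(p1, p2):
    """FST at a biallelic locus for two equally weighted populations.

    p1, p2: frequency of allele A in each population.
    """
    # HS: average expected heterozygosity within the populations.
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # HT: expected heterozygosity if the two populations were pooled.
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        return 0.0  # locus is fixed for the same allele in both populations
    return (ht - hs) / ht

print(fst_two_populations(0.5, 0.5))  # 0.0, identical frequencies
print(fst_two_populations(1.0, 0.0))  # 1.0, completely disjoint
```

Anything in between, say frequencies of 0.6 and 0.4, gives a value between 0 and 1, and real genomes are a mix of many such loci with very different values.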
In the paper above they find that areas of high recombination are negatively correlated with FST on a global scale. In other words, genomic regions which recombine more often across DNA strands and so shuffle genetic variation about and break apart linear associations show lower FST values. Reduced between population variation.
It’s rather clear in their first figure. Before we jump to that, let me note that they’re using the Perlegen data set, which has two dozen African Americans, Chinese and Europeans, respectively, and 1 million SNPs. Each panel has the FST values on the Y axis, and the recombination rate on the X axis.

It’s pretty clear what’s going on just through inspection. There’s an average decrease of 4% in FST for every 1 cM/Mb increase in recombination rate. The correlation estimates between FST and median recombination rate for each panel are:
A −0.962 (P = 8.9×10^−6)
B −0.815 (P = 0.0041)
C −0.931 (P = 0.0001)
D −0.361 (P = 0.306)
The correlation estimates tell you something you can see visually: there’s a big difference in the relationships contingent upon which populations you’re using to calculate FST. In particular, a lot of the linear relationship between FST and recombination rate is actually due to the African vs. non-African difference. This is not a total surprise, Africans have a lot of genetic variation. In terms of genes one can think of non-Africans as simply a branch of Africans in many ways. Additionally, there’s been suggestive data for a decade now that when Africans left Africa they were subject to new selection pressures which seem common to Eurasians as a whole (though to be fair these signatures of selective sweeps in Eurasia may simply be false positives generated by population bottlenecks and the like).
Yes, I will work back to selection from genetic variation; the two are related, though the relation can be subtle. So why the negative correlation between FST and recombination? Consider an SNP, a single DNA base pair, which is subject to positive selection. It can increase rapidly in frequency so that it goes from ~0 to ~1 in proportion within the population. Fair enough, but SNPs do not exist in an abstract universe, they’re physically embedded in DNA, and so are flanked by many other bases. If an SNP is subject to powerful directional selection which drives its frequency upward, then adjacent bases also “hitchhike” along in frequency. In other words, powerful directional selection can reorder the variation of whole genomic regions, depending on how powerful the selection is. It can sweep away the noisy scatter of variants introduced over many generations by mutation and replace them with a long sequence of alleles from an ancestral genome which harbored the selected variant. Over time mutation can mask the homogenization as it replenishes variation, but there is also another dynamic which blocks the long march to genetic uniformity: recombination. Recombination can tear apart blocks of alleles as they sweep up in frequency, and the more recombination, the greater the counterforce to the homogenizing power of selection on the local genome as the block is chopped up ever more.
Now consider the nature of selection in different populations. Let us stipulate that the light skin of East Asians and of Europeans are both adaptations; we know that they are conferred by different SNPs. In other words, selection operated on different genetic variants to produce the adaptation (though by and large across the same set of genes). Therefore, FST on pigmentation genes is relatively high because of between population difference, and these genes tend to be surrounded by regions of homogenization as they seem to have swept up to high frequency rapidly and dragged along many nearby alleles. Holding this effect of natural selection equal, recombination would naturally tend to work against increased FST by reducing the number of linked alleles being dragged along, breaking apart the genomic blocks during their transit upward in frequency.
A similar effect occurs in the case of background selection, which is operative around deleterious mutations. When there are alleles which are negatively selected their neighbors are also affected; consider it a sort of population genetic “property value.” Because negative selection tends to reduce effective population size, on a large geographical scale it can result in increased FST (consider the genetic uniqueness of isolated populations which have gone through bottlenecks). Again, recombination can blunt the impact of a deleterious allele on its neighbors. The authors do note that the particular dynamics of positive and background selection differ, the latter is a gentler affair by far, a repetitive tap as opposed to a sledgehammer, but the genomic resolution of their analysis is such that they lack power to explore these differences.
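The counterforce can be put in numbers with the standard result for the decay of linkage disequilibrium between two sites: each generation the association shrinks by a factor of (1 − r), where r is the recombination fraction between them. The sketch below ignores selection entirely, so it only shows how fast recombination alone erodes an association; the two values of r are hypothetical.

```python
def ld_after(d0, r, generations):
    """Linkage disequilibrium remaining after some generations of random
    mating, starting from d0, with recombination fraction r per generation."""
    return d0 * (1.0 - r) ** generations

# Two hypothetical neighbors of a selected SNP, followed for 1,000 generations.
tight = ld_after(1.0, 0.0001, 1000)  # close by, rarely separated
loose = ld_after(1.0, 0.01, 1000)    # further away, often separated
print(tight, loose)  # about 0.90 of the association survives vs. well under 0.001
```

The tightly linked neighbor stays associated with the selected variant long enough to be dragged along; the loosely linked one is shuffled free almost immediately on this timescale.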
I’ll let the authors describe the peculiarities of the European-Chinese panel:
…The weaker correlation for the FST between European and Chinese populations is driven by a dip in differentiation at very low recombination rate loci…which is not at all what is seen in the comparison of African and non-African populations…This curve shows a qualitatively non-monotonic pattern, which motivated us to perform a quadratic regression fitted within the bootstrapping framework. The regression is concave and includes very significant linear (P = 3.0×10^−4) as well as quadratic (P = 1.8×10^−5) terms. Conversely, quadratic regression gives a non-significant quadratic term for FST between African Americans and each of the other two populations and if anything is slightly convex. As expected, for single SNP analysis (without binning by recombination rate), linear regression is very significant for FST between African Americans and either non-African population (P ≪ 10^−12). For FST between Chinese and Europeans, however, linear regression is not significant (P = 0.81), while a quadratic regression is very significant (P ≪ 10^−12)….
Non-monotonic is just a way to say that the trend reverses direction. In other words, the linear model isn’t too good a fit on what’s going on with the variation between Europeans and Chinese, and how it relates to recombination rates. They offer two speculative possibilities for the “inverted U-shaped” nature of the relationship between FST and recombination in Europeans and Chinese. First, the smaller effective population sizes of non-African groups results in greater efficacy of background selection. As random genetic drift tends to increase the frequency of deleterious alleles, powerful negative selection is given the opportunity to work against that region of the genome. This results in more background selection as adjacent genomic regions are impacted. Because of differing population sizes the balance between positive and background selection is different for Africans and non-Africans. A second hypothesis is that gene flow between the two Eurasian groups allowed for selective sweeps to move from one group to the other. In other words, between population variance can be reduced if a favored allele spreads across all populations from one original group (lactase persistence in much of northern Eurasia may be a case of just this).
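Here is a small illustration of why a straight line can miss an inverted U entirely while a quadratic picks it up. The data are synthetic and noise-free, made up purely for the demonstration (they are not the paper's values), and this uses a plain least-squares fit from NumPy rather than the bootstrapping framework the authors describe.

```python
import numpy as np

# Synthetic inverted U: a made-up "FST" that peaks at an intermediate
# "recombination rate" and falls off symmetrically on either side.
rate = np.linspace(0.0, 2.0, 21)
fst = 0.12 - 0.02 * (rate - 1.0) ** 2

# Straight-line fit: np.polyfit returns coefficients from highest degree down.
slope, intercept = np.polyfit(rate, fst, 1)

# Quadratic fit: a negative leading coefficient means the curve is concave.
quad, lin, const = np.polyfit(rate, fst, 2)

print(f"linear slope: {slope:.6f}")
print(f"quadratic term: {quad:.6f}, linear term: {lin:.6f}")
```

Because the rise and the fall cancel out, the fitted slope of the straight line is essentially zero, even though the two variables are obviously related. The quadratic fit recovers a negative leading term, which is the "concave" shape described in the quoted passage.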
Intriguingly they found the linear relationship between FST and recombination to be stronger in genomic regions which are coding (i.e., they have genes which code for proteins), and in particular on the X chromosome. The latter finding aligns with other recent work which indicates that the X chromosome may be subject to stronger selective pressures than the rest of the genome because of its peculiar expression pattern (males have only one copy of the gene, and females express only one copy per cell due to X chromosome inactivation).
Finally, they replicated their results using other data sets. I’ll just show the figure from HapMap3:

I obviously rotated so you could see the labels at higher resolution. Here are the populations:
WAF (“West African”) is a combined sample of YRI (Yoruba in Ibadan, Nigeria) and LWK (Luhya in Webuye, Kenya)
EAS (“East Asia”) is a combined sample of CHB (Han Chinese in Beijing, China), CHD (Chinese in Metropolitan Denver, CO, USA), and JPT (Japanese in Tokyo, Japan)
EUR (“Europe”) is a combined sample of CEU (ancestry from Northern and Western Europe) and TSI (Toscani in Italia)
GIH is a sample of Gujarati Indians in Houston, TX, USA
MKK is a sample of Maasai in Kinyawa, Kenya
CHI (“Chinese”) is a combined sample of CHB and CHD
Their analysis of these findings is cautious:
A striking result is that the relationship between FST and recombination rate is stronger for FST between pairs of closely-related populations, whether within or outside Africa: FST between a West African sample and Maasai (of mixed West African and East African ancestry)…decreases by an average of 6% for every 1 cM/Mb…FST between Italians and individuals of North-Western European ancestry decreases by 10% for every cM/Mb…and FST between Japanese and individuals of Chinese ancestry decreases by 4%…In view of the large effective population size in recent human history since each of these pairs of populations have split, these observations support the possibility that the different patterns observed between different pairs of populations are due to natural selection operating more efficiently in the context of larger population sizes. We observed a weak convex relationship with recombination rate for FST between closely-related populations in a quadratic regression analysis…which is intriguingly opposite to what was observed between Europeans and Asians…On the other hand, these observations do not seem to support the possibility that the different patterns are due to selective sweeps being shared to different extent across different pairs of populations since the level of gene flow between HapMap 3 closely-related populations likely have had been higher than that between continents. These results, while interesting, should be viewed with caution due to the confounder of ascertainment bias. It will be possible to test these observations further by analyzing data from the 1000 Genomes Project, where whole-genome sequencing will generate data that is largely free of ascertainment bias for many of the HapMap 3 populations as well as additional populations.
The reason that selection would be more powerful at large population sizes is that the noise of random genetic drift is less likely to interfere with its deterministic process. Additionally, one presumes there would be more extant genetic variation in large populations than small ones. But though these results are interesting, the authors don’t seem to put too much stock in them.
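The drift-versus-selection point can be illustrated with a small sketch. This uses Kimura's classic diffusion approximation for the fixation probability of a new mutation, which is textbook population genetics rather than anything the paper itself computes; the selection coefficient and the population sizes are hypothetical, and census size is assumed equal to effective size.

```python
import math

def fixation_probability(n_e: float, s: float) -> float:
    """Kimura's diffusion approximation for a new mutation in a diploid
    population, assuming census size equals effective size n_e and a
    starting frequency of 1 / (2 * n_e)."""
    if s == 0:
        return 1.0 / (2.0 * n_e)  # neutral case: pure drift
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * n_e * s))

s = 0.001  # hypothetical selective advantage of 0.1%
ratios = {}
for n_e in (100, 1_000, 10_000, 100_000):
    p_sel = fixation_probability(n_e, s)
    p_neutral = fixation_probability(n_e, 0.0)
    ratios[n_e] = p_sel / p_neutral
    print(f"Ne={n_e:>7}: 4*Ne*s={4 * n_e * s:6.1f}  "
          f"P(fix) relative to neutral = {ratios[n_e]:8.1f}")
```

In the smallest population the favored allele fixes barely more often than a neutral one, whereas in the largest its advantage over neutrality runs to the hundreds. That is the sense in which selection is "more efficient" when drift is weak.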
Let me finish with the authors’ conclusion:
In conclusion, we have shown that genome-wide human population differentiation in allele frequencies is significantly correlated with recombination rate on a megabase scale, demonstrating that natural selection has had a profound effect on allele frequency distributions averaged over the last hundred thousand years. While these results likely reflect the effects of hitchhiking and background selection, disentangling the strengths of these two forces will require extending the analyses presented in this paper. One important direction is to use genetic maps that have fine spatial resolution, which may shed light on the detailed distribution of selective coefficients that have shaped allele frequency differentiation. A second direction in which these results can be extended is to compare more populations of continentally diverse ancestry. This should facilitate an exploration of the relationship between recombination rate and population differentiation during different epochs of human evolution, and should allow a better understanding of how demographic history has shaped the impact of natural selection on patterns of human genetic variation.
Note: I left a lot out in this treatment. It’s Open Access so you can read the whole thing!
Citation: Keinan A, Reich D, 2010 Human Population Differentiation Is Strongly Correlated with Local Recombination Rate. PLoS Genet 6(3): e1000886. doi:10.1371/journal.pgen.1000886
No tags
6
Beautiful butterflies & localized adaptation
Comments off · Posted by Razib in Population genetics
Two new papers are out in PLoS Genetics which make inferences about adaptation using butterfly species which exhibit Mullerian mimicry. I’ll give the author summaries instead of the abstracts.
Genomic Hotspots for Adaptation: The Population Genetics of Mullerian Mimicry in the Heliconius melpomene Clade:
The diversity of wing patterns in Heliconius butterflies is a longstanding example of both Mullerian mimicry and adaptive radiation. The genetic regions controlling such patterns are “hotspots” for adaptive evolution, with small regions of the genome controlling major changes in wing pattern. Across multiple hybrid zones in Heliconius melpomene and related species, we find no strong population signal of recent selection. Nonetheless, we find significant associations between genetic variation and wing pattern at multiple sites. This suggests patterning alleles are relatively old, and might be a better model for most natural adaptation, in contrast to the simple genetic basis of recent human-induced selection such as pesticide resistance. Strikingly, across the region controlling the red forewing band, a very strong association with phenotype implicates three genes as potentially being involved in control of wing pattern. One of these, a kinesin gene, shows parallel differences in expression levels between divergent forms in the two mimetic species, making it a strong candidate for control of wing pattern. These results show that mimicry involves parallel changes in gene expression and strongly suggest a role for this gene in control of wing pattern.
Genomic Hotspots for Adaptation: The Population Genetics of Mullerian Mimicry in Heliconius erato:
Identifying the genetic changes responsible for beneficial variation is essential for understanding how organisms adapt. Here, we use a combination of mapping, population genetic analysis, and gene expression studies to identify the genomic regions responsible for phenotypic evolution in the Neotropical butterfly Heliconius erato. H. erato, together with its co-mimic H. melpomene, have undergone parallel and concordant radiations in their warningly colored wing patterns across Central and South America. The “genes” underlying the H. erato color pattern radiation are classic examples of Mendelian loci of large effect and are under strong natural selection. Nonetheless, we do not see a clear molecular signal of recent natural selection, suggesting that the H. erato color pattern radiation, or the alleles that underlie it, may be quite old. Moreover, rather than being single locus, the genetic patterns suggest that multiple, widely dispersed loci may underlie pattern variation in H. erato. One of these loci, a kinesin gene, shows parallel expression differences between races during wing pattern formation in both H. erato and H. melpomene, suggesting that it plays an important role in pattern variation. High rates of recombination within naturally occurring H. erato hybrid zones mean that finer genetic dissection will allow us to localize causative sites and better understand the history and molecular basis of this extraordinary adaptive radiation.
Here’s a section from the first paper which I found intriguing:
The results therefore appear to support the ‘shifting balance’ model for the evolution of Heliconius colour pattern races…whereby novel wing patterns arise and spread through otherwise continuous populations behind moving hybrid zones…The ‘Pleistocene refuge’ model seems less likely, as recent contact after extended periods of geographic isolation would presumably have left a stronger signal of genetic differentiation between divergent races, perhaps across the genome but especially more strongly in regions linked to patterning loci…
I have no idea why they necessarily think this validates the shifting balance. You can see David’s critique of the model, but reading Will Provine’s intellectual biography of Sewall Wright it seems that the shifting balance sometimes becomes the evolutionary genetic version of “it’s complicated.”* What they seem to have done here though is refute a simple model of powerful selective sweeps giving rise to these morphs recently. Rather, these seem to be ancient local adaptations, whose frequencies and genetic architectures are perhaps perturbed by long term exogenous (e.g., environment) and endogenous (e.g., complex frequency dependencies) dynamics.
Despite my lack of clarity on a few theoretical issues, I found the papers very interesting, and haven’t really processed them fully.
Citation:
Baxter SW, Nadeau NJ, Maroja LS, Wilkinson P, Counterman BA, et al. 2010 Genomic Hotspots for Adaptation: The Population Genetics of Mullerian Mimicry in the Heliconius melpomene Clade. PLoS Genet 6(2): e1000794. doi:10.1371/journal.pgen.1000794
Counterman BA, Araujo-Perez F, Hines HM, Baxter SW, Morrison CM, et al. 2010 Genomic Hotspots for Adaptation: The Population Genetics of Mullerian Mimicry in Heliconius erato. PLoS Genet 6(2): e1000796. doi:10.1371/journal.pgen.1000796
* I see one reference to epistasis in both papers, and that concept is very important in the shifting balance. Though I assume the LD and supergenes might point to that.
No tags
Volunteers from southern and northern India were comparable in age and sex. The LTT result was abnormal in 88.2% of southern Indians and in 66.2% of northern Indians…The lactose HBT result was abnormal in 78.9% of southern Indians and in 57.1% of northern Indians…The CC genotype was present in 86.8% and 67.5%…the CT genotype was present in 13.2% and 26.0%…and the TT genotype was present in 0% and 6.5%..of southern and northern Indians, respectively. The frequency of symptoms after the lactose load…and peak concentrations of breath hydrogen…both of which might indicate the degree of lactase deficiency, were higher in southern than in northern Indians.
The north Indian samples were from Lucknow on the mid-Gangetic plain, and the south Indian samples from Bangalore. The genetic variant conferring lactase persistence is the Central Asian one, T-13910. You can see the distribution of the genotypes by phenotype in the table to the left. These authors assume that the T allele was brought by the Indo-Aryans; this seems plausible given its clinal variation, as well as the fact that this variant seems to be common in European and Central Asian populations. Working from the genotype frequencies above, the frequency of the T allele was about 19.5% in the Lucknow sample and about 6.6% in the Bangalore sample. Here is a selection of frequencies for the T allele in other populations:
17% – Saami
13% – Greeks (Athens)
82% – Scandinavians (Stockholm)
6% – Tuscans (Florence)
24% – Russians (Moscow)
73% – English (London)
66% – Basques
10% – Roma (Prague)
56% – Germans (Hamburg)
95% – North Irish (Enniskillen)
1% – Armenian (Yerevan)
5% – Uygur (Beijing)
10% – Mongolian (Beijing)
13% – Indians (Madras)
19% – Indians (New Delhi)
36% – Balochi (Islamabad)
51% – Pathan
You can see more here. This looks like a case of local adaptation.
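As a quick check on how genotype percentages translate into an allele frequency, here is a minimal sketch using the CC/CT/TT figures quoted earlier in this post for the Bangalore and Lucknow samples. The function name is my own, and the only assumption is the usual counting rule for a diploid locus.

```python
def t_allele_frequency(cc: float, ct: float, tt: float) -> float:
    """Frequency of the T allele given genotype percentages (CC, CT, TT)."""
    total = cc + ct + tt
    # Each CT individual carries one T, each TT individual carries two,
    # and every individual has two alleles in total.
    return (ct + 2.0 * tt) / (2.0 * total)

# Genotype percentages quoted above for southern and northern Indians.
south = t_allele_frequency(86.8, 13.2, 0.0)
north = t_allele_frequency(67.5, 26.0, 6.5)

print(f"Bangalore (south): T = {south:.1%}")
print(f"Lucknow (north):   T = {north:.1%}")
```

Counting alleles rather than carriers gives roughly 6.6% for Bangalore and 19.5% for Lucknow, which sits close to the Madras and New Delhi figures in the list above.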
No tags
John Hawks has some commentary on a Nicholas Wade article which previews a new paper on long term effective population size in humans, soon to be out in PNAS (Wade’s piece states that it’ll be out tomorrow, but it’s PNAS). Wade states:
They put the number at 18,500 people, but this refers only to breeding individuals, the “effective” population. The actual population would have been about three times as large, or 55,500.…
Geneticists have long known that the ancestors of modern humans numbered as few as 10,000 at some time in the last 100,000 years. The critically low number suggested that some catastrophe, like disease or climate change induced by a volcano, had brought humans close to the brink of extinction.
If the new estimate is correct, however, human population size has been small and fairly constant throughout most of the last million years, ruling out the need to look for a catastrophe.
Assuming an average census size on the order of 50,000, it seems as if our species stumbled onto a rather “risky” strategy for avoiding extinction. From what I recall, conservation biologists start to worry about stochastic events (e.g., a virulent disease) driving a species to extinction once its census size falls to 1,000. I suppose the fact that we were spread out over multiple continents would have mitigated the risk, but still…. It also brings me back to my post from yesterday: it seems that for most of human history we were a miserable species on the margins of extinction. For the past 10,000 years we were a miserable species. And now a substantial proportion of us are no longer miserable (it seems life is actually much improved from pre-modern Malthusianism outside of Africa and South Asia). If only Leibniz could have seen it!
No tags
9
Localizing recent positive selection in humans using multiple statistics
Comments off · Posted by p-ter in Population genetics
Online this week in Science, a group presents a method for identifying genes under positive selection in humans, and gives some examples. I have somewhat mixed feelings about this paper, for reasons I’ll get to, but here’s their basic idea:
Readers of this site will likely be familiar with genome-wide scans for loci under positive selection in humans (see, e.g., the links in this post). In such a scan, one decides on a statistic that measures some aspect of the data that should be different between selected loci and neutral loci–for example, extreme allele frequency differences between populations, or long haplotypes at high frequency–and calculates this statistic across the genome. One then decides on some threshold for deciding a locus is “interesting”, and looks at those loci for patterns–are there genes involved in particular phenotypes among those loci? Or protein-coding changes?
In this paper, the authors note that many of these statistics are measuring different aspects of the data, such that combining them should increase power to distinguish “interesting” loci from non-“interesting” loci. That is, if there’s an allele at 90% frequency in Europeans and 5% frequency in Asians, that’s interesting, but if that allele is surrounded by extensive haplotype structure in one of those populations, that’s even more interesting. The way they combine statistics is pretty straightforward–they essentially just multiply together empirical p-values from different tests as if they were independent. I wouldn’t believe the precise probabilities that come out of this procedure (for one, the statistics aren’t really fully independent), but it seems to work–in both simulations of new mutations that arise and are immediately under selection and in examples of selection signals where the causal variant is known (Figures 1-3)–for ranking SNPs in order of probability of being the causal SNP underlying a selection signal.
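Here is a rough sketch of that kind of combination, following the description above rather than the paper's exact procedure. The three statistics, their scores, and the rank-based definition of an empirical p-value are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def empirical_p(scores: np.ndarray) -> np.ndarray:
    """Rank-based empirical p-value: the fraction of SNPs in the region
    scoring at least as high as each SNP (higher score = more extreme)."""
    ranks = np.argsort(np.argsort(-scores))  # 0 for the highest score
    return (ranks + 1) / len(scores)

# Hypothetical scores for 6 SNPs on three made-up statistics (think:
# frequency differentiation, haplotype length, derived-allele frequency).
stat_a = np.array([0.2, 0.9, 0.4, 0.8, 0.1, 0.3])
stat_b = np.array([1.1, 3.5, 0.7, 2.0, 0.9, 1.5])
stat_c = np.array([0.5, 0.8, 0.6, 0.3, 0.2, 0.4])

# Multiply the p-values as if the tests were independent (they are not,
# really); summing logs gives the same product in a numerically safer way.
log_p = sum(np.log(empirical_p(s)) for s in (stat_a, stat_b, stat_c))
composite = -log_p  # larger = more "interesting"

ranking = np.argsort(-composite)
print("SNP indices, best candidate first:", ranking.tolist())
```

The composite value is only useful as a ranking: the SNP that is extreme on all three statistics floats to the top, but the product itself should not be read as a calibrated probability.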
With this, the authors have a systematic approach for localizing polymorphisms that have experienced recent selection. It’s necessarily somewhat heuristic, sure, but it does the job. They then want to apply this procedure to gain novel insight into recent human evolution. This is sort of the crux of the matter–does this new method actually give us new biological insight?
The novel biology presented consists of a few examples of selection signals where they now think they’ve identified a plausible mechanism for the selection–a protein-coding change in PCDH15, and regulatory changes near PAWR and USF1 (their Figure 4). On reflection, however, these examples aren’t new. Consider PCDH15–this gene was mentioned in a previous paper by the same group, where they called a protein-coding change in the gene one of the 22 strongest candidates for selection in humans (Table 1 here, and main text). It’s unclear what is gained with the new method (except perhaps to confirm their previous result?).
Or consider the regulatory changes near PAWR and USF1. The authors use available gene expression data to show that SNPs near these genes influence gene expression, and that the signals for selection and the signals for association with gene expression overlap. Early last year, a paper examined in detail the overlap between signals of this sort, and indeed, both of these genes are mentioned as examples where this overlap is observed. So using different methods, a different group published the same conclusion about these genes a year ago. Again, it’s unclear what one gains with this new method.
In general, then, this paper has interesting ideas, but puzzlingly fails to really take advantage of them [1]. That said, they’ve taken some preliminary steps down a path that is very likely to yield interesting results in the future.
—–
[1] I wonder if I’m being too harsh on this paper just because it was published in a “big-name” journal. If this were published in Genetics, for example, I certainly wouldn’t be opining about whether or not it contains any novel biology.
—–
Citation: Grossman et al. (2010) A Composite of Multiple Signals Distinguishes Causal Variants in Regions of Positive Selection. Science. DOI: 10.1126/science.1183863
No tags
Different Evolutionary Histories of the Coagulation Factor VII Gene in Human Populations?:
Immoderate blood clotting constitutes a risk factor for cardiovascular disease in modern industrialised societies, but is believed to have conferred a survival advantage, i.e. faster recovery from bleeding, on our ancestors. Here, we investigate the evolutionary history of the Coagulation Factor VII gene (F7) by analysing five cardiovascular-risk-associated mutations from the F7 promoter and nine neutral polymorphisms (six SNPs and three microsatellites) from the flanking region in 16 populations from the broader Mediterranean region, South Saharan Africa and Bolivia (687 individuals in total). Population differentiation and selection tests were performed and linkage disequilibrium patterns were investigated. In all samples, no linkage disequilibrium between adjacent F7 promoter mutations −402 and −401 was observed. No selection signals were detected in any of the samples from the broader Mediterranean region and South Saharan Africa, while some of the data suggested a potential signal of positive selection for the F7 promoter in the Native American samples from Bolivia. In conclusion, our data suggest, although do not prove, different evolutionary histories in the F7 promoter region between Mediterraneans and Amerindians.
The primary aim of this research seems to have been to figure out if the variance in a medical trait (prevalence of cardiovascular disease) could be traced to variance in this coagulation factor gene. Doesn’t seem like that panned out. But their “Native American” sample happened to consist of Bolivian highlanders, Quechua and Aymara speakers. There are long haplotypes amongst these populations for the variant which seems to result in increased risk for cardiovascular disease. I don’t know much about physiology, but I immediately wondered if modulating traits which affect the hematological system might have nasty side-effects. The populations of the Andes of course have developed some genetic tricks to optimize their functioning at high altitudes, but tricks often have trade-offs. Of course this doesn’t necessarily mean it’s selection which drove up the frequency of the variant in question. Native populations of the New World seem to have gone through a population bottleneck, which can generate some of the same patterns. But there are enough non-highland groups that one could check to see if they have the high risk variant and a long haplotype as well.
No tags
Methodology/Principal Findings
Using a combination of genetic/genomic and bioinformatics approaches, we identified a large number of genes that were both differentially expressed between American subjects self-identified to be of either African or European ancestry and that also contained single nucleotide polymorphisms that distinguish distantly related ancestral populations. Several of these genes control the metabolism of simple carbohydrates and are direct targets for the SREBP1, a metabolic transcription factor also differentially expressed between our study populations.
Conclusions/Significance
These data support the concept of stable patterns of gene transcription unique to a geographic ancestral lineage. Differences in expression of several carbohydrate metabolism genes suggest both genetic and transcriptional mechanisms contribute to these patterns and may play a role in exacerbating the disproportionate levels of obesity, diabetes, and cardiovascular disease observed in Americans with African ancestry.
Figure 2 had me thinking of Me, Myself & Irene.
No tags
25
GWAS, population structure and the Han Chinese
Comments off · Posted by Razib in Chinese, Genetics, Han, Population genetics
Two new articles in AJHG, Genomic Dissection of Population Substructure of Han Chinese and Its Implication in Association Studies:
To date, most genome-wide association studies (GWAS) and studies of fine-scale population structure have been conducted primarily on Europeans. Han Chinese, the largest ethnic group in the world, composing 20% of the entire global human population, is largely underrepresented in such studies. A well-recognized challenge is the fact that population structure can cause spurious associations in GWAS. In this study, we examined population substructures in a diverse set of over 1700 Han Chinese samples collected from 26 regions across China, each genotyped at ∼160K single-nucleotide polymorphisms (SNPs). Our results showed that the Han Chinese population is intricately substructured, with the main observed clusters corresponding roughly to northern Han, central Han, and southern Han. However, simulated case-control studies showed that genetic differentiation among these clusters, although very small (FST = 0.0002 ∼0.0009), is sufficient to lead to an inflated rate of false-positive results even when the sample size is moderate. The top two SNPs with the greatest frequency differences between the northern Han and southern Han clusters (FST > 0.06) were found in the FADS2 gene, which associates with the fatty acid composition in phospholipids, and in the HLA complex P5 gene (HCP5), which associates with HIV infection, psoriasis, and psoriatic arthritis. Ingenuity Pathway Analysis (IPA) showed that most differentiated genes among clusters are involved in cardiac arteriopathy (p < 10−101). These signals indicating significant differences among Han Chinese subpopulations should be carefully explained in case they are also detected in association studies, especially when sample sources are diverse.
And, Genetic Structure of the Han Chinese Population Revealed by Genome-wide SNP Variation:
Population stratification is a potential problem for genome-wide association studies (GWAS), confounding results and causing spurious associations. Hence, understanding how allele frequencies vary across geographic regions or among subpopulations is an important prelude to analyzing GWAS data. Using over 350,000 genome-wide autosomal SNPs in over 6000 Han Chinese samples from ten provinces of China, our study revealed a one-dimensional “north-south” population structure and a close correlation between geography and the genetic structure of the Han Chinese. The north-south population structure is consistent with the historical migration pattern of the Han Chinese population. Metropolitan cities in China were, however, more diffused “outliers,” probably because of the impact of modern migration of peoples. At a very local scale within the Guangdong province, we observed evidence of population structure among dialect groups, probably on account of endogamy within these dialects. Via simulation, we show that empirical levels of population structure observed across modern China can cause spurious associations in GWAS if not properly handled. In the Han Chinese, geographic matching is a good proxy for genetic matching, particularly in validation and candidate-gene studies in which population stratification cannot be directly accessed and accounted for because of the lack of genome-wide data, with the exception of the metropolitan cities, where geographical location is no longer a good indicator of ancestral origin. Our findings are important for designing GWAS in the Chinese population, an activity that is expected to intensify greatly in the near future.
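Both abstracts lean on simulated case-control studies to show that very small FST can still inflate false positives. The sketch below is my own toy version of that idea, not either paper's simulation: it draws two subpopulations from a Balding-Nichols model, gives no SNP any effect on disease, and compares matched with mismatched sampling of cases and controls. The FST value, sample sizes, SNP count and sampling fractions are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SNPS, N_PER_GROUP, FST = 2000, 2000, 0.001  # all values hypothetical

# Balding-Nichols model: two subpopulations drift apart from shared
# ancestral allele frequencies by an amount governed by FST.
anc = rng.uniform(0.1, 0.9, N_SNPS)
a, b = anc * (1 - FST) / FST, (1 - anc) * (1 - FST) / FST
p_north, p_south = rng.beta(a, b), rng.beta(a, b)

def inflation(frac_north_cases: float, frac_north_controls: float) -> float:
    """Median allelic chi-square over its null expectation (lambda) when
    no SNP affects disease and only the sampling mix differs."""
    def allele_counts(frac_north: float) -> np.ndarray:
        n_north = int(round(frac_north * N_PER_GROUP))
        n_south = N_PER_GROUP - n_north
        # Each diploid individual contributes two alleles.
        return (rng.binomial(2 * n_north, p_north)
                + rng.binomial(2 * n_south, p_south))

    n = 2 * N_PER_GROUP  # alleles per group
    case = allele_counts(frac_north_cases)
    ctrl = allele_counts(frac_north_controls)
    pooled = (case + ctrl) / (2 * n)
    # 1-df chi-square for a case/control difference in allele frequency.
    chi2 = (case / n - ctrl / n) ** 2 / (pooled * (1 - pooled) * 2 / n)
    return float(np.median(chi2) / 0.4549)  # 0.4549 ~ median of chi-square(1)

lambda_matched = inflation(0.5, 0.5)
lambda_mismatched = inflation(0.7, 0.3)
print(f"matched sampling:    lambda = {lambda_matched:.2f}")
print(f"mismatched sampling: lambda = {lambda_mismatched:.2f}")
```

When cases and controls are drawn from the two subpopulations in the same proportions, lambda stays near 1; when cases skew north and controls skew south, the test statistics are inflated across the board even though nothing here is causal. That is the logic behind using geographic matching as a proxy for genetic matching.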
No tags

