Razib Khan One-stop-shopping for all of my content

September 7, 2012

Across the sea of grass: how Northern Europeans got to be ~10% Northeast Asian

The Pith: You’re Asian. Yes, you!

The conclusion of an important paper by Nick Patterson, Priya Moorjani, Yuntao Luo, Swapan Mallick, Nadin Rohland, Yiping Zhan, Teri Genschoreck, Teresa Webster, and David Reich:

In particular, we have presented evidence suggesting that the genetic history of Europe from around 5000 B.C. includes:

1. The arrival of Neolithic farmers probably from the Middle East.

2. Nearly complete replacement of the indigenous Mesolithic southern European populations by Neolithic migrants, and admixture between the Neolithic farmers and the indigenous Europeans in the north.

3. Substantial population movement into Spain occurring around the same time as the archaeologically attested Bell-Beaker phenomenon (HARRISON, 1980).

4. Subsequent mating between peoples of neighboring regions, resulting in isolation-by-distance (LAO et al., 2008; NOVEMBRE et al., 2008). This tended to smooth out population structure that existed 4,000 years ago.

Further, the populations of Sardinia and the Basque country today have been substantially less influenced by these events.

 

It’s in Genetics, Ancient Admixture in Human History. Reading through it I can see why it wasn’t published in Nature or Science: methods are of the essence. The authors review five population genetic statistics of phylogenetic and evolutionary genetic import, before moving on to the novel results. ...

April 19, 2011

Europeans as Middle Eastern farmers

The Pith: Over the past 10,000 years a small coterie of farming populations expanded rapidly and replaced hunter-gatherer groups which were once dominant across the landscape. So, the vast majority of the ancestry of modern Europeans can be traced back to farming cultures of the eastern Mediterranean which swept over the west of Eurasia between 10 and 5 thousand years before the present.

Dienekes Pontikos points me to a new paper in PNAS which uses a coalescent model of 400+ mitochondrial DNA lineages to infer the pattern of expansions of populations over the past ~40,000 years. Remember that mtDNA is passed just through the maternal lineage. That means it is not subject to the confounding dynamic of recombination, allowing for easier modeling as a phylogenetic tree. Unlike the autosomal genome there’s no reticulation. Additionally, mtDNA tends to be highly mutable, and many regions have been presumed to be selectively neutral. So they are the perfect molecular clock. The straightforward drawback is that the history of one’s foremothers may not be a good representative of the history of one’s ...

December 14, 2010

Re-visualizing European ancestry

I decided to take the Dodecad ADMIXTURE results at K = 10, and redo some of the bar plots, as well as some scatter plots relating the different ancestral components by population. Don’t try to pick out fine-grained details, see what jumps out in a gestalt fashion. I removed most of the non-European populations to focus on Western Europeans, with a few outgroups for reference.

Here’s a table of the correlations (I bolded the ones I thought were interesting):


Component   W Asian  NW African  S European  NE Asian  SW Asian  E Asian  N European  W African  E African  S Asian
W Asian           *       -0.01       -0.18      0.04      0.81     0.59       -0.64       0.39       0.20     0.04
NW African        *           *        0.19     -0.16      0.23    -0.09       -0.19       0.26       0.67    -0.11
S European        *           *           *     -0.38     -0.03    -0.27       -0.42      -0.11      -0.02    -0.36
NE Asian          *           *           *         *     -0.06     0.50        0.26      -0.04      -0.10    -0.07
SW Asian          *           *           *         *         *     0.21       -0.62       0.74       0.59    -0.13
E Asian           *           *           *         *         *        *       -0.27       0.08       0.00     0.14
N European        *           *           *         *         *        *           *      -0.34      -0.28    -0.31
W African         *           *           *         *         *        *           *          *       0.86    -0.04
E African         *           *           *         *         *        *           *          *          *    -0.07
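For anyone who wants to reproduce a table like this, here is a minimal sketch of the calculation: a Pearson correlation between each pair of ancestral components, taken across populations. The fractions below are invented for illustration (they are not the Dodecad values), and the function names are my own.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_table(components):
    """Upper triangle of the correlation matrix, keyed by (component_a, component_b)."""
    names = list(components)
    return {
        (a, b): pearson(components[a], components[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical ancestry fractions in four populations (NOT the Dodecad numbers).
components = {
    "N European": [0.70, 0.55, 0.30, 0.10],
    "W Asian": [0.05, 0.15, 0.30, 0.45],
    "S European": [0.25, 0.30, 0.40, 0.45],
}

for pair, r in correlation_table(components).items():
    print(pair, round(r, 2))
```

One caveat when reading such a table: the components are fractions that sum to one within each population, so some negative correlation between the larger components is built in by construction.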

[Figures: bar plots and scatter plots of the ancestral components]

December 13, 2010

Live not by visualization alone

[Figure: synthetic map of PC1]

In the age of 500,000 SNP studies of genetic variation across dozens of populations, obviously we’re a bit beyond lists of ABO blood frequencies. There’s no real way that a conventional human is going to be able to discern patterns of correlated allele frequency variations which point to between-population genetic differences on this scale of marker density. So you rely on techniques which extract the general patterns out of the data, and present them to you in a human-comprehensible format. But, there’s an unfortunate tendency for humans to imbue the products of technique with a particular authority which they should not always have.

The History and Geography of Human Genes is arguably the most important historical genetics work of the past generation. It has surely influenced many within the field of genetics, and because of its voluminous elegant visual displays of genetic data it is also a primary source for those outside of genetics to make sense of phylogenetic relations between human populations. And yet one aspect of this great work which never caught on was the utilization of “synthetic maps” to visualize components of genetic variation between populations. This may have been fortuitous: a few years ago a paper was published, Interpreting principal components analyses of spatial population genetic variation, which suggested that the gradients you see on the map above may be artifacts:

Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events. These interpretations have been controversial, but influential, and the use of PCA has become widespread in analysis of population genetics data. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.’s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
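To get a feel for the sinusoidal artifact that abstract describes, here is a toy sketch. It is not the paper's analysis. The assumptions are mine: 20 demes on a line, a covariance between demes that decays as rho to the power of their distance, no migration events of any kind, and a simple power iteration in place of a real PCA routine.

```python
import math
import random

def distance_decay_covariance(n_demes, rho):
    """Toy model: covariance between demes i and j is rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_demes)] for i in range(n_demes)]

def double_center(matrix):
    """Remove row and column means (what mean-centering each marker does to the deme covariance)."""
    n = len(matrix)
    row_means = [sum(row) / n for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[matrix[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def top_eigenpair(matrix, iterations=500, seed=0):
    """Leading eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    n = len(matrix)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec

def deflate(matrix, eigenvalue, vec):
    """Subtract the found component so the next power iteration finds the following one."""
    n = len(matrix)
    return [[matrix[i][j] - eigenvalue * vec[i] * vec[j] for j in range(n)]
            for i in range(n)]

def sign_changes(vec, tol=1e-9):
    """Number of times the vector crosses zero as you walk along the line of demes."""
    signs = [1 if x > 0 else -1 for x in vec if abs(x) > tol]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def leading_pcs(n_demes=20, rho=0.9):
    """First two principal axes of the centered toy covariance."""
    centered = double_center(distance_decay_covariance(n_demes, rho))
    value1, pc1 = top_eigenpair(centered)
    _, pc2 = top_eigenpair(deflate(centered, value1, pc1))
    return pc1, pc2

pc1, pc2 = leading_pcs()
print("PC1 zero crossings:", sign_changes(pc1))
print("PC2 zero crossings:", sign_changes(pc2))
```

In this toy setup PC1 crosses zero once (a smooth end-to-end gradient) and PC2 crosses twice (a wave), even though nothing in the model expanded from anywhere. The shapes come from the geometry of distance-dependent covariance alone.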

A paper earlier this year took the earlier work further and used a series of simulations to show how the nature of the gradients varied. In light of recent preoccupations the results are of interest. Principal Component Analysis under Population Genetic Models of Range Expansion and Admixture:

In a series of highly influential publications, Cavalli-Sforza and colleagues used principal component (PC) analysis to produce maps depicting how human genetic diversity varies across geographic space. Within Europe, the first axis of variation (PC1) was interpreted as evidence for the demic diffusion model of agriculture, in which farmers expanded from the Near East ∼10,000 years ago and replaced the resident hunter-gatherer populations with little or no interbreeding. These interpretations of the PC maps have been recently questioned as the original results can be reproduced under models of spatially covarying allele frequencies without any expansion. Here, we study PC maps for data simulated under models of range expansion and admixture. Our simulations include a spatially realistic model of Neolithic farmer expansion and assume various levels of interbreeding between farmer and resident hunter-gatherer populations. An important result is that under a broad range of conditions, the gradients in PC1 maps are oriented along a direction perpendicular to the axis of the expansion, rather than along the same axis as the expansion. We propose that this surprising pattern is an outcome of the “allele surfing” phenomenon, which creates sectors of high allele-frequency differentiation that align perpendicular to the direction of the expansion.

The first figure shows the general framework with which they performed the simulations:

[Figure: the simulation framework]

You have a lattice which consists of demes, population units, all across Europe. They modulated parameters such as population growth (r), carrying capacity (C), and migration (m). Additionally, they had various scenarios of expansion from the southwest or southeast, as well as two expansions one after another to mimic the re-population of Europe after the Ice Age by Paleolithic groups, and their later replacement by Neolithic groups. They modulated admixture and introgression of genes from the Paleolithic group to the Neolithics so that you had the full range where the final Europeans were mostly Neolithic or mostly Paleolithic.

Below are some of the figures which show the results:

As you can see, the strange thing is that in some models the synthetic map gradient is rotated 90 degrees from the axis of demographic expansion! In this telling the famous synthetic map showing Neolithic expansion might be showing expansion from Iberia. Perhaps a radiation from a post-Ice Age southern refuge?

One explanation might be “allele surfing” on the demographic “wave of advance.” Basically, as a population expands very rapidly, stochastic forces such as random genetic drift and bottlenecks could produce diversification along the edge of the population wave front. The reason for this is that these rapidly expanding populations explode out of serial bottlenecks and demographic expansions, which will produce genetic distinctiveness among the many differentiated demes bubbling along the edge of expansion. Alleles which may have been at low frequency in the ancestral population can “fix” in descendant populations on the edge of the demographic wave of advance. This is the explanation, more or less, that one group gave last year for the very high frequencies of R1b1b2 in Western Europe. With this, they overturned the classic assumption that R1b1b2 was a Paleolithic marker, and suggested it was a Neolithic one.
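The drift half of that story is easy to caricature in code. The sketch below is not the spatial model from the paper: it only chains founder events, with each new deme at the front founded by a small sample of gene copies from the deme behind it. All parameter values (starting frequency 0.1, 20 gene copies per founding, 30 steps, 200 replicate fronts) are arbitrary choices of mine.

```python
import random

def founder_series(start_freq, gene_copies, n_steps, rng):
    """Allele frequency in each successive deme along one line of colonization."""
    freq = start_freq
    series = []
    for _ in range(n_steps):
        # Each founding event draws gene_copies alleles from the previous deme.
        carriers = sum(1 for _ in range(gene_copies) if rng.random() < freq)
        freq = carriers / gene_copies
        series.append(freq)
    return series

def absorbed_fraction(replicates, step):
    """Share of replicate fronts where the allele is lost or fixed at a given step."""
    absorbed = sum(1 for series in replicates if series[step] in (0.0, 1.0))
    return absorbed / len(replicates)

def simulate(n_replicates=200, start_freq=0.1, gene_copies=20, n_steps=30, seed=1):
    """Independent replicate fronts, all starting from the same rare allele."""
    rng = random.Random(seed)
    return [founder_series(start_freq, gene_copies, n_steps, rng)
            for _ in range(n_replicates)]

replicates = simulate()
print("lost or fixed after 1 founding:", absorbed_fraction(replicates, 0))
print("lost or fixed after 30 foundings:", absorbed_fraction(replicates, 29))
print("fixed after 30 foundings:",
      sum(1 for s in replicates if s[-1] == 1.0) / len(replicates))
```

Most replicates lose the allele, but a minority carry it to high frequency or fixation at the front, which is the "surfing" outcome. Real surfing also involves growth behind the front and migration between demes, which this sketch leaves out.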

Here’s their conclusion from the paper:

A previous study showed that the original patterns observed in PCA might not reflect any expansion events (Novembre and Stephens 2008). Here, we find that under very general conditions, the pattern of molecular diversity produced by an expansion may be different than what was expected in the literature. In particular, we find conditions where an expansion of Neolithic farmers from the southeast produces a greatest axis of differentiation running from the southwest to the northeast. This surprising result is seemingly due to allele surfing leading to sectors that create differentiation perpendicular to the expansion axis. Although a lot of our results can be explained by the surfing phenomenon, some interesting questions remain open. For example, the phase transition observed for relatively small admixture rates between Paleolithic resident and Neolithic migrant populations occurs at a value that is dependent on our simulation settings, and further investigations would be needed to better characterize this critical value as a function of all the model parameters. Another unsolved question is to know why the patterns generally observed in PC2 maps for our simulation settings sometimes arise in PC1 maps instead. These unexplained examples remind us that PCA is summarizing patterns of variation in the sample due to multiple factors (ancestral expansions and admixture, ongoing limited migration, habitat boundary effects, and the spatial distribution of samples). In complex models such as our expansion models with admixture in Europe, it may be difficult to tease apart what processes give rise to any particular PCA pattern. Our study emphasizes that PC (and AM) should be viewed as tools for exploring the data but that the reverse process of interpreting PC and AM maps in terms of past routes of migration remains a complicated exercise. 
Additional analyses—with more explicit demographic models—are more than ever essential to discriminate between multiple explanations available for the patterns observed in PC and AM maps. We speculate that methods exploiting the signature of alleles that have undergone surfing may be a powerful approach to study range expansions.

What’s the big picture here? In the textbook Human Evolutionary Genetics it is asserted that synthetic maps never became very popular compared to PCA itself. I think this is correct. But, the original synthetic maps have become prominent for many outside of genetics. They figure in Peter Bellwood’s First Farmers, and are taken as a given by many pre-historians, such as Colin Renfrew. And yet a reliance on these sorts of tools must not be blind to the reality that the more layers of abstraction you put between your perception and comprehension of concrete reality, the more likely you are to be led astray by quirks and biases of method.

In this case I do think first-order intuition would tell us that synthetic maps which display PCs would be showing gradients as a function of demographic pulses. And yet the intuition may not be right, and with the overturning of old orthodoxies in the past generation of inferences from the variation patterns in modern populations, we should be very cautious.

Citation: Olivier François, Mathias Currat, Nicolas Ray, Eunjung Han, Laurent Excoffier, & John Novembre (2010). Principal Component Analysis under Population Genetic Models of Range Expansion and Admixture. Mol Biol Evol

July 15, 2010

Really fine grained genetic maps of Europe

Filed under: European genetics,European Genomics,Genomics,History,Inbreeding — Razib Khan @ 12:41 am

A few years ago you started seeing the crest of studies which basically took several hundred individuals (or thousands) from a range of locations, and then extracted out the two largest components of genetic variation from the hundreds of thousands of variants. The clusters which fell out of the genetic data, with each point being an individual’s position, were transposed onto a geographical map. The figure to the left (from this paper) has been widely circulated. You don’t have to be a deep thinker to understand why things shake out this way; people are more closely related to those near than those far because gene flow ties populations together, and its power decreases as a function of distance.

Of course the world isn’t flat, and history perturbs regularities. Jews for example often don’t shake out where they “should” geographically, because of their historical mobility contingent upon random and often capricious geopolitical or social pressures. The Hazara of Afghanistan have their ethnogenesis in the melange of peoples who were thrown together after the Mongol conquest of Central Asia and Iran in the 13th century, and the subsequent collapse of the Ilkhan dynasty. Though the Hazara have mixed with their Persian, Tajik and Pashtun neighbors, they still retain a strong stamp of Mongolian ancestry which means that they are at some remove on the “genetic map” from their geographical neighbors.


So when interpreting these sorts of results you have two extreme dynamics operative. On the one hand you have an equilibrium state where gene flow is mediated through continuous but small flows of migration; women moving between villages, younger sons venturing out of the village in search of better opportunities. Then you have the random (or perhaps modeled as a Poisson distribution) “shocks” which are attributed to world-historical (or region-historical) events which leave an outsized and often perplexing stamp and distort the genetic map from the geographic one. Sometimes the two are not in balance. In much of the New World and Australasia the native populations were genetically replaced by settlers from the outside. Thousands of years of genetic variation accumulated and shaped by localized gene flow events were wiped clean off the map by the demographic tsunami.

Obviously that’s an extreme scenario. The macroscale does not always render the microscale irrelevant in such a fashion. A new short paper in The European Journal of Human Genetics gives us an example. Genes predict village of origin in rural Europe:

The genetic structure of human populations is important in population genetics, forensics and medicine. Using genome-wide scans and individuals with all four grandparents born in the same settlement, we here demonstrate remarkable geographical structure across 8–30 km in three different parts of rural Europe. After excluding close kin and inbreeding, village of origin could still be predicted correctly on the basis of genetic data for 89–100% of individuals.

Here’s the ubiquitous PC chart, except on the scale of villages:

[Figure: PC plot of individuals by village]

As noted above they excluded close relatives, out to second cousins. They judge the genetic time depth to be ~120 years into the past, back to the common ancestry. Remember that if their grandparents are from this village they obviously are going to be somewhat inbred, from the perspective of an American whose ancestors are from different nations. But for most of history the European case was the typical one, not the American one where people from different continents mingled.

Here’s part of the discussion which I think needs highlighting:

To explore how many markers are required to recover these fine scale patterns of structure, we ranked SNPs by FST among villages and repeated the PCA for the most differentiated subsets of 30 000, 10 000, 3000 and 300 SNPs in each population. In all three populations, 10 000 or more high FST SNPs recovered an essentially identical picture to that using the full data set, and even 3000 SNPs preserved considerable separation between the villages (not shown). Using only the most discriminating 300 SNPs, little structure could be observed between the two Croatian villages; however, in Scotland and Italy one of the three settlements included in each location remained completely differentiated from the other two (not shown). We note that these results are only indicative of the minimum number of SNPs required to separate these populations, as by necessity SNPs have been selected intrinsically on the basis of FST within the same data set, rather than extrinsically from other data.
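The marker-selection step in that passage (rank SNPs by FST among villages, keep the most differentiated subset, then redo the PCA on that subset) can be sketched in a few lines. The paper does not say which FST estimator was used, so the Wright-style formula with equal village weights below is my assumption, and the allele frequencies are made up.

```python
def wright_fst(freqs):
    """Wright-style FST for one biallelic SNP from per-village allele frequencies.

    Assumes equal weight for every village; real estimators correct for sample size.
    """
    mean_p = sum(freqs) / len(freqs)
    h_total = 2 * mean_p * (1 - mean_p)
    if h_total == 0:
        return 0.0  # the SNP is monomorphic in every village
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total

def top_fst_snps(freq_table, k):
    """Indices of the k SNPs with the highest FST; freq_table[s] holds SNP s's village frequencies."""
    scores = [wright_fst(freqs) for freqs in freq_table]
    ranked = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    return ranked[:k]

# Hypothetical allele frequencies for four SNPs in two villages.
freq_table = [
    [0.1, 0.9],  # strongly differentiated
    [0.5, 0.5],  # identical in both villages
    [0.9, 0.9],  # identical in both villages
    [0.3, 0.6],  # mildly differentiated
]

print(top_fst_snps(freq_table, 2))
```

As the authors themselves note, picking SNPs by FST in the same data set you then test on is circular, so the counts are only indicative of how few markers could do the job.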

The slightly lower differentiation of the Croatian villages is not surprising given the fact that they are physically the closest of those considered here, being 8 km apart, with only low hills separating them. In contrast, the settlements in the Scottish Isles and Italy are separated by 15–30 km of sea in the former case, and of 3000 m mountains in the latter, although there are deep connecting valleys.

First, we get a sense of the range of informative markers necessary to discern population structure well in much of the Old World. For continental races (e.g., Europeans vs. East Asians) you need on the order of 10-100 markers to distinguish them with a high degree of confidence (closer to the low bound than the high). It looks like in the case of village vs. village differences, it will be on the order of 100-1000 markers. I suspect in Iraq or the Caucasus you’ll need fewer than 300 markers, because genetic differentiation is higher over a shorter distance due to inbreeding, ethnic diversity, and geography (more the former in Iraq, more the latter in the Caucasus). In contrast, in regions where geography is conducive to transport and local norms enforce exogamy, I wouldn’t be surprised if you need more like a thousand markers.

Second, observe the importance of topographical detail. I have observed before that Sardinia is a genetic outlier in Europe. That’s not because Sardinians interbred with native elves of that island. Rather, a water barrier serves as a major check on continuous gene flow mediated by banal contacts (e.g., going to the market and meeting a person from the neighboring village). Islands become worlds unto themselves. Though they are affected by the exogenous shocks, they are less subject to the continuous gene flow at the equilibrium because the water serves as a barrier. Similarly mountains can produce genetic barriers as well, because they make travel rather difficult. In Consanguinity, Inbreeding, and Genetic Drift in Italy L. L. Cavalli-Sforza documents in detail through Roman Catholic Church records what a big impact modern roads had on inbreeding coefficients, which plunged in the 19th century. Distortions of the genetic map tell us about variations in elevation in the third dimension on the geographic map!

The utility of this sort of data collection and analysis in the modern world is an empirical question. On the one hand many Europeans are relatively less inclined to move in comparison to Americans. And yet the breaking down of borders with the European Union and the likely need for a more productive economic sector on that continent because of changing demographics point to greater mobility, migration and mixing, which would make these sorts of studies of only near-term use. Of more interest to me are going to be fine-grained analyses of social groups. For example the Indian caste system. Last fall in the Reich et al. paper the authors seemed to be indicating the likelihood of a lot of between-population variance among these groups. It doesn’t matter if a particular Bania sub-caste from Gujarat is scattered across the world, from Kenya to England to the United States. They may all still marry amongst a set of individuals who hail from the same original few villages.

Good times.

Citation: O’Dushlaine, C., McQuillan, R., Weale, M., Crouch, D., Johansson, Aulchenko, Y., Franklin, C., Polašek, O., Fuchsberger, C., Corvin, A., Hicks, A., Vitart, V., Hayward, C., Wild, S., Meitinger, T., van Duijn, C., Gyllensten, U., Wright, A., Campbell, H., Pramstaller, P., Rudan, I., & Wilson, J. (2010). Genes predict village of origin in rural Europe European Journal of Human Genetics DOI: 10.1038/ejhg.2010.92

April 30, 2010

European man perhaps not a Middle Eastern farmer

Filed under: Anthroplogy,European genetics,Genetics,science — Razib Khan @ 12:20 am

A few months ago I blogged a paper in PLoS Biology which suggested that a common Y chromosomal haplogroup, in fact the most common in Europe and at modal frequency along the Atlantic fringe, is not pre-Neolithic. Rather their analysis of the data implied that the European variants were derived from an Anatolian variant. The implication was that a haplogroup which had previously been diagnostic of “Paleolithicness,” so to speak, of a particular population may in fact be an indication of the proportion of Neolithic Middle Eastern ancestry. The most interesting case was the Basques, who have a high frequency of this haplogroup, and are often conceived of as “ur-Europeans,” Paleolithic descendants of the Cro-Magnons in the most romantic tellings. I was somewhat primed to accept this finding because of confusing results from ancient DNA extraction which imply a lot of turnover in maternal lineages, the mtDNA. My logic being that if the mtDNA exhibited rupture, then the Y lineages should too, as demographic revolutions are more likely to occur among men.

But perhaps not. A new paper in PLoS ONE takes full aim at the paper I blogged above. It is in short a purported refutation of the main finding of the previous paper, and a reinstatement of what had been the orthodoxy (note the citations to previous papers). A Comparison of Y-Chromosome Variation in Sardinia and Anatolia Is More Consistent with Cultural Rather than Demic Diffusion of Agriculture:

Two alternative models have been proposed to explain the spread of agriculture in Europe during the Neolithic period. The demic diffusion model postulates the spreading of farmers from the Middle East along a Southeast to Northeast axis. Conversely, the cultural diffusion model assumes transmission of agricultural techniques without substantial movements of people. Support for the demic model derives largely from the observation of frequency gradients among some genetic variants, in particular haplogroups defined by single nucleotide polymorphisms (SNPs) in the Y-chromosome. A recent network analysis of the R-M269 Y chromosome lineage has purportedly corroborated Neolithic expansion from Anatolia, the site of diffusion of agriculture. However, the data are still controversial and the analyses so far performed are prone to a number of biases. In the present study we show that the addition of a single marker, DYSA7.2, dramatically changes the shape of the R-M269 network into a topology showing a clear Western-Eastern dichotomy not consistent with a radial diffusion of people from the Middle East. We have also assessed other Y-chromosome haplogroups proposed to be markers of the Neolithic diffusion of farmers and compared their intra-lineage variation—defined by short tandem repeats (STRs)—in Anatolia and in Sardinia, the only Western population where these lineages are present at appreciable frequencies and where there is substantial archaeological and genetic evidence of pre-Neolithic human occupation. The data indicate that Sardinia does not contain a subset of the variability present in Anatolia and that the shared variability between these populations is best explained by an earlier, pre-Neolithic dispersal of haplogroups from a common ancestral gene pool. Overall, these results are consistent with the cultural diffusion and do not support the demic model of agriculture diffusion.

Their main trump cards seem to be that they used a denser set of markers and that they claim to have a more accurate molecular clock. Ergo, in the latter case they produce a better time to the last common ancestor, which is twice as deep as the paper they’re attempting to refute. Someone like Dienekes or Polish Genetics can tackle the controversies in scientific genealogy here (I know Dienekes has a lot of interest in mutational rates which go into the molecular clock for these coalescence times). Rather, I would suggest that usage of Sardinians concerns me for an obvious reason: they’re genetic outliers in Europe. A lot of this has to do with being an island. Islands build up uniqueness because they don’t engage in the normal low level gene flow between adjacent populations because they’re…well, islands. You would know about Sardinia’s position because they’re one of the populations in L. L. Cavalli-Sforza’s HGDP sample and they show up in History & Geography of Human Genes as on the margins of the PCA plots. But here’s a figure from a more recent paper using a much denser marker set, constrained to Southern European populations. I labelled some of the main ones so you’d get a sense of why I say Sardinians are outliers:
[Figure: PC plot of Southern European populations, with Sardinians labelled]

Over the two largest independent dimensions of genetic variation you can see a distribution from the southeast Mediterranean all the way to the northwest (in fact, the Basques are an Atlantic group). The Sardinians are out of the primary axis, and that’s why I say they’re an outlier. A few other European groups, like the Icelanders and Sami, exhibit this tendency. As I suggested above I think the fact that the Sardinians are on an isolated island relatively far from the European and African mainlands means that they’ll “random walk” in genetic variation space toward an outlier status naturally, just as the Icelanders have since the year 1000. So though I grant the authors their rationale for using the Sardinians as a reference against the Anatolian source population, the fact that we know that they’re peculiar in their variation in total genome content makes me wary of drawing too many inferences from their relationships to other groups where they are seen as representative of a larger set.

Citation: Morelli L, Contu D, Santoni F, Whalen MB, & Francalacci P (2010). A Comparison of Y-Chromosome Variation in Sardinia and Anatolia Is More Consistent with Cultural Rather than Demic Diffusion of Agriculture PLoS ONE : 10.1371/journal.pone.0010419

January 19, 2010

Where are the “Paleolithic Europeans”?

Filed under: Archaeology,European genetics,Finn baiting — Razib @ 1:22 pm

Over at my other blog I have a review up of a new paper in PLoS Biology. The authors argue that a particular Y haplogroup lineage, R1b1b2, which has often been assumed to be a marker of indigenous Paleolithic Europeans (i.e., those who were extant before the rise of agriculture and the spread of farmers), is actually a signature of Anatolians who brought agriculture. This probably isn’t too surprising for the genetic genealogy nuts among the readers. After I got a copy of this paper I poked around the internet and the general finding that R1b1b2 was very diverse in the eastern Mediterranean seems to have been well known among the genetic genealogy community (also see Anatole Klyosov’s paper and what he says about Basques specifically). And then in eastern Europe you have R1a1, which seems to have also undergone recent range expansion. Finally, there are the recent rumblings out of ancient DNA extraction which imply a lot of turnover of mtDNA lineages during the shift from hunter-gathering to agriculture.

I think this makes us reconsider the idea that most of the ancestors of contemporary citizens of the European Union who were alive 10,000 years ago were actually resident within the current borders of the European Union. But let’s put the details of that aside for a moment. Which group might be most representative of Paleolithic Europeans? If the paper above is correct, the Basques are not a good proxy for the ancient hunter-gatherers of Europe.

Let’s look at a map which illustrates the spread of agriculture. I’d always focused on the SE-NW cline, but if the U5 mtDNA haplogroup is a reasonable marker of ancient pre-agricultural Europeans, we need to look at the Finnic peoples of the northeast. This may explain why these populations also tend to be genetically distinct from other European groups; not because they’re an exotic admixture, but because they’re not. Anyway, simply speculation, I’m sure readers will have their opinions….
