Razib Khan One-stop-shopping for all of my content

July 15, 2018

India vs. China, genetically diverse vs. homogeneous

Filed under: China,China genetics,Human Population Genetics,India,India Genetics — Razib Khan @ 1:50 pm

About 36% of the world’s population are citizens of the People’s Republic of China and the Republic of India. Including the other nations of South Asia (Pakistan, Bangladesh, etc.), 43% of the world’s population lives in China or South Asia.

But, as David Reich mentions in Who We Are and How We Got Here, China is dominated by one ethnicity, the Han, while India is a constellation of ethnicities. And this is reflected in the genetics. The relative diversity of India stands in contrast to the homogeneity of China.

At the current time, the best research on population genetic variation within China is probably the preprint A comprehensive map of genetic variation in the world’s largest ethnic group – Han Chinese. The authors used low-coverage sequencing of over 10,000 women to get a huge sample of variation from all across China. The PCA analysis recapitulated earlier work. Genetic relatedness among the Han of China is geographically structured. The largest component of variance is north-south, but a smaller component is also east-west. The north-south element explains more than 4.5 times as much variance as the east-west.

Another dimension of the variation is that different parts of China are characterized by different levels of admixture between the Han and other groups. In Northwest China, there is gene flow from West Eurasian sources. In all likelihood, this is through proxy populations, such as Mongols, who are about 10% West Eurasian. Also, during the period between the fall of the Han Dynasty and the rise of the Sui-Tang Dynasty much of northern China was dominated by barbarian groups from the steppe, and these groups settled down and were absorbed. In Northeast China, the source of admixture is Siberian and Tungusic groups. Again, this makes geographical sense.

In contrast, in South China the gene flow is from indigenous Chinese national groups, such as the Dai. This is in keeping with the historical record, whereby South China became Han in the period between 0 and 1000 AD through migration, intermarriage, and acculturation.

I have my own small private dataset of Chinese individuals. Some with provenance. Some without. But using known populations I was able to divide China along the north to south cline.  Individuals from Guangdong in the south, those from Shaanxi in the north, and from Zhejiang to Sichuan in the center.

Using Punjabis as a West Eurasian outgroup I was able to plot these individuals on a PCA. If you click to enlarge you will see that a substantial minority of the Han_N sample is shifted to the left of the plot. This is toward the Punjabis. This is not because they have Punjabi ancestry, but because Punjabis are reasonable proxies for West Eurasians.

More importantly, I want to compare South Asia to China. To do that I created a small dataset that merged the Han with representative South Asian groups. The first two PCs, 1 and 2, illustrate the contrast. All three Chinese groups, sampled from the north to the south, occupy a very tight cluster, while the South Asians span PC 2. The Bengalis are shifted a bit toward the Chinese, but most of the variance is due to within-South Asian genetic differences.

I ran PCA to 10 dimensions. Only at PC 10 did the Han Chinese separate along the north-south axis. Most of the earlier PCs separated out specific castes (e.g., Patels, because of their large number in the Gujarati sample, defined PC 3). Here are the eigenvalues: 53.0682, 2.5641, 2.31876, 1.97058, 1.90652, 1.88879, 1.7935, 1.69375, 1.61516, and 1.54207. The large value for PC 1 is what you’d expect: it’s a continental-scale difference. PC 2 differentiates South Asia from north to south. It’s much more modest. The other PCs get progressively smaller, but within the data, it’s clear that the continental-scale difference is the big one. The variance between north and south China is small on a South Asian scale.
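To put the ten eigenvalues listed above on a common footing, here is a small sketch that expresses each one as a share of the sum of the ten reported values. That sum is not the total variance in the data (only ten PCs were computed), so these shares are relative to what was reported and nothing more. The function name is my own.

```python
# Eigenvalues for PCs 1-10, copied from the text above.
eigenvalues = [53.0682, 2.5641, 2.31876, 1.97058, 1.90652,
               1.88879, 1.7935, 1.69375, 1.61516, 1.54207]


def share_of_reported(values):
    """Return each value as a fraction of the sum of the reported values only."""
    total = sum(values)
    return [v / total for v in values]


shares = share_of_reported(eigenvalues)
pc1_vs_pc2 = eigenvalues[0] / eigenvalues[1]    # continental vs. within-South Asia
pc1_vs_pc10 = eigenvalues[0] / eigenvalues[9]   # continental vs. north-south Han

for i, (ev, s) in enumerate(zip(eigenvalues, shares), start=1):
    print(f"PC{i:>2}: eigenvalue {ev:8.4f}  share of reported ten {s:6.1%}")
print(f"PC1 / PC2  = {pc1_vs_pc2:.1f}")
print(f"PC1 / PC10 = {pc1_vs_pc10:.1f}")
```

PC 1 comes out at roughly three quarters of the reported sum and about twenty times PC 2, while PCs 2 through 10 are all of similar, small magnitude.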

Pairwise Fst is more ambiguous. That’s probably because most of the South Asian samples have structure within them. Merging them into one pooled population just confuses the issue.

Using a South Asian dataset where groups are disaggregated makes a lot more sense, and you see the structure between the different groups.
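A toy illustration of why pooling can confuse pairwise Fst. This is a minimal sketch using Wright's single-site definition, Fst = (H_T - H_S) / H_T, with two equally weighted populations. It is not the estimator I ran on the real data, and the allele frequencies are hypothetical, chosen only to make the point.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1, p2):
    """Wright's Fst = (H_T - H_S) / H_T for one site, two equally weighted pops."""
    p_bar = (p1 + p2) / 2.0
    h_t = heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # site is fixed in both populations, so no differentiation
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies, for illustration only.
group_a, group_b, han = 0.2, 0.8, 0.5

# Pooling two differentiated groups averages their frequencies.
pooled = (group_a + group_b) / 2.0

print("group A vs Han:", round(fst_two_pops(group_a, han), 4))
print("group B vs Han:", round(fst_two_pops(group_b, han), 4))
print("group A vs B:  ", round(fst_two_pops(group_a, group_b), 4))
print("pooled vs Han: ", round(fst_two_pops(pooled, han), 4))
```

In this contrived case each group is differentiated from the Han, and strongly from the other group, yet the pooled sample looks identical to the Han at this site. Real data are messier, but the direction of the problem is the same.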

Running Treemix gives similar results. The South Asian groups exhibit a fan-shaped topology, whereas the Han cluster tightly together. Since I removed Bengalis from the Treemix run, adding migration edges doesn’t do anything between the two clusters, so I omitted those results.

Finally, of course I ran some admixture analysis. Using South Asians + Han Chinese, I thought K = 4 would be reasonable. Even if you don’t enlarge, the results are straightforward: the Han Chinese have very little diversity in unsupervised mode. A small South Asian-like component, which has affinities with Punjabis, is found in northern Han. This confirms other results with other methods that the northern Han have some West Eurasian gene flow. Some of the southern and central Han have an affinity with one of the South Indian clusters. I think this is artifactual, due to deep structure within Eastern Eurasian populations and affinities between those groups that the Han absorbed as they moved south.

This post doesn’t really shed new light on anything we didn’t know. Rather, it’s just a review of what jumps out at anyone who works with genotype data: there is not very much genetic diversity in China and there is a great deal of genetic diversity in India. Why? These are not questions genetics can really answer directly, though it can give us clues and support certain models over others.

Anyone who has read much about Chinese history knows that the cultural ideal of meritocracy is deeply ingrained, even if it is honored in the breach quite often. Chinese civilization has been characterized by the domination of extended pedigrees (e.g., the Xianbei-Han ruling faction among the Tang), but those pedigrees never became ethno-religious castes. The exception occurred during the Yuan (Mongol) period, when Kublai Khan adopted a divide-and-rule policy. But that was a short period which had no longer-term cultural consequences.

In contrast, South Asia is characterized by long-term endogamy. This is not surprising to anyone who knows anything about South Asian history. The genetic evidence suggests that modern jati-barriers emerged around 2,000 years ago. Not only do South Asian groups differ a great deal in biogeographic ancestry (deep ancestry), but historical endogamy has resulted in further drift between these groups.

July 4, 2018

What Neanderthals tell us about modern humans

Filed under: Human Population Genetics,Neanderthals — Razib Khan @ 5:35 pm

In Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past David Reich spends a fair amount of time on Neanderthal admixture into modern human lineages. Reich details exactly how his team came to analyze the data that Svante Paabo’s group had produced, and how they replicated some peculiar patterns. In short, eventually, they concluded that modern humans outside of Africa have Neanderthal ancestry, because the Neanderthal genome that Paabo’s group had recovered happened to be subtly, but distinctively, closer to all non-Africans than to Africans. At the time, the group reported that Neanderthal ancestry was relatively evenly spread across non-African populations, which led them to suggest that it was likely a singular admixture event early on during the expansion phase of modern humans.

Nearly a decade later, things have changed. There is a consistent pattern of West Eurasians having less Neanderthal ancestry than East Eurasians. That is, Europeans have lower Neanderthal ancestry fractions than Chinese (South Asians are in between, in direct proportion to their West Eurasian ancestral quantum). There have been a variety of arguments and explanations for why this might be, which fall into two classes:

  1. Neanderthal ancestry was purged more efficiently from West Eurasians due to larger effective population sizes (selection is stronger in large populations).
  2. There may have been multiple admixture events into modern humans, or, gene-flow into West Eurasians diluting their Neanderthal ancestry.

But what if all these arguments are mostly wrong? That’s what a new preprint seems to suggest: The limits of long-term selection against Neandertal introgression:

Several studies have suggested that introgressed Neandertal DNA was subjected to negative selection in modern humans due to deleterious alleles that had accumulated in the Neandertals after they split from the modern human lineage. A striking observation in support of this is an apparent monotonic decline in Neandertal ancestry observed in modern humans in Europe over the past 45 thousand years. Here we show that this apparent decline is an artifact caused by gene flow between West Eurasians and Africans, which is not taken into account by statistics previously used to estimate Neandertal ancestry. When applying a more robust statistic that takes advantage of two high-coverage Neandertal genomes, we find no evidence for a change in Neandertal ancestry in Western Europe over the past 45 thousand years. We use whole-genome simulations of selection and introgression to investigate a wide range of model parameters, and find that negative selection is not expected to cause a significant long-term decline in genome-wide Neandertal ancestry. Nevertheless, these models recapitulate previously observed signals of selection against Neandertal alleles, in particular a depletion of Neandertal ancestry in conserved genomic regions that are likely to be of functional importance. Thus, we find that negative selection against Neandertal ancestry has not played as strong a role in recent human evolution as had previously been assumed.

The basic argument in the preprint is that the model assumed for the ancestry of West Eurasians and Africans was wrong. Wrong assumptions can lead to wrong inferences. Using two Neanderthal genomes which are from different populations, one of whom directly contributed to the Neanderthal ancestry in modern humans, a new statistic which was insensitive to model assumptions about modern human phylogeny was computed.

The older statistic assumed that West Eurasians and Africans were distinct clades which had not exchanged genes in ~50,000 years. Using simulations, the authors argue that the best fit to the statistics that they do see, both the earlier flawed one and the current more robust one, is a situation where a population of West Eurasian origin mixed with Africans starting about 20,000 years ago.

This explains why there was a consistent decline in Neanderthal ancestry: the earlier statistic’s model assumption got worse and worse over time, and so began to underestimate Neanderthal ancestry more and more. There was continuous gene flow into Africa over the past 20,000 years.

Not everything that came before is wrong. It could still be that there were multiple admixtures. And the authors do agree that some selection against Neanderthal alleles has occurred. It’s just that it’s not the primary reason for the decline of Neanderthal ancestry in West Eurasians.

As for the other explanation, that Neanderthal-less Basal Eurasian ancestry diluted the European hunter-gatherer fractions, the authors seem very skeptical of that. One point the authors make is that though an early European farmer was estimated to have ~40% Basal Eurasian, its Neanderthal estimate is still quite high. Iosif Lazaridis points out that this is an old estimate, and the Reich group now puts it closer to ~25%. Additionally, another recent preprint put the fraction closer to ~10%. With such low values, it is possible that Basal Eurasians may have had low Neanderthal fractions, but that that was a marginal effect on the aggregate West Eurasian ancestry quantum from Neanderthals.
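The dilution arithmetic is simple enough to sketch. This assumes the extreme case where Basal Eurasians carried no Neanderthal ancestry at all, and it uses a placeholder 2% Neanderthal fraction for the non-Basal source population. That 2% is my illustrative assumption, not a number from the preprint. The Basal Eurasian shares (40%, 25%, 10%) are the ones mentioned above.

```python
def diluted_fraction(source_fraction, basal_share):
    """Neanderthal fraction after mixing with a group assumed to carry none.

    source_fraction: Neanderthal fraction in the non-Basal source population.
    basal_share: proportion of ancestry from the (Neanderthal-free) Basal group.
    """
    if not 0.0 <= basal_share <= 1.0:
        raise ValueError("basal_share must be between 0 and 1")
    return source_fraction * (1.0 - basal_share)


# Hypothetical baseline, for illustration only.
ASSUMED_SOURCE_FRACTION = 0.02

for basal_share in (0.40, 0.25, 0.10):
    result = diluted_fraction(ASSUMED_SOURCE_FRACTION, basal_share)
    print(f"Basal Eurasian share {basal_share:.0%}: "
          f"Neanderthal fraction {result:.2%}")
```

The proportional drop in Neanderthal ancestry equals the Basal Eurasian share itself, so at ~10% the effect is small enough to be hard to distinguish from noise in the estimates.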

I think the bigger thing to consider is that our understanding of the relationships of modern humans is roughly right, but there are lots of nuanced details we’re missing or misunderstanding. Ancient DNA from South Africa, for example, shows that modern Bushmen all seem to have exotic ancestry compared to samples from 2,000 years ago. But what about samples from 20,000 years ago?

We have the best temporal transect from Ice Age Europe, and in this region, there are many population turnovers and admixtures. It seems implausible that Europe is entirely exceptional. The West Eurasian gene flow event dated to ~20,000 years ago is curiously coincidental with the beginning of the recession of the Last Glacial Maximum. To get a better understanding of the relationships of Pleistocene people, looking at paleoclimate data is probably useful. The ancient DNA will come online at some point…and unless we think ahead, we’re going to be surprised.

June 29, 2018

Human genomics will uncover a lot of treasure in Southeast Asia

Filed under: Human Population Genetics,Negritos — Razib Khan @ 12:25 am


On this week’s podcast on “Isolated Populations” I mentioned offhand to Spencer that I believe it is a bit ridiculous to bracket a host of Southeast Asian populations as “Negritos,” as if they were an amorphous and homogeneous substratum over which the diversity of modern South and Southeast Asian agriculturalists was overlain. There was almost certainly a great deal of population structure which accrued over the Pleistocene. Another issue, which I didn’t mention, is that Southeast Asia is also very geographically expansive. Modern Indonesia alone spans the length of North America.

Of course, you could say the same for Europe, from the Urals to the Atlantic. And yet we know that European hunter-gatherers were relatively homogeneous (albeit, with some structure!) at the beginning of the Holocene. I think the difference though is that Europe was a landscape into which hunter-gatherers expanded during the Last Glacial Maximum, while Southeast Asia, like Africa, has long been a refuge for human populations even during the coldest and driest periods of the Pleistocene.

There are three major classes of “Negrito” peoples in South and Southeast Asia.  To the west, are the indigenous peoples of the Andaman Islands. These tribes probably arrived from what is today Myanmar during the Pleistocene, when sea levels were lower. In peninsular Malaysia you have groups such as the Semang. Though physically very different from their neighbors, these people speak the Aslian form of Austro-Asiatic languages. They are not linguistic isolates like the Andaman tribes.

This speaks to the reality that unlike the Andaman Islanders the Negritos of mainland Southeast Asia have long been interacting with local populations. The languages they speak reflect interactions with Austro-Asiatic rice farmers. Curiously though, the dominant people amongst whom they live no longer speak Austro-Asiatic languages. Rather, they speak Austronesian or Tai dialects. These two groups are later arrivals on the Southeast Asian scene, and both seem to have assimilated Austro-Asiatic groups culturally and genetically, except in Cambodia and Vietnam (and to a lesser extent in pockets of Thailand and Myanmar).

If you are curious about the relationship between the various modern Southeast Asian groups, then two ancient DNA papers, Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia and Ancient genomes document multiple waves of migration in Southeast Asian prehistory, should do the trick. Some of the migrations are historically or semi-historically attested. In particular, the intrusion of the Tai, the long occupation of what became Vietnam by the Chinese, and the settlement of Han officials amongst the local people, and the migrations of the ancestors of the Hmong into Laos.

Other processes are vaguer and poorly understood. It has long been clear that the Austronesians probably assimilated Austro-Asiatic rice farmers in much of maritime Southeast Asia. And yet unlike mainland Southeast Asia, to my knowledge, there are no Austro-Asiatic populations in Indonesia. Additionally, it has been brought to my attention that the ~3,000-year-old sample from Myanmar has no clear Austro-Asiatic signature, despite the common sense suggestion that Austro-Asiatic languages must have entered India via that region (it has affinities to modern Tibeto-Burman individuals). And, importantly, the Austro-Asiatic populations themselves seem to have been deeply mixed between a dominant element strongly related to the Han Chinese, and a minority component which was basal Southeast Asian, for lack of a better term. This means that the Munda populations within India have several distinct components of ancient South and Southeast Asian substratum.

Aeta family

But speaking of this substratum, probably the best paper recently focusing on these groups is from last year, Discerning the Origins of the Negritos, First Sundaland People: Deep Divergence and Archaic Admixture. In many ways, it just reinforced the results of Reich et al. 2011. All the Negrito groups are only distantly related to each other. The Negritos of the Andaman Islands and those of peninsular Malaysia seem to be somewhat closer to each other than either is to those of the Philippines. And, the groups in the Philippines seem to be somewhat closer to the peoples of Melanesia. To some extent, this is just geographically expected, but there are also interesting details.

The Negritos of the Philippines, in particular, those from the northern island of Luzon, have some of the highest fractions of Denisovan ancestry of any human populations outside of Melanesia. No one is clear whether the admixture is from the same event as the one that leads to the high fractions in Melanesians, or whether there were separate mixing events (not implausible). The western Negrito groups have far lower fractions of Denisovan.

Another surprising result is that the Negritos of the southern Philippines seem very distinct from those of the northern Philippines. This may be an artifact of particular admixture history, but I wouldn’t be surprised if these islands preserved a lot of diversity which has been homogenized elsewhere.

Like many people, I believe that human evolutionary genomics will have a lot to say about Africa in the next 10 years. But, outside of Africa, Southeast Asia may be one of the most fertile regions in terms of exposing deep history. This was an area that was always amenable to habitation by modern-like Africans. First, it seems very likely now that the predominant modern human ancestry found in the Negrito substratum, and shared with all other non-Africans, is actually not the signal of the oldest modern humans to be present in Southeast Asia. Second, there seem to be many archaic human species which made their homes in Southeast Asia.

Humans arrived in Southeast Asia a long time ago. Our speciosity and census sizes were high. With more ancient DNA and better deep whole genome sequence analysis, we’ll uncover some surprising things. I guarantee.

June 26, 2018

Height differences across Europe could be less affected by selection than we had thought

Filed under: GWAS,Height,Human Population Genetics — Razib Khan @ 8:43 am

Like an Old Testament prophet of yore Graham Coop has been prophesying that cryptic population stratification may be a major confounder in analyses for as long as I’ve known him with any degree of familiarity. So it’s no surprise he’s an author on one of two preprints which have rocked the genomics world:

Reduced signal for polygenic adaptation of height in UK Biobank:

There is considerable variation in average height across European populations, with individuals in the northwest being taller, on average, than those in the southeast. During the past six years, a series of papers reported that polygenic scores for height also show a north to south gradient, and that this cline results from natural selection. These polygenic analyses relied on external estimates of SNP effects on height, taken from the GIANT consortium and from smaller replication studies. Here, we describe a new analysis based on SNP effect estimates from a large independent data set, the UK Biobank (UKB). We find that the signals of selection using UKB effect-size estimates for height are strongly attenuated, though not entirely absent. Because multiple prior lines of evidence provided independent support for directional selection on height, there is no single simple explanation for all the discrepancies. Nonetheless, our current view is that previous analyses were likely confounded by population stratification and so the conclusion of strong polygenic adaptation in Europe now lacks clear support. Moreover, these discrepancies highlight (1) that current methods for correcting for population structure in GWAS may not always be sufficient for polygenic trait analyses, and (2) that claims of polygenic differences between populations should be treated with caution until these issues are better understood.

And…Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies:

Genetic predictions of height differ significantly among human populations and these differences are too large to be explained by random genetic drift. This observation has been interpreted as evidence of polygenic adaptation, natural selection acting on many positions in the genome simultaneously. Selected differences across populations were detected using single nucleotide polymorphisms (SNPs) that were genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of sub-significant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection for diverse traits, the introduction of methods to do this, and claims of polygenic adaptation for multiple traits. All of the claims of polygenic adaptation for height to date have been based on SNP ascertainment or effect size measurement in the GIANT Consortium meta-analysis of studies in people of European ancestry. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. While we replicate most previous findings when restricting to genome-wide significant SNPs, when we extend the analyses to large fractions of SNPs in the genome, the differences across groups attenuate and some change ordering. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure, a more severe problem in GIANT and possibly other meta-analyses than in the more homogeneous UK Biobank. Therefore, claims of polygenic adaptation for height and other traits, particularly those that rely on SNPs below genome-wide significance, should be viewed with caution.

I haven’t read both preprints through and through, but my first thought (along with others) is the same as Casey Brown’s:

Note that no one has responded to his question.

Finally, recall that population structure within Europe is relatively weak and the distances between the groups low. It reminds you of how difficult polygenic traits are to analyze due to the small and subtle effects, and how they might be overwhelmed even by subtle population structure. And recall, even the British population has some of that… (albeit, an order of magnitude or so less than what you can find across Europe).

June 12, 2018

The lost 50,000 years of non-African humanity

Filed under: Human Population Genetics — Razib Khan @ 11:24 pm

The figure above is from Efficiently inferring the demographic history of many populations with allele count data. This preprint came out a few months ago, but I was prompted to revisit it after reading Spectrum of Neandertal introgression across modern-day humans indicates multiple episodes of human-Neandertal interbreeding.

The latter paper indicates that there were multiple waves of Neanderthal admixture into both Europeans and East Asians. The motivation to do the analysis is that East Asians are about 12 percent more Neanderthal than Europeans. The authors don’t reject the idea that there was ‘dilution’ of Neanderthal ancestry through selection and especially admixture with a “Basal Eurasian” group which didn’t have Neanderthal ancestry. I don’t want to get into the details of the results except for one thing: the preprint confirms a consistent finding over the past eight years that the Neanderthal contribution to the modern human genome is from a single population.

Perhaps it was a small population. Or perhaps it was a large population that had gone through a bottleneck and was genetically not very differentiated. But unlike Denisovans it seems that it was a particular Neanderthal lineage that interacted with modern humans.

Moving back to the “Basal Eurasians,” notice some details of the schematic above. The divergence of Basal Eurasians from other non-Africans was ~80,000 years ago, across an interval of 70 to 100 thousand years ago. The admixture of Basal Eurasians into the proto-LBK population occurred ~30,000 years ago, across an interval of 11 to 41 thousand years ago. Ancient DNA from North Africa indicates that Basal Eurasians were already admixed well before 11 thousand years ago.

The other dates make sense. 50,000 years for Europeans-Han Chinese, 96,000 years for Mbuti-Eurasians, and 696,000 years for Neanderthal-modern humans.

Ancient modern humans were highly structured. We know this from within Africa. But it seems clear that modern humans who had crossed over the other side of the Sahara also exhibited the same tendency. Basal Eurasians did not mix with Neanderthal populations. I suspect that that might be due to the fact that they were in Northeast Africa. At some point in the Pleistocene a mixing event occurred. This may have been precipitated by drier conditions and human retreat into only a few habitable areas, and the original Basal Eurasian populations may have mixed into other Near Eastern groups, which were part of the broader Neanderthal-mixed populations.

June 5, 2018

The great bottleneck after the post-Eemian separation

Filed under: Human Population Genetics — Razib Khan @ 12:10 am


I’ve been thinking about effective population size. Basically it’s the inferred breeding population you estimate in the present, or in many cases the past, based on the genetic variation you see within the population. Another way to say it is that it’s the population size that can explain the genetic drift that you see in the data.

To give a concrete example, the population of the New England states of America was ~1,000,000 during the 1790 Census. The vast majority of this was due to natural increase from a settler population of about 50,000 in 1650 (total fertility rate of women in New England was seven children in the years between 1650 and 1700). Of these, ~23,000 were Puritans or the offspring of Puritans who migrated between 1630 and 1643 (due to religious differences with the English government of the period). One might think that a population of ~1,000,000 would be genetically diverse, but the ~50,000 in 1650 matter a lot more than the ~1,000,000 in 1790. The rate of mutation accumulation is pretty slow, so a population bottleneck or subsample has a huge long-term effect.
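One standard way to see why the small early numbers dominate: over a run of generations, the long-term effective size behaves like the harmonic mean of the per-generation sizes, and a harmonic mean is dragged toward its smallest terms. The sketch below is illustrative only. It plugs census counts in where breeding sizes belong, and it assumes five generations of steady proportional growth between 1650 and 1790 (about 28 years per generation), neither of which comes from the census record.

```python
def harmonic_mean_size(sizes):
    """Harmonic mean of per-generation population sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def geometric_trajectory(start, end, generations):
    """Sizes per generation, assuming constant proportional growth."""
    rate = (end / start) ** (1.0 / generations)
    return [start * rate ** g for g in range(generations + 1)]


# ~50,000 in 1650 growing to ~1,000,000 in 1790 (figures from the text);
# the five-generation spacing is an assumption.
sizes = geometric_trajectory(50_000, 1_000_000, generations=5)

arithmetic = sum(sizes) / len(sizes)
harmonic = harmonic_mean_size(sizes)

print("sizes per generation:", [round(n) for n in sizes])
print("arithmetic mean:", round(arithmetic))
print("harmonic mean:  ", round(harmonic))
```

The harmonic mean lands far closer to the 1650 figure than the arithmetic mean does, which is the sense in which the founders matter more than the later census total.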

In fact, as you probably know one of the biggest determinants of genetic variation in New England whites of 1790 is the bottleneck that they share with all other non-Africans that dates to 50,000 years or more before 1790!

And these are just the coarse demographic considerations on the broader population/historical scale. In any normal random-mating human population, there’s some reproductive variance by chance (usually it is modeled as a Poisson distribution, with mean and variance being the same, though from what I have read the variance in mammals is usually greater than the mean).

Some people have more children, and some people have fewer children. That means that there is a census population, and a breeding population, and the breeding population is invariably smaller than the census population. Some individuals don’t reproduce to the next generation, obviously. But there are also cases where some individuals have large numbers of surviving offspring, while others have only a few.
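The textbook approximation linking reproductive variance to effective size, for a diploid population of constant size N where the mean number of offspring per individual is two, is Ne = (4N - 2) / (Vk + 2), with Vk the variance in offspring number. Here is a minimal sketch of it. The census size of 1,000 and the variance values are arbitrary illustrations, not estimates for any real population.

```python
def effective_size_from_variance(census_size, offspring_variance):
    """Ne = (4N - 2) / (Vk + 2), assuming constant size and a mean of 2 offspring."""
    if offspring_variance < 0:
        raise ValueError("variance cannot be negative")
    return (4.0 * census_size - 2.0) / (offspring_variance + 2.0)


N = 1_000  # arbitrary illustrative census size

# Vk = 2 is the Poisson case (variance equals the mean of 2).
for vk in (2.0, 4.0, 8.0):
    ne = effective_size_from_variance(N, vk)
    print(f"Vk = {vk:>4}: Ne = {ne:7.1f}  (Ne/N = {ne / N:.2f})")
```

With Poisson variance the effective size is essentially the census size. Once the variance exceeds the mean, as it reportedly does in mammals, Ne drops below N even with no bottleneck and no population structure.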

To make it concrete I plotted the distribution of the number of children of women older than 50 years of age from the year 2000 and later in the General Social Survey (GSS). You can see that the most common number is two, but there are a fair number with three. Only about 10% of women 50 years and older have no children in the GSS.

But the curious thing is that if you weight the number by the proportion, you notice that women who have three children may not be as common as women who have two children, but they are contributing more children to the next generation than women who have the more typical two children. And, though the number of women who have five or more children is only 11% of the sample, as opposed to 14% who have one child, they contribute nearly five times as many children as those with one child to the next generation (women with six children alone contribute more than women with one child).
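The weighting is just family size times the share of women with that family size, renormalized. The sketch below uses a made-up distribution: only the ~10% childless, 14% one-child, and 11% five-or-more shares are from the GSS numbers quoted above, and every other value (including how the 11% is split across five, six, and seven children) is invented so the example runs.

```python
# Hypothetical distribution: number of children -> share of women.
# Only the 0.10, 0.14 and the 0.11 total for five-plus come from the text.
family_size_share = {0: 0.10, 1: 0.14, 2: 0.30, 3: 0.22, 4: 0.13,
                     5: 0.04, 6: 0.04, 7: 0.03}


def children_share(distribution):
    """Fraction of all children born to women of each family size."""
    weighted = {k: k * share for k, share in distribution.items()}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}


shares = children_share(family_size_share)
five_plus = sum(s for k, s in shares.items() if k >= 5)
ratio_five_plus_to_one = five_plus / shares[1]

for k in sorted(shares):
    print(f"{k} children: {family_size_share[k]:.0%} of women, "
          f"{shares[k]:.1%} of children")
print(f"five-plus vs one-child contribution: {ratio_five_plus_to_one:.1f}x")
```

Even with invented middle values, the pattern described above falls out: the modal family size is not the one supplying the most children, and the small tail of large families outweighs the one-child families several times over.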

Basically, not all the genetic variation in a given generation is created equal. Some people will contribute more to the next generation, and that has a homogenizing effect (there are models of mutation/selection/drift which establish equilibrium values of variation in a stationary state).

I’m revisiting all of this for two reasons. First, in Who We Are And How We Got Here David Reich talks about a long period of a shared population bottleneck for “Out of Africa” (all non-Africans) groups before the primary expansion ~60,000 years ago. Second, in my conversation with Matt Hahn, he was very skeptical of drawing any correspondence between effective population and some inferred census size. In hindsight I think part of it is that in most organisms census counts are more an art than a science. Not so with humans.

This made me look more into the literature for humans again. Recently Browning et al. published Ancestry-specific recent effective population size in the Americas. It’s a great paper. Basically, it uses identity by descent tracts of different ancestry to tease apart the distinctive pre-admixture effective population sizes. If you take an admixed population and assume that it was a single population random-mating indefinitely, and then work backward in time, you’re probably going to produce rather strange effective population sizes (if the two groups are about the same genetic diversity beforehand, they’ll probably show an inflated effective population, because you are assuming the two groups were a big random-mating population long before they were randomly mating!).

There are many ways to infer effective population, and the identity by descent method seems reasonable for recent time periods. And one thing about recent population size estimates for humans is that you have reasonable census estimates (you don’t just check with simulations):

Our simulations showed that biased sampling of a structured population results in underestimation of most recent effective population size. When we compare the estimated current effective sizes of HCHS/SOL country-of-origin populations to World Bank population sizes (accessed via Google Public Data Explorer) from 1995 (when the average age of the sampled individuals was around 25), we find that the ratio of current estimated effective size to 1995 population size ranges from approximately 1/60 (Ecuador) to approximately 1/4 (Cuba), with typical values around 1/10. Although estimates of effective size in the most recent generations are affected by these issues, our simulations also showed that less recent generations are not affected. Thus our estimates are useful for learning about the effective population sizes at and before admixture.

The structured part is important. For example, the paper On the importance of being structured: instantaneous coalescence rates and human evolution—lessons for ancestral population size inference? explores how population structure can confound genomic inferences that assume a panmictic population. Last year a paper in PNAS, Early history of Neanderthals and Denisovans, suggested that Neanderthals were characterized by a highly structured meta-population, and that the low effective populations inferred from sampled genomes in this group of humans reflect this, rather than a genuinely low census size.

Browning et al. focused on recent population size inferences. I was curious about these inferences because we can compare them to real census sizes. From this I think I can tune my intuition at least to the possibility that the census size of a random mating population is not likely to be two orders of magnitude above the inferred effective population size. Conversely, the rough mammalian value of an effective population size of ~1/3 the census size seems to be a ceiling. Population structure and bottleneck aside, humans seem to have enough basal reproductive skew that effective population size is less than half of the census size.

To focus on ancient population growth (or lack thereof), I reread Inferring human population size and separation history from multiple genome sequences (Schiffels et al. 2014), Exploring Population Size Changes Using SNP Frequency Spectra (Liu et al. 2015) and Neutral genomic regions refine models of recent rapid human population growth (Gazave et al. 2014). The first two papers seem to suggest an “Out of Africa” population bottleneck that’s pretty long, with an effective population that’s somewhat lower than 5,000 individuals. In contrast, the last paper seems to have a sharp bottleneck of 200 individuals.

Remember, different models can produce the same empirical patterns in the genome. You can reduce genetic diversity by a modest, but long, bottleneck. Or, through a very sharp short bottleneck.
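
To make that concrete, here is a minimal sketch using the textbook expectation that heterozygosity decays by a factor of (1 - 1/(2N)) per generation in a population of constant effective size N. The effective sizes of 5,000 and 200 are the ones quoted above; the durations (1,600 and 64 generations) are my own hypothetical choices, picked only so that the two scenarios land in the same place. None of this comes from the papers themselves.

```python
def heterozygosity_retained(effective_size: float, generations: int) -> float:
    """Expected fraction of heterozygosity left after drifting for
    `generations` at a constant effective size (standard drift result)."""
    return (1.0 - 1.0 / (2.0 * effective_size)) ** generations


# Long, modest bottleneck: Ne = 5,000 for 1,600 generations
# (hypothetical: roughly 40,000 years at an assumed 25 years per generation).
long_modest = heterozygosity_retained(5_000, 1_600)

# Short, sharp bottleneck: Ne = 200 for 64 generations
# (hypothetical duration, chosen to match the scenario above).
short_sharp = heterozygosity_retained(200, 64)

print(f"long and modest: {long_modest:.3f} of heterozygosity retained")
print(f"short and sharp: {short_sharp:.3f} of heterozygosity retained")
```

Both scenarios retain about the same share of the starting diversity, which is why a single summary like heterozygosity cannot tell them apart on its own.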

In Who We Are and How We Got Here David Reich definitely leans toward a long, but more modest, bottleneck. For anthropological and archaeological reasons this seems more plausible now than it did ten years ago.

But perhaps it makes more sense now that we have more ancient DNA and a more elaborated model of human history seen through the lens of population genetics. In Schlebusch and Jakobsson’s Tales of Human Migration, Admixture, and Selection in Africa the authors come out and say “For our species’ deep history in Africa, both paleoanthropological and genetic evidence increasingly point to a multiregional origin of AMHs [anatomically modern humans] in Africa.”

They’re only saying what I hear other people talking about.

Instead of the “Out of Africa bottleneck” being defining for our species, it’s only a phenomenon which is important for peoples outside of Sub-Saharan Africa. Arguably for the majority of the existence of our species something closer to multi-regionalism was operative within modern humans.

In fact, isn’t that what the new ancient DNA shows? Pulses of admixture and gene-flow between distinct groups? Arguably multiregionalism might be the answer to our origins, but also characterize many of the dynamics after the “Out of Africa” event.

In any case, the best evidence now points to the likelihood that modern human lineages began to diversify and diverge before 200,000 years ago. Conversely, most of the ancestry of modern humans outside of Africa dates to an expansion around ~60,000 years before the present (ancient DNA and archaeology seem to agree here).

This is probably right before the Neanderthal admixture event with non-African humans, at least the modern lineages we have around today. But, it turns out it does not define the point when non-African humans diverged from the ancestral African population. Another group, “Basal Eurasians” (who may not have been Eurasian at all), diverged before the expansion of all eastern non-Africans, Oceanians, as well as the ancestors of Pleistocene Europeans and Siberians. It does not seem that Basal Eurasians had any Neanderthal admixture. Basal Eurasian ancestry is substantial in the Middle East today (although lower than 50%), and non-trivial across broad swaths of Europe and South Asia, due to the expansion of farming. They seem to have been well mixed in places like North Africa with other Eurasian groups ~15,000 years ago. Presumably that was a “back to Africa” migration, since these people had Neanderthal ancestry.

All of this leads to the conclusion that the ancestors of Basal Eurasians/non-Africans must have gone through their shared bottleneck well before ~60,000 years before the present. And, it may have happened on the African continent. So with that, I’ll quote Schiffels et al.:

This comparison reveals that no clean split can explain the inferred progressive decline of relative cross coalescence rate. In particular, the early beginning of the drop would be consistent with an initial formation of distinct populations prior to 150kya, while the late end of the decline would be consistent with a final split around 50kya. This suggests a long period of partial divergence with ongoing genetic exchange between Yoruban and Non-African ancestors that began beyond 150kya, with population structure within Africa, and lasted for over 100,000 years, with a median point around 60-80kya at which time there was still substantial genetic exchange, with half the coalescences between populations and half within (see Discussion). We also observe that the rate of genetic divergence is not uniform but can be roughly divided into two phases. First, up until about 100kya, the two populations separated more slowly, while after 100kya genetic exchange dropped faster.

David Reich’s group, and others, now posit the existence of a “Basal Human” population that mixed into West Africans, who can be modeled as primarily proto-East African (without Eurasian admixture), as well as this ancient outgroup. This means that estimates of divergences with non-Africans from something like MSMC may generate a composite if proto-East Africans are closer to the ancestors of non-Africans, which seems likely. One likely model is that the “Out of Africa” population emerged out of the northern edge of this proto-East African distribution of modern humans over 100,000 years ago (but after groups like the Khoisan and Basal Humans had already diverged).

Looking at Schiffels et al., they seem to posit lower divergence times than seems likely to me. Is that perhaps due to unaccounted for admixture in lineages which fuse together groups which were earlier distinct?

In any case, with details about the divergence dates set aside, the MSMC results are actually in line with a new congealing consensus. Deep structure within Africa, but gene-flow between distinct populations, for at least ~100,000 years (possibly more). This is the period when population structure was quite fluid and indistinct along the East Africa continuum out of which non-Africans emerged.

Also, the archaeological evidence is now strongly suggestive of modern humans in places like Southeast Asia over 10,000 years before the wave which led to the ancestry of most extant populations. In fact, we know that this sort of early migration with no descendants isn’t abnormal. The first modern humans in Europe left no descendants (at least in any appreciable quantity). And the Altai Neanderthal seems to have modern-like admixture that dates to ~100,000 years before the present.

With all the evidence that modern humans were present in Africa, and expansively so, for hundreds of thousands of years, it seems unlikely that they never mixed with “archaic” Eurasian lineages (and vice versa). In fact, as we obtain more and more Neanderthal and Denisovan genomes perhaps we’ll find that a rapid expansion like the one that occurred ~60,000 years ago across Eurasia and Oceania happened before, out of and/or into Africa.

Looping back to the effective population issue, the effective population of modern non-Africans seems to have been below ~5,000 for a while. There was minimal gene-flow with other populations for many generations. Reich has a schematic of 40,000 years between 90,000 and 50,000 BP in Who We Are and How We Got Here. But that’s obviously just a ballpark figure. I have a hard time believing that the census size was around 500,000. The world population 10,000 years ago is usually estimated to be 1 to 10 million. Human populations were probably much larger at the end of the Pleistocene than 100,000 years ago. But a figure of 10% effective would give 50,000, which seems a reasonable number, especially with the likelihood that we’re talking about many tribes over a wide ecological zone. Meta-population dynamics of extinction and resettlement in inclement periods probably drove down the effective population.
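
The arithmetic behind those figures can be laid out as a small sketch. The effective size of ~5,000 and the ratios (1/4 for Cuba, ~1/10 typical, 1/60 for Ecuador, and the "two orders of magnitude" bound) are the ones quoted earlier in this post; applying recent ratios to a Pleistocene population is purely my assumption, and the function name is just for illustration.

```python
def implied_census(effective_size: int, census_per_effective: int) -> int:
    """Census size implied if one 'effective' individual stands in for
    `census_per_effective` real people (i.e. Ne/N = 1/census_per_effective)."""
    return effective_size * census_per_effective


BOTTLENECK_NE = 5_000  # rough upper bound on the non-African bottleneck Ne

scenarios = [
    ("Cuba-like ratio, 1/4", 4),
    ("typical ratio, 1/10", 10),
    ("Ecuador-like ratio, 1/60", 60),
    ("two orders of magnitude, 1/100", 100),
]

for label, one_in in scenarios:
    census = implied_census(BOTTLENECK_NE, one_in)
    print(f"{label:32s} -> census of about {census:,}")
```

The 1/10 row gives the 50,000 I find plausible, and the 1/100 row gives the 500,000 I find hard to believe.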

The separation seems to be distinct from the older multiregional phase. What could explain it? The existence of the Sahara, and periods of extreme desertification, seem the most likely candidate. I can’t say much with any credibility because I don’t know the archaeology and paleoclimate literature, but before domesticated animals, it was probably difficult for hunter-gatherers to make a go of it in the deep Sahara during the driest phases.

If I had to bet, the Eemian interglacial, 130 to 115 thousand years ago, is when I would assume there was:

  1. Lots of gene flow across the Sahara, perhaps in both directions
  2. A major population expansion of humans, of all sorts

This gives plenty of time for a wave of modern humans to push east, probably going through milder climates, rather than expanding north into Neanderthal or Denisovan territory. Eventually, some group must have mixed with the ancestors of the Altai Neanderthals. It seems likely that a cold and dry spell after the Eemian would have favored the well-adapted Eurasian groups, and modern populations would have withdrawn into refugia. The brutally expanding Sahara would have divided the majority of modern humans, who existed in the meta-populations to the south that dated back hundreds of thousands of years, from the groups on the northern fringe.

One can imagine that large numbers of modern humans were either absorbed or went extinct with the expansion of Neanderthals and other archaics. Though Neanderthals and Denisovans were interfertile with moderns, the lineages were still distinct enough that it looks like there was some hybrid breakdown. Just as modern humans seem to have purged many Neanderthal alleles from our genome, the opposite dynamic was probably at work.

There was clearly some structure in the relict modern human group that was separated from the African populations. Basal Eurasians did not mix with Neanderthals, but the ancestors of all other non-African humans did. Though one has to be careful about such geographical inferences, that suggests to me that the range of modern humans in the period between 60,000 and 80,000 years ago extended further back into pockets of northeast Africa, where no contact with Neanderthals would have occurred. Perhaps, in the end, we’ll end up thinking that the Basal Eurasians in some ways were a lot more like Africans south of the Sahara, as they didn’t undergo the massive range expansion of other populations during the Upper Paleolithic.

I’ll end with some predictions.

  • Ancient DNA of proto-moderns and archaics in eastern Eurasia dated to between 50,000 and 100,000 years BP will be analyzed at some point and will exhibit a fair amount of admixture. That is, the Altai Neanderthal was not exceptional, and probably relatively attenuated. I’m moderately confident of this.
  • The pre-60,000 year eastern Eurasians will be found to have left some of their genes in modern eastern Eurasians, especially in Southeast Asia and Oceania. Probably in the 1-10% range. I’m moderately confident of this.
  • The Denisovan ancestry in Oceanians is mediated by a “first wave” group “Out of Africa.” I have low confidence in this, but I really wouldn’t be surprised either way. My confidence in my confidence is low!
  • At some point we’ll obtain sequence from a 1 million year old hominin somewhere in the colder/drier climes of Eurasia (we have a 900,000 year old horse genome). This will predate Neanderthal/Denisovans. We will see from this that some of these super-archaic populations left their heritage in later archaics, and therefore our own lineage. I’m rather confident of this.
  • By hook or crook we’ll get more ancient genomes out of African samples, and confirm a lot of ancient population structure, as well as some gene-flow from archaic non-modern lineages. Probably around the same range you see in non-Africans (though some of the gene-flow may also apply to non-Africans, since they didn’t separate from eastern Africans until 100,000 to 150,000 years ago). I’m rather confident of this.
  • H. naledi will return sequence at some point. I’m very confident of this. I don’t have inside knowledge, but I know they’re going to keep trying. They are getting more samples.
  • H. naledi will be found to have contributed ancestry to modern southern African populations. I’m moderately confident of this.
  • At some point ancient genomes from the Americas will confirm the existence of an earlier group which was only distantly related to modern New World populations descended mostly from Siberians. There is indirect evidence of this group from South American populations, but we’ll get individuals who are much more distinct at some point in the future. I’m moderately confident of this.
  • Basal Eurasians will be found to have inhabited the Southern Arabia/Persian Gulf region. But the “pure” population will be found to have disappeared around the Last Glacial Maximum ~20,000 years ago, as the human populations to the north moved south, and the Near East’s southern fringe became drier. I’m moderately confident of this.

June 3, 2018

The 4,000 year explosion

Filed under: FADS1,Human Population Genetics — Razib Khan @ 10:53 pm


The figure above shows a most interesting result from a new preprint, FADS1 and the timing of human adaptation to agriculture. It shows the allele frequency change using ancient Eurasian genomes for the derived allele at FADS1.

In case you don’t know why FADS1 is important, it’s been implicated in variation in long-chain polyunsaturated fatty acid (LC-PUFA) metabolism. The derived allele, embedded in haplotype D in the above preprint, seems more optimized for plant-based diets, because of the higher activity of synthesis of LC-PUFAs (which one might otherwise obtain from marine resources, as is likely among Inuit).

So the standard model is that the Neolithic changed things, as humans began to adapt to cereal-based diets. This preprint suggests maybe not:

Our analysis shows that selection at the FADS locus was not tightly linked to the development of agriculture. Further, it suggests that the strongest signals of recent human adaptation may not have been driven by the agricultural transition but by more recent changes in environment or by increased efficiency of selection due to increases in effective population size.

The authors are explicit that the derived allele at FADS1, which is at ~60% in modern Europeans, was under strong selection during the Bronze Age. In fact, this allele, which is common in Africans, may have been absent in most Paleolithic Eurasians. Using various methods they infer in fact that the ancestors of non-Africans may have been subject to selection for the ancestral variant. Their timing estimates indicate that this predates the standard expansion period starting ~60,000 BP (there was also an older selection event for the derived variant within Africa). Additionally, the authors posit that the derived variant was introduced into Europeans due to the Basal Eurasian ancestry in farmers.

They posit two dynamics that might drive the Bronze Age selection events. First, they suggest that the change in environment was actually more dramatic than that during the Paleolithic-Neolithic transition. Second, they suggest that effective populations were much smaller before the Bronze Age, so selection was not as efficacious (or, more precisely, drift effects were dominant in shaping variation).
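
The second point can be illustrated with the classic diffusion approximation for the fixation probability of an allele under genic selection (Kimura's formula). To be clear, this is not the method of the preprint, just a standard textbook result, and the selection coefficient, starting frequency, and effective sizes below are hypothetical values I picked for illustration.

```python
from math import exp


def fixation_probability(effective_size: float, s: float, start_freq: float) -> float:
    """Diffusion approximation for the chance that an allele at frequency
    `start_freq` eventually fixes, with selection coefficient `s`
    (heterozygote fitness 1 + s) in a population of size `effective_size`."""
    if s == 0:
        return start_freq  # pure drift: fixation chance equals current frequency
    return (1.0 - exp(-4.0 * effective_size * s * start_freq)) / (
        1.0 - exp(-4.0 * effective_size * s)
    )


S = 0.001           # hypothetical, weak selective advantage
START_FREQ = 0.05   # hypothetical starting frequency

small = fixation_probability(500, S, START_FREQ)      # hypothetical small Ne
large = fixation_probability(50_000, S, START_FREQ)   # hypothetical large Ne

print(f"Ne = 500:    P(fix) = {small:.3f} (neutral expectation {START_FREQ})")
print(f"Ne = 50,000: P(fix) = {large:.3f}")
```

With the same selection coefficient, the allele in the small population does only modestly better than a neutral one, while in the large population its fate is close to determined by selection.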

This idea that the Neolithic isn’t quite as important, or singular, is somewhat of a surprise. But we may need to consider it. Another line of research, using high-quality modern day sequences rather than ancient genotypes, implies that there has been a lot of recent selection, and that’s likely going on today.

Second, one of the major takeaways from The Fate of Rome is that pandemics probably weren’t a feature of Neolithic small-scale societies. Rather, pandemics relied on long-distance trade and movement, as well as concentrations such as urban centers. Though certain endemic diseases probably arose in the Neolithic, the periodic sweep of pandemics required greater social and cultural complexity and overall human density.

The analogy then is rather straightforward. Just as microbes can move faster and more efficiently in an interconnected world, so such a world is much closer to a panmictic one. Earlier work suggested that effective population size of Neolithic farmers was not particularly small, but perhaps there are dynamics being missed by that simple summary value when it comes to the interconnectedness of the Eurasian landscape triggered by the emergence of pastoralism, and the necessary reaction of larger-scale polities.

A simple test of this would be to compare selection signals in a place like Papua New Guinea, which did not seem to undergo the same sort of pressures as Bronze Age Eurasian societies in relation to reduced diversity. I presume that New World societies as well would be an interesting test.

May 26, 2018

Y chromosomal star-phylogenies as inter-group competition between paternal lineages

Filed under: Human Population Genetics,Star phylogenies,Y chromosomal lineages — Razib Khan @ 11:37 pm

The figure to the left should be familiar to readers of this weblog. It is taken from A recent bottleneck of Y chromosome diversity coincides with a global change in culture (Karmin et al.). Over the past few years a peculiar fact long suspected or inferred has come into sharp focus: some of the Y chromosome haplogroups very common today were not so common in the past, and their frequency changed very rapidly over a short time period.

What Karmin et al. did was look at sequence data across the Y chromosome to make deeper inferences. The issue is that the Y chromosome is not genetically very diverse. Earlier generations of researchers focused on highly mutable microsatellite regions for identification. While microsatellites are good for identification and classification because of their genetic diversity, they are not as good when it comes to making evolutionary inferences about parameters such as time since last common ancestor. They have very high and variable mutation rates.

Single nucleotide polymorphisms (SNPs) are probably better for a lot of evolutionary inference, but the Y chromosome doesn’t have too many of these. SNP-chip era technology which focuses on a select subset of polymorphisms at specific locations didn’t have much to choose from and likely missed rare variants.

This is where whole-genome sequence of the Y comes in. It retrieves maximal information, and with that, the authors of Karmin et al. could definitively confirm that some Y chromosomal lineages underwent explosive expansion ~4,000 years ago after a bottleneck.

By and large ancient DNA studies take a different angle, focusing on genome-wide autosomal ancestry, and lacking high-coverage whole-genome sequences. But they have confirmed the inferences from whole-genomes that some of these lineages exhibit explosive growth in the last ~4,000 years. One moment they were rare, and the next moment ubiquitous.

But geneticists are geneticists. They’re interested in genetical questions, methods, and dynamics. To be frank cultural models for how those genetic patterns might have come about are either exceedingly simple and probably true (e.g., gene-culture coevolution with lactase persistence), or vague and handwavy. With the surfeit of genomic data to analyze it isn’t surprising that this happens.

This is why researchers in the field of cultural evolution need to get involved. They’re model-builders and should see which models predict the copious empirical results we have now when it comes to genetic change over time.

For several years now I have been asserting that inter-group competition of paternal lineages best explains the pattern of Y chromosome expansions ~4,000 years ago. A new paper brings forth a formal model which explores this hypothesis, Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck:

In human populations, changes in genetic variation are driven not only by genetic processes, but can also arise from cultural or social changes. An abrupt population bottleneck specific to human males has been inferred across several Old World (Africa, Europe, Asia) populations 5000–7000 BP. Here, bringing together anthropological theory, recent population genomic studies and mathematical models, we propose a sociocultural hypothesis, involving the formation of patrilineal kin groups and intergroup competition among these groups. Our analysis shows that this sociocultural hypothesis can explain the inference of a population bottleneck. We also show that our hypothesis is consistent with current findings from the archaeogenetics of Old World Eurasia, and is important for conceptions of cultural and social evolution in prehistory.

Their model is interesting because inter-group competition between paternal lineages can result in a loss of haplogroup diversity without huge reproductive skew. That is, instead of a highly polygynous society, one can simply posit that group dynamics of expansion and extinction produce expansions of Y chromosomal lineages.
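
To get a feel for the mechanism, here is a deliberately crude toy simulation. It is not the authors' model (I haven't dug into their methods): every name and parameter is hypothetical. Each group carries a single paternal lineage, groups go extinct at random, and an extinct group's territory is refilled by a daughter group of some randomly chosen group. Nobody in this toy has more sons than anyone else; lineages are lost only because whole groups live or die together.

```python
import random


def simulate_clans(n_groups: int = 200, n_lineages: int = 50,
                   generations: int = 300, extinction_rate: float = 0.05,
                   seed: int = 1) -> int:
    """Return how many distinct Y lineages survive the simulation."""
    rng = random.Random(seed)
    # Start with the lineages spread evenly across patrilineal groups.
    groups = [i % n_lineages for i in range(n_groups)]
    for _ in range(generations):
        for i in range(n_groups):
            if rng.random() < extinction_rate:
                # Group i dies out; its territory is taken over by a daughter
                # group budding off from a randomly chosen group.
                donor = rng.randrange(n_groups)
                groups[i] = groups[donor]
    return len(set(groups))


remaining = simulate_clans()
print(f"lineages remaining: {remaining} of 50")
```

Because a lost lineage can never reappear in this setup, haplogroup diversity can only ratchet downward, and raising `extinction_rate` speeds the loss.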

A formal model synthesized with genomic results is a major step forward, though I haven’t dug into the methods (computational or analytic). Presumably, this is a first step.

But the discussion does review a lot of anthropological literature about the nature of human conflict and social interaction. Basically, it seems that in the stage after nomadic hunter-gatherer bands and before chiefdoms, biologically defined paternal clans were often the organizing principle of society. To some extent this makes total sense since the meta-ethnic religious and social identities explicitly appeal to fictive relationships of blood even after blood was no longer paramount. Ancient Near Eastern kings addressed each other in familial terms (e.g., “brother” and “son”), while universal religions deploy the construct of brotherhood.

In Empires of the Silk Road the author makes the case that these bands of brothers were more influential in shaping history than we realize today. Not surprisingly, the authors of the above paper suggest that the Inner Asian nomad zone is where star-phylogenies have been most pervasive and persist down to historical time. As in Steven Pinker’s The Better Angels of Our Nature it seems that the rise of the state suppressed the viciousness of the paternal kin group. How do we know this? Because the period of the maximal explosion of star-phylogenies seems to be a transient between the early Neolithic and the historical age.

The Y chromosomal literature is just the low hanging fruit. I suspect in the next decade cultural evolutionary models will be brought to bear on the huge mountain of genomic data….

Citation: Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck Tian Chen Zeng, Alan J. Aw & Marcus W. Feldman.

May 24, 2018

Selection is going on with SLC24A5….

Filed under: Human Population Genetics — Razib Khan @ 12:20 am
The ancestral allele for rs1426654 at SLC24A5

 
On this week’s episode of The Insight, I talked to Matt Hahn about why he wrote his new book, his opinions on “Neutral Theory”, and what he thought about David Reich’s op-ed. Without Spencer’s supervision, I have to admit that I think I lost control and just went “full nerd”. Next week we’re dropping Carl Zimmer’s podcast, so rest assured that the world will come back into balance, and The Insight will be more welcoming to civilians!

At a certain point, Matt and I were discussing allele frequency differences between populations and he came close to saying all such differences between human populations were of modest frequency in relation to pairwise comparisons (e.g., 40% vs. 49%). Obviously, this is not true, because there is always the huge difference in SLC24A5 at SNP rs1426654 (as well as at Duffy and a few other loci). A substitution of an A for a G converts the codon from alanine to threonine.

You have heard of this locus because of a paper in 2005, SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. This paper came out in December of 2005, a few years after Armand Leroi wrote in Mutants that geneticists still hadn’t come to grips with normal variation in pigmentation in humans. The above publication was the first step in solving this question in the years between 2005 and 2010, at least to a good first approximation.

In the sample in the paper they explain 25-40% of the variation in melanin index between Africans and Europeans with this single genetic change (for various technical reasons it’s probably not that big an effect, though it is still big, and probably the largest effect quantitative trait locus for pigmentation in the human genome).

It turns out that this mutation, the derived variant, is almost disjoint in frequency between Europeans and Africans. That is, ~100% of Africans carry the ancestral G base while ~0% of Europeans carry the G base (as opposed to the A base). Interestingly, East Asians carry the G base at ~100% frequency as well. If you genotype an anonymous individual and their genotype is AG or GG at rs1426654 then it is highly likely that that individual is not a European.

To give an example of how this works, in 2013 I stumbled onto a paper which genotyped 101 Europeans from Cape Town in South Africa. That means there are 202 alleles (two per person) at rs1426654. Of these, 5 of the alleles were ancestral (G). From this, I immediately concluded that it was highly likely that the Afrikaner people of South Africa have non-European ancestry. I came to this conclusion because 5 copies of the ancestral allele, ~2.5%, is shockingly high for a European population, and it was long surmised that the Afrikaner people had some non-European (Khoisan, Bantu, South and Southeast Asian) ancestry. The majority of the whites sampled in Cape Town could have been Afrikaners (I’ve confirmed this with genome-wide data).
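
A back-of-the-envelope way to check that intuition is a binomial tail probability: if the Cape Town sample had the same ancestral allele frequency as the Gnomad non-Finnish Europeans (486 out of 126,548 alleles), how often would 202 alleles turn up 5 or more ancestral copies? This is only a sketch. It treats every allele as an independent draw, which is my simplifying assumption and not something the original paper did.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# Baseline frequency of the ancestral allele in non-Finnish Europeans (Gnomad).
baseline_freq = 486 / 126_548

# Cape Town sample: 5 ancestral alleles out of 202.
observed_freq = 5 / 202
p_value = binomial_tail(5, 202, baseline_freq)

print(f"baseline frequency: {baseline_freq:.4%}")
print(f"observed frequency: {observed_freq:.2%}")
print(f"P(5 or more ancestral alleles by chance): {p_value:.4f}")
```

The tail probability comes out well under 1%, which is the formal version of "shockingly high for a European population."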

To get a sense of where my intuitions come from you need to look at allele counts within populations. Using 1000 Genomes, Yale’s Alfred, and Gnomad I assembled a representative list to give you a sense of what’s going on. Using 126,548 counted alleles in Gnomad for individuals of European (non-Finnish) descent you see that 0.38% out of the total, 486, are ancestral.

Population                    Ancestral alleles  Total alleles  Freq
Samaritan                     0                  74             0%
Basque                        0                  216            0%
Greeks (Thrace, Athens)       0                  184            0%
Burusho                       0                  50             0%
Pandit Brahmin, Kashmir       0                  40             0%
European (Non-Finnish)        486                126548         0%
Ashkenazi Jewish              47                 10148          0%
European (Finnish)            329                25790          1%
Iraq Kurds                    1                  68             2%
Yemenite Jews                 2                  78             3%
Havyaka Brahmin, Karnataka    2                  62             3%
Palestinian                   4                  122            3%
Gujarati                      10                 206            5%
Tunisian Berber               6                  110            5%
Andalusian                    14                 252            6%
Iranian                       6                  84             7%
Pashtun                       21                 190            11%
Uttar Pradesh Brahmin         4                  34             12%
Pandit Brahmin, Haryana       13                 78             17%
Punjabi                       42                 192            22%
South Asian                   6921               30774          22%
Kalash                        14                 48             29%
Telugu                        71                 204            35%
Bangladeshi                   80                 172            47%
Sri Lanka Tamil               105                204            51%
Adi-Dravida, Karnataka        21                 34             62%
Masai Kenya                   192                286            67%
Austro-Asiatic tribe, Odisha  43                 56             77%
Luhya Kenya                   155                188            82%
Hausa                         68                 76             90%
Mende Sierra Leone            155                170            91%
Gambian                       209                226            92%
Ibo                           90                 94             96%
Austro-Asiatic tribe, Odisha  92                 96             96%
Esan Nigeria                  193                198            97%
Yoruba Nigeria                213                216            99%
Biaka                         135                136            99%
East Asian                    18728              18856          99%
Ghana                         140                140            100%
Mbuti                         74                 74             100%

Last fall Crawford et al. reported that rs1426654 is embedded in a haplotype that’s about 30,000 years old. Additionally, they contend that its presence within Africa is probably no earlier than the Holocene, the last ~12,000 years. Martin et al. report that KhoeSan exhibit higher frequencies of the derived allele because of Eurasian back-migration and then in situ natural selection. Of course, not all Eurasians carry the derived allele. Most East Asians have the ancestral variant of rs1426654.

This leaves us with West Eurasians, North Africans, and South Asians. I’ve put a few South Asian populations in the list to show you that there is a wide range of variation in allele frequencies. The South Asians in Gnomad, probably mostly Diaspora, have the ancestral variant at only 22%. In contrast, Austro-Asiatic speaking South Asian groups from northeast India have very high frequencies of the ancestral variant. There has clearly been in situ selection in some South Asian populations for the derived variant at rs1426654. Ancestral North Indian groups (ANI) probably brought the derived allele, and Ancient Ancestral South Indians (AASI) probably tended to carry the ancestral allele, like East Eurasians and Oceanians. Additionally, South Asian populations often have high drift. Some of the differences in the Alfred data seem to be impacted by this.

The situation in the Middle East, North Africa, and Europe is different. In the Middle East and North Africa, the ancestral variant is present at frequencies around 1-10%. Some of this can probably be attributed to admixture from Africa and in some cases South and East Asian populations. Ancient DNA from the Middle East and North Africa presents a mixed picture. The farmers who brought the Neolithic to Europe carried the derived variant at rs1426654, and some of the ancient Middle Eastern samples carry it. But not all. The recent Iberomaurusian samples which date to ~15,000 years ago don’t seem to have had the derived variant.

Though the hunter-gatherers of Western Europe only seem to have carried the ancestral variant at rs1426654, the hunter-gatherers of Scandinavia and Eastern Europe did exhibit the derived variant in some frequency, though lower than modern Europeans.

My own hunch is that the original genetic background against which the A mutation at rs1426654 emerged will be found increasing in frequency first somewhere in the Near East after the Last Glacial Maximum. But no ancient population shows the frequencies of the derived variant we see in modern Europeans. In isolated populations subject to drift it wouldn’t be surprising if the ancestral variant decreased to ~0%. But in European populations today in the vast majority of cases the ancestral variant is far lower than 1%, even though we know that within the last 10,000 years the ancestral population streams had several groups with very high frequencies of that ancestral variant. The low frequency is not due to a freakish bottleneck all across Europe. It has to be selection.

One thing I have pointed out is that this very low frequency of the ancestral variant indicates that the advantage at rs1426654 for the A allele in Europe is additive. In Northern Europe, the frequency of the derived variant that confers lactase persistence tops out at around ~90 percent. We know this region of the genome has been targeted by natural selection, but lactase persistence also happens to be genetically dominant. That is, one copy of the mutant allele confers the phenotype. Once you hit ~90 percent of the derived variant only ~1 percent of the population would be lactose intolerant homozygotes (two copies of the ancestral variant). In the Gnomad sample of 60,000+ Europeans, they count three homozygote genotypes for the ancestral allele at rs1426654. That’s 0.005%.
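
The numbers in that paragraph are just Hardy-Weinberg arithmetic: under random mating, the share of people homozygous for an allele at frequency q is q squared. A small sketch, assuming random mating and treating "60,000+" as exactly 60,000 individuals (both are my simplifications):

```python
def homozygote_fraction(allele_freq: float) -> float:
    """Expected share of homozygotes for an allele under random mating."""
    return allele_freq ** 2


# Lactase persistence: derived allele at ~90%, so the ancestral allele is ~10%.
lactose_intolerant = homozygote_fraction(0.10)

# rs1426654 in Gnomad Europeans: three ancestral homozygotes observed.
observed_homozygotes = 3 / 60_000

# For comparison, the random-mating expectation at the Gnomad ancestral
# allele frequency quoted earlier (486 / 126,548).
expected_homozygotes = homozygote_fraction(486 / 126_548)

print(f"lactase, ancestral homozygotes:   {lactose_intolerant:.2%}")
print(f"rs1426654, observed homozygotes:  {observed_homozygotes:.4%}")
print(f"rs1426654, expected under random mating: {expected_homozygotes:.4%}")
```

The point of the comparison is the gap of a couple of orders of magnitude between the ~1% of ancestral homozygotes that persist at the lactase locus and the vanishingly small share at rs1426654.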

Something is happening at rs1426654. Selection. But why? No one really has any explanation beyond the obvious.

May 8, 2018

The peoples of the Maghreb have some Pleistocene roots

Filed under: Human Population Genetics,North Africa,Population genetics — Razib Khan @ 11:58 pm
Moroccan Berber man

The Maghreb is an important and interesting place. In the history of Western civilization, the tension between Carthage, the ancient port city based out of modern-day Tunisia, and Rome, is one of the more dramatic and tragic rivalries that has resonances down through the ages. Read Adrian Goldsworthy’s chapter on the Battle of Cannae in The Punic Wars for what I’m alluding to (and of course there were Cato the Elder’s dramatic remonstrations).

Later Roman Africa, which really encompassed northern Morocco, coastal Algeria, and Tunisia and Tripolitania, became a major social and economic pillar of the Imperium. Not only did men such as the emperor Septimius Severus and St. Augustine have roots in the region, but these provinces were a major economic bulwark for the Western Empire in its last century. The wealthy Senators of the 4th and 5th centuries were often absentee landlords of vast estates in North Africa. The fall of these provinces to the Vandals and Alans in the 430s began the transformation of the Western Empire based in Rome into a more regional player, rather than a true hegemon (perhaps an analogy here can be made to the loss of Anatolia by the Byzantines in the 11th century).

Another important aspect of North Africa is that it is the westernmost extension of the region possibly settled by Near Eastern farmers in Africa. The native Afro-Asiatic Berber languages seem to have been dominant in the region despite the influence and prestige of Punic and Latin in the cities when Muslim Arabs conquered the region in the late 7th century. The genetic-demographic characteristics of the region are relevant to attempts to understand the origins of the Afro-Asiatic languages more generally since Berber is part of the clade with the Semitic languages.

A preprint and a paper utilizing ancient DNA have shed a great deal of light on these questions recently. The paper is in Science, Pleistocene North African genomes link Near Eastern and sub-Saharan African human populations. The preprint is Ancient genomes from North Africa evidence prehistoric migrations to the Maghreb from both the Levant and Europe. They are in broad agreement, though they cover somewhat different periods.

The figure below is the big finding of the Science paper:

They retrieved some genotypes from a site in northern Morocco, Taforalt, which dates to ~15,000 years before the present. This is a Pleistocene site, before the rise of agriculture. The Taforalt individuals are about 65% Eurasian in affinity, and 35% Sub-Saharan African. This confirms that the Eurasian back-migration to northern Africa predates the Holocene, just as many archaeologists and geneticists have reported earlier.

The samples from the preprint date to a later time. The IAM samples date to ~7,200 years before the present, and the KEB samples to ~5,000 years before the present. It seems pretty clear that the IAM samples in the preprint exhibit continuity with the Taforalt samples. Though it is not much emphasized in the preprint, the lower K’s seem to strongly suggest that the IAM samples have Sub-Saharan African ancestry, just like the Taforalt samples, which are nearly 8,000 years older. In the KEB samples, the fraction drops, probably diluted in part by ancestry related to what we elsewhere term “Early European Farmer” (EEF), related to the Anatolian farming expansion.

Both the Taforalt and IAM samples, in particular, seem to exhibit strong affinities to Natufian/Levantine peoples. Additionally, many of these samples carry Y chromosome haplogroup E1b, just like some of the Natufians. These results indicate that the Natufian and North African populations were exchanging genes, or were part of one cline, rather deep in the Pleistocene.

Though various methods have suggested that there is a lot of recent Sub-Saharan African admixture, dating to the Arab period, in North Africa, these results suggest that much of it is far older. The Mozabites, as an isolated Berber group, reflect this tendency. Though some individuals have inflated African ancestry due to recent admixture, much of it is older and more evenly distributed. And yet the Mozabites seem to have less Sub-Saharan African ancestry on average than the IAM sample.

There aren’t enough data points to make a strong inference about the temporal transect, but these few results imply a decline in Sub-Saharan ancestral component after the Pleistocene with further farming migration, and then a rise again with the trans-Saharan slave trade during the Muslim period. Another issue, highlighted in the preprint, is likely heterogeneity within the Maghreb in ancestry (lowland populations in modern North Africa tend to have more Sub-Saharan ancestry due to where slaves were settled).

In the Science paper the authors make an attempt to infer the origin of the Sub-Saharan contribution to the Taforalt individuals. The result is that there is no modern or ancient proxy that totally fits the bill. These individuals have affinities to many Sub-Saharan African populations. The Sub-Saharan component is likely heterogeneous, but attempts to model European genetic variation during the Ice Age also ran into the trouble that divergence from modern populations was quite great. Until we get more ancient DNA there probably won’t be too much more clarity.

On the issue of the Eurasian ancestry, it’s clearly quite like the Natufians. But curiously the authors find that the Neanderthal ancestry in these samples is greater than that found in early Holocene Iran samples. From this, the authors conclude that they may have had a lower fraction of “Basal Eurasian” (BEu) than those populations further to the east. But already 15,000 years ago BEu populations were mixed with more generic West Eurasians to generate the back-migration to Africa. If BEu diverged from other Eurasians >50,000 years ago, then it may have merged back into the “Out-of-Africa” populations around or before the Last Glacial Maximum, ~20,000 years ago.

Finally, the authors looked at some pigmentation genes. Curiously the Taforalt and IAM individuals did not carry the derived variants for pigmentation found in many West and South Eurasians, but the KEB did. This confirms results from Europe, and population genomic inference in modern samples, that selection for derived pigmentation variants is relatively recent in the Holocene.

I do want to add that one possibility about the Sub-Saharan ancestry in the Taforalt, and probably all modern North Africans to a lesser extent, is that it is ancient and local. We now know proto-modern humans were present in the region >300,000 years ago. Northwest Africa may have been part of the multi-regional metapopulation of H. sapiens, as opposed to the Eurasian biogeographic zone in which it is often placed, before a post-LGM back migration of Eurasians.

April 26, 2018

The Ancient Neanderthal Mariner

Filed under: Human Evolution,Human Population Genetics — Razib Khan @ 10:35 pm

More recent stuff on Neanderthals of interest, Neandertals, Stone Age people may have voyaged the Mediterranean:

A decade ago, when excavators claimed to have found stone tools on the Greek island of Crete dating back at least 130,000 years, other archaeologists were stunned—and skeptical. But since then, at that site and others, researchers have quietly built up a convincing case for Stone Age seafarers—and for the even more remarkable possibility that they were Neandertals, the extinct cousins of modern humans.

But a growing inventory of stone tools and the occasional bone scattered across Eurasia tells a radically different story. (Wooden boats and paddles don’t typically survive the ages.) Early members of the human family such as Homo erectus are now known to have crossed several kilometers of deep water more than a million years ago in Indonesia, to islands such as Flores and Sulawesi. Modern humans braved treacherous waters to reach Australia by 65,000 years ago. But in both cases, some archaeologists say early seafarers might have embarked by accident, perhaps swept out to sea by tsunamis.

The effective population size of Australian people is just too large for me to imagine that it was only a few individuals swept out on driftwood. There was some sort of sea-going craft which mediated migration to Sahul from Sundaland. Just because we have only recent evidence of sea-going craft doesn’t mean that they weren’t around for tens of thousands of years before that.

I’ve been hearing about Neanderthal tools on islands like Crete, which were never connected with the European mainland, for a while now. It seems that people are finally convinced that this is the real deal, as the stratigraphy came together to confirm dates. One thing that seems obvious from this, as well as Neanderthal “art”, is that the differences between modern humans and Neanderthals were more quantitative than qualitative. Differences of degree, not of kind.

It is hard to deny that modern human expansion between 60 and 15 thousand years ago is sui generis. Hominins didn’t make it to the New World or Sahul, what later became Oceania, until our own kind. There’s also a fair amount of evidence that our lineage pushed the northern frontier of human habitation beyond what Neanderthals ever did. But in the process of marking off our distinctiveness, it seems to me that we’ve overemphasized the differences between us and Neanderthals, and dismissed or ignored evidence of “human-like” “advanced” behaviors from them.

I’ll still go with the prediction that we’ll never find a singular gene which marks us off from other human lineages.

April 21, 2018

There were possibly late archaic introgression events in Eurasia

Filed under: Human Population Genetics — Razib Khan @ 12:14 am

A few weeks ago I posted on the strong likelihood that there were at least two Denisovan admixture events in Eurasia into modern humans. That’s probably the floor, not the ceiling. We have an Altai Denisovan genome, but the proportion is so low in most of South and Southeast Asia I don’t think we have a good grasp of how that component differs from the Oceanian fraction, which is much higher.

At the AAPA meeting last week I noticed something strange in one of the presentations: introgressed Denisovan variants which were present among East Asian populations, but lacking elsewhere. The fractions were not >50%, but they were >10%. The Denisovan variants were nearly absent outside of this core zone of East Asians.

There are two possible reasons for this distribution. One reason is that Denisovan variants were segregating in East Asians for thousands of years, and a common bottleneck, or, more likely, selection, drove them up in frequency. Another, not exclusive, explanation is that admixture occurred in East Asia relatively late. The Denisovan signature is totally absent in the New World. Either that’s selection or drift eliminating variation, or it’s the fact that this admixture event happened in East Asia less than about 30,000 years ago, when Native American populations’ East Asian-like source population began to diverge from that of East Asians.

One thing that we know from paleontology is that species exist before the remains we find, and persist after the remains we find. It’s quite possible that small relic populations of Denisovans persisted for thousands of years after modern humans came to dominate the East Asian landscape.

April 20, 2018

So merfolk are a real thing now: adaptation to diving

When Rasmus Nielsen presented preliminary work on diving adaptations a few years ago at ASHG I really didn’t know what to think. To be honest it seemed kind of crazy. Everyone was freaking out over it…and I guess I should have. But it just seemed so strange I couldn’t process it. High altitude adaptations, I understood. But underwater adaptations?

The paper is out now, and open access, Physiological and Genetic Adaptations to Diving in Sea Nomads. There are a lot of moving parts in it, so I really recommend Carl Zimmer’s piece, Bodies Remodeled for a Life at Sea:

On Thursday in the journal Cell, a team of researchers reported a new kind of adaptation — not to air or to food, but to the ocean. A group of sea-dwelling people in Southeast Asia have evolved into better divers.

When Dr. Ilardo compared scans from the two villages, she found a stark difference. The Bajau had spleens about 50 percent bigger on average than those of the Saluan.

Only some Bajau are full-time divers. Others, such as teachers and shopkeepers, have never dived. But they, too, had large spleens, Dr. Ilardo found. It was likely the Bajau are born that way, thanks to their genes.

A number of genetic variants have become unusually common in the Bajau, she found. The only plausible way for this to happen is natural selection: the Bajau with those variants had more descendants than those who lacked them.

As some of you might know “sea nomads” are common across much of Southeast Asia. The Bajau are just one major group. The anthropology here is not surprising…but the biology most definitely is. For various technical reasons, the authors didn’t have extremely fine-grained genome data (high coverage sequence data, or very high-density chips). So they didn’t do some haplotype-based tests (e.g., iHS), though that might not matter anyhow (see below why). But, looking at the genome-wide relatedness and comparing that to markers which deviated from that expectation, both of which they could do robustly, the authors narrowed in on candidates for targets of selection. From the paper: “Remarkably, the top hit of our selection scan (Table 1) is SNP rs7158863, located just upstream of BDKRB2, the only gene thus far suggested to be associated with the diving response in humans.”

There are many cases where researchers find selection signals in an ORF of unknown function. In this case, the top hit happens to be exactly in line with the biological characteristic you’re already curious about. The alignment is so good it’s hard to believe.

But wait, there’s more! Spleen size variation is not due to variation on just one locus. It’s polygenic, albeit probably dominated by larger effect quantitative trait loci (QTLs) than something like height (so more like skin color). They compared the Bajau to a nearby population, the Saluan, as well as Han Chinese as an outgroup. On the whole the distribution of allele frequency differences should reflect the phylogeny (Han(Bajau, Saluan)). The key is to look for cases where the Bajau are the outgroup. From the paper:

While some of the selection signals uniquely present in the Bajau may be related to other environmental factors, such as the pathogens, several of the other top hits also fall in candidate genes associated with traits of possible importance for diving. Examples include FAM178B, which encodes a protein that forms a stable complex with carbonic anhydrase, the primary enzyme responsible for maintaining carbon dioxide/bicarbonate balance, thereby helping maintain the pH of the blood….

FAM178B shows up again later:

We identified one region overlapping chr2:97627143, which falls in the gene FAM178B, that falls in the 99% quantile of the genome-wide distribution for the fD statistic (Martin et al., 2015). Of the populations considered, this region exclusively stands out in the Bajau, and the signal appears strongest when using Denisova as source. Notably, this region was also proposed as a candidate for Denisovan introgression in Oceanic populations by….

What they’re saying here is that the allele at this locus adapted to diving may have come originally from the Denisovans! Remember, we already know that one of the Tibetan high altitude adaptations comes from the Denisovans. So this isn’t surprising, but it is pretty cool. But most of the other hits don’t seem to be introgressed. That is, they come from modern humans (or have been segregating in our species for a long, long, time).
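Returning to the selection scan itself: the logic of looking for loci where the Bajau sit on a long branch of their own, against the expected (Han(Bajau, Saluan)) tree, is commonly captured by the population branch statistic (PBS). The sketch below is illustrative only. The simple FST estimator, the function names, and the allele frequencies are all my assumptions, not values or code from the paper.

```python
import math

def fst(p1, p2):
    """Simple per-SNP FST from two allele frequencies: between-population
    variance over total variance. Ignores sample-size corrections."""
    p_bar = (p1 + p2) / 2.0
    total = p_bar * (1.0 - p_bar)
    if total == 0.0:
        return 0.0
    between = ((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2) / 2.0
    return between / total

def branch_time(f):
    """Convert FST to an additive divergence measure, T = -log(1 - FST)."""
    return -math.log(1.0 - min(f, 0.999999))

def pbs(p_focal, p_sister, p_outgroup):
    """Length of the branch leading to the focal population at one SNP."""
    t_fs = branch_time(fst(p_focal, p_sister))
    t_fo = branch_time(fst(p_focal, p_outgroup))
    t_so = branch_time(fst(p_sister, p_outgroup))
    return (t_fs + t_fo - t_so) / 2.0

# Hypothetical allele frequencies in (Bajau, Saluan, Han), for illustration only.
pbs_neutral = pbs(0.30, 0.32, 0.45)   # tracks the expected phylogeny
pbs_selected = pbs(0.80, 0.30, 0.35)  # Bajau alone have shifted

print(f"neutral-looking SNP PBS: {pbs_neutral:.4f}")
print(f"candidate SNP PBS:       {pbs_selected:.4f}")
```

A SNP that follows the genome-wide tree gives a Bajau branch near zero, while one where only the Bajau have moved gives a long branch. Ranking SNPs by that branch length is one way to surface candidates.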

Many of the alleles found at high frequencies in the Bajau are found in other populations, just at very low frequencies. This implies that selection is operating on standing variation. Another suggestion that this is so is that the widths of the regions of the genome impacted by selection seem rather narrow. In contrast, the Eurasian adaptation to lactose digestion is from a de novo mutation, something that wasn’t at high frequency at all in the ancestral human populations. The sweep is strong and powerful around that single mutation, and huge swaths of the genome around it “hitchhiked” along so that on a population-wide level the area around the mutational target was homogenized (basically, a lot of one single original mutant human is found around that causal variant for lactase persistence).

Anyone who has learned basic quantitative genetics knows that one way to change a mean trait value is just to change the allele frequencies at a lot of different loci…over time you’ll have a lot of low-frequency alleles present in an individual which would otherwise never have occurred. Eventually, you can have a median value which is outside of the range of the original distribution. The mechanism here in a dynamic sense seems totally comprehensible, though as Carl Zimmer notes, and the rather short shrift given in the Cell paper suggests, they’re not sure in a proximate sense how the selection is working (i.e., obviously there is a fitness implication but how does it manifest? Do people die? Are they unable to support a family?).
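The quantitative genetics point can be shown with a toy additive model. This is a sketch under strong assumptions: equal effect sizes, independent loci, Hardy-Weinberg proportions, and a locus count and allele frequencies I picked purely for illustration.

```python
import math

def trait_mean_and_sd(freqs):
    """Additive trait = number of trait-increasing alleles carried (diploid).
    Assumes Hardy-Weinberg proportions and independent loci."""
    mean = sum(2.0 * p for p in freqs)
    var = sum(2.0 * p * (1.0 - p) for p in freqs)
    return mean, math.sqrt(var)

n_loci = 100               # hypothetical number of loci
before = [0.10] * n_loci   # trait-increasing alleles start out rare
after = [0.30] * n_loci    # selection nudges each frequency upward

mean_before, sd_before = trait_mean_and_sd(before)
mean_after, sd_after = trait_mean_and_sd(after)

# How far the new mean sits from the old one, in units of the old spread.
shift_in_sds = (mean_after - mean_before) / sd_before

print(f"before: mean {mean_before:.1f}, sd {sd_before:.2f}")
print(f"after:  mean {mean_after:.1f}, sd {sd_after:.2f}")
print(f"shift:  {shift_in_sds:.1f} original standard deviations")
```

No single locus goes anywhere near fixation here, yet the new mean lands many standard deviations beyond the original distribution, which is why modest shifts on standing variation can produce a phenotype that effectively did not exist before.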

One key issue is to consider the demographic history of these people. The authors tried to model it genetically:

We found a model compatible with the data that has a divergence time of ∼16 kya, with subsequent high migration from Bajau to Saluan and low migration from Saluan to Bajau (for details see STAR Methods). We note that the estimate of 16 kya may reflect the divergence of old admixture components shared in different proportions by the Saluan and the Bajau, similarly to, for example, European populations being closely related to each other but differing in the proportion of ancient admixture components….

The authors cite papers which outline the real story about what happened, so they know that the model is somewhat unrealistic. For example, Ancient genomes document multiple waves of migration in Southeast Asian prehistory:

Southeast Asia is home to rich human genetic and linguistic diversity, but the details of past population movements in the region are not well known. Here, we report genome-wide ancient DNA data from thirteen Southeast Asian individuals spanning from the Neolithic period through the Iron Age (4100-1700 years ago). Early agriculturalists from Man Bac in Vietnam possessed a mixture of East Asian (southern Chinese farmer) and deeply diverged eastern Eurasian (hunter-gatherer) ancestry characteristic of Austroasiatic speakers, with similar ancestry as far south as Indonesia providing evidence for an expansive initial spread of Austroasiatic languages. In a striking parallel with Europe, later sites from across the region show closer connections to present-day majority groups, reflecting a second major influx of migrants by the time of the Bronze Age.

The upshot is that the predominant genetic character of Southeast Asia dates to the Neolithic, and to a great extent even more recently. The deep divergence between two Austronesian groups may be an artifact of drift in one group (probably the Bajau), or different proportions of admixture from the primary ancestral components in maritime Southeast Asia: Austronesian, Austro-Asiatic, and indigenous hunter-gatherer. As per Lipson 2014 the Bajau are probably mostly Austronesian but may have Negrito ancestry from the Philippines, as well as indigenous hunter-gatherer ancestry more closely related to Malaysian Negritos. There probably isn’t so much Austro-Asiatic in Sulawesi, but I’d bet the farmers have more of that.

Ultimately the question here is are the adaptations to diving old or new? Anthropologists and historians have all sorts of theories, as reported in the Carl Zimmer article and hinted at in the paper. My own bet is that they are both old and new. By this, I mean that some sort of maritime lifestyle was surely practiced by indigenous people between the end of the last Ice Age and the arrival of farmers. But if the variation was present in humans more generally, the Austronesians would probably also have the capacity for the diving adaptations. Mixing with hunter-gatherers and another bout of selection could have done the trick in concert. So the adaptations and lifestyle are old, but the Bajau people may date to the last 2,000 years, and selection within this population may be that recent.

A lot of the answer might be found in looking at the other sea nomad groups….

March 31, 2018

The maturation of the South Asian genetic landscape

Filed under: Dravidian,Human Population Genetics,India,Indo-Aryan — Razib Khan @ 9:39 pm


The above is a stylized map from the preprint, The Genomic Formation of South and Central Asia. In broad strokes, it says some things that are very expected, and some things that are not so expected.

The abstract is long, but I’ll reproduce it in full:

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gathers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan. 
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia — consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC — and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identifies the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.

First Turk Empire

Though the abstract is focused on South Asia, the preprint actually has quite a bit about Inner Asia, because of the provenance of the samples. We often view the typical person in the past as a peasant in an agricultural society, and therefore relatively immobile over their lifetime. The story we like to tell ourselves is that non-elites in premodern societies, on the whole, had narrow horizons, delimited by their home village, or the neighboring network of villages.

But results from this work and others show that mobile populations, in which individuals ranged across vast areas of Eurasia over their lifetimes, were not that uncommon among pastoralists. We know this historically, as empires such as those of the Turks and Mongols were defined by a ruling elite whose writ extended from eastern to western Eurasia. The Sintashta samples, which exhibit genetic heterogeneity, with some individuals very different from the norm in their settlement, are exactly what you’d expect from a social and political culture which was united in some fashion over huge distances.

As the sample sizes for ancient DNA have increased it seems rather clear that the demographic dynamics that we see in later historical expansions of Inner Asian polities extend back to the Bronze Age. With expanding populations across the ecologically friendly landscape, the ancient proto-Indo-Europeans seem to have mixed with the local substrate wherever they went, just as Turks did later. As they moved west, they mixed with late Neolithic Europeans, as they went east, they mixed with Siberian populations, and as they conquered south they mixed with descendants of West Asian farmers.

One of the primary aspects that I think one needs to keep in mind is that one can’t just imagine that this was defined by simple diffusion dynamics. Historically the boundary between pastoralists and peasants could be fluid, but when political resistance collapses pastoralists have been able to use their military prowess to swarm across the lands of agriculturalists. In other words, centuries of gradual inter-demic gene flow might be interrupted by a rapid “pulse” admixture. There’s no reason that pre-literate polities couldn’t exist. The Inca were one such example, and the homogeneity of the Uruk civilization in the 4th millennium BC is strongly suggestive of an imperial hegemony or paramountcy.

Another dynamic is that pastoralists are highly mobile, and so may leapfrog over territory which is unsuitable. Or, they may move so rapidly that there isn’t much mixing with populations in between point A and point B.

This is apparently the case with the Bactria–Margiana Archaeological Complex. These people were mostly descended from people related to the eastern farmers of West Asia, those in modern day Iran. Some of their ancestry had affinities with Anatolian farmers, and there is some evidence even of Siberian admixture in this region. But there are three important takehomes of this preprint in relation to this area: 1) the BMAC did not contribute much genetically to South Asia at all, 2) steppe ancestry, related to that of the Yamna culture of the Pontic region, only shows up in BMAC ~2000 BC, and 3) there is actually evidence of South Asian (Indus valley?) migration into the BMAC.

The fact that Yamna-like ancestry shows up in the BMAC region so late is a strong reason to suspect that Indo-Iranian peoples did not move to Iran and India until after 2000 BC. In earlier comments on this issue, I was rather vague about timing, because the Corded-Ware people show up in Europe before 2500 BC, and I was going along with the parsimonious idea that this was part of one single cultural and social revolution.

I was wrong. Going back to the Turkic analogy, there were multiple waves of migration and folk wandering by Turkic pastoralists. By different Turkic groups. One of the major ones occurred due to the rise of the Mongols, and the Mongols were not even Turks. The same seems to be true of Inner Eurasian Indo-European groups.

Moving on to South Asia, there are two primary constructs which come out of this preprint: “Indus Periphery” and “Ancient Ancestral South Indians.” I’ll call the former InPe and the latter is termed AASI. To some extent these complement and replace the earlier terms “Ancestral North Indian” and “Ancestral South Indian” (ANI and ASI). The AASI are the ancient hunter-gatherers of the Indian subcontinent. The authors suggested that divergence of this group from other eastern Eurasians occurred very early, and that the division between the ancestors of the Papuans, Onge, and AASI was even polytomic (that is, the lineages separated very quickly without discernible structure).

The InPe samples are from eastern Iran and the BMAC. They’re unique in having AASI ancestry, at variable fractions (indicating contemporaneous admixture). They also resemble samples from Swat Valley which date to 1200 BC and later, with one major difference: the Swat Valley samples have steppe ancestry.

There are no samples from the Indus Valley proper, so the authors suggest that the InPe are reasonable proxies. Additionally, they assert that ASI can best be modeled as a mixture between InPe and AASI. In other words, there were two admixture events. Their Palliyar samples are actually pretty good proxies for the resultant ASI, while the Kalash of Pakistan are good proxies for the ANI, who are presumably now modeled as a mixture of steppe populations with the InPe.

This resolves the enigmatic result that Priya Moorjani reported to me last year: less than 4,000 years ago “pure” ANI and ASI people existed. She was presumably going off admixture timing estimates. These results suggest that in some form ANI and ASI still exist, and the first admixture occurred with the creation of InPe.
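The nested mixture model described above (InPe as Iranian agriculturalist plus AASI, ASI as InPe plus AASI, ANI as steppe plus InPe) is easy to lose track of, so here is a minimal sketch of how the layers compose. Every proportion below is a hypothetical placeholder chosen for illustration, not an estimate from the preprint, and the helper name is my own.

```python
def mix(pop_a, pop_b, frac_a):
    """Blend two ancestry profiles (dicts of source -> fraction),
    taking frac_a from pop_a and the remainder from pop_b."""
    sources = set(pop_a) | set(pop_b)
    return {
        s: frac_a * pop_a.get(s, 0.0) + (1.0 - frac_a) * pop_b.get(s, 0.0)
        for s in sources
    }

# Unadmixed source populations.
AASI = {"AASI": 1.0}
IRAN = {"Iranian farmer": 1.0}
STEPPE = {"Steppe": 1.0}

# Hypothetical mixing proportions, for illustration only.
inpe = mix(IRAN, AASI, 0.70)      # first admixture: Indus Periphery
asi = mix(inpe, AASI, 0.35)       # second admixture: InPe into AASI
ani = mix(STEPPE, inpe, 0.40)     # steppe groups mixing with InPe
modern = mix(ani, asi, 0.50)      # a present-day group on the ANI-ASI cline

for name, profile in [("InPe", inpe), ("ASI", asi), ("ANI", ani), ("modern", modern)]:
    parts = ", ".join(f"{src} {frac:.1%}" for src, frac in sorted(profile.items()))
    print(f"{name}: {parts}")
```

The useful thing to notice is that under this model both ANI and ASI carry Iranian farmer-related and AASI ancestry through InPe, so the old two-way ANI/ASI framing was really summarizing three deeper sources.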

Using a new method the authors contend that InPe emerged 4700-3000 BC. If this is true then the Indus Valley Civilization (IVC) was a compound of AASI and Iranian agriculturalists (sampled from the eastern end of the cline of admixture with Anatolians, that is, they had none of that ancestry). These dates also post-date the first arrival of agriculture to Mehrgarh by 2,000 years at the least. I suspect that it will turn out there were earlier admixtures, which are not being detected. For various ecological reasons the West Asian cultural complex was portable only to the northwest fringe of South Asia, and there it persisted for ~4,000 years. This served as a natural eastern limit for cultures which were migrating out of the West Asian zone, and a point where AASI hunter-gatherers constantly mixed into the local population.

As the IVC sites begin to get sampled in the future I predict that instead of a homogeneous transect of admixture over time and space we’ll see a lot of heterogeneity.

In the Swat samples, the authors see two correlated trends, an increase in steppe ancestry, and an increase in AASI ancestry. No doubt this dates to the “great admixture” which occurred between 2000 BC, and some time before 1000 AD (the Bengali admixture with East Asians dates to between 0 and 1000 AD, as does that of Brahmins who left the North Indian plain and mixed with local populations elsewhere).

Finally, the authors detect a skew toward steppe ancestry among some populations, in particular, Brahmins. The skew is in relation to Iranian farmer ancestry, the two being the primary constituents of ANI ancestry. In Who We Are and How We Got Here David Reich says some of the ANI admixture is much more recent than the rest, judging by tract length. And also going by the BMAC and Swat samples it seems that the time period for when Indo-Aryans arrived in South Asia has to be in the interval between 2000 BC and 1200 BC.

There’s another aspect of the preprint which allows for dating. The arrival of Austro-Asiatic people in South Asia probably has to postdate the expansion of the same group in Vietnam about 4,000 years ago (though not necessarily obviously). But the Munda Austro-Asiatic people of northeast India exhibit curious genetic patterns. They clearly have East Asian ancestry related to other Austro-Asiatic populations in Southeast Asia, but they have a lot less “West Eurasian” in their ANI/ASI mix. The authors resolve this by suggesting that the Munda arrived in South Asia when there was still heterogeneity among the ASI, and unadmixed AASI.

After 2000 BC the IVC went into decline. Various groups of Indo-Aryans were expanding and admixing. From the other end of the subcontinent arrived rice cultivators from Southeast Asia. At some point, they ran into an ASI population that had some Iranian admixture, but not as much as typical. All of this probably occurred in the period between 2000 BC and 1000 BC. I know that some researchers have argued that the Gangetic plain was inhabited by Munda speaking peoples before it was inhabited by Indo-Aryans. The main issue I’ve had with this is that modern Munda peoples are very genetically distinctive, and there’s no evidence of East Asian ancestry in most populations of the Gangetic plain (the main exceptions are those which have experienced Tibetan influence/contact).

So here is my interpretation of the genetic and historical evidence:

1) IVC emerges out of a matrix that was a synthesis of West Asian farmers and indigenous hunter-gatherers. I would not be surprised if later genetic work recapitulates the findings in Europe of an initial period of separation, and then a “resurgence” of indigenous ancestry as the barriers between the two groups break.

2) The period between 2000 BC and 1000 BC is the beginning of the transformation of the South Asian genetic and ethnolinguistic landscape, with the intrusion of two different groups from different directions, Indo-Aryans from the west and Austro-Asiatics from the east. Austro-Asiatic rice culture was superior to western wheat culture because rice is more delicious than wheat, but the Indo-Aryans ultimately established cultural supremacy across South Asia by the Iron Age.

3) The situation in South India is more complicated and confused. The admixture of groups like Pulliyar from InPe and AASI into the classic ASI configuration seems to be more recent than 2000 BC (their low bound dates go as late as 400 BC). The admixture may have occurred in various places, not just in South India. The evidence from this paper suggests that the Andronovo/Sintashta cultural zone was characterized by some genetic heterogeneity due to variation in admixture with neighboring peoples, and the same could be said for the IVC then. I would not be surprised if northern IVC locations had more AASI than southern IVC, as the latter were more insulated from the east due to the Thar desert (the results are consistent with earlier work that suggests modern populations in the lower Indus basin have less Indo-Aryan and more Iranian, with less AASI).

4) We need to be careful about assuming that everything here is a linear combination of distinct and separable atomic units of cultural integrity and wholeness. What I mean is that though Brahmins and some other North Indian groups are enriched for steppe ancestry, it is not only their purview. Rather, it may be that these upper caste groups simply mixed less with the other populations with Iranian and AASI ancestry. The statistics in this paper do not detect enrichment of steppe ancestry in South Indian Brahmins. I believe this is simply an artifact of the reality that South Indian Brahmins mixed with Iranian-enriched elites, like Reddys, when they emigrated to the south.

Though the model outlined in the preprint is much more complicated than a simple ANI/ASI mix, it still simplifies the demographic histories of many populations. For example, my own survey of the data suggests that Brahmins who left the Indo-Gangetic plain mixed with local elites wherever they went (Bengali Brahmins have East Asian ancestry, just as South Indian Brahmins have more Iranian-like ancestry).

5) Language is important but is not determinative. R1a1a-Z93 arrived in South Asia relatively late with groups from the steppe. Its frequency is highest in the northwest, and among upper castes. That is, it is correlated in a coarse manner to steppe ancestry. But R1a1a-Z93 is pervasive throughout South Asia irrespective of caste and region. Even in Dravidian speaking southern populations, some groups have quite a bit of R1a1a-Z93.

The analogy that presents itself here is Southern Europe, where some groups with high frequencies of R1b, such as the Basques and Sardinians, are clearly descended in the main from pre-steppe populations. What this suggests is that a broad socio-cultural prestige network mediated by males extended itself into regions where its cultural hegemony was not assured. Additionally, the autosomal genetic impact was modest, even if privileges given to particular male lineages allowed them to sweep other groups out of the gene pool.

Tamil history precipitates out only a little later than that of North Indian Indo-Aryan civilization. I suspect that this is not a coincidence, and that South Asia after the collapse of the IVC and the arrival of the Indo-Aryans and Mundas could be thought of as a broad mixing cauldron genetically and culturally. In many regions, Dravidian languages persisted in the face of the expansive Indo-Aryan, but there was a cultural influence, likely reciprocal. This is why once Indian civilization reemerged its coherent unity set against peoples to the west and east was not strange despite the linguistic gap between the north and the south.

The only exception here might be the Munda. As I have said, R1a1a-Z93 is pervasive. But it is nearly unfound among the Munda, who tend to carry relatively exotic Southeast Asian Y lineages such as O. I believe that the Munda were in some way losers in a cultural conflict, but they maintained themselves in the hills above the Gangetic plain.

Finally, two reflections, one navel-gazing, one big picture. Genome bloggers in the years around 2010 actually anticipated many of these results. There’s some hindsight bias here because you remember the times you are right and not the times you were wrong. We were right that there was more than one ANI pulse. Additionally, we were looking at the ratio between “Eastern European” and “West Asian” ancestry years ago and noticing the skewed patterns, with North Indian Brahmins biased toward the former and South Indian elite non-Brahmins skewed toward the latter. Chaubey 2010 suggested to us that something was different about the Munda not only in their East Asian ancestry but in their ANI/ASI ancestry. They just didn’t seem to have any Indo-European ancestry (steppe), and a lot of ASI. Over the past few years I’ve been suggesting that Dravidian languages were not primal to South India, but the product of a recent expansion (though part of this is due to scientific publications).

The truth was out there. It just took ancient DNA and the analytic chops of the Reich group and their collaborators to prune the tree of possibilities so that we could zero in on a few precise and likely models.

In general, I wonder about the role of clines, diffusions, and pulses. The models that the foremost practitioners of the science of ancient DNA utilize tend to assume pulse admixtures, rather than isolation-by-distance gene flow. This isn’t always a crazy assumption. But there was a discussion in the paper of a west-east admixture cline between Anatolian farmers and Iranian farmers. Is this cline due to admixture, or was it always there? A paper from a few years ago implied that early farmers were highly structured, structure that broke down later.

Also, the polytomy at the base of the eastern Eurasian human family tree, where all the major lineages diverge rapidly from each other, makes me wonder about gene flow vs. admixture. It seems possible that the polytomy may mask a phylogenetic tree topology which had gradually bifurcating nodes, if periodically a single daughter population replaced all its sister lineages in a local geographic zone. Much of history in human meta-populations may be characterized by isolation-by-distance and gene flow, erased by the extinction of most lineages and expansion of a favored lineage.

March 29, 2018

We’re descended from Lilith and Eve

Filed under: Human Population Genetics — Razib Khan @ 5:13 pm

From the comments:

Something that confused me very early on in the book- the San are shown branching off from the rest of humanity prior to Mitochondrial Eve. How can Eve be a common ancestor in this case? Admixture?

The commenter is talking about an early portion of Who We Are and How We Got Here. Someone who reads a book like that is “in the know,” and this is a reasonable question. But it points to a bigger issue that’s going to crop up with the complexification of the origin of anatomically modern humanity over the last few years, and proceeding forward.

An upside of the very-recent-out-of-Africa model, where all modern humans descended exclusively from a group of East Africans who lived ~50,000 years ago, is that it was very simple. So simple that you could write the model out on a postcard.

The new model benefits from being correct and making humans less sui generis (though perhaps that is a bug rather than a feature to some?), but it also forces more thought and complexity on the lay audience.

Calibration of the coalescence of the last common ancestor of all human mitochondrial DNA lineages has changed several times; the latest estimates put the time to the last common ancestor of all mtDNA lineages at around 100 to 200 thousand years ago. This is curious in light of the fact that both fossils and genomics are starting to suggest that anatomically modern humans emerged in their current form 200 to 400 thousand years ago.

The shallower coalescence isn’t that surprising. Y and mtDNA both have lower effective population sizes and so higher turnover rates. These high turnover rates mean the extinction of other lineages. As most of you know, the extinction of these mtDNA lineages does not mean that the genetic material of other women alive at the same time as “mtDNA Eve” is not present in modern humans (though who knows what it means to say there’s distinctive genetic material left after all these generations with recombination). Eve was always simply a personification of the coalescence of the mtDNA genealogy. Both the Y and mtDNA phylogenies and coalescence were useful in their time. They pointed to the likely important role of Africa in the origin of modern humans, and the relatively recent time depth of our species. But their coalescence at a specific time was somewhat random around a certain expected value. This is why it was not surprising at all that “Y chromosomal Adam” and “mtDNA Eve” lived at different times (there is some evidence that the Y chromosome has had a lower long-term effective population size).
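To make the effective population size point concrete, here is a minimal sketch of the textbook neutral coalescent expectation. It assumes a constant-size, randomly mating population with equal numbers of breeding males and females, which is certainly not literally true of human history. The function name and the figure of 10,000 breeders are hypothetical choices for illustration, not values from any paper.

```python
def expected_tmrca_generations(gene_copies: float, sample_size: int) -> float:
    """Expected generations back to the most recent common ancestor of
    `sample_size` sampled lineages, under the standard neutral coalescent
    with `gene_copies` copies of the locus in the breeding population."""
    return 2.0 * gene_copies * (1.0 - 1.0 / sample_size)


BREEDERS = 10_000  # hypothetical number of breeding adults, half of them female
SAMPLE = 100       # hypothetical number of sampled lineages

# Copies of each kind of locus carried by the breeding population
# (idealized: equal sex ratio, no variance in reproductive success).
GENE_COPIES = {
    "autosomal": 2.0 * BREEDERS,  # two copies per individual
    "mtDNA": 0.5 * BREEDERS,      # one copy per female, passed on by females only
    "Y": 0.5 * BREEDERS,          # one copy per male
}

expected = {
    locus: expected_tmrca_generations(copies, SAMPLE)
    for locus, copies in GENE_COPIES.items()
}

for locus, generations in expected.items():
    print(f"{locus}: ~{generations:,.0f} generations to the common ancestor")
```

Under these idealized assumptions the mtDNA and Y genealogies are expected to coalesce in about a quarter of the time of a typical autosomal locus. The realized time at any single locus scatters widely around that expectation, which is the sense in which the particular dates for "Adam" and "Eve" are somewhat random.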

The above question is inspired by the fact that San Bushmen seem to diverge earlier in their total genome than in their mtDNA. There’s always been a distinction in the literature between demographic divergence between two populations, and the divergence of their genetic genealogies. Oftentimes daughter populations share genetic variation that dates back to before their separation. But sometimes, you have this situation where it seems that the starting point of genetic variation post-dates the divergence between populations.

What’s the explanation? I think the simplest one is admixture and reciprocal gene flow, as implied by the commenter. In fact, Pontus Skoglund’s latest African ancient DNA paper implies that there was some sort of isolation-by-distance cline in the eastern part of the continent, from modern Ethiopia far to the south.

And, it may also turn out that the San Bushmen themselves are an admixture between two very different populations, one more like other eastern Africans, and one basal to this clade. If so, then it may be that their divergence estimate is a compound, and the most divergent mtDNA lineages come from the eastern African population that mixed with the more basal population.

The bigger answer is that we really need to move beyond the “mitochondrial Eve” story as being central. It had its time and played its role, but we can move beyond it. Otherwise, the public will be in for a big surprise as ancient DNA starts to uncover the story of a whole antediluvian world within Africa of anatomically modern humans that flourished for hundreds of thousands of years before a small branch left to populate the rest of the world ~50,000 years ago.

March 28, 2018

On the eons of salutary neglect

Filed under: Human Population Genetics — Razib Khan @ 12:35 am


New preprint, Something old, something borrowed: Admixture and adaptation in human evolution. This part jumped out at me:

…Indeed, for most traits, the contribution of archaic human alleles to present-day human phenotypic variation is not significantly larger than those of randomly drawn non-introgressed alleles occurring at the same frequency in modern humans. Interestingly, in both studies, neurological and behavioral phenotypes are an exception, with Neanderthal alleles contributing more to variation in these traits than frequency-matched modern human alleles.

I joked that perhaps we can talk about people “acting like a Neanderthal” again?

But seriously, I was thinking today about one particular stage of human evolutionary history, the long sojourn outside of Africa for the ancestors of non-Africans (including “Basal Eurasians”) which produced a sustained bottleneck. In David Reich’s new book he alludes to it, and I’ve seen other mentions of it (this is an old idea).

How long was the bottleneck? What was the normal census size? What were the cultural implications of having a small isolated population?

The PSMC and MSMC diagrams I’ve seen don’t really answer my questions.

March 20, 2018

Carl Zimmer profile of the Reich group at work

Filed under: Human Population Genetics — Razib Khan @ 12:34 am

The New York Times has a review up (sort of) of Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, David Reich Unearths Human History Etched in Bone. But since Carl has been covering the publications coming out of the Reich lab for many years now it’s kind of a survey of the whole operation and how David and company got where they are.

The last few paragraphs are pretty tantalizing:

As of last month, Dr. Reich’s team has published about three-quarters of all the genome-wide data from ancient human remains in the scientific literature. But the scientists are only getting started.

They also have retrieved DNA from about 3,000 more samples. And the lab refrigerators are filled with bones from 2,000 more denizens of prehistory.

Dr. Reich’s plan is to find ancient DNA from every culture known to archaeology everywhere in the world. Ultimately, he hopes to build a genetic atlas of humanity over the past 50,000 years.

“I try not to think about it all at once, because it’s so overwhelming,” he said.

Three years ago I was having a discussion with someone from Reich’s group and mentioned offhand that in terms of getting data I give the nod to Eske Willerslev’s group of researchers, though I thought the people around David and Nick Patterson tended to perform a deeper analysis. Three years is a long time, and as the results since then have shown, the “SNP capture” methodology is very cost effective. They might not get the whole genome sequences of individuals, but they get lots of individuals. And for a lot of population genomic analysis, you want lots of individuals more than the whole genome sequence.

But not all. The more ancient individuals probably have a lot of variation “private” to them and their population, so you don’t know all the neat polymorphisms you might miss.

With that gripe submitted, it’s pretty incredible that the Reich lab has 3,000 ancient samples in the pipeline for analysis. In Who We Are and How We Got Here David Reich outlines just how he and his collaborators transformed the artisanal process of data generation from ancient DNA into a rationalized and commoditized factory process.

March 11, 2018

Turks are Anatolian under the hood, somewhat more Greek than Armenian

Filed under: Armenians,Greeks,Human Population Genetics,Turks — Razib Khan @ 11:40 pm

My post, Are Turks Armenians Under The Hood?, attracted a little bit of controversy. The main criticism, which was a valid one, is that I did not sample Anatolian Greeks. A reader passed on three Anatolian Greek samples. I also added a Cypriot data set. To my mild surprise, the Anatolian Greeks and Cypriots cluster together, at the end of the Greece cline toward West Asians. Therefore, for further analysis, I pooled the three Greeks with the Cypriots.

Additionally, there are two Balkan Turk samples. Even on the PCA it’s pretty clear that they’re genetically very different from the other Turks (one of them is from what has become Bulgaria), though the shift toward East Asians indicates that Turkification is very rarely a matter purely of religious conversion to Islam and assimilation of the Turkish language (obviously it initially is for many people, but these people then intermarry with those with some East Asian ancestry).

One of the major problems is that the Armenian sample and the Anatolian Greek/Cypriot sample are genetically very close. This is obvious in the Fst distance. This is also totally reasonable since both populations occupy Anatolia, and historically there would have been a lot of gene flow between the two groups through isolation-by-distance dynamics.

The Turk position closer to East Asians is due to their East Asian admixture.

You can see it in the admixture plot too. As we all know there is definitely some northern admixture in the mainland Greeks. I haven’t bothered to check with the Mycenaean paper, but I assume that some of this is due to the migration of Slavs after much of the Balkans was abandoned after the reign of Maurice.

Of course, I ran Treemix too. Again, the closeness of the Anatolian Greeks/Cypriots and the Armenians is an issue in making a definitive conclusion.

In terms of drift the Turks seem about as far from Anatolian Greeks as Armenians. There’s the gene flow you’d expect: two gene flow edges from East Asians to Turks. I think that’s due to the East Asian source being somewhat heterogeneous, and the Dai outgroup not modeling the source populations perfectly.

Finally, there’s the f3 statistics. They basically show what I’m saying above: Armenians and Anatolian Greeks are both good model sources for Turks. The likely truth is that there is gene flow from all across Anatolia into these Turkish samples.

Group X1 X2 f3 z
Turkey anatolian_cypriot Dai -0.0029 -36.3940
Turkey Armenians Dai -0.0026 -34.2083
Turkey Greece3 Dai -0.0025 -32.7389
Turkey Georgian Dai -0.0026 -30.8836
Turkey Greece2 Dai -0.0024 -29.5462
Turkey GreekCentral Dai -0.0025 -23.6454
Turkey Greece1 Dai -0.0026 -23.0283
Turkey GreekThessaly Dai -0.0024 -20.1595
Turkey Armenians Lithuanians -0.0005 -11.5356
Turkey Lithuanians Dai -0.0012 -10.7473
Turkey Georgian Lithuanians -0.0004 -7.9691
GreekThessaly Armenians Lithuanians -0.0006 -6.6390
GreekThessaly anatolian_cypriot Lithuanians -0.0006 -6.3748
GreekThessaly Greece3 Lithuanians -0.0005 -4.6347
Greece2 anatolian_cypriot Lithuanians -0.0008 -15.7114
Greece2 Armenians Lithuanians -0.0008 -14.4083
Greece2 Greece3 Lithuanians -0.0005 -10.4508
Greece2 Georgian Lithuanians -0.0005 -8.2727
Greece1 anatolian_cypriot Lithuanians -0.0006 -6.5563
Greece1 Armenians Lithuanians -0.0005 -6.3712
Greece1 Greece3 Lithuanians -0.0004 -4.7896
Greece1 Georgian Lithuanians -0.0003 -3.1104
balkan_turk Greece1 Dai -0.0024 -7.1425
balkan_turk GreekThessaly Dai -0.0021 -6.1764
balkan_turk GreekCentral Dai -0.0021 -5.7848
balkan_turk Greece2 Dai -0.0019 -5.6794
balkan_turk anatolian_cypriot Dai -0.0019 -5.5944
balkan_turk Armenians Lithuanians -0.0014 -5.3207
balkan_turk Greece3 Dai -0.0017 -5.0815
balkan_turk Lithuanians Dai -0.0017 -5.0190
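
For readers who haven’t met f3 statistics: the quantity in the f3 column is, at its core, the average over SNPs of (c - a)(c - b), where c is the allele frequency in the target group and a and b are the frequencies in the two candidate sources. A clearly negative value is the signature of the target being a mixture of populations related to the two sources. The sketch below is not the code behind the table above. It is a toy version on made-up allele frequencies; the function names, the 30/70 mixing proportion, and the simple delete-one-block jackknife are all assumptions for illustration, and standard tools also correct for finite sample sizes, which this omits.

```python
import math
import random


def f3(target, source1, source2):
    """Mean over SNPs of (c - a) * (c - b) for population allele frequencies."""
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(terms) / len(terms)


def drop_block(values, start, stop):
    """Return the list with the SNPs in positions [start, stop) removed."""
    return values[:start] + values[stop:]


def block_jackknife_z(target, source1, source2, n_blocks=10):
    """Z-score of f3 from a delete-one-block jackknife over contiguous,
    equal-sized blocks (assumes the SNP count is divisible by n_blocks)."""
    block = len(target) // n_blocks
    leave_one_out = []
    for i in range(n_blocks):
        start, stop = i * block, (i + 1) * block
        leave_one_out.append(
            f3(
                drop_block(target, start, stop),
                drop_block(source1, start, stop),
                drop_block(source2, start, stop),
            )
        )
    mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((x - mean) ** 2 for x in leave_one_out)
    return f3(target, source1, source2) / math.sqrt(variance)


# Toy data: two unrelated source populations and a 30/70 mixture of them.
rng = random.Random(0)
pop_a = [rng.uniform(0.05, 0.95) for _ in range(2000)]
pop_b = [rng.uniform(0.05, 0.95) for _ in range(2000)]
mixed = [0.3 * a + 0.7 * b for a, b in zip(pop_a, pop_b)]

admixed_f3 = f3(mixed, pop_a, pop_b)   # target is the mixture: negative
admixed_z = block_jackknife_z(mixed, pop_a, pop_b)
control_f3 = f3(pop_a, pop_b, mixed)   # target is a source: not negative

print(f"f3(mixed; A, B) = {admixed_f3:.4f}, z = {admixed_z:.1f}")
print(f"f3(A; B, mixed) = {control_f3:.4f}")
```

With exact frequencies and a target that is a proportion alpha of A and 1 - alpha of B, each SNP contributes -alpha(1 - alpha)(a - b)^2, so the statistic cannot be positive. Real data add drift and sampling noise on top of this, which is why the z-score matters.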

March 9, 2018

Demographic replacement in Southeast Asia during the Holocene

Filed under: Human Population Genetics,Southeast Asia — Razib Khan @ 12:15 am

Well sometimes you feel silly, and it’s not your fault. Yesterday our podcast on Sundaland went live (we talked about Doggerland and Beringia too!). Though I expressed a fair amount of skepticism, I took the argument that Stephen Oppenheimer presented in Eden of the East, that modern Austronesians are long-term residents of Southeast Asia, seriously.

The alternative view, most forcefully put by Peter Bellwood in books such as First Farmers, is that Austro-Asiatic and Austronesian people were agriculturalists issuing out of southern China that transformed the region over the past 4,000 years (the Austronesians from Taiwan specifically, though during the Pleistocene Taiwan was connected to the mainland).

I lean toward Bellwood’s view, and today a preprint came out which basically confirms it in totality, Ancient Genomics Reveals Four Prehistoric Migration Waves into Southeast Asia. The abstract:

Two distinct population models have been put forward to explain present-day human diversity in Southeast Asia. The first model proposes long-term continuity (Regional Continuity model) while the other suggests two waves of dispersal (Two Layer model). Here, we use whole-genome capture in combination with shotgun sequencing to generate 25 ancient human genome sequences from mainland and island Southeast Asia, and directly test the two competing hypotheses. We find that early genomes from Hoabinhian hunter-gatherer contexts in Laos and Malaysia have genetic affinities with the Onge hunter-gatherers from the Andaman Islands, while Southeast Asian Neolithic farmers have a distinct East Asian genomic ancestry related to present-day Austroasiatic-speaking populations. We also identify two further migratory events, consistent with the expansion of speakers of Austronesian languages into Island Southeast Asia ca. 4 kya, and the expansion by East Asians into northern Vietnam ca. 2 kya. These findings support the Two Layer model for the early peopling of Southeast Asia and highlight the complexities of dispersal patterns from East Asia.

The transition to full-fledged rice agriculture occurred in Vietnam ~4,000 years ago. In First Farmers Bellwood reports on an archaeological site dating to that period where skeletal evidence has been adduced to record the presence of both Northeast Asian and Australo-Melanesian types. These results make clear though that these hunter-gatherers in Southeast Asia are more similar to the Onge of the Andaman Islands, as well as the Negritos of the interior of the Malay peninsula. They’re totally in alignment with the earlier morphological results (also, readers might be curious to know that one site of the Hoabinhian culture is in Yunnan, China). This shouldn’t be surprising, as the Andaman Islands were a peninsula which extended from southern Burma during the Pleistocene.

Already the most accepted model for the introduction of intensive agriculture into Southeast Asia is that it was brought by Austro-Asiatic peoples. These results confirm that. Additionally, it seems clear that Austro-Asiatic ancestry made it to island Southeast Asia, whether directly or through Austronesian admixture before arriving in island Southeast Asia. Java and Bali have some of the higher fractions of ancestry most closely associated with Austro-Asiatic groups on the mainland.

Deeper digging into the admixture distributions has long made it pretty evident that some areas had much higher Austronesian fractions in Indonesia than others, and it wasn’t just a function of distance from the Philippines. Why? My own hunch is that Austronesians brought social and cultural systems which were better adapted to island Southeast Asia, and were more fully able to exploit the local ecology. Meanwhile, aside from a few fringe areas such as the Malay peninsula and coastal Vietnam, they were not successful on the mainland.

The authors also detect migrations into Southeast Asia besides that of the Austro-Asiatics and Austronesians. One element seems correlated with the Tai migrations, and another with Sino-Tibetan peoples, most clearly represented in Southeast Asia by the Burmans. The excellent book, Strange Parallels: Volume 1, Integration on the Mainland: Southeast Asia in Global Context, c.800–1830, recounts the importance of the great migrations of the Tai people into Southeast Asia ~1000 A.D. Modern-day Thailand was once a flourishing center of Mon civilization, an Austro-Asiatic people related to the Khmers of Cambodia. The migrations out of the Tai highlands of southern China reshaped the ethnography of the central regions of mainland Southeast Asia. The Tai also attempted to take over the kingdoms of the Burmans. Though they failed in this, the Shan states of the highlands are the remnants of these attempts (tendrils of the Tai migrations made it to India, the Ahom people of Assam were Tai). Vietnam, shielded by the Annamese Cordillera, came through this period relatively intact. It is also well known that Cambodia’s persistence down to the present has much to do with the shielding it received from France in the 19th century in the wake of Thai expansion.

There are two bigger issues that this paper sheds light on. One is spatial, and the other is temporal.

They detect shared drift between Austro-Asiatic people and tribal populations in northeast India. This is not surprising. A 2011 paper found that Munda speaking peoples, whose variant of Austro-Asiatic is very different from that of Southeast Asia, are predominant carriers of Y chromosome O2a. This is very rare in Indo-European speaking populations, and nearly absent in Dravidian speaking groups. Additionally, their genome-wide patterns indicate some East Asian admixture, albeit a minority, while they carry the derived variant of EDAR, which peaks in Northeast Asia.

One debate in relation to the Munda people is whether they are primal and indigenous, or whether they are intrusive. The genetic data strongly point to the likelihood that they are intrusive. An earlier estimate of coalescence for O2a in South Asia suggested a deep history, but these dates have always been sensitive to assumptions, and more recent analysis of O2a diversity suggests that the locus is mainland Southeast Asia.

Now that archaeology and ancient DNA confirm Austro-Asiatic intrusion into northern Vietnam ~4,000 years ago, I think it also sheds light on when these peoples arrived in India. That is, they arrived < 4,000 years ago. As widespread intensive agriculture came to Burma ~3,500 years ago, I think that makes it likely that Munda peoples arrived in South Asia around this period.

I now believe it is likely that the presence of Austro-Asiatic, Dravidian, and Indo-Aryan languages in India proper was a feature of the period after ~4,000 years ago. None of the languages of the hunter-gatherer populations of the subcontinent remain, with the possible exception of isolates such as Nihali and Kusunda.

The temporal issue has to do with the affinities of these peoples, and how they relate to the settling of Eastern Eurasia. All the Southeast Asian groups after the original Australo-Melanesians share more of an affinity with the Tianyuan individual than Papuans. The implication here is that Tianyuan is closer to the ancestors of various agriculturalists in Southeast Asia than just some random basal Eastern Eurasian. But, since Tianyuan dates to 40,000 years ago and is from the Beijing region, it is hard to make strong inferences from comparisons with only it. The heartland of ancient Chinese culture in Henan was to the south of Tianyuan, after all. More samples are needed before one can truly tease out the pattern of isolation-by-distance vs. admixture that led to the emergence of the proto-farmer populations which settled Southeast Asia.

In the podcast above one thing that came up is that a lot of genetic data indicate decreased diversity as one moves from the south to the north in East Asia. This has long been taken to mean that humans migrated north, and so were subject to bottleneck effects. I pointed out that this may simply be a consequence of admixture between two very different groups of people in Southeast Asia, elevating diversity statistics.
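
The admixture point can be checked with a line of arithmetic. A minimal single-locus sketch, assuming random mating within each population; the allele frequencies and the 50/50 mixing proportion are made-up illustrative values, and the function names are hypothetical.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def admixed_frequency(p1: float, p2: float, alpha: float) -> float:
    """Allele frequency in a population drawing a fraction alpha of its
    ancestry from source 1 and the remainder from source 2."""
    return alpha * p1 + (1.0 - alpha) * p2


# Hypothetical locus where two long-separated groups have drifted apart.
P_SOURCE_1, P_SOURCE_2, ALPHA = 0.1, 0.9, 0.5

h1 = heterozygosity(P_SOURCE_1)
h2 = heterozygosity(P_SOURCE_2)
h_mixed = heterozygosity(admixed_frequency(P_SOURCE_1, P_SOURCE_2, ALPHA))

# Identity: the mixture's heterozygosity is the ancestry-weighted average of
# the sources plus a term that grows with their squared frequency difference.
excess = 2.0 * ALPHA * (1.0 - ALPHA) * (P_SOURCE_1 - P_SOURCE_2) ** 2
weighted_average = ALPHA * h1 + (1.0 - ALPHA) * h2

print(f"source heterozygosities: {h1:.2f}, {h2:.2f}")
print(f"admixed heterozygosity:  {h_mixed:.2f}")
print(f"weighted average + excess: {weighted_average + excess:.2f}")
```

The more differentiated the two sources, the larger the excess term. So a population formed by mixing two very different groups can look more diverse than either parent without any bottleneck in the parents being needed to explain the gradient.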

And yet as the map at the end of the preprint suggests it is highly plausible that Pleistocene Asia was marked by a south to north dynamic of migration. The Austro-Asiatic peoples who migrated south during the Holocene may simply have been backtracking the migration of their ancestors. What these results, and ancient DNA more generally, tell us is that humans were often on the move. The Pleistocene world of climate change probably meant that humans had to be on the move.

February 15, 2018

White modern Northern Europeans are genetically more like brown South Asians than brown(ish) ancient Northern Europeans were

Filed under: Human Population Genetics — Razib Khan @ 8:01 am

The Guardian has a piece by Aarathi Prasad, Thanks to Cheddar Man, I feel more comfortable as a brown Briton. Dr. Prasad is a geneticist, so the science is pretty decent (she’s probably seen the documentary ahead of time too).

But there is a curious quirk here and it reveals something about human psychology: modern Britons are genetically much closer to South Asians, like Aarathi Prasad, than these ancient darker-skinned Britons. The plot to the left illustrates this (it’s using the Dystruct package). The far right of the top panels represent South Asians. You can see Europeans pretty clearly. Let’s note two things:

1) Modern Europeans (except for Sardinians) share an orange “steppe” component with most South Asians (these are no doubt Indo-European migrations of the Bronze Age)

2) The brown element represents European hunter-gatherers. This element is found at varying quantities across Europe, with the lowest fractions in Sardinians. Though present in South Asians (this may or may not be an artifact to be honest), it’s not present at very high frequencies.

One always has to be careful about taking these proportions as literal representations of ancestral populations. They are not. But what they show is that modern Northern Europeans and South Asians have been touched by the same population movements over the past 5,000 years, and so are genetically much closer than the people who lived in Northern Europe and South Asia 5,000 years ago.

Humans are a visual species. In a pre-modern environment, physical cues were important for group identity, though I suspect just as much due to scarification and tattooing as phenotypic differences due to biology. The fact that Cheddar Man, and Paleolithic hunter-gatherers in Western Europe more generally, probably resembled modern South Asians more than they do modern Northern Europeans (I think they were more likely to be olive-brown than dark-brown, but I’m not confident), is more salient to human folk biology than the fact that modern Northern Europeans are much closer genetically to South Asians than the more “brown” ancient Northern Europeans.

Stuff like this always reminds me of the deep wisdom in Arthur C. Clarke’s Childhood’s End. The ultimately benevolent alien species which mentored humanity shielded us from their physical appearance because they knew we’d find it horrifying. The substance of what they did for us, who they were, was going to be less important to immature humans than the fact of what they looked like.

Note: Fst between Sindhi from Pakistan and WHG (Cheddar Man was one) is 0.087. Sindhi from Pakistan and English is 0.023. English to WHG is 0.058 (source). Fst cannot be naively interpreted as “genetic distance.” But this gets at the fact that Mesolithic European hunter-gatherers were very distant from modern South Asians. And widespread gene flow and admixture over the past 5,000 years has compressed a lot of genetic differences which were starker across geography in the past.
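
To see how shared gene flow compresses Fst, here is a toy sketch. It uses a simple ratio-of-averages Fst computed from population allele frequencies with no sample-size correction, which is an assumption on my part and not necessarily the estimator behind the numbers quoted above. The four-SNP frequencies and the 50% shared incoming ancestry are invented for illustration.

```python
def fst(freqs1, freqs2):
    """Ratio-of-averages Fst from two lists of population allele frequencies:
    sum of squared frequency differences over sum of between-population
    heterozygosities. No correction for finite sample size."""
    numerator = sum((p1 - p2) ** 2 for p1, p2 in zip(freqs1, freqs2))
    denominator = sum(p1 * (1 - p2) + p2 * (1 - p1) for p1, p2 in zip(freqs1, freqs2))
    return numerator / denominator


def admix(local, incoming, incoming_fraction):
    """Blend local frequencies with an incoming population's frequencies."""
    return [
        (1 - incoming_fraction) * p + incoming_fraction * q
        for p, q in zip(local, incoming)
    ]


# Hypothetical frequencies at four SNPs in two ancient, long-separated regions.
ancient_region_1 = [0.1, 0.8, 0.3, 0.6]
ancient_region_2 = [0.7, 0.2, 0.9, 0.1]

# A third population that later contributes half the ancestry of both regions.
shared_migrants = [0.5, 0.5, 0.4, 0.4]
modern_region_1 = admix(ancient_region_1, shared_migrants, 0.5)
modern_region_2 = admix(ancient_region_2, shared_migrants, 0.5)

fst_ancient = fst(ancient_region_1, ancient_region_2)
fst_modern = fst(modern_region_1, modern_region_2)

print(f"Fst between the ancient populations: {fst_ancient:.3f}")
print(f"Fst between their modern successors: {fst_modern:.3f}")
```

Because both modern populations took half their ancestry from the same source, every frequency difference between them is halved, and Fst falls well below the ancient value even though neither region was replaced outright.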
