Razib Khan One-stop-shopping for all of my content

February 3, 2021

So many assumptions about Africa

Filed under: Human Population Genetics — Razib Khan @ 12:47 pm


I have been staring at this figure and rereading Ancient West African foragers in the context of African population history. The Shum Laka samples from this paper, dating to four to eight thousand years ago, have drawn my attention, and I’m just looking at them a lot.

It seems ridiculous I’ve been using Nigerians as my “African reference” for decades. Most African populations, including Pygmies and Khoisan, have Eurasian admixture from the last 10,000 years. And what about deeper back-to-Africa ancestry? That seems likely and is hinted at in the above paper.

Modern human lineages have a deep history in Africa and the Near East. I think we’re going to have a transformation of our understanding of what happened in these regions in the near future.

January 28, 2021

Got milk long before genes for milk

Filed under: Human Population Genetics,Lactose,Lactose tolerance — Razib Khan @ 9:56 am


The story of lactase persistence (“lactose tolerance”) evolving is one of the best gene-culture coevolution stories we had. Arguably it was the canonical example. The story was simple: multiple times humans took up dairy culture, and multiple times humans changed so that they could digest lactose, milk sugar, into adulthood. Lactose accounts for about 30% of the calories in raw milk (the rest come from fat and protein). In some people, gut flora react negatively to the bath of undigested sugar, leading to discomfort in addition to wasted calories.
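The "about 30% of the calories" figure is simple arithmetic on milk's composition. A minimal sketch, assuming rough typical composition values for whole cow's milk (grams per 100 g) and the general Atwater energy factors; both are approximations I am supplying, not figures from the post:

```python
# Rough check of the share of milk calories that come from lactose.
# ASSUMPTION: approximate composition of whole cow's milk, grams per 100 g;
# real values vary with breed, season, and processing.
LACTOSE_G = 4.7
PROTEIN_G = 3.3
FAT_G = 3.7

# General Atwater factors, kcal per gram (an approximation for lactose).
CARB_KCAL_PER_G = 4.0
PROTEIN_KCAL_PER_G = 4.0
FAT_KCAL_PER_G = 9.0


def lactose_energy_fraction(lactose_g: float, protein_g: float, fat_g: float) -> float:
    """Fraction of total food energy contributed by lactose."""
    lactose_kcal = lactose_g * CARB_KCAL_PER_G
    protein_kcal = protein_g * PROTEIN_KCAL_PER_G
    fat_kcal = fat_g * FAT_KCAL_PER_G
    return lactose_kcal / (lactose_kcal + protein_kcal + fat_kcal)


if __name__ == "__main__":
    share = lactose_energy_fraction(LACTOSE_G, PROTEIN_G, FAT_G)
    print(f"lactose share of milk calories: {share:.0%}")
```

With these assumed values lactose supplies a bit under 30% of the energy; fat dominates because it carries more than twice the energy per gram.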

In the 2000s, several mutations were discovered around LCT, the gene responsible for producing lactase, which breaks down lactose. One mutation was found across Europe and Central Asia, another among the Arabs, and another in East Africa. The “mutational target” was big. The mutation in the European and Central Asian variant breaks a regulatory element that represses the expression of LCT in adults. There are lots of ways to break something. Lactase persistence isn’t really a gain of function; it’s just never shutting off the function, which itself is a feature, not a bug.

The haplotype around LCT is long and indicative of a really strong sweep in Europeans. It was in some ways a positive control for tests of selection.

The trouble is that there are now major problems with this narrative. In short, dairy culture predates the increase in frequency of lactase persistence alleles by thousands of years. The ancient DNA transects in Europe are so good that it seems pretty clear that the frequency was way lower during the Iron Age, and didn’t reach “modern” levels until the historical period.

The same is now known to be true in Africa: Humans were drinking milk before they could digest it.

This doesn’t mean that these mutations have nothing to do with milk. But there needs to be a rethink of the selection story. Perhaps there was a genetic modifier that spread recently which isn’t a big mutational target, and that’s why the lactose digestion alleles rose in the last 3,000 years? I don’t know. No one really does.
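A rise confined to the last few thousand years implies strong selection. A minimal sketch of that arithmetic, assuming genic selection (each copy of the allele multiplies its odds by 1 + s per generation; real lactase persistence is closer to dominant), 28 years per generation, and hypothetical round-number start and end frequencies that are not estimates from any paper:

```python
import math

# ASSUMPTIONS (illustrative only):
#   * genic selection: the allele's odds p/(1-p) grow by (1 + s) each generation
#   * 28 years per generation
#   * the 5% -> 75% frequencies below are hypothetical


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def selection_coefficient(p0: float, p1: float, generations: float) -> float:
    """Solve for s given start/end frequencies: s = exp(dlogit / t) - 1."""
    return math.exp((logit(p1) - logit(p0)) / generations) - 1.0


def simulate(p0: float, s: float, generations: int) -> float:
    """Deterministic forward recursion p' = p(1+s) / (1 + s*p)."""
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + s * p)
    return p


if __name__ == "__main__":
    years, gen_time = 3000, 28
    t = years / gen_time                       # roughly 107 generations
    s = selection_coefficient(0.05, 0.75, t)   # hypothetical 5% -> 75%
    print(f"generations={t:.0f}  s={s:.3f}")
```

Under these assumptions the allele needs a fitness advantage of a few percent per generation, which is why a recent rise reads as one of the strongest sweeps in the human genome.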

December 27, 2020

What was the population of the Americas in 1492?

Filed under: Ancient DNA,Human Population Genetics — Razib Khan @ 12:04 pm

Several people have asked me about the new study on ancient DNA in the Caribbean, A genetic history of the pre-contact Caribbean. There is a lot to this paper, some of which is outside of my purview (e.g., I don’t know anything about the archaeology of this region, so I can’t interpret the genetic results well). One of the major things they did was establish patterns of relatedness. This seems like a major step forward for future applications of ancient DNA.

But the biggest thing that jumped out at me had to do with effective population size. Carl Zimmer’s write-up highlights this issue:

The genetic variations also allowed Dr. Reich and his colleague to estimate the size of the Caribbean society before European contact. Christopher Columbus’s brother Bartholomew sent letters back to Spain putting the figure in the millions. The DNA suggests that was an exaggeration: the genetic variations imply that the total population was as low as the tens of thousands.

This matters because it starts to change our sense of the revisionism (now orthodox?) in books such as 1491: New Revelations of the Americas Before Columbus. To reconcile the small numbers of indigenous people in the Caribbean by the 16th century, the hypothesis has been that there were mass die-offs due to disease, or that the Spanish were inordinately cruel (“The Black Legend”). These results suggest that the scale of the pandemic shock was less of an issue, since the baseline number of native peoples in the area was lower.

What does this imply for the rest of the New World? I don’t know. But perhaps the huge census sizes argued for by some scholars won’t hold? It probably depends on the region. But with enough ancient DNA, the same sort of analyses could be replicated.
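The basic link between genetic variation and population size can be sketched with the textbook relation for diploid autosomes under neutrality, π ≈ 4·Ne·μ. This is not necessarily how the Caribbean paper estimated size (estimates of recent population size typically rely on long shared or homozygous haplotype segments instead), and the mutation rate, diversity value, and Ne/N ratio below are assumptions for illustration:

```python
# Textbook relation between diversity and effective population size:
#     pi ~ theta = 4 * Ne * mu   (neutral, mutation-drift equilibrium)
# ASSUMPTION: mu = 1.25e-8 per base pair per generation, a commonly used
# human estimate. The pi value and Ne/N ratio in the example are hypothetical.
MU_PER_BP_PER_GEN = 1.25e-8


def ne_from_diversity(pi: float, mu: float = MU_PER_BP_PER_GEN) -> float:
    """Effective population size implied by per-site nucleotide diversity."""
    return pi / (4.0 * mu)


def census_from_ne(ne: float, ne_over_census: float) -> float:
    """Convert Ne to a census size under an ASSUMED Ne/N ratio.

    Census sizes are usually larger than Ne, so the ratio is typically < 1,
    and the answer is only as good as the assumed ratio.
    """
    if not 0.0 < ne_over_census <= 1.0:
        raise ValueError("ne_over_census should be in (0, 1]")
    return ne / ne_over_census


if __name__ == "__main__":
    ne = ne_from_diversity(0.0008)  # hypothetical pi
    print(f"Ne ~ {ne:,.0f}")
    print(f"census ~ {census_from_ne(ne, 0.3):,.0f} if Ne/N = 0.3 (assumed)")
```

The second function is the crux of any "millions versus tens of thousands" argument: the genetic estimate constrains Ne, and the step to a head count depends on the assumed ratio.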

December 7, 2020

The Greeks in the mountains

Filed under: Human Population Genetics — Razib Khan @ 11:15 am

The New Yorker has a long feature that explores the strange results from last year’s paper, Ancient DNA from the skeletons of Roopkund Lake reveals Mediterranean migrants in India. Basically, they found a bunch of Indians who died 1,000 years ago and a bunch of Greeks who died a few centuries ago. They were buried naturally in a very isolated lake high in the Himalayas. There are all sorts of hypotheses regarding the Greeks, whose bones indicate a Mediterranean diet and whose genomes most closely match individuals from Crete. My personal experience is that “mainland Greeks” tend to be a bit Northern European-shifted, so these individuals may have been Anatolian or Aegean Greeks.

Stuart Fidel, who sometimes comments on this weblog, suggests these were Armenian traders. But David Reich correctly points out Armenians are very distinct genetically from Greeks (though the two are not entirely different obviously!). Another hypothesis is a bone mix-up, but the issue here is there are a lot of individuals who are of the same population and seem to have lived in the same region. How could bone mix-ups produce so many systematic errors?

Ultimately there’s no final answer in the piece, though hopefully someone will present a reasonable conjecture.

Because the piece spotlights Reich and his lab, it alludes to the controversy around him. This is ultimately going to be the legacy of the hit piece from a few years back. He’s now a “controversial figure,” which, to be frank, is not a bad thing in the eyes of some of the Reich lab’s scientific rivals. Most media treatments that aren’t purely about his research (as Carl Zimmer’s column in The New York Times covering Reich lab publications is) will mention this now.

Here’s why he’s a mensch:

Still, some anthropologists, social scientists, and even geneticists are deeply uncomfortable with any research that explores the hereditary differences among populations. Reich is insistent that race is an artificial category rather than a biological one, but maintains that “substantial differences across populations” exist. He thinks that it’s not unreasonable to investigate those differences scientifically, although he doesn’t undertake such research himself. “Whether we like it or not, people are measuring average differences among groups,” he said. “We need to be able to talk about these differences clearly, whatever they may be. Denying the possibility of substantial differences is not for us to do, given the scientific reality we live in.”

This is, in 2020, an old-fashioned view. There are now young American researchers who frankly express disquiet and discomfort at the idea of studying human population genetic variation, period, including people who themselves have studied topics such as polygenic adaptation in humans. This would be a very strange view for older researchers, but it’s not totally out of the norm today, so expect someone like Reich to be viewed as quite the dinosaur in a decade. It seems ridiculous to say, but I do wonder if we’re seeing the end of the “humans as a model organism” era. Lots of people are not happy with the new atmosphere, but lots of people just keep quiet and go along.

November 24, 2020

Whole genomes of ancient farmers and hunter-gatherers

Filed under: Ancient DNA,Human Population Genetics — Razib Khan @ 10:29 am

A new preprint uses 15 high-quality ancient genomes to model the origins of Europeans, and of European farmers in particular, more precisely. The big deal here is that they aren’t relying on the same old SNP array; they are using whole genomes. This allows for more explicit model-building and testing. I do think explicit model creation is something that needs to be done. A lot of the work today is data-first, and there needs to be more “theory”.

The mixed genetic origin of the first farmers of Europe:

While the Neolithic expansion in Europe is well described archaeologically, the genetic origins of European first farmers and their affinities with local hunter-gatherers (HGs) remain unclear. To infer the demographic history of these populations, the genomes of 15 ancient individuals located between Western Anatolia and Southern Germany were sequenced to high quality, allowing us to perform population genomics analyses formerly restricted to modern genomes. We find that all European and Anatolian early farmers descend from the merging of a European and a Near Eastern group of HGs, possibly in the Near East, shortly after the Last Glacial Maximum (LGM). Western and Southeastern European HG are shown to split during the LGM, and share signals of a very strong LGM bottleneck that drastically reduced their genetic diversity. Early Neolithic Central Anatolians seem only indirectly related to ancestors of European farmers, who probably originated in the Near East and dispersed later on from the Aegean along the Danubian corridor following a stepwise demic process with only limited (2-6%) but additive input from local HGs. Our analyses provide a time frame and resolve the genetic origins of early European farmers. They highlight the impact of Late Pleistocene climatic fluctuations that caused the fragmentation, merging and reexpansion of human populations in SW Asia and Europe, and eventually led to the world’s first agricultural populations.

The supplements are worth reading too. It’s all there.

No mention of Basal Eurasians. The last author told me on Twitter that they weren’t needed, but Iosif Lazaridis (also on Twitter) disagrees, naturally.

November 16, 2020

The great southern displacement in East Asia

Filed under: East Asian,Human Population Genetics — Razib Khan @ 2:06 am

The new preprint, Genomic Insights into the Demographic History of Southern Chinese, is somewhat inaccurately titled. It’s really more about the progenitors of the various Southeast Asian language families, whose origins are in South China. Yes, modern southern Han Chinese absorbed a local substrate, but that’s been known for a while.

The story here is successive incidents of ‘collapsing structure’ out of the Last Glacial Maximum. The various East Asian populations admixed after diversification 20-40,000 years ago, and there was a later stage of admixture driven by the expansion of the Han out of the north.

An admixture graph is the best way to get at the major features of their model:


The major finding is that the Austro-Asiatic, Hmong-Mien and Austronesian language families emerge from groups distributed west to east in the Yangzi basin, with the Kra-Dai being more of a synthesis. The Tibeto-Burmans were a later push that mixed mostly with Austro-Asiatic populations. The details are less important than the reality that some sort of separation followed by admixture explains a lot of the local differences. Additionally, their genetic results confirm what is obvious with the Kinh: genetically they are very different from the Austro-Asiatic groups with which they are often linguistically bracketed.

The most interesting finding is an Andaman-like “ghost population” that contributed to the Jomon, and less to other groups. You know where I’m going here: this is clearly the basal East Eurasian group called “Australo-Melanesian” that contributed genes to some Amazonian groups. This group is the one that contributed haplogroup D to Tibetans and Japanese.

With East Asian population structure I feel we have the broad features, but a lot of the details are rickety. We’ll see.

October 20, 2020

The Genetic History of the Middle East: into Arabia

A massive new preprint on the Middle East is out. I’ve edited the first figure to give people a general sense of the broad results and populations sampled. First, you have to know that these are high-quality modern samples: 137 individuals at 30x whole-genome coverage. In other words, basically the best sequence data you can get. No need to futz around with subsets of the data. This is important and needed because 1000 Genomes doesn’t have a Middle Eastern population, so catalogs of variants had a gap in this region. Even the WGS of the HGDP was not totally sufficient, since its Middle Eastern populations were not Arabian.

The populations here are sampled from both the classical “Fertile Crescent” and various points within the Arabian peninsula. At the end of the preprint, they do some analysis on selection, which I won’t dwell on. The most interesting thing is that they confirm that Arabians have a unique lactase persistence allele that seems to have been selected very recently, just like in Europeans. Much of the rest of the selection analysis either replicates what you would find elsewhere or lacks the power to detect polygenic selection (though they did detect selection on EDU).

The big finding to me is that this work confirms that there is a north-south cline in the Near East defined by a deep population structure. The admixture graph to the right captures the main features using Lebanese and Emiratis as the two extreme populations, but as you can see in the admixture plot above, the cline really runs from the Caucasus to southern Arabia. If you analyze these populations, one thing you will see is that Fertile Crescent populations, such as Druze, often seem more like Armenians and Georgians than like South Arabians. Why is this? After all, South Arabians and Fertile Crescent populations both speak Semitic languages.

I think the issues here are multiple. First, there is recent admixture that obscures some of the deeper relationships. This is clear insofar as most Arab Muslim populations have Sub-Saharan African admixture. This is historically attested, and physically visible. The variation and range are quite high, in part due to the spatial heterogeneity of slavery (e.g., more African slaves in lowlands than highlands), and in part because the admixture is recent enough (the dates are usually 1000 A.D. and later) that it has not been fully homogenized across individuals.
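The link between recency and heterogeneity runs through ancestry tracts: recent admixture leaves long, unbroken chromosomal segments from each source, and those tracts shorten every generation as recombination cuts them up. A minimal sketch under the textbook single-pulse model (an assumption; not necessarily the method behind the dates cited, and continuous gene flow over centuries violates it), with a 28-year generation time and hypothetical inputs:

```python
# Single-pulse model of admixture tract lengths.
# ASSUMPTIONS: one admixture event g generations ago, random mating since,
# lengths in Morgans, 28 years per generation. Inputs below are hypothetical.
YEARS_PER_GENERATION = 28


def mean_tract_length(g: float, m: float) -> float:
    """Expected length (Morgans) of tracts from a source contributing fraction m.

    Breakpoints accumulate at about g per Morgan; a tract from the source ends
    when the next segment comes from the other ancestry (probability 1 - m),
    so tracts end at rate g * (1 - m).
    """
    return 1.0 / (g * (1.0 - m))


def generations_since_admixture(mean_length: float, m: float) -> float:
    """Invert the single-pulse model to date the admixture event."""
    return 1.0 / (mean_length * (1.0 - m))


if __name__ == "__main__":
    # Hypothetical: 10% African ancestry in tracts averaging 10 cM (0.1 Morgans).
    g = generations_since_admixture(0.10, 0.10)
    print(f"~{g:.1f} generations, ~{g * YEARS_PER_GENERATION:.0f} years ago")
```

Individuals whose African ancestors entered the family tree more recently carry longer tracts and often larger fractions, which is one way incomplete mixing shows up in the data.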

But this is not the only admixture. All of the Fertile Crescent populations, along with groups to the north, have much more steppe drift than those to the south in Arabia. The details of the fractions don’t matter; it’s not much, but it’s not trivial, and it’s always higher than among the Arabians. Additionally, this element is new to the region, in relative terms. You can see the contribution in modern Lebanese in comparison to the Bronze Age Sidon samples, which date to 1800 BC. The source could be continuous gene flow during the Roman and Byzantine period, or even later. Or it could be Indo-European migrations.

We know that Indo-Iranian peoples were present in Upper Mesopotamia. The Mitanni Kingdom, which had Indo-Aryan affinities, shows up after 1750 BC. The Hittites, the Nesa, show up to the north in Anatolia a bit earlier. Interestingly, the Hittites spoke an Indo-European language that is often considered basal (the outgroup) to most of the others. Armenian, which emerges later in eastern Anatolia, is also quite distinct, just as Greek to the west is. In contrast, there is a lot of suggestive evidence of either genealogical or geographical connectedness between the ancestors of the Indo-Iranian and Slavic language families.

The presence of these two very distinct ancestral components, steppe and Sub-Saharan African, on top of the ancient Near Eastern base, produces distinctions in the modern populations which obscure some of the deeper strands. In the late 2000s, when researchers and bloggers began running admixture analyses on Ethiopians, it was clear that this population was a mix between “West Eurasian” and African which wasn’t Bantu. The West Eurasian donor population was often Yemeni, in particular Yemeni Jews. Later on, using more sophisticated methods, some models suggested greater affinity in Ethiopian genomes to Levantine populations than to Yemenis. What was going on?

We now know. It is quite clear Ethiopian populations lack steppe ancestry. In the earlier Bronze Age, and definitely in the Neolithic, Levantines lacked steppe ancestry. In fact, the Neolithic Levantines usually lacked “Iranian” ancestry. The West Eurasian ancestry in Northeast Africans, on the whole, is enriched for a Levantine ancestry quite similar to Natufian. Modern-day South Arabians are the closest to this population mix, even if they are not descended from ancient Levantines. They lack steppe.

Modern-day South Arabians in fact descend in part from indigenous hunter-gatherers who were a sister clade to the ancestors of Natufians. The admixture graph makes that clear: the Emiratis with the least African ancestry have half their ancestry from this group. In the book Arabs, the author discusses at length various Yemeni legends of a fusion between distinct peoples on the edge of history. These could be recollections of the merger of indigenous Neolithic Arabians and peoples who expanded from the north.

The analyses of these samples confirm and reiterate what has been found with ancient DNA: at some point late in the Neolithic and early in the Bronze Age a massive admixture event occurred in the Fertile Crescent which brought a considerable amount of “Iranian” ancestry into the region (these ancient people are not like modern Iranians; in particular, they lacked steppe ancestry which is copious in much of Iran, particularly the east). This ancestry pushed south and westward so that ~50% of the ancestry of Arabians seems to be Iranian. That being said, I have some qualms here:

We explored whether this ancestry penetrated both the Levant and Arabia at the same time, and found that admixture dates mostly followed a North to South cline, with the oldest admixture occurring in the Levant region between 3,900 and 5,600 ya (Table S3), followed by admixture in Egypt (2,900-4,700 ya), East Africa (2,200-3,300) and Arabia (2,000-3,800). These times overlap with the dates for the Bronze Age origin and spread of Semitic languages in the Middle East and East Africa estimated from lexical data (Kitchen et al., 2009; Figure S8). This population potentially introduced the Y-chromosome haplogroup J1 into the region (Chiaroni et al., 2010; Lazaridis et al., 2016). The majority of the J1 haplogroup chromosomes in our dataset coalesce around ~5.6 [95% CI, 4.8-6.5] kya, agreeing with a potential Bronze Age expansion; however, we do find rarer earlier diverged lineages coalescing ~17 kya (Figure S9). The haplogroup common in Natufians, E1b1b, is also frequent in our dataset, with most lineages coalescing ~8.3 [7-9.7] kya, though we also find a rare deeply divergent Y-chromosome which coalesces 39 kya (Figure S9).

Some of these dates are hard to credit. For example, I obtain a midpoint estimate of Iranian admixture into Egypt around 1836 BC!

The fraction of Iranian ancestry is substantial. The admixture model in the supplements gives this for Egyptians: 45% Levant_N, 32% Iran_N, 8% EHG (Eastern European Hunter-Gatherer), and 15% Mota (African). The older date is 2700 BC. The oldest Egyptian writing dates to 2700 BC, but proto-hieroglyphs are 500 years older. The authors talk about Semitic languages, and ancient Egyptian is not Semitic. So it could be a minority population mixed into the Egyptians, but this is a massive event that we don’t have records of. In fact, the authors claim that it went into much of Northeast Africa at a relatively late date.
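A model like the Egyptian one above is a linear mixture: at every locus, the target's expected allele frequency is the weighted sum of the source populations' frequencies, with weights summing to one. (Tools such as qpAdm fit these weights from f4-statistics rather than raw frequencies, but the underlying mixture is the same.) A minimal sketch using the quoted weights; the per-source allele frequencies are hypothetical:

```python
# Expected allele frequency in a target modeled as a mixture of sources.
# Weights: the Egyptian model quoted in the text.
# ASSUMPTION: per-source allele frequencies in the example are HYPOTHETICAL.
EGYPT_WEIGHTS = {"Levant_N": 0.45, "Iran_N": 0.32, "EHG": 0.08, "Mota": 0.15}


def mixture_frequency(weights: dict[str, float], source_freqs: dict[str, float]) -> float:
    """Weighted sum of source allele frequencies; weights must sum to 1."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"mixture weights sum to {total}, not 1")
    return sum(w * source_freqs[name] for name, w in weights.items())


if __name__ == "__main__":
    hypothetical_freqs = {"Levant_N": 0.6, "Iran_N": 0.3, "EHG": 0.1, "Mota": 0.9}
    print(f"{mixture_frequency(EGYPT_WEIGHTS, hypothetical_freqs):.3f}")
```

The sum-to-one check matters because the four components are meant to exhaust the target's ancestry; a model that needed weights summing to much more or less than one would not fit.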

Additionally, the values for the Levant seem recent as well. That being said, there was a pre-Sumerian civilization, the Uruk Civilization, which spread broadly from Mesopotamia between 4000 and 3000 BC. This is 6000 to 5000 years ago. The midpoint of this is 5500 years ago, while the midpoint of the admixture date for the Syrians, who were on the edge of the Uruk Civilization, is 3800 years ago. Basically, I think the evidence points to various statistical genomic artifacts reducing the age from when the admixture truly occurred (this has long been a problem in this field).

I honestly have no idea how to relate the expansion of Semitic languages to the expansion of Iranian languages. My friend Patrick Wyman believes that Anatolian farmers spoke Afro-Asiatic. These were very different people from the Iranians, who arrived from the east later. Additionally, history teaches us that Mesopotamia during the Bronze Age was very linguistically diverse. The Sumerians were not Semitic, and neither were their Elamite neighbors in Khuzistan. The Akkadians, who were more prevalent in the north of Mesopotamia, but were present from the beginning of Sumerian history, were Semitic.

There is still a mystery around the great admixture between Neolithic Near Easterners of the west and the east. I don’t think we’ve closed that chapter of the book.

That being said, there is a lot that is “solved” in this paper. For example, these authors seem to confirm that there is no evidence of “first wave” modern humans in Arabian populations earlier than the non-African radiation. Arabians, like other non-Africans, underwent a population expansion 50-70,000 years ago. Their separation from Mbuti Pygmies was gradual up until 120,000 years ago, after which there seems to have been a more definitive separation. What this is telling us, I believe, is that the ancestors of non-Africans were part of the African meta-population until 120,000 years ago. This is suspiciously close to the Eemian Interglacial, which dates to between 115,000 and 130,000 years ago. The Eemian was characterized by a “Green Sahara”, so it seems that this is when early modern humans ventured in substantial numbers out of the continent and to its peripheries. One issue that seems notable in the data is that proto-non-Africans seem to have been characterized by a period of isolation and small population size. Perhaps

But 50-70,000 years ago a massive expansion of one of these daughter populations occurred. These data confirm that Arabians carry the same Neanderthal admixture as everyone else, but even after accounting for Sub-Saharan African ancestry, they have somewhat less of it. In alignment with earlier research, they argue that this is due to admixture with “Basal Eurasian” populations which did not mix with Neanderthals ~55,000 years ago. Or, more precisely, which did not carry as much Neanderthal ancestry (it seems plausible that the Basal Eurasian populations are themselves a compound of conventional non-Africans at the base of the broader splits and a deeper basal group which lacks Neanderthal ancestry).
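The dilution argument is simple bookkeeping: lineages carrying little or no Neanderthal ancestry lower the genome-wide average in proportion to their share. A minimal sketch, assuming Sub-Saharan African ancestry carries none, "conventional" non-African ancestry carries a fixed fraction, and Basal Eurasian carries an optional smaller fraction (defaulting to zero, with a nonzero value covering the "not as much" case above); all numbers are hypothetical, not estimates from the preprint:

```python
# Dilution of Neanderthal ancestry by lineages that carry less of it.
# ASSUMPTIONS (illustrative): African ancestry carries no Neanderthal ancestry;
# conventional non-African ancestry carries n_conventional; Basal Eurasian
# ancestry carries n_basal (0 by default).


def expected_neanderthal(n_conventional: float, african: float, basal: float,
                         n_basal: float = 0.0) -> float:
    """african: fraction of the genome that is Sub-Saharan African.
    basal: fraction of the remaining Eurasian ancestry that is Basal Eurasian."""
    eurasian = 1.0 - african
    return eurasian * ((1.0 - basal) * n_conventional + basal * n_basal)


def implied_basal(observed: float, n_conventional: float, african: float,
                  n_basal: float = 0.0) -> float:
    """Solve expected_neanderthal(...) == observed for the Basal Eurasian share."""
    per_eurasian = observed / (1.0 - african)  # "accounting for" African ancestry
    return (n_conventional - per_eurasian) / (n_conventional - n_basal)


if __name__ == "__main__":
    # Hypothetical inputs.
    f = expected_neanderthal(n_conventional=0.02, african=0.10, basal=0.25)
    print(f"expected Neanderthal fraction: {f:.4f}")
    print(f"implied Basal Eurasian share: {implied_basal(f, 0.02, 0.10):.2f}")
```

Dividing out the African share first is what "even after accounting for Sub-Saharan African ancestry" means in practice; whatever deficit remains is what the Basal Eurasian term has to explain.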

Going back to the admixture graph, you notice that both western and eastern farmer populations are a compound of Basal Eurasian and various lineages that are broadly “West Eurasian.” Natufians and Anatolian farmers are descended about half from groups related to European hunter-gatherers, while ancient Neolithic Iranians had ancestry related to these people, but even more to populations distantly related to Ancient North Eurasians (Paleo-Siberians). The events here are distant, but the same proportion of Basal Eurasian ancestry indicates to me a rapidly expanding population at some point which mixed with a well-structured set of groups in the Near East.
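An admixture graph can be read as a small directed acyclic graph: a population's ancestry from any source is the sum, over every path up to that source, of the product of the edge weights. A minimal sketch with simplified node names; the Natufian split follows the "about half" in the text (treating the remainder as Basal Eurasian is a simplification), and every other weight is hypothetical:

```python
# Tiny admixture graph: each population lists (parent, weight) pairs whose
# weights sum to 1. Populations with no entry are treated as unadmixed roots.
# ASSUMPTIONS: simplified labels; all weights except the Natufian split are
# HYPOTHETICAL.
GRAPH: dict[str, list[tuple[str, float]]] = {
    "Natufian": [("Basal_Eurasian", 0.5), ("WHG_related", 0.5)],
    "Iran_N": [("Basal_Eurasian", 0.5), ("ANE_related", 0.3), ("WHG_related", 0.2)],
    "Mixed_farmer": [("Natufian", 0.6), ("Iran_N", 0.4)],
}


def contribution(target: str, source: str) -> float:
    """Fraction of `target`'s ancestry tracing back to `source`.

    Multiplies weights along every path from target up to source and sums
    over paths (the graph must be acyclic).
    """
    if target == source:
        return 1.0
    parents = GRAPH.get(target)
    if parents is None:  # a root that is not `source`
        return 0.0
    return sum(w * contribution(parent, source) for parent, w in parents)


if __name__ == "__main__":
    for root in ("Basal_Eurasian", "WHG_related", "ANE_related"):
        print(root, round(contribution("Mixed_farmer", root), 3))
```

Because both farmer populations carry the same Basal Eurasian share in this toy graph, any mixture of them inherits exactly that share, while the other components shift with the mixing weights.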

The major takeaways

  • Near Easterners are part of the same broad diversification as all other non-Africans
  • The expansion of these non-Africans dates to 50-70,000 years ago
  • Archaeological evidence points to a very intense expansion around 50,000 years ago, and admixture with Neanderthals somewhat before then
  • At the beginning of the Holocene, Near Easterners were deeply structured regionally, and had threaded together disparate ancestral components (Basal Eurasian, European hunter-gatherer-related, and Paleo-Siberian)
  • Late in the Neolithic and early Bronze Age much of this structure collapsed, and there was a massive admixture of Iranian ancestry to the south and west (conversely, there is evidence in other work of admixture of western farmer ancestry to the east)
  • Finally, there is evidence for later incursions of steppe people into the northern Arabian fringe and Fertile Crescent
  • On top of this, there is historical admixture from Africans and, in the north, from Turks and other groups

September 17, 2020

The genomic landscape of Brazil in 1950

Filed under: Admixture,Human Population Genetics,Human Variation,race — Razib Khan @ 12:12 am


A new whole-genome analysis out of Brazil has some interesting ancestry information. The preprint, Whole-genome sequencing of 1,171 elderly admixed individuals from the largest Latin American metropolis (São Paulo, Brazil):

As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases….

Admixed populations are useful for a lot of reasons. But let’s observe some things about this Brazilian population.

First, it’s old. The average age is 72, so these are people born around 1950. In many ways these are the genetic characteristics of Brazil in 1950, not today. This is why you see so many individuals who self-identify as Asian who are nearly 100% Asian: they are the children of Japanese immigrants. In 1950 the endogamy of the community was high. Today the youngest generation of Japanese Brazilians is 60% mixed.

Second, the ancestry of self-identified white Brazilians in this sample is mostly European. Like the Japanese, a large number of these individuals are probably the children of European immigrants. I suspect this accounts for many of the 20% of the “white” sample that has no trace of non-European ancestry. But observe that around another 20% has trace proportions (~1%) of non-European ancestry, mostly African. My supposition, in this case, is that these are “old stock” white Brazilians. That is, one or both of their parents descend from Portuguese Brazilians who settled in overwhelmingly European areas and retain some non-European admixture due to long-term residence in Brazil. The remainder are white Brazilians who have substantial non-European ancestry, with a small minority whose proportions are quite high from a North American perspective.

A point of comparison is probably useful. About 95% of non-Hispanic whites in the United States seem to have almost no detectable non-European ancestry using this sort of model-based clustering. This illustrates the massive demographic difference between the USA and Latin American nations. The vast majority of white Latin Americans look quite Iberian, but the majority also have far more non-European ancestry than 95% of North American whites. This is partly a reflection of the smaller population sizes of native peoples in North America and of the nature of hypodescent for people of any African ancestry in the United States, under which mixed individuals were integrated into African Americans.
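Breakdowns like "no trace", "trace (~1%)", and "substantial" come from binning per-individual ancestry estimates. A minimal sketch of that tally; the cutoffs and the toy sample are hypothetical, not the preprint's thresholds or data:

```python
from collections import Counter

# Bin individuals by estimated non-European ancestry fraction.
# ASSUMPTION: both cutoffs are HYPOTHETICAL, chosen for illustration.
NONE_BELOW = 0.005   # below 0.5% counts as "no trace"
TRACE_BELOW = 0.03   # 0.5% up to 3% counts as "trace"


def ancestry_bin(non_european: float) -> str:
    """Assign one individual's non-European fraction to a bin."""
    if non_european < NONE_BELOW:
        return "none"
    if non_european < TRACE_BELOW:
        return "trace"
    return "substantial"


def bin_shares(fractions: list[float]) -> dict[str, float]:
    """Share of individuals falling in each bin."""
    counts = Counter(ancestry_bin(f) for f in fractions)
    n = len(fractions)
    return {label: counts[label] / n for label in ("none", "trace", "substantial")}


if __name__ == "__main__":
    toy_sample = [0.0, 0.001, 0.01, 0.012, 0.15]  # made-up estimates
    print(bin_shares(toy_sample))
```

The shares are sensitive to where the cutoffs fall, especially near the noise floor of model-based clustering, so comparisons across studies (Brazil versus the United States, say) are cleanest when the same thresholds are used.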

Third, the people who are “mixed” and black in Brazil are more European than you might expect. All the estimates of European ancestry I’ve seen for self-identified black Brazilians (a somewhat protean category due to social changes over the past few generations) indicate a higher European ancestry fraction than among African Americans (~20% median in the latter). Self-identified “mixed” Brazilians have more European ancestry than anything else.

The native category is interesting because most of these people have only a minor component of that ancestry. Additionally, a huge number of white, mixed, and black Brazilians have native ancestry. This is not surprising from previous work. Ancestry deconvolution indicates this is an old admixture, and mtDNA lineages are more native than Y chromosomes. There was a sex asymmetry in the early settlement, and native women married into the settler population. Both black and white Brazilians (and mixed) have lots of native ancestry.
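The mtDNA-versus-Y contrast is the classic signature of sex-biased admixture, and it also predicts how other parts of the genome should look. A minimal sketch, assuming a single founding admixture pulse followed by random mating, where f and m are the native fractions among female and male founders; the example values are hypothetical:

```python
# Expected ancestry in different parts of the genome after a single founding
# admixture pulse followed by random mating.
# ASSUMPTIONS: f, m = native fractions among female and male founders;
# example values are HYPOTHETICAL.


def autosomal(f: float, m: float) -> float:
    """Autosomes are inherited equally from mothers and fathers."""
    return (f + m) / 2.0


def x_chromosome(f: float, m: float) -> float:
    """Long-run expectation: females carry two of every three X chromosomes,
    so female founders count twice."""
    return (2.0 * f + m) / 3.0


def uniparental(f: float, m: float) -> dict[str, float]:
    """mtDNA follows the female line and the Y the male line, unchanged."""
    return {"mtDNA": f, "Y": m}


if __name__ == "__main__":
    f, m = 0.60, 0.05  # hypothetical: native women marrying into settler families
    print(f"autosomal {autosomal(f, m):.3f}, X {x_chromosome(f, m):.3f}, {uniparental(f, m)}")
```

Under this model the native signal is strongest in mtDNA, intermediate on the X, lower on the autosomes, and weakest on the Y, which is the ordering a female-biased native contribution predicts.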

Finally, though there is some overlap between these groups (despite their average differences), I assume that the overlap in genomic ancestry is much greater in contemporary cohorts. Once we get temporal transects in Brazil, it will be interesting to see how assortative mating does, or doesn’t, work.

Looking forward to more of this from Latin America. So many opportunities for admixture mapping!

The genomic landscape of Brazil in 1950

Filed under: Admixture,Human Population Genetics,Human Variation,race — Razib Khan @ 12:12 am


A new whole-genome analysis out of Brazil has some interesting ancestry information. The preprint, Whole-genome sequencing of 1,171 elderly admixed individuals from the largest Latin American metropolis (São Paulo, Brazil):

As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases….

Admixed populations are useful for a lot of reasons. But let’s observe some things about his Brazilian population.

First, it’s old. The average age is 72, so these are people born in 1950. This is the genetic characteristics of Brazil in 1950 in many ways, not today. This is why you see so many individuals who self-identify as Asian who are nearly 100% Asian. These individuals are the children of Japanese immigrants. In 1950 the endogamy of the community was high. Today the youngest generation of Japanese Brazilians is 60% mixed.

Second, most of the ancestry of self-identified Brazilian whites in this sample is mostly white. Like the Japanese, a large number of these individuals are probably the children of European immigrants. I suspect this accounts for many of the 20% of the “white” sample that has no trace non-European ancestry. But observe that around another 20% has trace proportions (~1%) of non-European ancestry, mostly African.  My supposition, in this case, is that these are “old stock” white Brazilians. That this, one or both of their parents descend from Portuguese Brazilians who settled in overwhelmingly European areas and retain some non-European admixture due to long-term residence in Brazil. The remainder is white Brazilians who have substantial non-European ancestry, with a small minority whose proportions are quite high from a North American perspective.

A point of comparison is probably useful. About 95% of non-Hispanic whites in the United States seem to have almost no detectable non-European ancestry using this sort of model-based clustering. This illustrates the massive demographic difference between the USA and Latin American nations. The vast majority of white Latin Americans look quite Iberian, but the majority also have far more non-European ancestry than 95% of North American whites. This is partly a reflection of the smaller population sizes of native peoples in North America, and, the nature of hypodescent for people of any African ancestry in the United States, so that mixed individuals were integrated into African Americans.

Third, the people who are “mixed” and black in Brazil are more European than you might expect. All the estimates I’ve seen for self-identified black Brazilians (a somewhat protean category due to social changes over the past few generations) indicate a higher European ancestry fraction than among African Americans (~20% median in the latter). Self-identified “mixed” Brazilians have more European ancestry than any other component.

The native category is interesting because most of these people have only a minor component of native ancestry. Additionally, a huge number of white, mixed, and black Brazilians have native ancestry. This is not surprising given previous work. Ancestry deconvolution indicates that this is an old admixture, and mtDNA lineages are more native than Y chromosomes: there was a sex asymmetry in the early settlement, and native women married into the settler population.
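The logic of reading sex bias off mtDNA versus the Y chromosome can be made concrete. Below is a minimal sketch, assuming a single admixture pulse in which a fraction m_male of fathers and m_female of mothers come from the native population. The function name and the numbers are hypothetical illustrations, not estimates from this sample. The Y tracks fathers, mtDNA tracks mothers, autosomes average the two, and the X, two-thirds of whose copies sit in females, is weighted toward the maternal side.

```python
def expected_native_ancestry(m_male: float, m_female: float) -> dict[str, float]:
    """Expected native ancestry per marker type after a single admixture pulse.

    m_male, m_female: hypothetical fractions of fathers and mothers drawn
    from the native source population.
    """
    return {
        "Y": m_male,                           # passed father to son only
        "mtDNA": m_female,                     # passed through mothers only
        "autosomes": (m_male + m_female) / 2,  # half from each parent
        "X": (m_male + 2 * m_female) / 3,      # two of every three X copies are in females
    }

# Hypothetical settlement: few native fathers, many native mothers.
ancestry = expected_native_ancestry(0.05, 0.5)
print(ancestry)
```

With strongly female-biased native input, the ordering mtDNA > X > autosomes > Y is the qualitative signature the text describes.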

Finally, though there is some overlap between these groups (despite their average differences), I assume that the overlap in genomic ancestry is much greater in contemporary cohorts. Once we get temporal transects in Brazil, it will be interesting to see how assortative mating does, or doesn’t, work.

Looking forward to more of this from Latin America. So many opportunities for admixture mapping!

July 20, 2020

Solute carrier family genes are important…but how?

Filed under: Human Genetics,Human Population Genetics — Razib Khan @ 10:57 pm

Over the last ten years David Reich and other researchers have been constructing what is basically an atlas of human demographic history. Taking the genealogies written in our DNA, mapping them onto population bifurcations and admixtures, and synthesizing that back together with what we know from history and archaeology.

To a great extent, this is a project of human phylogenomics: taking genome-wide data and constructing phylogenies out of it (or, perhaps more precisely, graphs, as this is mostly on an intra-species time scale and characterized by lots of gene flow across the “tips” of the tree). But there’s another thing you can do with modern human genomics and evolution: look at patterns of selection within the genome.

The Reich group has already started doing this. For example, they have adduced evidence that the CCR5 delta 32 mutation seems to have emerged out of the Yamnaya horizon.

Last fall, a paper came out in MBE, Ancestry-Specific Analyses Reveal Differential Demographic Histories and Opposite Selective Pressures in Modern South Asian Populations. I gave it a cursory read at the time, but I’ve since looked at it more closely. It takes a “natural experiment,” the emergence of Indian subcontinental populations from a massive admixture between lineages which diverged 40,000 years ago, and looks for genomic regions whose ancestry deviates from what you would expect given the genome-wide average.

The method is simple: imagine that “Ancestral North Indians” are fixed for an allele at a gene in one state and “Ancestral South Indians” are fixed in the other state. Indian populations are about 50:50 (with a range). If the frequency today in Indian populations is 95% for the allele that is from the “Ancestral North Indians”, one might be suspicious as to what’s going on. Or, vice versa.
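The thought experiment above can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes the two ancestral allele frequencies and the genome-wide ANI fraction are known exactly and treats sampled chromosomes as binomial draws. The function name and numbers are hypothetical.

```python
import math

def admixture_imbalance_z(p_ani: float, p_asi: float, alpha_ani: float,
                          observed_freq: float, n_chromosomes: int) -> float:
    """z-score for the departure of an observed allele frequency from the
    frequency expected if the locus simply tracks genome-wide ancestry.

    p_ani, p_asi: allele frequency in each ancestral population (assumed known)
    alpha_ani: genome-wide fraction of ANI ancestry in the admixed population
    observed_freq: allele frequency in today's sample
    n_chromosomes: number of sampled chromosomes (2 x individuals)
    """
    expected = alpha_ani * p_ani + (1 - alpha_ani) * p_asi
    if not 0 < expected < 1:
        raise ValueError("expected frequency must lie strictly between 0 and 1")
    # Binomial sampling error only; drift since admixture is ignored here.
    se = math.sqrt(expected * (1 - expected) / n_chromosomes)
    return (observed_freq - expected) / se

# ANI fixed for the allele, ASI lacking it, a 50:50 population,
# and 95% observed in a hypothetical sample of 200 chromosomes.
z = admixture_imbalance_z(1.0, 0.0, 0.5, 0.95, 200)
print(round(z, 1))  # 12.7
```

Drift since the admixture event adds variance well beyond binomial sampling, so one population's z-score overstates the evidence. That is why an imbalance that recurs across most of the sampled groups is more persuasive than any single population.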

In the paper, they used whole genomes to reconstruct the ancestral steppe/Iranian population without any residual “Ancient Ancestral South Indian” (AASI) ancestry, the AASI being the component with no West Eurasian ancestry. They did the same for the AASI. These reconstructions are always dicey, but they made a good-faith effort to check their work, and on the whole that section was impressive. The authors’ results seem to be roughly aligned with those of Narasimhan et al. 2019. First, the AASI seems to be homogeneous, except when the reconstruction used Munda or Burusho donors, both groups with deep East Asian admixture (illustrating the problem with deconvolution). Second, they show that the AASI do not cluster with the Andamanese, which makes sense since these groups diverged closer to 40,000 years ago. Finally, the steppe/Iranian group looks most like middle-to-late Bronze Age Armenians: a synthesis of steppe and some Iranian-like ancestry.
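Ancestry-specific reconstructions generally start from local ancestry calls along each haplotype. Below is a toy sketch of the core step, not the authors' pipeline. It assumes every haplotype has already been painted with an ancestry label (here "N" and "S") at every site, and the inputs are hypothetical. Each ancestral allele frequency is estimated only from the segments assigned to that ancestry.

```python
from collections import defaultdict

def ancestry_specific_freqs(alleles, labels):
    """Per-site allele frequency restricted to segments of each ancestry.

    alleles: list of haplotypes, each a list of 0/1 alleles per site
    labels:  matching list of haplotypes, each a list of ancestry labels per site
    Returns {ancestry: [frequency, or None if no segment of that ancestry, per site]}.
    """
    n_sites = len(alleles[0])
    counts = defaultdict(lambda: [0] * n_sites)  # allele-1 counts per ancestry
    totals = defaultdict(lambda: [0] * n_sites)  # haplotypes of that ancestry
    for hap, lab in zip(alleles, labels):
        for site, (allele, anc) in enumerate(zip(hap, lab)):
            counts[anc][site] += allele
            totals[anc][site] += 1
    return {
        anc: [c / t if t else None for c, t in zip(counts[anc], totals[anc])]
        for anc in totals
    }

haps = [[1, 0], [1, 1], [0, 0], [0, 1]]
anc = [["N", "N"], ["N", "S"], ["S", "S"], ["S", "N"]]
print(ancestry_specific_freqs(haps, anc))  # {'N': [1.0, 0.5], 'S': [0.0, 0.5]}
```

Errors in the local ancestry calls leak straight into these frequencies, which is one reason such reconstructions are dicey when donor groups carry extra ancestry of their own.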

But this isn’t the most interesting part of the paper. It’s the selection. Here are the top, top, candidates:

| Component | # of pops with significant value | Genes (±50-kb region) |
|---|---|---|
| ANI | 22 (percentile = 99.9949) | THUMPD3, SETD5 |
| ANI | 21 (percentile = 99.9814) | SNAP91, RIPPLY2, CYB5R4, MRAP2, CEP162, TBX18 |
| ANI | 21 (percentile = 99.9814) | TRIM31, TRIM40, TRIM10, TRIM15, TRIM26, HLA-L |
| ANI | 19 (percentile = 99.9383) | Intergenic |
| ANI | 18 (percentile = 99.9195) | ZNF681, ZNF726, ZNF254 |
| ASI | −21 (percentile = 0.0057) | RXFP3, SLC45A2, AMACR, C1QTNF3, ADAMTS12 |
| ASI | −16 (percentile = 0.038) | SRXN1, SCRT2, SLC52A3 |
| ASI | −16 (percentile = 0.038) | Intergenic |
| ASI | −15 (percentile = 0.0757) | Intergenic |
| ASI | −14 (percentile = 0.1268) | ATP6V1H, RGS20, TCEA1, LYPLA1, MRPL15 |

 

I’ll quote the authors at length from the “Discussion”:

We also show that the interaction between alleles that were highly polarized between the two ancestry sources that admixed in South Asia caused patterns of admixture imbalance across the majority of sampled groups, hence unlikely explainable by population specific random drift, and perhaps due to positive or negative environmental pressures. Interestingly, we report how loci that include genes involved with diabetes (SETD5), diet (ZNF) and the immune response (HLA) show West Eurasian (N) haplotypes to be significantly more represented compared with the South Asian (S) counterparts. This might be a stark contrast to what is expected, given the long-term history of local adaptation of S haplotypes in local environment. We speculate that the diet-related signal may be linked with post-Neolithic dietary shifts that might have followed the arrival of the West Eurasian component in the area, whereas the overrepresentation of West Eurasian HLA haplotypes might have some similarity, although at a different time scale, with what has happened in Native American populations after recent colonization likely caused by European borne epidemic (Lindo et al. 2016).

On the other hand, the top region for significant enrichment of South Asian ancestry includes the rs16891982-G allele of SLC45A2 gene (associated with light skin pigmentation in West Eurasians), suggesting purifying selection at this locus following admixture…the overall abundance of these West Eurasian alleles is drastically reduced in 21 out of 25 South Asian populations analyzed here…Such a strong negative pressure against a light pigmentation allele may be explained by the high ultraviolet (UV) radiation at South Asian latitudes and this result seems to be further corroborated by similar N ancestry deficiencies in TYRP1 and BNC2 genes for as many as 11 South Asian populations (supplementary table 4, Supplementary Material online). However, purifying selection against maladaptive light pigmentation alleles in high UV environment is not observed for all pigmentation alleles; in fact, the rs1426654-A allele of the SLC24A5 gene…shows instead an increase of frequency in South Asian…Taken together, our results point to opposite pressures on some West Eurasian alleles involved in skin and eye pigmentation. On one hand, SLC45A2 seem to have undergone some selective pressure that removed most of West Eurasian alleles that arrived in the area after the admixture event. Conversely, the SLC24A5 (rs1426654-A) West Eurasian allele seems to have escaped such a negative pressure perhaps thanks to its apparent neutral role with respect to susceptibility to skin carcinoma caused by UV radiation…

As I said, in the phylogenomic analysis above the authors suggest that the AASI population was homogeneous. I think this suggests that a single ancestral population was absorbed into expanding Iranian-related farmers in NW South Asia. The prevalence of deeply diverged haplogroup M on the mtDNA in subcontinental peoples points to female-mediated admixture. The positive selection for various “lifestyle” alleles indicates to me that expanding Iranian-related farmers absorbed AASI tribes, in particular the women, and assimilated them to the new lifestyle.

The results from pigmentation are surprising, but not shocking. Knowing what I know about the ancestral frequency distribution of the various alleles, it was clear that the derived fraction of SLC24A5 was enriched. Many of the other alleles responsible for pigmentation variation in Europeans either looked selected against, or else the ancestral Indo-Aryans et al. were not quite like modern Europeans. These data point to in situ selection.

But why selection for some pigmentation alleles and not others? First, I don’t think cancer is a major selective pressure. That happens late in life. Rather, I think SLC24A5 in the derived variant does something that has nothing to do with pigmentation. It was positively selected among the Khoisan people of Southern Africa and looks to have been selected in Ethiopia as well after the admixture event. In Europe itself its frequency is so high that there has clearly been lots of positive selection since the “great admixture.”

As far as the other alleles, perhaps it is pigmentation. But perhaps it is something else?

Round and round we’ve been going with these genome-wide studies, but in the 2020s I think biologists who know the molecular pathways in a way that plumbs the depths of pleiotropy need to get involved.

May 19, 2020

Correlated response is a big story of selection

Filed under: Human Population Genetics — Razib Khan @ 10:55 pm

Adaptation is clearly one of the most important processes in understanding how evolution occurs. In a classical sense, it’s easy to understand. Convergent adaptations in body plan give dolphins and swordfish the same streamlined shape. It’s physics.

But with the emergence of DNA sequencing, much of the focus in the study of adaptation has shifted to signatures of natural selection at the molecular level. Phenotypes are controlled by variation in genotypes, and instead of describing and hypothesizing, researchers can actually infer the history and arc of adaptation from genetic patterns.

At least that’s the theory.

The initial tests for signatures of natural selection focused on adaptation between species. For example, the McDonald-Kreitman test usually took the form of comparing polymorphism within, and divergence between, two lineages of Drosophila. In the 2000s, with genome-wide data, new methods predicated on looking at ‘haplotype structure’ (variation across sequences of genes) emerged. Instead of looking between species, these methods focused on selection within species (e.g., why are some humans adapted to malaria?). These methods were good at picking up strong signals at a few genes where the selective sweeps were recent.
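Haplotype-based scans (EHH, and statistics built on it such as iHS) ask how far haplotypes carrying a core allele remain identical as you move away from it. A long shared haplotype at high frequency is the footprint of a recent sweep. Below is a minimal sketch with a hypothetical function name and toy data. It assumes phased haplotypes coded as "0"/"1" strings and measures distance in SNPs rather than on a genetic map.

```python
from collections import Counter
from math import comb

def ehh(haplotypes: list[str], core: int, allele: str, distance: int) -> float:
    """Extended haplotype homozygosity for one core allele.

    haplotypes: phased haplotypes as strings of '0'/'1', one char per site
    core: index of the core SNP
    allele: core allele to condition on ('0' or '1')
    distance: number of sites to extend to the right of the core
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carriers of the core allele")
    # Group carriers by their sequence from the core out to core + distance.
    segments = Counter(h[core:core + distance + 1] for h in carriers)
    # Probability that two random carriers are identical over the whole segment.
    return sum(comb(k, 2) for k in segments.values()) / comb(n, 2)

haps = ["10000", "10000", "10000", "10011", "01010", "00101"]
for d in range(5):
    # EHH for allele '1' stays at 1.0 out to d = 2, then drops to 0.5.
    print(d, ehh(haps, core=0, allele="1", distance=d))
```

Tools such as the R package rehh integrate this decay over genetic distance to build iHS. The sketch only shows the core quantity.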

But as datasets and genomics got bigger and better, researchers focused on more fundamental patterns and analyses, such as ‘site frequency spectra.’ Ultimately the goal was to go beyond selection at a single locus (e.g., lactase persistence) and understand polygenic characteristics (e.g., height). Obviously, this is much harder because polygenic characters are distributed across many genetic loci, so issues of statistical power are always going to loom large (and there is the soft vs. hard sweep issue too!).
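Tajima's D is a classic single-number summary of the site frequency spectrum. It contrasts two estimates of genetic diversity: average pairwise differences (π) and Watterson's estimate from the number of segregating sites. An excess of rare variants pushes D negative, which is consistent with a recent sweep, though population growth does the same. A minimal sketch, assuming you have the derived-allele count at each segregating site in a sample of n sequences:

```python
import math

def tajimas_d(derived_counts: list[int], n: int) -> float:
    """Tajima's D from derived-allele counts at segregating sites.

    derived_counts: derived-allele count (1..n-1) at each segregating site
    n: number of sampled sequences
    """
    S = len(derived_counts)
    if S == 0 or n < 4:
        raise ValueError("need segregating sites and at least 4 sequences")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    # Average pairwise differences: each site contributes k(n - k) / C(n, 2).
    pi = sum(k * (n - k) for k in derived_counts) / (n * (n - 1) / 2)
    theta_w = S / a1  # Watterson's estimator from segregating sites
    return (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))

# Ten segregating sites in ten sequences.
print(tajimas_d([1] * 10, 10))  # all singletons: negative D
print(tajimas_d([5] * 10, 10))  # all intermediate frequency: positive D
```

D summarizes one window at a time. Polygenic methods like the one below have to combine weak signals spread across many loci instead.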

A new preprint is an excellent introduction to this wild world, Disentangling selection on genetically correlated polygenic traits using whole-genome genealogies:

We present a full-likelihood method to estimate and quantify polygenic adaptation from contemporary DNA sequence data. The method combines population genetic DNA sequence data and GWAS summary statistics from up to thousands of nucleotide sites in a joint likelihood function to estimate the strength of transient directional selection acting on a polygenic trait. Through population genetic simulations of polygenic trait architectures and GWAS, we show that the method substantially improves power over current methods. We examine the robustness of the method under uncorrected GWAS stratification, uncertainty and ascertainment bias in the GWAS estimates of SNP effects, uncertainty in the identification of causal SNPs, allelic heterogeneity, negative selection, and low GWAS sample size. The method can quantify selection acting on correlated traits, fully controlling for pleiotropy even among traits with strong genetic correlation (|rg| = 80%; c.f. schizophrenia and bipolar disorder) while retaining high power to attribute selection to the causal trait. We apply the method to study 56 human polygenic traits for signs of recent adaptation. We find signals of directional selection on pigmentation (tanning, sunburn, hair, P=5.5e-15, 1.1e-11, 2.2e-6, respectively), life history traits (age at first birth, EduYears, P=2.5e-4, 2.6e-4, respectively), glycated hemoglobin (HbA1c, P=1.2e-3), bone mineral density (P=1.1e-3), and neuroticism (P=5.5e-3). We also conduct joint testing of 137 pairs of genetically correlated traits. We find evidence of widespread correlated response acting on these traits (2.6-fold enrichment over the null expectation, P=1.5e-7). We find that for several traits previously reported as adaptive, such as educational attainment and hair color, a significant proportion of the signal of selection on these traits can be attributed to correlated response, vs direct selection (P=2.9e-6, 1.7e-4, respectively). 
Lastly, our joint test uncovers antagonistic selection that has acted to increase type 2 diabetes (T2D) risk and decrease HbA1c (P=1.5e-5).

There’s a lot going on here. This is my favorite passage:

To address these issues, we recently developed a full-likelihood method, CLUES, to test for selection and estimate allele frequency trajectories.[21] The method works by stochastically integrating over both the latent ARG using Markov Chain Monte Carlo, and the latent allele frequency trajectory using a dynamic programming algorithm, and then using importance sampling to estimate the likelihood function of a focal SNP’s selection coefficient, correcting for biases in the ARG due to sampling under a neutral model.

Alrighty then! Someone’s a major-league nerd.

The preprint is fine, but ultimately this is something you get a “feel” for by working with models, data, and general analyses in the field. And I don’t have a strong feel since I don’t work with these sorts of data and questions myself. So what do I know? That being said, I like the preprint because it satisfies an intuition I’ve long had: correlated response is a big part of the story of polygenic selection.

Basically, you have to remember that complex traits are subject to variation at a host of genetic positions. And genetic variants rarely have singular effects: one locus usually exhibits pleiotropy, shaping many characteristics. Therefore, if there is strong selection on a gene, more traits than simply the target of selection will be impacted. In animal breeding, selecting for huge, meaty, fast-growing lineages can render them infertile if it is taken too far. That’s a bad correlated response.
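The bookkeeping behind correlated response is the multivariate breeder's (Lande) equation, Δz̄ = Gβ. Here G is the additive genetic variance-covariance matrix and β holds the selection gradients. This is classical quantitative genetics, not the preprint's likelihood method. The sketch uses made-up numbers for two traits with a genetic correlation of 0.8, echoing the schizophrenia/bipolar example in the abstract. It shows that selection on trait 1 alone moves trait 2, and that knowing G lets you attribute the response back to its direct target.

```python
def correlated_response(G, beta):
    """Multivariate breeder's equation: response = G @ beta."""
    return [sum(g * b for g, b in zip(row, beta)) for row in G]

def infer_gradients_2x2(G, response):
    """Invert the 2x2 case: recover selection gradients from observed responses."""
    (g11, g12), (g21, g22) = G
    det = g11 * g22 - g12 * g21
    r1, r2 = response
    return [(g22 * r1 - g12 * r2) / det, (g11 * r2 - g21 * r1) / det]

G = [[1.0, 0.8],
     [0.8, 1.0]]   # hypothetical: unit genetic variances, genetic correlation 0.8
beta = [0.1, 0.0]  # direct selection on trait 1 only
response = correlated_response(G, beta)
print(response)                          # trait 2 shifts too: a correlated response
print(infer_gradients_2x2(G, response))  # attribution recovers selection on trait 1
```

Looking at trait 2 alone, you would conclude it had been selected. Only by modeling the pair jointly does the signal resolve to trait 1.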

After correcting for the genetic correlation, the authors note that much of the apparent selection on some traits, such as EDU and hair color, is not really direct at all. This is reminiscent of EDAR: we know it is associated with hair thickness and is a strong target of selection, but we have no idea which trait was actually the target. But it’s a pretty big deal. All these quantitative traits controlled by variation across the genome are being reshaped by adaptation on other traits. What are those traits? This preprint doesn’t really answer that.

Hopefully, we’ll make some headway in the 2020s, because right now we’re definitely seeing through a glass, darkly.

May 17, 2020

Knanaya & Kerala: perhaps there is something different down south?

Filed under: Human Population Genetics — Razib Khan @ 2:03 am


Over the past few months I have been getting together some samples from people from Kerala, with a focus on Knanaya Christians. The Knanaya are a subset of the broader St. Thomas Christian community, and two things have jumped out in my analyses:

– they are quite endogamous

– they are shifted off the ‘India-cline’

More precisely, like Cochin and Mumbai Jews, they are often shifted toward Middle Eastern populations. This is relevant because the Knanaya believe themselves, like most St. Thomas Christians, to be descended in part from Jews or Christians from the Middle East.

All that being said, looking more deeply into the data I’m not quite as sure. One of the reasons is that Kerala may not be as “structured” as other parts of India. Some of this is well known. The Nair samples I have are shifted toward South Indian Brahmins, which is plausible in light of connections between Nairs and Brahmins. The Brahmin-adjacent Ambalavasi seem quite similar to Brahmins. These are not surprising. But the Kerala samples I have, taken as a whole, seem notably shifted along the India cline toward the “north,” more than I would have expected. This could be due to gene flow from outside and within Kerala, in a way that is not typical of other parts of the subcontinent.

I say this because even the Ezhava, who were basically what we’d call a Dalit community (no longer today), show a shift.

Instead of talking, let me post some admixture plots (unsupervised):

Now, supervised:

Now TreeMix:

Here is an admixturegraph (using the Narasimhan et al. right-populations):

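| Test_Pop | Steppe | AHG | IndusValley |
|---|---|---|---|
| Bengali | 0.152 | 0.413 | 0.435 |
| Cochin_Jews | 0.212 | 0.188 | 0.6 |
| K_Bunt | 0.197 | 0.307 | 0.496 |
| K_Ezhava | 0.147 | 0.281 | 0.572 |
| K_Iyer | 0.271 | 0.164 | 0.565 |
| K_Knanaya | 0.149 | 0.186 | 0.665 |
| K_Mapilla | 0.141 | 0.293 | 0.565 |
| K_Nair | 0.172 | 0.244 | 0.584 |
| K_Nambudiri | 0.248 | 0.183 | 0.569 |
| K_Nasrani | 0.134 | 0.27 | 0.596 |
| K_Poduval | 0.2 | 0.276 | 0.524 |
| K_Vaniya | 0.215 | 0.139 | 0.646 |
| K_Varma | 0.213 | 0.116 | 0.67 |
| Brahui | 0.236 | -0.181 | 0.945 |
| Mumbai_Jews | 0.271 | -0.134 | 0.863 |
| Patel | 0.167 | 0.27 | 0.562 |
| Pulliyar | 0.05 | 0.575 | 0.375 |
| TamilBrahmin | 0.191 | 0.262 | 0.547 |
| UP_Brahmin | 0.293 | 0.223 | 0.484 |
| Velamas | 0.101 | 0.298 | 0.601 |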

I ran f3 statistics. Here are the results, filtered to keep only those with z-scores of -2 or lower.
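For reference, f3(C; A, B) is the average over SNPs of (c − a)(c − b), where a, b, and c are allele frequencies in two source populations and a target. If C is a mixture of populations related to A and B, its frequencies tend to fall between theirs, which pushes the product negative. A significantly negative z-score is therefore evidence of admixture. The sketch below assumes per-SNP frequencies are already in hand. It omits the finite-sample correction for C that ADMIXTOOLS applies and uses a simple delete-one-block jackknife for the standard error. The function name, block size, and toy data are illustrative.

```python
import math
import random

def f3_stat(a, b, c, block_size=500):
    """f3(C; A, B) = mean over SNPs of (c - a) * (c - b), with a
    delete-one-block jackknife standard error and z-score.

    a, b, c: per-SNP allele frequencies for sources A, B and target C,
             in genome order so that contiguous SNPs form blocks.
    block_size: SNPs per block (real analyses use blocks of several cM).
    """
    per_snp = [(ci - ai) * (ci - bi) for ai, bi, ci in zip(a, b, c)]
    n = len(per_snp)
    blocks = [per_snp[i:i + block_size] for i in range(0, n, block_size)]
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two jackknife blocks")
    total = sum(per_snp)
    estimate = total / n
    # Recompute the mean with each block left out in turn.
    leave_one_out = [(total - sum(blk)) / (n - len(blk)) for blk in blocks]
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((x - mean_loo) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    return estimate, se, estimate / se

# Toy data: C is an exact 50:50 mix of A and B at every SNP.
rng = random.Random(42)
a = [rng.random() for _ in range(5000)]
b = [rng.random() for _ in range(5000)]
c = [0.5 * x + 0.5 * y for x, y in zip(a, b)]
est, se, z = f3_stat(a, b, c)
print(f"f3 = {est:.4f}, SE = {se:.5f}, z = {z:.1f}")
```

A non-negative f3 does not rule out admixture. Drift in C after mixing can mask the signal, so a filtered list like this catches only the clearer cases.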

Thoughts?

April 4, 2020

Hard sweeps and natural selection obscured by Bronze Age admixture

Filed under: Human Population Genetics,Selection — Razib Khan @ 10:48 pm

The above is the map from the Online Ancient Genome Repository. You can see the variation by region. There’s a lot of ancient DNA in Europe. Very little in Asia. And only moderate amounts elsewhere.

The map is from a new preprint, Ancient human genomes reveal a hidden history of strong selection in Eurasia:

The role of selection in shaping genetic diversity in natural populations is an area of intense interest in modern biology, especially the characterization of adaptive loci. Within humans, the rapid increase in genomic information has produced surprisingly few well-defined adaptive loci, promoting the view that recent human adaptation involved numerous loci with small fitness benefits. To examine this we searched for signatures of hard sweeps – the selective fixation of a new or initially rare beneficial variant – in 1,162 ancient western Eurasian genomes and identified 57 sweeps with high confidence. This unexpectedly extensive signal was concentrated on proteins acting at the cell surface, and potential selection pressures include cold adaptation in early Eurasian populations, and oxidative stress from carbohydrate-rich diets in farming populations. Critically, these sweep signals have been obscured in modern European genomes by subsequent population admixture, especially during the Bronze Age (5-3kya) and empires of classical antiquity.

So the “big thing” that they found here is that admixture obscures signals of selection. More precisely, it obscures signals of hard selective sweeps, the classical variant where a single position in a single haplotype rises up in frequency rapidly due to positive selection.

If you read further into the paper, you note that they believe admixture, by mixing genetic backgrounds, attenuates the signal of hard sweeps, and may even make hard sweeps look like soft sweeps. I honestly didn’t follow that too closely, but I guess it depends on the selection coefficient and the rate of mixing. They are reporting lots of selection events with coefficients >1%, and I wonder how credible this is (Haldane’s dilemma?).
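The deterministic genic-selection recursion p' = p(1 + s) / (1 + ps) gives a rough sense of the time scale for selection coefficients around 1%, and of how a later admixture pulse dilutes a swept allele. This is a simplification that ignores drift, dominance, and linkage, and the numbers are hypothetical. The sketch follows an allele from 1% to 99% at s = 0.01, then applies a 50% pulse from a population lacking it.

```python
def generations_to_reach(p0: float, target: float, s: float) -> int:
    """Generations for an allele to climb from p0 to target under the
    deterministic genic-selection recursion p' = p(1 + s) / (1 + p s)."""
    if s <= 0 or not 0 < p0 < target < 1:
        raise ValueError("need s > 0 and 0 < p0 < target < 1")
    p, t = p0, 0
    while p < target:
        p = p * (1 + s) / (1 + p * s)
        t += 1
    return t

def admix(p_local: float, p_incoming: float, m: float) -> float:
    """Allele frequency after a pulse replacing a fraction m of the gene pool."""
    return (1 - m) * p_local + m * p_incoming

t = generations_to_reach(0.01, 0.99, 0.01)  # roughly 920-odd generations
after = admix(0.9, 0.0, 0.5)                # a swept allele at 90% halved by a 50% pulse
print(t, after)
```

Under this recursion the odds p/(1 − p) grow by exactly (1 + s) each generation, so going from 1% to 99% takes about ln(9801)/ln(1.01) ≈ 924 generations. At 25 to 30 years per generation, that spans much of the Holocene.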

That being said, the functional significance of these selection events is important. Basically, they look like adaptations to climate and changes in diet. What the authors seem to be suggesting here is that the shift in lifestyle and expansion of farmers in the early Holocene was a pretty big deal, and that the mixing between various divergent streams during the Bronze Age muddled the signals.

If the authors are right, that means that ancient DNA is going to be very big for understanding the trajectory of selection, because it’s not just going to be subtle polygenic changes.

March 17, 2020

Blood group A at greater risk from COVID-19 (maybe)

Filed under: coronavirus,Human Population Genetics — Razib Khan @ 10:26 am

To a great extent, much of the population genetics of humans in the 20th century that doesn’t involve external traits is the population genetics of blood groups: A, B, and O, along with Rhesus factor. Read L. L. Cavalli-Sforza and Walter Bodmer’s The Genetics of Human Populations, the first edition of which was written in the 1960s. The emergence of more genetic markers, and of Y, mtDNA, and genome-wide analysis, has marginalized the exploration of population genetic variation at ABO. But it’s still useful. And it’s still functionally important (there’s a reason that A and B groups evolved!).

Many years ago while reading Alan Templeton’s Population Genetics and Microevolutionary Theory I stumbled upon the fact that spontaneous abortion (miscarriage) is associated with blood group differences between mother and fetus on the ABO blood groups. Basically, women who are O (and so genotype OO) have issues with fetuses that express A or B antigen. This isn’t deterministic, just a change in probabilities (I’m A, my wife is O, and our children are a mix, as my genotype is AO).
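The parenthetical is a simple Punnett square. A minimal sketch, assuming standard Mendelian inheritance at ABO with A and B codominant and O recessive (the function names are illustrative):

```python
from collections import Counter
from itertools import product

def abo_phenotype(genotype: str) -> str:
    """Map a two-letter ABO genotype (e.g. 'AO') to its blood group."""
    alleles = set(genotype)
    if alleles == {"O"}:
        return "O"
    if alleles == {"A", "B"}:
        return "AB"
    return "A" if "A" in alleles else "B"

def offspring_phenotypes(parent1: str, parent2: str) -> dict[str, float]:
    """Probability of each offspring blood group for two parental genotypes."""
    outcomes = Counter(abo_phenotype(x + y) for x, y in product(parent1, parent2))
    total = sum(outcomes.values())
    return {group: k / total for group, k in outcomes.items()}

print(offspring_phenotypes("AO", "OO"))  # {'A': 0.5, 'O': 0.5}
```

For an AO father and an OO mother, half of pregnancies involve a fetus expressing the A antigen that the mother lacks.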

ABO has also been associated with different risks to different diseases (e.g., it is well known that those who express blood group B are more at risk for Hepatitis B).

So with that, a new preprint, ABO blood group and susceptibility to severe acute respiratory syndrome:

…The ABO group in 3694 normal people in Wuhan showed a distribution of 32.16%, 24.90%, 9.10% and 33.84% for A, B, AB and O, respectively, versus the distribution of 37.75%, 26.42%, 10.03% and 25.80% for A, B, AB and O, respectively, in 1775 COVID-19 patients from Wuhan Jinyintan Hospital. The proportion of blood group A and O in COVID-19 patients were significantly higher and lower, respectively, than that in normal people (both P < 0.001). Similar ABO distribution pattern was observed in 398 patients from another two hospitals in Wuhan and Shenzhen. Meta-analyses on the pooled data showed that blood group A had a significantly higher risk for COVID-19 (odds ratio-OR, 1.20; 95% confidence interval-CI 1.02~1.43, P = 0.02) compared with non-A blood groups, whereas blood group O had a significantly lower risk for the infectious disease (OR, 0.67; 95% CI 0.60~0.75, P < 0.001) compared with non-O blood groups. In addition, the influence of age and gender on the ABO blood group distribution in patients with COVID-19 from two Wuhan hospitals (1,888 patients) were analyzed and found that age and gender do not have much effect on the distribution…

It looks like from their data that A individuals were:

1) more likely to get infected
2) more likely to have severe responses

The individual difference is modest. You aren’t invulnerable if you are O. But, this might impact the course and severity of COVID-19 as it runs through populations…
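The odds ratios in the abstract come from a meta-analysis across hospitals, but the Wuhan comparison alone shows where the signal comes from. The sketch below rebuilds approximate 2×2 counts from the quoted percentages and sample sizes and computes a crude odds ratio with a Woolf (log-scale) 95% confidence interval. Rounding to whole people is an assumption, and the function name is illustrative.

```python
import math

def odds_ratio_ci(case_frac, n_cases, control_frac, n_controls, z=1.96):
    """Crude odds ratio and Woolf 95% CI for carrying a blood group,
    with counts rebuilt (and rounded) from published percentages."""
    a = round(case_frac * n_cases)        # patients with the group
    b = n_cases - a                       # patients without it
    c = round(control_frac * n_controls)  # controls with the group
    d = n_controls - c                    # controls without it
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Wuhan figures quoted above: 1775 patients vs 3694 controls.
or_a, lo_a, hi_a = odds_ratio_ci(0.3775, 1775, 0.3216, 3694)  # A vs non-A
or_o, lo_o, hi_o = odds_ratio_ci(0.2580, 1775, 0.3384, 3694)  # O vs non-O
print(f"A: OR {or_a:.2f} ({lo_a:.2f}-{hi_a:.2f})")
print(f"O: OR {or_o:.2f} ({lo_o:.2f}-{hi_o:.2f})")
```

The Wuhan-only estimates (roughly 1.28 for A and 0.68 for O) differ somewhat from the pooled figures in the abstract. That is expected, since the meta-analysis combines several hospitals.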

Here is the table:

Here are blood group distributions:

February 14, 2020

The complex origins of our species in Africa

Filed under: Human Evolution,Human Population Genetics — Razib Khan @ 3:37 am

The figure to the right illustrates a model that is put forward in a new paper, Recovering signals of ghost archaic introgression in African populations. This was originally a preprint with the same title, so we’ve already discussed the implications extensively. Carl Zimmer has covered the story in The New York Times, while George Busby did so in The Conversation.

Broadly, the results are getting at something which plenty of people have been noticing for many years: when it comes to Sub-Saharan Africans, there is something deeply diverged in West Africans vis-a-vis non-West Africans. These results seem to suggest that the divergence between this outgroup lineage and our own is a bit earlier than the modern-Neanderthal/Denisovan split. There are many abstruse statistical inferences and simulations, and it looks like the reviewers made them do a lot of analyses. But the general result is something other groups have seen as well, so I believe it. Additionally, the admixture of this lineage into West Africans seems to have occurred about 50,000 years ago, suspiciously close to the general expansion of modern humans out of Africa (or the most recent expansion).

From the discussion:

The signals of introgression in the West African populations that we have analyzed raise questions regarding the identity of the archaic hominin and its interactions with the modern human populations in Africa. Analysis of the CSFS in the Luhya from Webuye, Kenya (LWK) also reveals signals of archaic introgression, although our interpretation is complicated by recent admixture in the LWK that involves populations related to western Africans and eastern African hunter-gatherers (section S8) (20). Non-African populations (Han Chinese in Beijing and Utah residents with northern and western European ancestry) also show analogous patterns in the CSFS, suggesting that a component of archaic ancestry was shared before the split of African and non-African populations. A detailed understanding of archaic introgression and its role in adapting to diverse environmental conditions will require analysis of genomes from extant and ancient genomes across the geographic range of Africa.
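The conditional site frequency spectrum (CSFS) mentioned here is simple to compute, even if the modeling built on it is not. You restrict to sites where a sequenced archaic genome carries the derived allele, then tabulate how many of n sampled modern chromosomes carry that derived allele. A minimal sketch, assuming derived-allele counts for the modern sample and a boolean mask for the archaic genome (both hypothetical inputs):

```python
def conditional_sfs(modern_derived_counts, archaic_derived, n):
    """Normalized CSFS: distribution of derived-allele counts (1..n-1) in a
    modern sample of n chromosomes, restricted to sites where the archaic
    genome carries the derived allele."""
    spectrum = [0] * (n + 1)
    for count, is_derived in zip(modern_derived_counts, archaic_derived):
        if is_derived:
            spectrum[count] += 1
    polymorphic = spectrum[1:n]  # drop monomorphic classes 0 and n
    total = sum(polymorphic)
    return [x / total for x in polymorphic] if total else polymorphic

counts = [1, 2, 2, 3, 1, 4]
mask = [True, False, True, True, False, True]
print(conditional_sfs(counts, mask, n=4))
```

As I understand it, the paper's inference comes from comparing observed spectra of this kind with those expected under demographic models with and without an archaic contribution.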

This work seems more a question than an answer.

January 26, 2020

Indian ancestry in maritime Southeast Asia

Filed under: Human Population Genetics — Razib Khan @ 2:38 pm

In the comments, people keep asking about Indonesia, and Java in particular. The reason is pretty simple: before wholesale conversion to Islam, maritime Southeast Asia was dominated at the elite level by Indic social and religious forms. I say “Indic” because, unlike in mainland Southeast Asia, Theravada Buddhism did not supplant other Indian religions. In fact, while the indigenous Buddhism that produced the 9th-century Borobudur temple complex went extinct, Hinduism persisted for quite a bit longer and survives to this day. Not only are there long-standing Hindu traditions in Bali, but far eastern Java remained a Hindu kingdom until 1770, and there remain Javanese Hindus (some of them recent converts).

As several mainland Southeast Asian groups seem to have Indian admixture, what is the evidence for Indonesia? (the Singapore genome data offers up some Malays, and though some show recent Indian admixture, all of them have some Indian admixture). Luckily, there is a paper and data, Complex Patterns of Admixture across the Indonesian Archipelago. It uses the GLOBETROTTER framework, so I decided to reanalyze the data in a simpler manner, adding the Cambodians as a check (since from my previous posts you know a fair amount about that as a baseline).

Three points.

1) Definitely gene flow. But on the whole less than mainland Southeast Asia?

2) Lots of heterogeneity. Not surprising. The Sumatra samples seem to be taken from Aceh. This may matter a great deal.

3) In mainland Southeast Asia east of Burma there hasn’t been much colonial migration of Indians, nor a great deal of trade. The opportunities for contact with outsiders within maritime Southeast Asia are far greater. Inspection of the results from Malaysia indicates continuous gene flow over a long period of time. In contrast, the results from Thailand and Cambodia indicate an early pulse.

January 25, 2020

The Indian admixture into Southeast Asia is not just a function of distance

Filed under: Human Population Genetics,Southeast Asia — Razib Khan @ 10:41 pm

In the comments to the post below about Indian ancestry in Thailand, some observed that this should not be surprising due to reciprocal gene flow and proximity. Implicitly, I think what is being suggested here is that there is isolation by distance and continuous gene flow. Obviously some of this is true, but there are details here which suggest that it is not simply geography at work.

The reason I was curious about the Dusun people in coastal Borneo is that while Malays all seem to have Indian ancestry, many tribal Austronesian groups in maritime Southeast Asia do not. The Indian admixture into the Malays is not just recent. Some of it seems quite a bit older than the colonial period.

In the context of Southeast Asia, it seems that some of the more ancient Austro-Asiatic people, in particular, the Mon and Khmer, have Indian ancestry, and groups which mixed with Austro-Asiatic substrates, such as Burmans and Thai, also have this.

Additionally, some groups in the northeastern states of India have less “Indian” admixture than the Thai and Khmer. To show this, see this PCA:

The Garo and Naga live in India (some Garo are in Bangladesh). The “East Indian” samples seem to be mostly Mizo. Of course, some of these groups are intrusive to the northeast. But still. Here are admixture and TreeMix:

The issue in Southeast Asia is that ethnolinguistic groups are the product of several syntheses and migrations. Most of what is today “Thailand” was the domain of Mon-Khmer people in 500 AD. Most of the ancestry seems to date to that period, though there was an overlay from Tai people to the north. In Burma the population is a synthesis of Burman elements with connections to northern East Asians, and Austro-Asiatic people such as the Mon in the south. Additionally, a later movement of Tai people also occurred in Burma (the upland Shan). In Vietnam, the Kinh moved south and seem to have replaced the indigenous Chams and Khmer (there is very little Indian-like ancestry in any Vietnamese samples).

When looking at the map the plausible route of gene flow is clearly from the northeastern part of the subcontinent overland. But several people have pointed out to me that this is very difficult terrain. Recently, I have been convinced that a maritime intrusion of Munda languages into Odisha is plausible. One of the potential points of departure for the proto-Munda is the Tanintharyi region of today’s far southern Burma, adjacent to southern Thailand. I propose that the Tanintharyi region served as a cultural and demographic valve, initially mediating overseas expansion by a group of Southeast Asian rice farmers, who eventually established connections across the Bay of Bengal between South and Southeast Asia.

The absorption of lowland Munda domains in Odisha by Indo-Aryan speakers did not disrupt the flow of goods and people in the preestablished trade network. Rather, these preexisting routes were co-opted by the Kalinga state, and later on by various southern Indian polities facing east on the Bay of Bengal. Inspection of the Y and mtDNA haplogroup profile suggests these migrants were elite Indian males, with few females. This is very different from a folk expansion through Arakan, which would involve both men and women.

Thais may have more Indian ancestry than Cambodians and less than Burmese

Filed under: Human Population Genetics,Thai — Razib Khan @ 12:05 am

There were some questions about the Indian ancestry of the Thai. The dataset released by the Reich lab includes some Thai samples. I pulled those data, along with some other Southeast Asian groups, Tamils, and Tajiks. The merge left only 62,000 SNPs, but that’s probably enough to answer this question. The PCA above shows the West Eurasian shift of some groups. The Thai definitely seem pulled toward the Tamils, and are similar to the Cambodians, but with a bit more Indian ancestry and less “southern” Southeast Asian ancestry.
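For readers curious what sits behind a PCA like this, here is a minimal sketch in Python (NumPy). It is not the pipeline used for the figure. It assumes the merged data have already been reduced to a complete individuals-by-SNPs matrix of 0/1/2 allele counts with no missing calls, and it uses a per-SNP centering and scaling similar to the one in EIGENSOFT's smartpca. The function name `genotype_pca` and the simulated two-population data are hypothetical, for illustration only.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts,
    assumed complete (no missing calls), which is a simplification.
    """
    g = np.asarray(genotypes, dtype=float)
    # Per-SNP allele frequency estimated from the pooled sample
    p = g.mean(axis=0) / 2.0
    # Drop monomorphic SNPs: they carry no information and would divide by zero
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its binomial standard deviation
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # SVD of the normalized matrix; scaled left singular vectors are PC coordinates
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical toy data: two populations whose allele frequencies have drifted apart
rng = np.random.default_rng(0)
n_snps = 2000
p_a = rng.uniform(0.05, 0.95, n_snps)
p_b = np.clip(p_a + rng.normal(0.0, 0.2, n_snps), 0.01, 0.99)
pop_a = rng.binomial(2, p_a, size=(20, n_snps))
pop_b = rng.binomial(2, p_b, size=(20, n_snps))
pcs = genotype_pca(np.vstack([pop_a, pop_b]))
```

The sign of each PC is arbitrary, so only relative positions matter; in the toy data the two populations separate cleanly on PC1, and a simple two-way admixed individual would be expected to fall between the clusters.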

Below the fold are admixture and TreeMix plots. Basically, they show what I’m talking about, but in more detail. The Indian-like ancestry in the Luzon samples is really Spanish. The Ami and Atayal are Taiwanese aborigines. You see that they have the least West Eurasian ancestry. Even southern mainland Chinese seem to have some of it, indicating long-distance gene flow. But groups like the Miao, Vietnamese/Kinh, and Dusun (Austronesians from Borneo) don’t have the Indian ancestry that the Thai/Lao/Cambodians/Malay have.

September 29, 2019

Humans are basically invasive weeds

Filed under: Human Population Genetics — Razib Khan @ 3:14 pm

One of the somewhat surprising things we have learned over the last decade is that massive admixture and homogenization have occurred between distinct human lineages over the last 10,000 years. By this, I mean that we’re not talking simply about continuous gene flow between neighboring populations, but about massive expansions of small groups and the assimilation of very different groups by the expanding groups. As a stylized fact, it looks like “Early European Farmers” were as distinct from Mesolithic hunter-gatherers as modern Northern Europeans are from Han Chinese (pairwise Fst ~0.10). The fusion of these two groups later merged, in much of Europe, with migrants from the east, from the western edge of the forest-steppe.
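To make a figure like “pairwise Fst ~0.10” concrete, here is a minimal Python sketch of Hudson's Fst estimator computed from allele frequencies and combined across SNPs as a ratio of averages. It assumes you already have frequencies of the same allele at the same SNPs in both populations. The ~0.10 value in the text comes from real data, not from this code, and the function name `hudson_fst` is hypothetical.

```python
import numpy as np

def hudson_fst(p1, p2, n1=None, n2=None):
    """Hudson's Fst across SNPs, combined as a ratio of sums.

    p1, p2: frequencies of the same allele in two populations.
    n1, n2: optional numbers of sampled chromosomes; if both are given,
    the numerator is corrected for finite-sample noise.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Between-population squared frequency difference
    num = (p1 - p2) ** 2
    if n1 is not None and n2 is not None:
        num = num - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Probability two alleles, one drawn from each population, differ
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # Sum numerators and denominators separately rather than averaging per-SNP ratios
    return num.sum() / den.sum()
```

Identical frequencies give 0 and fixed differences give 1; a value around 0.10 means substantial but far from complete differentiation.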

The empirical pattern seems to be that cultural innovations (e.g., agriculture) trigger demographic revolutions, which homogenize and admix vast regions. This is a story of demographic history. Phylogeography.

But there is another aspect: natural selection. Humans are not exempt from it. Selection operates upon genetic variation that is either preexistent (“standing variation”) or comes from new mutations (de novo).

It seems plausible that cultural innovation has resulted in a great deal of selection over the last 10,000 years. So where did the raw material come from? One debate that has been playing out is between those who argue that it came from ancestral, shared variation within human populations and those who emphasize new variation. This is where admixture comes into play.

A new preprint on bioRxiv uses the 1000 Genomes data from the New World to suggest that admixture introduced many adaptive alleles of African origin into populations of mostly European and Native American background. Basically, it seems likely that the American tropics were colonized by African tropical diseases, and the adaptations to those diseases already existed within African populations. Admixture-enabled selection for rapid adaptive evolution in the Americas:

Background: Admixture occurs when previously isolated populations come together and exchange genetic material. We hypothesized that admixture can enable rapid adaptive evolution in human populations by introducing novel genetic variants (haplotypes) at intermediate frequencies, and we tested this hypothesis via the analysis of whole genome sequences sampled from admixed Latin American populations in Colombia, Mexico, Peru, and Puerto Rico. Results: Our screen for admixture-enabled selection relies on the identification of loci that contain more or less ancestry from a given source population than would be expected given the genome-wide ancestry frequencies. We employed a combined evidence approach to evaluate levels of ancestry enrichment at (1) single loci across multiple populations and (2) multiple loci that function together to encode polygenic traits. We found cross-population signals of African ancestry enrichment at the major histocompatibility locus on chromosome 6, consistent with admixture-enabled selection for enhanced adaptive immune response. Several of the human leukocyte antigen genes at this locus (HLA-A, HLA-DRB51 and HLA-DRB5) showed independent evidence of positive selection prior to admixture, based on extended haplotype homozygosity in African populations. A number of traits related to inflammation, blood metabolites, and both the innate and adaptive immune system showed evidence of admixture-enabled polygenic selection in Latin American populations. Conclusions: The results reported here, considered together with the ubiquity of admixture in human evolution, suggest that admixture serves as a fundamental mechanism that drives rapid adaptive evolution in human populations.
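The screen described in the abstract looks for loci with more or less ancestry from a source population than the genome-wide level predicts. A minimal way to illustrate that idea is a per-locus z-score of local ancestry against the genome-wide distribution. This is a simplified Python sketch, not the preprint's actual statistic: it assumes local-ancestry calls (0, 1, or 2 copies of, say, African ancestry per individual per locus) are already available, and the function name and simulated data are hypothetical.

```python
import numpy as np

def ancestry_enrichment_z(local_ancestry):
    """Per-locus z-scores for deviation from genome-wide source ancestry.

    local_ancestry: (n_individuals, n_loci) array giving 0, 1, or 2 copies
    of the source ancestry at each locus (hypothetical input format).
    """
    a = np.asarray(local_ancestry, dtype=float)
    # Fraction of haplotypes carrying the source ancestry at each locus
    per_locus = a.mean(axis=0) / 2.0
    # Genome-wide expectation and spread, estimated across all loci
    mu = per_locus.mean()
    sd = per_locus.std(ddof=1)
    return (per_locus - mu) / sd

# Hypothetical toy data: ~10% source ancestry genome-wide, one enriched locus
rng = np.random.default_rng(1)
calls = rng.binomial(2, 0.10, size=(200, 1000))
calls[:, 500] = rng.binomial(2, 0.30, size=200)
z = ancestry_enrichment_z(calls)
top_locus = int(np.argmax(z))
```

In real data, neighboring loci share ancestry tracts, so z-scores are strongly correlated along a chromosome, and individuals differ in genome-wide ancestry; a serious analysis has to account for both, which this sketch does not.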

The period after 1492 is easy for us to think about. But what ancient DNA has shown us is that it’s not as uncommon a phase as we might have thought.

September 22, 2019

Selection against height in Sardinians

Filed under: Human Population Genetics,Selection — Razib Khan @ 12:00 pm

Evidence of polygenic adaptation at height-associated loci in mainland Europeans and Sardinians:

Adult height was one of the earliest putative examples of polygenic adaptation in human. By constructing polygenic height scores using effect sizes and frequencies from hundreds of genomic loci robustly associated with height, it was reported that Northern Europeans were genetically taller than Southern Europeans beyond neutral expectation. However, this inference was recently challenged. Sohail et al. and Berg et al. showed that the polygenic signature disappeared if summary statistics from UK Biobank (UKB) were used in the analysis, suggesting that residual uncorrected stratification from large-scale consortium studies was responsible for the previously noted genetic difference. It thus remains an open question whether height loci exhibit signals of polygenic adaptation in any human population. In the present study, we re-examined this question, focusing on one of the shortest European populations, the Sardinians, as well as on the mainland European populations in general. We found that summary statistics from UKB significantly correlate with population structure in Europe. To further alleviate concerns of biased ascertainment of GWAS loci, we examined height-associated loci from the Biobank of Japan (BBJ). Applying frequency-based inference over these height-associated loci, we showed that the Sardinians remain significantly shorter than expected (~ 0.35 standard deviation shorter than CEU based on polygenic height scores, P = 1.95e-6). We also found the trajectory of polygenic height scores decreased over at least the last 10,000 years when compared to the British population (P = 0.0123), consistent with a signature of polygenic adaptation at height-associated loci. Although the same approach showed a much subtler signature in mainland European populations, we found a clear and robust adaptive signature in UK population using a haplotype-based statistic, tSDS, driven by the height-increasing alleles (P = 4.8e-4). 
In summary, by examining frequencies at height loci ascertained in a distant East Asian population, we further supported the evidence of polygenic adaptation at height-associated loci among the Sardinians. In mainland Europeans, we also found an adaptive signature, although becoming more pronounced only in haplotype-based analysis.
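The “polygenic height scores using effect sizes and frequencies” in the abstract rest on a simple additive calculation: each copy of an effect allele adds its effect size, so a population's expected score is 2 Σ βᵢpᵢ. Below is a minimal Python sketch, assuming additive per-allele effects and frequencies for the same effect alleles in each population, that expresses a population difference in units of the reference population's score standard deviation under Hardy-Weinberg and linkage equilibrium. The function names and toy numbers are hypothetical, and the preprint's own standardization may differ.

```python
import numpy as np

def mean_polygenic_score(betas, freqs):
    """Expected score under an additive model: 2 * sum(beta * p)."""
    b = np.asarray(betas, dtype=float)
    p = np.asarray(freqs, dtype=float)
    return 2.0 * np.sum(b * p)

def score_sd(betas, freqs):
    """SD of individual scores, assuming Hardy-Weinberg and linkage
    equilibrium among the scored SNPs: var = sum(2 p (1 - p) beta^2)."""
    b = np.asarray(betas, dtype=float)
    p = np.asarray(freqs, dtype=float)
    return np.sqrt(np.sum(2.0 * p * (1.0 - p) * b ** 2))

def standardized_difference(betas, freqs_target, freqs_reference):
    """Mean score difference (target minus reference) in reference SD units."""
    diff = (mean_polygenic_score(betas, freqs_target)
            - mean_polygenic_score(betas, freqs_reference))
    return diff / score_sd(betas, freqs_reference)

# Hypothetical toy example with two SNPs
d = standardized_difference([0.1, -0.2], [0.3, 0.5], [0.5, 0.5])
```

A negative value means the target population's mean score is lower than the reference's. This only measures a frequency difference; deciding whether it exceeds neutral expectation, which is the question the preprint asks, requires a null model for drift that is not included here.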

The whole literature on selection and height is confused. This is definitely an unformed and new area of exploration, so I wouldn’t put my money on any particular result. But first, I think it is important to note that the association of particular genetic variants with differences in height is stronger than the signature of selection on those variants. Second, the preprint is hard to follow because of all sorts of factors, such as ascertainment in the huge datasets needed to analyze polygenic traits, that stem from the way the data were generated in the late 2000s (as well as from new datasets coming online).

I think looking at variants ascertained in East Asians, and how they impact Europeans, is pretty neat. Obviously, some of the variants that impact polygenic traits are going to be rare, and so not shared between populations, but a lot of it is probably “standing variation” that dates back to before the Out of Africa event. In other words, the key thing is to look at differences in the frequencies of alleles that are present in most populations, not at alleles that are absent from some populations.

One element that jumps out at me is the trajectory of selection, and how much of it is due to events that date so deep into the past that it might not make sense to talk about populations as we understand them today. For example, they talk about selection events going back beyond 10,000 years… but the populations that we survey today did not really exist that deep in time. This doesn’t mean that selection didn’t happen. “Populations” are a human construct; alleles are alleles. They may have been subject to selection in a variety of populations which in turn admixed themselves out of existence (there was selection for larger brains on and off for millions of years, up until ~200,000 years ago, in various hominin populations).

The strongest selection result in this preprint seems to be that something is going on with Sardinians, the most direct descendants of Neolithic farmers. As I noted on Twitter, I think this has more to do with the nature of calorie restriction, or lack thereof, than with selection on height per se. A lot more has to be done to understand how the “secondary products revolution” (going from simple cereal farming to agro-pastoralism) impacted human nutrition before we can understand selection on height, which does seem to be a recurring signal across human groups.

 
