Razib Khan One-stop-shopping for all of my content

August 2, 2018

DNA results from Rakhigarhi are now being reported (really!)

Filed under: Indian Genetics — Razib Khan @ 10:34 pm

It looks like Outlook India is the first out of the gates to start reporting on the results from Rakhigarhi in northwest India, in We Are All Harappans. This is a “mature phase” Harappan site that dates to about 2250 BC or so. Media reports have always been garbled on this topic, so anything that does not come out of a paper needs to be treated cautiously. But I’ve heard some of the same things from independent sources from a while back, so I believe that this reporting is broadly on the mark.

Basically, the individual(s) they got DNA out of did not have any Eurasian steppe ancestry. This seems to confirm again that Eurasian steppe ancestry, which is found in fractions as high as ~30% in twice-born varna in Northern India (e.g., Rajputs, Tiwari Brahmins), arrived after 2000 BC. That is, after the peak period of the Indus Valley Civilization.

Again, one has to be wary of anything from the media because I’ve heard so many confusing things, including claims of garbled quotes, but here’s one of the authors of the forthcoming paper being quoted:

We did some analysis to figure out the exact date of the admixture. We have prepared a model in which all these stats fit together very tightly and that model suggests the Central Asian admixture happened about 1500-1000 BC…. Significant mixing happened around 1000 BC, also at 800 BC and 600 BC.

This is totally in line with the results from the March preprint discussed in the piece. That is, the Swat Valley samples show admixture and genetic change after 1200 BC. And the semi-historical understanding that we have of India during the period between 1000 BC and the rise of the Mauryas is that it was a society in flux. But the only way the Rakhigarhi results could have changed the dating is if the genome is of high enough quality that it allowed them to narrow down the parameters on some of the estimates of admixture.

One thing to keep in mind is that it is unlikely that the “Harappan people” were one single people genetically. There was probably a lot of variation in admixture with the indigenous South Asian substrate. And, I believe that the inflated steppe & AASI (“Ancient Ancestral South Indian”) ancestry you see in some North Indian Brahmin groups compared to Sindhis (who are more “Iranian”) is evidence that the Indo-Aryan intrusion resulted in an expansion of people with West Eurasian ancestry much deeper into South Asia than was the case with the Harappans.

And of the Harappans, some of the Indian scholars have asserted that their descendants are still present in the region. I think this is right, insofar as some of the jati groups, often scheduled caste, in the northwestern region of South Asia share a lot more affinity with populations to the south and east.

June 12, 2018

No steppe ancestry in the Rakhigarhi samples = non sequitur

Filed under: India Genetics,Indian Genetics — Razib Khan @ 10:36 pm

Harappan site of Rakhigarhi: DNA study finds no Central Asian trace, junks Aryan invasion theory:

The much-awaited DNA study of the skeletal remains found at the Harappan site of Rakhigarhi, Haryana, shows no Central Asian trace, indicating the Aryan invasion theory was flawed and Vedic evolution was through indigenous people.

“The Rakhigarhi human DNA clearly shows a predominant local element — the mitochondrial DNA is very strong in it. There is some minor foreign element which shows some mixing up with a foreign population, but the DNA is clearly local,” Shinde told ET. He went on to add: “This indicates quite clearly, through archeological data, that the Vedic era that followed was a fully indigenous period with some external contact.”

I haven’t heard anything definitive, but this is what I have heard: that the genetics they could analyze indicates continuity, but none of the steppe element ubiquitous in modern North India (and that there was contamination in the Korean lab). The Rakhigarhi samples date to 2500 to 2250 BC last I checked. That means they shouldn’t have any steppe ancestry if the model of the relatively late demographic impact of Indo-Aryans after 2000 BC is correct.

Basically, the whole article is kind of a non sequitur. I do understand that many archaeologists think there was continuity culturally. And there could have been. But taking into account the genetics of the modern region of India where Rakhigarhi is located, there was a major demographic perturbation after 2250 BC.

March 22, 2018

The peopling of the Indian subcontinent at the dawn of knowing

Filed under: Indian Genetics — Razib Khan @ 8:47 pm

A few people have been pointing me to a new paper, A Bayesian phylogenetic study of the Dravidian language family, which implies that the Dravidian language family diversified ~4,500 years ago. I don’t have much to say about the paper itself since it aligns with my own conclusions, but it’s well outside of any field that I can judge (though it does use standard phylogenetic packages I’ve used).

Recently I’ve been going back to old posts of mine on South Asian population genetics because no matter how much some people drag their feet on this question, we’re pretty close to knowing how South Asians came to be. Here’s what I said in December of 2010:

Who were the Indo-Iranians? I lean toward the proposition that they do derive from the Andronovo culture of the Eurasian steppe. This would date the entrance and expansion of Indo-Aryans in northern India 3-4,000 years ago. I also contend that the dominant element of ancestry among modern South Asians is not Indo-Aryan. Rather, it is an ancient stabilized hybrid of pre-agricultural societies in the Indus valley and Neolithic farmers who originated from what is today western Iran and eastern Anatolia. Therefore, I posit that the “Aryanization” of the Indian subcontinent is properly modeled as the same processes which led to the emergence of an Anatolian and Rumelian Turkish identity; a small elite population which forces an identity shift among the majority.

Where was I wrong? Where was I right?

Even looking at ADMIXTURE plots, which don’t always give an accurate sense of population history, it seemed likely that “Ancestral North Indian” (ANI) was not one thing. Some South Asian populations seemed to have much stronger affinities to West Asian populations, in particular those from highland West Asia, toward the Caucasus. These include groups in southern Pakistan, but also to some extent in South India. In contrast, other groups had affinities with Eastern European populations, in particular, high caste North Indians, and to a lesser extent Indo-European peoples more generally.

I think I got the dynamic correct. Subsequent analyses comparing ancient DNA from the Caucasus and Iran suggest that all South Asians have a lot of shared drift (ergo, ancestry) with highland West Asians, while a smaller subset has high shared drift (ergo, ancestry) with pastoralists from the Eurasian steppe. The groups match up with what the ADMIXTURE plots were suggesting.

There was more than one pulse of ANI-like ancestry, and one of them was like West Asians and one more like Europeans. Remember, this is before we knew the acronyms ANE, WHG, and EEF. Or CHG and Eastern Middle Eastern Farmers and Western Middle Eastern Farmers.

But, I think I was wrong about the magnitude of the admixture. This was before ancient DNA had revolutionized our understanding of population movement and turnover. I was still resisting the mass migration of a whole folk across huge distances. I’m more open to that now. I am not sure I still believe the very high steppe fractions implied in some of the recent analyses, but it’s certainly higher than I would have believed back then.

Finally, the recent diversification of the Dravidian languages supports the model that their current distribution is not primordial. Rather, they probably expanded relatively recently from the northwest of the subcontinent. Probably earlier than the Indo-Aryan expansion into the Gangetic plain, but not that much earlier.

Additionally, because the Dravidians were not primordial, but expanding only somewhat ahead of Indo-Aryans, they were part of an interactive social-cultural sphere with the Indo-Aryans. I think the very high frequency of R1a1a-Z93 in some non-Brahmin South Indian groups, even tribal ones, suggests the expansiveness of some paternal Indo-Aryan kin networks across the whole subcontinent.

Addendum: Much of the attention goes to the ANI dynamics. But though recent work attests to the overwhelming diversity, and basal character, of South Asian mtDNA lineages, we can’t be entirely sure that they are indigenous without ancient DNA. If a migration from the east at the Pleistocene-Holocene boundary was characterized by gradual diffusion of groups with reasonable effective population sizes, they could have brought over their diversity.

January 14, 2018

The genetics of the St. Thomas Christians

Filed under: Indian Genetics,Nasrani,St. Thomas Christians — Razib Khan @ 10:08 pm

First, I have to say I appreciate everyone who keeps sending data to the South Asian Genotype Project. Basically, I’m automating the pipeline, finding ways to merge data from a host of sources, but also figuring out how to refine the analysis.

But until then, today I decided to do some more manual analysis of three St. Thomas Christian samples I have (also called Nasranis). The reason is that there were some questions on Twitter in relation to the genetics of this group, and though three is not a great sample size, it’s better than nothing.

The St. Thomas Christians are a diverse group of people of various denominations in the southern state of Kerala who have various origin stories. Though today the St. Thomas Christians have a range of denominational and sectarian affinities, their origins probably have something to do with the Church of the East.

These Christians also have origin stories involving the local Brahmin community, Jews, and West Asian settlers. To be honest, whenever people tell me about the Brahmin origins, unless they were recent converts, I discount this because there are about ten times as many St. Thomas Christians in Kerala as there are Brahmins. There is a small Jewish community in the area, and this region of India was long part of the Indian Ocean trade network of the Arabs.

I merged the three Nasrani samples with a lot of other populations. Zooming in on the South Asians, if you look at the PCA plot to the left, you’ll see that they are not in the same cluster as the South Indian Brahmins (Brahmins from the four South Indian states are very similar to each other). But, in comparison to non-Brahmin South Indians, they do seem Brahmin shifted.

As I have observed before these South Indian Brahmins can be thought of as more than 50% North Indian Brahmin, but the remainder being South Indian non-Brahmin. Aside from exotic exceptions (Parsis, Bengalis), most South Asians exist on an ANI-ASI “cline,” with lower caste South Indians being at one end of the cline (more ASI), and populations in the far northwest, such as the Kalash, being at the other end (more ANI). The PCA would suggest that the Nasrani are more ANI-shifted than a generic South Indian group, but less so than South Indian Brahmins.

Using Treemix to detect gene flow events, what I found is that the Nasranis look like a generic South Indian group. There’s no evidence of gene flow from Middle Eastern populations (Jews, Persians).

I did some f-3 tests and there isn’t anything conclusive I see to suggest Middle Eastern gene flow into the Nasranis from that avenue.
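For readers who have not run into f3 tests, here is a minimal sketch of what the statistic measures. It is not the analysis run on the Nasrani samples: the allele frequencies are simulated, the function names (`f3`, `simulate`) are made up for this illustration, and it leaves out the finite-sample correction and block-jackknife standard errors that real tools apply. The idea is that f3(C; A, B) averages (c - a)(c - b) across SNPs, and a clearly negative value is evidence that C is a mixture of populations related to A and B.

```python
import random


def clamp(p):
    """Keep a simulated allele frequency inside [0, 1]."""
    return min(1.0, max(0.0, p))


def f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b): mean of (c - a) * (c - b) over SNPs."""
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)


def simulate(n_snps=5000, drift_sd=0.1, seed=1):
    """Toy allele frequencies (all parameters are assumptions for illustration)."""
    rng = random.Random(seed)
    ancestral = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]

    def drifted():
        # Each population drifts independently away from the shared ancestor.
        return [clamp(p + rng.gauss(0, drift_sd)) for p in ancestral]

    a, b, unadmixed = drifted(), drifted(), drifted()
    # A 50/50 mixture of A and B, followed by a little drift of its own.
    admixed = [clamp(0.5 * x + 0.5 * y + rng.gauss(0, 0.02)) for x, y in zip(a, b)]
    return a, b, unadmixed, admixed


a, b, unadmixed, admixed = simulate()
print("f3(admixed; A, B)   =", round(f3(admixed, a, b), 4))    # negative
print("f3(unadmixed; A, B) =", round(f3(unadmixed, a, b), 4))  # positive
```

A non-negative f3 does not rule out admixture (drift in the target after mixing can mask it), which is one reason an inconclusive result is not the same as a negative one.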

Finally, I ran ADMIXTURE in supervised mode. Here are the average results for a set of South Asian populations (mean values):

Group                 Druze  Georgian  Han  Iranian  Telugu  Yemenite Jew
Bangladeshi              1%        2%  12%       1%     83%            1%
Chamar                   0%        0%   3%       0%     97%            0%
Gujarati_Patel           0%        1%   0%      10%     89%            0%
UP_Kshatriya             0%        3%   1%      21%     76%            0%
Nasrani                  0%        4%   1%      12%     83%            0%
Pathan                   0%        4%   1%      55%     40%            0%
Piramalai_Kallar         0%        0%   2%       0%     97%            0%
SI_Brahmin               0%        4%   1%      16%     78%            0%
Telugu_Reddy             0%        3%   0%       0%     94%            3%
UP_Brahmin               0%        4%   1%      26%     69%            0%
UP_Kayastha              0%        0%   1%      20%     79%            0%
Velama                   1%        1%   0%       2%     96%            0%
West_Bengal_Kayastha     0%        0%   7%       8%     85%            0%

In these data the Nasrani do look shifted in the same direction as South Indian Brahmins, though less so. Observe that there is no clear Middle Eastern signal in the Nasrani above and beyond what you see in South Asians. This, despite the fact that Indian Jews show a very strong signal of admixture from the Middle East. At this point I am confident in rejecting Nasrani St. Thomas Christian origins in a converted Jewish community, or one with a large degree of West Asian admixture.
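To make the comparison concrete, here is a rough sketch that works only from a subset of the mean values in the table above. The summary scores are my own simplification for illustration, not part of the ADMIXTURE output: `west_asian_shift` adds the Georgian and Iranian columns, `middle_eastern_signal` adds the Druze and Yemenite Jew columns, and `two_way_fraction` is a crude single-component linear interpolation rather than a proper admixture model.

```python
# Mean supervised ADMIXTURE percentages copied from the table above (subset of rows).
COLUMNS = ("Druze", "Georgian", "Han", "Iranian", "Telugu", "Yemenite Jew")
TABLE = {
    "Chamar": (0, 0, 3, 0, 97, 0),
    "Nasrani": (0, 4, 1, 12, 83, 0),
    "Pathan": (0, 4, 1, 55, 40, 0),
    "Piramalai_Kallar": (0, 0, 2, 0, 97, 0),
    "SI_Brahmin": (0, 4, 1, 16, 78, 0),
    "Telugu_Reddy": (0, 3, 0, 0, 94, 3),
    "UP_Brahmin": (0, 4, 1, 26, 69, 0),
    "Velama": (1, 1, 0, 2, 96, 0),
}


def component(group, name):
    """Look up one reference component (in percent) for one group."""
    return TABLE[group][COLUMNS.index(name)]


def west_asian_shift(group):
    """Georgian + Iranian percent: a rough proxy for how ANI-shifted a group is."""
    return component(group, "Georgian") + component(group, "Iranian")


def middle_eastern_signal(group):
    """Druze + Yemenite Jew percent: the components standing in for the Middle East."""
    return component(group, "Druze") + component(group, "Yemenite Jew")


def two_way_fraction(target, source_a, source_b, name="Iranian"):
    """Share of source_a needed to match target on one component (linear interpolation)."""
    t, a, b = (component(g, name) for g in (target, source_a, source_b))
    return (t - b) / (a - b)


for group in sorted(TABLE, key=west_asian_shift, reverse=True):
    print(f"{group:18s} shift={west_asian_shift(group):3d}%  "
          f"middle_eastern={middle_eastern_signal(group)}%")

# South Indian Brahmins as a mix of UP Brahmins and a South Indian non-Brahmin group.
print(round(two_way_fraction("SI_Brahmin", "UP_Brahmin", "Piramalai_Kallar"), 2))
```

On these numbers the Nasrani score 16% on the shift proxy against 20% for South Indian Brahmins and 0 to 3% for the non-Brahmin South Indian groups, with 0% on the Middle Eastern proxy. The interpolation puts South Indian Brahmins at roughly 60% UP Brahmin on the Iranian component, which is in line with the "more than 50%" rule of thumb, though a single component from a supervised run is thin evidence on its own.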

Though the genetic profile of these three individuals does not support clear descent from South Indian Brahmins, I cannot reject the model of Brahmin admixture into this community. On the contrary, a plausible model would seem to be that various South Indian groups, including Brahmins, contributed to the Nasrani community over the centuries.

To be continued….

December 22, 2017

“Rakhigarhi paper” out in January 2018?

Filed under: Indian Genetics,Rakhigarhi — Razib Khan @ 10:01 pm

Tony Joseph has an interesting piece up, Who built the Indus Valley civilisation?, which people are asking me about via email. First, I don’t have any inside information. Last I heard in September was that the Rakhigarhi results were “one or two months away,” like they have been for a year or so. So I put it out of mind.

In any case, here are the important points:

All this could now change thanks to the science of genetics and four ancient skeletons excavated from a village called Rakhigarhi in Haryana. The four people to whom these bones once belonged — a couple, a boy and a man — lived roughly 4,600 years ago when the Indus Valley civilisation was in full bloom.

In the three-and-a-half years since its excavation, Shinde has brought together scientists from Indian and international institutions like the Centre for Cellular and Molecular Biology, Hyderabad (CCMB), Harvard Medical School, Seoul National University, and the University of Cambridge to work on different parts of the project, including extracting and analysing DNA from these ancient people, reconstructing their faces, and studying the remains of their habitation to understand their daily habits and ways of life.

The DNA analysis will also help figure out their height, body features, and even the colour of their eyes….

Joseph also asserts that the publication will happen in a “leading international journal” in a month or so. If I had to bet, I’d say Nature.

Harvard Medical School suggests to me they finally got David Reich’s group involved. As for Cambridge University, Eske Willerslev now has an appointment there.

The piece talks a lot about Y and mtDNA. But if they are talking about height, body features, and color of eyes, they must have gotten genome-wide data. If Eske Willerslev is involved they may have sequenced the whole genome at some coverage of at least one of the samples.

If I had to bet I think the Rakhigarhi samples will be Y haplogroups J2 or the Indian branch of L, and the mtDNA will be an Indian branch of M. In terms of genome-wide patterns they will exhibit a mixture between West Eurasian ancestry, with strong affinities to Near Eastern farmers from the Zagros, and what we now term “Ancestral South Indians” (ASI), who descend from the aboriginal peoples of the subcontinent, and are genetically somewhat closer to East Eurasians than West Eurasians (to be fair, I think it is not implausible that much of ASI heritage is the product of westward migration out of Southeast Asia during the Pleistocene and early Holocene).

July 28, 2017

The Indo-Aryan migration to the Indian subcontinent

Filed under: India Genetics,Indian Genetics — Razib Khan @ 7:45 am

The piece is up at India Today. The headline and title are of course optimized for clicks. I would, for example, say that the Indo-Aryans came from the west, not the West.

In the course of writing this it has become clear that many people have very specific commitments on this issue. I think it is clear I do not. Genetic inference methods have wide shoulders of confidence around particular dates. So I’ll leave it to those with more archaeological knowledge to argue over specific dates. But it strikes me that the dates point to a likelihood that much of the expansion and diversification of Indo-Aryans may precede their expansion into the Gangetic plain ~1500 BCE, the date preferred by many scholars.
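A small sketch of why those shoulders are so wide. Admixture dating methods generally give an answer in generations, which then has to be converted into calendar years with an assumed generation interval. Every number below is hypothetical and chosen only to show how the two uncertainties multiply; none of them comes from a paper, and `date_range_bce` is a helper invented for this illustration.

```python
def date_range_bce(gen_low, gen_high, years_low, years_high, reference_year_ce=2017):
    """Convert an admixture time in generations into (oldest, youngest) years BCE.

    gen_low, gen_high: hypothetical bounds on generations since admixture.
    years_low, years_high: assumed bounds on years per generation.
    The conversion ignores the missing year 0, so it is off by one year at most.
    """
    oldest_years_ago = gen_high * years_high
    youngest_years_ago = gen_low * years_low
    return (oldest_years_ago - reference_year_ce,
            youngest_years_ago - reference_year_ce)


# Hypothetical estimate: 110 to 140 generations ago, at 25 to 30 years per generation.
oldest, youngest = date_range_bce(110, 140, 25, 30)
print(f"Admixture between roughly {oldest} BCE and {youngest} BCE")
```

With these made-up inputs the window runs from about 2183 BCE to 733 BCE, close to a millennium and a half, which is why the genetics alone cannot adjudicate between dates a few centuries apart.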

Apparently we shouldn’t have to wait too long for ancient DNA from Rakhigarhi (months, not years). But I doubt that will settle anything, as opposed to being preliminary and setting off new debates.

June 19, 2017

Indian genetics, the never-ending argument

Filed under: Genetics,India,Indian Genetics,Indo-Europeans,science — Razib Khan @ 10:44 pm

I am at this point somewhat fatigued by Indian population genetics. The real results are going to be ancient DNA, and I’m waiting on that. But people keep asking me about an article in Swarajya, Genetics Might Be Settling The Aryan Migration Debate, But Not How Left-Liberals Believe.

First, the article attacks me as being racist. This is not true. The reality is that the people who attack me on the Left would probably attack magazines like Swarajya as highly “problematic” and “Islamophobic.” They would label Hindu nationalism as a Nazi derivative ideology. People should be careful about the sort of allies they make; if you dance with snakes they will bite you in the end. Much of the media lies about me, and the Left constantly attacks me. I’m OK with that because I do believe that the day will come when all the ledgers will be balanced. The Far Left is an enemy of civilization of all stripes. I welcome being labeled an enemy of barbarians. My small readership, which is of diverse ideologies and professions, is aware of who I am and what I am, and that is sufficient. Either truth or power will be the ultimate arbiter of justice.

With that out of the way, there is one thing about the piece that I think is important to highlight:

To my surprise, it turned out that that Joseph had contacted Chaubey and sought his opinion for his article. Chaubey further told me he was shocked by the drift of the article that appeared eventually, and was extremely disappointed at the spin Joseph had placed on his work, and that his opinions seemed to have been selectively omitted by Joseph – a fact he let Joseph know immediately after the article was published, but to no avail.

Indeed, this itself would suggest there are very eminent geneticists who do not regard it as settled that the R1a may have entered the subcontinent from outside. Chaubey himself is one such, and is not very pleased that Joseph has not accurately presented the divergent views of scholars on the question, choosing, instead to present it as done and dusted.

I do wish Tony Joseph had quoted Gyaneshwer Chaubey’s response, and I’d like to know his opinions. Science benefits from skepticism. Unfortunately though the equivocation of science is not optimal for journalism, so oftentimes things are presented in a more stark and clear manner than perhaps is warranted. I’ve been in this position myself, when journalists are just looking for a quote that aligns with their own views. It’s frustrating.

There are many aspects of the Swarajya piece I could point out as somewhat weak. For example:

The genetic data at present resolution shows that the R1a branch present in India is a cousin clade of branches present in Europe, Central Asia, Middle East and the Caucasus; it had a common ancestry with these regions which is more than 6000 years old, but to argue that the Indian R1a branch has resulted from a migration from Central Asia, it should be derived from the Central Asian branch, which is not the case, as Chaubey pointed out.

The Srubna culture, the Scythians, and the people of the Altai today, all bear the “Indian” branch of R1a. First, these substantially post-date 6000 years ago. I think that that is likely due to the fact that South Asian R1a1a-Z93 and that of the Srubna descend from a common ancestor. But in any case, the nature of the phylogeny of Z93 indicates rapid expansion and very little phylogenetic distance between the branches. Something happened 4-5,000 years ago. One could imagine simultaneous expansions in India and Central Asia/Eastern Europe. Or, one could imagine an expansion from a common ancestor around that time. The latter seems more parsimonious.

Additionally, while South Asians share ancestry with people in West Asia and Eastern Europe, these groups do not have distinctive South Asian (Ancestral South Indian) ancestry. This should weight our probabilities as to the direction of migration.

Second, I read some of the papers linked to in the article, such as Shared and Unique Components of Human Population Structure and Genome-Wide Signals of Positive Selection in South Asia and Y-chromosomal sequences of diverse Indian populations and the ancestry of the Andamanese. The first paper has good data, but I’ve always been confused by the interpretations. For example:

A few studies on mtDNA and Y-chromosome variation have interpreted their results in favor of the hypothesis [70–72], whereas others have found no genetic evidence to support it [3,6,73,74]. However, any nonmarginal migration from Central Asia to South Asia should have also introduced readily apparent signals of East Asian ancestry into India (see Figure 2B). Because this ancestry component is absent from the region, we have to conclude that if such a dispersal event nevertheless took place, it occurred before the East Asian ancestry component reached Central Asia. The demographic history of Central Asia is, however, complex, and although it has been shown that demic diffusion coupled with influx of Turkic speakers during historical times has shaped the genetic makeup of Uzbeks [75] (see also the double share of k7 yellow component in Uzbeks as compared to Turkmens and Tajiks in Figure 2B), it is not clear what was the extent of East Asian ancestry in Central Asian populations prior to these events.

Actually the historical and ancient DNA evidence both point to the fact that East Asian ancestry arrived in the last two thousand years. The spread of the first Gokturk Empire, and then the documented shift in the centuries around 1000 A.D. from Iranian to Turkic in what was Turan, signals the shift toward an East Asian genetic influx. Alexander the Great and other Greeks ventured into Central Asia. The people were described as Iranian looking (when Europeans encountered Turkic people like Khazars they did note their distinctive physical appearance).

We have ancient DNA from the Altai, and those individuals initially seemed overwhelmingly West Eurasian. Now that we have Scythian ancient DNA we see that they mixed with East Asians only on the far east of their range.

The second paper is very confused (or confusing):

The time divergence between Indian and European Y-chromosomes, based on the closest neighbour analysis, shows two different distinctive divergence times for J2 and R1a, suggesting that the European ancestry in India is much older (>10 kya) than what would be expected from a recent migration of Indo-European populations into India (~4 to 5 kya). Also the proportions suggest the effect might be less strong than generally assumed for the Indo-European migration. Interestingly, the ANI ancestry was recently suggested to be a mix of ancestries from early farmers of western Iran and people of the Bronze Age Eurasian steppe (Lazaridis et al. 2016). Our results agree with this suggestion. In addition, we also show that the divergence time of this ancestry is different, suggesting a different time to enter India.

Lazaridis et al. accept a mass migration from the steppe. In fact, the migration is to such a magnitude that I’m even skeptical. Also, there couldn’t have been a European migration to South Asia during the Pleistocene because Europeans as we understand them genetically did not exist then!!!

I assume that many of the dates of coalescence are sensitive to parameter conditions. Additionally, they admit limitations to their sampling.

Ultimately the final story will be more complex than we can imagine. R1a is too widespread to be explained by a simple Indo-Aryan migration in my opinion. But we can’t get to these genuine conundrums if we keep having to rebut ideologically motivated salvos.

Related: Ancient herders from the Pontic-Caspian steppe crashed into India: no ifs or buts. I wish David would be a touch more equivocal. But I have to admit, if the model fits, at some point you have to quit.

June 9, 2017

The last days of pre-ancient DNA Indian population genomics

Filed under: Indian Genetics,science — Razib Khan @ 7:21 pm

If anyone wants to know about the population genetics of South Asia, I recommend three papers (all are open access):

Genetic Evidence for Recent Population Mixture in India

A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals

The promise of disease gene discovery in South Asia

In the near future ancient DNA will do for South Asia what has been done for Europe, and to a lesser extent the Near East. It will pull back our veil of ignorance. But until then we have genomic inference from larger data sets with a greater number of markers. What can we say now?

– The 2009 work that modern South Asians are broadly a compound of two streams of the out of Africa populations is correct. One is much like other West Eurasians. Another is distantly related to other East Eurasians, with possible affinities to Paleolithic Southeast Asian hunter-gatherers.

– The West Eurasian ancestry of South Asians, the “Ancestral North Indians” (ANI), does seem likely to be a mixture of at least two groups. One element is related to the eastern farmers who first adopted agriculture on the slopes of the Zagros ~10,000 years ago. Another stream is closely related to the Yamna people who flourished on the Eurasian steppe north of the Black Sea ~5,000 years ago.

– The Munda people seem to have a distinct Southeast Asian component that ties them with other Austro-Asiatic peoples. Their migration was almost certainly tied to the Neolithic migration of rice farmers.

– The R1a1a-Z93 Y chromosomal lineage found across much of South Asia, and especially the higher castes and the north, increased in frequency within the last 4,000 years. It is almost certainly exogenous to South Asia; ancient DNA from the steppe finds the Z93 in Iranic peoples, but no Indian ancestry in these groups.

June 15, 2011

The Cape Coloureds are a mix of everything

A Cape Coloured family

I’ve mentioned the Cape Coloureds of South Africa on this weblog before. Culturally they’re Afrikaans in language and Dutch Reformed in religion (the possibly related Cape Malay group is Muslim, though also Afrikaans speaking traditionally). But racially they’re a very diverse lot. In this way they can be analogized to black Americans, who are ~75% West African and ~25% Northern European, with the variance in ancestral proportions being such that ~10% are ~50% or more European in ancestry. The Cape Coloureds though are much more complex. Some of their ancestry is almost certainly Bantu African. This element is related to the West African affinities of black Americans. And, they have a Northern European element, which likely came in via the Dutch, German, and Huguenot settlers (mostly males). But the Cape Coloureds also have other contributions to their genetic heritage. Firstly, they have Khoisan ancestry, whether from Bushmen or Khoi. This is well known in their oral memory. The hinterlands of the Cape of Good Hope are beyond the ecological range of the Bantu agricultural toolkit, so the region was still dominated ...

April 23, 2011

Resolutions in the Indian genetic layer cake

Filed under: Genetics,Genomics,Indian Genetics,Indian genomics — Razib Khan @ 7:54 pm

Two years ago Reconstructing Indian Genetic History reframed how we should view South Asian historical genomics. In short, Indians can be viewed as a hybrid between a West Eurasian group, “Ancestral North Indians” (ANI) and a very different group, “Ancestral South Indians” (ASI), which had distant connections to West and East Eurasians. At least to a first approximation. Last fall I posted on a new paper which surveyed the Austro-Asiatic speaking peoples of India, and concluded that they were exogenous to the subcontinent. This is an interesting point. Prehistoric treatments of South Asia often use linguistic terms to denote putative ancient populations. One model is that first it was the Munda, the most ancient Austro-Asiatics. Then the Dravidians. And finally the Indo-Aryans. These genetic data imply that the Munda arrived after the initial ANI-ASI synthesis. The Munda people of India can be thought of as ANI-ASI, with an overlay of East Eurasian ancestry.

Zack Ajmal’s K = 11 ADMIXTURE run has highlighted some further issues. He has a set of Austro-Asiatic samples, as well as a host of Indo-Aryan and Dravidian speaking populations. I now believe we can further clarify and refine our model of the peopling ...

March 12, 2011

Harappa Ancestry Project @ N ~ 50

Zack Ajmal now has over 50 participants in the Harappa Ancestry Project. This does not include the Pakistani populations in the HGDP, the HapMap Gujaratis, or the Indians from the SGVP. Nevertheless, all these samples still barely cover the vast heart of South Asia, the Indo-Gangetic plain. Here is the provenance of the submitted samples Zack has so far:

Punjab: 7, Iran: 7, Tamil: 6, Bengal: 5, Andhra Pradesh: 2, Bihar: 2, Karnataka: 2, Caribbean Indian: 2, Kashmir: 2, Uttar Pradesh: 2, Sri Lankan: 2, Kerala: 2, Iraqi Arab: 2, Anglo-Indian: 1, Roma: 1, Goa: 1, Rajasthan: 1, Baloch: 1, Unknown: 1, Egyptian/Iraqi Jew: 1, Maharashtra: 1

Again, note the underrepresentation of two of India’s most populous states, Uttar Pradesh, ~200 million, and Bihar, ~100 million. Nevertheless, there are already some interesting yields from the project. Below I’ve reedited Zack’s static images (though go to his website for something more dynamic) with the labels of individuals. I’ve highlighted myself and my parents with the red pointers.
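A quick tally of the list above puts a number on the underrepresentation. The submission counts are the ones listed, the population figures are the rough ~200 million and ~100 million quoted for Uttar Pradesh and Bihar, and `samples_per_100_million` is just a helper made up for this back-of-the-envelope check.

```python
# Submitted samples by provenance, as listed above.
SUBMISSIONS = {
    "Punjab": 7, "Iran": 7, "Tamil": 6, "Bengal": 5, "Andhra Pradesh": 2,
    "Bihar": 2, "Karnataka": 2, "Caribbean Indian": 2, "Kashmir": 2,
    "Uttar Pradesh": 2, "Sri Lankan": 2, "Kerala": 2, "Iraqi Arab": 2,
    "Anglo-Indian": 1, "Roma": 1, "Goa": 1, "Rajasthan": 1, "Baloch": 1,
    "Unknown": 1, "Egyptian/Iraqi Jew": 1, "Maharashtra": 1,
}

# Rough population sizes in millions, as quoted in the text.
POPULATION_MILLIONS = {"Uttar Pradesh": 200, "Bihar": 100}


def samples_per_100_million(region):
    """Submitted samples per 100 million inhabitants of a region."""
    return SUBMISSIONS[region] * 100 / POPULATION_MILLIONS[region]


print("total submissions:", sum(SUBMISSIONS.values()))
for region in POPULATION_MILLIONS:
    print(region, samples_per_100_million(region))
```

That works out to 51 submissions in total, with one sample per 100 million people for Uttar Pradesh and two per 100 million for Bihar.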

To the left is a set of plots and tables which I’ve spliced together from Zack’s various posts. What you need to know is that this is at K = 12, and I’ve used the labels that Zack gave the various putative “ancestral populations” which emerged out ...

January 24, 2011

Harappa Ancestry Project, update

Last week I announced the Harappa Ancestry Project. It now has its own dedicated website, http://www.harappadna.org. Additionally, it has its own Facebook page. For Zack to get his own URL he needs about 10 more “likes,” so please like it! (if you are so disposed) Finally, from what I’ve heard the first wave of the 23andMe holiday sale results are coming online this week. Actually, one of the relatives who I purchased the kit for is in processing currently, so I know that we should have a bunch of new people in the system very, very, soon.

Speaking of people, last I heard Zack had gotten about a dozen responses. That’s enough to start an initial round of runs, but obviously he needs more people. More importantly, the goal here is to get better population coverage. One of the things we know intuitively and also from the most current research is the existence of a lot of within-region population variation in South Asia which is structured by community. In other words, a sample of 30 people, where you have 3 from 10 different communities exhibiting geographical and ...

August 10, 2010

PCA, Razib around the world (a little)

I have put up a few posts warning readers to be careful of confusing PCA plots with real genetic variation. PCA plots are just ways to capture variation in large data sets and extract out the independent dimensions. PCA is great at detecting population substructure because the largest components of variation often track between population differences, which consist of sets of correlated allele frequencies. Remember that PCA plots usually are constructed from the two largest dimensions of variation, so they will be drawn from just these correlated allele frequency differences between populations which emerge from historical separation and evolutionary events. Observe that African Americans are distributed along an axis between Europeans and West Africans. Since we know that these are the two parental populations this makes total sense; the between population differences (e.g., SLC24A5 and Duffy) are the raw material from which independent dimensions can pop out. But on a finer scale one has to be cautious because the distribution of elements on the plot as a function of principal components is sensitive to the variation you input to generate the dimensions in the first place.

I can give you a concrete example: me. I showed you my 23andMe ancestry painting yesterday. I didn’t show you my position on the HGDP data set because I’ve shared genes with others and I don’t want to take the step of displaying other peoples’ genetic data, even if at a remove. But, I have reedited some “demo” screenshots and placed where I am on the plot to illustrate what I’m talking about above. The first shot is my position on the two-dimensional plot of first and second principal components of genetic variation from the HGDP data set.


No surprise that I’m in the Central/South Asian cluster. But what may surprise you is that I’m not in the South Asian cluster, I’m in the Central Asian cluster. In the Central Asian cluster are Uyghurs and Hazaras. These are two hybrid populations, a mixture of West and East Eurasian elements. The Uyghurs are likely the outcome of a process of admixture between the Iranian and Tocharian Indo-European populations of the cities of the Tarim basin, and later Turkic speaking settlers who arrived in the wake of the expansion and later collapse of the first Uyghur Empire (the historical connection between the current Uyghurs and ancient Uyghurs is tenuous at best, and complicated). The Hazaras are a more recent population, likely emerging as the product of intermarriages between Mongol soldiers who arrived in the 13th century, and indigenous women, Persians, Turks, and assorted Indo-Iranian groups between the Zagros and Khyber Pass. It is somewhat ironic that I’m on the edge of the Hazara cluster since they are almost certainly in part descended from Genghis Khan’s family, and my own surname is Khan. But I know that my Y chromosomal lineage is R1a1, very common across Central and Southern Eurasia, and not a Mongolian one at all.

Zoom! Now we’ve constrained the input data set to the Central/South Asian groups. First, look at the Kalash. They’re strange, which is no surprise, they’re an inbred mountain group in Pakistan who have not adopted Islam. The Pakistani Taliban looks to be ending them as we speak. I really would prefer that they were just thrown out of the data set for this zoom view, because on this fine grained scale I don’t think they add much at all. They’re just an example of what long term endogamy can do to your allele frequencies. The bigger picture is the axis between the populations of Pakistan, and those of Central Asia. Observe that I’ve changed position. Whereas when taking world wide genetic variation into account I clustered with Central Asians, now I’m 2/3 of the way to the South Asian cluster. I will tell you that I’ve shared “genes” with around 50 South Asians now, from various parts of the subcontinent, and in the 23andMe plot they overlay the South Asians nearly perfectly. I’ve put labels at the approximate ethno-linguistic position. I’m an outlier. 23andMe tells me that I’m 43% “East Asian.” The typical South Asian is in the 10-30% range. My first assumption was that I have a lot of ancient South Indian, which just shows up as East Asian in their algorithm. With this in mind I tried sharing with a lot of South and East Indians, and found out two interesting points. First, South Indians seem no higher than 30-35% East Asian. Bengalis on the other hand are more East Asian, with Bangladeshis more East Asian than West Bengalis. My sample size for Bengalis is small, so take that with caution. Second, the PCA plots put the South Indians firmly in the South Asian cluster, but the Bengalis trail out toward my own position. This indicates again that different methods are telling you slightly different things. The PCA is only a thin slice of variation, but it’s highly informative of between population differences.
A Bengali and a South Indian with the same “East Asian” fraction in the ancestry painting nevertheless have consistently different positions on the PCA, with Bengalis closer to the East Asians. Additionally, there’s an ethnic Persian in this zoom plot that I’m describing, and they are positioned near the Balochi. But on the world wide plot they’re on the margins of the European cluster. Another illustration that position of an element is sensitive to the input data because of how the dimensions are generated.

Blaine Bettinger, who inspired me to post this, told a story with his ancestry painting which was plausible. What can I say? First, I have less than 1% African ancestry. This could be noise. But, I do observe that the South Asians with Muslim names are enriched in the set of those who I’ve shared genes with and who have less than 1%, but not 0%, African ancestry. Just as Muslim South Asians have non-trivial West Asian ancestry, I suspect that many of us have Sub-Saharan African ancestry through the same dynamic. Sub-Saharan African soldiers were prominent across South Asia with the arrival of Muslims. Bengal even has a period of rule by Abyssinian rulers. But the bigger issue for me is the East Asian component. Here is a figure from a paper published 4 years ago:

journal.pgen.0020215.g005

The figure is showing Fst value comparing Indian Americans with Europeans and East Asians. Fst measures between population differences in allele frequency, in this case the alleles being 207 indels. Take a look at the Bengalis. These are West Bengalis, who I believe have a lesser East Asian component, but even there the allele frequency difference to East Asians is near that of Europeans. The Assamese, who speak a language very close to Bengali, are similar. Assam was ruled by a Tibeto-Burman people for nearly 600 years. The Oriya speakers, from the southwest of Bengal, are more distant from East Asians. As one goes south and east, and west and north, the distance from East Asians increases. This shouldn’t be that surprising, but nice to confirm. The fact that the genetic distance increases as one goes south means that for northeast South Asia you need to complexify the model from a two-way admixture with “ancient North Indians” and “ancient South Indians.” Set next to these two is an East Asian element, which is also clear in the Indo-Aryan peoples of Nepal.

Of course anyone who knows Bengalis won’t be totally surprised by an East Asian component to their ancestry. To the left are head shots of the two women who have dominated Bangladeshi politics for the past two decades, Khaleda Zia and Sheikh Hasina. They’re both Bengalis, but they do look different, and I know many people who look like one or the other (or a combination). My family is from one of the easternmost districts of Bengal, next to Tripura. In fact my late maternal grandmother lived in Tripura for some of her childhood (she was almost trampled to death by the Maharani of Tripura’s insane elephant as a young girl!). When I was a young child I once saw a black and white photo from my father’s college days, and I was curious who the Asiatic looking young man in the middle of the photograph was. Turns out it was my father! Sometimes our expectations affect how we perceive people. I have never perceived my father to have an Asian cast to his features as a more mature man, but others have told me that he does still exhibit them.

There is still the question of how Bengalis came to have this particular admixture. I think the most plausible scenario probably synthesizes conventional village-to-village intermarriage and isolation-by-distance, along with some component of migrationism. Tribes such as the Chakma have left Burma in historical time. The Chakma of Bangladesh now speak a dialect of Bengali, not their ancestral Sino-Tibetan tongue. I believe that a non-trivial portion of Bengalis have ancestors who were tribal people who shifted their religious identity to that of Hinduism or Islam (from Theravada Buddhism in the case of the Chakma, or animism in the case of the Garos before their Christianization). But eastern South Asia is adjacent to mainland Southeast Asia, and it stands to reason that continuous gene flow would over time also have introduced East Asian alleles into the Bengali gene pool.

Image Credit: TopNews.in

August 6, 2010

Strange genetic variation in South Asia

Filed under: Genetics,Genomics,Indian Genetics,Indian genomics — Razib Khan @ 12:11 am

Dienekes has a post up where he highlights the fact that the recent paper on South Asian metabolic diseases has a figure which elucidates population structure within the region. Accounting for structure is important for genome-wide associations since you might get spurious correlations if trait value/disease frequency is simply tracking cryptic population variation. Dienekes says:

The existence of two clusters is kind of obvious, while their interpretation is not as dots of the same color appear in both clusters: a placement of these individuals in a global context might have been useful here. Things are clearer at the top cluster which shows a clear gradient anchored by Punjabi Sikh and Hindu Tamils on either end.

Also of interest is the group of isolated Muslim/Christian individuals on the left which deviate strongly from the mainstream; these probably represent exogenous elements that don’t resemble the bulk of the Indian population.

The second issue is easily addressed. The Christian outliers both give English as their native language. That suggests to me that they’re Anglo-Indian, a community of mixed South Asian and European origin. South Asian Muslims are overwhelmingly of indigenous origin. But, a minority of the Muslim elite are West Asian, or have substantial West Asian ancestry, as is evident by the fact that they look white. Benazir Bhutto’s mother was of Kurdish and Persian ethnic background (her family was from Esfahan in Iran). I’ve reedited the religious & linguistic PC plots to fit onto the screen.

indiaweird1

So what’s going on with the cluster which extends along the second principal component? The first component is probably just a European/West Asian-South Asian axis of variation. But I don’t understand where the variation for the second is coming from. Observe that the one South Indian group, Tamil speakers, are not represented in the secondary cluster. The plot reminded me of something I saw last fall.

Below is figure S4 from the supplements of Reconstructing Indian population history. I added some labels. The Indian cluster is tight when the genetic variation includes non-Indian groups. But, when you constrain the variation to Europeans and South Asians only, something strange happens:
guj.pdf-pages

The Gujarati sample is from Houston, and is from HapMap Phase 3. I have a suspicion that the secondary cluster among the Gujaratis here is of the same class of phenomenon as the secondary cluster in the first plot. The Anglo-Indians and West Asian Muslims serve as rough proxies for Europeans, and you have an expected European-South Asian axis. But you also have this strange orthogonal component. I had assumed that the plot from the Reich et al. paper was an anomaly, but I’m not so sure seeing the second paper.

July 23, 2010

One principal component to rule them all?

Despite the reality that I’ve cautioned against taking PCA plots too literally as Truth, unvarnished and without any interpretive juice needed, papers which rely on them are almost magnetically attractive to me. They transform complex patterns of variation which you are not privy to via your gestalt psychology into a two or at most three dimensional representation which you can grok immediately. That is why History and Geography of Genes was so engrossing. You recognize patterns which were otherwise unrecognizable. But how you interpret those patterns, that’s a wholly different matter. And how those patterns arise is also not something one can ignore.

First, let’s start with an easy case. To the left is a PCA plot with four populations. Nigerians, East Asians (Chinese + Japanese), Europeans (whites from Utah), and finally, African Americans. The x-axis is the first principal component of variation, and the y-axis the second. That means that the x-axis is the independent dimension of variation within the patterns of genetic data which explains the largest fraction of the total amount of genetic variation. The sum totality of the variation can be decomposed into a large set of independent dimensions which can be rank ordered from the largest explanatory components to the smaller ones, successively by number. In a human genetic context the first principal component invariably separates Africans from non-Africans, and the second principal component often maps onto a west-east axis from Europe to the New World. Subsequent principal components can often be useful in smoking out fine scale distinctions, or relationships which are confused by the existence of similar but different signals in admixed populations.

The interpretation of this plot is rather easy. You see that African Americans lie along a continuum between Nigerians and Europeans, skewed toward Nigerians, with some outliers toward East Asians. We know from other genetic findings that ~20% of the African American ancestral quantum is European, but that quantum is not equally distributed across the population. ~10% of the African American population is more than 50% European in ancestry, while 90% is less than 50% European. And so you have a distribution which reflects this variation. As for the outliers, I will speculate and suggest that these are indications of Native American ancestry among some African Americans.
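To make the logic of the last two paragraphs concrete, here is a minimal sketch of a PCA on toy genotype data. Everything in it is an assumption for illustration: the genotypes are invented, the group names are placeholders, and the first component is found with plain power iteration rather than whatever software produced the plot above. It only shows why an admixed group should land between its two parental groups on the first component, and how "fraction of variation explained" is computed.

```python
# Toy genotypes (0/1/2 copies of an allele at six markers). All values are
# invented for illustration; they are not taken from any real data set.
genotypes = {
    "parental_A": [[0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]],
    "parental_B": [[2, 2, 2, 2, 2, 2], [2, 2, 2, 1, 2, 2]],
    "admixed":    [[0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 0, 0]],
}

def center_columns(rows):
    """Subtract each marker's mean so PCA describes variation, not the average."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Marker-by-marker sample covariance matrix."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def first_component(cov, iterations=200):
    """Power iteration: returns the leading eigenvector and its eigenvalue."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue

labels = [name for name, group in genotypes.items() for _ in group]
rows = [row for group in genotypes.values() for row in group]
centered = center_columns(rows)
cov = covariance(centered)
pc1, eigenvalue = first_component(cov)

# Share of total variance captured by PC1 (eigenvalue over the trace).
share = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# Each individual's coordinate on PC1.
scores = mat_vec(centered, pc1)

def group_mean(name):
    values = [s for s, label in zip(scores, labels) if label == name]
    return sum(values) / len(values)

mean_a, mean_b, mean_mix = (group_mean(g) for g in genotypes)
print(f"PC1 share of variance: {share:.2f}")
print(f"PC1 means: A={mean_a:.2f}, admixed={mean_mix:.2f}, B={mean_b:.2f}")
```

In this toy the admixed pair sits between the two parental groups on PC1 and closer to group A, simply because their genotypes share more alleles with A. The sign of a principal component is arbitrary, so only relative positions matter.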

The story I presented above is probably plausible as an explanation of the visual because we have a wealth of historical data to corroborate the plausibility of that narrative. The fit between the results from the technique of analysis of genetic variation and what scholars have long inferred from textual sources is relatively easy. It is far more difficult to look at a PCA plot, and generate a plausible narrative that you yourself accept with a high degree of confidence with little external support. It is with that caveat in mind that I present Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping:

High-throughput genotyping data are useful for making inferences about human evolutionary history. However, the populations sampled to date are unevenly distributed, and some areas (e.g., South and Central Asia) have rarely been sampled in large-scale studies. To assess human genetic variation more evenly, we sampled 296 individuals from 13 worldwide populations that are not covered by previous studies. By combining these samples with a data set from our laboratory and the HapMap II samples, we assembled a final dataset of ~ 250,000 SNPs in 850 individuals from 40 populations. With more uniform sampling, the estimate of global genetic differentiation (FST) substantially decreases from ~ 16% with the HapMap II samples to ~ 11%. A panel of copy number variations typed in the same populations shows patterns of diversity similar to the SNP data, with highest diversity in African populations. This unique sample collection also permits new inferences about human evolutionary history. The comparison of haplotype variation among populations supports a single out-of-Africa migration event and suggests that the founding population of Eurasia may have been relatively large but isolated from Africans for a period of time. We also found a substantial affinity between populations from central Asia (Kyrgyzstani and Mongolian Buryat) and America, suggesting a central Asian contribution to New World founder populations.

The studies which came out of the original HapMap had northern Europeans, Yoruba from Nigeria, and Chinese & Japanese. These three populations can tell us a lot, but there’s something lacking in the coverage. The HGDP sample is better. But specifically because of political considerations it was not feasible to collect Indian samples, so Pakistani ones are used in their stead. Additionally, the HGDP sample is a touch biased toward isolated and distinctive populations, such as the Kalash of Pakistan. This genetic distinctiveness is important to catalog because it is fast disappearing. But the Kalash are so unique because of their long history of isolation, so one can’t really use them as a proxy population for Pakistanis, as one could with Sindhis. The POPRES sample seems to complement the HGDP well, but I don’t see it being used so much. Since the next phase of the HapMap has more populations, some of the deficiencies which emerged with the utilization of just three terminal groups (in a World Island context) will soon no longer be an issue.

But until that time it’s nice when studies come out which close some of the gaps in our knowledge of world wide genetic variation. This is one such study. I’m somewhat familiar with the samples already because I’ve seen it in an analysis of Indian populations. It seems that it is somewhat skewed toward South and Southeast Asian populations, but hey, these are groups which need to draw the long straw sometimes as well.

Before I go any further I should mention that they use a SNP-chip with hundreds of thousands of markers. Additionally, they looked at copy number variation. Two rather different types of variation within the genome, probably to double check that the outcomes were the same. Population historical events which shape patterns of genomic variation would presumably have a similar large scale effect on both types of variation. In their results that checked out, or so they claimed, as the paper is a manuscript without the supplements attached.

Though there’s some interesting fine-grained analysis to be had, they draw some macro-scale and deep time inferences as well. First, you probably know the famous fact that 15% of variation in genes is between races, and 85% within races. That’s derived from the Fst statistic, which is basically partitioning between and within population variance across two populations. Obviously the value of Fst varies by the set of populations you’re comparing. That between Mbuti Pygmies and Japanese is far higher than between Chinese and Japanese. Using the HapMap the Fst was 16%. About what you’d expect. To equalize sample sizes with the HapMap they randomly selected individuals from a pooled set grouped by continent from their populations, and calculated Fst. They found values around 11%. Why the difference? Because their data set included populations which were between the three clusters within the HapMap.
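A toy calculation shows why adding intermediate populations pulls Fst down. This is a minimal sketch using the textbook single-marker form of Wright's Fst, (H_T - H_S) / H_T, with equal weight per population. The allele frequencies are invented, and this is not the estimator or the resampling scheme used in the paper.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic marker with allele frequency p."""
    return 2 * p * (1 - p)

def fst(frequencies):
    """Wright's Fst for one marker, weighting every population equally."""
    p_bar = sum(frequencies) / len(frequencies)
    h_total = heterozygosity(p_bar)  # heterozygosity of the pooled population
    h_within = sum(heterozygosity(p) for p in frequencies) / len(frequencies)
    return (h_total - h_within) / h_total

# Hypothetical allele frequencies, invented for illustration.
three_clusters = [0.1, 0.5, 0.9]                # three well-separated samples
with_intermediates = [0.1, 0.3, 0.5, 0.7, 0.9]  # same extremes plus in-between groups

fst_three = fst(three_clusters)
fst_five = fst(with_intermediates)
print(f"three clusters:     Fst = {fst_three:.3f}")
print(f"with intermediates: Fst = {fst_five:.3f}")
```

With these made-up numbers Fst falls from about 0.43 to 0.32. The pooled heterozygosity is unchanged, but the in-between populations are themselves more heterozygous, so more of the total variation counts as "within population."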

This is naturally not a surprising result at all, but it does reiterate one issue which sometimes crops up: Platonism in relation to race. The northern European whites in the HapMap are the whites par excellence. Turks, who are perhaps more centrally located in the genetic variation of West Eurasian and North African peoples, what used to be termed “Caucasoid,” are “less white.” Similarly, Nigerians are more African than Ethiopians. Chinese and Japanese are more Asian than Burmese. And so forth. When modeling between group differences there is I think a somewhat old-fashioned tendency to consider some populations racial archetypes. That modulates the input which modifies the results somewhat. The analytical techniques may be as cold as stone, but they are used by flesh and blood human beings.

There is also some funny business going on with haplotype and SNP heterozygosities which I think needs to be highlighted, and speaks to the fact that SNP-chips are not perfect. They’re tools, and human tools are impacted by arbitrary or instrumental choices humans make. Let me quote:

We also compared the SNP and haplotype heterozygosity values in each population (Figure 2B). These two quantities are generally highly correlated, although there are several exceptions: First, SNP heterozygosity is higher than haplotype heterozygosity in European and Central Asian populations. This may reflect a SNP ascertainment bias, since many of these polymorphisms were historically selected to maximize heterozygosity in European populations. Second, the Pygmy sample shows a low SNP heterozygosity despite relatively high haplotype heterozygosity. This unusual pattern could be caused by stronger effects of SNP ascertainment bias in this population than in others. Indeed, a recent study of Khoisan individuals (another hunter-gatherer group from Africa) showed a similar pattern: despite high SNP heterozygosity (~60%) in whole-genome sequence data, a Khoisan individual showed low heterozygosity on the SNP microarray genotypes (~22%) . Alternatively, this difference could also reflect unique attributes of population history.

In plain English the gene chips were designed with Europeans in mind, so they don’t necessarily pick up all the variation in non-European groups, who are believe it or not genetically different. This issue cropped up (as alluded to in the above text) with the recent paper which sequenced some Bushmen as well as Desmond Tutu. The Bushmen have a lot of variation, this is well known, but they have variation at markers where Europeans don’t, and if Europeans don’t the chips may not look for polymorphism at that locus. This sort of thing probably doesn’t affect broad population relationships, but if you want to zoom in and do analysis which is sensitive to fine distinctions and quantitative differences, then it might be problematic.
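Here is a minimal sketch of how ascertainment bias can distort heterozygosity estimates. All frequencies are invented, and the design rule (keep markers with at least 5% minor allele frequency in the panel population) is an assumption for illustration, not how any particular chip was actually built.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic marker with allele frequency p."""
    return 2 * p * (1 - p)

def mean_heterozygosity(frequencies, loci):
    """Average heterozygosity over the chosen marker indices."""
    return sum(heterozygosity(frequencies[i]) for i in loci) / len(loci)

# Hypothetical allele frequencies at six markers.
panel_population = [0.5, 0.4, 0.3, 0.0, 0.0, 1.0]   # population the chip was designed on
other_population = [0.1, 0.05, 0.2, 0.5, 0.4, 0.5]  # population typed on the same chip

all_loci = list(range(len(panel_population)))
# Assumed design rule: keep markers that are polymorphic (MAF >= 5%) in the panel.
chip_loci = [i for i in all_loci if 0.05 <= panel_population[i] <= 0.95]

true_panel = mean_heterozygosity(panel_population, all_loci)
chip_panel = mean_heterozygosity(panel_population, chip_loci)
true_other = mean_heterozygosity(other_population, all_loci)
chip_other = mean_heterozygosity(other_population, chip_loci)

print(f"panel population: all markers {true_panel:.3f}, chip markers {chip_panel:.3f}")
print(f"other population: all markers {true_other:.3f}, chip markers {chip_other:.3f}")
```

In this toy the chip overstates diversity in the population it was designed on and understates it in the other, because the markers where only the second population varies never made it onto the chip.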

Let’s jump to the pretty charts. First, a PCA plot with all of the individuals from all of the populations:

indo1

Note that PC 1 accounts for nearly eight times as much variation as PC 2. This speaks to the African vs. non-African gap. Because their data set is relatively thick in “intermediate” groups you see a spectrum. The vertical axis is obviously mostly east-west. And here’s the accompanying bar plot derived from the ADMIXTURE program. K = putative ancestral populations.

indo2

With this many populations at K = 12 I think you could write a fantasy novel worthy of Tolkien. K = 4 is more realistic. Among the African populations you see likely Eurasian admixture in some eastern, and it seems Bushmen, individuals. In Eurasia itself you see a clinal gradation of admixture between putative ancestral components that seems to follow longitude rather well.

Because so much of the variation in the total sample is due to Africans, removing them from the picture will allow us to focus more on the relationships of the Eurasian groups. And so that’s exactly what they did. Note that focusing on the Eurasian groups does not mean simply magnifying or zooming in on the Eurasian section of the PCA plot, rather, the plots are regenerated with a subset of the previous genetic variation. In other words, the dimensions will shake out a bit differently.

The first plot shows Eurasian populations as a whole. The second removes Europeans and Near Easterners.

indo3

Notice again the scale. The vast majority of the variance seems to be east-west. But, there is a noticeable north-south split. For the South Asian population it looks like they had Pakistanis who were farmers of modest means (Arain), high caste South Indians, and very low caste or tribal South Indians. For this Indian sample there’s a problem, and it’s the same problem which plagued the Up Series: they are looking at the very top and bottom of Indian society and ignoring the middle. Presumably the middle is going to be somewhere in the middle genetically as well, but nevertheless that’s something to consider in a paper which presumes to fill in the patchiness of others. In contrast, the Nepali sample was notably ethnically diverse, including both the dominant Indo-Aryan segment as well as the Tibeto-Burman Newar.

In the first panel there are some curious patterns with the Southeast Asian groups. Culturally, as in language and history, the Thai and Vietnamese have relatively recent roots in the southern regions of modern China. The Dai of Yunnan are the same people in origin as the Thai of Thailand and the Lao of Laos. Both derive from migrations from Yunnan. This is historically attested, even if somewhat fragmentarily. The heartland of the Vietnamese was in the Red River valley and north into southern China, and they spread down the coast and toward the Mekong only within the last 1,000 years. Southeast Asia was not uninhabited during this period. It was dominated by the Khmer Empire, which was slowly consumed by the expanding Thai and Vietnamese polities. Some scholars argue that French colonialism actually preserved an independent Khmer nation, which otherwise would have been divided between Thailand and Vietnam, as Poland was between Germany and Russia. So the Khmer are the indigenous people, while the Thai and Vietnamese are intrusive.

What do the PCA plots tell us? I do not know where the Vietnamese samples were collected. If they were from South Vietnam, then their close position to the Chinese suggests to me that there was substantial demographic replacement or expansion from the Red River valley. In contrast, the Thai are relatively distant from the Chinese. In fact, the Cambodians are somewhat closer to the Chinese! The samples here are small, and the sets overlap, so I wouldn’t put too much stock in that. But, Thailand is geographically closer to South Asia, so isolation by distance models would predict this pattern. It seems that the ethnogenesis of the Thai occurred through the expansion of the Thai identity, likely among Khmer peoples. And it is intriguing that the Iban, an indigenous people of Borneo, are closer to the Vietnamese than they are to the Cambodians. We know that there was substantial migration between coastal Vietnam and Maritime Southeast Asia; the Chams of central Vietnam, who were dominant in the southern half of the nation before the Vietnamese expansion, are a Malayan people who may have migrated from Borneo.

Shifting to the second panel there’s more here to say about the South Asians. First, geography. The two lower caste groups are actually Dalits from Andhra Pradesh, a South Indian state. Dalits used to be called outcastes, so they aren’t even lower caste, but without caste. The upper caste groups are Brahmins from Andhra Pradesh and Tamil Nadu. Finally, the Irula are tribal people from Tamil Nadu. To me the tribal samples often produce weird results, and I suspect that has to do with population bottlenecks and their demographic isolation. People leave the tribes (becoming part of the Hindu society, or converting to Islam or Christianity), but few join them. The Pakistani sample are Arain, a group of conventional Punjabi farmers who have a made up ancestry from Arabs (obviously made up because they don’t cluster with Near Easterners). Let’s compare to a chart from Reich et al.:

indiareich7

It seems to me that they’re in rough agreement (Reich et al. uses the same two low caste groups from Andhra Pradesh for low caste South Indians by the way). Though South Indian Brahmins speak South Indian languages, and reside amongst other South Indian groups, their genetic heritage is somewhat different. Similarly, tribal peoples are also distinct from caste Hindus. Reich et al. posit that South Asians can be modeled as a composite of two groups, Ancestral North Indians, ANI, and Ancestral South Indians, ASI. Presumably the former are intrusive to the subcontinent in relation to the latter. There seem to be two clear dimensions along which the ratio of ANI to ASI varies: geography and caste. The proportion of ASI seems to increase from the northwest to the southeast. And, the proportion of ANI seems to increase from tribal to low caste to upper caste. The Pakistani sample does not seem to be from an elite caste (or it does not seem they were converted from an elite caste), but they have more affinity with West Eurasian populations than South Indian Brahmins. It is likely that the latter are intrusive to the south, and have admixed with the local population.

Finally, a word on the Nepali sample. On top of the ANI-ASI mixture, the Nepali groups have varying levels of Tibeto-Burman, and so East Asian, affinity. This is not a surprise if you have met Nepalis. The Assamese, and to a lesser extent Bengalis, also exhibit this pattern of Tibeto-Burman admixture. The Brahmins of Nepal are intrusive like the Brahmins of South India, and like the South Indians they admixed with the local substrate.

Next let’s move to an ADMIXTURE plot.

indo6

The selection of a particular K obviously is conditioned by the patterns which “fit” with what you know, and what you expect. With that caution aired, the population represented by red can easily be thought of as a Middle Eastern group which expanded with agriculture. That seems to be what the authors favor. The brown population is the modal Indian ancestral population, which has little presence outside the subcontinent (nice color coding by the way! Brown people are brown). A green color represents a population which the tribal group, the Irula, are heavily weighted on. This reminds me too much of the Kalash. I suspect that the Irula went through some bottleneck or other distinctive event, and some have assimilated to various low status groups in South India.

I’m not a fantasist intent on world-building, so I’ll stop with that in reading the tea leaves of the charts. But there’s an important section which I skipped over, and will move back to now. And that’s the deep time aspect:

A more likely explanation for the OoA bottleneck is that Eurasia was populated by a larger population that had been relatively isolated from other modern human populations for tens of thousands of years prior to the expansion. The first fossil evidence for modern humans outside of Africa is in the Middle East at Skhul and Qafzeh between 80,000-100,000 years ago, which is at least 20,000 years prior to the Eurasian diaspora. If a population of modern humans remained in the Middle East until the expansion into Eurasia, there would have been sufficient time for genetic drift to reduce heterozygosity dramatically before the Eurasia expansion. This “Middle East isolation” hypothesis provides a robust explanation for the relative homogeneity of European and Asian populations relative to African populations (see Figures 3A-B) and is supported by a recent maximum likelihood estimate of 140,000 years ago for the time of Eurasian-West African population separation . Interestingly, a recent study of the Neandertal genome suggests that the non-African individuals, but not the Africans, contain similar amount of admixture (1-4%) with the Neandertals . The authors suggest that the admixture must have happened between the Neandertals with an ancestral non-African population before the Eurasian expansion. Given the fossil, archaeological, and genetic evidence, the Middle East isolation hypothesis warrants rigorous evaluation as whole-genome sequence data become available.

Like the vast majority of genetic studies this work supports the Out of Africa hypothesis. Non-Africans are all branches from a specific African branch. Or more accurately, an African branch which left Africa. The reduction in heterozygosity, a measure of genetic variation, from Africa to Eurasians was large. Additionally, within Africa south of the Sahara there’s little difference in heterozygosity as a function of geography, but outside of Africa it drops off as a function of distance from Africa. A plausible model then is a radiation from a small ancestral population to the four corners of the world, going through a series of bottlenecks along the way. Or at least that’s a model supported by genomic data. But, the drop in heterozygosity is so great that a quick separation from the parental African population would require an implausibly small number of founders (less than 10 in one generation). So, to explain the data, they are suggesting here that the original population was not quite so small, but was isolated from the large African population for thousands of years. They assume genetic drift reduced heterozygosity, but if the model is correct I suspect that the way it worked was that bottlenecks due to climatic fluctuations swept clean a lot of the genetic variation. But in the interregnum the isolated population may have interbred with Neandertals. In fact, perhaps they picked up genes from Neandertals when their own effective population was extremely small.
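The trade-off between founder number and isolation time can be sketched with the standard drift expectation for an idealized population, H_t = H_0 (1 - 1/(2N))^t. This is a back-of-the-envelope illustration, not the paper's model: the 20% loss target and the population sizes below are assumptions I picked to show the shape of the argument.

```python
import math

def retained_heterozygosity(n_effective, generations):
    """Fraction of heterozygosity left after drift in an idealized population."""
    return (1 - 1 / (2 * n_effective)) ** generations

def generations_to_reach(n_effective, retained_fraction):
    """Generations of drift needed to fall to the given retained fraction."""
    return math.log(retained_fraction) / math.log(1 - 1 / (2 * n_effective))

def founders_for_one_generation(retained_fraction):
    """Effective size that produces the same loss in a single generation."""
    return 1 / (2 * (1 - retained_fraction))

target = 0.8  # assumed: the isolated population keeps 80% of its heterozygosity

one_shot_founders = founders_for_one_generation(target)
slow_generations = generations_to_reach(1000, target)

print(f"single-generation bottleneck: about {one_shot_founders:.1f} effective founders")
print(f"effective size 1000: about {slow_generations:.0f} generations of isolation")
```

Under these assumptions, losing a fifth of the heterozygosity in one generation needs an effective size of only two or three individuals, while a population with an effective size of a thousand gets there after several hundred generations. That is the sense in which a larger but long-isolated population can substitute for an implausibly tiny founding group.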

In any case, a wide ranging paper. They manage to tie their results into two other blockbuster papers.

H/T Dienekes

Citation Xing J, Watkins WS, Shlien A, Walker E, Huff CD, Witherspoon DJ, Zhang Y, Simonson TS, Weiss RB, Schiffman JD, Malkin D, Woodward SR, & Jorde LB (2010). Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping. Genomics PMID: 20643205
