Razib Khan One-stop-shopping for all of my content

May 9, 2017

Bell Beaker going to crash-land and blow our minds soon?

Filed under: Anthroplogy,Bell Beaker — Razib Khan @ 3:03 pm

I’m pretty sure that the Beak part is not a typo…. (if it is, Pontus better delete it)

April 18, 2017

Women hate going to India

Filed under: Anthroplogy,Genetics,Human Genetics,India,Parsi — Razib Khan @ 9:11 pm


For some reason women do not seem to migrate much into South Asia. In the late 2000s I, along with others, noticed a strange discrepancy in the Y and mtDNA lineages which trace one’s direct male and female lines: in South Asia the male lineages were likely to cluster with populations to the north and west, while the female lines did not. South Asia’s female lines in fact had a closer relationship to the mtDNA lineages of Southeast and East Asia, albeit distantly.

One solution which presented itself was to contend there was no paradox at all: that the Y chromosomal lineages found in South Asia were basal to those to the west and north. In particular, there were some papers suggesting that perhaps R1a1a originated in South Asia at the end of the Pleistocene. Whole genome sequencing of Y chromosomes does not bear this out though. R1a1a went through rapid expansion recently, and ancient DNA has found it in Russia first. But in 2009 David Reich came out with Reconstructing Indian population history, which offered a possible solution.

What Reich and his coworkers found was that South Asia seems to be characterized by the mixture of two very different types of populations. One set, ANI (Ancestral North Indian), are basically another western or northwestern Eurasian group. ASI (Ancestral South Indian), are indigenous, and exhibit distant affinities to the Andaman Islanders. The India-specific mtDNA then were from ASI, while the Y chromosomes with affinities to people to the north and west were from ANI. In other words, the ANI mixture into South Asia was probably through a mass migration of males.

But it’s not just a matter of Y and mtDNA in this one case. A minority of South Asians speak Austro-Asiatic languages. The most interesting of these populations are the Munda, who tend to occupy uplands in east-central India. Older books on Indian history often suggest that the Munda are the earliest aboriginals of the subcontinent, but that has to confront the fact that most Austro-Asiatic languages are spoken in Southeast Asia. There was no true consensus on where they were present first.

Genetics seems to have solved this question. The evidence is building up that Austro-Asiatic languages arrived with rice farmers from Southeast Asia. Though most of the ancestry of the Munda is of ANI-ASI mix, a small fraction is clearly East Asian. And interestingly, though they carry no East Asian mtDNA, they do carry East Asian Y. Again, gene flow mediated by males.

The same is true of India’s Bene Israel Jewish community.

A new preprint on biorxiv confirms that the Parsis are another instance of the same dynamic: The genetic legacy of Zoroastrianism in Iran and India: Insights into population structure, gene flow and selection:

Zoroastrianism is one of the oldest extant religions in the world, originating in Persia (present-day Iran) during the second millennium BCE. Historical records indicate that migrants from Persia brought Zoroastrianism to India, but there is debate over the timing of these migrations. Here we present novel genome-wide autosomal, Y-chromosome and mitochondrial data from Iranian and Indian Zoroastrians and neighbouring modern-day Indian and Iranian populations to conduct the first genome-wide genetic analysis in these groups. Using powerful haplotype-based techniques, we show that Zoroastrians in Iran and India show increased genetic homogeneity relative to other sampled groups in their respective countries, consistent with their current practices of endogamy. Despite this, we show that Indian Zoroastrians (Parsis) intermixed with local groups sometime after their arrival in India, dating this mixture to 690-1390 CE and providing strong evidence that the migrating group was largely comprised of Zoroastrian males. By exploiting the rich information in DNA from ancient human remains, we also highlight admixture in the ancestors of Iranian Zoroastrians dated to 570 BCE-746 CE, older than admixture seen in any other sampled Iranian group, consistent with a long-standing isolation of Zoroastrians from outside groups. Finally, we report genomic regions showing signatures of positive selection in present-day Zoroastrians that might correlate to the prevalence of particular diseases amongst these communities.

The paper uses lots of fancy ChromoPainter methodologies which look at the distributions of haplotypes across populations. But some of the primary results are obvious using much simpler methods.

1) About 2/3 of the ancestry of Indian Parsis derives from an Iranian population
2) About 1/3 of the ancestry of Indian Parsis derives from an Indian population
3) Almost all the Y chromosomes of Indian Parsis can be accounted for by Iranian ancestry
4) Almost all the mtDNA haplogroups of Indian Parsis can be accounted for by Indian ancestry
5) Iranian Zoroastrians are mostly endogamous
6) Genetic isolation has resulted in drift and selection on Zoroastrians

The fact that the ancestry proportion is clearly more than 50% Iranian for Parsis indicates that there was more than one generation of males who migrated. They did not contribute mtDNA, but they did contribute genome-wide to Iranian ancestry. There are wide intervals on the dating of this admixture event, but they are consonant with the oral history that was later written down by the Parsis.
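The arithmetic behind the "more than one generation" inference can be sketched with a toy model. This is an illustration only, not the method used in the preprint: it assumes (hypothetically) that each wave consists entirely of Iranian men who marry women drawn from the existing community, starting with fully Indian brides. The function name and the wave structure are my own assumptions.

```python
def iranian_fraction_after_waves(waves):
    """Toy model: genome-wide Iranian ancestry of the community's children
    after `waves` successive rounds of all-male Iranian migration.

    Assumption (not from the paper): in every wave, each father is 100%
    Iranian and each mother comes from the previous wave's offspring
    (or is fully Indian in the first wave).
    """
    mothers = 0.0  # first-wave brides carry no Iranian ancestry
    for _ in range(waves):
        children = (1.0 + mothers) / 2.0  # half from father, half from mother
        mothers = children  # daughters become the next wave's brides
    return mothers


for waves in (1, 2, 3):
    print(waves, iranian_fraction_after_waves(waves))
```

Under these assumptions a single wave caps genome-wide Iranian ancestry at 50%, and two waves give 75%, so a figure near 2/3 sits between the two. Throughout, the mtDNA stays entirely local and the Y chromosomes entirely Iranian, which is the pattern listed above.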

So there you have it. Another example of a population formed from admixture because women hate going to India.

Citation: The genetic legacy of Zoroastrianism in Iran and India: Insights into population structure, gene flow and selection.
Saioa Lopez, Mark G Thomas, Lucy van Dorp, Naser Ansari-Pour, Sarah Stewart, Abigail L Jones, Erik Jelinek, Lounes Chikhi, Tudor Parfitt, Neil Bradman, Michael E Weale, Garrett Hellenthal
bioRxiv 128272; doi: https://doi.org/10.1101/128272

April 13, 2017

The revenge of the cavemen

Filed under: Anthroplogy,History — Razib Khan @ 5:08 pm

In 2012 I wrote Post-Neolithic revenge of the foragers. There were two proximate rationales for my thoughts at the time. First, I thought Peter Bellwood’s thesis of agricultural based demographic expansions in First Farmers was being vindicated in the broadest sketch, but there were many countervailing details. Second, there were already suggestions that genetic data was not indicative of a final victory of farmers by pastoralists.

There were several immediate issues that came to mind in the non-genetic domain. Bellwood argued that agriculture shaped the distribution of modern language families, but the spread of Turkic and Finnic peoples seems likely to have been post-agricultural, and not based on farming. Both these groups were arguably nomadic, one pastoralist, and the other engaging in mixed use lifestyles which were reminiscent of classic hunting and gathering. And, there has been anthropological evidence that though pure hunter-gatherers, such as indigenous Australians, do not take to cultivation easily, they quickly transition to pastoralism. In other words, the skills and mores which are common among hunter-gatherers can translate rapidly once domesticate-based nomadism spreads.

The Turks, or the Saami with their reindeer, are evidence of this transition, and its success. It seems plausible that the same was the case with Indo-Europeans, and that is what I thought at the time.

Now we have more data from ancient DNA. It does seem there was a “resurgence” of Mesolithic hunter-gatherer ancestry as time passed, with Neolithic farmers exhibiting a more indigenous genetic profile in Europe. Additionally, the arrival of Indo-European steppe ancestry brought another dollop of “hunter-gatherer” ancestry from beyond the fringes of Europe proper.

So what story can we tell of the transition between the Late Neolithic (LN) and the Early Bronze Age (EBA) in Europe? First, the proto-Indo-Europeans were people from the fringes and boundaries. Their genetics indicate some sort of influence from the Near East, likely via the Maykop people. But their roots were also deep in eastern Europe, from the local hunter-gatherers who had affinities with Siberians to their east and European hunter-gatherers to their west. From this synthesis emerged something special, a warlike group of mobile pastoralists who quickly swept the field.

This reminds me of something from Peter Turchin’s book, War and Peace and War: The Rise and Fall of Empires. Populations on the borders or frontiers of ethno-cultural (and possibly political) zones may exhibit more group cohesion than those from “core” areas. The Indo-Europeans were a border folk. Such populations may also take to cultural innovations more quickly; in The Making of a Christian Aristocracy it is clear that switching to the new religion occurred faster among elites in outlying regions than in the core.

A second issue, which is not proven, but may be possible, is that once the Indo-Europeans moved into the North European plain, they allied with residual hunter-gatherer populations. A classic enemy-of-my-enemy proposition. This would likely result in a higher proportion of Pleistocene ancestry in later generations due to assimilation.

The moral of the story is that often there is no final victory in the war. Human history is full of reversals.

The reality of cultural hitchhiking

Filed under: Anthroplogy,Cultural hitchhiking,Genetics,History — Razib Khan @ 2:55 pm

The figure to the left is from a paper, The mountains of giants: an anthropometric survey of male youths in Bosnia and Herzegovina, which attempts to explain why the people from the uplands of the western Balkans are so tall. Anyone who has watched high level basketball, or perused old physical anthropology textbooks, knows that average heights in the Dinaric Alps are quite high in comparison to the rest of Europe, matched only in the region around Scandinavia. The Dutch of late have been the world champions in height, and explanations such as recent selection and their high consumption of dairy products have been given. In this paper the authors point out that the people who live in the Dinaric uplands are not a population which consumes an inordinately high-protein diet, at least in relation to their neighbors.

Rather, they suggest that the height of the people who reside in the Dinarics is due to a genetic factor. There is now good genomic evidence that selection accounts for at least some of the difference in height between Northern and Southern Europeans. That is, it seems that there have been divergent pressures in these two locales, their genetic differences due to historical demography aside.

The exception to this north-south gradient is obviously in the Dinarics. Another way in which the Dinarics are exceptional is that they have the highest frequency of Y chromosomal haplogroup I. The other mode of haplogroup I is in Scandinavia. I1 is common among people who live in Sweden, while I2 is common among the peoples of the western Balkans. I has an interesting history because the vast majority of Mesolithic hunter-gatherer males in Europe belong to this haplogroup. It is very rare outside of Europe. This is in contrast to the other major European haplogroups, which are found outside of Europe at appreciable frequencies.

It is likely that I is indicative of a lineage whose roots in Europe go back to the late Pleistocene period after the Last Glacial Maximum ~20,000 years ago. As the world warmed ~10,000 years ago small populations of hunter-gatherers rapidly expanded from their refuges and either most of the males were I, or in the drift process on the edge of the wave of advance I became very common. It is plausible that in terms of alleles which account for variation in height these hunter-gatherers were enriched for those conferring larger size. Cold weather populations tend to be larger. Additionally, they probably consumed a relatively diversified but high protein diet, allowing for greater median size than among farmers at the Malthusian carrying capacity.

But, there has been a lot of selection over the past 10,000 years, and I am skeptical that this correlation between I and height in Europe is anything but a coincidence. Rather, the phylogeny which I exhibits brings me to another issue which I think is not often highlighted: I1 in particular may have “hitchhiked” with the exogenous lineages such as R1b and R1a in early Indo-European society.

That is, in the patrilineal descent groups expanding across the landscape and monopolizing access to resources and mates, the non-invasive I somehow integrated themselves into the broader cultural complex, and partook in the plenty. Like R1b and R1a it exhibits a rake-like topology which suggests rapid recent expansion.

This would not be exceptional. The modern Russian state’s origins are in the polities created by the Kievan Rus, who were famously Scandinavian. Rurik was by origin a Swede, and his dynasty eventually came to encompass most of the eastern Slavic peoples, and rule over the Russian people and state until the 17th century. Because there were so many descendants of this dynasty it was possible to adduce its Y chromosomal haplogroup, N1c1. The kicker is that this is clearly a Finnic lineage, with the most recent evidence being that it is a remnant of a recent migration out of Siberia to the west. The implication here is that the direct male lineage of Rurik was assimilated into the Scandinavian culture and power structure, and its members were possibly chieftains of Finnic tribes somewhere along the Baltic littoral.

Another example is the House of Wessex. Alfred the Great is arguably the first true king of England. Here are the names of some of the earlier monarchs of the House of Wessex: Ceawlin, Cynric, and Cynegils. Even someone without a background in historical linguistics may be curious about whether these are Anglo-Saxon names, and there is a line of thinking that perhaps the forebears of Alfred were British warlords, who “went Saxon,” in a fashion analogous to Gallo-Roman aristocrats who assimilated to Frankish-Germanic norms and forms in the 6th and 7th centuries in the Merovingian domains.

Overall what you see in the genetic data are many things, but rarely a straightforward story. Just as genes can impact culture (e.g., lactase persistence), so culture impacts the distribution of genes. Just as human polities are coalitions, so genetic lineages themselves in their distribution and evolutionary history exhibit fingerprints of these past socio-political events and ideas.

April 8, 2017

Why only one migrant per generation keeps divergence at bay

The best thing about population genetics is that because it’s a way of thinking and modeling the world it can be quite versatile. If Thinking Like An Economist is a way to analyze the world rationally, thinking like a population geneticist allows you to have the big picture on the past, present, and future, of life.

I have some personal knowledge of this as a transformative experience. My own background was in biochemistry before I became interested in population genetics as an outgrowth of my lifelong fascination with evolutionary biology. It’s not exactly useless knowing all the steps of the Krebs cycle, but it lacks in generality. In his autobiography I recall Isaac Asimov stating that one of the main benefits of his background as a biochemist was that he could rattle off the names on medicine bottles with fluency. Unless you are an active researcher in biochemistry your specialized research is quite abstruse. Population genetics tends to be more applicable to general phenomena.

In a post below I made a comment about how one migrant per generation or so is sufficient to prevent divergence between two populations. This is an old heuristic which goes back to Sewall Wright, and is encapsulated in the formalism to the left. Basically the divergence, as measured by Fst, is approximately the inverse of 4 times the proportion of migrants times the total population, plus 1. The mN is equivalent to the number of migrants per generation (proportion times the total population). As the mN becomes very large, the Fst converges to zero.
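Written out, the relationship described in words above is the standard equilibrium approximation for Wright's island model. The symbols are the usual textbook ones and are my notation here: N is the size of each population and m is the proportion of migrants per generation, so Nm is the number of migrants per generation.

```latex
F_{ST} \approx \frac{1}{4Nm + 1}
```

With one migrant per generation (Nm = 1) this gives an Fst of about 1/5, and with ten migrants per generation about 1/41.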

The intuition is pretty simple. Imagine you have two populations which separate at a specific time. For example, sea level rises, so now you have a mainland and island population. Since before sea level rise the two populations were one random mating population their initial allele frequencies are the same at t = 0. But once they are separated random drift should begin to subject them to divergence, so that more and more of their genes exhibit differences in allele frequencies (ergo, Fst, the between population proportion of genetic variation, increases from 0).

Now add to this the parameter of migration. Why is one migrant per generation sufficient to keep divergence low? The two extreme scenarios are like so:

  1. Large populations change allele frequency very slowly due to drift, so only a small proportion of migration is needed to prevent them from diverging
  2. Small populations change allele frequency very fast due to drift, so a larger proportion of migration is needed to prevent them from drifting

Within a large population one migrant is a small proportion, but drift is occurring very slowly. Within a small population drift is occurring fast, but one migrant is a relatively large proportion of a small population.
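A minimal sketch of why population size drops out, assuming the usual island-model approximation Fst ≈ 1 / (4Nm + 1). The function name and the example population sizes are illustrative choices of mine, not anything from Wright or from the posts above.

```python
def fst_island_model(pop_size, migrant_proportion):
    """Approximate equilibrium Fst under the island model: 1 / (4*N*m + 1).

    pop_size is N, migrant_proportion is m; N*m is migrants per generation.
    """
    return 1.0 / (4.0 * pop_size * migrant_proportion + 1.0)


# One migrant per generation in populations of very different size:
# m shrinks as N grows, but the product N*m stays at 1.
for pop_size in (100, 10_000, 1_000_000):
    m = 1.0 / pop_size
    print(pop_size, m, round(fst_island_model(pop_size, m), 3))

# Ten migrants per generation, the figure some conservation geneticists prefer.
print(round(fst_island_model(1_000, 10.0 / 1_000), 4))
```

Because only the product of N and m enters the expression, the small population with its large migrant proportion and the large population with its tiny one land on the same value.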

Obviously this is a stylized fact with many details which need elaborating. Some conservation geneticists believe that the focus on one migrant is wrongheaded, and the number should be set closer to 10 migrants.

But it still gets at a major intuition: gene flow is extremely powerful and effective at reducing differences between groups. This is why most geneticists are skeptical of sympatric speciation. Though the focus above is on drift, the same intuition applies to selective divergence. Gene flow between populations works at cross-purposes with selection which drives two groups toward different equilibrium frequencies.

This is why it was surprising when results showed that Mesolithic hunter-gatherers and farmers in Europe remained extremely genetically distinct while living in close proximity for on the order of 1,000 years. That being said, strong genetic differentiation persists between Pygmy peoples and their agriculturalist neighbors, despite a long history of living near each other (Pygmies do not have their own indigenous languages, but speak the tongue of their farmer neighbors). In the context of animals physical separation is often necessary for divergence, but for humans cultural differences can enforce surprisingly strong taboos. Culture is as strong a phenomenon as mountains or rivers….

April 4, 2017

Sex bias in migration from the steppe (revisited)

Filed under: Anthroplogy,Genetics,Genomics,History — Razib Khan @ 11:21 pm

Last fall I blogged a preprint which eventually came out as a paper in PNAS, Ancient X chromosomes reveal contrasting sex bias in Neolithic and Bronze Age Eurasian migrations. The upshot is that the authors found that there was far less steppe ancestry on the X chromosomes of Bronze Age Central Europeans than across the whole genome. The natural inference here is that you had migrations of males into territory where they had to find local wives.
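The logic of comparing the X chromosome to the rest of the genome can be sketched with simple bookkeeping. This is a toy single-generation calculation under my own assumptions (equal numbers of sons and daughters, one pulse of admixture), not the model fitted in the PNAS paper; the function names and the example fractions are hypothetical.

```python
def autosomal_fraction(steppe_fathers, steppe_mothers):
    """Steppe ancestry on the autosomes of the children:
    each parent contributes half."""
    return (steppe_fathers + steppe_mothers) / 2.0


def x_fraction(steppe_fathers, steppe_mothers):
    """Steppe ancestry on the X chromosomes of the children.

    A son and a daughter together carry three X chromosomes:
    two inherited from mothers and one from a father.
    """
    return (steppe_fathers + 2.0 * steppe_mothers) / 3.0


# Hypothetical example: 80% of fathers but only 20% of mothers are migrants.
print(autosomal_fraction(0.8, 0.2))    # genome-wide steppe ancestry
print(round(x_fraction(0.8, 0.2), 3))  # lower on the X chromosome
```

When migrant fathers outnumber migrant mothers, the X chromosome fraction falls below the genome-wide fraction, which is the kind of signal the authors were looking for.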

But the story does not end there. Iosif Lazaridis and David Reich have put out a short note on bioRxiv, Failure to Replicate a Genetic Signal for Sex Bias in the Steppe Migration into Central Europe. It’s short, so I suggest you read the note yourself, but the major issue seems to be that on X chromosomes ADMIXTURE in supervised mode seems to behave really strangely. Lazaridis and Reich find that there seems to be a downward bias in the estimate of steppe ancestry. Ergo, the finding was an artifact.

Goldberg et al. almost immediately responded, Reply To Lazaridis And Reich: Robust Model-Based Inference Of Male-Biased Admixture During Bronze Age Migration From The Pontic-Caspian Steppe. Their response seems to be that yes, ADMIXTURE does behave strangely, but the overall finding is still robust.

With these uncertainties I do wonder if it’s hard at this point to evaluate the alternative models. But, we do have archaeology and mtDNA. What do those say? On that basis, from what little I know, I am inclined to suspect a strong male bias of migration.

Citation: Reply To Lazaridis And Reich: Robust Model-Based Inference Of Male-Biased Admixture During Bronze Age Migration From The Pontic-Caspian Steppe, Amy Goldberg, Torsten Gunther, Noah A Rosenberg, Mattias Jakobsson
bioRxiv 122218; doi: https://doi.org/10.1101/122218

Citation: Failure to Replicate a Genetic Signal for Sex Bias in the Steppe Migration into Central Europe, Iosif Lazaridis, David Reich, bioRxiv 114124; doi: https://doi.org/10.1101/114124

How a Eurasian “band of brothers” shaped the world

Filed under: Anthroplogy,Corded Ware,History,Indo-Europeans — Razib Khan @ 1:10 pm


When I was eight years old I saw a map which genuinely confused me. I had opened up a deluxe dictionary at my elementary school and saw a map of the world’s language families, and noticed that there was a group of dialects which spanned the Bay of Bengal to the North Sea. In fact, according to this map the language I had first learned to speak, Bengali, was in the same language family as English.

This was hard to wrap my mind around, but there it was in front of me. Further research at the public library confirmed this fact. And, upon further reflection it was obvious to me there were similarities…I had been learning French at school, and English, Bengali, and French, all exhibited similarities in the first ten numbers. English and French I understood in terms of a natural relationship, but Bengali?

My personal and professional interests have never been in domains where I would explore the topic first hand, but the origins of Indo-European languages have always been a hobby. I read books such as The Horse, the Wheel, and Language and In Search of the Indo-Europeans when I could. When taking in excellent works such as Empires of the Silk Road the Indo-European thread was always something I kept in mind.

But the above works take a more old-fashioned Eurasian heartland “marauders from the steppe” viewpoint. Starting about 15 years ago I began to look into a different framework: Indo-Europeans as farmers. For me it begins with the 2002 paper, Mapping the Origins and Expansion of the Indo-European Language Family, which finds that “the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago” (this is the last paper I can remember reading in paper format). The model is elaborated by Peter Bellwood in works such as First Farmers, though he applies it to most language families.

But its origins go back decades, with the archaeologist Colin Renfrew. Rather than dramatic explosions from the steppe, Renfrew and colleagues suggest that the demographic expansion enabled by agriculture as a mode of production allowed for groups like Indo-Europeans to rapidly swamp their neighbors and enter into a process known as a wave of advance. There wasn’t an organized movement. Rather, farming enabled the growth of population to such an extent that it was almost an undirected thermodynamic law that the original farmers would radiate outward, away from zones at the Malthusian carrying capacity and out toward virgin land.

It was a parsimonious theory, and phylogenetic techniques seem to have supported it. But then came ancient DNA to overturn the apple-cart. I won’t rehash what you probably already know, but will point to the two most relevant papers, Massive migration from the steppe was a source for Indo-European languages in Europe and Population genomics of Bronze Age Eurasia. Basically there was massive population turnover during the early Bronze Age. The genetic data aligned well with predictions you’d make from the old “marauders from the steppe” model, not the demic diffusion of farmers who were subject to high endogenous population growth over time.

Of course the Anatolian model proponents have an answer. There is a thesis whereby the steppe pastoralists derive from Anatolians, and so the European population turnover was of one Indo-European group by another. This is possible, but to my knowledge this model was never foregrounded by Anatolianists before. Rather, it strikes me as a way to “save” their framework.

So far much of the battle has been between archaeologists, who tend to favor gradualism, and often even cultural diffusion as opposed to migration, and historical linguists and arriviste geneticists, who tend toward a more classical migration-from-the-steppe perspective.

A new paper in Antiquity takes the sledgehammer to the Anatolian hypothesis with an archaeology-first tack: Re-theorising mobility and the formation of culture and language among the Corded Ware Culture in Europe. They don’t pull punches:

…the Anatolian hypothesis must be considered largely falsified. Those Indo-European languages that later came to dominate in western Eurasia were those originating in the migrations from the Russian steppe during the third millennium BC.

Why would they say this? There is a major paper coming out:

These local processes of social integration between intruding Yamnaya/Corded Ware populations and remnant Neolithic populations can be applied to language dispersal. We should expect that the transformation from Proto-Indo-European to Pre-Proto Germanic would reveal the same kind of hybridisation between an earlier Neolithic language of the Funnel Beaker Culture, and the incoming Proto-Indo-European language. This is precisely what recent linguistic research has been able to demonstrate (Kroonen & Iversen in press). In their study on the formation of Proto-Germanic in Northern Europe, Kroonen and Iversen document a bundle of linguistic terms of non-Indo-European origin linked to agriculture that were adopted by Indo-European-speaking groups who were not fully fledged farmers.

They also contend that the Neolithic language was roughly the same throughout the zone of Indo-European expansion. From what those who would know about these sorts of things have told me this is plausible, because the Neolithic farmers spread so rapidly from a small founder culture, and exhibited broad Europe-wide similarities for a thousand years. Curiously, the chart shows that Germanic languages may have been influenced by a hunter-gatherer language, which the others were not. I suspect this may have to do with the relatively late persistence of hunter-gatherers in some maritime environments facing the Baltic and North Sea.

The paper, which is open access, needs to be read in full. Here are some important points:

  • Burial type seems to be a more robust form of indicator of dominant cultural identity
  • Corded Ware males practiced exogamy
  • Corded Ware males traveled long distances
  • Corded Ware culture was initially exclusively pastoralist
  • There is a great deal of circumstantial, and some genetic, evidence that Corded Ware communities were characterized by having women who were clearly from the Neolithic farming population
  • There was intergroup violence as a function of culture
  • The Corded Ware and Neolithic populations persisted near each other geographically, though the Neolithic groups seem to have retreated to uplands
  • The Corded Ware people engaged in a wholesale pattern of landscape sculpting, burning down forests to produce pasture

Neolithic Y lineages, such as G2, are far rarer in Northern Europe today than R1a and R1b (in contrast, the hunter-gatherer I seems to have gone through an expansion just like R1a and R1b). We already have a model for what went on here, the Iberian settlement of the New World. Among mestizo populations there are huge skews of mtDNA and Y, with the former almost all Amerindian (with some African) and the latter almost all European (with some African).

The Corded Ware are the ancestors of the German peoples who we see emerge into the light of history during antiquity. What these data are telling us is that the Germans are the product of a massive period of biological and cultural amalgamation and synthesis between indigenous groups and intrusive populations from the steppe. The archaeological data indicate that the intrusion was male mediated. The “battle axe” culture probably lived up to its name. And they weren’t likely exceptional….

April 2, 2017

Why are so many of us “star-men”

Filed under: Anthroplogy,Culture,R1a,R1b,Star phylogenies — Razib Khan @ 11:32 pm

Seven years ago I wrote 1 in 200 men direct descendants of Genghis Khan. It’s the most popular post I’ve ever written. As of now there have been 630,000 “sessions” (basically visits) on that page alone. I suspect that many more have read my summary of The Genetic Legacy of the Mongols, the original paper on which it was based, than that paper (though it’s a good paper, you should read it).

At the time I wrote that, people often asked me if I was a descendant of Genghis Khan. That seems unlikely on the paternal lineage. My Y chromosome is R1a1ab2-Z93. This is typically found in South Asia, and among Iranian peoples, as well as in the Altai region of western Mongolia. It is not common among Mongols though, even if it is found amongst them, likely due to gene flow from the west. The particular branch of R1a1a that I carry has been found in ancient remains from the Srubna culture of the eastern Pontic steppe. As a friend of mine might say, I am the scion of marauders from the steppe, even though not Genghiside ones. The fact that I have the last name Khan is simply a legacy of the custom whereby South Asian Muslim lineages of a particular status accrued the surname to denote their position within Islamicate civilization.

But though I am no direct descendant of Genghis, it turns out that my Y chromosome shares a similar history. The figure to the left is focused on European Y chromosomes, and at the top you see various “R” lineages. It turns out that R1b and R1a are both basically subject to the same explosive dynamics as the Genghis Khan haplotype: both exploded into star phylogenies relatively recently in time. Trees of the R1 lineages always show them to exhibit a rake-like pattern. This is due to the fact that starting from a small base they expanded so rapidly that they did not develop the intricate node-structure you see in lineages which accrued mutations at a more normal pace.

What could have caused such explosive growth? We know why Genghis Khan and his sons left so many descendants: conquest yielded social status. For many generations having a male Genghiside bloodline was highly effective as a means to gain bonus points when attempting to scale the summits of power and wealth. This was even true in the Muslim regions of Central Asia, despite Genghis Khan’s negative impact on Islamic civilization (Transoxiana arguably never recovered from this period).

We don’t have anything like the “Secret History of the Indo-Aryans” to explain the emergence of these older star phylogenies. In The Horse, the Wheel, and Language, David Anthony argues that mobile populations domesticated the horse, and used that as a killer cultural advantage to spread their Indo-European language. In his book from the 2000s Anthony argues for elite transmission of language by the Kurgan people. But more recently he has been persuaded by genetic work which suggests massive population displacements and migrations into Europe during the late Neolithic and early Bronze Age.

Unfortunately the timing doesn’t work from what I can tell. The expansion of groups like the Corded Ware seems to pre-date the emergence of the steppe chariot toolkit by many centuries. It does so happen that the chariot was invented in the region where R1a1a2b-Z93 was also found to exist. So I suspect this “Scythian” R1a lineage did sweep across much of Central-South Eurasia thanks to the horse and the wheel. But a technological explanation is more difficult for the rest.

I will posit another speculative answer, stealing the idea from Snorri Sturluson. He believed that the gods that were remembered by his pagan Norse ancestors were at one point men of great renown and fame. Kings of yore. Over time they had been deified, and legends had grown up around them. Sturluson may have been right. Perhaps the Indo-European gods recollect the forefathers of R1a and R1b. What was their advantage? Perhaps it was a hierarchical stratified social structure which brooked no individualism against the interests of the lineage unit? It may be that asabiyya is worth more than a chariot?

March 28, 2017

How Indians are a lot like Latin Americans

Filed under: Anthroplogy,Genetics,India — Razib Khan @ 5:45 am


Pretty much any person of Indian subcontinental origin in the United States of a certain age who isn’t very dark skinned has probably had the experience of being spoken to in Spanish at some point. When I was younger, growing up in Oregon, I had the experience multiple times of Spanish speakers, probably Mexican, pleading with me to interpret for them because there was no one else who seemed likely. It isn’t a genius insight to conclude I was most likely South Asian…but it wasn’t out of the question I was Mexican. This applies even more to lighter skinned South Asians. In the Central Valley of California, where there are many Sikhs from Punjab and many Mexicans, this confusion occurred a lot for some Indian kids.

Of course biogeographically there isn’t that much connection between South Asia and the New World. But it isn’t crazy that Christopher Columbus labelled the peoples of the New World “Indian.” After all, they were a brown-skinned people whose features were not African, East Asian, or West Eurasian. And, it turns out genetically there is a coincidence that connects the New World and South Asia: the mixed peoples of Latin America with Amerindian and European ancestry recapitulate an admixture which resembles what occurred in South Asia thousands of years ago. It looks as if about half the ancestry of South Asians is West Eurasian and half something more like eastern Eurasians.

On principal component analysis that means that South Asian and Mexican and Peruvian samples often overlap. This is somewhat curious because the non-West Eurasian ancestors of South Asians and Amerindians diverged in ancestry on the order of 25 to 45 thousand years before the present. And the Iberian ancestry of the mixed people of the New World is almost as far from the character of South Asian West Eurasian ancestry as you can get (in the parlance of this blog, lots of EEF, less CHG, not too much ANE).

A new paper, A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals, highlights another similarity: massive bias in biogeographic ancestry by sex. More precisely, the rank order of West Eurasian ancestry in South Asia is skewed like so: Y chromosome > whole-genome > mtDNA (as is evident in the above figure).

I actually began writing about this in the late 2000s, when the fact that South Asian mtDNA was very different from West Eurasian mtDNA, and South Asian Y chromosome was mostly West Eurasian, was obvious. Then work using genome-wide data sets began to point to massive intra-Eurasian admixture between very diverged lineages. The paper is not revolutionary, but worth reading for its thoroughness and how it brings together all the lines of evidence.

Finally, no ancient DNA. That’s probably for the future, but I don’t expect any surprises.

Citation: A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals.

March 23, 2017

Ancestry inference won’t tell you things you don’t care about (but could)

Filed under: Anthroplogy,Genetics,Genomics,Personal genomics — Razib Khan @ 5:59 pm

The figure above is from Noah Rosenberg’s relatively famous paper, Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure. The context of the publication is that it was one of the first prominent attempts to use genome-wide data on a variety of human populations (specifically, from the HGDP data set) and attempt model-based clustering. There are many details of the model, but the one that will jump out at you here is that the parameter K defines the number of putative ancestral populations you are hypothesizing. Individuals then shake out as proportions of each of the K elements. Remember, this is a model in a computer, and you select the parameters and the data. The output is not “wrong,” it’s just the output based on how you set up the program and the data you input yourself.

These sorts of computational frameworks are innocent, and may give strange results if you want to engage in mischief. For example, let’s say that you put in 200 individuals, of whom 95 are Chinese, 95 are Swedish, and 10 are Nigerian. From a variety of disciplines we know that, to a first approximation, non-Africans form a monophyletic clade in relation to Africans. In plain English, all non-Africans descend from a group of people who diverged from Africans more than 50,000 years ago. That means if you imagine two populations, the first division should be between Africans and non-Africans, to reflect this historical demography. But if you skew the sample size, as the program looks for the maximal amount of variation in the data set it may decide that dividing between Chinese and Swedes as the two ancestral populations is the most likely model given the data.

This is not wrong as such. As the number of Africans in the data converges on zero, obviously the dividing line is between Swedes and Chinese. If you overload particular populations within the data, you may marginalize the variation you’re trying to explore, and the history you’re trying to uncover.
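To make the sample-size point concrete, here is a minimal sketch in Python. It is not the model-based clustering the programs above actually run; it swaps in a much cruder stand-in, a k-means style objective where every individual sits exactly at a population centroid. The two-dimensional coordinates are made up for illustration, chosen only so that the Nigerian group is far from both non-African groups. For three populations and K = 2, every partition lumps one pair together, and the cost of lumping populations a and b is n_a * n_b / (n_a + n_b) times their squared distance.

```python
from itertools import combinations

# Hypothetical 2-D "genetic" coordinates in arbitrary units (an assumption,
# not real data): Nigerians are far from both non-African groups, which are
# comparatively close to each other.
CENTROIDS = {
    "Nigerian": (0.0, 0.0),
    "Swedish": (10.0, -3.0),
    "Chinese": (10.0, 3.0),
}


def merge_cost(a, b, sizes):
    """Within-cluster sum of squares if populations a and b share one cluster."""
    (ax, ay), (bx, by) = CENTROIDS[a], CENTROIDS[b]
    squared_distance = (ax - bx) ** 2 + (ay - by) ** 2
    na, nb = sizes[a], sizes[b]
    return na * nb / (na + nb) * squared_distance


def best_k2_split(sizes):
    """Return the pair of populations lumped together by the cheapest K=2 split."""
    pairs = combinations(sorted(sizes), 2)
    return min(pairs, key=lambda pair: merge_cost(pair[0], pair[1], sizes))


balanced = {"Chinese": 67, "Swedish": 67, "Nigerian": 66}
skewed = {"Chinese": 95, "Swedish": 95, "Nigerian": 10}

print("balanced sample lumps together:", best_k2_split(balanced))
print("skewed sample lumps together:", best_k2_split(skewed))
```

With the balanced sample the cheapest split lumps Chinese and Swedes together and sets Nigerians apart. With the 95/95/10 sample the cheapest split instead folds the ten Nigerians into one of the big groups, so the dividing line runs between Chinese and Swedes, which is the behavior described above.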

I’ve written all of this before. But I’m writing this in context of the earlier post, Ancestry Inference Is Precise And Accurate(Ish). In that post I showed that consumers drive genomics firms to provide results where the grain of resolution and inference varies a lot as a function of space. That is, there is a demand that Northern Europe be divided very finely, while vast swaths of non-European continents are combined into one broad cluster.

Less than 5% Ancient North Eurasian

Another aspect though is time. These model-based admixture frameworks can implicitly traverse time as one ascends up and down the number of K‘s. It is always important to explain to people that the number of K‘s may not correspond to real populations which all existed at the same time. Rather, they’re just explanatory instruments which illustrate phylogenetic distance between individuals. In a well-balanced data set for humans K = 2 usually separates Africans from non-Africans, and K = 3 then separates West Eurasians from other populations. Going across K‘s it is easy to imagine that one is traversing successive bifurcations.

A racially mixed man, 15% ANE, 30% CHG, 30% WHG, 30% EEF

But today we know that it’s more complicated than that. Three years ago Pickrell et al. published Toward a new history and geography of human genes informed by ancient DNA, where they report the result that more powerful methods and data imply most human populations are relatively recent admixtures between extremely diverged lineages. What this means is that the origin of groups like Europeans and South Asians is very much like the origin of the mixed populations of the New World. Since then this insight has become only more powerful, as ancient DNA has shed light on massive population turnovers over the last 5,000 to 10,000 years.

These are to some extent revolutionary ideas, not well known even among the science press (which is too busy doing real journalism, i.e. the art of insinuation rather than illumination). As I indicated earlier, direct-to-consumer genomics firms use national identities in their cluster labels because these are comprehensible to people. Similarly, they can’t very well tell Northern Europeans that they are an outcome of a successive series of admixtures between diverged lineages from the late Pleistocene down to the Bronze Age. Though Northern Europeans, like South Asians, Middle Easterners, Amerindians, and likely Sub-Saharan Africans and East Asians, are complex mixes between disparate branches of humanity, today we view them as indivisible units of understanding, to make sense of the patterns we see around us.

Personal genomics firms therefore give results which are historically comprehensible. As a trivial example, the genomic data makes it rather clear that Ashkenazi Jews emerged in the last few thousand years via a process of admixture between antique Near Eastern Jews, and the peoples of Western Europe. After the initial admixture this group became an endogamous population, so that most Ashkenazi Jews share many common ancestors in the recent past with other Ashkenazi Jews. This is ideal for the clustering programs above, as Ashkenazi Jews almost always fit onto a particular K with ease. Assuming there are enough Ashkenazi Jews in your data set you will always be able to find the “Jewish cluster” as you increase the value of K.

But the selection of a K which satisfies this comprehensibility criterion is a matter of convenience, not necessity. Most people are vaguely aware that Jews emerged as a people at a particular point in history. In the case of Ashkenazi Jews they emerged rather late in history. At certain K‘s Ashkenazi Jews exhibit mixed ancestral profiles, placing them between Europeans and Middle Eastern peoples. What this reflects is the earlier history of the ancestors of Ashkenazi Jews. But for most personal genomics companies this earlier history is not something that they want to address, because it doesn’t fit into the narrative that their particular consumers want to hear. People want to know if they are part-Jewish, not that they are part antique Middle Eastern and Southwest European.

Perplexment of course is not just for non-scientists. When Joe Pickrell’s TreeMix paper came out five years ago there was a strange signal of gene flow between Northern Europeans and Native Americans. There was no obvious explanation at the time…but now we know what was going on.

It turns out that Northern Europeans and Native Americans share common ancestry from Pleistocene Siberians. The relationship between Europeans and Native Americans has long been hinted at in results from other methods, but it took ancient DNA for us to conceptualize a model which would explain the patterns we were seeing.

An American with recent Amerindian (and probably African) ancestry

But in the context of the United States shared ancestry between Europeans and Native Americans is not particularly illuminating. Rather, what people want to know is if they exhibit signs of recent gene flow between these groups. In particular, many white Americans are curious if they have Native American heritage. They do not want to hear an explanation which involves the fusion of an East Asian population with Siberians that occurred 15,000 to 20,000 years ago, and then the emergence of Northern Europeans through successive amalgamations between Pleistocene, Neolithic, and Bronze Age Eurasians.

In some of the inference methods Northern Europeans, often those with Finnic ancestry or relationship to Finnic groups, may exhibit signs of ancestry from the “Native American” cluster. But this is almost always a function of circumpolar gene flow, as well as the aforementioned Pleistocene admixtures. One way to avoid this would be to simply not report proportions which are below 0.5%. That way, people with higher “Native American” fractions would receive the results, and the proportions would be high enough that it was almost certainly indicative of recent admixture, which is what people care about.

Why am I telling you this? Because many journalists who report on direct-to-consumer genomics don’t understand the science well enough to grasp what’s being sold to the consumer (frankly, most biologists don’t know this field well either, even if they might use a barplot here and there).

And, the reality is that consumers have very specific parameters of what they want in terms of geographic and temporal information. They don’t want to be told true but trivial facts (e.g., they are Northern European). But neither do they want to know things which are so novel and at such far remove from their interpretative frameworks that they simply can’t digest them (e.g., that Northern Europeans are a recent population construction which threads together very distinct strands with divergent deep time histories). In the parlance of cognitive anthropology consumers want their infotainment the way they want their religion, minimally counterintuitive. Consume some surprise. But not too much.

November 20, 2013

The long First Age of mankind

Filed under: Anthroplogy,Archaeology,Siberians — Razib Khan @ 10:22 am

“What it begins to suggest is that we’re looking at a ‘Lord of the Rings’-type world – that there were many hominid populations,” says Mark Thomas, an evolutionary geneticist at University College London who was at the meeting but was not involved in the work.

- Mark Thomas, as reported by Nature

This is in reference to the ancient DNA meeting where David Reich reported that the Denisovans, an exotic archaic population which contributed ~5-10 percent of the ancestry of Papuans, was itself a synthesis of Neandertals and a mysterious group currently unknown. This is not surprising, as the broad outlines of these results were presented at ASHG 2012, though no doubt they’re moving closer to publication. But for this post I want to shift the focus to a different time and place, after the ancient admixture with archaic lineages, and to the reticulation present within our own.


But first we need to backtrack a bit. Let’s think about what we knew in the early 2000s. If you want a refresher, you might check out Spencer Wells’ The Journey of Man or Stephen Oppenheimer’s Out of Eden, which focused on Y and mtDNA lineages respectively. These books were capstones to the era of uniparental phylogeographic analysis of the spread and diversification of anatomically modern African hominids ~50-100,000 years ago. Rather than looking at the whole genome (the technology was not there yet) these researchers focused on pieces of DNA passed down via direct maternal or paternal lineages, and reconstructed clean phylogenetic trees using a coalescent framework. Broadly speaking these trees were concordant, and told us that our lineage, all extant humans, derived from a small African population which flourished ~100,000 years ago. These insights suffused the thought of human evolutionary thinkers in other disciplines (see The Dawn of Human Culture). H. sapiens sapiens, veni, vidi, vici.

After that initial “Out of Africa” migration a series of bottlenecks and founder events led to the expansion of our lineage, as it replaced all predecessors. By the Last Glacial Maximum, ~20-25,000 years ago, the rough outlines of human genetic variation were established (with the exception of the expansion into the New World). We know now that this picture is very incomplete at the most innocuous, and highly misleading given the least charitable interpretation.

Reticulation. Graphs. Admixture. These words all point to the reality that rather than being the culmination of deep rooted regional populations which date back to the depths of the Pleistocene, most modern humans are recombinations of ancient lineages. On the grandest scale this is illustrated by the evidence of ‘archaic’ ancestry in modern humans. But even more pervasively we see evidence of widespread admixture between distinct lineages within the major world populations which we think of as archetypes. This is true for Amerindians, South Asians, and Europeans. This is also the case for Ethiopians, and Australian populations. A major problem crops up when we talk about extinct ancient populations which were the founding constituent elements of modern ones: it doesn’t make sense to use modern referents when they are simply recombinations of what they are describing. But language and history being what they are, we can’t change the awkwardness of talking about “Ancestral North Eurasians,” anodyne and somewhat incoherent at the same time (Eurasia is a modern construct with contemporary historical salience).

Into the mix comes another ancient DNA paper which reconstructs the genome of a boy who lived in Siberia, near Lake Baikal, somewhat over 20,000 years ago. It’s titled Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Here’s the topline finding: a substantial minority of the ancestry of modern Native Americans derives from a North Eurasian population which has closer affinities to West Eurasians than East Eurasians. And, this is an old admixture event. In the paper itself they observe that all “First American” populations seem to exhibit the same admixture distance to the Siberian genome. These results are also broadly consistent with the admixture of this population in Western Eurasia, especially northeast Europe. As among Amerindian populations it seems that this element is a substantial minority across Europe as a whole, and perhaps at parity in some populations, such as Finns.

To the left you see the geographical affinities of the MA-1 Siberian sample. It is shifted toward West Eurasians in the PCA. But on the map with circles representing populations, the definite evidence of admixture between Amerindians and MA-1 is clear in the shading. The statistic used, f3, looks for complex population history between an outgroup (X) and a putative clade. From this test it is evident Amerindians had some admixture related to MA-1. Because of the dating of Siberian remains it does not seem likely that admixture was from Amerindians to West Eurasian and related populations. Rather, the reverse seems more plausible. You can also see from the map the close affinities with particular European and Central Asian populations of MA-1. This is intriguing, and requires further follow up. Though MA-1 and its kin were closer to West Eurasians than East Eurasians, it still seems likely that there was an early divergence between the populations of north-northeast Eurasia, and those of the southwest. Eventually they came back together in various proportions to produce modern Europeans, but it seems likely that during the Pleistocene these two groups went their own way.
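For readers who have not met the statistic, f3(C; A, B) is usually defined as the average over SNPs of (c - a)(c - b), where a, b and c are allele frequencies in the three populations. The sketch below is a naive Python version of that definition, not the estimator used in the paper: it ignores the sample-size correction that real implementations apply, and the allele frequencies are invented purely for illustration. A clearly negative value suggests that C is a mixture of populations related to A and B; when C is an outgroup, a larger positive value indicates more drift shared between A and B.

```python
def f3(c, a, b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    c, a, b are equal-length lists of allele frequencies, one entry per SNP.
    No correction for finite sample size is applied (a simplification).
    """
    if not (len(a) == len(b) == len(c)) or not c:
        raise ValueError("need three non-empty frequency lists of equal length")
    return sum((ci - ai) * (ci - bi) for ai, bi, ci in zip(a, b, c)) / len(c)


# Invented allele frequencies at four SNPs, for illustration only.
source_a = [0.1, 0.9, 0.2, 0.8]
source_b = [0.9, 0.1, 0.8, 0.2]

# A 50/50 mixture of the two sources sits between them at every SNP.
mixture = [(x + y) / 2 for x, y in zip(source_a, source_b)]
print("f3(mixture; A, B) =", f3(mixture, source_a, source_b))

# Outgroup form: the same population in both test slots gives its squared
# distance from the outgroup, which cannot be negative.
outgroup = [0.5, 0.5, 0.5, 0.5]
print("f3(outgroup; B, B) =", f3(outgroup, source_b, source_b))
```

The mixture case comes out negative because the mixed population's frequency falls between the two sources at every SNP, so the two differences have opposite signs.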

There are hints of this in the TreeMix plot to the right. Note how drifted MA-1 is in relation to other West Eurasians (the branch is long). I suspect some of this is due to the fact that this individual is nearly 1,000 generations in the past. Not only is it difficult to name ancient populations using modern ones as referents, I suspect that some of the variation in the ancient populations has been lost, and so they seem exotic and difficult to fit into a broader phylogenetic framework (they had hundreds of thousands of SNPs though). And yet MA-1 can be fitted into the broader framework of populations which went north or west after leaving Africa because of mtDNA and Y chromosome results. Both of these indicate that MA-1 was basal to West Eurasians, with haplogroup U for mtDNA, and R for the Y lineage.

To really understand what’s going on here is going to take a while. A later subfossil, circa ~15,000 years before the present, yielded some genetic material, and exhibited continuity with MA-1. This suggests that Siberia may have had massive population replacement relatively recently. We know this was likely the case elsewhere. Reading Jean Manco’s Ancestral Journeys one possible scenario is that Pleistocene Europeans were MA-1 like, but were replaced by Middle Eastern farmers in the early Neolithic. But later eruptions from Central Asia brought mixed populations (Indo-Europeans?) with substantial MA-1 affinities to the center of European history.

Finally, one must make a note of phenotype. The authors looked at 124 pigmentation related SNPs (see supplemental). The conclusion seems to be that MA-1 was not highly de-pigmented, as is the case with most modern Northern Europeans. This stands to some reason, as substantial ancestry of this sort in Amerindians would result in phenotypic variation which does not seem to be present. Though the authors do suggest that coarse morphological variation among early First Americans (e.g., Kennewick Man) might be due to this population, which had West Eurasian affinities.

Where does this leave us? More questions of course. Though I’m confident the befuddlement will clear up in a few years….

Citation: doi:10.1038/nature12736

Addendum: Please read the supplements. They’re rich enough that you don’t need to read the letter if you don’t have access. Also, can we now finally bury the debate about when east and west Eurasians diverged? Obviously it can’t have been that recent if a >20,000 year old individual had closer affinity to western populations.

The post The long First Age of mankind appeared first on Gene Expression.

November 13, 2013

The color of life as a coincidence

Filed under: Anthroplogy,Evolution,Evolutionary Genetics,Genetics of taste,Taste — Razib Khan @ 12:35 am

Credit: Eric Hunt

I do love me some sprouts! Greens, bitters, strong flavors of all sorts. I’ve always been like this. Some of this is surely environment. My family comes from a part of South Asia known for its love of bracing and bold sensation. But perhaps I was born this way? There’s a fair amount of evidence that taste has a substantial genetic component. This does not mean genes determine what one tastes, but it certainly opens the door for passive gene-environment correlations. If you do not find a flavor offensive, you are much more likely to explore its depths, and cultivate your palate.

Dost thou dare?
Credit: W.A. Djatmiko

And of course I’m not the only one with a deep interest in such questions. With the marginal income available to us many Americans have become “foodies,” searching for flavor bursts and novelties which their ancestors might never have been able to comprehend. More deeply in a philosophical sense the question of qualia reemerges if there is a predictable degree of inter-subjectivity in taste perception (OK, qualia is always there, though scientific sorts tend to view it as intractable in a fundamental sense).


But there’s heritability, and then there’s genes. We know that perception in some ways is heritable, but what is perhaps more interesting is if you can peg a specific genomic location to it. Then the evolutionary story becomes all the richer. And so it is with the locus TAS2R16, where a nonsynonymous mutation at location 516 seems to result in heightened sensitivity to bitter tastes. More specifically, it’s rs846664, and the derived T allele is fixed outside of Africa, while the ancestral G allele still segregates at appreciable fractions within African populations. A new paper in Molecular Biology and Evolution puts this locus under a microscope, though it does not come up with any clear conclusions. Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa presents some interesting findings. First, let’s look at the distribution of the variation in their sample populations at the SNP of most particular interest:

Region                Population          T516G
Outside of Africa     Non-Africans        0.000
Ethiopia              Semitic             0.059
Tanzania              Sandawe             0.083
Ethiopia              Omotic              0.093
Ethiopia              Cushitic            0.095
Tanzania              Iraqw               0.111
West Central Africa   Fulani              0.114
Kenya                 Niger-Kordofanian   0.133
Ethiopia              Nilo-Saharan        0.156
Kenya                 Afroasiatic         0.162
West Central Africa   Niger-Kordofanian   0.214
Kenya                 Nilo-Saharan        0.225
Kenya                 Luo                 0.250
Central Africa        Niger-Kordofanian   0.329
Tanzania              Hadza               0.333
Central Africa        Bulala              0.361
Central Africa        Nilo-Saharan        0.367
West Central Africa   Afroasiatic         0.462
West Central Africa   Nilo-Saharan        0.500

As you can see T is fixed outside of Africa, and varies across many African populations. Previous work implied this, though coverage within Africa was not good. One thing to observe though is that the frequency of T within Africa cannot be explained by recent Eurasian admixture. The frequency is way too high for that to be the sole explanation, and in any case there is no evidence that ~33% of the Hadza’s ancestry is of Eurasian provenance (the Hadza being one of the three major groups of African hunter-gatherers, along with the Bushmen and Pygmies).

Within the paper the authors resequenced ~1,000 base pairs across diverse African populations in an exonic region of this gene (the stuff that codes for amino acids). What they discovered is that of the SNPs segregating, 516 in particular was critical toward effecting phenotypic change. Not only did individuals with the T variant notably exhibit stronger bitter sensitivity, but in vitro expression with a reporter was elevated. Because they had such a dense genomic region they could perform various nucleotide based tests to detect natural selection, and attempt coalescent models to infer genealogical history.

I’m going to spare you some of the gory details at this point. Here’s what they found. First, it does look like the region is under natural selection in many African populations, in particular, the derived haplotype with T at 516 at the center. But this result is not reproduced across all tests. The coalescent simulations make clear why: the mutation is an old variant with deep roots in the hominin lineage. In other words this variation pre-dates H. sapiens. It looks like the T allele has rapidly increased in frequency relatively recently, though more on the order of ~50,000 years, rather than ~10,000.* Basically around the time of the “Out of Africa” event. Additionally, there’s a tell-tale sign that this is being subject to selection within Africa: the genetic differences across populations at TAS2R16 far exceed the genome-wide values (the Fst at this locus is in the top 1% of loci within the African genome). Finally, one should note that the G allele haplotypes seem to be much more strongly constrained, as if they’re under purifying selection. This means that the switch to T is not all gain.
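To give a feel for what a high Fst at a single SNP means, here is a minimal Python sketch using the simplest textbook form, Fst = (H_T - H_S) / H_T, for two equally weighted populations. This is an assumption made for illustration: the post does not say which estimator the authors used, and real analyses account for sample sizes and compare against the genome-wide distribution. The two input frequencies are taken from the table above.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst at one biallelic SNP for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Sample sizes are ignored, so this is a rough illustration only.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # the SNP does not vary in either population
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total


# Two frequencies from the table: Ethiopia Semitic and
# West Central Africa Nilo-Saharan.
ethiopia_semitic = 0.059
west_central_nilo_saharan = 0.500

print(round(fst_two_pops(ethiopia_semitic, west_central_nilo_saharan), 3))
```

Two populations with identical frequencies give 0, and two populations fixed for different alleles give 1, so the value for this pair can be read against that scale.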

At this point you may be ready for a story about how some African populations, like Eurasians, underwent a lifestyle change, and diet changes resulted in a shift in sensory perception. That does not seem to be the story. Rather, the authors did not seem to be able to agree upon a neat explanation for what is driving these recent sweeps up from ancient standing genetic variation. They do observe that the variation does tend to cluster geographically, more so than the genome-wide results would imply. There’s likely some adaptation going on, they simply don’t know what. In the introduction and elsewhere you can see that variation at TAS2R16 does correlate with other traits. Not too surprising due to the relative ubiquity of pleiotropy; one gene with many effects.

Stepping outside of the implications of this specific result, let’s think about what might be a takeaway: something as essential as taste perception might be a side effect of other aspects of evolutionary processes. In other words, we don’t know what the phenotypic target of selection is in this case, but we do have a good handle on one of the major side effects, which is sensory perception. How one tastes seems like a big deal.** And there have been many theories propounded that variation in bitter sensitivity is due to adaptation to poisonous plants and such, but really no one knew, and that was just the most plausible of low hanging fruit. With these results from Africa, where there is more variation in the trait and genes, and good geographic coverage, that seems to be an implausible model to adhere to (one would think the hunter-gatherer Hadza would exhibit the most sensitivity, no?). Many of the traits and tendencies which we humans see as fundamental, essential, and of great import, may actually be side effects of powerful evolutionary forces hammering at the genetic-correlation matrices which define the hidden network of co-dependencies within the genome. So there, I said it. Life is an accident. Enjoy it.

Citation: Campbell, Michael C., et al. “Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa.” Molecular biology and evolution (2013): mst211.

* If it was closer to ~10,000 I think haplotype based tests would come back with something, but they do not.

** Some Epicureans might be accused of reducing the good to taste!

The post The color of life as a coincidence appeared first on Gene Expression.

The color of life as a coincidence

Filed under: Anthroplogy,Evolution,Evolutionary Genetics,Genetics of taste,Taste — Razib Khan @ 12:35 am

Credit: Eric Hunt

Credit: Eric Hunt

I do love me some sprouts! Greens, bitters, strong flavors of all sorts. I’ve always been like this. Some of this is surely environment. My family comes from a part of South Asia known for its love of bracing and bold sensation. But perhaps I was born this way? There’s a fair amount of evidence that taste has a substantial genetic component. This does not mean genes determine what one tastes, but it certainly opens the door for passive gene-environment correlations. If you do not find a flavor offensive, you are much more likely to explore it depths, and cultivate your palette.

220px-Durio_kutej_F_070203_ime

Dost thou dare?
Credit: W.A. Djatmiko

And of course I’m not the only one with a deep interest in such questions. With the marginal income available to us many Americans have become “foodies,” searching for flavor bursts and novelties which their ancestors might never have been able to comprehend. More deeply in a philosophical sense the question of qualia reemerges if there is a predictable degree of inter-subjectivity in taste perception (OK, qualia is always there, though scientific sorts tend to view it as intractable in a fundamental sense).


But there’s heritability, and then there’s genes. We know that perception in some ways is heritable, but what is perhaps more interesting is if you can peg a specific genomic location to it. Then the evolutionary story becomes all the richer. And so it is with the locus TAS2R16, where a nonsynonymous mutation at location 516 seems to result in heightened sensitivity to bitter tastes. More specifically, it’s rs846664, and the derived T allele is fixed outside of Africa, while the ancestral G allele still segregates at appreciable fractions within African populations. A new paper in Molecular Biology and Evolution puts this locus under a microscope, though it does not come up with any clear conclusions. Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa presents some interesting findings. First, let’s look at the distribution of the variation in their sample populations at the SNP of most particular interest:

| Region | Population | T516G (G allele frequency) |
|---|---|---|
| Outside of Africa | Non-Africans | 0.000 |
| Ethiopia | Semitic | 0.059 |
| Tanzania | Sandawe | 0.083 |
| Ethiopia | Omotic | 0.093 |
| Ethiopia | Cushitic | 0.095 |
| Tanzania | Iraqw | 0.111 |
| West Central Africa | Fulani | 0.114 |
| Kenya | Niger-Kordofanian | 0.133 |
| Ethiopia | Nilo-Saharan | 0.156 |
| Kenya | Afroasiatic | 0.162 |
| West Central Africa | Niger-Kordofanian | 0.214 |
| Kenya | Nilo-Saharan | 0.225 |
| Kenya | Luo | 0.250 |
| Central Africa | Niger-Kordofanian | 0.329 |
| Tanzania | Hadza | 0.333 |
| Central Africa | Bulala | 0.361 |
| Central Africa | Nilo-Saharan | 0.367 |
| West Central Africa | Afroasiatic | 0.462 |
| West Central Africa | Nilo-Saharan | 0.500 |

As you can see T is fixed outside of Africa, and varies across many African populations. Previous work implied this, though coverage within Africa was not good. One thing to observe though is that the frequency of T within Africa cannot be explained by recent Eurasian admixture. The frequency is way too high for that to be the sole explanation, and in any case there is no evidence that ~33% of the Hadza’s ancestry is of Eurasian provenance (the Hadza being one of the three major groups of African hunter-gatherers, along with the Bushmen and Pygmies).

Within the paper the authors resequenced ~1,000 base pairs across diverse African populations in an exonic region of this gene (the stuff that codes for amino acids). What they discovered is that of the SNPs segregating, 516 in particular was critical in effecting phenotypic change. Not only did individuals with the T variant exhibit notably stronger bitter sensitivity, but in vitro expression with a reporter was elevated. Because they had such a densely sequenced genomic region they could perform various nucleotide-based tests to detect natural selection, and attempt coalescent models to infer genealogical history.

I’m going to spare you some of the gory details at this point. Here’s what they found. First, it does look like the region is under natural selection in many African populations, in particular, the derived haplotype with T at 516 at the center. But this result is not reproduced across all tests. The coalescent simulations make clear why: the mutation is an old variant with deep roots in the hominin lineage. In other words this variation pre-dates H. sapiens. It looks like the T allele has rapidly increased in frequency relatively recently, though more on the order of ~50,000 years, rather than ~10,000.* Basically around the time of the “Out of Africa” event. Additionally, there’s a tell-tale sign that this is being subject to selection within Africa: the genetic differences across populations at TAS2R16 far exceed the genome-wide values (the Fst at this locus is in the top 1% of loci within the African genome). Finally, one should note that the G allele haplotypes seem to be much more strongly constrained, as if they’re under purifying selection. This means that the switch to T is not all gain.
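To make the Fst comparison a bit more concrete, here is a minimal sketch of a two-population Fst at a single biallelic site, using a few of the G allele frequencies from the table above. It uses the simple heterozygosity form, Fst = (H_T - H_S) / H_T, with the two populations weighted equally. That weighting and the function names are assumptions of this sketch; the estimator in the paper, with its sample-size corrections, may well differ.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Fst = (H_T - H_S) / H_T for two populations given equal weight."""
    p_bar = (p1 + p2) / 2.0
    h_total = expected_heterozygosity(p_bar)
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # site is monomorphic in the pooled sample
    return (h_total - h_within) / h_total


# G allele frequencies copied from the table above
g_freq = {
    "Non-Africans": 0.000,
    "Ethiopia Semitic": 0.059,
    "West Central Africa Nilo-Saharan": 0.500,
}

# Two African populations at opposite ends of the table
fst_far = fst_two_populations(
    g_freq["Ethiopia Semitic"], g_freq["West Central Africa Nilo-Saharan"]
)
# An African population with low G frequency versus non-Africans
fst_near = fst_two_populations(g_freq["Ethiopia Semitic"], g_freq["Non-Africans"])

print(f"Semitic vs. Nilo-Saharan (WCA): {fst_far:.3f}")
print(f"Semitic vs. Non-Africans:       {fst_near:.3f}")
```

The point of the toy comparison is only that some pairs of African populations differ far more at this one site than an African and a non-African sample do, which is the kind of pattern that pushes a locus into the tail of the genome-wide distribution.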

At this point you may be ready for a story about how some African populations, like Eurasians, underwent a lifestyle change, and diet changes resulted in a shift in sensory perception. That does not seem to be the story. Rather, the authors did not seem to be able to agree upon a neat explanation for what is driving these recent sweeps up from ancient standing genetic variation. They do observe that the variation does tend to cluster geographically, more so than the genome-wide results would imply. There’s likely some adaptation going on, they simply don’t know what. In the introduction and elsewhere you can see that variation at TAS2R16 does correlate with other traits. Not too surprising due to the relative ubiquity of pleiotropy: one gene with many effects.

Stepping outside of the implications of this specific result, let’s think about what might be a takeaway: something as essential as taste perception might be a side effect of other aspects of evolutionary processes. In other words, we don’t know what the phenotypic target of selection is in this case, but we do have a good handle on one of the major side effects, which is sensory perception. How one tastes seems like a big deal.** And there have been many theories propounded that variation in bitter sensitivity is due to adaptation to poisonous plants and such, but really no one knew, and that was just the most plausible of low hanging fruit. With these results from Africa, where there is more variation in the trait and genes, and good geographic coverage, that seems to be an implausible model to adhere to (one would think the hunter-gatherer Hadza would exhibit the most sensitivity, no?). Many of the traits and tendencies which we humans see as fundamental, essential, and of great import, may actually be side effects of powerful evolutionary forces hammering at the genetic-correlation matrices which define the hidden network of co-dependencies within the genome. So there, I said it. Life is an accident. Enjoy it.

Citation: Campbell, Michael C., et al. “Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa.” Molecular biology and evolution (2013): mst211.

* If it was closer to ~10,000 I think haplotype based tests would come back with something, but they do not.

** Some Epicureans might be accused of reducing the good to taste!

The post The color of life as a coincidence appeared first on Gene Expression.

November 8, 2013

Selection happens; but where, when, and why?

Filed under: Anthroplogy,Genetics,Genomics,Pigmentation — Razib Khan @ 1:49 am
Distribution of SLC24A5 variation at SNP rs1426654. Credit, HGDP Browser

Nina Davuluri, Miss America 2014, Credit: Andy Jones

One of the secondary issues which cropped up with Nina Davuluri winning Miss America is that it seems implausible that someone with her complexion would be able to win any Indian beauty contest. A quick skim of Google images “Miss India” will make clear the reality that I’m alluding to. The Indian beauty ideal, especially for females, is skewed to the lighter end of the complexion distribution of native South Asians. Nina Davuluri herself is not particularly dark skinned if you compare her to the average South Asian; in fact she is likely at the median. But it would be surprising to see a woman who looks like her held up as conventionally beautiful in the mainstream Indian media. When I’ve pointed this peculiar aspect out to Indians* some of them will submit that there are dark skinned female celebrities, but when I look up the actresses in question they are invariably not very dark skinned, though perhaps by comparison to what is the norm in that industry they may be. But whatever the cultural reality is, the fraught relationship of color variation to aesthetic variation prompts us to ask, why are South Asians so diverse in their complexions in the first place? A new paper in PLoS Genetics, The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, explores this genetic question in depth.

Much of the low hanging fruit in this area was picked years ago. A few large effect genetic variants which are known to be polymorphic across many populations in Western Eurasia segregate within South Asian populations. What this means in plainer language is that a few genes which cause major changes in phenotype are floating around in alternative flavors even within families among people of Indian subcontinental origin. Ergo, you can see huge differences between full siblings in complexion (African Americans, as an admixed population, are analogous). While loss of pigmentation in eastern and western Eurasia seems to be a case of convergent evolution (different mutations in overlapping sets of genes), the H. sapiens sapiens ancestral condition of darker skin is well conserved from Melanesia to Africa.


So what’s the angle on this paper you may ask? Two things. The first is that it has excellent coverage of South Asian populations. This matters because to understand variation in complexion you should probably look at populations which vary a great deal. Much of the previous work has focused on populations at the extremes of the human distribution, Africans and Europeans. There are obvious limitations to this approach. If you are looking at variant traits, then focusing on populations where the full range of variation is expressed can be useful. Second, this paper digs deeply into the subtle evolutionary and phylogenomic questions which are posed by the diversification of human pigmentation. It is often said that race is only skin deep, as if to dismiss the importance of human biological variation. But skin is a rather big deal. It’s our biggest organ, and the pigmentation loci do seem to be rather peculiar.

You probably know that on the order of ~20% of genetic variation is partitioned between continental populations (races). But this is not the case at all genes. And pigmentation ones tend to be particularly notable exceptions to the rule. In late 2005 a paper was published which arguably ushered in the era of modern pigmentation genomics, SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. The authors found that one nonsynonymous mutation was responsible for on the order of 25 to 33% of the variation in skin color difference between Africans and Europeans. And, the allele frequency was nearly disjoint across the two populations, and between Europeans and East Asians. When comparing Europeans to Africans and East Asians almost all the variation was partitioned across the populations, with very little within them. The derived SNP, which differs from the ancestral state, is found at ~100% frequency in Europeans, and ~0% in Africans and East Asians. It is often stated (you can Google it!) that this variant is the second most ancestrally informative allele in the human genome in relation to Europeans vs. Africans.

SLC24A5 was just the beginning. SLC45A2, TYR, OCA2, and KITLG are just some of the alphabet soup of loci which have come to be understood to affect normal human variation in pigmentation. Despite the relatively large roll call of pigmentation genes one can safely say that between any two reasonably distinct geographic populations ~90 percent of the between population variation in the trait is going to be due to ~10 genes. Often there is a power law distribution as well. The first few genes of large effect account for over 50% of the variance, while subsequent loci are progressively less important.

So how does this work to push the overall results forward?

– With their population coverage the authors confirm that SLC24A5 seems to be polymorphic in all Indo-European and Dravidian speaking populations in the subcontinent. The frequency of the derived variant ranges from ~90% in the Northwest, and ~80% in Brahmin populations all over the subcontinent, to ~10-20% in some tribal groups.

– Though there is a north-south gradient, it is modest, with a correlation of ~0.25. There is a much stronger correlation with longitude, but I’m rather sure that this is an artifact of their low sampling of Indo-European populations in the eastern Gangetic plain. As hinted in the piece the correlation with longitude has to do with the fact that Tibetan and Burman populations in these fringe regions tend to lack the West Eurasian allele.

– Using haplotype based tests of natural selection the authors infer that the frequency of this allele has been driven up positively in north, but not south, India. It could be that the authors lack power to detect selection in the south because of lower frequency of the derived allele. And, I did wonder if selection in the north was simply an echo of what occurred in West Eurasia. But if you look at the frequency of the A allele in the north most of the populations seem to have a higher frequency of the derived variant than they do of inferred “Ancestral North Indian”.
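The last point rests on a simple expectation: if admixture alone set the allele frequency, the derived frequency in a mixed population would just be the ancestry-weighted average of the source frequencies. A minimal sketch of that comparison follows. The ANI fraction, the source frequencies, and the observed value are purely hypothetical placeholders chosen for illustration, not figures from the paper.

```python
def expected_admixed_frequency(ani_fraction, freq_in_ani, freq_in_asi):
    """Allele frequency expected from admixture alone (no drift, no selection)."""
    if not 0.0 <= ani_fraction <= 1.0:
        raise ValueError("ani_fraction must be between 0 and 1")
    return ani_fraction * freq_in_ani + (1.0 - ani_fraction) * freq_in_asi


# Hypothetical inputs: 60% ANI ancestry, allele fixed in ANI and absent in ASI
ani_fraction = 0.60
expected = expected_admixed_frequency(ani_fraction, freq_in_ani=1.0, freq_in_asi=0.0)

# Hypothetical observed frequency of the derived A allele in the same population
observed = 0.80
excess = observed - expected

print(f"expected {expected:.2f}, observed {observed:.2f}, excess {excess:+.2f}")
```

An observed frequency well above the admixture expectation is the sort of thing that hints at selection after the populations mixed, though drift and error in the ancestry estimate can produce an excess too.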

What’s perhaps more interesting is the bigger picture of human evolutionary dynamics and phylogenetics that these results illuminate. Resequencing the region around SLC24A5 these researchers confirmed it does look like the derived variant is identical by descent in all populations across Western Eurasia and into South Asia. What this means is that this mutation arose in someone at some point around the Last Glacial Maximum, after West Eurasians separated from East Eurasians. The authors give some numbers using some standard phylogenetic techniques, but admit that it is ancient DNA that will give true clarity on the deeper questions. When I see something written like that my hunch, and hope, is that more papers are coming soon.

When I first read The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, I thought that it was essential to read Ancient DNA Links Native Americans With Europe and Efficient moment-based inference of admixture parameters and sources of gene flow. The reason goes back to the plot which I generated at the top of this post: notice that Native Americans do not carry the West Eurasian variant of SLC24A5. What the find of the ~24,000-year-old Siberian boy, and his ancient DNA, suggests is that there was a population with affinities closer to West Eurasians than East Eurasians that contributed to the ancestry of Native Americans. The lack of the European variant of SLC24A5 in Native Americans suggests to me that the sweep had not begun, or, that the European variant was disfavored. What the other paper reports is that on the order of 20-40% of the ancestry of Europeans may be derived from an ancient North Eurasian population, unrelated to West Eurasians (or at least not closely related). It is likely that this population has something to do with the Siberian boy. Since Europeans are fixed for the derived variant of SLC24A5, that implies to me that the sweep must have occurred after 24,000 years ago.

At this point I have to admit that I believe we need to be careful calling this a “European variant.” Just because it is nearly fixed in Europe, does not imply that the variant arose in Europe. If you look at the frequency of the derived variant you see it is rather high in the northern Middle East. Looking at some of the populations in the Middle Eastern panel the ancestral variant might be all explained by admixture in historical time from Africa. If the sweep began during the last Ice Age, then most of Europe would have been uninhabited. The modern distribution is informative, but it surely does not tell the whole story.

Where we are is that SLC24A5, and pigmentation as a whole, is coming to be genomically characterized fully. We don’t know the whole story of why light skin was selected so strongly. And we don’t quite know where the selection began, and when it began. But through gradually filling in pieces of the puzzle we may come to grips with this adaptively significant trait in the near future.

Citation: Basu Mallick C, Iliescu FM, Möls M, Hill S, Tamang R, et al. (2013) The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent. PLoS Genet 9(11): e1003912. doi:10.1371/journal.pgen.1003912

* From my personal experience American born Indians often do not share the same prejudices and biases, partly because subtle shades of brown which are relevant in the Indian context seem ludicrous in the United States.

The post Selection happens; but where, when, and why? appeared first on Gene Expression.

December 18, 2012

Buddy, can you spare some ascertainment?

The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility and rationale of the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, only >100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra million markers may not give you much bang for the buck (not to mention the biases that may introduce in your population genetic and phylogenetic inferences).


To the left is the list of populations against which the Human Origins 1 Array was ascertained, and they look rather comprehensive to me. In contrast, for Geno 2.0 ‘ancestrally informative markers’ were ascertained on 450 populations. The ultimate question for me is this: is all the extra ascertainment on diverse and obscure groups worth it? On first inspection Geno 2.0’s number of SNPs looks modest as I stated, but in my experience when you quality control and merge different panels together you are often left with only a few hundred thousand SNPs in any case. 100-200,000 SNPs is also sufficient to elucidate relationships even in genetically homogeneous regions such as Europe in my experience (it’s more than enough for model-based clustering, and seems to be overkill for MDS or PCA). One issue that jumps out at me about the Affymetrix chip is that it is ascertained toward the antipodes. In contrast, Geno 2.0 takes into account the Eurasian heartland. I suspect, for example, that Geno 2.0 would be better for population or ancestry assignment for South Asians because it would have more informative markers for those populations.

Ultimately I can’t really say much more until I use both marker sets in different and similar contexts. Since Geno 2.0 consciously excludes many functional and medically relevant SNPs its utility is primarily in the domain of demographics and history. If the populations in question are well covered by the Human Origins 1 Array, I see no reason why one shouldn’t go with it. Not only does it have more information about biological function, but the number of markers is many fold greater. On the other hand, Geno 2.0 may be more useful on the “blank zones” of the Affy chip. Hopefully the Genographic Project results paper for Geno 2.0 will come out soon and I can pull down their data set and play with it.

Cite: arXiv:1212.4116

December 12, 2012

A lighter shade of brown: the Dan MacArthur chronicles, not a Romani

Filed under: Anthroplogy,Daniel MacArthur,Human Genetics,Human Genomics — Razib Khan @ 9:25 am

Pakistani honor guard

A few days ago I suggested that Dr. Daniel MacArthur might have South Asian ancestry. Now, when confronted with surprise the best option is to stick with your prior assumption, unless that surprise is powerful enough for you to “update” your model. After a few days of further analysis I will update: I do think Dan MacArthur has South Asian ancestry. Dienekes dug further, and noticed that there are hallmarks of “Ancestral South Indian” ancestry along the first 2/3 or so of chromosome 10. Now, you do have to remember that this genomic region is only half South Asian. The other half is European.

But in any case, one question that some people brought up: perhaps MacArthur has Romani heritage? I’m skeptical of this partly because:

1) there weren’t that many Romani in Britain in the 19th century

2) The British Romani are already very highly admixed

Another friend, who is a population genomicist himself, expressed some skepticism that such a long segment wasn’t broken up by recombination over the generations. My only moderately informed answer is this: we’d only notice the long segments, because if a very small region of ‘exotic’ ancestry was embedded within the dominant ancestral component it probably would not show up on some of these tests (or, we’d assume it was noise). Dan has another segment of South Asian ancestry, but much smaller in size. It may be there are other regions which we could find if we used better reference populations.
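A rough way to put numbers on the recombination objection: under a simple single-pulse model, the length of an unbroken ancestry segment inherited from an ancestor g generations back is often approximated as exponentially distributed with a mean of about 100/g centimorgans. The sketch below just applies that approximation. The generation counts, the 80 cM segment length, and the implied conversion of very roughly 1 cM per megabase are assumptions for illustration, not estimates for Dan’s genome.

```python
import math


def mean_tract_length_cm(generations):
    """Approximate mean ancestry tract length (cM) after `generations` meioses."""
    if generations < 1:
        raise ValueError("generations must be at least 1")
    return 100.0 / generations


def prob_tract_longer_than(length_cm, generations):
    """P(tract length > length_cm) under an exponential with rate g per Morgan."""
    if generations < 1:
        raise ValueError("generations must be at least 1")
    return math.exp(-generations * length_cm / 100.0)


# Hypothetical generation counts for the South Asian ancestor
for g in (4, 6, 8):
    mean_cm = mean_tract_length_cm(g)
    p_long = prob_tract_longer_than(80.0, g)
    print(f"g={g}: mean tract {mean_cm:.1f} cM, P(tract > 80 cM) = {p_long:.4f}")
```

Under these assumptions very long segments thin out quickly as generations accumulate, which is the intuition behind my friend’s skepticism. It is also compatible with my answer: the long segments are the only ones these tests would flag in the first place.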

Here’s what I tentatively want to do with Dan’s data now. First, take the 80 Mb or so which has South Asian ancestry, and phase it. That way I’d have a South Asian chromosome and a European one, and we could look for matches for only the South Asian one. But being busy I didn’t have time to do this. What I did have time to do was reduce the chromosomal region under consideration, and then run an IBS distance analysis in a private data set I have. This is a crude, but not always uninformative analysis. But by looking at the relationships I can now conclude that Dan MacArthur probably does not have Romani ancestry. Why? Because the Romani are of Northwest Indian heritage, and MacArthur’s match pattern using the diploid genotype (so South Asian + European) does not match what I expect would emerge from such a combination.
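For readers unfamiliar with it, identity-by-state (IBS) distance simply counts how many alleles two individuals share at each SNP, without regard to where those alleles came from. Here is a minimal sketch, assuming genotypes are coded as counts of one allele (0, 1, or 2) with None for a missing call. The function name and the toy genotypes are hypothetical, and this is not necessarily the exact computation behind the table below.

```python
def ibs_distance(genotypes_a, genotypes_b):
    """Return 1 minus the mean proportion of alleles shared IBS across called SNPs."""
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype vectors must be the same length")
    shared = 0
    compared = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue  # skip SNPs with a missing call in either individual
        shared += 2 - abs(a - b)  # 2, 1, or 0 alleles shared at this SNP
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs with calls in both individuals")
    return 1.0 - shared / (2.0 * compared)


# Toy diploid genotypes over six SNPs (hypothetical data)
query = [0, 1, 2, 1, None, 2]
reference = [0, 2, 2, 0, 1, 1]

print(f"IBS distance: {ibs_distance(query, reference):.3f}")
```

Because the genotypes are unphased, a region that is half South Asian and half European gets compared as a single diploid blend, which is why the matches have to be read as a pattern rather than taken one at a time.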

The full table is below, but to me the fact that he has so many matches with Northwest Indian populations is evidence that his ancestry was not Northwest Indian. Otherwise, he would be matching Utah whites (CEU samples) more often. Rather, someone with a mix of more conventional South Asian ancestry and European ancestry often resembles some of the less South Asian populations of South Asia (e.g., Brahui) in these crude measures. In fact, one of the closest matches to Dan’s IBS profile is that of my own mother. She is a rather vanilla ethnic Bengali, so I think there is a strong chance that his Indian ancestry is similar. This weak genetic data isn’t really the primary reason. The British East India Company operated out of Bengal for much of its history, and there are simply a lot of Bengalis.

There’s a lot more that can be done here. Since I don’t have time, here’s the pedigree file if anyone wants to play with it (Dan is DGM001).

Population Genetic distance from Dan Standardized distance
Brahui 0.253 81.268
Burusho 0.257 82.736
Razib’s Mother 0.258 82.783
CEU 0.258 82.993
Burusho 0.258 83.024
CEU 0.26 83.547
Sakilli 0.26 83.555
Brahui 0.261 83.831
Brahui 0.261 83.857
GIH 0.261 83.955
CEU 0.261 83.972
CEU 0.261 83.985
CEU 0.262 84.043
North Kannadi 0.262 84.169
CEU 0.262 84.207
CEU 0.262 84.318
CEU 0.262 84.33
CEU 0.263 84.391
Paniya 0.263 84.408
CEU 0.263 84.437
CEU 0.263 84.445
CEU 0.263 84.488
CEU 0.263 84.606
CEU 0.263 84.609
CEU 0.264 84.691
Brahui 0.264 84.709
CEU 0.264 84.752
CEU 0.264 84.764
Brahui 0.264 84.822
GIH 0.264 84.826
Burusho 0.264 84.841
CEU 0.264 84.898
CEU 0.264 84.975
North Kannadi 0.264 84.992
CEU 0.265 85.087
Paniya 0.265 85.212
CEU 0.265 85.226
CEU 0.265 85.25
CEU 0.265 85.25
CEU 0.265 85.278
CEU 0.265 85.299
North Kannadi 0.265 85.3
Burusho 0.265 85.309
Burusho 0.266 85.328
CEU 0.266 85.363
CEU 0.266 85.409
North Kannadi 0.266 85.412
CEU 0.266 85.436
Burusho 0.266 85.446
Bene Israel 0.266 85.508
CEU 0.266 85.521
GIH 0.266 85.618
GIH 0.267 85.661
CEU 0.267 85.696
CEU 0.267 85.722
CEU 0.267 85.732
Brahui 0.267 85.777
GIH 0.267 85.793
CEU 0.267 85.799
CEU 0.267 85.816
Cochin Jews 0.267 85.85
CEU 0.267 85.943
Brahui 0.268 85.996
CEU 0.268 86.005
Cochin Jews 0.268 86.011
CEU 0.268 86.08
CEU 0.268 86.115
CEU 0.268 86.18
GIH 0.268 86.229
Cochin Jews 0.268 86.234
CEU 0.268 86.244
Burusho 0.268 86.265
CEU 0.268 86.277
CEU 0.268 86.278
CEU 0.269 86.288
CEU 0.269 86.291
CEU 0.269 86.318
CEU 0.269 86.325
CEU 0.269 86.326
GIH 0.269 86.327
CEU 0.269 86.329
CEU 0.269 86.354
CEU 0.269 86.387
CEU 0.269 86.463
CEU 0.269 86.515
CEU 0.269 86.517
CEU 0.269 86.55
CEU 0.27 86.609
Paniya 0.27 86.682
CEU 0.27 86.687
CEU 0.27 86.696
CEU 0.27 86.717
CEU 0.27 86.733
Sakilli 0.27 86.74
CEU 0.27 86.866
Malayan 0.27 86.879
North Kannadi 0.27 86.883
CEU 0.271 86.937
Brahui 0.271 86.952
Burusho 0.271 86.956
CEU 0.271 86.957
CEU 0.271 86.977
North Kannadi 0.271 86.995
GIH 0.271 87.018
CEU 0.271 87.042
CEU 0.271 87.066
CEU 0.271 87.07
Brahui 0.271 87.09
Bene Israel 0.271 87.094
Sakilli 0.271 87.141
CEU 0.271 87.2
CEU 0.271 87.24
North Kannadi 0.272 87.253
CEU 0.272 87.297
Burusho 0.272 87.307
CEU 0.272 87.327
GIH 0.272 87.353
CEU 0.272 87.355
Cochin Jews 0.272 87.381
CEU 0.272 87.384
CEU 0.272 87.5
CEU 0.272 87.535
CEU 0.273 87.594
Malayan 0.273 87.676
CEU 0.273 87.702
CEU 0.273 87.741
Burusho 0.273 87.806
CEU 0.273 87.846
Cambodians 0.274 87.932
North Kannadi 0.274 87.951
CEU 0.274 87.951
Burusho 0.274 88.03
CEU 0.274 88.047
CEU 0.274 88.081
CEU 0.274 88.089
CEU 0.274 88.101
CEU 0.274 88.179
CEU 0.274 88.19
North Kannadi 0.275 88.243
CEU 0.275 88.32
GIH 0.275 88.325
CEU 0.275 88.349
Brahui 0.275 88.393
CEU 0.275 88.402
CEU 0.275 88.457
Bene Israel 0.276 88.552
CEU 0.276 88.577
CEU 0.276 88.603
CEU 0.276 88.647
CEU 0.276 88.7
CEU 0.276 88.729
CEU 0.276 88.814
CEU 0.276 88.85
Brahui 0.276 88.855
CEU 0.277 88.923
GIH 0.277 88.99
Paniya 0.277 89.082
CEU 0.277 89.118
CEU 0.277 89.15
CEU 0.277 89.151
CEU 0.277 89.17
CEU 0.278 89.184
Cambodians 0.278 89.208
Cambodians 0.278 89.233
Cambodians 0.278 89.383
CEU 0.278 89.45
CEU 0.278 89.493
Cambodians 0.279 89.522
CEU 0.279 89.595
CEU 0.279 89.679
CEU 0.279 89.753
CEU 0.279 89.762
CEU 0.279 89.807
Cambodians 0.28 89.942
GIH 0.28 90.085
CEU 0.281 90.178
Brahui 0.281 90.364
Cambodians 0.282 90.543
Cambodians 0.282 90.559
Cambodians 0.282 90.77
Cambodians 0.283 90.898
CEU 0.283 90.956
CEU 0.284 91.316
CHD 0.289 92.952
Sakilli 0.29 93.103
Bene Israel 0.29 93.122
CHD 0.291 93.619
CHD 0.291 93.663
CHD 0.293 94.125
CHD 0.293 94.248
CHD 0.294 94.451
CHD 0.294 94.629
CHD 0.296 94.965
CHD 0.296 95.279
Yorubas 0.297 95.298
CHD 0.297 95.368
CHD 0.297 95.438
CHD 0.297 95.441
Yorubas 0.297 95.567
CHD 0.298 95.678
CHD 0.298 95.828
CHD 0.299 96.032
CHD 0.299 96.127
CHD 0.3 96.349
CHD 0.3 96.403
CHD 0.3 96.443
CHD 0.3 96.508
CHD 0.3 96.523
CHD 0.3 96.533
CHD 0.301 96.575
CHD 0.301 96.598
CHD 0.301 96.624
CHD 0.301 96.625
CHD 0.301 96.738
CHD 0.301 96.758
CHD 0.301 96.869
Yorubas 0.302 97.106
CHD 0.303 97.37
CHD 0.303 97.41
Yorubas 0.304 97.681
CHD 0.304 97.713
CHD 0.304 97.747
Yorubas 0.304 97.829
CHD 0.304 97.838
CHD 0.305 98.106
CHD 0.306 98.309
Yorubas 0.307 98.499
CHD 0.307 98.546
CHD 0.307 98.547
CHD 0.307 98.606
CHD 0.307 98.764
CHD 0.307 98.78
CHD 0.307 98.803
Yorubas 0.308 98.947
Yorubas 0.308 99.03
Yorubas 0.309 99.411
Yorubas 0.309 99.417
CHD 0.309 99.452
CHD 0.31 99.624
Yorubas 0.311 100
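
For anyone who wants to reproduce something like the table above from the pedigree file, here is a minimal sketch of one common way to compute an IBS distance. The post does not say which tool or exact formula produced the numbers, so everything here is an assumption: the 0/1/2 genotype coding, the -1 missing-data code, and the function names ibs_distance and standardize are all hypothetical. The second function assumes that the "Standardized distance" column is each distance scaled so that the largest one equals 100, which is consistent with the last row of the table but is not confirmed anywhere.

```python
def ibs_distance(g1, g2, missing=-1):
    """Mean IBS distance between two diploid genotype vectors.

    Genotypes are assumed to be coded as allele counts (0, 1 or 2).
    At each SNP the two samples differ by 0, 1 or 2 alleles; the
    distance is the average difference, divided by 2 so that it lies
    between 0 (identical) and 1 (opposite homozygotes everywhere).
    SNPs missing in either sample are skipped.
    """
    shared = [(a, b) for a, b in zip(g1, g2) if a != missing and b != missing]
    if not shared:
        raise ValueError("no SNPs genotyped in both samples")
    return sum(abs(a - b) for a, b in shared) / (2 * len(shared))


def standardize(distances):
    """Rescale a {name: distance} dict so the largest distance is 100.

    This is an assumed reading of the 'Standardized distance' column.
    """
    largest = max(distances.values())
    return {name: 100.0 * d / largest for name, d in distances.items()}


if __name__ == "__main__":
    # Toy genotypes, not real data.
    target = [0, 1, 2, 1, 0, 2]
    references = {
        "sample_A": [0, 1, 2, 1, 1, 2],
        "sample_B": [2, 1, 0, 1, 2, 0],
    }
    raw = {name: ibs_distance(target, g) for name, g in references.items()}
    for name, score in sorted(standardize(raw).items(), key=lambda kv: kv[1]):
        print(name, round(raw[name], 3), round(score, 3))
```

Sorting by the standardized value gives the same closest-first ordering as the table. Because the genotypes are diploid, a sample that is half South Asian and half European over the region will land somewhere between the two source groups, which is why the measure is crude.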

December 2, 2012

TreeMix: Who were the West Eurasian ancestors of Ethiopians?

Filed under: Anthroplogy,Ethiopia,Genetics,Genomics — Razib Khan @ 3:46 pm

One of the primary concerns/questions I had about Luca Pagani’s paper on the genetic origin of Ethiopians is that he found that their West Eurasian ancestor was closer to Levantine than to Arabian populations. I was confused by this because in model-based clustering (e.g., Admixture), when you push down to a fine level of granularity, you always see that the Ethiopians cluster with the Yemenis for their non-African ancestry. More precisely, Yemeni Jews are often ~100% component X, which is ~50% of the ancestry of Ethiopians.

From what I recall, Pagani et al. used haplotype windows which they assigned to Eurasian or African ancestral components, and they compared these to the populations related to the putative ancestral groups. Because Pagani et al. used blocks of the genome, rather than just specific genotypes, I weight their finding more strongly. But I wanted to double check with TreeMix whether the finding in Admixture was peculiar.

So again, I took a ~150,000 SNP set and ran it through TreeMix with migration = 5.

Again, you see that the gene flow to the Ethiopians is coming from a position on the tree rather close to Yemenite Jews. One model which may explain this, and still align with Pagani’s findings, is that Arabians themselves are a synthetic population. A “pure” Yemenite Jew may have ancient admixture of African affinity beneath an intrusive element from the north. The parallelism between Ethiopia and Arabia in this model is clear, with the major difference being the magnitude of the source population admixture (greater in Arabia), as well as some differences in the target population.

This again reminds us to be careful about trusting first-blush summaries.

Layering genetic histories

Filed under: Anthroplogy,Genetics,Genomics,Human Genetics,Human Genomics — Razib Khan @ 12:14 pm

As a follow up to my post from yesterday, I decided to run TreeMix on a data set I happened to have had on hand (see Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data for more on TreeMix). Basically I wanted to display a tree with, and without, gene flow.

The technical details are straightforward. I LD pruned ~550,000 SNPs down to ~150,000. I ran TreeMix without and with migration, with the Bantu Kenya population as the root. Finally, when I did turn on the migration parameter I set it to 5. You can see the results below.

Most of the flows are pretty much as expected. The West Eurasian flow from the Turks to the Uygurs makes sense, because there is a large West Asian component to what the Uygurs have (from East Iranians?). The Chuvash are a Turkic group with a minor, but significant, Turkic component. The HGDP Russian sample does have some East Eurasian ancestry. And the Moroccans also have African ancestry. But your guess is as good as mine with the flow into the Bantu. These are, I think, from Kenya, so it might be trying to interpret Nilotic admixture as generalized Eurasian.

A minor note: installing TreeMix and generating the appropriate files from pedigree format is not too difficult. But it may not be obvious how to generate the input file for the conversion. You do it like so in PLINK:

./plink --noweb --bfile YourFile --freq --within YourGroupNamesFile --out YourOutPutFile

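It’s the last of these, the output (PLINK writes the stratified frequencies with a .frq.strat extension), that you want to put into TreeMix’s Python conversion script. The YourGroupNamesFile is basically the first two columns of the .fam file (family ID and individual ID) plus a third column, the population name for each individual.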
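
If you don’t already have a group names file, here is a minimal sketch of how one could be built from a .fam file. PLINK’s --within option expects three whitespace-separated columns per line: family ID, individual ID, and cluster name. The function name write_within_file and the population_of lookup are hypothetical, and the sketch assumes you already have a population label for every individual ID; spaces in population names are replaced with underscores so that they don’t break the column layout.

```python
def write_within_file(fam_path, population_of, out_path):
    """Write a three-column cluster file for PLINK's --within option.

    fam_path:      path to a PLINK .fam file (FID and IID are columns 1-2)
    population_of: dict mapping individual ID -> population name
                   (a hypothetical lookup you supply yourself)
    out_path:      where to write the 'FID IID POPULATION' lines

    Returns the number of individuals written. Raises KeyError if an
    individual in the .fam file has no population label.
    """
    written = 0
    with open(fam_path) as fam, open(out_path, "w") as out:
        for line in fam:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            fid, iid = fields[0], fields[1]
            # Cluster names must be a single token, so no spaces.
            pop = population_of[iid].replace(" ", "_")
            out.write(f"{fid} {iid} {pop}\n")
            written += 1
    return written
```

Calling write_within_file("YourFile.fam", labels, "YourGroupNamesFile") would then give a file you can pass to --within in the command above, where labels is whatever ID-to-population dictionary you have put together.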
