Razib Khan One-stop-shopping for all of my content

August 9, 2018

The new post-genetic paradigm will come

Filed under: Archaeology,History,Prehistory — Razib Khan @ 1:29 am

Oftentimes the domain to which a technical framework is applied matters a great deal. Imagine, if you will, an explicit statistical test for a phylogenetic relationship between a set of extant populations, whereby one infers a group of ancestral populations. If the genus is Drosophila, it’s academic. Interesting, but academic. If the genus is Homo, then it gets complicated.

People care a great deal about the historical inferences made from human population genomic datasets. I say genomic, and not genetic, because the last ten years of genome-wide analyses and ancient DNA have been very different from what we saw in the late 20th century and the aughts. The granularity is now fine enough that population genomics has touched upon very sensitive and precious issues, in both scholarly and non-scholarly contexts.

A lot of the time I have my head down reading supplements, where the statistical methods are. The reality is that this sort of science is cutting edge, and there are always later revisions. Usually you can see where those revisions might come from if you look at the detailed methods and conclusions that are found in the supplements. That is also where you see the limitations, and the reasons that the authors chose particular parameters.

To give you a sense of what I’m talking about, consider 2016’s Genomic insights into the origin of farming in the ancient Near East. The paper proper is 24 pages. But the supplemental text is 148 pages. There is a lot of interesting stuff in there, but I would just jump to page 125 and read the whole section there and down to the end. The method portion is important because you always need to take numerical values in results with a grain of salt. You see, for example, later work which refines fractions significantly when it comes to estimating admixture between a finite set of putative populations. And the last section seems likely to become a paper in and of itself at some point.

But that doesn’t mean that the genetic inferences are not robust, or that they come out of a vacuum. In the details the phylogenetic models being tested are going to be wrong on many particulars, but in relation to the hypotheses being tested they are often entirely sufficient to reject or accept.

For example, there was long the idea that the Basque people of the western trans-Pyrenees region of Spain and France descended from pre-farming Europeans, and therefore the Basque language, which is an isolate, might have local roots which went back to the Pleistocene. Today, ancient DNA along with explicit testing of various phylogenetic scenarios makes it clear that the largest fraction of Basque ancestry derives from “Early European Farmers,” who represent a demographic pulse which radiated out of the Eastern Mediterranean and reached Spain 7,500 years ago. Of course Basques do have local hunter-gatherer ancestry, but these Mesolithic peoples themselves were the last in a sequence of very distinctive populations in Pleistocene Europe. Finally, Basques do have admixture from Indo-European peoples, just less than other people in Iberia.

Of course, genetics can’t tell us about languages. Using linguistic labels in population genetic papers is to some extent a lexical convenience, but it is also one we use because of the constellation of information we have. The last major demographic pulse into Iberia is associated with an ancestry which derives from Central Eurasia. This ancestry is copious in Northern Europe, but is also found in South Asia, and ancient DNA suggests its expansion occurred between 5,000 and 3,500 years ago. It also happens that the Indo-European languages are spoken in both India and Europe. The natural inference then is to make an association between this language family, and this demographic pulse.

Some observers note discordance between estimated fractions from paper to paper, but don’t seem to understand that the point isn’t to estimate fractions of ancestry as ends in and of themselves, but to estimate fractions of ancestry to expose and highlight demographic change (or lack thereof). We can say with a very high degree of certainty that the period between 3000 and 2000 BC witnessed massive demographic change in Northern Europe. Somewhat later there was a similar change in Southern Europe, but more demographically modest. These are simple facts.

There are some scholars, frankly often archaeologists, who dismiss the relevance of the genetic findings. But anyone who has read archaeology knows that there are many cases where researchers see demographic continuity, and posit in situ cultural evolution, where it is just as possible that a new people arrived. Ancient DNA has revolutionized our understanding of prehistory not because it has brought us new knowledge, but because it has foregrounded old and buried knowledge. The knowledge being that migration matters.

But genetics is only a skeleton. A framework. True flesh on the bones of the story needs the input of archaeologists, linguists, and other scholars. In Who We Are and How We Got Here David Reich expresses his ambition to construct a historical genetic atlas of the world. But that atlas will be all the poorer without the input from other fields besides genetics. Many archaeologists have gotten on board with genetics as a tool, but the reality is that some theories precious to some scholars will have to be rejected if there is going to be total buy-in. Eventually that will happen, and a new synthesis will arise.

July 30, 2018

Ancient India, archaeology, etc.

Filed under: ancient india,Prehistory — Razib Khan @ 11:43 pm

I think I have asked before, but I’m soliciting suggestions about a book on Indian prehistory, with a focus on the period between 10 and 2 thousand years ago. India: The Ancient Past: A History of the Indian Subcontinent from c. 7000 BCE to CE 1200 looks decent, but I don’t have the ability to evaluate this stuff.

The reason is pretty simple. I’ve been asked to write a book chapter on the genetics of India. The draft is written, and I think we’re 80-90% done with the genetic “big picture.” The real work is going to be in synthesizing with archaeology. To be entirely frank I’m not sure how open Indian archaeologists are going to be to the new genetics, which is not stopping at any time in the near future. So I think perhaps I should see what I can snap together myself.

Anyway, suggestions appreciated. Though keep in mind that I don’t know much archaeology and don’t care that much about ancient village plans….

April 1, 2018

When do people forget where they come from?

Filed under: Migration,Prehistory — Razib Khan @ 9:09 pm

When it comes to the arrival of Indo-Aryans to South Asia a major question Indians always pose is “if they are invaders why don’t they mention that in their mythology?” My standard rejoinder is straightforward: we have plenty of paleogenetic evidence that many populations are intruders, but their mythology doesn’t indicate that. If the Indian objection is to hold then why not others? Are all human populations autochthonous in their native lands?

And yet the most recent work suggests that steppe ancestry didn’t arrive in the BMAC region until 2000 BC. That means that the Cemetery H culture in Punjab dating to 1900 BC is the earliest likely candidate for Indo-Aryans in South Asia. The Rigveda was composed as early as 200 years after this date, or as late as 700 years. Could they have “forgotten” where they came from?

The Irish are one people who have preserved their mythology, due to the gradual and indigenous nature of Christianization. But 2,500 years after their arrival en masse from the continent they had forgotten the details. The motif of invasion was preserved, though we don’t know if that is a memory of their past, or just a channeling of the mythos of the period when their folklore was written down. Another example might be the Japanese, who arrived about 1,100 years before the Heian period, the first flowering of literate civilization on the island. To my knowledge, their mass migration from southern Korea was mostly forgotten by then.

With all this in mind, I decided to reread Genetic origins of the Minoans and Mycenaeans. The conclusion of the paper is that an appreciable, though minority, component of Mycenaean ancestry seems to have some affinity with Indo-European groups. The two candidates for the donors are people from the Eurasian steppe, or Copper Age Armenians. For linguistic reasons that I can barely evaluate, I lean toward the former. This implies that the proto-Greeks arrived in the late 3rd millennium or early 2nd millennium. The mythology of ancient Greece as recorded by Hesiod and others in the Archaic period probably dates in part to the Bronze Age (some of the Greek gods are recorded in the Linear B tablets). To my knowledge the Greeks do not record when they arrived from outside of Greece.

This suggests that 1,000 years is sufficient for a forgetting, at least for a semi-literate society.

The last is key. Societies with written histories can maintain continuity. But what about oral societies?

September 29, 2017

The war between the Aesir and Vanir

Filed under: Aesir,Indo-European,Mythology,Prehistory,Vanir — Razib Khan @ 7:09 pm

In Snorri Sturluson’s preservation of pre-Christian Scandinavian mythos, he outlines two groups of gods, the Aesir and the Vanir. Though ultimately presented as a united pantheon in comparison to beings such as the giants, there are references to a war between these two divine factions. But, there is still scholarly debate as to the significance of the division between the Aesir and Vanir.

At one extreme some contend that the division was concocted by Sturluson himself for stylistic or poetic reasons. In contrast, others suggest that the Aesir-Vanir division is substantive, and reflects deep historical origins. The Vanir, in this telling, are the fertility gods of pre-Indo-European peoples. The Aesir are the gods of the Indo-Europeans. The war between the two factions then is a memory of the conflict between the indigenous farmers and the incoming Indo-European pastoralists. Sturluson himself suggested that the gods of the Norse mythos were simply deifications of great historical personages of the past, lending credence to the idea that the folklore preserved the memory of history.

Ultimately we may never know the real story behind the Aesir-Vanir war (if it ever occurred). But a new paper in The American Journal of Archaeology sheds some light on the transition to Indo-European language in modern Denmark’s Jutland, Talking Neolithic: Linguistic and Archaeological Perspectives on How Indo-European Was Implemented in Southern Scandinavia:

…Farming arrived in Scandinavia with the Funnel Beaker culture by the turn of the fourth millennium B.C.E. It was superseded by the Single Grave culture, which as part of the Corded Ware horizon is a likely vector for the introduction of Indo-European speech. As a result of this introduction, the language spoken by individuals from the Funnel Beaker culture went extinct long before the beginning of the historical record, apparently vanishing without a trace. However, the Indo-European dialect that ultimately developed into Proto-Germanic can be shown to have adopted terminology from a non-Indo-European language, including names for local flora and fauna and important plant domesticates. We argue that the coexistence of the Funnel Beaker culture and the Single Grave culture in the first quarter of the third millennium B.C.E. offers an attractive scenario for the required cultural and linguistic exchange, which we hypothesize took place between incoming speakers of Indo-European and local descendants of Scandinavia’s earliest farmers.

There is a lot of interesting detail in the paper itself. First, the Corded Ware arrived in Jutland in ~2850 BCE, but only occupied the western and central parts of the peninsula. The Funnel Beaker complex, along with influences and interactions with the hunter-gatherer Pitted Ware culture, persisted in robust form until ~2600 BCE in the east of Jutland. Additionally, the authors note that there was a notable cultural geographic division which separated the former Funnel Beaker territory as it was in ~2600 BCE down to ~1500 BCE, when the two zones fused together into a unified Nordic Bronze Age culture.

An explicit analogy is made to the character of prehistoric Aegean society, where a pre-Indo-European matrix was coexistent with Indo-European cultures which arrived from the north for centuries, and even millennia, down to the Classical Greek period (the Pelasgians).

But the similarity is closer than just one of form: the language of the Funnel Beaker people may have existed on a dialect continuum with the farming peoples of the Mediterranean. That is, Neolithic Europe was probably united by an ethno-cultural linguistic complex similar in scale and quality to that of the Bantus in modern Africa.

One of the hypotheses about the origins of the Vanir is that they were agricultural fertility gods. As it happens many of the hypothesized borrowings of non-Indo-European words into Germanic are of agricultural nature. Additionally, the table within the paper illustrates that many of these words span very different Indo-European language families. The implication is strong that Minoan, Basque, and the pre-Indo-European languages of Northern Europe are genetically related to each other.

Genetics does not illuminate everything, but I do think that it gives a certain solidity now to the nature of demographic turnover and variation in prehistoric Europe. With that in mind archaeologists and folklorists can interpret the mythologies and legends which have been passed down to us from the liminal periods on the edge of history and prehistory.

For example, the thesis that pre-Indo-European religion revolved around chthonic deities of the earth (e.g., the Tuatha de Danann) makes a lot more sense if you believe that these people were agriculturalists. In contrast, the Indo-Europeans from the east arrived as pastoralists, and it is not, therefore, a surprise that the one Indo-European god who has an undisputed cognate across all branches of the Indo-European peoples is the sky god, whether he is known as Zeus, Jupiter, or Dyauṣ Pitār.

September 17, 2012

The great Eurasian explosion

Filed under: Genetics,Genomics,History,Prehistory — Razib Khan @ 8:38 pm

Dr. Joseph Pickrell has updated his preprint, The genetic prehistory of southern Africa, with some more material on the Sandawe. I’ve explored the genetics of the Sandawe a bit using ADMIXTURE, so I jumped straight to the section on ROLLOFF:

…To further examine this, we turned to ROLLOFF. We used Dinka and French as representatives of the mixing populations (since date estimates are robust to improperly specified reference populations). The results are shown in Supplementary Figure S22. Both populations show a detectable curve, though the signal is much stronger in the Sandawe than in the Hadza. The implied dates are 89 generations (2500 years) ago for the Hadza and 66 generations (2000 years) ago for the Sandawe. These are qualitatively similar signals to those seen by Pagani et al. [65] in Ethiopian populations. There are two possible historical scenarios that could lead to these signals: either the Hadza and Sandawe both directly admixed with a western Eurasian population about 2,000 years ago, or they admixed with an east African population that was itself admixed with a western Eurasian population. The latter possibility would be consistent with known east African admixture into the Sandawe [16] .
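
The dating logic in the quoted passage can be illustrated with a toy calculation. This is a minimal sketch, not ROLLOFF itself: it assumes the simplified picture in which admixture linkage disequilibrium decays as exp(-n·d), with n the number of generations since admixture and d the genetic distance in Morgans, and it recovers n from a noise-free synthetic curve with a log-linear fit. The function names, the distance grid, the amplitude, and the 29-year generation time are all illustrative assumptions, not values from the preprint.

```python
import numpy as np

def decay_curve(generations, distances_morgans, amplitude=1.0):
    """Idealised admixture-LD curve: amplitude * exp(-n * d)."""
    return amplitude * np.exp(-generations * distances_morgans)

def estimate_generations(distances_morgans, curve):
    """Fit log(curve) = log(amplitude) - n * d and return the estimate of n."""
    slope, _intercept = np.polyfit(distances_morgans, np.log(curve), 1)
    return -slope

# Hypothetical grid of genetic distances: 0.5 cM to 20 cM, expressed in Morgans.
distances = np.linspace(0.005, 0.20, 40)

# Synthetic, noise-free curve for a population that admixed 66 generations ago.
curve = decay_curve(66, distances, amplitude=0.3)
n_hat = estimate_generations(distances, curve)

# Assumed generation time; the quoted figures imply roughly 28 to 30 years.
years_per_generation = 29
print(f"{n_hat:.0f} generations, about {n_hat * years_per_generation:.0f} years")
```

Real data are noisy and the fit is weighted, so treat this only as a picture of why a steeper curve implies older admixture.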


Pagani et al. refers to the paper Ethiopian Genetic Diversity Reveals Linguistic ...

December 13, 2010

Live not by visualization alone

[Figure: synthetic map]

In the age of 500,000 SNP studies of genetic variation across dozens of populations obviously we’re a bit beyond lists of ABO blood frequencies. There’s no real way that a conventional human is going to be able to discern patterns of correlated allele frequency variations which point to between-population genetic differences on this scale of marker density. So you rely on techniques which extract the general patterns out of the data, and present them to you in a human-comprehensible format. But there’s an unfortunate tendency for humans to imbue the products of technique with a particular authority which they should not always have.

The History and Geography of Human Genes is arguably the most important historical genetics work of the past generation. It has surely influenced many within the field of genetics, and because of its voluminous elegant visual displays of genetic data it is also a primary source for those outside of genetics to make sense of phylogenetic relations between human populations. And yet one aspect of this great work which never caught on was the utilization of “synthetic maps” to visualize components of genetic variation between populations. This may have been fortuitous. A few years ago a paper was published, Interpreting principal components analyses of spatial population genetic variation, which suggested that the gradients you see on the map above may be artifacts:

Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events. These interpretations have been controversial, but influential, and the use of PCA has become widespread in analysis of population genetics data. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.’s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
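
The claim that sinusoid-like patterns arise generically can be checked on a toy case. The following is a minimal sketch, not the paper’s analysis: it assumes a one-dimensional row of demes whose covariance in allele frequency decays exponentially with distance, and it looks at the eigenvectors of that covariance matrix, which are what PCA would converge to with many loci. The function names, the number of demes, and the decay length are hypothetical choices.

```python
import numpy as np

def spatial_covariance(n_demes, decay_length):
    """Covariance between demes that decays exponentially with distance."""
    positions = np.arange(n_demes)
    distance = np.abs(positions[:, None] - positions[None, :])
    return np.exp(-distance / decay_length)

def leading_components(cov, k):
    """Return the k eigenvectors with the largest eigenvalues, one per row."""
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1][:k]
    return eigenvectors[:, order].T

def sign_changes(vector):
    """Count how many times a vector crosses zero along the row of demes."""
    signs = np.sign(vector)
    return int(np.sum(signs[:-1] != signs[1:]))

# No migration event is modelled here, only isolation by distance.
cov = spatial_covariance(n_demes=50, decay_length=10.0)
components = leading_components(cov, k=4)
crossings = [sign_changes(pc) for pc in components]
print(crossings)
```

Under this covariance model each successive component crosses zero one more time than the last, as a wave of increasing frequency would, even though nothing in the setup encodes an expansion.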

A paper earlier this year took the earlier work further and used a series of simulations to show how the nature of the gradients varied. In light of recent preoccupations the results are of interest. Principal Component Analysis under Population Genetic Models of Range Expansion and Admixture:

In a series of highly influential publications, Cavalli-Sforza and colleagues used principal component (PC) analysis to produce maps depicting how human genetic diversity varies across geographic space. Within Europe, the first axis of variation (PC1) was interpreted as evidence for the demic diffusion model of agriculture, in which farmers expanded from the Near East ∼10,000 years ago and replaced the resident hunter-gatherer populations with little or no interbreeding. These interpretations of the PC maps have been recently questioned as the original results can be reproduced under models of spatially covarying allele frequencies without any expansion. Here, we study PC maps for data simulated under models of range expansion and admixture. Our simulations include a spatially realistic model of Neolithic farmer expansion and assume various levels of interbreeding between farmer and resident hunter-gatherer populations. An important result is that under a broad range of conditions, the gradients in PC1 maps are oriented along a direction perpendicular to the axis of the expansion, rather than along the same axis as the expansion. We propose that this surprising pattern is an outcome of the “allele surfing” phenomenon, which creates sectors of high allele-frequency differentiation that align perpendicular to the direction of the expansion.

The first figure shows the general framework with which they performed the simulations:


You have a lattice which consists of demes, population units, all across Europe. They modulated parameters such as population growth (r), carrying capacity (C), and migration (m). Additionally, they had various scenarios of expansion from the southwest or southeast, as well as two expansions one after another to mimic the re-population of Europe after the Ice Age by Paleolithic groups, and their later replacement by Neolithic groups. They modulated admixture and introgression of genes from the Paleolithic group to the Neolithics so that you had the full range where the final Europeans were mostly Neolithic or mostly Paleolithic.
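
To make the roles of r, C, and m concrete, here is a minimal sketch of a range expansion on a lattice. It is not the authors’ simulator: it assumes a one-dimensional row of demes, deterministic logistic growth, symmetric nearest-neighbour migration, reflecting edges, and no genetics at all. The function names and parameter values are hypothetical.

```python
import numpy as np

def expand(n_demes, r, C, m, generations):
    """Track deme sizes as a population spreads from the first deme."""
    N = np.zeros(n_demes)
    N[0] = C  # assumption: the expansion starts from one full deme at the edge
    history = [N.copy()]
    for _ in range(generations):
        # Logistic growth within each deme, with rate r and carrying capacity C.
        N = N + r * N * (1 - N / C)
        # A fraction m emigrates; half goes to each neighbouring deme.
        out = m * N
        N = N - out
        N[1:] += out[:-1] / 2
        N[:-1] += out[1:] / 2
        # Reflecting edges: migrants that would leave the lattice stay put.
        N[0] += out[0] / 2
        N[-1] += out[-1] / 2
        history.append(N.copy())
    return np.array(history)

def arrival_times(history, threshold):
    """First generation at which each deme exceeds the threshold.

    Note: a deme that never exceeds the threshold is also reported as 0.
    """
    return np.argmax(history > threshold, axis=0)

history = expand(n_demes=20, r=0.5, C=100.0, m=0.2, generations=200)
print(arrival_times(history, threshold=50.0))
```

The arrival times increase with distance from the origin, which is the “wave of advance” on which the genetic simulations are layered.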

Below are some of the figures which show the results:

As you can see the strange thing is that in some models the synthetic map gradient is rotated 90 degrees from the axis of demographic expansion! In this telling the famous synthetic map showing Neolithic expansion might be showing expansion from Iberia. Perhaps a radiation from a post-Ice Age southern refuge?

One explanation might be “allele surfing” on the demographic “wave of advance.” Basically as a population expands very rapidly stochastic forces such as random genetic drift and bottlenecks could produce diversification along the edge of the population wave front. The reason for this is that these rapidly expanding populations explode out of serial bottlenecks and demographic expansions, which will produce genetic distinctiveness among the many differentiated demes bubbling along the edge of expansion. Alleles which may have been at low frequency in the ancestral population can “fix” in descendant populations on the edge of the demographic wave of advance. This is the explanation, more or less, that one group gave last year for the very high frequencies of R1b1b2 in Western Europe. With this, they overturned the classic assumption that R1b1b2 was a Paleolithic marker, and suggested it was a Neolithic one.
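
The way serial bottlenecks can carry a rare allele to fixation at the wave front can be sketched with a toy model. This is an illustration under strong assumptions, not the model in the paper: each new deme at the front is founded by a small random sample of gene copies from the previous front deme (binomial sampling), with no migration back and no selection. The function names and parameter values are hypothetical.

```python
import numpy as np

def front_frequencies(start_freq, n_steps, founders, n_loci, seed=0):
    """Allele frequency at the wave front after a chain of founder events."""
    rng = np.random.default_rng(seed)
    copies = 2 * founders  # diploid founders carry two gene copies each
    freqs = np.full(n_loci, start_freq, dtype=float)
    for _ in range(n_steps):
        # Each founding event resamples every locus from the previous deme.
        freqs = rng.binomial(copies, freqs) / copies
    return freqs

def summarize(freqs):
    """Share of loci where the allele was lost, fixed, or is still segregating."""
    lost = float(np.mean(freqs == 0.0))
    fixed = float(np.mean(freqs == 1.0))
    return {"lost": lost, "fixed": fixed, "segregating": 1.0 - lost - fixed}

# Independent loci, each starting with the allele at 5% in the source population.
front = front_frequencies(start_freq=0.05, n_steps=30, founders=10, n_loci=5000)
print(summarize(front))
```

Most rare alleles are lost along the way, but for a neutral allele the chance of fixation equals its starting frequency, so a minority end up fixed at the front despite having been rare at the origin.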

Here’s their conclusion from the paper:

A previous study showed that the original patterns observed in PCA might not reflect any expansion events (Novembre and Stephens 2008). Here, we find that under very general conditions, the pattern of molecular diversity produced by an expansion may be different than what was expected in the literature. In particular, we find conditions where an expansion of Neolithic farmers from the southeast produces a greatest axis of differentiation running from the southwest to the northeast. This surprising result is seemingly due to allele surfing leading to sectors that create differentiation perpendicular to the expansion axis. Although a lot of our results can be explained by the surfing phenomenon, some interesting questions remain open. For example, the phase transition observed for relatively small admixture rates between Paleolithic resident and Neolithic migrant populations occurs at a value that is dependent on our simulation settings, and further investigations would be needed to better characterize this critical value as a function of all the model parameters. Another unsolved question is to know why the patterns generally observed in PC2 maps for our simulation settings sometimes arise in PC1 maps instead. These unexplained examples remind us that PCA is summarizing patterns of variation in the sample due to multiple factors (ancestral expansions and admixture, ongoing limited migration, habitat boundary effects, and the spatial distribution of samples). In complex models such as our expansion models with admixture in Europe, it may be difficult to tease apart what processes give rise to any particular PCA pattern. Our study emphasizes that PC (and AM) should be viewed as tools for exploring the data but that the reverse process of interpreting PC and AM maps in terms of past routes of migration remains a complicated exercise. 
Additional analyses—with more explicit demographic models—are more than ever essential to discriminate between multiple explanations available for the patterns observed in PC and AM maps. We speculate that methods exploiting the signature of alleles that have undergone surfing may be a powerful approach to study range expansions.

What’s the big picture here? In the textbook Human Evolutionary Genetics it is asserted that synthetic maps never became very popular compared to PCA itself. I think this is correct. But, the original synthetic maps have become prominent for many outside of genetics. They figure in Peter Bellwood’s First Farmers, and are taken as a given by many pre-historians, such as Colin Renfrew. And yet a reliance on these sorts of tools must not be blind to the reality that the more layers of abstraction you put between your perception and comprehension of concrete reality, the more likely you are to be led astray by quirks and biases of method.

In this case I do think first-order intuition would tell us that synthetic maps which display PCs would be showing gradients as a function of demographic pulses. And yet the intuition may not be right, and with the overturning of old orthodoxies in the past generation of inferences from the variation patterns in modern populations, we should be very cautious.

Citation: Olivier François, Mathias Currat, Nicolas Ray, Eunjung Han, Laurent Excoffier, & John Novembre (2010). Principal Component Analysis under Population Genetic Models of Range Expansion and Admixture. Mol Biol Evol
