Razib Khan One-stop-shopping for all of my content

February 4, 2021

Lewontin’s Paradox in the 21st century

Filed under: Population genetics — Razib Khan @ 7:00 pm

Why do species get a thin slice of π? Revisiting Lewontin’s Paradox of Variation:

Under neutral theory, the level of polymorphism in an equilibrium population is expected to increase with population size. However, observed levels of diversity across metazoans vary only two orders of magnitude, while census population sizes (Nc) are expected to vary over several. This unexpectedly narrow range of diversity is a longstanding enigma in evolutionary genetics known as Lewontin’s Paradox of Variation (1974). Since Lewontin’s observation, it has been argued that selection constrains diversity across species, yet tests of this hypothesis seem to fall short of explaining the orders-of-magnitude reduction in diversity observed in nature. In this work, I revisit Lewontin’s Paradox and assess whether current models of linked selection are likely to constrain diversity to this extent. To quantify the discrepancy between pairwise diversity and census population sizes across species, I combine genetic data from 172 metazoan taxa with estimates of census sizes from geographic occurrence data and population densities estimated from body mass. Next, I fit the relationship between previously-published estimates of genomic diversity and these approximate census sizes to quantify Lewontin’s Paradox. While previous across-taxa population genetic studies have avoided accounting for phylogenetic non-independence, I use phylogenetic comparative methods to investigate the diversity census size relationship, estimate phylogenetic signal, and explore how diversity changes along the phylogeny. I consider whether the reduction in diversity predicted by models of recurrent hitchhiking and background selection could explain the observed pattern of diversity across species. Since the impact of linked selection is mediated by recombination map length, I also investigate how map lengths vary with census sizes. I find species with large census sizes have shorter map lengths, leading these species to experience greater reductions in diversity due to linked selection. 
Even after using high estimates of the strength of sweeps and background selection, I find linked selection likely cannot explain the shortfall between predicted and observed diversity levels across metazoan species. Furthermore, the predicted diversity under linked selection does not fit the observed diversity–census-size relationship, implying that processes other than background selection and recurrent hitchhiking must be limiting diversity.
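To make the size of the paradox concrete, here is a minimal sketch of the neutral expectation the abstract refers to. Under the standard infinite-sites neutral model, expected pairwise diversity in a diploid population is 4Nμ, so diversity should scale linearly with population size. The mutation rate and the census sizes below are illustrative assumptions of mine, not values from the preprint.

```python
import math

def neutral_diversity(pop_size, mu=1e-9):
    """Expected pairwise diversity under the neutral infinite-sites model
    for a diploid population: 4 * N * mu. The default mu is an assumed,
    illustrative per-site mutation rate."""
    return 4.0 * pop_size * mu

def orders_of_magnitude(values):
    """Span of a set of positive values, in powers of ten."""
    return math.log10(max(values) / min(values))

# Hypothetical census sizes spanning six orders of magnitude.
census_sizes = [1e4, 1e6, 1e8, 1e10]
predicted = [neutral_diversity(n) for n in census_sizes]

# If diversity tracked census size, its span would match the span of N.
predicted_span = orders_of_magnitude(predicted)
observed_span = 2  # "only two orders of magnitude", per the abstract
shortfall = predicted_span - observed_span
```

The linear formula overshoots badly at the largest sizes (per-site diversity cannot exceed 1), so the only thing to read off this sketch is the scaling: the predicted span follows census size, while the observed span does not.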

September 16, 2020

Natural selection continues (in the Viking world)

Filed under: Natural Selection,Population genetics,Scandinavia,Scandinavians — Razib Khan @ 2:47 pm


Nature has published a new Viking genomics paper. This morning I didn’t even bother to check it out, as I had other things going on, and there’s been so much ancient DNA from Scandinavia that my thought was “what else could we learn?” Well, it turns out I should have checked it out. The sample size is large enough that it reinforces and nails home the important point that natural selection in many traits has been continuing across the world.

Population genomics of the Viking world:

The maritime expansion of Scandinavian populations during the Viking Age (about AD 750–1050) was a far-flung transformation in world history1,2. Here we sequenced the genomes of 442 humans from archaeological sites across Europe and Greenland (to a median depth of about 1×) to understand the global influence of this expansion. We find the Viking period involved gene flow into Scandinavia from the south and east. We observe genetic structure within Scandinavia, with diversity hotspots in the south and restricted gene flow within Scandinavia. We find evidence for a major influx of Danish ancestry into England; a Swedish influx into the Baltic; and Norwegian influx into Ireland, Iceland and Greenland. Additionally, we see substantial ancestry from elsewhere in Europe entering Scandinavia during the Viking Age. Our ancient DNA analysis also revealed that a Viking expedition included close family members. By comparing with modern populations, we find that pigmentation-associated loci have undergone strong population differentiation during the past millennium, and trace positively selected loci—including the lactase-persistence allele of LCT and alleles of ANKA that are associated with the immune response—in detail. We conclude that the Viking diaspora was characterized by substantial transregional engagement: distinct populations influenced the genomic makeup of different regions of Europe, and Scandinavia experienced increased contact with the rest of the continent.

The phylogenetic patterns are not surprising at all. I’ve looked at enough Scandinavian genomes from Norway, Sweden, and Denmark to be able to intuitively figure out the sources of random genomes without a label as long as I know they’re Nordic. The Danes will be south-shifted, the Swedes will be Finn-shifted (unless they’re from the far south across from Denmark), while the Norwegians will be neither. Basically, this massive ancient DNA transect just confirms that things such as geographic proximity matter, and that differential population size matters.

Gene flow from Denmark to Sweden, and from continental Europe into Denmark, is not surprising. This follows naturally from different population sizes, and after extensive Christianization of Denmark, the marriage networks of northern Germany and further south no doubt included Denmark. Perhaps of more interest is confirmation of reflux gene flow from the British Isles into Scandinavia. Some of these individuals may have been slaves, but others were likely people of mixed background, as was the norm in Iceland and Greenland, or even individuals who assimilated totally into Scandinavian culture through induction into warbands.

There are lots of details of phylogenomic note. For example, look in the supplements, and it seems that the “Picts” were pretty generic post-Bell Beaker people. Their “mystery” is somewhat solved? On the whole, most of the genomic variation of Northern Europe was established by the Bronze Age, but not all. On the margins, there are subtle and nuanced stories you can tell, and you need a sample size this large to tell them.

The most interesting aspect, though, is that this dataset confirms what many of us have suspected and seen in other results more tentatively: natural selection on complex traits has been reshaping the human genome, in the past and now. In 2016 Field et al. came out with a paper using pretty intense genomic methods to detect lots of recent, and continuing, sweeps in the European genome. The method was persuasive, but the results were perplexing. I didn’t know if they were some strange artifact or not, and when I asked people in that lab at ASHG many of them weren’t sure either. Ancient DNA shows us that these were not artifacts or flukes: the allele frequencies have been changing over the last 2,000 years.

Last year I noticed that ancient DNA from the Baltic indicates that these people, the palest in the world using most measures, have gotten more lightly complected since the Iron Age. Noticeably so. If you look at the supplements of this paper the pigmentation loci don’t make it as clear. I think on the whole Vikings would not be visually distinctive from modern Scandinavians. But their statistical method makes it hard to refute that this ancient DNA transect is indicative of a reduction in the frequency of alleles associated with very dark hair in Scandinavia. The fact that this happened in both the western and eastern Baltic region with culturally distinctive people tells me that some underlying cultural or more likely environmental pressure was being applied.

And, it is clear we don’t know the whole story with lactase persistence. Denmark and southern Sweden have among the highest percentages in the world, and that’s clearly not a function of the deep past, but sweeps continuing down into the present.

Are Scandinavians exceptional? I doubt it. It’s just that the climate and concentration of researchers mean that there is a whole lot of study and analysis of many individuals across Holocene time periods. Rather, think of them as a “model organism.” Evolution isn’t done with our species, not by a long shot, and though we can detect a lot of selection in the genome…there is very little clarity about why the selection is occurring (i.e., what are humans adapting to?).*

* Most human population geneticists seem to be now coming to a consensus that there’s a lot of “soft sweeps” on “standing genetic variation.” Since a lot of these soft sweeps happen at a lot of genomic positions, strong selection for trait x is going to result in side effects on a lot of other traits. The “genetic correlation.”

September 8, 2020

It’s raining founder events

Filed under: Founder Events,Population genetics — Razib Khan @ 8:23 pm

There’s a new preprint on bioRxiv that is very interesting, Reconstructing the history of founder events using genome-wide patterns of allele sharing across individuals:

…To learn about the frequency and evolutionary history of founder events, we introduce ASCEND (Allele Sharing Correlation for the Estimation of Non-equilibrium Demography), a flexible two-locus method to infer the age and strength of founder events. This method uses the correlation in allele sharing across the genome between pairs of individuals to recover signatures of past bottlenecks. By performing coalescent simulations, we show that ASCEND can reliably estimate the parameters of founder events under a range of demographic scenarios, with genotype or sequence data. We apply ASCEND to ~5,000 worldwide human samples (~3,500 present-day and ~1,500 ancient individuals), and ~1,000 domesticated dog samples. In both species, we find pervasive evidence of founder events in the recent past. In humans, over half of the populations surveyed in our study had evidence for a founder events in the past 10,000 years, associated with geographic isolation, modes of sustenance, and historical invasions and epidemics. We document that island populations have historically maintained lower population sizes than continental groups, ancient hunter-gatherers had stronger founder events than Neolithic Farmers or Steppe Pastoralists, and periods of epidemics such as smallpox were accompanied by major population crashes. Many present-day groups–including Central & South Americans, Oceanians and South Asians–have experienced founder events stronger than estimated in Ashkenazi Jews who have high rates of recessive diseases due to their history of founder events. In dogs, we uncovered extreme founder events in most groups, more than ten times stronger than the median strength of founder events in humans. These founder events occurred during the last 25 generations and are likely related to the establishment of dog breeds during Victorian times. 
Our results highlight a widespread history of founder events in humans and dogs, and provide insights about the demographic and cultural processes underlying these events.

This method is pretty cool because it scales and works on non-phased data (good luck phasing a lot of low-coverage ancient DNA!). Through simulation and comparison to earlier results, the authors show that ASCEND does a good job estimating:

1) the timing of a founder event

2) the intensity of a founding event
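To see how a single curve can yield both quantities, here is a toy sketch. It assumes, as a simplification on my part, that the allele-sharing correlation decays exponentially with genetic distance d (in Morgans) as A·exp(-2·d·T), where the decay rate reflects the age T in generations and the amplitude A reflects the intensity. The function names, the factor of 2, and all the numbers are illustrative assumptions; this is not ASCEND's code or its fitting procedure.

```python
import math

def simulate_decay(distances, amplitude, age_generations):
    """Toy allele-sharing correlation curve: A * exp(-2 * d * T).
    The functional form is an assumption made for illustration."""
    return [amplitude * math.exp(-2.0 * d * age_generations) for d in distances]

def fit_decay(distances, correlations):
    """Fit a straight line to (d, log z) by ordinary least squares.
    The slope equals -2 * T and the intercept equals log(A)."""
    logs = [math.log(z) for z in correlations]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return math.exp(intercept), -slope / 2.0

# Genetic distance bins from 0.1 cM to 3 cM, expressed in Morgans.
distances = [0.001 * k for k in range(1, 31)]

# Hypothetical founder event: amplitude 0.05, 40 generations ago.
curve = simulate_decay(distances, amplitude=0.05, age_generations=40)
est_amplitude, est_age = fit_decay(distances, curve)
```

On this noise-free curve the fit recovers the inputs. Real data are noisy and can include a background offset, in which case a log-linear fit like this one would no longer be appropriate and a nonlinear fit would be needed.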

One of my hobby-horses is that Ashkenazi Jews aren’t really that inbred or bottlenecked a group. They’ve been extensively studied, so there’s a laser-like focus on their population and medical genetics. Importantly, they also have a recessive disease load, usually attributed to their endogamy and small effective population size. Studying Ashkenazi Jewish genetics is easy if you think of it in grant terms since there are diseases that are well known you can focus on.

But one of the results in this preprint, which aligns with other earlier published work, is that there are many groups far more homogeneous due to extreme founder events/endogamy than Ashkenazi Jews. Some of the outcomes are not surprising. Lots of South Asian groups seem to be extremely homogeneous due to endogamy and small founding populations, though today many of them number in the millions. The strong implication from these results is that they carry a lot of deleterious recessive allele load.

The other groups are not surprising. Islanders, hunter-gatherers in marginal habitats. Basically, populations artificially prevented from gene flow, or those subject to strong cultural barriers.

The method not only estimates the intensity of the founder events but also the period. Many of the results are totally explicable. Many Northern Europeans seem to have founding populations that go back to the Corded Ware expansion. The founding of the Basque dates to the Roman Empire. Why? I think a reasonable hypothesis is that, for whatever reason, this is when the ancient Aquitani emerged as an exclusive ethnocultural group, as opposed to Romanizing like their Iberian and Celtiberian neighbors.

Probably the most interesting result for me is one that is obvious in much of the data, but hasn’t been analyzed as thoroughly before: ancient European hunter-gatherers had very small effective populations due to narrow founder events. The question is: is this true in general for pre-agricultural people? Many anthropologists have argued that large agglomerations of sedentary populations were more common before the Holocene than we might think, and modern hunter-gatherers are biased samples (they occupy marginal territory).

As we obtain more ancient DNA that question will be answered in the generality. Over ten years ago Hawks et al. argued that large populations resulted in faster adaptation. Whatever details one might quibble with in their model, I think the results from ancient DNA raise the possibility of the greater relative efficacy of selection (due to weaker drift) and more population connectedness allowing for easier flow of beneficial alleles.

The software is already available. I’m going to take it for a test drive…

August 24, 2020

Who do the English think they are?

Filed under: Anglo-Saxons,Fall of Rome,Genetics,History,Population genetics — Razib Khan @ 3:04 pm

In the early 5th century the Roman legions abandoned Britain, and the sceptered isle fell off the pages of history. When it reemerges two centuries later Celtic Britain had become the seedbed for the nation-state of England. The Christian religion, newly-established on the island at the time, had given way once again to paganism. Brythonic Celtic speech was ascendant only on the fringes. A cacophony of German dialects spread out across the fertile south and east, radiating out of the “Saxon Shore”.

This ethno-religious transformation of the island occurred under the shadow of semi-history, allowing for the development of an imaginative romantic tradition exemplified by the Arthurian Cycle. But this Dark Age also became a bone of contention between the English who saw themselves as deeply rooted in the land, and those who declared that they were a German folk who had won their new home through conquest and blood. The dominant view at any given time reflected social and political events of the 20th century more than facts. The propaganda value of myth meant more than the conjectures of scholars.

While in the early 20th century the dominant position was that the English were a people akin to German Saxons, a race apart from the Welsh, by the early 21st century serious scholars assumed that the spread of Anglo-Saxon culture occurred through imitation rather than replacement.

Today we can say with some confidence that neither stark view is correct, and that the middle path between is far more interesting and complex. Large numbers of Saxons, Angles and Jutes did in fact cross the North Sea — but the preponderance of England’s heritage still draws from the Celtic-speaking peoples. It is not coincidence that the earliest rulers in Alfred the Great’s lineage bear Celtic names, not German ones.

Those who argued for the erasure of the Celtic people did not do so without any basis. St. Gildas, a 6th century British Celt, recounted in On the Ruin and Conquest of Britain the defeat and destruction of his people at the hands of the Saxons. More recently, 19th century philologists observed that the number of Brythonic Celtic loan words in English is extremely small; in fact, there may be more Celtic loan words from Gaulish, due to the later Norman French influence. Finally, the collapse of institutions like the Roman Christian Church and the total decay of urban life indicates incredible disruption of the social hierarchy which characterised post-Roman Britain.

A contrast here exists with Gaul, which absorbed a German-speaking elite but retained Roman language and religion. Some of the nobility of southern and western France even traced their descent from Romans, not Germans. On a more demotic level, British archaeologists have also observed that the arrival of the Saxons seems to have been associated with a transformation of the layout of rural farmsteads. In most societies, farmers have customs and traditions which they hew to, and they are often quite stubborn and set in their ways. Such a change indicates new people, not just practices.

But by the late 20th century such views of cultural and demographic disruption were in bad odour. The dominant ethos was that people did not move; their customs and traditions did. Hengist and Horsa may have existed, but rather than a folk migration the Anglo-Saxon conquest was the work of a small number of German mercenaries who were engaged in elite capture of the post-Roman peasantry.

The Welsh historian Norman Davies observed in his 1999 book The Isles that “blood price” in 8th century Wessex differed depending on whether one was Saxon or British, the implication here being that there were many Celtic Britons living in the Anglo-Saxon lands, even if our documentary evidence is from the Saxon elite; this would tally with the 6th century ancestors of Alfred the Great having names such as Ceawlin, Cynegils and Cerdic, all of which have a distinctive Welsh flavor.

Genetics has untangled the Gordian knot of this semi-historical mystery, although illumination has come not at once but in fits and starts. One of the primary reasons is that the genetic difference between “Celtic” and “German” peoples is very small. Most Northern Europeans separated from each other very recently. Ancient DNA from between three and eight thousand years ago shows that Northern Europe underwent several mass migrations which transformed the genetic landscape.

First, the blue-eyed dark-skinned hunter-gatherers who descend from Ice Age Europeans disappeared and were absorbed by brown-eyed pale-skinned farmers who moved north out of the Near East. Then, these agriculturists were themselves overwhelmed by a people who migrated out of the Eurasian steppe into Europe 5,000 years ago. These pastoralist people probably brought Indo-European languages, and 4,500 years ago they arrived in Britain as the Bell Beaker culture. Within a few generations there was 90% genetic turnover, as the farmers who first erected Stonehenge disappeared, and were replaced by people who seem to have arrived from what is today northern Germany, possibly prefiguring the later Anglo-Saxon migration.

The problem from the perspective of genetics in understanding the proportion of Anglo-Saxon ancestry in the modern English goes back to the reality that Germans and Celts themselves had only been separated for 3,000 years, at most. These are genetically very close populations, and the technology of the early 21st century could not resolve the questions being asked.

UCL geneticist Steve Jones did attempt such a thing in his 2003 book Y: The Descent of Men. Jones observed that the distribution of two Y chromosomal lineages exhibits a sharp break at Offa’s Dyke. A far higher proportion of Welsh men are R1b, which is very common across the Atlantic facade of Europe, while more English men carry R1a, which is found in higher frequencies in Germany and Norway. In contrast, Professor Jones observed that there was no difference in the maternal heritage of the Welsh and English, suggesting that the ethnic change was due to the impact of men. Jones’s UCL colleague Mark Thomas later developed an “apartheid model” to explain why the genetic difference between the English and Welsh was so striking.

But the true understanding of the situation could only be obtained by looking across the whole genome, not simply the paternal and maternal lineages. This was done by the Peopling of the British Isles Project, which published a paper in 2015 that drew from analysis on hundreds of thousands of genetic markers from 2,000 British individuals who were sampled from all across the United Kingdom.

They estimated that 10-40% of the ancestry in central and southern England was Anglo-Saxon — that is, DNA segments more similar to the Germans than the Welsh. Another paper from 2016, utilising ancient as well as contemporary DNA, estimated that 38% of the ancestry in the “East English” — people from East Anglia and the East Midlands — is derived from the Anglo-Saxons. These researchers actually found DNA from Dark Age-era graves identified as Anglo-Saxon, and some of these individuals were far more like the Germans in their DNA than the modern English; they differed from earlier Iron Age samples, proving beyond a doubt that a significant number of Germans did cross the North Sea in the 6th century.

Where does this leave us in relation to the question of whether the transformation of Dark Age Britain to early medieval England was one of genes or memes? The clear answer seems to be both. The emergence of a new style of farming and pottery, and the collapse of urban Roman civilization and Christianity in eastern Britain, were not simply due to the prestige and power of a small number of German warlords. Whole villages must have transplanted themselves across the North Sea, creating the nucleus of a new people, and absorbed the remaining British Celts. The lack of Celtic loanwords and the adoption of Saxon peasant culture may indicate the self-confidence of the newcomers. If St. Gildas is correct, the British elites moved to the west of the island, leaving the common people to their own devices.

But though the southern and eastern fringe of England has a substantial Anglo-Saxon demographic imprint, that fades out as one moves to the west, including to the lands that once comprised the kingdom of Wessex. There is far less German genetic influence in Hampshire, Berkshire or Wiltshire, let alone Devon. We know from early medieval records that Celtic language speakers did exist as late as the 8th century in these domains (and much later in Devon) but by then Old English, which is for all purposes a purely Germanic language, was dominant.

The genealogy of the House of Wessex may offer a clue as to what occurred in broad swaths of western England. In the 6th century Celtic names imply that this elite lineage was identified with British culture, and looked west, but by the 7th German names became common, and the kings were pagan. Though the Saxons may have imposed their way of life through sheer numbers in the east, explaining the light impact of British Celtic culture upon their folkways and language, their expansion beyond the Saxon Shore seems to have been due to the adoption of the German identity by native British. The killing of a Celtic-speaking individual under the Saxon system of blood price was far cheaper than for a German speaker, serving as a clear inducement to assimilate.

What science makes clear then is that both extreme scenarios presented in the 19th and 20th centuries were wrong. The English are not a race apart from the Welsh. The modern English are genetically closest to the Celtic peoples of the British Isles, but the modern English are not simply Celts who speak a German language. A large number of Germans migrated to Britain in the 6th century, and there are parts of England where nearly half the ancestry is Germanic.

These folk served as the focus of a cultural revolution that transformed the British Isles. It was not a passive affair: the cities, churches, and hamlets of the previous inhabitants were blotted out, and what had been one of the provinces of the Roman Empire became a backwater pagan land. Though the original Romano-British elites had some knowledge of Latin, and patronised the Christian Church, the patina of civilization was clearly thin upon them, and the loosely Christian Celtic warlords of Dark Age western Britain transformed seamlessly into the pagan kings of Anglo-Saxon England.

The initial founding of the Saxon Shore was surely based on a level of brutality that Christian priests, if any had lived to tell the tale, would have recorded with foreboding. But the transformation of vast swaths of western Britain into the core of what had become England by the Viking Age occurred consensually, so seductive had the Saxon society become to the Celts, highborn and low.

The lesson that history and genetics teach us is that cultural change is a complex phenomenon, and a single factor does not explain the whole story. Today we live in an age of migration, and native peoples fear being replaced, while immigrant communities fear being assimilated. Numbers matter, but the Saxons tell us that numbers are not everything.


The genealogy of the House of Wessex may offer a clue as to what occurred in broad swaths of western England. In the 6th century Celtic names imply that this elite lineage was identified with British culture, and looked west, but by the 7th German names became common, and the kings were pagan. Though the Saxons may have imposed their way of life through sheer numbers in the east, explaining the light impact of British Celtic culture upon their folkways and language, their expansion beyond the Saxon Shore seems to have been due to the adoption of the German identity by native British. The killing of a Celtic-speaking individual under the Saxon system of blood price was far cheaper than for a German speaker, serving as a clear inducement to assimilate.

What science makes clear then is that both extreme scenarios presented in the 19th and 20th centuries were wrong. The English are not a race apart from the Welsh. The modern English are genetically closest to the Celtic peoples of the British Isles, but the modern English are not simply Celts who speak a German language. A large number of Germans migrated to Britain in the 6th century, and there are parts of England where nearly half the ancestry is Germanic.

These folk served as the focus of a cultural revolution that transformed the British Isles. It was not a passive affair: the cities, churches, and hamlets of the previous inhabitants were blotted out, and what had been one of the provinces of the Roman Empire became a backwater pagan land. Though the original Romano-British elites had some knowledge of Latin, and patronised the Christian Church, the patina of civilization was clearly thin upon them, and the loosely Christian Celtic warlords of Dark Age western Britain transformed seamlessly into the pagan kings of Anglo-Saxon England.

The initial founding of the Saxon Shore was surely based on a level of brutality that Christian priests, if any had lived to tell the tale, would have recorded with foreboding. But the transformation of vast swaths of western Britain into the core of what had become England by the Viking Age occurred consensually, so seductive had the Saxon society become to the Celts, highborn and low.

The lesson that history and genetics teach us is that cultural change is a complex phenomenon, and a single factor does not explain the whole story. Today we live in an age of migration, and native peoples fear being replaced, while immigrant communities fear being assimilated. Numbers matter, but the Saxons tell us that numbers are not everything.

The post Who do the English think they are? appeared first on UnHerd.

May 11, 2020

Selection for pigmentation loci…but not pigmentation?

Filed under: Pleiotropy,Population genetics,Population genomics — Razib Khan @ 1:00 am


About a year and a half ago at ASHG, I had a discussion with Dan Ju and Iain Mathieson about their work on ancient pigmentation. Or, more precisely, ancient pigmentation related genes. Now it’s out in a preprint, The evolution of skin pigmentation associated variation in West Eurasia:

…It is unclear whether selection has operated on all the genetic variation associated with skin pigmentation as opposed to just a small number of large-effect variants. Here, we address this question using ancient DNA from 1158 individuals from West Eurasia covering a period of 40,000 years combined with genome-wide association summary statistics from the UK Biobank. We find a robust signal of directional selection in ancient West Eurasians on skin pigmentation variants ascertained in the UK Biobank, but find this signal is driven mostly by a limited number of large-effect variants. Consistent with this observation, we find that a polygenic selection test in present-day populations fails to detect selection with the full set of variants; rather, only the top five show strong evidence of selection. Our data allow us to disentangle the effects of admixture and selection. Most notably, a large-effect variant at SLC24A5 was introduced to Europe by migrations of Neolithic farming populations but continued to be under selection post-admixture. This study shows that the response to selection for light skin pigmentation in West Eurasia was driven by a relatively small proportion of the variants that are associated with present-day phenotypic variation.

There are a lot of moving parts in this preprint. Look closely, and you will notice that the authors are careful to stipulate that they can’t really infer the pigmentation of ancient peoples, only the alleles ascertained in modern populations. This matters, because naive deployments of polygenic risk score models trained on modern populations projected on ancient ones seem highly suspect. I’m thinking here mostly of the “Cheddar Man is black” meme. It is true that using modern SNP batteries Mesolithic Europeans are predicted to be rather dark-skinned, but higher latitude humans tend to be paler, on average, than lower latitude humans (albeit, not as pale as the typical Northern European!). But, we can be sure about the alleles we do know about, and, their likely effect (the functional understanding of these pathways is pretty good).

The best modern genetic analyses of pigmentation suggest that variation is dominated by some large-effect loci, but that there is a large residual of smaller-effect loci segregating within the population (I’ve seen 50% accounted for with SNPs, and 50% as “ancestry”, which really masks small-effect QTLs). This is in contrast with the architecture in height, where there are few large-effect loci, and almost all of the variance is small-effect loci. What Ju et al. confirm is that selection “for pigmentation” is due to the large-effect loci; there’s no polygenic selection detectable on the smaller-effect loci for the ancient populations. Importantly, the change in allele frequency isn’t just due to admixture. It’s also due to selection after admixture.

I use quotes above because honestly, I think these sorts of results make it unclear what the selection was for. The general prior is conditioned on the fact that even after a few decades we still think of EDAR as a hair-thickness gene, but it’s one of the strongest signals of selection in the human genome. The “light” allele in SLC24A5 is at an incredibly high frequency in Europe today, and has increased in the last 4,000 years. Though this SNP is impactful for the complexion, it’s hard to imagine how strong selection must be to drive it from 95% to 99.5% (as per 2005 paper on this SNP, the “light” allele exhibits some phenotypic dominance).
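To put a rough number on "how strong", here is a back-of-the-envelope sketch. It assumes a simple genic model in which the odds p/(1-p) of the favored allele grow by a factor of (1+s) each generation, and a generation time of 28 years; both are illustrative assumptions, not figures from the preprint. If the "light" allele is partly dominant, the remaining "dark" alleles are mostly hidden in heterozygotes, so the selection required would be stronger than this simple model suggests.

```python
import math

def required_selection_coefficient(p_start, p_end, generations):
    """Selection coefficient s under a genic model where the odds
    p / (1 - p) are multiplied by (1 + s) every generation."""
    log_odds_change = (math.log(p_end / (1 - p_end))
                       - math.log(p_start / (1 - p_start)))
    return math.exp(log_odds_change / generations) - 1

YEARS = 4000            # time span mentioned in the post
GENERATION_TIME = 28    # assumed years per generation (illustrative)

generations = YEARS / GENERATION_TIME
s = required_selection_coefficient(0.95, 0.995, generations)
print(f"{generations:.0f} generations, s of roughly {s:.3f} per generation")
```

Under these assumptions the answer comes out between 1% and 2% per generation, sustained for the whole period, on an allele that was already close to fixation.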

As noted in the preprint, there’s not enough data on other regions of the world. It’s hard to assess what’s going on in Europe without assessing other regions. The authors do present an intriguing suggestion: that lighter pigmentation in East Asia is driven by smaller-effect genes shifted through polygenic selection.

I’ll present a strange hypothesis: selection for lighter skin at high latitudes through polygenic selection on standing variation naturally takes populations to the coloring of Northeast Asians. But very light complexion, as you see in Northern Europe, could be due to strong selection on the large-effect pigmentation genes, and pigmentation itself may simply be a side effect due to a genetic correlation with the true target of selection.

April 13, 2020

Assessing the utility of models in ancient DNA admixture analyses

Filed under: Population genetics — Razib Khan @ 1:43 am

Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture:

qpAdm is a statistical tool for studying the ancestry of populations with histories that involve admixture between two or more source populations. Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and to calculate the relative proportion of ancestry that can be ascribed to each source population in the model. Although qpAdm is widely used in studies of population history of human (and non-human) groups, relatively little has been done to assess its performance. We performed a simulation study to assess the behavior of qpAdm under various scenarios in order to identify areas of potential weakness and establish recommended best practices for use. We find that qpAdm is a robust tool that yields accurate results in many cases, including when data coverage is low, there are high rates of missing data or ancient DNA damage, or when diploid calls cannot be made. However, we caution against co-analyzing ancient and present-day data, the inclusion of an extremely large number of reference populations in a single model, and analyzing population histories involving extended periods of gene flow. We provide a user guide suggesting best practices for the use of qpAdm.

The Reich lab provides its software and data. It’s really not that hard to replicate and tweak some of the analyses they do in their papers (check the supplements for the detailed specifications of the parameters). I’ve done so many times when I got curious about a detail they hadn’t explored.

The preprint above is a valuable addition to the intuitions one can develop through using the packages.

February 6, 2020

The Knanaya of Kerala do seem a bit more Near Eastern than other St. Thomas Christians

Filed under: Population genetics,St. Thomas Christians — Razib Khan @ 11:51 pm

Last year I was approached by someone from the Knanaya community of South India as to their genetics. The Knanaya believe themselves to be descendants of later Near Eastern migrants than the other Nasrani St. Thomas Christians (both communities seem to believe in some connection to Near Eastern Jews). The history of these communities is complex, but they are rooted in the Oriental Orthodox Christianity of Iraq and the Levant. You might be curious to note that the largest number of individuals associated with the Syrian Orthodox Church are South Indians.

For some context, I’d recommend The Lost History of Christianity by Philip Jenkins.

With some preliminary analysis, it did seem like the Knanaya community was enriched for Near Eastern ancestry, even compared to the other Nasrani samples I had. Recently I’ve been given a total of 11 samples of Knanaya, so I decided to do some further analysis (2 of the individuals seem somewhat related, so they are not independent data points).

If you look at the plot above, you can see that the y-axis is PC 3. This separates Northern European samples (Belorussians and Lithuanians) at one end and Yemeni Jews at the other. Groups such as Armenians are in the middle. You can see that some groups, such as the Mumbai Jews (Bene Israel), Cochin Jews, and, the Knanaya, do seem shifted toward the Yemeni Jews. Groups in the Levant and minorities like Assyrians are usually about 2/3 “northern” and 1/3 “southern” in ancestry.

To get a better sense of that, take a look at the Admixture barplot below.

This is a supervised run with several reference populations. The light blue are Yemeni Jews, and you can see quite clearly that the Knanaya show evidence of this ancestry, while most other Indian populations do not.

To get a sense of the ratio of northern Middle Eastern vs. southern Middle Eastern, here are the results for the Druze:

TreeMix is a little more ambivalent:

The flow to the Cochin and Mumbai Jewish groups is clearer (or from in the latter case). I think the history of the Kerala Christians, and the Knanaya in particular, is more complex.

I’ll probably run some more stats tomorrow to see what the best donor population is…

January 31, 2020

The details of Eurasian back-migration into Africa

Filed under: Africa,Neanderthal,Population genetics — Razib Khan @ 3:34 pm

Carl Zimmer has an interesting write-up on the new method to detect Neanderthal ancestry in Africa, Neanderthal Genes Hint at Much Earlier Human Migration From Africa. There are two quotes from researchers that are of note.

First, from David Reich:

Despite his hesitation over the analysis of African DNA, Dr. Reich said the new findings do make a strong case that modern humans departed Africa much earlier than thought.

“I was on the fence about that, but this paper makes me think it’s right,” he said.

It’s possible that humans and Neanderthals interbred at other times, and not just 200,000 years ago and again 60,000 years ago. But Dr. Akey said that these two migrations accounted for the vast majority of mixed DNA in the genomes of living humans and Neanderthal fossils.

Over the years I have had several discussions with members of the Reich lab about whether there was a major migration of the antecedent lineage of modern humans before the one that we detect 60,000 years ago. Many were quite skeptical because of the lack of clear genetic signal of anything before 60,000 years ago, as well as its correlation with a strong archaeological record. But, it seems now that David Reich at least is convinced that the evidence of admixture into Neanderthals means that there were descendants of the same lineage that led to the major “Out of Africa” expansion 60,000 years ago who had spread earlier (though the footprint was small, and their impact on later humans difficult to detect).

Second, Sarah Tishkoff says something that I forgot to mention in my earlier post:

Sarah Tishkoff, a geneticist at the University of Pennsylvania, is doing just that, using the new methods to look for Neanderthal DNA in more Africans to test Dr. Akey’s hypothesis.

Still, she wonders how Neanderthal DNA could have spread between populations scattered across the entire continent.

The second part isn’t that inexplicable. In the paper, they mention that they don’t have the power to analyze small sample numbers. So they focused on the 1000 Genomes samples, which are from West and East Africa, from agriculturalist and agro-pastoralist populations. If you listen to this week’s episode of The Insight, Spencer and I talk extensively about the recent agriculturally mediated expansions within Africa. Much of the genetic landscape of the continent is novel, new, and of short historical time-depth. The Africa of Old Kingdom Egypt, 4,500 years ago, was very different.

As hinted by Tishkoff the key is going to be when we get samples from hunter-gatherers. Some of these have much lower Eurasian affinities, and likely they’ll carry less Neanderthal ancestry.

On a final note, this paper and the first author, Joshua Akey, hint at some resolution in the interminable disagreement about continuous gene flow vs. pulse admixture. Some of the methods to infer and detect admixture assume pulse admixture, and so our conception of the past has been skewed. On the other hand, I think it is plausible that in a patchy low population density Paleolithic landscape continuous gene flow may have been quite attenuated over long distances. Admixture then would occur when there were cultural revolutions and long-distance contact for short periods of time, before an equilibration. Basically, it’s some of both.

December 15, 2019

If you meet a model, kill it!

Filed under: Locator,Machine Learning,Population genetics — Razib Khan @ 3:19 pm

If you are awake in the year 2019 you have heard of “machine learning.” And, if you listened to my podcast The Insight you know that Andy Kern’s lab at University of Oregon is leveraging machine learning (and “deep learning” and “neural networks”) for population genetics.

Now, obviously in population genetics, you know that models are a big deal. The Hardy-Weinberg model. The coalescent. Various models of selection against which you can test data. This is not a coincidence. In the 20th century, population genetics was a data-poor field and a lot of work was done in the theoretical space since that’s where the work could be done (here’s to you two-locus models of selection from the 1970s!).

In the 2000s genomics transformed the landscape. All of a sudden there was a surfeit of data. On the one hand, this meant that there was a lot of material for models to work on. On the other hand, it turns out that some models aren’t too scalable to big data, nor do they turn out to be very robust (one reason for the persistence of single-locus phylogenetic models around mtDNA and Y is their elegant tractability).

This is where a “bottom-up” machine learning approach comes into the picture. Kern’s group just came out with a new preprint I’ve been hearing about for a while, Predicting Geographic Location from Genetic Variation with Deep Neural Networks:

Most organisms are more closely related to nearby than distant members of their species, creating spatial autocorrelations in genetic data. This allows us to predict the location of origin of a genetic sample by comparing it to a set of samples of known geographic origin. Here we describe a deep learning method, which we call Locator, to accomplish this task faster and more accurately than existing approaches. In simulations, Locator infers sample location to within 4.1 generations of dispersal and runs at least an order of magnitude faster than a recent model-based approach. We leverage Locator’s computational efficiency to predict locations separately in windows across the genome, which allows us to both quantify uncertainty and describe the mosaic ancestry and patterns of geographic mixing that characterize many populations. Applied to whole-genome sequence data from Plasmodium parasites, Anopheles mosquitoes, and global human populations, this approach yields median test errors of 16.9km, 5.7km, and 85km, respectively.
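For a sense of what a "median test error" in kilometers means in practice, here is a minimal sketch: take the great-circle distance between each held-out sample's true and predicted coordinates, then report the median. The haversine formula, the function names, and the coordinate layout are illustrative assumptions; this is not Locator's own code, and the preprint may compute distances differently.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def median_error_km(true_coords, predicted_coords):
    """Median distance between paired true and predicted (lat, lon) tuples."""
    return median(
        haversine_km(t[0], t[1], p[0], p[1])
        for t, p in zip(true_coords, predicted_coords)
    )
```

The median is less sensitive than the mean to a handful of samples that get placed on the wrong continent, which is presumably why it is the summary reported.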

Readers of this weblog can jump to the empirical examples of the HGDP. They make sense, and I especially liked the local ancestry deconvolution analysis and variation in predictive power conditional on recombination.

Sometimes quantity has a quality all its own, and the eye-opening aspect of Locator is how it can test a lot of propositions quickly (this is more important in the era of WGS datasets). It’s no joke that dispensing with a model can speed things up.

One minor element I’ll note is that getting Locator installed is not trivial from what I have seen. Especially the TensorFlow dependency. So I’ll probably have more updates once I get it up and running myself.

November 1, 2019

If marrying cousins is so bad why does everyone want to marry their cousins?

Filed under: Population genetics — Razib Khan @ 8:47 pm

The above figure illustrates the geographic distribution of the prevalence of people marrying people closely related to them. Mostly this involves cousin marriage. Most people know the urban legends around the debilities that occur due to cousin marriage, but traditionally the focus has been on rare recessive diseases (e.g., albinism). Now, a massive new study has been published (more than 400 authors, with sample sizes of 1 million or more for some characteristics) looking at a variety of traits, Associations of autozygosity with a broad range of human phenotypes:

In many species, the offspring of related parents suffer reduced reproductive success, a phenomenon known as inbreeding depression. In humans, the importance of this effect has remained unclear, partly because reproduction between close relatives is both rare and frequently associated with confounding social factors. Here, using genomic inbreeding coefficients (FROH) for >1.4 million individuals, we show that FROH is significantly associated (p < 0.0005) with apparently deleterious changes in 32 out of 100 traits analysed. These changes are associated with runs of homozygosity (ROH), but not with common variant homozygosity, suggesting that genetic variants associated with inbreeding depression are predominantly rare. The effect on fertility is striking: FROH equivalent to the offspring of first cousins is associated with a 55% decrease [95% CI 44–66%] in the odds of having children. Finally, the effects of FROH are confirmed within full-sibling pairs, where the variation in FROH is independent of all environmental confounding.

The offspring of first cousins have on average 0.10 fewer children. On an individual level, this is not that great of an effect. But in an evolutionary population genetics sense this is a serious selection coefficient.
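To see why a deficit of 0.10 children counts as a serious selection coefficient, divide it by the mean number of children in the population. The baseline family sizes below are hypothetical values chosen only for illustration; the paper's own baseline is not quoted here.

```python
def selection_coefficient(fewer_children, mean_children):
    """Relative fitness deficit: s = 1 - w, where
    w = (mean_children - fewer_children) / mean_children."""
    if mean_children <= 0:
        raise ValueError("mean_children must be positive")
    return fewer_children / mean_children

DEFICIT = 0.10  # fewer children for offspring of first cousins (from the post)

for mean_children in (2.0, 3.0, 5.0):  # hypothetical baseline family sizes
    s = selection_coefficient(DEFICIT, mean_children)
    print(f"mean of {mean_children:.0f} children: s = {s:.3f}")
```

With a baseline of two children this gives s = 0.05. For comparison, selection coefficients of a percent or two are already considered strong in population genetics.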

On the whole, the paper is impressive in its scope. There are even sibling analyses to confirm the impact of runs of homozygosity causing problems due to rare alleles (since this paper involved r.o.h, of course, Jim Wilson is involved!).

Rather, I want to ask: if inbreeding is so bad genetically and biologically, why is it so common? One of the consequences of the Protestant Reformation is that the Roman Catholic Church’s strict enforcement of consanguinity rules was dropped, and cousin marriage became much more common among elites (such as the Darwin-Wedgwood family). The material rationale for cousin marriage is actually rather straightforward, in that it keeps accumulated property and power within the extended lineage. Marriages between children of brothers may cement alliances, while matrilocality and marriages between cross-cousins in South India have been associated with lower domestic abuse rates (in contrast, in North India strongly enforced exogamy has been associated with the idea that women marry into an alien household).

I would suggest perhaps that though marriages between relatives are biologically disfavored, there are many cases where it is culturally beneficial. In societies where collective family units engage in inter-group competition, some level of consanguinity may benefit cohesion. Other societies where individualism is more operative may exhibit no such incentives.

Note: I don’t see great evidence of purging genetic load in populations with more inbreeding. The rare variants are probably replenished constantly through mutation?

September 28, 2019

Phenotype does not imply admixture

Filed under: Population genetics — Razib Khan @ 5:26 pm

One of the questions I often get relates to whether “trait X comes from population Y and does that mean if one has trait X that one has more ancestry from population Y.” To give an illustration, I have had people ask “I have blue eyes, does that mean I am more ‘Western Hunter-Gatherer’ than other people?”

One issue is that though the WHG tended toward a high frequency of the derived OCA2-HERC2 haplotype, other populations clearly carried it. The other is that admixture is so far in the past that having blue or brown eyes is not informative to any degree about ancestry. There were probably relict populations of WHG less than 4,000 years ago (David has mentioned a sample from less than 3,000 years ago in Scandinavia), but the admixture of WHG into other groups was very long ago. More than 1,500 generations ago. To a great extent, it seems plausible that even within populations variation in ancestral fractions should be marginal to non-existent.

But this is a verbal model. A new preprint on bioRxiv has posted a formal model that outlines the different parameters that shape the trajectory of this decoupling between phenotype and ancestry. Assortative mating and the dynamical decoupling of genetic admixture levels from phenotypes that differ between source populations:

Source populations for an admixed population can possess distinct patterns of genotype and phenotype at the beginning of the admixture process. Such differences are sometimes taken to serve as markers of ancestry—that is, phenotypes that are initially associated with the ancestral background in one source population are taken to reflect ancestry in that population. Examples exist, however, in which genotypes or phenotypes initially associated with ancestry in one source population have decoupled from overall admixture levels, so that they no longer serve as proxies for genetic ancestry. We develop a mechanistic model for describing the joint dynamics of admixture levels and phenotype distributions in an admixed population. The approach includes a quantitative-genetic model that relates a phenotype to underlying loci that affect its trait value. We consider three forms of mating. First, individuals might assort in a manner that is independent of the overall genetic admixture level. Second, individuals might assort by a quantitative phenotype that is initially correlated with the genetic admixture level. Third, individuals might assort by the genetic admixture level itself. Under the model, we explore the relationship between genetic admixture level and phenotype over time, studying the effect on this relationship of the genetic architecture of the phenotype. We find that the decoupling of genetic ancestry and phenotype can occur surprisingly quickly, especially if the phenotype is driven by a small number of loci. We also find that positive assortative mating attenuates the process of dissociation in relation to a scenario in which mating is random with respect to genetic admixture and with respect to phenotype. The mechanistic framework suggests that in an admixed population, a trait that initially differed between source populations might be a reliable proxy for ancestry for only a short time, especially if the trait is determined by relatively few loci. 
The results are potentially relevant in admixed human populations, in which phenotypes that have a perceived correlation with ancestry might have social significance as ancestry markers, despite declining correlations with ancestry over time.

There are a lot of words and math. It’s quite gnarly. But the figure at the top of the post shows the major effect.

Basically:

– more loci in a trait (e.g., height) means that the association between ancestry and trait decays more slowly
– stronger assortative mating of phenotype means that the association between ancestry and trait decays more slowly
– stronger assortative mating on ancestry means that the association between ancestry and trait decays more slowly

Since historically people did not have individualized genome-wide ancestry results “assortative mating on ancestry” means by physical appearance in the generality. To me panel E above is really what you should focus on. About 10 genes impact the phenotype, and assortative mating is at 0.5 (between 0 and 1.0). You see the correlation is already only ~0.50 between genome-wide ancestry and the trait in about 10 generations.

Anyway, dig into the math. I read the whole thing but didn’t go over the math in detail. The model and simulations make intuitive sense. I’d be curious how they fit empirical results (which are cited in the paper).

September 4, 2019

Extreme inbreeding is bad

Filed under: Population genetics — Razib Khan @ 11:05 pm


If you read a book like Principles of Population Genetics, or know a little animal breeding, you know inbreeding has some serious consequences. The UK Biobank turns out to have about 100 individuals who are the products of extreme inbreeding (EI). That is, they are the offspring of parent-child pairings or full-sibling pairings, as inferred from the runs of homozygosity in their genomes (there are lots).

Intuition, theory, and a few results tell us that these individuals will have issues. Genomics confirms. Extreme inbreeding in a European ancestry sample from the contemporary UK population:

In most human societies, there are taboos and laws banning mating between first- and second-degree relatives, but actual prevalence and effects on health and fitness are poorly quantified. Here, we leverage a large observational study of ~450,000 participants of European ancestry from the UK Biobank (UKB) to quantify extreme inbreeding (EI) and its consequences. We use genotyped SNPs to detect large runs of homozygosity (ROH) and call EI when >10% of an individual’s genome comprise ROHs. We estimate a prevalence of EI of ~0.03%, i.e., ~1/3652. EI cases have phenotypic means between 0.3 and 0.7 standard deviation below the population mean for 7 traits, including stature and cognitive ability, consistent with inbreeding depression estimated from individuals with low levels of inbreeding. Our study provides DNA-based quantification of the prevalence of EI in a European ancestry sample from the UK and measures its effects on health and fitness traits.

The two major caveats I’d put out there are that the UK Biobank sample is a bit healthier and better educated than the average British person, and that the rate of adoption is considerably higher among people who are products of EI than is the norm. In other words, these people are from an atypical sample, and they are themselves somewhat atypical (since they were given up for adoption they likely had no idea they were the products of EI).
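As a reference point for the >10% ROH threshold in the abstract, here is a small sketch of the textbook expected inbreeding coefficient F for the offspring of various parental pairings, assuming the parents' own ancestors were not inbred. The table and the helper name are illustrative. Realized ROH fractions scatter around these expectations, and SNP-array ROH calls do not capture every autozygous segment, so treat this as a guide rather than a classifier.

```python
from fractions import Fraction

# Expected inbreeding coefficient F of a child, which equals the kinship
# coefficient of its parents (standard pedigree values, non-inbred ancestors).
EXPECTED_F = {
    "parent-child": Fraction(1, 4),
    "full siblings": Fraction(1, 4),
    "half siblings": Fraction(1, 8),
    "uncle-niece / aunt-nephew": Fraction(1, 8),
    "first cousins": Fraction(1, 16),
    "second cousins": Fraction(1, 64),
}

EI_THRESHOLD = Fraction(1, 10)  # ">10% of the genome in ROH", from the abstract

def exceeds_ei_threshold(parental_relationship):
    """True if the expected F for this pairing is above the EI cutoff."""
    return EXPECTED_F[parental_relationship] > EI_THRESHOLD

for relationship, f in EXPECTED_F.items():
    flag = "above" if exceeds_ei_threshold(relationship) else "below"
    print(f"{relationship}: F = {float(f):.4f} ({flag} the 10% cutoff)")
```

On expectation, first- and second-degree pairings sit above the cutoff while first cousins (F = 1/16) fall well below it, which matches the abstract's framing of EI as mating between first- and second-degree relatives.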

July 24, 2019

The genetic discovery of France

Filed under: France,Population genetics — Razib Khan @ 9:57 pm

Finally, a deep dive into the population genetic structure of France, The Genetic History of France:

…These clusters match extremely well the geography and overlap with historical and linguistic divisions of France. By modeling the relationship between genetics and geography using EEMS software, we were able to detect gene flow barriers that are similar in the two cohorts and corresponds to major French rivers or mountains…A marked bottleneck is also consistently seen in the two datasets starting in the fourteenth century when the Black Death raged in Europe.

Nothing too surprising. In a nation of France’s size without strong socio-cultural dynamics that might encourage endogamy, it makes sense that geographic barriers are very important in structure. That being said, there does seem to be a correspondence between the genetic clusters and deep linguistic differences which date back to antiquity. Additionally, the people of Brittany turn out to be more “British” than not. This is not entirely surprising since the Breton dialect descends from the Brythonic language brought by Celtic Britons (its closest relative is quasi-extinct Cornish).

I do wonder though how much France being a “target” nation for immigration over the centuries has shaped some of these patterns. I’m not talking here about recent non-European immigration, but the migration of Spaniards, Italians, and Poles, in the 19th-century, and earlier. Until the rise of Britain in the 18th-century France had been the largest, most powerful, and in the aggregate wealthiest, Western European nation in the post-Roman world. I suspect that this results in long-term trends toward cosmopolitanism genetically that might be absent in a few populations, such as the French Basque (who are distinct in these data).

July 6, 2019

Uyghur genetics and Kenneth Kidd – going beneath the surface

Filed under: China,Kenneth Kidd,Population genetics — Razib Khan @ 8:00 pm

The latest episode of NPR’s “Planet Money” was interesting to me and touched upon issues I’ve been thinking on a lot. Stuck In China’s Panopticon has a genetic angle. The Chinese government seems to be identifying and tracking Uyghurs with genetics. Or at least has the capability to do so. That is, in part, thanks to the work of Kenneth Kidd.

If you have read this weblog for a long time, or are a geneticist, you know who Kenneth Kidd is. You may have used his Alfred database. Though Wikipedia states that Kidd has been doing science in China since 1981, the podcast suggested that Kidd’s work under scrutiny dates to 2010.

That’s important. Because the reality is that the Chinese government did not need this late sampling to genetically identify Uyghurs. The HGDP data set has 10 Uyghurs already. People had been publishing on the pop genetics of the Uyghurs for more than 10 years by the time Kidd did his sampling. Alfred has 94 Uyghurs. This is better than 10, but for forensic purposes of ethnic identification, it’s probably superfluous.

In 2008 two Chinese researchers had already published a population genetic analysis with a bigger sample size than the HGDP. Kidd is not on the author list, so I don’t think he was involved.

Basically, Uyghurs are a group that will show admixture between various East and West Eurasian ancestry components many generations ago. This was already known before 2010. Only a few groups within China, such as Kazakhs, are even close to similar in their profile.

There is one area where I think Kidd’s work may have been pushing the frontier a bit: doing genealogical matching on diverse Uyghurs. Though I can’t imagine you could get more close relatives, the greater geographic diversity would probably implicate many more pedigrees.

Ultimately I don’t think the big picture is about Kenneth Kidd. Yes, forensics, genetics, and the Chinese government give many Americans nightmares. But thousands and thousands of scientists in America do work in China, with China, or are themselves of Chinese origin. American researchers develop technology that is later used in China to clamp down on various dissenters from the regime in an authoritarian manner. American consumers purchase goods and services that power the Chinese economy. American researchers collaborate with Chinese researchers and have indirectly furthered Chinese institutions such as the Beijing Genomics Institute.

I think we need to be honest that this implicates all of us in a globalized “just-in-time” world economy. Do the reporters interviewing Kidd use iPhones made in China?

And, it even goes well beyond China. In general, I think the United States is a force for good. But, as the world’s current superpower we have done some nasty things. Our democratically elected presidents, all of the recent ones, have sent people to their deaths for the good of the world (so they thought). We have intervened in nations and caused massive destruction and death, even though we meant well. Many non-Americans have a deep suspicion of our nation because of the dark shadow that it casts in certain circumstances.

There are bigger questions about power, morality, and individual responsibility and culpability that I wish we’d address, rather than focusing on a single researcher. Especially when I don’t think Kidd’s work was nearly as necessary and essential as the media portrays it.

May 27, 2019

Population genetics + “deep learning”

Population genetics is many things, but a popular field that gets written up in Wired or the tech-press is usually not one of those things. It emerged out of Mendelian genetics in the early decades of the 20th-century, transforming elegant pedigrees into abstruse algebraic formulae. It was a peculiar hybrid of mathematics and evolutionary biology, both obsessions of late 19th-century Victorian academics. Population genetics was as much a product of a particular history as the topics that it studied.

In the population genetic lens, evolution became simply the “change in allele frequencies over time.” Alleles being the early term for different genetic variants, which were correlated with patterns of inheritance.

Whereas some fields of quantitative science are focused on the analysis of collected data, early population genetics was rather more fixated on logical deduction from theoretical models. These models involved the algebraic inferences that were consequences of assumptions about values of parameters such as mutation or natural selection, in the context of random mating populations. On occasion, these models were supplemented with geometric analogies and illustrations, but by and large, this domain of science was inhabited by thinkers who were comfortable in abstract symbols, rather than the mess and fuss of bench biology.

This was a matter of necessity as much as preference.

There simply was not much data in early genetics on a population-wide scale.

The structure of DNA was not elucidated until 1953. Molecular evolution did not emerge until the next decade, and what we term genomics is the product of the very end of the 20th-century.

But the growth in data since the year 2000 has been exponential. For its first 80 years, population genetics was a field with too little data, fixated on theory. In the last 20 years, as population genomics has bloomed, researchers have had to confront the fact that the theoretical edifice built when there was access to genetic variation on dozens of loci within a species is not adequate in a world where one has access to whole genomes from hundreds of individuals.

Population genetics is now as much data science as theoretical science.

Words such as machine learning and deep learning have the characteristic of being both banal and esoteric. Who doesn’t know what a machine is? Or what deep means? And everyone learns! But of course, these terms refer to fields within computer science which have emerged to deal with the mass of data that modern society generates. Machines learning deeply seems to be quite a mysterious feat!

When population genetics was developed in the 1920s and 1930s to model evolutionary processes it was viewed as something of a mystery to most biologists. These theorists focused on the implications of models of the change in frequencies of alleles. They dealt in stylized conceptions of single mutations rising up rapidly in frequency due to strong positive selection, or perhaps a new mutation bouncing up and down in a “random walk” process of genetic drift. Relatively simple mathematical processes described simple evolutionary dynamics, which one could test with the limited data on hand.
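The “random walk” of drift can be sketched with a Wright-Fisher style simulation, where each generation is formed by resampling allele copies from the previous one. This is a minimal illustration under assumed parameters (population size, starting frequency, and seed are all hypothetical), not a reconstruction of any particular historical model.

```python
import random


def wright_fisher(n_diploid, p0, generations, seed=None):
    """Simulate neutral drift: each generation resamples 2N allele copies."""
    rng = random.Random(seed)
    copies = 2 * n_diploid
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Binomial sampling of allele copies from the parental frequency.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        freqs.append(p)
    return freqs


# Hypothetical run: 50 diploid individuals, allele starting at 50%.
trajectory = wright_fisher(n_diploid=50, p0=0.5, generations=100, seed=1)
print(trajectory[:5], trajectory[-1])
```

With no selection term, the frequency wanders until the allele is either lost (0.0) or fixed (1.0), and smaller populations reach one of those endpoints faster.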

Adaptation to malaria in Africa and the emergence of sickle-cell disease is a case in point. This is a situation where the selection pressure for individuals with a single copy of the mutant allele is balanced against the fitness cost to those who carry two copies of the mutant allele, and so exhibit sickle-cell disease. A simple algebraic relationship between the cost of sickle-cell disease and the protection conferred to carriers of the mutation against malaria can allow one to compute the allele frequencies at a single locus within populations.
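The “simple algebraic relationship” here is the textbook heterozygote-advantage result. If the relative fitnesses are AA = 1 - s (the malaria cost to non-carriers), AS = 1, and SS = 1 - t (the cost of sickle-cell disease), the equilibrium frequency of the S allele is s / (s + t). The sketch below computes that value and checks it by iterating the selection recursion; the values of s and t are hypothetical round numbers, not estimates from any population.

```python
def equilibrium_sickle_frequency(s, t):
    """Equilibrium frequency of S with fitnesses AA = 1 - s, AS = 1, SS = 1 - t."""
    if s <= 0 or t <= 0:
        raise ValueError("heterozygote advantage requires s > 0 and t > 0")
    return s / (s + t)


def next_generation(q, s, t):
    """One generation of selection on the S allele frequency q under random mating."""
    p = 1.0 - q
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * q + q * q * (1 - t)) / mean_fitness


# Hypothetical values: 10% malaria cost to AA, 80% cost to SS.
s, t = 0.10, 0.80
q = 0.01
for _ in range(500):
    q = next_generation(q, s, t)
print(q, equilibrium_sickle_frequency(s, t))
```

Starting from a rare mutation, the iterated frequency settles at the same value as the closed-form expression, which is the sense in which a single locus under balancing selection is tractable with classical theory.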

But it turns out that much of natural selection is not so amenable to classical population genetic models.

A great deal of natural selection in populations is not easily localized to a specific locus. The human genome itself has 19,000 genes, and tens of millions of polymorphisms. Though there are some selected events which fit the model of a classical sweep up from a single mutation, most adaptation may occur through shifting the frequencies of many alleles across the genome in a subtle manner. Population genetic modeling from the early 20th-century was not designed to detect these subtle processes, because they would not have had the data to be able to detect them empirically for decades.

This is where buzzwords step in. Deep learning is a method of extracting features, patterns, out of a mass of raw data which is not digestible by humans. This is why it is applied to online marketing, to learn from the patterns of tens of millions of individuals, as well as their individual preferences, to generate a customized set of choices. This is in contrast to earlier methods of marketing which rely on segmentation by specific demographics defined by analysts. Classical marketing is not useless, but in the context of e-commerce, the newer methods of targeting individuals based on a mass of data are even more effective.

Machine and deep learning do not mean population genetic theory is irrelevant. On the contrary, classical population genetic theory is invaluable as a guide to the broad sweep of evolutionary change. It generates questions that one can finally test. Data science inference without a firm theoretical basis is directionless. But to test the details of population genetic processes one needs to lean on futuristic computer science.

Modern sequencing machines generate more data in a week than all of 20th-century genetics did over decades.

Only the interpretative tools developed in this century can absorb the scale of 21st-century genomics.

Population genetics + “deep learning” was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

April 5, 2019

Unleash the data kraken!

Filed under: data,Population genetics — Razib Khan @ 9:15 pm


The Reich lab has done a mitzvah and released a huge merged dataset of their modern and ancient populations in a big tarball. Actually, there are two files. One of them is a larger number of individuals with 600,000 SNPs (includes “Human Origins Array”) and the other has 1,200,000 SNPs, but fewer individuals. It is in EIGENSTRAT format.

For the convenience of readers who are more comfortable in PLINK/PEDIGREE format, I’ve converted them, and replaced the family ID column with population labels. The links take you to a zip file that has the three files for the binary format.
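For readers curious about what swapping the family ID for a population label looks like, here is a rough sketch that maps EIGENSTRAT .ind rows (sample ID, sex, population) onto PLINK .fam rows (family ID, individual ID, father, mother, sex, phenotype). The function name and the sample rows are hypothetical, and this is not necessarily how the files linked here were produced; converting the genotype data itself is a separate step left to dedicated tools.

```python
SEX_CODES = {"M": "1", "F": "2"}  # PLINK uses 1 = male, 2 = female, 0 = unknown


def ind_to_fam_lines(ind_lines):
    """Turn EIGENSTRAT .ind rows (ID, sex, population) into PLINK .fam rows."""
    fam_lines = []
    for line in ind_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample_id, sex, population = fields[:3]
        fam_lines.append(
            " ".join([population, sample_id, "0", "0", SEX_CODES.get(sex, "0"), "-9"])
        )
    return fam_lines


# Hypothetical sample rows, not taken from the released dataset.
example = ["Sample1 M French", "Sample2 F Basque", "Sample3 U Onge"]
for row in ind_to_fam_lines(example):
    print(row)
```

Putting the population label in the first column means PLINK-based tools that group by family ID will group by population instead.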

November 21, 2018

The people of the Andaman Islands are not genetic fossils

Filed under: Andamanese,Population genetics,Sentinelese — Razib Khan @ 5:06 pm

So this is in the news, Police: American adventurer John Allen Chau killed by isolated Sentinelese tribe on Indian island. There is some talk about whether the guy was a Christian missionary or not, but that’s not really too relevant. Whether he believes in evolution or not (he was a graduate of a very conservative Christian college), he definitely won a Darwin award before he expired.

North Sentinel is totally isolated, and the people who live there, the Sentinelese, are out of contact with the rest of the world. They are hostile to the outside world. And this is probably why the Sentinelese are still around, as the outside world does not have a good track record with hunter-gatherers. The Andamanese as a whole had a reputation for being very hostile to outsiders, as traders knew not to stop too long for water.

Because the Sentinelese are back in the news, lots of stuff is being said about them in terms of their ancestry.

First, they are not that genetically unique. A recent paper on the genetics of Southeast Asia using ancient samples makes their affinities clear. The Onge, an Andamanese tribe, are positioned close to the two ancient samples from Laos and Malaysia. They emerge out of the same milieu as Paleolithic Southeast Asians (whose Hoabinhian culture persisted deep into the Holocene).

The Andamanese themselves are probably from mainland Southeast Asia. The gap between the islands and the mainland was smaller ~20,000 years ago when the sea levels were lower. They could have come up from the south or the north.

Second, they are not the most “ancient” people. That doesn’t make any sense. We are all people who are equally ancient. We all descend by and large outside of Africa from a migratory wave that expanded ~60,000 years ago. Andamanese, Chinese, and Europeans. What is “ancient” about them is that they are hunter-gatherers who have continued to practice that mode of production down to the present. But that’s a matter of culture and not genetics.

Third, in alignment with the above two points, they are not uniquely and distinctly isolated from all other human populations. They are not descendants of an early wave out of Africa preserved on these islands. They are not distinct from all other non-Africans. Rather, they seem to be closer to the peoples of Oceania, Papuans, and Australian Aboriginals, than to Northeast Asians. And closer to Northeast Asians than they are to West Eurasians. The latest evidence is that the Andamanese were part of a broader diversification of lineages ~40-50,000 years ago to the east of India that gave rise to the peoples of the western Pacific Rim. Within this broader set of groups, some form a distinct clade that is not with Northeast Asians (often these are labeled “Australasian”).

Finally, the census size for the Sentinelese is in the range of 100 individuals. This seems on the edge of viability over the long term.

November 19, 2018

Have we seen the face of Rama?

Filed under: Population genetics,science — Razib Khan @ 11:29 pm


One of the problems with looking up pictures of the Kalash people of Pakistan is that photographers have a bias toward highlighting the most European-looking villagers. Let’s call this “Rudyard Kipling Lost White Races” syndrome. Therefore for your edification, I post the YouTube above which is probably more representative of what the Kalash look like.

The reason I post a link to what the Kalash look like is that it is germane to the answer to the question: what did the Indo-Aryans look like? The past tense is key since “Indo-Aryans” today means a lot of people in South Asia, in a literal sense.

In the post below Zach L. made a passing comment:

(1.) The AASI’s, which are sort of co-equivalent to the Negritos and Anadamese Islanders (one of the first coastal waves out of Africa that somehow also ended up in the Amazon). It’s interesting that they are substrate to every South Asian population (I think there are trace amounts in Central Asia, Afghanistan and even Iran).

(2.) the “Dravidian” farmers out of Iran. They are probably related to the J1/J2 types and might be an olive skinned population. Prominent in Sindh and Southern Pakistan through to South India (high % in Gujarat – must have been a locus of some sort).

(3.) our beloved Aryans who are especially prevalent among Brahmins, the Punjab and Haryana (though arguably the Haryanvis and East Punjab descend from Scythians to some extent). These look “European” but it’s a very different look to #2.

The Aryans are conventional European (light eyes, light hair, white skin) the ancient Dravidians would have (probably) looked like Middle Easterners (olive skin, dark hair dark eyes) and the AASI looks like Papua New Guineans.

I can’t see any disagreement with point number two.

As for the AASI (“Ancient Ancestral South Indians”), we need to be careful here. They diverged from the ancestors of the people of Papua New Guinea ~40-50 thousand years ago. The divergence from the Andamanese, who probably migrated from mainland Southeast Asia, was not too much later. Aside from being very dark-skinned, the various extant “Australasian” people can be quite distinctive in appearance. The people of Papua, and native Australians, are quite robust. A substantial minority have blonde hair color due to a mutation common among Oceanians. The “Negrito” people of Southeast Asia and India all seem to have adapted to a narrow relic niche, and may not be representative of their ancestors.

That being said, there is a particular non-West Eurasian look that many South Asians have which we can presume is the heritage of the AASI.

The comment about Aryans looking like Europeans raised my eyebrows a bit. This is a touchy subject, and to be honest my initial reaction was to be skeptical. But the more I read the primary literature to check up on Zach, the more reasonable this seemed to be. The dominant steppe signal into South Asia does resemble the people who were pushing into Central and Western Europe 1,000 years earlier than the Indo-Aryans, who were moving southward probably ~3,500 years ago. This is clear in rather simple statistical genetic analyses: populations such as the Kalash and Pathans, for example, show strong evidence of “European-like” gene flow.

Current work out of David Reich’s lab suggests that the Kalash are the best modern proxies we have for the “Ancestral North Indians,” the ANI. This population is modeled as:

– ~30% “steppe”, which is very similar to the ancestry which expanded westward into Europe between 3000 and 2500 BCE
– ~70% “Indus Periphery”, which seems the likely ancestral contribution of the people of the IVC, and is a heterogenous mix of Iranian-farmer and AASI

The mid-range estimate for the emergence of the Kalash mix is ~2,500 years before the present, but these usually have some downward bias, so it is reasonable that it would be greater than ~3,000 years. The samples from the Swat Valley dating to this period show a gradual increase of “steppe” ancestry over time.

So one reason to be skeptical that the Indo-Aryans were “European-like” in appearance is that by the time they were flourishing in the lands previously inhabited by the IVC they may already have been more than 50% genetically like the people of the IVC. In which case, a minority would be very European-looking, but most would look vaguely West Asian, with some looking more stereotypically South Asian. If you look at the video above I think you do see the Kalash look this way.

One reason I’ve always been skeptical of the idea that the Indo-Aryans looked European, or that their demographic impact was large, is that it seemed unlikely that both could be true. The frequency of blue eyes among Indians was too low.

Here is the frequency at a major SNP which predicts a lot of the blue vs. brown eye color.

What you see here is that the Kalash have the derived (“light”) variant at 25-30%. Notice that some Northern European populations are >75%.

Here are the frequencies from the 1000 Genomes:

I was a little surprised by the lack of variation from Punjabis (PJL), to Gujaratis (GIH), and Bangladeshis (BEB). Using the above logic the ~10% result would imply a bit more than 10% European-like Indo-Aryan ancestry. This is reasonable.
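The back-of-the-envelope logic can be written as a two-source mixing calculation: if the admixed frequency is a weighted average of the source frequencies, the weight can be solved for directly. The sketch below uses the round numbers from the text (~10% in South Asians, ~75% in Northern Europeans) and assumes the derived allele was absent in the non-steppe ancestry, that there has been no selection since admixture, and that the incoming population looked like modern Northern Europeans at this SNP. All three are simplifying assumptions, and the function name is hypothetical.

```python
def ancestry_fraction(freq_admixed, freq_source, freq_other=0.0):
    """Solve freq_admixed = a * freq_source + (1 - a) * freq_other for a."""
    if freq_source == freq_other:
        raise ValueError("source frequencies must differ")
    return (freq_admixed - freq_other) / (freq_source - freq_other)


# Round numbers from the text: ~10% in South Asians, ~75% in Northern Europeans.
print(ancestry_fraction(0.10, 0.75))
```

This gives roughly 13%, in line with “a bit more than 10%.” A higher source frequency would push the estimate down toward 10%, and any derived alleles already present in the other ancestry would push it lower still.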

But there are more SNPs than that that impact pigmentation. SLC24A5 is derived and fixed in Europeans, but at pretty high frequency in South Asians (I have two copies of the derived allele and I’m rather brown). But some SNPs in SLC45A2 are much more European-specific in derived allele frequencies. So the 1000 Genomes surprised me somewhat:

Here you notice that the derived variant is nearly fixed in Northern Europe. But in South Asian populations it’s not as high as you would expect. The frequency of the OCA2 derived variant is higher than that of SLC45A2 in South Asia, while in Northern Europe it’s the opposite.

One explanation could be in situ selection in Northern Europe or in South Asia (or Central Asia). So these two markers suggest to me we can’t draw a straight line between physical affinity and total genetic ancestry/affinity.

 

November 3, 2018

It’s raining selective sweeps

Filed under: Population genetics,Population genomics,Selection — Razib Khan @ 11:44 pm

A week ago a very cool new preprint came out, Identifying loci under positive selection in complex population histories. It’s something that you couldn’t even have imagined just ten years ago. The authors basically figure out ways to identify deviations of markers from expected allele frequency given a null neutral evolutionary model. The method is put first, which I really like, before getting to results or discussion. Additionally, they did a lot of simulation ahead of time. The sort of simulation that was really not possible before the sort of computational resources we have now.

Here’s the abstract:

Detailed modeling of a species’ history is of prime importance for understanding how natural selection operates over time. Most methods designed to detect positive selection along sequenced genomes, however, use simplified representations of past histories as null models of genetic drift. Here, we present the first method that can detect signatures of strong local adaptation across the genome using arbitrarily complex admixture graphs, which are typically used to describe the history of past divergence and admixture events among any number of populations. The method – called Graph-aware Retrieval of Selective Sweeps (GRoSS) – has good power to detect loci in the genome with strong evidence for past selective sweeps and can also identify which branch of the graph was most affected by the sweep. As evidence of its utility, we apply the method to bovine, codfish and human population genomic data containing multiple population panels related in complex ways. We find new candidate genes for important adaptive functions, including immunity and metabolism in under-studied human populations, as well as muscle mass, milk production and tameness in particular bovine breeds. We are also able to pinpoint the emergence of large regions of differentiation due to inversions in the history of Atlantic codfish.

On a related note in regards to selection, On the well-founded enthusiasm for soft sweeps in humans: a reply to Harris, Sackman, and Jensen. The authors are responding to a recent preprint criticizing their earlier work. The reason that it’s fascinating to me is that these sorts of arguments today are really concrete and not so theoretical. There’s a lot of data for analytic techniques to chew through, and computation has really transformed the possibilities.

A generation ago these sorts of debates would be a sequence of “you’re wrong!” vs. “no, you’re wrong!” Today the disputes involve a lot of data, and so have a reasonable chance of resolution.

The first preprint identifies the usual candidates in humans that you normally see, and expected targets in cattle and cod. Sure, that will give biologists more interested in mechanisms and pathways things to chew upon, but imagine once researchers have large numbers of genomes for thousands and thousands of species. Then they’ll be testing deviations from neutral allele frequencies across many trees, and getting a more general and abstract sense of the parameter space that selection explores, conditional on particularities of evolutionary history.

This is why I’m excited about plans to sequence lots and lots of species.
