Razib Khan One-stop-shopping for all of my content

December 14, 2017

A genetic map of the world

Filed under: Population genetics — Razib Khan @ 4:46 pm

The above map is from a new preprint on the patterns of genetic variation as a function of geography for humans, Genetic landscapes reveal how human genetic diversity aligns with geography. The authors assemble an incredibly large dataset to generate these figures. The orange zones are “troughs” of gene flow, which is to say barriers to gene flow. It is no great surprise that so many of the barriers correlate with rivers, mountains, and deserts. But the aim of this sort of work seems to be to make precise and quantitative the intuitions which are normally expressed verbally.

To me, it is curious how the borders of the People’s Republic of China are evident on this map (an artifact of sampling?). Additionally, one can see Weber’s line in Indonesia. There are the usual important caveats of sampling, and caution about interpreting present variation and dynamics back to the past. But I believe that these sorts of models and visualizations are important nulls against which we can judge perturbations.

As I said, these methods can confirm rigorously what is already clear intuitively. For example:

Several large-scale corridors are inferred that represent long-range genetic similarity, for example: India is connected by two corridors to Europe (a southern one through Anatolia and Persia ‘SC’, and a northern one through the Eurasian Steppe ‘NC’)

We still don’t have enough ancient DNA to be totally sure, but it’s hard to ignore the likelihood that “Ancestral North Indians” (ANI) actually represent two different migrations.

India also illustrates the contingency of these barriers. Before the ANI migration, driven by the rise in agricultural lifestyles, there would likely have been a major trough of gene flow on India’s western border. In fact a deeper one than the one on the eastern border. And if the high genetic structure statistics from ancient DNA are further confirmed then the rate of gene flow was possibly much lower between demes in the past. Perhaps that would simply re-standardize equally so that the map itself would not be changed, but I suspect that we’d see many more “troughs” during the Pleistocene and early Holocene.

Because there are so many geographically distributed samples for humans, and frankly some of the best methods developers work with human data (thank you NIH), it is no surprise that our species would be mapped first. But I think some of the biggest insights may be with understanding the dynamics of gene flow of non-human species, and perhaps the nature and origin of speciation as it relates to isolation (or lack thereof).

December 10, 2017

Visualizing intra-European phylogenetic distances

Filed under: Europe,European genetics,Population genetics,Population genomics — Razib Khan @ 4:53 pm
Neighbor-joining tree of genetic distances between populations


In L. L. Cavalli-Sforza’s The History and Geography of Human Genes he used between population group genetic distances, as measured in FST values, to generate a series of visualizations, which then allowed him to infer historical processes. Basically the way it works is that you look at genetic variation, and see how much of it can be allocated to between groups. If none of it can be allocated to between groups, then in a population genetic sense it doesn’t make much sense to speak of distinctive groups, they’re basically one breeding population. The higher the FST statistic is, the more of the variation is partitioned between the groups.
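To make the idea of partitioning concrete, here is a minimal Python sketch of Wright's FST for a single biallelic locus sampled in two equally weighted populations. The function name and the example allele frequencies are hypothetical, chosen only for illustration. Real analyses average over many loci and use estimators that correct for sample size, so treat this as the textbook definition rather than what any particular paper computed.

```python
def fst_two_pops(p1, p2):
    """Wright's F_ST at one biallelic locus for two equally weighted populations.

    p1, p2 are the frequencies of the same allele in each population.
    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S is the mean within-population heterozygosity.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # The locus is monomorphic overall, so there is nothing to partition.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies: identical, mildly different, and fixed differences.
for p1, p2 in [(0.5, 0.5), (0.4, 0.6), (0.0, 1.0)]:
    print(p1, p2, fst_two_pops(p1, p2))
```

Identical frequencies give zero (one breeding population in the population genetic sense), while fixed differences give one (all of the variation lies between the groups).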

Roughly this is used to correlate with genetic distance as well as evolutionary divergence. The longer two populations have been separated, the more and more genetic differences they’ll accumulate, inflating the FST value. There are a lot of subtleties that I’m eliding here (see Estimating and interpreting FST: the impact of rare variants for a survey of the recent literature on the topic and pathways forward), but for a long time, FST was the go-to statistic for making phylogenetic inferences on a within-species scale.

Today we have other techniques: Structure, Treemix, fineStructure, and various local ancestry packages.

But FST is still useful to give one a Gestalt sense of population genetic differences. Cavalli-Sforza admits in The History and Geography of Human Genes that European populations had very low pairwise FST, but because of the importance of Europe for sociocultural reasons a detailed analysis of the region was still provided in the text. Additionally, they had lots of European samples (non-European Caucasoids were thrown into one category for macro-group comparisons because there weren’t that many samples).

Using results from the 2015 paper Massive migration from the steppe was a source for Indo-European languages in Europe, I visualized pairwise genetic distances for European populations, ancient and modern (Han Chinese as an outgroup), on a tree. What the results illustrate is that

  1. Ancient populations were very distinct in Europe from modern ones.
  2. Many modern groups are clustered close together.
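For readers curious how a tree gets built from a table of pairwise distances, the following is a minimal Python sketch of the neighbor-joining algorithm. The five taxon labels and the distance values are a toy example and are not FST values from the paper. The function name, the internal node names, and the return format are all assumptions made for illustration.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Minimal neighbor-joining sketch.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns (joins, final_edge). Each join is (i, j, len_i, len_j, new_node);
    final_edge is (a, b, length) connecting the last two remaining nodes.
    """
    d = dict(dist)
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Pick the pair that minimizes the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        len_i = dij / 2.0 + (r[i] - r[j]) / (2.0 * (n - 2))
        len_j = dij - len_i
        new = "node%d" % (len(joins) + 1)
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2.0
        nodes = [k for k in nodes if k not in (i, j)] + [new]
        joins.append((i, j, len_i, len_j, new))
    a, b = nodes
    return joins, (a, b, d[frozenset((a, b))])


# Toy distance matrix (hypothetical values, not from the paper).
taxa = ["a", "b", "c", "d", "e"]
toy = {
    frozenset(("a", "b")): 5, frozenset(("a", "c")): 9,
    frozenset(("a", "d")): 9, frozenset(("a", "e")): 8,
    frozenset(("b", "c")): 10, frozenset(("b", "d")): 10,
    frozenset(("b", "e")): 9, frozenset(("c", "d")): 8,
    frozenset(("c", "e")): 7, frozenset(("d", "e")): 3,
}
joins, final_edge = neighbor_joining(taxa, toy)
for step in joins:
    print(step)
print(final_edge)
```

The resulting tree is unrooted. Placing an outgroup such as Han Chinese on it is a separate step that this sketch does not do.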

The bulk of the population genetic structure in modern Europe seems to have been established in the period between 3000 BCE and 2000 BCE. This is not that much time for a lot of distinctiveness to develop, especially on the geographically open North European plain. I suspect with more and more Mesolithic and early to middle Neolithic DNA we’ll see that some of the modern population structure is a ghost of ancient substrate absorption.

Many of the ethno-national categories that are very significant in recent history, and impact the cultural memories of modern people and their genealogies, have very shallow roots. This does not mean they are not “real” (I don’t know what that’s supposed to mean at all), just that many of the identities which seem so salient to us today may be relatively recent in terms of their significance to large groups of humans….

December 8, 2017

The Saxon Panmixia

Filed under: Population genetics — Razib Khan @ 9:09 pm

One reason I quite like Norman Davies’ book The Isles is that it is a history of Britain and Ireland which explicitly aims to not privilege the story of the English inordinately. As the most powerful and numerous people of the British Isles the English loom large, but in the period between Gildas and Bede things were very different. In the early 600s the Welsh king Cadwallon ap Cadfan conquered and held Northumbria for a period, northern England from the Irish Sea to the North Sea. But this was the last time that a Celtic monarch held land in eastern England, unless you count the Tudors.

In The Isles, written at the turn of the century, Davies promotes the view dominant among historians at that time that the transition from British Celtic to Anglo-Saxon occurred through diffusion of elite culture. He notes that in the year 700 the law code of Wessex explicitly set the weregild paid for the death of a Saxon many-fold higher than that paid for a Briton (of the same class status). This suggests that many Britons were still resident in the Anglo-Saxon kingdoms. The contrasting view, which was dominant in the early 20th century, was that the English replaced the Celts in toto. The Irish, Welsh, and to some extent the Scots, were viewed as racially distinct from the Germanic English.

2015’s The fine scale genetic structure of the British population answered many of these questions. It turns out the maximal positions were incorrect. The authors estimate that 10-40% of the ancestry in eastern and southern England (the positions on the map) derives from Germanic peoples which we might term Saxons, Angles, and Jutes. Even if the fraction is as low as 10% that is not trivial. If we take a value closer to ~25%, unless there were massive reproductive advantages for elites, it could not have just been diffusion from the elite. Archaeologists also see wholesale changes in agricultural patterns in eastern England, indicative of a transfer of a whole folkway.

All that being said it is likely that the majority of the ancestry of the population of England proper descends from Britons. In fact, once the Anglo-Saxon cultural hegemony was established it seems that some elite Britons may also have changed their identity. It is always a curious fact that the names of the first kings in the genealogy of the House of Wessex are distinctively Celtic. Just as Romano-Gallic aristocrats began aping the styles and mores of the Frankish elite in the 6th century, so perhaps some British warlords became Saxons.

Using similar methods many of the same authors have now put out a preprint on Ireland, Insular Celtic population structure and genomic footprints of migration. Unlike the earlier work on Britain, they’ve acknowledged the ancient DNA work which has reshaped our understanding of population turnover in Ireland. That being said, they are focused on more recent events, as well as spatial structure in the modern era.

Though they don’t have access to as detailed a regional data set as in the earlier work on Britain, in this case, the authors managed to detect a lot of regional population structure within Ireland. Why? Though the Irish are relatively homogeneous, as all Northern Europeans are, looking at long tracts of the genome and the patterns therein can squeeze out more information.

The figure at the top of this post shows how well they can cluster individuals geographically: they’ve basically recapitulated the “map of the British Isles.” There aren’t too many surprises. Western Ireland seems to exhibit greater genetic differences as a function of distance. Probably because it’s less developed, and perhaps because it has been less impacted by outsiders. Ulster and southern Scotland are strongly connected genetically. There are two issues going on here. First, the famous migration of Protestants into this region of Ireland from Scotland and northern England that occurred after the conquest of the 16th century. And second, the earlier migration of Irish to Scotland, which resulted in the creation of the Dal Riata kingdom.

Additionally, the authors detect more admixture in several parts of Ireland from Norse than they had anticipated. The mixing of Scandinavians and Irish created a hybrid culture, the Norse-Gaels, which was highly influential around the Irish Sea. So it would not be exactly surprising if there was a greater Scandinavian contribution to Irish ancestry than had been anticipated.

Of greater interest to me is the impact of social-political institutions on the genetic structure or lack thereof. Both Britain and Ireland have homogenized modal clusters. In Britain, this is associated with the expanding cultural zone of Anglo-Saxon rule, and later became the core of England. In Ireland, it seems to be the Pale, where Anglo-Norman rule was dominant for many centuries. Rapid cultural change seems to induce a state of panmixia. Genetic distinctiveness in the British Isles seems to have persisted in populations which were geographically isolated, or politically insulated, from expansive, assimilative, and integrative cultures. The modal cluster in Ireland is far smaller than in England, which nicely correlates with the much more limited impact of the Anglo-Norman ascendency of the medieval period.

November 24, 2017

Soft selection for gentleness in Puerto Rican African Honeybees

Filed under: Population genetics,Population genomics,Soft Selection,Soft Sweep — Razib Khan @ 3:07 pm

When I was a kid “killer bees” were a major pop culture thing. There were movies about the bees, and we would get updates about their march northward in the news. They were a cautionary tale of our species’ hubris.

Today we have a little bit more perspective. These bees were actually just African honeybees, the ancestral population to European honeybees, which were introduced to the New World with Europeans centuries earlier than the African honeybees. African honeybees were not that different from European honeybees, but they were more aggressive and tended to outcompete European honeybee colonies. They are a major problem for the beekeeping industry, but not a major threat to human life.

Today the African and European populations in the United States seem to have stabilized in their ranges, with a hybrid zone between them. African bees’ migratory behavior makes them less competitive with European bees in colder climates.

A friend of mine once mentioned to me that if he had to do it all over again he would do research on the evolutionary genomics of Hymenoptera, and in particular bees. People care about bees. So it’s no surprise that I noticed this paper out in Nature Communications, A soft selective sweep during rapid evolution of gentle behavior in an Africanized honeybee:

Highly aggressive Africanized honeybees (AHB) invaded Puerto Rico (PR) in 1994, displacing gentle European honeybees (EHB) in many locations. Gentle AHB (gAHB), unknown anywhere else in the world, subsequently evolved on the island within a few generations. Here we sequence whole genomes from gAHB and EHB populations, as well as a North American AHB population, a likely source of the founder AHB on PR. We show that gAHB retains high levels of genetic diversity after evolution of gentle behaviour, despite selection on standing variation. We observe multiple genomic loci with significant signatures of selection. Rapid evolution during colonization of novel habitats can generate major changes to characteristics such as morphological or colouration traits, usually controlled by one or more major genetic loci. Here we describe a soft selective sweep, acting at multiple loci across the genome, that occurred during, and may have mediated, the rapid evolution of a behavioural trait.

Come for the bees, but stay for the soft selection! If you talk to anyone in evolutionary and population genomics you know that the future is in understanding patterns of soft selection and polygenic selection from standing variation. Though these are related phenomena which are associated with each other, they are all distinct.

Standing variation just refers to the diversity which is segregating in the population at any given time. At any given moment many loci exhibit polymorphism. This polymorphism can be a target of natural selection if it is correlated with heritable variation and differentials in fitness. Though soft selection can be quite woolly, its inverse, hard selection, is clear: in genetic terms hard selection can be seen in allele frequency changes at a single variant in a locus, going from the point where it is a novel mutation to nearly fixed in the population. In Haldane’s original conception hard selection involved excess deaths, and imposed a limit on the rate of evolution as well as the amount of variation you could expect within a given population. This model was convenient in the pre-genomic and early genomic era because empirical selection tests had to focus on large allele frequency changes around singular loci. Researchers didn’t have large numbers of whole-genome samples available (nor the computational ability to analyze them).
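As a rough illustration of what a classic hard sweep looks like at a single variant, here is a minimal Python sketch of a deterministic allele frequency trajectory. It assumes the simplest possible model (haploid, or genic, selection with a constant advantage s and no drift), and the starting frequency, selection coefficient, and function name are all hypothetical.

```python
def sweep_trajectory(p0, s, target=0.99, max_generations=100000):
    """Deterministic frequency trajectory of a favored allele.

    Assumes genic (haploid-style) selection: each generation the favored
    allele's relative fitness is 1 + s, so p' = p * (1 + s) / (1 + p * s).
    Stops once the frequency reaches `target` or max_generations is hit.
    """
    trajectory = [p0]
    while trajectory[-1] < target and len(trajectory) <= max_generations:
        p = trajectory[-1]
        trajectory.append(p * (1.0 + s) / (1.0 + p * s))
    return trajectory


# Hypothetical new mutation at frequency 0.1% with a 10% fitness advantage.
traj = sweep_trajectory(p0=0.001, s=0.1)
print("generations to reach 99%:", len(traj) - 1)
print("frequency every 20 generations:", [round(p, 4) for p in traj[::20]])
```

The trajectory is the familiar S-shaped curve: slow while the allele is rare, fast through intermediate frequencies, and slow again near fixation. That large single-locus frequency change is the signal the older selection tests were built to find.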

Today this is not a limitation. In the analysis above the authors had 30 individuals of the 3 populations sequenced at high quality (20x). They ended up with millions of genetic variants they could analyze.

The plot to the left shows that “gentle African honeybees” (gAHB) tend to be closer to the African honeybee populations (AHB) overall (though with some hybridization with European honeybees, EHB). This is not surprising.

But the key observation was that over 12 generations the African honeybees of Puerto Rico became progressively less aggressive, despite maintaining overall morphological similarities to the mainland Mexican African bees from which they likely derive. Though buried in the discussion, there is a rationale for why this behavioral change may have occurred: the Puerto Rican bees are subject to a lot of negative selection against aggression because of the density of the island, as well as the reality that aside from humans there aren’t many other species where their aggressive tendencies are beneficial. Basically, if you are an aggressive colony, it’s harder to make a go in densely settled areas (the implication here then is that there are probably “gentle” African honeybee populations across Latin America, they just are never disaggregated from the broader meta-population).

Credit: Phillip Messer and Nandita Garud

It’s the genomics where the real evolutionary insight comes in: they found that there were multiple soft sweep events around genetic regions implicated in behavior. In their overall genome the gAHB of Puerto Rico resembled mainland AHB, but in this subset of genetic loci they resembled EHB. Many of these loci had also been known to be targets of selection when the original European bee population diverged from the ancestral African population. Basically this is a genomic illustration of convergent evolution.

Regular readers of this blog will recognize the ways they detected selection. They used a modified form of EHH, which is reasonable since the selection event was recent enough to have been associated with distinct haplotype blocks. Also, standard Fst analysis showed that these were outliers in relation to the broader genetic pattern of relatedness (these loci were more like EHB than AHB, while most loci were more like AHB than EHB).
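For those who have not seen it spelled out, here is a minimal Python sketch of plain extended haplotype homozygosity (EHH), not the modified statistic the authors used. Haplotypes are represented as strings of 0/1 alleles, and the toy haplotypes, function name, and argument names are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, core_allele, upto):
    """Extended haplotype homozygosity for carriers of a core allele.

    haplotypes:  list of equal-length strings, one character per marker.
    core:        index of the core marker.
    core_allele: allele at the core marker defining the carrier set.
    upto:        index of the marker the haplotype is extended to.

    EHH is the probability that two randomly chosen carrier haplotypes are
    identical at every marker between `core` and `upto`, inclusive.
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    lo, hi = sorted((core, upto))
    groups = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


# Toy haplotypes (hypothetical): marker 0 is the core site.
haps = ["1111", "1110", "1100", "1000", "0101"]
for marker in range(4):
    print(marker, ehh(haps, core=0, core_allele="1", upto=marker))
```

EHH starts at 1 at the core site and decays with distance as recombination and mutation break up the haplotype. A recently selected allele sits on unusually long haplotypes, so its EHH decays more slowly than expected for its frequency.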

So this is a form of polygenic selection. Remember, natural selection only knows genes through the phenotype (with intra-genomic selection being an exception). A behavior like aggression is probably subject to the fourth law of behavior genetics. That is, variation won’t be defined around a single genetic locus. Rather, variation across the genome will be correlated with variation in the phenotype. As selection favors a particular value of the phenotype across the distribution, the allele frequencies across many genetic loci will shift, but they will not necessarily fix. Polygenic selection operates on the dispersed standing genetic variation which explains much of the variation of the phenotype in question. Instead of total sweeps to fixation due to large fitness differences between a given allele and its alternative form, the selection impact is distributed and diffused across the genome.
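A back-of-the-envelope Python sketch shows why polygenic shifts need not approach fixation. It assumes a purely additive trait built from loci of equal effect, which is a deliberate simplification, and the function name and numbers are hypothetical. Under that assumption the mean phenotype is the sum over loci of 2 × effect × allele frequency, so a given shift in the mean can be split evenly across however many loci contribute.

```python
def per_locus_shift(delta_z, effect, n_loci):
    """Allele frequency change needed at each locus to move the trait mean.

    Assumes an additive trait with n_loci loci of equal per-allele `effect`,
    so the mean phenotype is sum(2 * effect * p_i). Moving the mean by
    delta_z then requires the same frequency change at every locus.
    """
    return delta_z / (2.0 * effect * n_loci)


# Hypothetical numbers: the same phenotypic shift spread over more and more loci.
for n_loci in (1, 10, 100, 1000):
    print(n_loci, per_locus_shift(delta_z=0.5, effect=0.5, n_loci=n_loci))
```

With one locus the allele has to move a long way. With hundreds of loci, each one only has to move by a fraction of a percent, which is very hard to detect at any single site.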

Though most of the genetic variants seem to recapitulate the evolution of the less aggressive phenotype that occurred with the original migration north of African honeybees, some of the selection signatures were novel. This points to the reality that when you have soft selection on standing variation you may have similar phenotypes which evolve via different means. Additionally, the authors noted that these results were in contrast to controlled breeding experiments in mammals where selection for gentleness (“domestication”) often targeted a few loci and exhibited strong pleiotropic effects (due to the genetic correlation). These results point to the limitations of inferences made from human-directed selection.

Soft selection is probably ubiquitous. Consider the evolution of skin color in humans. There are lots of variants and lots of variation, and most of the variation seems to be ancestral. Only at the locus SLC24A5 do you have a perfect illustration of a hard selective sweep, probably from a de novo mutation that emerged around the Last Glacial Maximum.

From a geneticist’s perspective evolution is basically conceived of as changes in allele frequencies over time. Much of this is due to natural selection. Now that the world of soft selection is opening up, I suspect that we’ll understand a lot more of what we see around us, at least in the generality.

Citation: A soft selective sweep during rapid evolution of gentle behaviour in an Africanized honeybee.

October 22, 2017

Machine learning swallowing population genetics = understanding patterns in population genomics

Filed under: Machine Learning,Population genetics — Razib Khan @ 1:09 pm

Dan Schrider and Andy Kern have a new review preprint out, Machine Learning for Population Genetics: A New Paradigm. On Twitter there has already been a little snark to the effect of “oh, you mean regression?” That’s fair enough, and the preprint would probably benefit from a lower key title, though that’s really the sort of title journals seem to love.

I would recommend this preprint to two large groups of my readers. First, there are those with strong computational skills who are curious about biology. It makes it clear why population genomics benefits from machine learning methods. Second, there are those who are interested or trained in genetics with less of a computational and pop gen background.

Yes, all models are wrong. But some give insight, and some are just not salvageable. In population genomics some of the model-building is obviously starting to yield really fragile results.

September 16, 2017

Carving nature at its joints more realistically

Filed under: Admixture,construct,phylogenetics,Population genetics,Structure — Razib Khan @ 10:23 pm

If you are working on phylogenetic questions on a coarse evolutionary scale (that is, “macroevolutionary,” though I know some evolutionary geneticists will shoot me the evil eye for using that word) generating a tree of relationships is quite informative and relatively straightforward, since it has a comprehensible mapping onto what really occurred in nature. When your samples are different enough that the biological species concept works well and gene flow doesn’t occur between nodes, then a tree is a tree (one reason Y and mtDNA results are so easy to communicate to the general public in personal genomics).

Everything becomes more problematic when you are working on a finer phylogenetic scale (or in taxa where inter-species gene flow is common, as is often the case with plants). And I’m using problematic here in the way that denotes a genuine substantive analytic issue, as opposed to connoting something that one has moral or ethical objections to.

It is intuitively clear that there is often genetic population structure within species, but how to summarize and represent that variation is not a straightforward task.

In 2000 the paper Inference of Population Structure Using Multilocus Genotype Data in Genetics introduced the sort of model-based clustering most famously implemented with Structure. The paper illustrates limitations with the neighbor-joining tree methods which were in vogue at the time, and contrasts them with a method which defines a finite set of populations and assigns proportions of each putative group to various individuals.
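To give a flavor of what “assigns proportions of each putative group to various individuals” means in practice, here is a minimal Python sketch of the likelihood at the heart of admixture-style models. It is not the Structure implementation (which is Bayesian and fit by MCMC). It simply scores one individual's genotypes given candidate ancestry proportions and per-population allele frequencies, and the function name, argument names, and toy numbers are all hypothetical.

```python
from math import comb, log


def genotype_loglik(genotypes, proportions, pop_freqs):
    """Log-likelihood of one individual's genotypes under an admixture model.

    genotypes:   list of allele counts (0, 1, or 2), one per locus.
    proportions: ancestry proportions for this individual, one per population.
    pop_freqs:   pop_freqs[k][l] is the allele frequency in population k at locus l.

    At each locus the individual's expected allele frequency is the
    proportion-weighted mix of the population frequencies, and the genotype
    is treated as a Binomial(2, f) draw. Loci are assumed independent.
    """
    total = 0.0
    for locus, g in enumerate(genotypes):
        f = sum(q * freqs[locus] for q, freqs in zip(proportions, pop_freqs))
        f = min(max(f, 1e-9), 1.0 - 1e-9)  # keep log() finite at the boundaries
        total += log(comb(2, g)) + g * log(f) + (2 - g) * log(1.0 - f)
    return total


# Toy setup (hypothetical): two populations, three loci, one individual.
pop_freqs = [[0.9, 0.9, 0.9], [0.1, 0.1, 0.1]]
genotypes = [2, 2, 2]
for proportions in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    print(proportions, genotype_loglik(genotypes, proportions, pop_freqs))
```

Clustering programs of this family search for the proportions and population frequencies that make the observed genotypes most probable, given a number of populations chosen in advance.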

The model-based methods were implemented in numerous packages over the 2000s, and today they’re pretty standard parts of the phylogenetic and population genetic toolkits. The reason for their popularity is obvious: they are quite often clear and unambiguous in their results. This may be one reason that they emerged to complement visualization methods like PCA and MDS, which make fewer a priori assumptions.

But of course, crisp clarity is not always reality. Sometimes nature is fuzzy and messy. The model-based methods take inputs and will produce crisp results, even if those results are not biologically realistic. They can’t be utilized in a robotic manner without attention to the assumptions and limitations (see A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots).

This is why it is exciting to see a new preprint which addresses many of these issues, Inferring Continuous and Discrete Population Genetic Structure Across Space*:

A classic problem in population genetics is the characterization of discrete population structure in the presence of continuous patterns of genetic differentiation. Especially when sampling is discontinuous, the use of clustering or assignment methods may incorrectly ascribe differentiation due to continuous processes (e.g., geographic isolation by distance) to discrete processes, such as geographic, ecological, or reproductive barriers between populations. This reflects a shortcoming of current methods for inferring and visualizing population structure when applied to genetic data deriving from geographically distributed populations. Here, we present a statistical framework for the simultaneous inference of continuous and discrete patterns of population structure….

The whole preprint should be read by anyone interested in phylogenomic inference, as there is extensive discussion and attention to many problems and missteps that occur when researchers attempt to analyze variation and relationships across a species’ range. Basically, the sort of thing that might be mentioned in peer review feedback, but isn’t likely to be included in any final write-ups.

As noted in the abstract the major issue being addressed here is the problem that many clustering methods do not include within their model the reality that genetic variation within a species may be present due to continuous gene flow defined by isolation by distance dynamics. This goes back to the old “clines vs. clusters” debates. Many of the model-based methods assume pulse admixtures between population clusters which are random mating. This is not a terrible assumption when you consider perhaps what occurred in the New World when Europeans came in contact with the native populations and introduced Africans. But it is not so realistic when it comes to the North European plain, which seems to have become genetically differentiated only within the last ~5,000 years, and has likely seen extensive gene flow.

The figure below shows the results from the conStruct method (left), and the more traditional fastStructure (right):

There are limitations to the spatial model they use (e.g., ring species), but that’s true of any model. The key is that it’s a good first step to account for continuous gene flow, and not shoehorning all variation into pulse admixtures.

Though in beta, the R package is already available on github (easy enough to download and install). I’ll probably have more comment when I test drive it myself….

* I am friendly with the authors of this paper, so I am also aware of their long-held concerns about the limitations and/or abuses of some phylogenetic methods. These concerns are broadly shared within the field.

September 14, 2017

After agriculture, before bronze


The above plot shows genetic distance/variation between highland and lowland populations in Papua New Guinea (PNG). It is from a paper in Science that I have been anticipating for a few months (I talked to the first author at SMBE), A Neolithic expansion, but strong genetic structure, in the independent history of New Guinea.

What does “strong genetic structure” mean? Basically Fst is showing the proportion of genetic variation which is partitioned between groups. Intuitively it is easy to understand, in that if ~1% of the genetic variation is partitioned between groups in one case, and ~10% in another, then it is reasonable to suppose that the genetic distance between groups in the second case is larger than in the first case. On a continental scale Fst between populations is often on the order of ~0.10. That is the value, for example, when you pool the variation amongst Northern Europeans and Chinese, and assess how much of it can be apportioned in a manner which differentiates populations (so it’s ~10% of the variation).

This is why ancient DNA results which reported that Mesolithic hunter-gatherers and Neolithic farmers in Central Europe who coexisted in rough proximity for thousands of years exhibited differences on the order of ~0.10 elicited surprise. These are values we are now expecting from continental-scale comparisons. Perhaps an appropriate analogy might be the coexistence of Pygmy groups and Bantu agriculturalists? Though there is some gene flow, the two populations exist in symbiosis and exhibit local ecological segregation.

In PNG continental scale Fst values are also seen among indigenous people. The differences between the peoples who live in the highlands and lowlands of PNG are equivalent to those between huge regions of Eurasia. This is not entirely surprising because there has been non-trivial gene flow into lowland populations from Austronesian groups, such as the Lapita culture. Many lowland groups even speak Austronesian languages today.

Using standard ADMIXTURE analysis the paper shows that many lowland groups have significant East Asian ancestry (red), while none of the highland groups do (some individuals with East Asian admixture seem to be due to very recent gene flow). But even within the highlands the genetic differences are striking. The Fst values between Finns and Southern European groups such as Spaniards are very high in a European context (due to Finnish Siberian ancestry as well as drift through a bottleneck), but most comparisons within the highland groups in PNG still exceed this.

The paper also argues that genetic differences between Papuans and the natives of Australia pre-date the rising sea levels at the beginning of the Holocene, when Sahul divided between its various constituents. This is not entirely surprising considering that the ecology of the highlands during the Pleistocene would have been considerably different from Australia to the south, resulting in sharp differences in the hunter-gatherer lifestyles. Additionally, there does not seem to have been a genetic cline. Papuans are symmetrically related to all Australian groups they had samples from.

Using coalescence-based genomic methods they inferred that separation between highlands and some lowland groups occurred ~10-20,000 years ago. That is, after the Last Glacial Maximum. For the highlands, the differences seem to date to within the last 10,000 years. The Holocene. Additionally, they see population increases in the highlands, correlating with the shift to agriculture (cultivation of taro).

None of the above is entirely surprising, though I would take the date inferences with a grain of salt. The key is to observe that large genetic differences, as well as cultural differences, accrued in the highlands of PNG during the Holocene. In the paper they have a social and cultural explanation for what’s going on:

  Fst values in PNG fall between those of hunter-gatherers and present-day populations of west Eurasia, suggesting that a transition to cultivation alone does not necessarily lead to genetic homogenization.

A key difference might be that PNG had no Bronze Age, which in west Eurasia was driven by an expansion of herders and led to massive population replacement, admixture, and cultural and linguistic change (7, 8), or Iron Age such as that linked to the expansion of Bantu-speaking farmers in Africa (24). Such cultural events have resulted in rapid Y-chromosome lineage expansions due to increased male reproductive variance (25), but we consistently find no evidence for this in PNG (fig. S13). Thus, in PNG, we may be seeing the genetic, linguistic, and cultural diversity that sedentary human societies can achieve in the absence of massive technology-driven expansions.

Peter Turchin in books like Ultrasociety has argued that one of the theses in Steven Pinker’s The Better Angels of Our Nature is incorrect: violence has not decreased monotonically, but rather peaked in less complex agricultural societies. PNG is clearly a case of this, as endemic warfare was a feature of highland societies when they encountered Europeans. Lawrence Keeley’s War Before Civilization: The Myth of the Peaceful Savage gives so much attention to highland PNG because it is a contemporary illustration of a Neolithic society which until recently had not developed state-level institutions.

What papers like these are showing is that cultural and anthropological dynamics strongly shape the nature of genetic variation among humans. Simple models which assume as a null hypothesis that gene flow occurs through diffusion processes across a landscape where only geographic obstacles are relevant simply do not capture enough of the dynamic. Human cultures strongly shape the nature of interactions, and therefore the genetic variation we see around us.

September 10, 2017

Quantitative genomics, adaptation, and cognitive phenotypes

The human brain utilizes ~20% of the calories you take in per day. It’s a large and metabolically expensive organ. Because of this fact there are lots of evolutionary models which focus on the brain. In Catching Fire: How Cooking Made Us Human Richard Wrangham suggests that our need for calories to feed our brain is one reason we started to use fire to pre-digest our food. In The Mating Mind Geoffrey Miller seems to suggest that all the things our big complex brain does allow for the signaling of mutational load. And in Grooming, Gossip, and the Evolution of Language Robin Dunbar suggests that it’s social complexity which is driving our encephalization.

These are all theories. Interesting hypotheses and models. But how do we test them? A new preprint on bioRxiv is useful because it shows how cutting-edge methods from evolutionary genomics can be used to explore questions relating to cognitive neuroscience and psychopathology, Polygenic selection underlies evolution of human brain structure and behavioral traits:

…Leveraging publicly available data of unprecedented sample size, we studied twenty-five traits (i.e., ten neuropsychiatric disorders, three personality traits, total intracranial volume, seven subcortical brain structure volume traits, and four complex traits without neuropsychiatric associations) for evidence of several different signatures of selection over a range of evolutionary time scales. Consistent with the largely polygenic architecture of neuropsychiatric traits, we found no enrichment of trait-associated single-nucleotide polymorphisms (SNPs) in regions of the genome that underwent classical selective sweeps (i.e., events which would have driven selected alleles to near fixation). However, we discovered that SNPs associated with some, but not all, behaviors and brain structure volumes are enriched in genomic regions under selection since divergence from Neanderthals ~600,000 years ago, and show further evidence for signatures of ancient and recent polygenic adaptation. Individual subcortical brain structure volumes demonstrate genome-wide evidence in support of a mosaic theory of brain evolution while total intracranial volume and height appear to share evolutionary constraints consistent with concerted evolution…our results suggest that alleles associated with neuropsychiatric, behavioral, and brain volume phenotypes have experienced both ancient and recent polygenic adaptation in human evolution, acting through neurodevelopmental and immune-mediated pathways.

The preprint takes a kitchen-sink approach, throwing a lot of methods for detecting selection at the phenotypes of interest. Also, there is always the issue of cryptic population structure generating false positive associations, but they try to address it in the preprint. I am somewhat confused by this passage though:

Paleobiological evidence indicates that the size of the human skull has expanded massively over the last 200,000 years, likely mirroring increases in brain size.

From what I know human cranial sizes leveled off in growth ~200,000 years ago, peaked ~30,000 years ago, and have declined ever since then. That being said, they find signatures of selection around genes associated with ‘intracranial volume.’

There are loads of results using different methods in the paper, but I was curious to note that schizophrenia had hits for ancient and recent adaptation. A friend who is a psychologist pointed out to me that when you look within families “unaffected” siblings of schizophrenics often exhibit deviation from the norm in various ways too; so even if they are not impacted by the disease, they are somewhere along a spectrum of ‘wild type’ to schizophrenic. In any case in this paper they found recent selection for alleles ‘protective’ of schizophrenia.

There are lots of theories one could spin out of that singular result. But I’ll just leave you with the fact that when you have a quantitative trait with lots of heritable variation it seems unlikely it’s been subject to a long period of unidirectional selection. Various forms of balancing selection seem to be at work here, and we’re only in the early stages of understanding what’s going on. Genuine comprehension will require:

– attention to population genetic theory
– large genomic data sets from a wide array of populations
– novel methods developed by population genomicists
– and functional insights which neuroscientists can bring to the table

June 27, 2017

Why you should learn some population genetics

Filed under: Population genetics — Razib Khan @ 10:03 pm

From reader surveys I know a substantial portion of the people who will see this post are financially well off (of those who aren’t, a large number are students). Therefore, you can invest in some books.

Often people ask me questions related to population genetics in the comments (sometimes I get emails). That is all well and good. But it is always better to be able to fish than have to ask for fish. Additionally, learning some population and quantitative genetics allows you to develop some tacit schemas through which you can process information coming at you, and through which you can develop some general intuition.

If you have a modest level of mathematical fluency and the disposable income, here are three indispensable books which are like the keys to the kingdom:

* Elements of Evolutionary Genetics
* Principles of Population Genetics
* Introduction to Quantitative Genetics.

If you don’t have the cash to spare, there are online notes which are pretty good:

* Graham Coop’s Population Genetics notes
* Joe Felsenstein’s Theoretical Evolutionary Genetics

There are other online resources, but they are not as comprehensive. John Gillespie’s Population Genetics: A Concise Guide is good as very gentle introductions go, but if you are going to spend money, I think just plumping down for a more comprehensive textbook (which will have more genomics in it) is better over the long run.

The goal of getting these books isn’t to make you a population geneticist, but, if you are interested in evolutionary questions it gives you a powerful toolkit. Really nothing in evolutionary process makes sense except in the light of population genetics.

April 25, 2017

Dost thou know the equilibrium at panmixia?

Filed under: Genetics,Population genetics — Razib Khan @ 3:58 pm

If you read a blog about Biblical criticism from a Christian perspective it would probably be best if you were familiar with the Bible. You don’t have to have read much scholarly commentary, rather, just the New Testament. Barring that, at least the synoptic gospels!

At this point, with over 400 individuals responding to the reader survey, it is strange to consider that more people believe they have a handle on what Fst is than the Hardy-Weinberg Equilibrium. First, Fst is a more subtle concept than people often think it is. And second, the HWE is so easy, important, and foundational to population genetics. I mean p^2 + 2pq + q^2 = 1. Could it be simpler???
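For anyone who wants to check the arithmetic, here is a minimal Python sketch of the Hardy-Weinberg proportions. It assumes a single locus with two alleles and random mating; the function name and the example allele frequency are hypothetical, chosen only for illustration.

```python
def hwe_genotype_frequencies(p):
    """Expected genotype frequencies at one biallelic locus under random mating.

    p is the frequency of one allele; q = 1 - p is the frequency of the other.
    Returns (p^2, 2pq, q^2), which always sum to 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


if __name__ == "__main__":
    # Hypothetical allele frequency, for illustration only.
    homo_1, het, homo_2 = hwe_genotype_frequencies(0.3)
    print(homo_1, het, homo_2, homo_1 + het + homo_2)
```

The point of the exercise is that the heterozygote term is 2pq, and that the three classes account for everyone in the population.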

So a quick ask. If you are one of the people who doesn’t understand HWE or why it is important, please get yourself a copy of John Gillespie’s Population Genetics: A Concise Guide. I understand that not everyone has the time, interest, or money for Principles of Population Genetics, or any of the more “hardcore” texts. But Population Genetics: A Concise Guide will surely suffice to follow anything on this blog.

Or, barring that, please review the online resources which you have available. Two examples:

Graham Coop’s Notes on Population Genetics or Joe Felsenstein’s unpublished textbook Theoretical Evolutionary Genetics.

April 23, 2017

Why the rate of evolution may only depend on mutation

Filed under: Evolutionary Genetics,Genetics,Population genetics — Razib Khan @ 10:07 pm

Sometimes people think evolution is about dinosaurs.

It is true that natural history plays an important role in inspiring and directing our understanding of evolutionary process. Charles Darwin was a natural historian, and evolutionary biologists often have strong affinities with the natural world and its history. Though many people exhibit a fascination with the flora and fauna around us during childhood, often the greatest biologists retain this wonderment well into adulthood (if you read W. D. Hamilton’s collections of papers, Narrow Roads of Gene Land, which have autobiographical sketches, this is very evidently true of him).

But another aspect of evolutionary biology, which began in the early 20th century, is the emergence of formal mathematical systems of analysis. So you have fields such as phylogenetics, which have gone from intuitive and aesthetic trees of life, to inferences made using the most new-fangled Bayesian techniques. And, as told in The Origins of Theoretical Population Genetics, in the 1920s and 1930s a few mathematically oriented biologists constructed much of the formal scaffold upon which the Neo-Darwinian Synthesis was constructed.

The product of evolution

At the highest level of analysis evolutionary process can be described beautifully. Evolution is beautiful, in that its end product generates the diversity of life around us. But a formal mathematical framework is often needed to clearly and precisely model evolution, and so allow us to make predictions. R. A. Fisher’s aim when he wrote The Genetical Theory of Natural Selection was to create for evolutionary biology something equivalent to the laws of thermodynamics. I don’t really think he succeeded in that, though there are plenty of debates around something like Fisher’s fundamental theorem of natural selection.

But the revolution of thought that Fisher, Sewall Wright, and J. B. S. Haldane unleashed has had real yields. As geneticists they helped us reconceptualize evolutionary process as more than simply heritable morphological change, but an analysis of the units of heritability themselves, genetic variation. That is, evolution can be imagined as the study of the forces which shape changes in allele frequencies over time. This reduces a big domain down to a much simpler one.

Genetic variation is a concrete currency with which one can track evolutionary process. Initially this was done via inferred correlations between marker traits and particular genes in breeding experiments. Ergo, the origins of “the fly room”.

But with the discovery of DNA as the physical substrate of genetic inheritance in the 1950s the scene was set for the revolution in molecular biology, which also touched evolutionary studies with the explosion of more powerful assays. Lewontin & Hubby’s 1966 paper triggered an order of magnitude increase in our understanding of molecular evolution through both theory and results.

The theoretical side occurred in the form of the development of the neutral theory of molecular evolution, which also gave birth to the nearly neutral theory. Both of these theories hold that most of the variation within and between species is due to random processes. In particular, genetic drift. As a null hypothesis neutrality was very dominant for the past generation, though in recent years some researchers are suggesting that selection has been undervalued as a parameter for various reasons.

Setting aside the live scientific debate, which continues to this day, one of the predictions of neutral theory is that the rate of evolution will depend only on the rate of mutation. More precisely, the rate of substitution of new mutations (where the allele goes from a single copy to fixation at ~100%) is equal to the rate of mutation of new alleles. Population size doesn’t matter.

The algebra behind this is straightforward.

First, remember that the frequency of a new mutation within a population is \frac{1}{2N}, where N is the population size (the 2 is because we’re assuming diploid organisms with two gene copies). This is also the probability of fixation of a new mutation in a neutral scenario; its probability is just equal to its initial frequency (it’s a random walk process between 0 and 1.0 proportions). The rate of mutations is defined by \mu, the number of expected mutations at a given site per generation (this is a pretty small value, for humans it’s on the order of 10^{-8}). Again, there are 2N gene copies, so you have 2N\mu to count the number of new mutations.

The probability of fixation of a new mutation multiplied by the number of new mutations is:

    \[ \left( \frac{1}{2N} \right) \times 2N\mu = \mu \]

So there you have it. The rate of fixation of these new mutations is just a function of the rate of mutation.
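If you prefer to see this numerically rather than algebraically, here is a rough Python sketch. The first function just restates the algebra; the second is a small Wright-Fisher style simulation that estimates how often a single new neutral copy drifts to fixation. The function names, population sizes, replicate counts, and seed are all hypothetical choices for illustration, and the simulation is a toy, not how anyone estimates these quantities from data.

```python
import random


def substitution_rate(n_individuals, mu):
    """New mutations per generation (2N * mu) times fixation probability (1 / 2N)."""
    copies = 2 * n_individuals
    return (copies * mu) * (1.0 / copies)


def simulate_fixation_probability(n_individuals, replicates, seed=0):
    """Fraction of single-copy neutral mutations that drift to fixation."""
    rng = random.Random(seed)
    copies = 2 * n_individuals
    fixed = 0
    for _ in range(replicates):
        count = 1  # a new mutation starts as one copy out of 2N
        while 0 < count < copies:
            freq = count / copies
            # Wright-Fisher step: each gene copy in the next generation is
            # drawn independently according to the current allele frequency.
            count = sum(rng.random() < freq for _ in range(copies))
        if count == copies:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    # The substitution rate does not change with population size.
    for n in (100, 10_000, 1_000_000):
        print(n, substitution_rate(n, 1e-8))
    # With N = 10 the estimate should land near 1 / (2N) = 0.05.
    print(simulate_fixation_probability(10, 5000, seed=1))
```

The simulation uses a tiny population only so that it runs quickly; the 2N terms cancel for any N.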

Simple formalisms like this have a lot more gnarly math that extends them and from which they derive. But they’re often pretty useful to gain a general intuition of evolutionary processes. If you are genuinely curious, I would recommend Elements of Evolutionary Genetics. It’s not quite a core dump, but it is a way you can borrow the brains of two of the best evolutionary geneticists of their generation.

Also, you will be able to answer the questions on my survey better the next time!

April 14, 2017

Why overdominance probably isn’t responsible for much polymorphism

Filed under: Genetics,Population genetics — Razib Khan @ 10:54 pm

Hybrid vigor is a concept that many people have heard of, because it is very useful in agricultural genetics, and makes some intuitive sense. Unfortunately it often gets deployed in a variety of contexts, and its applicability is often overestimated. For example, many people seem to think (from personal communication) that it may somehow be responsible for the genetic variation around us.

This is just not so. As you may know each human carries millions of genetic variants within their genome. Populations have various levels of polymorphism at particular positions in the genome. How’d they get there? In the early days of population genetics there were two broad schools, the “balance” and “classical.” The former made the case for the importance of balancing selection in maintaining variation. The latter suggested that the variation we see around us is simply a transient between fixation of a favored mutation from a low frequency or extinction of a disfavored variant (perhaps environmental conditions changed and a high frequency variant is now disfavored). Arguably the rise of neutral theory and empirical results from molecular evolution supported the classical model more than the balance framework (at least this was Richard Lewontin’s argument, and I follow his logic here).

But even in relation to alleles which are maintained at polymorphism through balancing selection, overdominance isn’t going to be the major player.

Sickle cell disease is a classic consequence of overdominance; the heterozygote is more fit than either the wild-type homozygote or the mutant homozygote, which suffers from the recessive disease. Polymorphism is maintained despite the decreased fitness of the mutant homozygote because the heterozygote is so much more fit than the wild type. The final proportion of the alleles segregating in the population will be conditional on the fitness drag of the mutant homozygote, because as per HWE it will be present in the population at a frequency of ~q^2.
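To make the balance concrete, here is a small Python sketch of the textbook one-locus overdominance model. The parameterization is an assumption for illustration: the wild-type homozygote has fitness 1 - s, the heterozygote 1, and the mutant homozygote 1 - t, with random mating each generation. Under those assumptions the mutant allele settles at s / (s + t). The function names and the values of s and t are hypothetical, not estimates for the sickle cell locus.

```python
def next_frequency(q, s, t):
    """One generation of selection on the mutant allele frequency q.

    Assumed fitnesses: wild-type homozygote 1 - s, heterozygote 1,
    mutant homozygote 1 - t. Genotypes are in HWE proportions before selection.
    """
    p = 1.0 - q
    mean_fitness = p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)
    # Mutant alleles come from half of the heterozygotes plus all mutant homozygotes.
    return (p * q + q * q * (1.0 - t)) / mean_fitness


def equilibrium_frequency(s, t):
    """Stable equilibrium frequency of the mutant allele under overdominance."""
    return s / (s + t)


def iterate(q0, s, t, generations):
    """Apply selection repeatedly, starting from mutant frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_frequency(q, s, t)
    return q


if __name__ == "__main__":
    s, t = 0.1, 0.8  # hypothetical selection coefficients
    print(iterate(0.01, s, t, 2000), equilibrium_frequency(s, t))
```

Starting the mutant allele rare or common leads to the same frequency, which is what it means for the polymorphism to be maintained by selection.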

The problem is that this is clearly not going to scale across loci. That is, even if the fitness drag is more minimal than is the case with the sickle cell locus, one can imagine a cumulative situation. The segregation load is just going to be too high. Overdominance is probably a transient strategy which fades away as populations evolve more efficient ways to adapt that don’t have such a fitness load.
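A back-of-the-envelope Python sketch shows how quickly the load piles up. It assumes the same one-locus overdominance model (homozygote fitnesses 1 - s and 1 - t, heterozygote 1), for which the per-locus load at equilibrium is st / (s + t), and it further assumes that loci are independent and combine multiplicatively. Both assumptions, the function names, and the numbers are illustrative only.

```python
def segregation_load(s, t):
    """Per-locus drop in mean fitness at the overdominant equilibrium."""
    return s * t / (s + t)


def mean_fitness_across_loci(s, t, n_loci):
    """Mean fitness relative to an all-heterozygote genotype.

    Assumes n_loci independent loci whose fitness effects multiply.
    """
    return (1.0 - segregation_load(s, t)) ** n_loci


if __name__ == "__main__":
    # Hypothetical weak selection: 1% disadvantage for each homozygote.
    for n_loci in (1, 100, 1000):
        print(n_loci, mean_fitness_across_loci(0.01, 0.01, n_loci))
```

Even with only a 1% disadvantage for each homozygote, a thousand such loci push mean fitness below 1% of the best possible genotype under these assumptions.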

So how does balancing selection still lead to variation without heterozygote advantage? W. D. Hamilton argued that much of it was due to negative frequency dependent selection. Co-evolution with pathogens is the best case of this. As strategies get common pathogens adapt, so rare strategies encoded by rare alleles gain in fitness. As these alleles increase in frequency their fitness decreases because pathogens adapt to them. Their frequency declines, and eventually the pathogens lose the ability to counter them, and their frequency increases again.

April 8, 2017

Why only one migrant per generation keeps divergence at bay

The best thing about population genetics is that because it’s a way of thinking and modeling the world it can be quite versatile. If Thinking Like An Economist is a way to analyze the world rationally, thinking like a population geneticist allows you to have the big picture on the past, present, and future, of life.

I have some personal knowledge of this as a transformative experience. My own background was in biochemistry before I became interested in population genetics as an outgrowth of my lifelong fascination with evolutionary biology. It’s not exactly useless knowing all the steps of the Krebs cycle, but it lacks in generality. In his autobiography I recall Isaac Asimov stating that one of the main benefits of his background as a biochemist was that he could rattle off the names on medicine bottles with fluency. Unless you are an active researcher in biochemistry your specialized research is quite abstruse. Population genetics tends to be more applicable to general phenomena.

In a post below I made a comment about how one migrant per generation or so is sufficient to prevent divergence between two populations. This is an old heuristic which goes back to Sewall Wright, and is encapsulated in the formalism Fst ≈ 1/(4Nm + 1). Basically the divergence, as measured by Fst, is proportional to the inverse of 4 times the proportion of migrants times the total population + 1. The mN is equivalent to the number of migrants per generation (proportion times the total population). As the mN becomes very large, the Fst converges to zero.
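Taking the formalism to be Wright’s island-model approximation, Fst ≈ 1/(4Nm + 1), which matches the verbal description above, a few lines of Python show how quickly divergence drops off as the number of migrants rises. The function name is hypothetical.

```python
def equilibrium_fst(migrants_per_generation):
    """Wright's island-model approximation: Fst ~ 1 / (4Nm + 1).

    migrants_per_generation is the product N * m.
    """
    if migrants_per_generation < 0:
        raise ValueError("number of migrants cannot be negative")
    return 1.0 / (4.0 * migrants_per_generation + 1.0)


if __name__ == "__main__":
    for nm in (0, 0.1, 1, 10, 100):
        print(nm, equilibrium_fst(nm))
```

With no migration the expression goes to 1, with one migrant per generation it is 0.2, and it keeps shrinking toward zero as Nm grows.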

The intuition is pretty simple. Imagine you have two populations which separate at a specific time. For example, sea level rise, so now you have a mainland and island population. Since before sea level rise the two populations were one random mating population their initial allele frequencies are the same at t = 0. But once they are separated random drift should begin to subject them to divergence, so that more and more of their genes exhibit differences in allele frequencies (ergo, Fst, the between population proportion of genetic variation, increases from 0).

Now add to this the parameter of migration. Why is one migrant per generation sufficient to keep divergence low? The two extreme scenarios are like so:

  1. Large populations change allele frequency very slowly due to drift, so only a small proportion of migration is needed to prevent them from diverging
  2. Small populations change allele frequency very fast due to drift, so a larger proportion of migration is needed to prevent them from drifting

Within a large population one migrant is a small proportion, but drift is occurring very slowly. Within a small population drift is occurring fast, but one migrant is a relatively large proportion of a small population.
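The way the two effects cancel can be sketched with the standard island-model recursion, which is an assumption brought in here rather than something derived in this post: each generation drift adds roughly 1/(2N) to the divergence measure F, and F only carries over when neither of two sampled gene copies is a migrant, which happens with probability (1 - m)^2. The Python below solves that recursion for its equilibrium; the function name and the population sizes are hypothetical.

```python
def equilibrium_f(n_individuals, m):
    """Equilibrium of F' = (1 - m)^2 * [1/(2N) + (1 - 1/(2N)) * F]."""
    drift = 1.0 / (2.0 * n_individuals)  # chance two copies share a parental copy
    stay = (1.0 - m) ** 2                # chance that neither copy is a migrant
    return stay * drift / (1.0 - stay * (1.0 - drift))


if __name__ == "__main__":
    # One migrant per generation (N * m = 1) in a small and a large population.
    for n_individuals, m in [(100, 0.01), (10_000, 0.0001)]:
        print(n_individuals, m, round(equilibrium_f(n_individuals, m), 4))
```

Both populations end up close to the 0.2 given by 1/(4Nm + 1) with Nm = 1, even though one is a hundred times larger than the other.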

Obviously this is a stylized fact with many details which need elaborating. Some conservation geneticists believe that the focus on one migrant is wrongheaded, and the number should be set closer to 10 migrants.

But it still gets at a major intuition: gene flow is extremely powerful and effective at reducing differences between groups. This is why most geneticists are skeptical of sympatric speciation. Though the focus above is on drift, the same intuition applies to selective divergence. Gene flow between populations works at cross-purposes with selection which drives two groups toward different equilibrium frequencies.

This is why it was surprising when results showed that Mesolithic hunter-gatherers and farmers in Europe were extremely genetically distinct in close proximity for on the order of 1,000 years. That being said, strong genetic differentiation persists between Pygmy peoples and their agriculturalist neighbors, despite a long history of living near each other (Pygmies do not have their own indigenous languages, but speak the tongue of their farmer neighbors). In the context of animals physical separation is often necessary for divergence, but for humans cultural differences can enforce surprisingly strong taboos. Culture is as strong a phenomenon as mountains or rivers….

December 18, 2012

Unveiling the genealogical lattice

To understand nature in all its complexity we have to cut the riotous variety down to size. For ease of comprehension we formalize with math, verbalize with analogies, and visualize with representations. These approximations of reality are not reality, but when we look through the glass darkly they give us filaments of essential insight. Dalton’s model of the atom is false in important details (e.g., atoms turn out to be divisible, and protons and neutrons are in turn made of quarks), but it still has conceptual utility.

Likewise, the phylogenetic trees popularized by L. L. Cavalli-Sforza in The History and Geography of Human Genes are still useful in understanding the shape of the human demographic past. But it seems that the bifurcating model of the tree must now be strongly tinted by the shades of reticulation. In a stylized sense inter-specific phylogenies, which assume the approximate truth of the biological species concept (i.e., little gene flow across lineages), mislead us when we think of the phylogeny of species on the microevolutionary scale of population genetics. On an intra-specific scale gene flow is not just a nuisance parameter in the model, it is an essential phenomenon which must be accommodated into the framework.

This is on my mind because of the emergence of packages such as TreeMix and AdmixTools. Using software such as these on the numerous public data sets allows one to perceive the reality of admixture, and overlay lateral gene flow upon the tree as a natural expectation. But perhaps a deeper result is the character of the tree itself is torn asunder. The figure above is from a new paper, Efficient moment-based inference of admixture parameters and sources of gene flow, which debuts MixMapper. The authors bring a lot of mathematical heft to their exposition, and I can’t say I follow all of it (though some of the details are very similar to Pickrell et al.’s). But in short it seems that in comparison to TreeMix MixMapper allows for more powerful inference of a narrower set of populations, selected for exploring very specific questions. In contrast, TreeMix explores the whole landscape with minimal supervision. Having used the latter I can testify that that is true.

The big result from MixMapper is that it extends the result of Patterson et al., and confirms that modern Europeans seem to be an admixture between a “north Eurasian” population, and a vague “west Eurasian” population. Importantly, they find evidence of admixture in Sardinians, which implies that Patterson et al.’s original analyses were not sensitive to admixture in putative reference populations (note that Patterson is a coauthor on this paper as well). The rub, as noted in the paper, is that it is difficult to estimate admixture when you don’t have “pure” ancestral reference populations. And yet here the takeaway for me is that we may need to rethink our whole conception of pure ancestral populations, and imagine a human phylogenetic tree as a series of lattices in eternal flux, with admixed nodes periodically expanding so as to generate the artifice of a diversifying tree. The closer we look, the more likely it seems that most of the populations which have undergone demographic expansion in the past 10,000 years are also the products of admixture. Any story of the past 10,000 years, and likely the past 100,000 years, must give space at the center of the narrative arc to lateral gene flow across populations.

Cite: arXiv:1212.2555 [q-bio.PE]

December 10, 2012

Is Daniel MacArthur ‘desi’?

My initial inclination in this post was to discuss a recent ordering snafu which resulted in many of my friends being quite peeved at 23andMe. But browsing through their new ‘ancestry composition’ feature I thought I had to discuss it first, because of some nerd-level intrigue. Though I agree with many of Dienekes’ concerns about this new feature, I have to admit that at least this method doesn’t give out positively misleading results. For example, I had complained earlier that ‘ancestry painting’ gave literally crazy results when they weren’t trivial. It said I was ~60 percent European, which makes some coherent sense in their non-optimal reference population set, but then stated that my daughter was >90 percent European. Since 23andMe did confirm she was 50% identical by descent with me these results didn’t make sense; some readers suggested that there was a strong bias in their algorithms to assign ambiguous genomic segments to ‘European’ heritage (this was a problem for East Africans too).

Here’s my daughter’s new chromosome painting:

One aspect of 23andMe’s new ancestry composition feature is that it is very Eurocentric. But, most of the customers are white, and presumably the reference populations they used (which are from customers) are also white. Though there are plenty of public domain non-white data sets they could have used, I assume they’d prefer to eat their own data dog-food in this case. But that’s really a minor gripe in the grand scheme of things. This is a huge upgrade from what came before. Now, it’s not telling me, as a South Asian, very much. But, it’s not telling me ludicrous things anymore either!

But in regards to omission I am curious to know why this new feature rates my family as only ~3% East Asian, when other analyses put us in the 10-15% range. The problem with very high values is that South Asians often have some residual ‘eastern’ signal, which I suspect is not real admixture, but is an artifact. Nevertheless, northeast Indians, including Bengalis, often have genuine East Asian admixture. On PCA plots my family is shifted considerably toward East Asians. The signal they are picking up probably isn’t noise. Almost every apportionment of East Asian ancestry I’ve seen for my family yields a greater value for my mother, and that holds here. It’s just that the values are implausibly low.

In any case, that’s not the strangest thing I saw. I was clicking around people who I had “shared” genomes with, and I stumbled upon this:

As you can guess from the screenshot this is Daniel MacArthur’s profile. And according to this ~25% of chromosome 10 is South Asian! On first blush this seemed totally nonsensical to me, so I clicked around other profiles of people of similar Northern European background…and I didn’t see anything equivalent.

What to do? It’s going to take more evidence than this to shake my prior assumptions, so I downloaded Dr. MacArthur’s genotype. Then I merged it with three HapMap populations, the Utah whites (CEU), the Gujaratis (GIH), and the Chinese from Denver (CHD). The last was basically a control. I pulled out chromosome 10. I also added Dan’s wife Ilana to the data set, since I believe she got typed with the same Illumina chip, and is of similar ethnic background (i.e., very white). It is important to note that only 28,000 SNPs remained in the data set. But usually 10,000 is more than sufficient on SNP data for model-based clustering with inter-continental scale variation.

I did two things:

1) I ran ADMIXTURE at K = 3, unsupervised

2) I ran an MDS, which visualized the genetic variation in multiple dimensions

Before I go on, I will state what I found: these methods supported the inference from 23andMe. On chromosome 10 Dr. MacArthur seems to have an affinity with South Asians (i.e., this is his ‘curry chromosome’). Here are the average (median) values in tabular format, with MacArthur and his wife presented for comparison.

ADMIXTURE results for chromosome 10

Sample              K1     K2     K3
CEU                 0.04   0.02   0.93
GIH                 0.87   0.05   0.08
CHD                 0.01   0.97   0.01
Daniel MacArthur    0.29   0.07   0.64
Ilana Fisher        0.01   0.06   0.94

You probably want a distribution. Out of the non-founder CEU sample none went above 20% South Asian. Though it did surprise me that a few were that high, making it more plausible to me that MacArthur’s results on chromosome 10 were a fluke:

And here’s the MDS with the two largest dimensions:

Again, it’s evident that this chromosome 10 is shifted toward South Asians. If I had more time right now what I’d do is probably get that specific chromosomal segment, phase it, and then compare it to various South Asian populations. But I don’t have time now, so I went and checked out the results from the Interpretome. I cranked up the settings to reduce the noise, and so that it would only spit out the most robust and significant results. As you can see, again chromosome 10 comes up as the one which isn’t quite like the others.

Is there a plausible explanation for this? Perhaps Dr. MacArthur can call up a helpful relative? From what I recall his parents are immigrants from the United Kingdom, and it isn’t unheard of that white Britons do have South Asian ancestry which dates back to the 19th century. Though to be totally honest I’m rather agnostic about all this right now. This genotype has been “out” for years now, so how is it that no one has noticed this peculiarity??? Perhaps the issue is that everyone was looking at the genome wide average, and it just doesn’t rise to the level of notice? What I really want to do is look at the distribution of all chromosomes and see how Daniel MacArthur’s chromosome 10 then stacks up. It might be a random act of nature yet.

Also, I guess I should add that at ~1.5% South Asian that would be consistent with one of MacArthur’s great-great-great-great grandparents being Indian. Assuming 25 year generation times that puts them in the mid-19th century. Of course, at such a low proportion the variance is going to be high, so it is quite possible that you need to push the real date of admixture one generation back, or one generation forward.
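The arithmetic behind that generation count is just repeated halving, which a couple of lines of Python make explicit. This sketch gives only the expected genome fraction from a single ancestor; as noted above, the realized fraction varies a lot at this depth. The function names are hypothetical.

```python
import math


def expected_ancestry_fraction(generations_back):
    """Expected genome share from one ancestor that many generations back."""
    return 0.5 ** generations_back


def generations_for_fraction(fraction):
    """Generations back at which a single ancestor is expected to contribute `fraction`."""
    return math.log2(1.0 / fraction)


if __name__ == "__main__":
    # A great-great-great-great grandparent is six generations back.
    print(expected_ancestry_fraction(6))
    print(generations_for_fraction(0.015))
```

Six generations back gives 1/64, or about 1.6%, which is why ~1.5% points to a great-great-great-great grandparent.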

December 1, 2012

Northern Europeans and Native Americans are not more closely related than previously thought

A new press release is circulating on the paper which I blogged a few months ago, Ancient Admixture in Human History. Unlike the paper, the title of the press release is misleading, and unfortunately I notice that people are circulating it, and probably misunderstanding what is going on. Here’s the title and first paragraph:

Native Americans and Northern Europeans More Closely Related Than Previously Thought

Released: 11/30/2012 2:00 PM EST
Source: Genetics Society of America

Newswise — BETHESDA, MD – November 30, 2012 — Using genetic analyses, scientists have discovered that Northern European populations—including British, Scandinavians, French, and some Eastern Europeans—descend from a mixture of two very different ancestral populations, and one of these populations is related to Native Americans. This discovery helps fill gaps in scientific understanding of both Native American and Northern European ancestry, while providing an explanation for some genetic similarities among what would otherwise seem to be very divergent groups. This research was published in the November 2012 issue of the Genetics Society of America’s journal GENETICS


The reality is that Native Americans and Northern Europeans are not more “closely related” genetically than they were before this paper. There has been no great change to standard genetic distance measures or phylogeographic understanding of human genetic variation. A measure of relatedness is to a great extent a summary of historical and genealogical processes, and as such it collapses a great deal of disparate elements together into one description. What the paper in Genetics outlined was the excavation of specific historically contingent processes which result in the summaries of relatedness which we are presented with, whether they be principal component analysis, Fst, or model-based clustering.

What I’m getting at can be easily illustrated by a concrete example. To the left is a 23andMe chromosome 1 “ancestry painting” of two individuals. On the left is me, and on the right is a friend. The orange represents “Asian” ancestry, and the blue represents “European” ancestry. We are each ~50% of each ancestral component. This is a correct summary of our ancestry, as far as it goes. But you need some more information. My friend has a Chinese father and a European mother. In contrast, I am South Asian, and the end product of an ancient admixture event. You can’t tell that from a simple recitation of ancestral quanta. But it is clear when you look at the distribution of ancestry on the chromosomes. My components have been mixed and matched by recombination, because there have been many generations between the original admixture and myself. In contrast, my friend has not had any recombination events between his ancestral components, because he is the first generation of that combination.
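
The contrast can be put in rough numbers. The Python sketch below is a back-of-the-envelope approximation, not the method used in the paper: it assumes a single pulse of admixture, random mating afterwards, and a Markov model in which ancestry switches accumulate along a chromosome at a rate of about 2m(1-m)(g-1) per Morgan, where m is one ancestry's proportion and g is the number of generations since admixture. The chromosome 1 length of 2.8 Morgans is a rough figure, and the 100 generations used for the old admixture is purely illustrative.

```python
CHR1_MORGANS = 2.8  # rough genetic length of chromosome 1 (assumption)


def expected_ancestry_switches(generations: int,
                               proportion: float,
                               length_morgans: float = CHR1_MORGANS) -> float:
    """Approximate number of ancestry switch points on one chromosome copy.

    Single-pulse approximation: crossovers build up over (g - 1) generations,
    and each one joins two different ancestries with probability 2m(1 - m).
    """
    if generations < 1:
        raise ValueError("generations must be at least 1")
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    mixing = 2.0 * proportion * (1.0 - proportion)
    return mixing * (generations - 1) * length_morgans


# First-generation individual: each chromosome copy comes whole from one parent
first_generation = expected_ancestry_switches(generations=1, proportion=0.5)

# Old admixture: 100 generations is an illustrative value, not an estimate
old_admixture = expected_ancestry_switches(generations=100, proportion=0.5)

print(f"First generation: {first_generation:.0f} switches per chromosome copy")
print(f"Old admixture:    {old_admixture:.0f} switches per chromosome copy")
```

Both individuals have the same 50/50 summary, yet the first carries two unbroken blocks while the second carries ancestry chopped into a great many short segments. That difference in segment lengths is the sort of signal that lets one ask when a mixture happened, not just how much of it there is.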

So what the paper publicized in the press release does is present methods to reconstruct exactly how patterns of relatedness came to be, rather than reiterating well-understood patterns of relatedness. With the rise of whole-genome sequencing and more powerful computational resources to reconstruct genealogies we’ll be seeing much more of this in the future, so it is important that people are not misled about what these results imply.

October 10, 2012

A plea for population genetics

Filed under: Population genetics — Razib Khan @ 9:31 pm

The title here is somewhat misleading. This is not just a plea for population genetics, but for quantitative genetics as well. Genetics is a big field. But today it is defined by and large by DNA, the concrete entity in which the abstraction of the gene is embedded. Look at the header of this website, or the background to my Twitter account. Mind you, I’m pathetically informed about molecular genetics, and don’t have a strong interest in the topic! I did consider using the H.W.E. or the breeder’s equation for the header, but in the end I judged it too abstruse and unfamiliar to most readers. DNA dominates when it comes to the modern mental conception of genetics, and we have to live with it to some extent.
But there is also great value in the genetics which has intellectual roots in the pre-DNA Mendelians and biometricians. This genetics exhibits a symbiotic, but not necessary, association with genetics as a branch of biophysics. Yet I come here not to insult or impugn my friends who toil in the trenches of the molecular wars. Rather, I simply want ...

September 27, 2012

Paleopopulation Genetics

Filed under: Genetics,Population genetics — Razib Khan @ 9:57 pm

It seems a new field is being born! Jeff Wall & Monty Slatkin have a pretty thorough review out, Paleopopulation Genetics:

Paleopopulation genetics is a new field that focuses on the population genetics of extinct groups and ancestral populations (i.e., populations ancestral to extant groups). With recent advances in DNA sequencing technologies, we now have unprecedented ability to directly assay genetic variation from fossils. This allows us to address issues, such as past population structure, changes in population size, and evolutionary relationships between taxa, at a much greater resolution than can traditional population genetics studies. In this review, we discuss recent developments in this emerging field as well as prospects for the future.

Nothing very new for close readers of this weblog, but the references are useful for later mining.

August 28, 2012

Evolutionary & population genetics preprints – Haldane’s Sieve

OK, perhaps I can help with that. Dr. Coop speaks of the collaboration between himself & Dr. Joseph Pickrell, Haldane’s Sieve, which I added to my RSS days ago (and you can see me pushing it to my Pinboard). From the “About”:

As described above, most posts to Haldane’s Sieve will be basic descriptions of relevant preprints, with little to no commentary. All posts will have comment sections where discussion of the papers will be welcome. A second type of post will be detailed comments on a preprint of particular interest to a contributor. These posts could take the style of a journal review, or may simply be some brief comments. We hope they will provide useful feedback to the authors of the preprint. Finally, there will be posts by authors of preprints in which they describe their work and place it in broader context.

We ask the commenters to remember that by submitting articles to preprint servers the authors (often biologists) are taking a somewhat unusual step. Therefore, comments should be phrased in a constructive manner to aid the authors.

It might be helpful if other evolution/genetics bloggers ...
