Razib Khan One-stop-shopping for all of my content

October 12, 2018

A historical slice of evolutionary genetics

Filed under: Evolutionary Genetics — Razib Khan @ 1:10 am

A few friends pointed out that I likely garbled my attribution of who the guiding forces were behind the “classical” and “balance” schools in the post below (Muller & Dobzhansky as opposed to Fisher & Wright as I said). I’ll probably do some reading and update the post shortly…but it did make me reflect that in the hurry to keep up on the current literature it is easy to lose historical perspective and muddle what one had learned.

Of course on some level science is not as dependent on history as many other disciplines. The history is “baked-into-the-cake.” This is clear when you read The Origin of Species. But if you are interested in a historical and sociological perspective on science, with a heavy dose of narrative biography, I highly recommend Ullica Segerstrale’s Defenders of the Truth: The Battle for Science in the Sociobiology Debate and Beyond and Nature’s Oracle: The Life and Work of W.D. Hamilton.

Defenders of the Truth in particular paints a broad and vivid picture of a period in the 1960s and later into the 1970s when evolutionary thinkers began to grapple with ideas such as inclusive fitness. E. O. Wilson’s Sociobiology famously triggered a counter-reaction by some intellectuals (Wilson was also physically assaulted at the 1978 AAAS meeting). Characters such as Noam Chomsky make cameo appearances.

Segerstrale’s Nature’s Oracle focuses particularly on the life and times of W. D. Hamilton, though if you want that at high speed and max density, read Narrow Roads of Gene Land, Volume 2. Because Hamilton died before the editing phase, the biographical text is relatively unexpurgated. Hamilton also makes an appearance in The Price of Altruism: George Price and the Search for the Origins of Kindness.

The death of L. L. Cavalli-Sforza reminds us that the last of the students of the first generation of population geneticists are now passing on. With that, a great deal of history is going to be inaccessible. The same is not yet true of the acolytes of W. D. Hamilton, John Maynard Smith, or Robert Trivers.

October 11, 2018

Why PCA and genetics are a match made in heaven

Filed under: Evolution,Genetics,science — Razib Khan @ 8:13 pm
Insitome customers and selected populations

The image above is not the work of a small child trying to sketch out a B-2 Stealth Bomber. Rather, it is a PCA plot, which shows the distribution of a subset of Insitome’s customers who have purchased the Regional Ancestry Insight — in terms of how they relate to each other genetically.

In green, I have added some British individuals, in red some Africans from Nigeria, and in blue individuals who are ethnically Chinese. The majority of our customers are of Northern European heritage, but a substantial minority are African-American or Asian-American and various mixes therein.

So why do we use Principal Components Analysis (PCA) in the first place? And how does it manage to match our intuitions about relatedness through abstruse mathematical formulae?

Why we use PCA in genetics

Real genetic variation…a little bit

Consider this slice of diversity to the left. Six individuals, top to bottom, genotyped on a small number of genetic positions, left to right. You should recognize the letters, as they are DNA base pairs, A, C, G, and T. You can see above that there are variations between the positions across individuals. Now imagine attempting to gain insight from looking at thousands of individuals (rows) across hundreds of thousands of markers (columns).

Raw genetic data is basically just a huge text file. When you are concerned with the variation on a single position, you can view the results for individuals or populations in a table and expect most people to immediately understand the implications. Europeans who are lactose tolerant have a variant on a particular marker. If you are TT or CT you can digest milk sugar, lactose, as an adult. If you are CC, you can’t. There are only a few things to keep track of: the person, and their genotype.

A special method for representing variation on a single marker, a single variable, isn’t necessary because the human mind can process all that information. In contrast, lots of simultaneous variables are impossible to understand just by looking at a table. PCA is just one of many excellent ways to extract signal from the noise.

The plot to the left was generated from ~30,000 markers on a few hundred individuals from eight populations. This is not a large dataset today. The time it took to run the function which generated the raw PCA result output was the period between me pressing “enter” on the keyboard and me looking at the computer screen.

And yet, despite the modesty of this dataset, can you imagine me looking at 30,000 variables across 200 samples and obtaining any understanding? Perhaps if I devoted my life to the project!

What about the math?

Mathematically, PCA takes the voluminous raw data, which is totally incomprehensible to the human mind, and summarizes it into a set of independent equations, which is what makes it so essential to the analytical toolkit. The data is actually a “matrix.” PCA transforms it with a series of distinct equations which can define the total variation of the underlying data.
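To make the “matrix” idea concrete, here is a minimal sketch of one common way to turn genotype letters into numbers: count, at each marker, how many of a person's two alleles differ from a reference allele (0, 1, or 2). The function name, the toy genotypes, and the choice of reference alleles are all made up for illustration; this is not a description of any particular pipeline.

```python
def encode_genotypes(genotypes, ref_alleles):
    """Convert rows of genotype strings into rows of non-reference allele counts."""
    matrix = []
    for person in genotypes:
        row = []
        for call, ref in zip(person, ref_alleles):
            # Each call has two letters; count those that differ from the reference.
            row.append(sum(1 for allele in call if allele != ref))
        matrix.append(row)
    return matrix


# Hypothetical toy data: three individuals (rows) typed at three markers (columns).
genotypes = [
    ["CC", "AG", "TT"],
    ["CT", "AA", "TT"],
    ["TT", "GG", "TG"],
]
ref_alleles = ["C", "A", "T"]  # assumed reference allele at each marker

matrix = encode_genotypes(genotypes, ref_alleles)
for row in matrix:
    print(row)
```

Each printed row is one individual and each column one marker, which is the shape of table that a PCA routine expects as input.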

A matrix of genotypes

These equations, or more properly dimensions, are arrayed in order of proportion of variation in the data explained. On a conventional PCA plot, you see the first two dimensions, which explain the largest and second largest proportion of the variation, as the x and y-axes. But there are many more dimensions you can break the data apart by, though quite often for genetic analysis the largest ones are sufficient to smoke out the population structure that you are interested in. The values of individuals in each dimension that drops out of the data can then be placed onto a coordinate system, which is much easier to digest than a table of raw variation.
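For readers who want to see the mechanics, the sketch below runs a bare-bones PCA on a tiny, invented genotype matrix using only the Python standard library. It centers each marker, builds the covariance matrix, and pulls out the leading dimensions one at a time by power iteration with deflation. That technique is swapped in here for transparency; real analyses on hundreds of thousands of markers rely on optimized linear algebra libraries or dedicated genetics software. All names and data are illustrative assumptions.

```python
import math


def center_columns(matrix):
    """Subtract each marker's mean so every column averages zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def covariance(centered):
    """Marker-by-marker covariance matrix of already-centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def mat_vec(mat, vec):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def leading_axis(cov, iterations=500):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    vec = [float(i + 1) for i in range(len(cov))]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = mat_vec(cov, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(cov, vec)))
    return eigenvalue, vec


def pca(matrix, n_axes=2):
    """Return (proportion of variance per axis, per-axis scores for each row)."""
    centered = center_columns(matrix)
    cov = covariance(centered)
    size = len(cov)
    total = sum(cov[i][i] for i in range(size))  # total variance in the data
    explained, scores = [], []
    for _ in range(n_axes):
        value, axis = leading_axis(cov)
        explained.append(value / total)
        # An individual's coordinate is the projection of its row onto the axis.
        scores.append([sum(x * a for x, a in zip(row, axis)) for row in centered])
        # Deflate: strip out the variation this axis already accounts for.
        cov = [[cov[i][j] - value * axis[i] * axis[j] for j in range(size)]
               for i in range(size)]
    return explained, scores


# Hypothetical toy matrix: six individuals, four markers, coded as 0/1/2
# allele counts. The first three rows and last three rows are two "populations".
toy = [
    [0, 0, 2, 0],
    [0, 0, 2, 1],
    [0, 1, 2, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 1],
    [2, 1, 0, 0],
]

explained, scores = pca(toy)
print("proportion of variance:", [round(p, 3) for p in explained])
print("PC1 coordinates:", [round(s, 2) for s in scores[0]])
print("PC2 coordinates:", [round(s, 2) for s in scores[1]])
```

In this toy case the first dimension soaks up most of the variation and puts the two invented populations on opposite sides of zero, while the second dimension picks up the leftover within-group variation. The sign of each axis is arbitrary, so only the relative positions matter.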

The branching of human populations

But how can a mathematical framework make biological variation comprehensible through maps so well — especially with regards to genetic differences between populations? The answer to this is straightforward: human evolutionary history has a pattern, and that pattern leaves its stamp on the genome. PCA is just a pattern extraction method.

The raw material of variation is mutation, and the pattern of mutations in any human genome is defined by a pedigree back to common ancestors. People who tend to share common ancestors share mutations, and mutations are the raw material for the genetic variation that PCA summarizes.

When used in evolutionary genetics, PCA should ideally recapitulate the phylogenetic tree. Assuming that sample sizes are balanced, in worldwide human datasets the first principal component of variation is invariably a dimension that separates Africans from non-Africans.

Why? Because this is the earliest separation between large lineages, and so this ‘separation’ has had the most time to accumulate distinct and unique mutations in their two respective lineages. The second dimension is usually one that defines the difference between people from the Eastern portion of Eurasia and those from the western portion of Eurasia. Again, this is an important phylogenetic distinction because these two groups seem to have diverged soon after their ancestors left Africa.

And so on. PCA is not the only way to visualize the data. If you run a computer program that counted up raw similarities and differences between individuals at each genetic position, you would notice that some individuals are more similar to others, some groups more similar to other groups, and this too would reflect the phylogenetic history. If you had more time and wanted to dig deeper, you could construct various models of population history, and see how well the data fit those models.
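The “count up raw similarities and differences” approach can be sketched in a few lines. The example below simply tallies, for every pair of individuals, the number of markers at which their genotype calls differ. The sample names and genotypes are invented, and the sketch assumes genotype strings are written in a consistent allele order (so that “AG” and “GA” are not treated as different by accident).

```python
from itertools import combinations


def pairwise_differences(samples):
    """For every pair of individuals, count markers where their genotypes differ."""
    counts = {}
    for (name_a, calls_a), (name_b, calls_b) in combinations(samples.items(), 2):
        counts[(name_a, name_b)] = sum(
            1 for a, b in zip(calls_a, calls_b) if a != b
        )
    return counts


# Hypothetical toy data: three individuals typed at four markers.
samples = {
    "ind1": ["CC", "AA", "TT", "GG"],
    "ind2": ["CC", "AA", "TT", "AG"],
    "ind3": ["TT", "GG", "GT", "GG"],
}

differences = pairwise_differences(samples)
for pair, count in sorted(differences.items(), key=lambda item: item[1]):
    print(pair, count)
```

Pairs with the fewest differences come out first, so in this toy case ind1 and ind2 cluster together and ind3 sits apart from both, which is the same kind of grouping a PCA plot shows graphically.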

PCA is not the only way to understand genetic variation, and PCA itself is not the genetic variation, only a way to represent it. But it is a fast method that starts with few assumptions and lends itself to easy graphical representation. It’s no coincidence that it remains popular to this day.



Why PCA and genetics are a match made in heaven was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

Making what Harvard is about transparent

Filed under: GSS,Harvard — Razib Khan @ 8:03 pm
This is the future Edward Blum wants

In the 20th century version of the TV series Murphy Brown, there was an episode where three young American scholars were introduced. The big laugh was that they had very Chinese or Indian names. Though it’s probably politically incorrect today to depict it that way, the joke is that the best “American” scholars were not really American….

If you’re an Asian American who remembers the period before the 1990s, you know where I’m coming from. This was an America in black and white, and you were literally the Other if you were outside of those two boxes. People would be surprised that you spoke English without an accent, and inquire where you really came from. This still happens now and then, but back in the 1980s, it was pervasive. It was tradition. The children of the first post-1965 immigrants were not yet grown, so the majority of Asian American adults you saw and encountered were immigrants outside of a few areas, such as Hawaii and portions of the West Coast. In 1980 1.7% of the people residing in the United States were Asian American. Today nearly 7% are Asian American.

This is having an impact. The winners of spelling bees and science fairs don’t “look like America” anymore.

And this is the major reason why the cultural elite is very upset about the scrutiny which admissions processes at top universities have been receiving. Consider this op-ed in The New York Times, A Damaging Bid to Censor Applications at Harvard. It concludes:

As a leader in higher education, Harvard is trying to change this through its modest consideration of race in admissions. Its goal is to create a diverse community of students who can engage with and learn from people who are different, and carry those experiences with them beyond the university.

Expressions of racial identity are part of the fullness of our humanity. It’s not possible to be blind to race. Pretending as though it is ensures we will forever be divided.

The op-ed is pretty measured and not particularly shoddy as far as it goes. This is the sort of message that the editors and reporters at The New York Times want to amplify. Call it the anti-Bari Weiss effect.

The problem I have with Harvard and its academic and administrative overclass is that the media often allows them to engage in doublespeak without any comment, critique or dissent. Part of it is that institutions such as The New York Times are dominated by people from elite academic institutions, and so are part of the same broad culture, with a set of assumptions and interests, implicit and explicit, private and public. They’re all family.

For example, a few years ago the president of Harvard declared that the institution was all about inclusion. On the face of it that is just a bald-faced lie, and everyone knows it. Harvard is about exclusion, selection, and curation. “Inclusion” actually meant that there are certain views and backgrounds that Harvard is going to curate and encourage. Which is fine. But an institution which excludes >95% of those who apply for admission is by definition not inclusive and open.

The issue with Harvard is that it is an institution which is many things to many people. Harvard lets in the smart, talented, wealthy, and powerful, with various mixes of these elements. Asian Americans tend to be smart and talented in academic measures, but most of them are not “old money” in the United States, and even if they were there is a suspicion (perhaps fair, I don’t know) among many stewards of elite academic institutions that they don’t have the values which would result in large donations to those institutions. Harvard needs to take care of rich people, who tend to be white, and lucky, because it wants rich people to take care of Harvard. Luckily for the rich, they are not always so smart and diligent, but they are “well-rounded.” Their personalities have polish, and if that’s not there, perhaps a strategic donation can be made.

Harvard also smiles upon the scions of Third World dynasties. They may not be brilliant, but they are likely to impact the lives of hundreds of millions through their possible ascension to the pinnacle of power. Again, in clear doublespeak, Harvard mouths egalitarianism constantly but signals in its actions that it is realistic that power is passed down through blood. Harvard is in and of this world. It makes the world. And the world makes it.

Finally, Harvard educates the American ruling class. And it wants to continue to educate the American ruling class. As such, it is self-conscious of the fact that it can’t have the demographic profile of Caltech. Harvard doesn’t just want to incubate innovators, it wants to cultivate and train the administrators of the largesse that innovation allows.

The “diverse community of students” who are going to become elected officials is no doubt one reason that Harvard and other elite schools make recourse to racial and regional diversity metrics. If Harvard can be thought of as a finishing school for the elites attached to a hedge fund (its endowment), it needs to maintain some diversity in its portfolio of the future overclass. Legacies and the super-rich are important because these are lineages with a record of success within the overclass. The data is clear that innate cognitive aptitudes aside, children of privilege have a leg up. All things equal, and even not equal, it is rational to give bonus points to those who come from privilege if you want to maintain your own as an institution.

But, you also need to sample more of the parameter space. Some families do leave the elite, and others join it. The goal of an institution like Harvard is to admit and cultivate potential joiners. These are not always going to be children who win spelling bees and science fairs, and can attain every metric you might put in front of them. Political leaders of given communities tend to look like and come from those communities. Therefore, there is a need to maintain some level of racial and ethnic diversity if power, as opposed to academics,* is your number one focus.

What if Harvard began to let more Asian Americans in? Even though it is a private institution it would have some of the problems that Stuyvesant High School in New York is facing. Stuy is about 75% Asian American in a city that is 12% Asian American. The plain fact is that an elite public school supported by the city is probably not sustainable in the long-term if it does not reflect the demographics of the city. This is not an argument about whether it is just or not, but an observation of the dynamics of power and influence in a democratic system.

Harvard has to look somewhat like America visibly. The visibility part is important because it makes it salient. The reality is that Harvard undergraduates are highly atypical in their family background. The average student comes from a family in the top 20% of the household income distribution. This distribution is probably multi-modal because Harvard’s endowment allows it to subsidize students of more modest means while still reserving spots for the extremely wealthy and privileged. Additionally, when you scratch beneath the surface the “visibility” can deceive. Harvard’s representation of black students is near the national proportion. But historically the majority of these have been from biracial or immigrant or Caribbean American households. In the 2000s it was estimated that only one-third of Harvard’s black students came from the 90% of black Americans who have four grandparents who were born and raised in the United States as black Americans.

But from what I can tell the issue of at least superficial visible identity is key, and substantive differences which are not externally salient are less critical. The fact that the first black American president had a white mother and an African immigrant father has been noted, but over time it seems to be less and less important than the fact that he identified and was seen as a black American, despite his atypicality on so many substantive measures.

The problem though is that even though visibility matters, unanimity of viewpoint and opinion may cause problems in pumping the pipeline to power in a democratic republic where there is still a pluralism of views. Harvard undergraduates are very liberal and secular compared to the American public. Not that there’s anything wrong with that. But if you want to be the training ground for power, in a democratic republic where there are still differing views it is important that one expect those views and anticipate responses (though clearly a lot of politicians lie about their piety and ‘evolve’ in their ideology).

In particular, Christian white conservatives are far less well represented at Harvard than they are on a national level. Obviously, there is not anything wrong with that as such, but historically we’ve had white Christian conservatives (or people who identify and affiliate as such) in positions of power, and their exclusion from elite institutions might engender alienation and hostility from the very power that they exist to cultivate.

Of course, it could just be that white conservative Christians are not academically up to snuff. My previous inquiries do suggest there is a strong correlation between secularity and social liberalism and very high IQs. But, if you look at the GSS’s WORDSUM variable you see there are probably a reasonable number of intelligent white conservative Christians.

First, looking at the WORDSUM scores of non-Hispanic whites by ideology, you can see that liberals tend to be smarter than conservatives, and both are smarter than moderates. This is a pretty robust pattern. Intelligent people tend to have stronger and more strident views. Moderates are probably moderate in part because they aren’t as bright and so have weak opinions.

That being said, when you look at the distribution of ideologies by WORDSUM scores you get a different perspective. Though moderates are on average less intelligent, there are so many of them that for non-Hispanic whites they are still the most numerous in the 9-10 category (that is, they got one item wrong, or none wrong). And, there is balance between the number of conservatives and liberals. The average liberal is smarter, but the much larger number of white conservatives means that even in the brightest decile they attain parity.

Of course, the average Harvard student is not a top 10% performer, they’re a top 1% performer. And often not just academically, but in a variety of ways. They are selected for raw intelligence, but also high conscientiousness. Though the two are correlated, they are imperfectly so. Following James F. Crow’s expectation in regards to human inequality, when you select from the intersecting tails of multiple different distributions, the resulting student population is unlikely to be representative of the broader population.

Let’s wrap this up with some conclusions.

First, Harvard and the other Ivies will find a way to continue to cap the number of Asian American students. I think the current lawsuit may win on the merits, but the “Deep Oligarchy” is more powerful than the judiciary or the executive branch. If, on the other hand, Harvard gets rid of legacies and special backdoor admissions, I’ll admit I was wrong, and the chosen have lost control of the system. As long as legacies and backdoor admissions continue, you know that the eyes are on the prize of power and glory. Capping the number of Asian American “grinds” would be a small price to pay then, and those who are allowed beyond the gates will be well-trained to sing the praises of Harvard’s policy (as they all do).

Second, the alienation of the successor to the “Eastern Establishment” from the large numbers of moderate and conservative whites will be a long-term problem in terms of the maintenance of its grip on power. Though this segment of the population is in decline, it is still large and substantial, and will wield power and influence out of proportion to its overall numbers for decades because they are older. They vote more, and they mobilize well. The rise to dominance of ideologies at campuses such as Harvard which pathologize the very persistence of these groups on the national scene will exacerbate the polarization and alienation. Though the individuals who run these institutions may bemoan this trend, because of the large numbers of students who are ideologically on the same page on this issue, they won’t be able to stop the march toward cultural radicalization.

Harvard has avoided the problem of Stuyvesant by maintaining visible diversity within its student body. But because it does not emphasize intra-racial ideological diversity, it will eventually run into its own Stuyvesant problem as it loses all legitimacy from large swaths of the body politic who see that racial identity does not entail ideological affinity and sympathy.

Addendum: This is a mildly obscure blog. And to be honest I’d rather write about science papers than this. But, I wanted to put this blog post up so that it’s out there, because mainstream publications seem to be intent on publishing a stream of what I perceive to be simplistic or disingenuous pieces.

The Left/liberal/progressive side engages in cant about “diversity”, when we all know they mean a very precise sort of diversity, and a very particular type of background when they talk about “background.” But the Right/conservative side’s emphasis on merit and colorblindness strikes me as consciously blind to the fact that these institutions were always about shaping and grooming the elite, and engaged in the game of reflecting and determining the American upper class. The Right/conservative project would abolish Harvard as we know it on a far deeper level than the Left/liberal/progressive posturing cultural radicalism, which at the end of the day has no problem bowing before neoliberal capital so long as lexical modifications are made.

If Asian Americans want to increase their proportion at Harvard, they have to follow the Jewish strategy and join the socio-political elite. If they don’t do that, then the Asian quota will persist in some way.

* When I speak of students and “Harvard” I’m talking about the undergraduate level. The graduate and professional schools are somewhat different.

October 10, 2018

An “in-fill” framework for the expansion of peoples in Europe: beakers, beakers everywhere!

Filed under: Ancient DNA,Beaker people — Razib Khan @ 10:04 pm

In the 1970s A. J. Ammerman and L. L. Cavalli-Sforza argued for the validity of a model of Neolithic expansion of farmers into Europe predicated on a “demic diffusion” dynamic. This is in contrast to the idea that farming spread through the diffusion of ideas, not people. The formal theory is inspired by the Fisher wave model, but empirically just imagine two populations with very different carrying capacities due to their mode of production, farmers, and hunter-gatherers. In a Malthusian framework, the farmer carrying capacity in a given area of land might be ~10× greater than that of hunter-gatherers. Starting at the same initial population, the farmers will simply breed the hunter-gatherers out of existence.
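For reference, the “Fisher wave model” mentioned above is usually written as a reaction-diffusion equation, often called the Fisher-KPP equation. The version below is the standard textbook form, not a transcription from Ammerman and Cavalli-Sforza; the symbols are my own labels: u(x, t) for local farmer population density, D for a diffusion (migration) coefficient, r for the intrinsic growth rate, and K for the carrying capacity.

```latex
\frac{\partial u}{\partial t}
  = D\,\frac{\partial^{2} u}{\partial x^{2}}
  + r\,u\left(1 - \frac{u}{K}\right),
\qquad
c = 2\sqrt{rD}
```

The first term spreads people out locally, the second is ordinary logistic growth up to K, and c is the speed at which the resulting front of settlement advances in the classic solution. Note that in this simple form the front speed depends on r and D rather than on K, so the ~10× carrying capacity advantage matters for who fills the landscape behind the front, not for how fast the front moves.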

As the farmers reach their local carrying capacity, migration outward will occur in a continuous and diffusive process. For all practical purposes, the farmers will perceive the landscape occupied by hunter-gatherers as “empty.” This is due to the fact that hunter-gatherers often engage in extensive, not intensive, exploitation of resources. In contrast, even slash and burn agriculturalists leave a much bigger ecological footprint. They swarm over the land.

The beauty of the demic diffusion process is that it’s analytically elegant and tractable. Families or villages engage in primary production to “fill up” a landscape through simple cultural practices which manifest on the individual scale and allow for aggregate endogenous growth. And this model underlies much of the work by Peter Bellwood in First Farmers and Colin Renfrew’s theories about the spread of Indo-European languages. You can call it the Walder Frey theory of history.

I didn’t really think deeply about this theory because I didn’t have much empirical knowledge until I read Lawrence Keeley’s War Before Civilization. In this book, Keeley observes that the archaeological record suggests that there was violent conflict between the first farmers and hunter-gatherers in northwestern Europe, near the North Sea. He reports that there seems to have been a broad front of conflict, presumably a prehistoric “no man’s land.” Not only that, but Keeley claims that the spread of agriculture stopped for a period. The barrier between hunter-gatherer occupation and farmer territory was not permeable. Not diffusion.

As a stylized fact, the demic diffusion framework treats all farmers as interchangeable and all hunter-gatherers as interchangeable. On the face of it, we know that this is wrong. But the assumption is that to a first approximation this axiom will allow us to capture the main features of the dynamics in question. This may be a false assumption. The fact is we know that some hunting and gathering populations can engage in intensive resource extraction and remain sedentary.

Intensive hunter-gatherers

The Pacific Northwest Indian tribes of the United States of America are the best-known examples of such hunting and gathering peoples. Because of the concentrated runs of salmon, these people could remain hunter-gatherers while maintaining relatively sedentary and dense societies characterized by social stratification (e.g., they practiced slavery). As it happens, it seems that it is on the maritime fringes of Northern Europe that the hunter-gatherers flourished the longest. Agriculture took ~1,000 years to transplant itself from northern Germany to southern Scandinavia, and even then hunter-gatherer lifestyles persisted in many locales for several thousand years until the Nordic Bronze Age (and in Finland even longer).

The flip side of the variation in intensity and density of hunter-gatherers is that the early farmers were probably less efficient and intensive than later agriculturalists. And, as the Anatolian farmers pushed into Northern Europe their cultural toolkit would be less and less effective. Even assuming local dynamics of reproductive increase as the primary driver for farmer expansion, the growth parameter of the agriculturalists in comparison to the hunter-gatherers may not have been that different in many contexts.

But the second major issue is that the assumption of continuous and diffusive expansion over wide areas is probably wrong. The early Neolithic farmers may have been stateless in a modern sense, but they were almost certainly not primitive anarchies. They were pre-state polities of some sort no doubt and exhibited coordination and cultural uniformity over large distances. An illustration of what might happen to small groups of farmers is what happened to white American homesteaders who occupied territory too close to the Comanche lands. Future archaeologists may see an empirical pattern of demic diffusion of white Americans from the east to the west, but that expansion occurred only within the scaffold of a political-military superstructure.

On a fundamental level demic diffusion, and the higher reproductive value over time of farmer peoples than hunter-gatherer peoples, are essential pieces of the puzzle of the peopling of Europe during the Holocene. But they need to be framed in the context of the discontinuous expansion of cultural zones of activity and freedom for farming communities, under the umbrella of some supra-village social and political order. This step by step expansion in a piecewise fashion probably explains the “hunter-gatherer resurgence” that David Reich’s lab has found in the temporal transects within a given region. Even if socially and politically dominant within a particular region, the farming communities likely targeted the richest and most suitable lands as predicted by classical economics. The hunter-gatherer populations likely persisted in more marginal areas and only assimilated with the dominant farmers over time. The invasion dynamics locally would exhibit patchiness in the early phases, allowing for hunter-gatherer persistence.

The fundamental lower-level dynamics are those of panmictic local populations expanding over time in a continuous fashion. These can be modeled by a few parameters. The problem is that the older idea that this could be generalized over time and space is surely wrong. Rather, inter-group dynamics probably govern a lot of the coarse-scale patterns we see. Over time farmer populations always won, but “on any given day” the outcome was always in doubt.

And so it was with agriculturalist conflicts as well. This is on my mind partly because I recently reread Genetic origins of the Minoans and Mycenaeans and The Beaker Phenomenon and the Genomic Transformation of Northwest Europe. There are lots of details within these papers that are easy to miss on first or second or even third read. For example, I noticed a sample dated to between 2200 and 1900 BC (so probably 2050 BC?) from Parma in northern Italy from a Bell Beaker cultural context which has a lot of steppe ancestry. Contemporaneous samples from Iberia seem spottier in their steppe ancestry, but that’s around when it shows up in that peninsula. Similarly, steppe ancestry arrived in Greece at some time after the Neolithic but before the Bronze Age collapse.

We know that the Beaker people arrived in Britain and Ireland rather suddenly ~2500 BC, even though the earliest evidence of the canonical beakers diagnostic for this culture is found in western Iberia in ~2900 BC. The Reich group concluded, rightly I suspect, that the cultural phenomenon of the Beaker people transcended a particular socio-cultural group bounded by kinship and genetic affinity. In other words, the Beaker culture was a set of peoples, in the plural.

And yet outside of Iberia and some Mediterranean locales, The Beaker Phenomenon and the Genomic Transformation of Northwest Europe makes it clear that a genetic disruption of the local demographics occurred when the society adopted the beaker. Whereas in Central and Eastern Europe Indo-European languages probably arrived with the Corded Ware people ~2900 BC, the Beaker people come to our attention somewhat later, and in fact, pushed eastward into Corded Ware territory. Though the Beaker people seem to have been the vectors for steppe ancestry in many areas of Western Europe, they generally have less of it than the Corded Ware.

The Corded Ware frontier with non-Indo-European peoples to their west, south, and north, can be thought of as a cultural innovation zone. This is historically the trend, with frontier areas producing a vigorous and cohesive, yet often innovative, identity group that can mobilize resources and engage in expansion and domination. The Zhou and Qin states in China are examples of this, as is the ascendance of Roman Emperors from the trans-Danubian region after 200 AD. It seems entirely possible then that the explosion of Indo-European Beaker people on the West-Central European frontier occurred through cultural synthesis and transmission from non-Indo-European Western Europe of the 3rd millennium, and once this society became cohesive it expanded outward aggressively.

In sum, while genetic processes are continuous and gradual, cultural processes are often discontinuous and may exhibit phases of fluctuating change alternating with stasis (perhaps modeled by a Poisson distribution of periods of expansion against the typical stationary background state?).
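To make the parenthetical concrete, here is a minimal sketch of what such a model could look like: expansion episodes arrive as a Poisson process against a background of stasis. The function name, the rate, and the time span are all hypothetical values chosen for illustration, not estimates from any data.

```python
import random

def simulate_expansions(rate_per_century, centuries, seed=0):
    """Return start times (in centuries) of expansion episodes.

    Episodes arrive as a Poisson process, so the waiting time between
    consecutive episodes is exponential with mean 1 / rate_per_century.
    Everything between episodes is treated as the stationary background.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_century)
    while t < centuries:
        times.append(t)
        t += rng.expovariate(rate_per_century)
    return times

# Hypothetical numbers: on average one expansion every 5 centuries,
# watched over 30 centuries.
episodes = simulate_expansions(rate_per_century=0.2, centuries=30)
print(len(episodes), [round(t, 1) for t in episodes])
```

The point of the sketch is only that long quiet stretches punctuated by irregular bursts fall out of a very simple arrival process; it says nothing about what triggers any particular expansion.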

Addendum: The Slavic expansion in Eastern Europe and the Balkans fits with this model. Their success both demographically and culturally was due in large part to an ability to adapt to the regression of social complexity. Slavic societies were antifragile. They degraded well. In contrast, the Latin and Greek peasantry were more reliant for their existence and cultural continuity on the Roman state. With the collapse of the Balkan limes in the last quarter of the 6th century, the East Roman Empire lost total control of its European interior communications, and Constantinople, Thessaloniki and the Peloponnese remained connected by maritime means, thanks to the Imperial navy’s total control of the Aegean.

And the Slavs were not an anarchic people. Though organized around small tribes, they existed under the hegemony of the Avars, and in multiple instances seem to have coalesced under the leadership of non-Slavic peoples who provided a leadership caste before these groups were culturally assimilated. Their demic diffusion through the Balkans was only enabled through the scaffold of an expansionist pastoralist ascendancy in areas heretofore dominated by the Roman state.

October 9, 2018

Reflections on the biology of Homo calaquendi

Filed under: Elves,Fantasy,Tolkien — Razib Khan @ 7:37 pm


For a while now I’ve been really haunted by a question about the verisimilitude of J. R. R. Tolkien’s world-building: what are the long-term social and biological consequences of the fact that the Eldar, the elves, are immortal?

Consider the fact that the elves are long-lived, and not particularly fecund. Even when they are, inter-generational patterns are spotty. Fëanor had seven sons, but only one grandson! Today we have “helicopter parents”, always worried about the safety of their offspring. How would an elvish society ever flourish if parents are terrified about the risk of their few offspring dying prematurely?

The fact that elves even go to war is indicative of a very strange and alien psychology. If you had the opportunity for everlasting life, would you risk it in battle? Are elves courageous? Or do they just have high time-preference?

But for me, the bigger question is the psychology of Galadriel. At 7,000 years old she is one of the oldest creatures in Middle Earth, along with Gandalf, Sauron, Cirdan, and Glorfindel. Assuming 100 years that’s 70 human lifetimes. J. R. R. Tolkien is quite clear about her physical appearance. She is quite tall, with silver-gold hair. But her head is not particularly large. So the question presents itself: how does her long-term memory allocation work? We know she has a human cranial capacity.

If salient and emotionally resonant memories connected to excitement in the hippocampus are the ones banked, does that mean that Galadriel’s mind is brimming with incredibly vivid recollections? Shouldn’t she be depressed in the present, because the present is going to be so dull compared to her glittering memories of Aman, and the beauty and elegance of the First Age civilization of the Eldar?

Additionally, it seems clear that the Eldar don’t suffer from cognitive decline in the same way as humans. Does that mean perhaps that Galadriel’s intuitive abilities would be suprahuman? Both humans and elves are children of Eru Ilúvatar. There is no evidence from the legendarium that elves are orders of magnitude more gifted than humans in “system 2” thinking, that is, rational reflection. But their grace and acuity in matters of perception are curious. Could it be a function of acquired “system 1” faculties, as opposed to what they were born with?

Perhaps the fey grace of the Eldar is not a matter of their natural abilities, but a function of developmental psychology? If the 10,000-hour rule is a thing, how about the 100-generation rule?

Finally, the elvish recourse to writing strikes me as peculiar in light of their immortality. They seem to be primary producers and foragers who don’t engage in much trade, so accounting is not highly valued in all likelihood. And writing does not confer the advantages of immortality on an already immortal species.

Note: For those readers who suggest that this post may mean that I never have sex, I already have three children. That’s more than most elves!

The post-neutral human genome (the Kern-Hahn era)

Filed under: Neutral Theory,Population genetics — Razib Khan @ 6:50 pm

If you have any background in evolutionary biology you are probably aware of the controversy around the neutral theory of molecular evolution. Fundamentally a theoretical framework, and instrumentally a null hypothesis, it came to the foreground in the 1970s just as empirical molecular data in evolutionary biology was becoming a thing.

At the same time that Motoo Kimura and colleagues were developing the formal mathematical framework for the neutral theory, empirical evolutionary geneticists were leveraging molecular biology to more directly assay natural allelic variation. In 1966 Richard Lewontin and John Hubby presented results which suggested far more variation than they had been expecting. Lewontin argued in the early 1970s that their data and the neutral model actually were a natural extension of the “classical” model of expected polymorphism as outlined by R. A. Fisher, as opposed to the “balance school” of Sewall Wright. In short, Lewontin proposed that the extent of polymorphism was too great to explain in the context of the dynamics of the balance school (e.g., segregation load and its impact on fitness), where numerous selective forces maintained variation. The classical school emphasized both strong selective sweeps on favored alleles and strong constraint against most new mutations.

And yet one might expect low levels of polymorphism from the classical school. The way in which the neutral framework was a more natural extension of this model is that even if most inter-specific variation, most substitutions across species, are due to selectively neutral variants, most variants could nevertheless be deleterious and so constrained. Alleles which increase in frequency may have done so through positive selection, or, just random drift. Not balancing forces like diversifying selection and overdominance.

The general argument around neutral theory generated much acrimony and spilled out from the borders of population genetics and molecular evolution to evolutionary biology writ large. Stephen Jay Gould, Simon Conway Morris, and Richard Dawkins were all under the shadow of neutral theory in their meta-scientific spats about adaptation and contingency.

That was then, this is now. I’ve already stated that sometimes people overplay how much genomics has transformed our understanding of evolutionary biology. But in the arguments around neutral theory, I do think it has had a salubrious impact on the tone and quality of the discourse. Neutral theory and the great controversies flowered and flourished in an age where there was some empirical data to support everyone’s position. But there was never enough data to resolve the debates.

From where I stand, I think we’re moving beyond that phase in our intellectual history. To be frank, some of the older researchers who came up in the trenches when Kimura and his bête noire John Gillespie were engaged in a scientific dispute which went beyond conventional collegiality seem to retain the scars of that era. But younger scientists are more sanguine, whatever their current position might be, because they anticipate that the data will ultimately adjudicate, because there is so much of it.

With that historical context, consider a new paper, Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences:

Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.

This is not an entirely surprising result. Some researchers in human genetics have been arguing for the pervasiveness of background selection, selection against deleterious alleles which affects nearby regions, for nearly a decade. In contrast, there are others who argue selective sweeps driven by positive selection are important in determining variation. Unlike the 1970s and 1980s these researchers don’t evince much acrimony, in part because the data keeps coming, and ultimately they’ll probably converge on the same position. And, the results may differ by species or taxon.

If you want a less technical overview than the paper, Kelley Harris has an excellent comment accompanying it. If you want to know what I mean by the Kern-Hahn era, it’s a joke due to the publication of The Neutral Theory in Light of Natural Selection.

Finally, some of you might wonder about the implications for demographic inference which preoccupies me so much on this weblog. In the big picture, it probably won’t change a lot, but it will be important for the details. So this is a step forward. That being said, the possibility of variable mutation rates and recombination rates across time and between lineages is also probably quite important.
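For readers who want to see what the conditioning step in the abstract above amounts to in practice, here is a minimal sketch. It keeps only SNPs in regions with recombination rate above 1.5 cM/Mb whose two alleles form a C↔G or A↔T pair. The tuple layout, the function name, and the toy records are hypothetical; this is not the authors' pipeline, just the stated filter written out.

```python
# Mutation types the abstract singles out: C<->G and A<->T.
UNBIASED_PAIRS = {frozenset("CG"), frozenset("AT")}

def keep_snp(ref, alt, recomb_rate_cm_per_mb, min_rate=1.5):
    """True if the SNP passes both conditions from the abstract.

    min_rate is the 1.5 cM/Mb threshold quoted above; ref and alt are
    single-letter alleles, and direction does not matter.
    """
    high_recombination = recomb_rate_cm_per_mb > min_rate
    unbiased_type = frozenset((ref, alt)) in UNBIASED_PAIRS
    return high_recombination and unbiased_type

# Toy records (id, ref, alt, local recombination rate); values are made up.
snps = [
    ("snp1", "C", "G", 2.3),
    ("snp2", "A", "G", 2.3),  # A<->G is not one of the two retained types
    ("snp3", "A", "T", 0.4),  # right type, but recombination rate too low
    ("snp4", "T", "A", 1.8),
]
kept = [snp_id for snp_id, ref, alt, rate in snps if keep_snp(ref, alt, rate)]
print(kept)
```

The obvious cost of a filter like this is that it throws away most of the genome, which is only affordable when you have whole-genome data in the first place.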

Open Thread – Brown Pundits

Filed under: Open Thread — Razib Khan @ 6:28 pm

Please keep the other posts on topic. Use this for talking about whatever you want to talk about.

How the Greeks came to be

Filed under: Greeks,History — Razib Khan @ 4:01 pm

Sing, Goddess, Achilles’ rage,
Black and murderous, that cost the Greeks
Incalculable pain, pitched countless souls
Of heroes into Hades’ dark,
And left their bodies to rot as feasts
For dogs and birds, as Zeus’ will was done.

Who are the Greeks? Where did they come from?

We have enough ancient DNA now to answer many of these questions. It seems that the largest component of Greek ancestry derives from the expansion of farmers out of Anatolia ~9,000 years ago. But at some point in the latter phases of prehistory, another wave of migrants pushed out from the east, with affinities to peoples as far away as Iran. And then during the Bronze Age, another pulse of migration arrived, likely correlated with the arrival of Greek-speaking peoples as such, the Mycenaeans. Finally, there is a fair amount of circumstantial evidence that the peregrinations of the pagan Slavs during Late Antiquity and the early Medieval period left their imprint on many Hellenes, in particular in the north of the country, around Salonika.

But that’s just genetics. What about culture? In terms of religion, Greek paganism is a composite. Zeus pater is clearly a standard Indo-European sky-god. Jupiter in Latin. Dyáuṣ Pitṛ́ for the ancient Aryans. In contrast, gods such as Athena seem to have synthetic, and at least partly pre-Indo-European origins. Finally, Dionysus was possibly an eastern import relatively late in prehistory.

Though the Greek language is definitely Indo-European, there are also extensive loanwords indicating an indigenous substrate. For example, words with the syllabic fragment nth, such as in Hyacinth, are likely native. The Greeks settled amongst peoples who had a long history of settled life, and had developed their own civilization.

The point is that it is probably not even wrong to say that the Greeks as we understand them came from elsewhere, or, that they were indigenous. To be Greek probably emerged in the period after 2500 BC, as Indo-Europeans mixed with the local cultures, and created something new. Autochthonous.

October 8, 2018

Brown Pundits Browncast

Filed under: Podcast — Razib Khan @ 8:58 pm

So at some point the Brown Pundits “browncast” (as opposed to brown caste) is a go. I’m not going to submit to iTunes or Stitcher until we have a podcast recorded and up.

Open Thread, 10/8/2018

Filed under: Open Thread — Razib Khan @ 8:03 pm

Paul Romer won the Nobel. Not a big surprise. David Warsh’s Knowledge and the Wealth of Nations: A Story of Economic Discovery is pretty good. I recommend it. I would read it in concert with Hive Mind: How Your Nation’s IQ Matters So Much More Than Your Own and A Farewell to Alms: A Brief Economic History of the World (Warsh wrote a negative review of the second book and likely would not be a fan of the first).

Analyses of Neanderthal introgression suggest that Levantine and southern Arabian populations have a shared population history. Bigger Yemeni data set. Yemeni and Levantine populations seem quite similar….

As you may not know Google+ was finally given an explicit sunset schedule. Google tried twice to tackle Facebook but failed both times. But it turns out that Facebook may never have a successor. A centralized social-graph has weaknesses, and younger cohorts seem to be creating segmentation. Their parents are on Facebook, so they have a nominal Facebook account. But the real action is on other platforms.

Life on the Dirtiest Block in San Francisco. Having drinks with friends at the top of hotels and high rise condominium complexes makes you forget that far below the homeless have come out and taken over the night.

Why most narrative history is wrong. First, this seems to be more about ‘popular’ history today, and the mainstream of past history. One reason contemporary academic history is so boring for most people is that it resists grand narrative temptation.

With that being said, this is more of an indictment of modern journalism.

Quantifying how constraints limit the diversity of viable routes to adaptation.

A Simulation-Based Evaluation of Total-Evidence Dating Under the Fossilized Birth-Death Process.

Expanded Pre-Implantation Genomic Testing.

Fudged statistics on the Iraq War death toll are still circulating today. Do you remember this debate more than ten years ago? I do. The very assertion of these numbers distorted the discourse. This was just a prefiguring of the media landscape today. It’s mostly propaganda.

Phylogeny, ancestors and anagenesis in the hominin fossil record.

The genetic relationship between female reproductive traits and six psychiatric disorders.

In case my Twitter account gets deleted, remember you can subscribe to my RSS or follow my Facebook page.

October 6, 2018

The derived SNP that causes dry earwax was not found in all non-Africans

Filed under: earwax,Population genetics,rs17822931 — Razib Khan @ 11:26 am

A new paper on Chinese genomics using hundreds of thousands of low-coverage genomes from NIPT screenings is making some waves. I’ll probably talk about the paper at some point. But I want to highlight the frequency of rs17822931 in Han Chinese. It’s pretty incredible how high it is.

Because the derived variant SNP, which is correlated with dry flaky earwax when present in homozygote genotypes, is also associated with less body odor, it has been studied extensively by East Asian geneticists. Basically, individuals who are homozygote for the ancestral SNP, which is the norm in Europe, the Middle East, and Africa, tend to have more body odor, and in societies and contexts where this is offensive these people are subject to more ostracism in East Asia as they are a minority (some of the studies in Japan were motivated by conscripts who elicited complaints from their colleagues).
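As a rough guide to how an allele frequency translates into the homozygote genotypes mentioned above, here is a minimal sketch assuming Hardy-Weinberg proportions (random mating, no selection). That is a simplifying assumption for illustration, and the frequencies in the loop are made-up placeholders, not values from the paper or from ALFRED.

```python
def genotype_proportions(q):
    """Expected genotype proportions for derived-allele frequency q.

    Assumes Hardy-Weinberg equilibrium. Returns a tuple of
    (ancestral homozygote, heterozygote, derived homozygote).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q

for q in (0.1, 0.5, 0.9):  # hypothetical derived-allele frequencies
    anc, het, der = genotype_proportions(q)
    print(f"q={q:.1f}: derived homozygotes ~ {der:.2f}, ancestral homozygotes ~ {anc:.2f}")
```

Because the homozygote fraction goes with the square of the allele frequency, a population has to be at quite a high frequency of the derived allele before the ancestral homozygotes become a small minority.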

The relatively low frequency in Guangxi is to be expected. This province was Sinicized only recently. As in, the last 500 years. And it still retains a huge ethnic minority population, and many of the Han in the province likely have that ancestry. But the question still arises: why do the Han have such a high frequency of rs17822931?

Here’s a plot of frequencies:

 

But the ALFRED database has more details. Sardinians, Somalis, Ethiopian Jews, and Dani from the New Guinea highlands all have very low proportions or none of the derived variant. The Ethiopian Jews are about ~40% West Eurasian, due to Middle Eastern agriculturalist ancestry. Groups like the Masai also have Middle Eastern agriculturalist ancestry. I think the low frequencies of the derived variant in the Middle East are due to migration from eastern Eurasia in the relatively recent past. The frequencies of the derived variant in Europe probably came with the Ancestral North Eurasian ancestry of the steppe people. In South and Southeast Asia the frequencies are indicative of balancing selection, even if there is no such selection, while in the New World the derived variant is at low, but appreciable frequencies.

As I mentioned in an earlier post, a 40,000 year old Siberian had the derived variant (heterozygote). I suspect the Basal Eurasians did not.

Likely male-mediated Indianization in Southeast Asia

Filed under: Indianization,Mainland Southeast Asia — Razib Khan @ 10:26 am
Pop             N     R1a1
Cambodian       125   7%
Balinese        551   2%
Southern Han    166   0%
Northern Han    65    0%
Miao            25    0%
Hui             25    17%
Sala            43    21%
Bo’an           44    25%
Dongxiang       47    32%

Black, Michael L., et al. “Genetic ancestries in northwest Cambodia.”

In the comments below a reader has pointed out that there are Y and mtDNA results for Cham people.

This Austronesian group was once dominant in what was termed Annam by the French, the central regions of coastal Vietnam between the deltas flanking it to the north and south (dominated by the Vietnamese and Khmer respectively). The Cham were a seafaring population and had extensive contacts with maritime Southeast Asia and the network of Austronesian peoples.

As such, the Cham were influenced by the currents of cultural change to their south, and by the early modern era many had become Muslims. But a minority resident in Vietnam retained their Hindu religious identity, and this reflects a deep current of Indianization which took root among them in the centuries before 1000 AD. The boundary between ancient Champa and Đại Việt was also a civilizational boundary, between the elite culture of India and China.

The commenter states:

As far as I can see, this sample of Chams from Binh Thuan Province, Vietnam does not exhibit any clear South Asian influence in its mtDNA. This contrasts starkly with the significant (18.6% to 32.2%) South Asian influence that is apparent in the Y-DNA of the male subset of the same sample

This seems right. As you can see above I’ve found plenty of evidence that R1a1a is found in Southeast Asia where it shouldn’t be. Notice that among northern groups in China R1a1a is pretty frequent too. Obviously from a different source, but the same general pattern. And in that case we have plenty of historical evidence of interaction with Indo-Europeans on the steppe.

I’m not very conversant in mtDNA. This paper argues that the Mon people of Thailand have some mtDNA affinities with India. I created this pivot table for readers to double-check (the “MO” populations are Mon).

The history of Southeast Asia, or perhaps more accurately the quasi-history of Southeast Asia since so many of the records are from China and elsewhere, indicates strong Indian influence in the period before 1000 AD. The standard model is that this is cultural diffusion. And by and large Southeast Asian peoples are mostly indigenous. But, a non-trivial minority of their ancestry is recent, but pre-colonial, gene flow from the Indian subcontinent. Additionally, the imprint is easier to see in the Y chromosome than the mtDNA. The legends of marriages between Indian Brahmins and native princesses in places like Cambodia probably do reflect something real in the dynamics of the early Indianization.

Related: Indic Civilization Came To Southeast Asia Because Indian People Came To Southeast Asia. Lots Of Them.

The population genetic structure of China (through noninvasive prenatal testing)

Filed under: China genetics,Han genetics,Population genetics,Population genomics — Razib Khan @ 10:03 am


This week a big whole genome analysis of China was published in Cell, Genomic Analyses from Non-invasive Prenatal Testing Reveal Genetic Associations, Patterns of Viral Infections, and Chinese Population History. The abstract:

We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.

In The New York Times write-up there is an interesting detail, “This study served as proof-of-concept, he added. His team is moving forward on evaluating prenatal testing data from more than 3.5 million Chinese people.” So what he’s saying is that this study with >100,000 individuals is a “pilot study.” Let that sink in.

The PCA at the top of the post is a bit busy, so I want to highlight the salient aspect. These results confirm that 5-10% of the ancestry of the Hui, Chinese speaking Muslims, is West Eurasian. The Uygur and Kazakh, on the left of the plot, are ~40%. The authors note that the Manchus overlapped almost perfectly with individuals sampled from Northern China. This is expected because by the end of the Ching dynasty most of the Manchus had been fully Sinicized, and in the 20th century fully assimilated. Recently due to an emphasis on “national minorities” and some privileges granted therein many people have identified as Manchu due to some ancestry who are in all other ways simply northern Han (the Manchu language is moribund).

The sections on particular adaptations which vary by region are not surprising. In books like The Retreat of the Elephants the slow, gradual, and inexorable expansion of the Chinese beyond the Yangzi basin is described in a way that makes it clear that southern diseases and climate were a major impediment. But through a process of acclimation, assimilation of local peoples, and adaptation, by 1000 AD the center of demographic gravity had shifted to the south.

There is a section of the text which I think will be falsified though:

After removing participants with 49bp read length and with sequencing error rate >0.00325, a principal component analysis of 45,387 self-reported Han Chinese from the 31 administrative divisions showed that the greatest differentiation of Han Chinese is along a latitudinal gradient (Figures S3E and S3F), consistent with previous studies (Chen et al., 2009, Xu et al., 2009). In contrast, there is, perhaps surprisingly, very little differentiation from East to West. This observation may be explained by the fact that a large proportion of the western Han populations in China are recent immigrants organized by the central government starting from 1949 when the People’s Republic of China was founded (Liang and White, 1996).

I don’t think there’s any need to make recourse to migration from 1949 and after. The argument in Guns, Germs, and Steel suffices: it’s just easier to move east or west along a line of latitude than north or south across latitudes. The people of the north eat noodles made from wheat, and the people of the south eat rice. This is a big cultural transition for peasants to make, and so it didn’t happen as often as moving to the coast, or inland. We have documented instances of mass migrations from adjacent provinces due to famine and political instability. In the 17th century conflicts resulted in the depopulation of Sichuan and the arrival of large numbers of people from Hunan and Hubei to the east.

The plot below is one of the more interesting ones from the paper. From left to right, private alleles found in the HapMap Utah whites also found in all individuals in a given province, and then just Han, and then private alleles to ethnic Telugu Indians (from South India) found in all individuals in a given province, and then just Han.


The first thing to notice is that there is a correlation between the Han and non-Han. This shouldn’t be surprising. Plenty of ethnic groups have become Han through acculturation and become demographically absorbed. This is probably truer in parts of the south than in the north, but southern Chinese ethnic minorities are genetically and culturally much more like the Han in the first place.

Private alleles shared with Northern Europeans (CEU) almost certainly have to do with the interaction sphere of the steppe pastoralists, which extends from the Carpathians to Mongolia. The relatively high frequency of R1a, and to a lesser extent R1b, among many Turkic/Central Asian peoples is a pretty good sign of where this West Eurasian ancestry comes from.

The Indian affinity is perhaps more interesting. To be honest I was surprised at the high affinity in Yunnan and Hainan. Tibet has strong cultural connections to India through its form of Buddhism. But it’s interesting that Qinghai, where many Tibetans also live, does not have the affinity with India. What’s going on in the other provinces? I suspect that the aboriginal peoples assimilated by the Han and other groups in this region probably had some distant connections to the non-West Eurasian ancestry in South Asia.

October 4, 2018

Obscurantism in the service of transformation

Filed under: Cultural History,Culture,philosophy — Razib Khan @ 11:27 pm


The paper, Ancient Admixture in Human History, was peculiar as far as genetics publications go in that it foregrounds particular abstruse statistical methods developed due to the stimulus of genome-wide variation data. The surfeit of genomic data has resulted in the emergence of many subtle and almost impenetrable works laced with formalisms which daunt most biologists. But given time and effort, these newer methods relying upon greater analytic sophistication are decipherable.

To illustrate what I’m talking about, consider Mathematical Models of Social Evolution. This is a book with a fair amount of formality, but the topic, culture, social change, are often considerations which we ruminate upon verbally.

I open up to page 238 (I literally opened a random page).

…According to this approximation, the altruistic gene will increase whenever

    \[ \frac{g}{c} > \frac{2n}{\Omega} \]

In intrademic models in which groups are formed at random, \(\Omega = 1\). In contrast, if groups were made up of full-sibs, \(\Omega = 2n\). This provides a natural scale on which to judge the effectiveness of interdemic selection. If \(\Omega\) is near one, interdemic group selection is no more effective than intrademic group selection with random group formation, which is to say, it cannot lead to the evolution of strong altruism. If \(\Omega\) is large, then interdemic group selection is effective.

On first blush, the passage can seem impenetrable. But most of the people reading this are probably not intimidated by mathematical formalism. Many of you will know what intrademic and interdemic selection are. Some of you who are more numerically oriented may test some values to develop an intuition. The point is that the formalism is not there to intimidate. It is meant to illuminate. It is there so individuals thinking on the same problem can have a crisp currency with which they can exchange ideas.
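If you do want to test some values, a few lines suffice. This is a minimal sketch of the quoted inequality only: g and c are treated simply as the two quantities in the ratio, n as a positive number, and the function name and example values are hypothetical.

```python
def altruism_increases(g, c, n, omega):
    """True when g/c > 2n/omega, the condition in the quoted passage."""
    return g / c > 2 * n / omega

n = 10  # hypothetical value
for omega in (1, 5, 2 * n):
    threshold = 2 * n / omega
    print(f"omega={omega:>2}: g/c must exceed {threshold:g}")

# With omega = 1 the ratio has to beat 2n; with omega = 2n it only has to beat 1.
print(altruism_increases(g=3, c=1, n=n, omega=1))
print(altruism_increases(g=3, c=1, n=n, omega=2 * n))
```

Playing with omega makes the passage's claim tangible: the bar that g/c has to clear falls from 2n all the way down to 1 as omega goes from the random-group case to the full-sib case.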

Another major reason that this sort of formalism exists is that it’s clear when you think someone is wrong. A problem with many verbal arguments is that they are unspecified or vague in such a way that you’re not even sure if you disagree or agree with your interlocutor. The point is to get somewhere. Coherency. Contingency. And cumulativeness.

Applying a mathematical theory derived from evolutionary biology to cultural and social change strikes many people as strange. But there’s a method to this madness. Theory with data can give birth to a better understanding of the processes which define our world. A description of reality.

In contrast, let me quote Noam Chomsky:

“What you’re referring to is what’s called “theory.” And when I said I’m not interested in theory, what I meant is, I’m not interested in posturing–using fancy terms like polysyllables and pretending you have a theory when you have no theory whatsoever. So there’s no theory in any of this stuff, not in the sense of theory that anyone is familiar with in the sciences or any other serious field. Try to find in all of the work you mentioned some principles from which you can deduce conclusions, empirically testable propositions where it all goes beyond the level of something you can explain in five minutes to a twelve-year-old. See if you can find that when the fancy words are decoded. I can’t. So I’m not interested in that kind of posturing. Žižek is an extreme example of it. I don’t see anything to what he’s saying. Jacques Lacan I actually knew. I kind of liked him. We had meetings every once in awhile. But quite frankly I thought he was a total charlatan. He was just posturing for the television cameras in the way many Paris intellectuals do. Why this is influential, I haven’t the slightest idea. I don’t see anything there that should be influential.”

Now let me open up the Alan Bass translation of Jacques Derrida’s Writing and Difference. From the bottom of page 91:

Therefore, there is a soliloquy of reason and a solitude of light. Incapable of respecting the Being and meaning of the other, phenomenology and ontology would be philosophies of violence. Through them, the entire philosophical tradition, in its meaning and at bottom, would make common cause with oppression and with the totalitarianism of the same. The ancient clandestine friendship between light and power, the ancient complicity between theoretically objective and technico-political possession….

Obviously, I can’t read French. But if I was reading a scientific textbook a translation wouldn’t matter. To be entirely frank when I read these sorts of works in the deconstructionist tradition I feel like I’m reading mantras, not analyses. Declarations of gurus and rabbis. Great ones to emulate.

These authors often like to “play” with language, and engage in a semantic game and lead you on a verbal wild goose chase. Some of them are also better with a turn of phrase and able to generate luxurious prose which pulls you along in an almost novelistic fashion. But reading a second time, often I have no more idea what’s really being said than on the first inspection.

Twenty years ago this was an academic discussion. I had long believed that some of my friends’ fixations with linguistic analysis and redefinition as the summum bonum of any intellectual were silly and useless, but I didn’t think they’d have a direct impact. No longer. This stuff matters. My friends are now tenured professors.

From Judith Butler’s 1988 Performative Acts and Gender Constitution: An Essay in Phenomenology and Feminist Theory, in Theatre Journal:

When Beauvoir claims that ‘woman’ is a historical idea and not a natural fact, she clearly underscores the distinction between sex, as biological facticity, and gender, as the cultural interpretation or signification of that facticity. To be female is, according to that distinction, a facticity which has no meaning, but to be a woman is to have become a woman, to compel the body to conform to an historical idea of ‘woman,’ to induce the body to become a cultural sign, to materialize oneself in obedience to an historically delimited possibility, and to do this as a sustained and repeated corporeal project….

Strip away the lexical obfuscation, and much of this is now taught in biology courses. Whether you agree with it or not is beside the point. This stuff is not just academic.

Chinese and Indian American population genetic structure

Filed under: Population genetics — Razib Khan @ 3:14 pm

In Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past David Reich makes the observation that India is a nation of many different ethnicities, while China is dominated by a single ethnicity, the Han. This is obviously true, more or less. Even today the vast majority of Indians seem to be marrying within their own communities, or jati.

Over the years I’ve collected many different genotypes of Americans of various origins who have purchased personal genomics kits, and given me their raw results. I decided to go through my collection and strip detailed ethnic labels and simply group together all those individuals from India, and China, who have had their genotypes done from one of the major services.

I suspect that these individuals are representative of “Indian Americans” and “Chinese Americans.” So what’s their genetic structure?

Here’s the variation of the consumers:

Merging with 1000 Genomes and some HGDP samples, here’s what I get:

Let’s zoom in on the Chinese:

Finally, ADMIXTURE:

In case you don’t know, the American Chinese community has been historically biased toward being mostly Cantonese. More recently, there have been migrants from Fujian. The Indian American distribution should be self-evident.

October 3, 2018

Are there any other readers from 2002?

Filed under: Open Thread — Razib Khan @ 10:44 pm

As some of you know, this blog started in early June of 2002. I just noticed that two people left comments who date from the summer of 2002…which means I have people here who have been reading what I’ve been writing for 70% of my adult life. I know people drop in and out, but are there any others?

Just curious.

Nomads, cosmopolitan predators, and peasants, xenophobic producers

Ten years ago when I read Peter Heather’s Empires and Barbarians, its thesis that the migrations and conquests of the post-Roman period were at least in part folk wanderings, where men, women, and children swarmed into the collapsing Empire en masse, was somewhat edgy. Today Heather’s model has to a large extent been validated. The recent paper on the Lombard migration, the discovery that the Lombards were indeed by and large genetically coherent as a transplanted German tribe in Pannonia and later northern Italy, confirms the older views which Heather attempted to resurrect. Additionally, the Lombards also seem to have been defined by a dominant group of elite male lineages.

Why is this even surprising? Because to a great extent, the ethnic and tribal character of the post-Roman power transfer between Late Antique elites and the newcomers was diminished and dismissed for decades. I can still remember the moment in 2010 when I was browsing books on Late Antiquity at Foyles in London and opened a page on a monograph devoted to the society of the Vandal kingdom in North Africa. The author explained that though the Vandals were defined by a particular set of cultural codes and mores, they were to a great extent an ad hoc group of mercenaries and refugees, whose ethnic identity emerged de novo on the post-Roman landscape.

In the next few years, we will probably get Vandal DNA from North Africa. I predict that they will be notably German (though with admixture, especially as time progresses). Additionally, I predict most of the males will be haplogroup R1b or I1. But the Vandal kingdom was actually one where there was a secondary group of barbarians: the Alans. It was Regnum Vandalorum et Alanorum. I predict that Alan males will be R1a. In particular, R1a1a-z93.

But this post is not about the post-Roman world. Rather, it’s about the Inner Asian forest steppe. The sea of grass, stretching from the Altai to the Carpathians. A new paper in Science Advances adds more samples to the story of the Srubna, Cimmerians, Scythians, and Sarmatians: Ancient genomes suggest the eastern Pontic-Caspian steppe as the source of western Iron Age nomads. The abstract is weirdly nonspecific, though accurate:

For millennia, the Pontic-Caspian steppe was a connector between the Eurasian steppe and Europe. In this scene, multidirectional and sequential movements of different populations may have occurred, including those of the Eurasian steppe nomads. We sequenced 35 genomes (low to medium coverage) of Bronze Age individuals (Srubnaya-Alakulskaya) and Iron Age nomads (Cimmerians, Scythians, and Sarmatians) that represent four distinct cultural entities corresponding to the chronological sequence of cultural complexes in the region. Our results suggest that, despite genetic links among these peoples, no group can be considered a direct ancestor of the subsequent group. The nomadic populations were heterogeneous and carried genetic affinities with populations from several other regions including the Far East and the southern Urals. We found evidence of a stable shared genetic signature, making the eastern Pontic-Caspian steppe a likely source of western nomadic groups.

The German groups which invaded the Western Roman Empire were agropastoralists. That is, they were slash and burn farmers who raised livestock. Though they were mobile, they were not nomads of the open steppe. Man for man the Germans of Late Antiquity had more skills applicable to the military life than the Roman peasant. This explains in part their representation in the Roman armed forces in large numbers starting in the 3rd century. But the people of the steppe, pure nomads, were even more fearsome. Ask the Goths about the Huns.

Whole German tribes, like the Cimbri, might coordinate for a singular migration for new territory, but for the exclusive pastoralist, their whole existence was migration. Groups such as the Goths and Vandals might settle down, and become primary producers again, but pure pastoralists probably required some natural level of predation and extortion upon settled peoples to obtain a lifestyle beyond marginal subsistence. Which is to say that some of the characterizations of Late Antique barbarians as ad hoc configurations might apply more to steppe hordes.

There has been enough work on these populations over the past few years to admit that various groups have different genetic characteristics, indicative of a somewhat delimited breeding population. But, invariably there are outliers here and there, and indications of periodic reversals of migration and interactions with populations from other parts of Eurasia.

Earlier I noted that Heather seems to have been correct that the barbarian invasions of the Roman Empire were events that involved the migration of women and children, as well as men. The steppe was probably a bit different. Here are the Y and mtDNA results for males from these data that are new to this paper:

Culture               MtDNA Haplogroup    Y Haplogroup
Late Sarmatian        U5b2b               R1b1a1a2?
Scythian              U5a2a1              R1b1a1a2?
Late Sarmatian        D4q                 R1b1a1a2
Scythian              J2b1a6              R1b1a1a2
Scythian              U5a1a1              R1b1a1a2
Scythian              U5b2a3              R1b1a1a2
Scythian              U4*                 R1b1a1a2
Scythian              U5a2b               R1b1a1a2
Cimmerian             H9a                 R1b1a
Srubno-alakulskaya    T2a1                R1a1a1?
Srubno-alakulskaya    J1c3a               R1a1a1
Srubno-alakulskaya    H                   R1a1a1
Srubno-alakulskaya    HV0a                R1a1a1
Srubno-alakulskaya    U5a1                R1a1a1
Srubno-alakulskaya    HV0a                R1a1a1
Late Sarmatian        T1a1                R1a1a
Cimmerian             C5c (50%)           Q1a1

I’m assuming you aren’t surprised. These steppe tribes seem to be defined by extended paternal lineage networks. The Srubna people are R1a1a1, as is dominant in Eastern Europe today. But an ancient Srubna male dating to 1800 BC was found to have the Asian variant of R1a1a1, found in South and Central Asia, not the one predominant among Slavic peoples.

Speaking of South Asians, there is some interesting discussion on this issue in the paper. I’ll quote a few sections:

The Bronze Age Srubnaya-Alakulskaya individuals from Kazburun 1/Muradym 8 presented genetic similarities to the previously published Srubnaya individuals. However, in f4 statistics, they shared more drift with representatives of the Andronovo and Afanasievo populations compared to the published Srubnaya individuals. Those apparently West Eurasian people lacked significant Siberian components (NEA and SEA) in ADMIXTURE analyses but carried traces of the SA component that could represent an earlier connection to ancient Bactria. The presence of an SA component (as well as finding of metals imported from Tien Shan Mountains in Muradym 8) could therefore reflect a connection to the complex networks of the nomadic transmigration patterns characteristic of seasonal steppe population movements….

There are two ways, not exclusive, that I can explain the “South Asian” component you find in some of the steppe individuals. First, the “South Asian” component is found in the Neolithic Iranian sample. And, you can see in another plot that the Scythians are enriched for West Asian ancestry in comparison to the Srubna. As noted above there was probably south to north migration of these Indo-European nomadic groups. So yes, just as with the East Asian ancestry which periodically appears, this is evidence of an “Inner Asian International.”

A second possibility though is that the South Asian ancestry is artifactual, and that it’s just emerging in ADMIXTURE because of shared ancestry between the Srubna and South Asians due to gene flow from the steppe into South Asia (and since South Asians have “Iranian farmer” ancestry it also pops up in the Iranian Neolithic sample).

The Srubna flourished between the 18th and 12th centuries BC. According to Wikipedia:

Philological and linguistic evidence indicates that the bulk of the Rigveda Samhita was composed in the northwestern region of the Indian subcontinent, most likely between c. 1500 and 1200 BC.

Mitanni Indo-Aryan is attested in Syria in 1380 BC.

In the centuries around 1500 BC it seems quite possible that there was an “Indo-Aryan Inner Asian International”, just as in the first millennium AD there emerged a Turkic International, and for more than a century after 1200 AD there was a Mongol International. In the north, the Indo-Aryans were absorbed by Iranian and Uralic peoples. In West Asia they didn’t have a major cultural impact, aside from introducing chariots. It is in India by happenstance that Indo-Aryan linguistic culture and aspects of their folk memory are preserved to this day.

This isn’t that amazing. Half of the speakers of Turkic languages are ethnic Turks, who live in Turkey. Anatolia genetically isn’t really very East Asian, though there is some of that. But the cultural heritage of the ancient Turks remains stronger there than in areas anciently inhabited by Turks, such as western Mongolia (where the people are genetically more like the original Turks were in the first millennium AD).

What’s the upshot here? I think that there is a spectrum of passivity and xenophobia in the modes of production outlined above. Sedentary peasant peoples are the most conservative and xenophobic. They are also the least warlike because their skill set is the least transferable to warfare. They specialize in production, not extortion.

Pure nomads are the least xenophobic and most open to various forms of cultural innovation. The Mongol horde rapidly expanded in the decades of Genghis Khan’s rule through assimilation of various Turkic and Tungusic peoples. Though Genghis Khan put his sons by his first wife Borte in all the major positions, competent individuals outside of his own family line were elevated to power and authority. We have enough evidence now that these social dynamics are also strongly driven by the reality of migrating males, who marry a variety of conquered peoples.

Though Mongols were religiously tolerant and relatively accepting of ethnic diversity so long as subordinate peoples did not rebel, they were fundamentally an extortive order where organized mass violence was always the weapon of first resort. They were almost certainly not atypical, but continuing an Inner Asian tradition which probably dates to the Bronze Age, and matured 1,000 years later with groups like the Scythians.

Agropastoralists, such as the people of Northern Europe during antiquity, were probably somewhere in between peasants and nomads. Not as xenophobic as peasants, but definitely more inward looking than the steppe nomads.

October 2, 2018

Almost no one is a genetic determinist except in your Communist imagination

Filed under: Behavior Genetics — Razib Khan @ 11:08 pm

Next summer I’m going to be giving a talk at the ISIR meeting. I’m a little bemused about this since, to be honest, I don’t talk much about behavior genetics and intelligence anymore.

Until August of 1998, I had rather conventional views for someone of my education and social background on psychometrics. Then I read Chris Chabris’ article in Commentary. From that, I began to conclude the “orthodoxy” that was presented in the elite media really wasn’t representative of what was going on in the field of psychometrics. It’s kind of like thinking that you get a balanced view of the Arab-Israeli conflict from reading Commentary.

Over the next few years, I read some books, review papers, and updated my views. Every few years I read a book or checked out a paper to see if anything had changed…and usually not to my eye as someone who is not in the field. About a decade ago I read What Is Intelligence?: Beyond the Flynn Effect. More recently I read Stuart Ritchie’s Intelligence and Richard Haier’s The Neuroscience of Intelligence. And other things here and there.

I’ll be reviewing Blueprint: How DNA Makes Us Who We Are, but I do wonder if it’s nothing more than an incremental improvement upon The Nurture Assumption: Why Children Turn Out the Way They Do.

Incrementalism isn’t a problem. I am a big fan of genomics. But its impact has been variable. And frankly in some fields less than you might think. I don’t believe it has changed our understanding of evolutionary process qualitatively (rather, it has allowed a finer-grained resolution to certain arguments around particular hypotheses). Educational attainment 3 is great. But does it change how heritable I think intelligence is in a qualitative sense? Not really. We already knew it was a heritable trait, and we’ve known it for a long time.

Contrast this with the advances in the field of ancient DNA and consumer genomics, two other interests of mine. These two fields did not really exist when this blog was founded in 2002 (I am aware some ancient DNA work was done earlier, but a few publications of dubious validity do not a field make). Ancient DNA has revolutionized our understanding of human demography in the past since 2010. Personal genomics is now a thing. It wasn’t before the middle 2000s and was very much niche before 2010. The first human genome cost $3 billion. Now you can get one for less than $1,000 as a consumer.

In behavior genetics, in deep ways, I’m not sure we’re much beyond where we were 20 years ago. Researchers have confirmed the suspicion by many that behaviors are polygenic quantitative traits whose variation is due to the additive effect of lots of genes of small effect. There were some people who argued that the genomic “missing heritability” meant that these traits weren’t heritable at all. But most people never accepted this frankly stupid view.

Where does this leave us? I recently expressed my frustration that we continue to have the same debate about “genetic determinism” that we have had for decades. Nothing ever changes. It’s always the same. Researchers who work in the field emphasize the importance of gene-environment interactions, norms of reaction, and the complex nature of these traits. Or, the modest heritability of the traits in question. They are so focused on these nuances that interesting facts such as the high fraction of nonshared environment get lost in the muddle.

Consider obesity. It’s actually a moderate to highly heritable trait. But traveling internationally, or looking at pictures from the past, makes it clear that there isn’t a blueprint for your final weight. Different people have different propensities based on common environments. Yes. But to say your weight is “determined” by your environment or your genes is kind of weird and “not even wrong.” It’s complicated. And yet less than 10% of dieters keep the weight off. There is something that feels inevitable, determined, about this, but it may not be genetic.

These are knotty issues that need unpacking. But having to portray yourself as a non-evil person who doesn’t think that genes are the one key to rule them all means that time is wasted on ass-covering that could be allocated to education.

Outside of media presentations that knock down a strawman, literally no one is a genetic determinist. Similarly, aside from Ash Sarkar and George Ciccariello-Maher almost no one called a Communist is literally a Communist. Most people called Nazis aren’t really Nazis. And so forth.

Of course, people can redefine things however they want. One of my readers in a fit of stupidity declared that they were an environmental determinist for thinking religious fervor was ~0% heritable and I was a genetic determinist for thinking it was ~50% heritable. This was such an act of blatant and sincere stupidity that it’s been seared in my memory 15 years later. Yes, people are that stupid when they haven’t thought deeply about something. Talk to them about anything. Awe-inspiringly stupid. It’s all of our super powers in the right moment and time.

Writing here in 2018, and thinking back to 2002, it doesn’t look good to me. Researchers have more results, more interesting findings. But they are having to get ahead of the same stupid charges, aspersions, and implications. Graduate schools are now dropping the GRE, in part because of an N = 29 study which confirms their prior beliefs. We learned a lot from the replication crisis, didn’t we?

The media doesn’t want to write up stories about the complex inscrutability of causality. The conditional expression of genetic effects on other parameters. The public is too stupid to think in a statistical manner because they were assigned trigonometry in secondary school. You’re either with them. Or against them. And your enemies, and they exist, take your instinct to play fair and address plausible concerns to kill you with a thousand small cuts of obfuscation and misrepresentation. Don’t blame them. They are the scorpion. To sting is their way.

(this won’t be the general thrust of my talk at ISIR!)

Open Thread – Brown Pundits

Filed under: Open Thread — Razib Khan @ 12:23 pm

Please keep the other posts on topic. Use this for talking about whatever you want to talk about.

How related should you expect relatives to be?

Filed under: Population genetics — Razib Khan @ 12:44 am

Like many Americans in the year 2018 I’ve got a whole pedigree plugged into personal genomic services. I’m talking from grandchild to grandparent to great-aunt/uncles. A non-trivial pedigree. So we as a family look closely at these patterns, and we’re not surprised at this point to see really high correlations in some cases compared to what you’d expect (or low).

This means that you can see empirically the variation between relatives of the same nominal degree of separation from a person of interest. For example, each of my children’s grandparents contributes 25% of their autosomal genome without any prior information. But I actually know the variation of contribution empirically. For example, my father is enriched in my daughter. My mother is in my sons.

The same principle applies to siblings. Though they should be 50% related on their autosomal genome, it turns out there is variation. I’ve seen some papers with large data sets (e.g., 20,000 sibling pairs) which give a standard deviation of 3.7% in relatedness. But what about other degrees of relation?

I didn’t find empirical data on that (imagine assembling a dataset with large numbers of known third cousins…perhaps in Iceland), but I did find this paper, Variation in actual relationship as a consequence of Mendelian sampling and linkage that was useful. The authors modeled the expectation and variance (and so standard deviation) of identity by descent, genomic relatedness. One of their models gives 3.84% standard deviation for siblings, so that seems pretty close to the empirical mark. Here is a table I put together from a subset of their results:

Relationship                 Relatedness    Standard Dev
Parent-child                 0.5            0
Full sibling                 0.5            0.0384
Grandparent-Grandchildren    0.25           0.0251
Uncle-Aunt/Nephew-Niece      0.25           0.0251
Cousin                       0.125          0.0241
2nd Cousin                   0.0312         0.0117
3rd Cousin                   0.0078         0.0054
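
The expected relatedness values in the table can be reproduced with simple pedigree arithmetic. Here is a minimal sketch in Python, assuming no inbreeding: each connecting path through a common ancestor contributes one half raised to the number of meioses on that path. The function name and the dictionary are my own illustrative choices, not anything from the paper.

```python
def expected_relatedness(meioses: int, paths: int) -> float:
    """Expected autosomal sharing: `paths` connecting paths through common
    ancestors, each one `meioses` generations long (no inbreeding assumed)."""
    return paths * 0.5 ** meioses

# (meioses per path, number of connecting paths) -- illustrative values
RELATIONSHIPS = {
    "Parent-child": (1, 1),
    "Full sibling": (2, 2),             # two shared parents
    "Grandparent-grandchild": (2, 1),
    "Uncle-aunt/nephew-niece": (3, 2),  # two shared ancestors
    "Cousin": (4, 2),
    "2nd cousin": (6, 2),
    "3rd cousin": (8, 2),
}

for name, (meioses, paths) in RELATIONSHIPS.items():
    print(f"{name:26s} {expected_relatedness(meioses, paths):.4f}")
```

This only gives the expectation. The standard deviations in the table come from the authors’ modeling of Mendelian sampling and linkage, which this sketch does not attempt.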

The distribution of relatedness among siblings seems about normal. So there are individuals who are less than 40% related to their “full-sibling” while others are more than 60% related. Notice that when it comes to third cousins the variation in expected relatedness is in the same range as expected relatedness. Some “3rd cousins” won’t share any genomic relatedness as defined by identity by descent from recent ancestors.
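
To get a rough sense of how rare those sibling extremes are, here is a small Python sketch. It assumes sibling relatedness is exactly normal with the mean and standard deviation from the table, which is only an approximation, and the helper name is hypothetical. It also prints the ratio of standard deviation to mean for third cousins. I don’t use the normal curve for third cousins, since relatedness can’t go below zero.

```python
import math

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative probability of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Full siblings: values taken from the table above
SIB_MEAN, SIB_SD = 0.5, 0.0384
below_40 = normal_cdf(0.40, SIB_MEAN, SIB_SD)        # fraction under 40% related
above_60 = 1.0 - normal_cdf(0.60, SIB_MEAN, SIB_SD)  # fraction over 60% related

# Third cousins: spread relative to the expectation
cv_third_cousin = 0.0054 / 0.0078

print(f"siblings < 40% related: {below_40:.4%}")
print(f"siblings > 60% related: {above_60:.4%}")
print(f"3rd cousin SD / mean:   {cv_third_cousin:.2f}")
```

Under that assumption each sibling tail is a fraction of a percent of pairs, so these are unusual but real cases in a data set of tens of thousands of sibling pairs.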

Related: How much of your genome do you inherit from a particular grandparent?
