Razib Khan One-stop-shopping for all of my content

October 27, 2018

Laws of engineering are meant to be broken

Filed under: Genomics — Razib Khan @ 8:54 pm

A reader pointed out a very interesting passage in Richard Dawkins’ The Greatest Show on Earth: The Evidence for Evolution on the future possibilities of genome sequencing. Since the book was published in the middle of 2009, it is quite possible the passage was written in 2008, or even earlier.

Unfortunately for Dawkins’ prognostication track-record, but fortunately for science, he was writing at the worst time to make a prediction:

…the doubling time [data produced for a given fixed input] is a bit more than two years, where the Moore’s Law doubling time is a bit less than two years. DNA technology is intensely dependent on computers, so it’s a good guess that Hodgkin’s Law is at least partly dependent on Moore’s Law. The arrows on the right indicate the genome sizes of various creatures. If you follow the arrow towards the left until it hits the sloping line of Hodgkin’s Law, you can read off an estimate of when it will be possible to sequence a genome the same size as the creature concerned for only £1,000 (of today’s money). For the genome the size of yeast’s, we need to wait only till about 2020. For a new mammal genome…the estimated date is just this side of 2040

Obsolete plot from The Greatest Show on Earth

The cost for a sequence here is somewhat fuzzy. The first assembly of a genome sequence of an organism is much more difficult than subsequent alignments of later organisms (though more in computation than in the sequencing). But, the upshot is that Dawkins was writing when “Hodgkin’s Law” was collapsing. From 2008 to 2011 Moore’s Law was destroyed by the sequencing revolution pushed forward by Illumina.
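To see why the choice of doubling time matters so much for a forecast like the one quoted above, here is a minimal sketch of the extrapolation. The function name and the example inputs are my own illustrative assumptions; none of the numbers are read off Dawkins’ plot.

```python
import math


def years_until_target(current_cost, target_cost, doubling_time_years):
    """Years until cost falls to the target, assuming the data produced per
    unit cost doubles every `doubling_time_years` (so cost halves on the
    same schedule). Returns 0.0 if the target is already met."""
    if current_cost <= target_cost:
        return 0.0
    halvings_needed = math.log2(current_cost / target_cost)
    return halvings_needed * doubling_time_years


# Hypothetical scenario: the cost has to fall a million-fold.
# Compare a slow "Hodgkin's Law" style doubling time with a much faster one.
for doubling_time in (2.2, 0.5):
    wait = years_until_target(1_000_000, 1, doubling_time)
    print(f"doubling every {doubling_time} years -> wait about {wait:.1f} years")
```

The wait scales linearly with the doubling time, so a forecast made just before the doubling time collapses can be off by decades.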

Though you can get a $1,000 consumer human sequence today, the reality is that this is for 30× coverage. For lower coverage, which means you aren’t as sure of the validity of any given variant, the price drops rapidly. And for the type of evolutionary questions Dawkins is interested in, the coverage needed is far lower than 30× (you probably want to get a larger number of samples than a single high-quality sample).

October 24, 2018

The crash of the cost of genome sequencing

Filed under: Genomics — Razib Khan @ 10:24 am

It’s been a wild 10 years. There’s a reason that data compression companies are a big thing in genomics now.

October 23, 2018

Reflections on ASHG Meeting 2018

Filed under: ASHG,Genetics,Genomics,Illumina — Razib Khan @ 10:17 pm

Another meeting of the American Society of Human Genetics has come and gone. I’ve been going since 2012, so I want to post some observations on how things have changed. This is a big conference: attendance has grown from fewer than 1,000 people in the late 1970s to nearly 10,000 today.

First, more genomics, less genetics.

The meeting dates to the late 1940s, and originally focused on the classical genetic analysis of human characteristics. Consider the pedigree one might find in a medical text.

Over the past generation more and more of the presentations and posters focus on genomics, surveys of the whole totality of our DNA sequence. This is where medicine and human genetics more generally is moving in any case.

Vendors such as Illumina loom large, but the firehose of data is so powerful that compression companies also arrive at ASHG. In other words, ASHG is a combination of a science, medical, and tech conference.

Second, a major shift in focus outside of traditional European study populations.

ASHG foregrounded the focus on Africa and other non-European regions to highlight the importance of the capturing of global genetic variation. A fair number of presentations and posters were on this topic, as well as a series of plenary talks.

One thing I’ve noticed is that many talks and posters now present data and results which have been posted as preprints. In past years a lot of novel and new results were first presented at the conference, but now the meetings seem to be more like a halfway point between posting the preprint and the publication of the final paper. This means that networking and career development have become as important as the science itself.

Probably the most notable result that hasn’t been posted as a preprint was the first robust signals of association between genetic variations and homosexual orientation in men. Though there is a history of these reports, this one is clearly a case where the authors went through all the statistical checks to make sure these are true hits. Some in the audience reacted negatively, but the research group was really careful.

Exciting times in the world of genetics and genomics. Very excited for what 2019 brings.



Reflections on ASHG Meeting 2018 was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

September 18, 2018

On the whole genomics will not be individually transformative…for now

Filed under: Crispr,Genomics,Personal Genetics,Personal Genome,Personal genomics — Razib Khan @ 4:51 pm

A new piece in The Guardian, ‘Your father’s not your father’: when DNA tests reveal more than you bargained for, belongs to one of the two major genres in writings on personal genomics in the media right now (there are exceptions). First, there is the genre where genetics doesn’t do anything for you. It’s a waste of money! Second, there is the genre where genetics rocks our whole world, and it’s dangerous to one’s own self-identity. And so on. Basically, the two peaks in this field of journalism are the banal and the sinister.

In response to this I stated that for most people personal genomics will probably have an impact somewhere in the middle. To be fair, someone reading the headline of the comment I co-authored in Genome Biology, Consumer genomics will change your life, whether you get tested or not, may wonder at the seeming contradiction.

But it’s not really there. On the aggregate social level genomics is going to have a non-trivial impact on health and lifestyle. This is a large proportion of our GDP. So it’s “kind of a big deal” in that sense. But, for many individuals the outcomes will be quite modest. For a small minority of individuals there will be real and important medical consequences. In these cases the outcomes are a big deal. But for most people genetic dispositions and risks are diffuse, of modest effect, and often backloaded in one’s life. Even though it will impact most of society in the near future, its touch will be gentle.

An analogy here can be made with BMI, or body-mass-index. As an individual predictor and statistic it leaves a lot to be desired. But, for public health scientists and officials aggregate BMI distributions are critical to get a sense of the landscape.

Finally, this is focusing on genomics where we read the sequence (or get back genotype results). The next stage that might really be game-changing is the write revolution. CRISPR genetic engineering. In the 2020s I assume that CRISPR applications will mostly be in critical health contexts (e.g., “fixing” Mendelian diseases), or in non-human contexts (e.g., agricultural genetics). Like genomics the ubiquity of genetic engineering will be kind of a big deal economically in the aggregate, but it won’t be a big deal for individuals.

If you are a transhumanist or whatever they call themselves now, one can imagine a scenario where a large portion of the population starts “re-writing” themselves. That would be both a huge aggregate and individual impact. But we’re a long way from that….

September 14, 2018

Sequence them all and let God sort it out!

Filed under: Genomics — Razib Khan @ 11:14 am

Researchers reboot ambitious effort to sequence all vertebrate genomes, but challenges loom:

In a bid to garner more visibility and support, researchers eager to sequence the genomes of all vertebrates today officially launched the Vertebrate Genomes Project (VGP), releasing 15 very high quality genomes of 14 species. But the group remains far short of raising the funds it will need to document the genomes of the estimated 66,000 vertebrates living on Earth.

The project, which has been underway for 3 years, is a revamp and renaming of an effort begun in 2009 called the Genome 10K Project (G10K), which aimed to decipher the genomes of 10,000 vertebrates. G10K produced about 100 genomes, but they were not very detailed, in part because of the cost of sequencing. Now, however, the cost of high-quality sequencing has dropped to less than $15,000 per billion DNA bases…

Funding remains an obstacle. To date, the VGP has raised $2.5 million of the $6 million needed to sequence a representative species from each of the 260 major branches of the vertebrate family tree. To reach the goal of all 66,000 vertebrates will require about $600 million, Jarvis says.
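As a quick sanity check on the figures quoted above, here is a back-of-the-envelope sketch of what the budgets imply per genome. The helper name is mine; the dollar amounts and species counts come from the quoted article.

```python
def cost_per_genome(total_budget_usd, n_genomes):
    """Average budget available per genome, in US dollars."""
    return total_budget_usd / n_genomes


# First phase: one representative species from each of 260 major branches.
phase_one = cost_per_genome(6_000_000, 260)

# Full goal: all of the estimated 66,000 vertebrate species.
all_vertebrates = cost_per_genome(600_000_000, 66_000)

print(f"phase one: about ${phase_one:,.0f} per species")
print(f"all vertebrates: about ${all_vertebrates:,.0f} per species")
```

The full-scale estimate works out to a noticeably lower per-genome figure than the first phase, which suggests the projection already assumes costs will keep falling.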

Though a lot of the details are different (sequencing vs. genotyping, vertebrates vs. humans), many of the general issues that David Mittelman and I brought up in our Genome Biology comment, Consumer genomics will change your life, whether you get tested or not, apply. That is, to some extent this is an area of science where technology and economics are just as important as science in driving progress.

I remember back in graduate school that people were talking about sequencing hundreds of vertebrates. But even in the few years since then, the landscape has shifted. I’m so little a biologist that I actually didn’t know there were only ~66,000 vertebrate species!

And yet this brings up a reasonable question from many scientists who came up in an era of more data scarcity: what are the questions we’re trying to answer here?

Science involves people. It’s not an abstraction. Throwing a whole lot of data out there does not mean that someone will be there to analyze it, or that we’ll get interesting insights. To be frank, the original Human Genome Project should probably tell us that, as its short-term benefits were clearly oversold.

In relation to how cheap data storage is and the declining price point of sequencing, I think my assertion that a genome, a sequence, is not a depreciating asset still holds. There is the initial cost of sequencing and assembling and the long term cost of storage, but these are small potatoes. The bigger considerations are the salaries of scientific labor and the opportunity costs. Sequencing tens of thousands of genomes may not get us anywhere, but really we’re not going to lose that much.

Ultimately I side with those who believe that the existence of the data itself will change the landscape of possible questions being asked, and therefore generate novel science. But it’s pretty incredible to even be debating this issue in 2018 of sequencing all vertebrates. That’s something to reflect on.

May 9, 2018

The “X” in the sex chromosome

Filed under: Genetics,Genomics,mothers-day,science — Razib Khan @ 3:48 pm

There are ~3 billion base pairs in the human genome. Of that ~5% are in the X chromosome. The X is fully functional, unlike the famously hamstrung Y. It harbors one of the longest genes in the human genome, DMD, at 2,300,000 base pairs. In contrast, the human Y chromosome only has 72 protein coding genes! (it’s perhaps no surprise that, aside from sex determination, many of these genes are involved in things such as spermatogenesis)

And yet it is the Y chromosome which gets full treatment in popular science books. Like the C student who receives praise for a B-, the Y chromosome is given high marks simply for doing a few things here and there, most especially its role in driving the emergence of biological males. But the reality is that males would not be viable if it wasn’t for the X.

Can you see that it says 74?

Because the Y chromosome is so handicapped, filled with repetitive “junk DNA,” the heavy-lifting is shifted onto the single X that males carry. Though the Y is what makes males male, the X is what keeps males alive.

Anyone familiar with sex-linked characteristics knows this. Red-green color blindness is found in 8 percent of human males and 0.6 percent of human females. Many more women are carriers of color blindness than are color blind themselves.

The genes responsible for detection of some colors are found on the X chromosome, and are subject to high mutation rates. If a female has a broken copy she usually has a fallback in a functional second copy. She’s a carrier. In contrast, because males have only one X chromosome (inherited from their mother), they don’t have a backup. If a color-vision gene on the X chromosome is broken, then they’re out of luck when it comes to perceiving the full vibrancy of the world.

In other words, the male X chromosome does not possess recessive traits. All traits express due to the state of the single copy of the gene determining the trait. Every mutation on the X chromosome can potentially produce a mutant that will be exposed to natural selection.
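The color blindness figures above are roughly what a simple one-locus model predicts. The sketch below assumes a single X-linked recessive allele, random mating, and Hardy-Weinberg proportions in females; that is a simplification (red-green color blindness involves more than one gene), and the function name is my own.

```python
def x_linked_recessive(q):
    """Expected frequencies for an X-linked recessive allele at frequency q.

    Males carry one X, so the affected fraction equals q.
    Females carry two, so they are affected at q^2 and carriers at 2pq.
    """
    p = 1.0 - q
    return {
        "affected_males": q,
        "affected_females": q * q,
        "carrier_females": 2.0 * p * q,
    }


# Use the male frequency from the text (8 percent) as the allele frequency.
result = x_linked_recessive(0.08)
for label, value in result.items():
    print(f"{label}: {value:.2%}")
```

With q = 0.08 the model gives about 0.64 percent affected females, close to the 0.6 percent quoted, and about 14.7 percent carriers, which is why carriers far outnumber color blind women.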

Neanderthal-modern human hybrid

This results in some interesting evolutionary quirks when it comes to how natural selection shapes the genome and drives adaptation within populations and speciation between them. Crosses between different species can leave hybrids infertile. In mammals this often happens in males because mutations on the X chromosome can interfere with proper reproductive development. Selection against the genes of other species then happens because males can’t produce offspring.

Studies of Neanderthal admixture confirm this — there is far less Neanderthal ancestry on the X chromosome than across the rest of the genome. There is strong selection against Neanderthal variants in males, because these genes work less well with the rest of the modern human genome.

A wife of Genghis Khan

But the X chromosome is not distinctive just in terms of natural selection. As two out of three X chromosomes in any population are found in females, its genetic history will be biased toward that sex. Differences between the X chromosome and the non-sex genome can tell us about differences in the histories of men and women.

For example historically many more of the female ancestors of admixed people of the New World tended to be non-European, whether it was indigenous or African. As such, the genetic profile of the X chromosome in terms of similarity to worldwide variation would be different from the non-sex chromosomes, because those come equally from the father and mother. This is exactly what we see. There is less European ancestry on the X chromosome.

More generally mating systems such as polygyny (men having multiple female partners) result in far fewer males than females who contribute to future generations. Among Mongols during the era of Genghis Khan, a small number of males descended from Genghis and his Mongol horde had children with numerous women. Because X chromosomes tend to be found in women, more of whom are reproducing, they will be more diverse than non-sex chromosomes (where a few men contribute half the genes), while the Y chromosome will be the least diverse of all (where only a few men contribute genetic variation).
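The diversity ordering described above can be sketched with the textbook effective population size formulas for an idealized population with unequal numbers of breeding males and females. This is only an illustration: the breeding counts are hypothetical, and real diversity also depends on mutation rates and demographic history.

```python
def effective_sizes(n_males, n_females):
    """Idealized effective population sizes given breeding males and females.

    Autosomes: 4*Nm*Nf / (Nm + Nf)
    X:         9*Nm*Nf / (4*Nm + 2*Nf)
    Y:         Nm / 2
    """
    nm, nf = float(n_males), float(n_females)
    return {
        "autosome": 4 * nm * nf / (nm + nf),
        "x": 9 * nm * nf / (4 * nm + 2 * nf),
        "y": nm / 2,
    }


# Hypothetical breeding populations: equal sexes vs. strong polygyny.
equal = effective_sizes(100, 100)
skewed = effective_sizes(10, 100)

print("equal sex ratio:", equal)
print("strong polygyny:", skewed)
```

With equal numbers the X sits at three quarters of the autosomal value. In this model the X only overtakes the autosomes once breeding females outnumber breeding males by more than seven to one, while the Y is always the lowest.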

Men have only one X chromosome, but the one they have is genetically essential to them. X chromosomes are not exclusive to women, but for all males they are the singular legacy of their mothers. Because of this bias the X can shed light on the history of the women of our species, while the uniqueness of inheritance of the X chromosome may even extend to driving the emergence of our species.



The “X” in the sex chromosome was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

April 25, 2018

DNA, from genetics to genomics

Filed under: DNA,Genetics,Genomics,science — Razib Khan @ 11:59 am

In the early 1950s scientists established that the molecular structure of DNA was a double helix. They had discovered the physical substrate of heredity. With this discovery the field of molecular genetics was born (and eventually a Nobel Prize given!).

And yet we also know that Gregor Mendel discovered the laws of heredity, the “law of segregation” and the “law of independent assortment”, nearly a century before the discovery of DNA.

It was literally the product of a garden.

The mature field of genetics itself developed fifty years before the discovery of the structure of DNA, as a host of scientists stumbled upon Mendelian insights simultaneously. Most were biologists who worked with plants, flies, or even algebra — no need for a powerful microscope or structural models of molecules.

Though DNA has been the key to many of the discoveries of the past fifty years, it is important to remember that the field of genetics is predicated on an abstract understanding of how inheritance works across pedigrees, as opposed to the biophysical basis of that transmission. Before DNA, before chromosomes, what Mendel and his heirs understood is that inheritance occurs through a process where discrete units of heredity, “genes”, are passed down from generation to generation.

These genes usually come in two copies, ‘alleles,’ for many organisms.

Recessive expression patterns of a trait, where parents do not express a characteristic found in their offspring, become comprehensible when a Mendelian model is adopted. Prior to this many had an intuitive “blending” understanding of inheritance, where the characteristics of the parents mixed together to produce offspring. The ultimate problem with blending inheritance is that it had difficulty in explaining how variation persisted over time. That problem was solved by the Mendelian insight that genetic variation never disappeared…it simply rearranged itself every generation!

Genetics was born on the backs of Drosophila

Between the reemergence of Mendelian thought around 1900 and the discovery of DNA in the 1950s much research occurred in the field of genetics. The Neo-Darwinian Synthesis built upon the mathematical foundations of population genetics, which took the Mendelian framework and formalized and extended it, to create a model of evolutionary biology for the 20th century. Medical geneticists began to understand the patterns of inheritance of rare diseases in humans with the aim of preventing illness. Those researchers working with fruit flies discovered many of the phenomena which define modern genetics, such as recombination. Finally, biochemists established that heredity and nucleic acids were intimately connected.

Just as an understanding of the discrete basis of inheritance in a Mendelian framework opened up the systematic scientific study of heredity, so the understanding of the double helical structure of DNA paved the way for the molecular revolution of the second half of the 20th century, and the genomic revolution of the 21st. An understanding of DNA as the mode of inheritance allowed for the development of techniques that traced transmission of variation at the level of genes themselves, as opposed to expressed traits.

Illumina sequencing machine

And while in the 20th century we spoke of genetics, and specific genes, today we speak of genomes and the whole set of genes organisms possess. That revolution can not be understood without the knowledge of DNA as the mode of inheritance. If classical Mendelian genetics is pattern recognition across pedigrees, 21st century genomics is a synthesis of classical genetics, post-DNA era biophysics, and cutting-edge computing. Genomics is as much engineering as it is science; and “big data” as much as information theory.

The understanding of DNA created the world where genetics transformed itself from an esoteric science of probabilities, to a mass market product of possibilities.

Classical genetics tells you that your relatedness to your brother or sister is expected to be 0.50. Modern genomics might tell you that your relatedness to your brother or sister is shared across 46.24% of your genome. A fuzzy probability becomes a crisp reality. As a science, genetics can be imagined without DNA. It was born and matured decades before we understood the importance of the double helix, but as a part of our lives, one can’t imagine genetics without DNA.
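The gap between the expected 0.50 and a realized figure like 46.24% comes from the randomness of inheritance. Here is a toy simulation of that spread. It assumes the genome behaves like a hypothetical number of independently inherited segments (80 here, chosen only for illustration), which is a crude stand-in for real recombination.

```python
import random


def sibling_sharing(n_segments, rng):
    """Fraction of parental copies that two full siblings inherited in common.

    For every segment and each parent, both siblings independently receive
    one of that parent's two copies; they match half the time on average.
    """
    shared = 0
    for _ in range(n_segments):
        for _parent in range(2):
            if rng.randrange(2) == rng.randrange(2):
                shared += 1
    return shared / (2 * n_segments)


rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = [sibling_sharing(80, rng) for _ in range(2000)]
mean_sharing = sum(pairs) / len(pairs)

print(f"mean sharing across simulated pairs: {mean_sharing:.3f}")
print(f"range: {min(pairs):.3f} to {max(pairs):.3f}")
```

The average across many sibling pairs sits near one half, but individual pairs scatter around it, which is the kind of pair-specific number that genomics can report and a pedigree cannot.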



DNA, from genetics to genomics was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

October 28, 2017

Apes just being apes

Filed under: Genomics — Razib Khan @ 12:10 am

A while back I made fun of bonobos and chimpanzees for being kind of losers, looking across at each other from either side of the Congo river for ~1.5 million years, the time elapsed since their divergence. I finally ended up reading the paper from last year, Chimpanzee genomic diversity reveals ancient admixture with bonobos, which reported complex population history between these two species. In other words, “they got it on”.

The key was a reasonable sample size of N=40 and high coverage genomes (>20x), to give them the amount of information necessary to have the power to detect admixture. If you aren’t human and have a reasonable size genome, and all mammals do, get to the back of the line. But the Pan‘s turn finally arrived.

The paper’s primary result is that over the past few hundred thousand years there have been reciprocal gene flow events of small, but detectable, magnitude between chimpanzees and bonobos. Naturally, there was some geographic specificity here, in that chimpanzees from far West Africa lack much evidence of this while those from Central Africa have a great deal. The admixture is directly proportional to proximity to the bonobo range.

To obtain the result they initially focused on high-frequency bonobo-derived alleles that were at low to moderate frequencies in chimpanzees. There was a notable excess for this class among Central African chimpanzees. And, these alleles seem to have introgressed recently.

I suppose the major takeaway is that hominids do it like they do it on the Discovery Channel.

October 22, 2017

Selection swimming against the genomic tide

Filed under: Africa Genetics,Africa Genomics,Genetics,Genomics — Razib Khan @ 1:32 pm

One of the major issues that confuses people is that the distribution of a trait or gene is often only weakly correlated with overall phylogeny and the rest of the genome.

To give a strange but classic example, the MHC loci are subject to strong balancing selection. This means that novel alleles do not substitute and replace ancestral alleles. Substitution of this sort results in “lineage sorting,” so that when you look at chimpanzees and humans you can see many polymorphic loci where all humans carry one variant and all chimpanzees the other. In contrast at the MHC loci there is frequency-dependent selection for rare variants, so the normal cycling process does not occur. Humans and chimpanzees overlap quite a bit on MHC, and any given human may have a more similar profile to a given chimpanzee than another human.

There are 19,000 human genes. Of 3 billion base pairs, only ~100 million are polymorphic on a worldwide scale (using some liberal definitions). There are lots of unique stories to tell here.

A new preprint, Inferring adaptive gene-flow in recent African history, illustrates how certain genes with functional significance may differ from genome-wide background. The authors find that among the Fula (Fulani) people of West Africa there has been introgression from a Eurasian mutation that confers lactase persistence. The area of the genome around this gene is much more Eurasian than the rest of the genome. In contrast, the area around the Duffy allele is much less Eurasian. The variation in this locus is related to malaria resistance. Finally, in other African populations, they found gene flow of MHC variants.

None of this is entirely surprising, though the authors apply novel haplotype-based methods which should have wider utility.

September 10, 2017

Quantitative genomics, adaptation, and cognitive phenotypes

The human brain utilizes about ~20% of the calories you take in per day. It’s a large and metabolically expensive organ. Because of this fact there are lots of evolutionary models which focus on the brain. In Catching Fire: How Cooking Made Us Human Richard Wrangham suggests that our need for calories to feed our brain is one reason we started to use fire to pre-digest our food. In The Mating Mind Geoffrey Miller seems to suggest that all the things our big complex brain does allows for a signaling of mutational load. And in Grooming, Gossip, and the Evolution of Language Robin Dunbar suggests that it’s social complexity which is driving our encephalization.

These are all theories. Interesting hypotheses and models. But how do we test them? A new preprint on bioRxiv is useful because it shows how cutting-edge methods from evolutionary genomics can be used to explore questions relating to cognitive neuroscience and psychopathology, Polygenic selection underlies evolution of human brain structure and behavioral traits:

…Leveraging publicly available data of unprecedented sample size, we studied twenty-five traits (i.e., ten neuropsychiatric disorders, three personality traits, total intracranial volume, seven subcortical brain structure volume traits, and four complex traits without neuropsychiatric associations) for evidence of several different signatures of selection over a range of evolutionary time scales. Consistent with the largely polygenic architecture of neuropsychiatric traits, we found no enrichment of trait-associated single-nucleotide polymorphisms (SNPs) in regions of the genome that underwent classical selective sweeps (i.e., events which would have driven selected alleles to near fixation). However, we discovered that SNPs associated with some, but not all, behaviors and brain structure volumes are enriched in genomic regions under selection since divergence from Neanderthals ~600,000 years ago, and show further evidence for signatures of ancient and recent polygenic adaptation. Individual subcortical brain structure volumes demonstrate genome-wide evidence in support of a mosaic theory of brain evolution while total intracranial volume and height appear to share evolutionary constraints consistent with concerted evolution…our results suggest that alleles associated with neuropsychiatric, behavioral, and brain volume phenotypes have experienced both ancient and recent polygenic adaptation in human evolution, acting through neurodevelopmental and immune-mediated pathways.

The preprint takes a kitchen-sink approach, throwing a lot of methods of selection at the phenotype of interest. Also, there is always the issue of cryptic population structure generating false positive associations, but they try to address it in the preprint. I am somewhat confused by this passage though:

Paleobiological evidence indicates that the size of the human skull has expanded massively over the last 200,000 years, likely mirroring increases in brain size.

From what I know human cranial sizes leveled off in growth ~200,000 years ago, peaked ~30,000 years ago, and have declined ever since then. That being said, they find signatures of selection around genes associated with ‘intracranial volume.’

There are loads of results using different methods in the paper, but I was curious to note that schizophrenia had hits for ancient and recent adaptation. A friend who is a psychologist pointed out to me that when you look within families “unaffected” siblings of schizophrenics often exhibit deviation from the norm in various ways too; so even if they are not impacted by the disease, they are somewhere along a spectrum of ‘wild type’ to schizophrenic. In any case in this paper they found recent selection for alleles ‘protective’ of schizophrenia.

There are lots of theories one could spin out of that singular result. But I’ll just leave you with the fact that when you have a quantitative trait with lots of heritable variation it seems unlikely it’s been subject to a long period of unidirectional selection. Various forms of balancing selection seem to be at work here, and we’re only in the early stages of understanding what’s going on. Genuine comprehension will require:

– attention to population genetic theory
– large genomic data sets from a wide array of populations
– novel methods developed by population genomicists
– and functional insights which neuroscientists can bring to the table

July 26, 2017

The future will be genetically engineered

Filed under: Genetics,Genomics — Razib Khan @ 4:04 pm


If the film Rise of the Planet of the Apes had come out a few years later I believe there would have been mention of CRISPR. Sometimes science leads to technology, and other times technology aids in science. On occasion the two are one and the same.

The plot I made above shows that in the first five years of the second decade of the 21st century CRISPR went from being an obscure aspect of bacterial genetics to ubiquitous. Friends who had been utilizing “advanced” genetic engineering methods such as TALENs and zinc fingers switched overnight to a CRISPR/Cas9 framework.

As I’ve said before the 2010s are the decade when “reading” the genome becomes normal. We really don’t know what the CRISPR/Cas9 technology is capable of. It’s early years yet. With that, First Human Embryos Edited in U.S.. Technically they’re single celled zygotes. The science itself is not astounding. Rather, it is that the human rubicon has been passed in the United States. As indicated in the article there has been some jealousy about what the Chinese have been able to do because of a different cultural and regulatory framework.

There are those calling for a moratorium on this work (on humans). I’m not in favor or opposed. Rather, my question is simple: if CRISPR/Cas9 makes genetic engineering cheap, easy, and effective, how exactly are we going to enforce a world-wide moratorium? A Butlerian Jihad?

Note: I know that people are freaking out about humans + genetic engineering. But most geneticists I know are more excited about the prospects of non-human work, since human clinical trials are going to be way in the future. Over 20 years since Dolly it’s notable to me that no human has been cloned from adult somatic cells yet.

June 27, 2017

Genome sequencing for the people is near

Filed under: Genomics,Personal genomics — Razib Khan @ 7:22 am

When I first began writing on the internet genomics was an exciting field of science. Somewhat abstruse, but newly relevant and well known due to the completion of the draft of the human genome. Today it’s totally different. Genomics is ubiquitous. Instead of a novel field of science, it is transitioning into a personal technology.

But life comes at you fast. For all practical purposes the $1,000 genome is here.

And yet we haven’t seen a wholesale change in medicine. What happened? Obviously a major part of it is polygenicity of disease. Not to mention that a lot of illness will always have a random aspect. People who get back a “clean” genome and live a “healthy” life will still get cancer.

Another issue is a chicken & egg problem. When a large proportion of the population is sequenced and phenotyped we’ll probably discover actionable patterns. But until that moment the yield is not going to be too impressive.

Consider this piece in MIT Tech, DNA Testing Reveals the Chance of Bad News in Your Genes:

Out of 50 healthy adults [selected from a random 100] who had their genomes sequenced, 11—or 22 percent—discovered they had genetic variants in one of nearly 5,000 genes associated with rare inherited diseases. One surprise is that most of them had no symptoms at all. Two volunteers had genetic variants known to cause heart rhythm abnormalities, but their cardiology tests were normal.

There’s another possible consequence of people having their genome sequenced. For participants enrolled in the study, health-care costs rose an average of $350 per person compared with a control group in the six months after they received their test results. The authors don’t know whether those costs were directly related to the sequencing, but Vassy says it’s reasonable to think people might schedule follow-up appointments or get more testing on the basis of their results.

Researchers worry about this problem of increased costs. It’s not a trivial problem, and one that medicine doesn’t have a response to, as patients often find a way to follow up on likely false positives. But it seems that this is a phase we’ll have to go through. I see no chance that a substantial proportion of the American population in the 2020s will not be sequenced.

May 15, 2017

Reason is but a slave of passions as it always has been

David Hume stated that “reason is, and ought only to be the slave of the passions.” I don’t know about the ought part, that’s up for debate. But the is part seems empirically true. The reasons people give for this or that are often just post hoc rationalizations. To give a different twist to this contention, others have argued that reason exists to win arguments, not converge upon truth. Or more precisely in my opinion to give the patina of erudition or abstraction to sentiments which are fundamentally derived from emotion or manners enforced through group norms (ergo, the common practice of ‘educated’ people citing scholars whose work we can’t evaluate to buttress our own preconceptions; we all do it).

One of the reasons I recommend In Gods We Trust, and cognitive anthropology more generally, to atheists and religious skeptics is that it gives a better empirical window into the mental processes that are really at work, as opposed to those which people say are at work (or, more unfortunately, those they think are at work). In In Gods We Trust the author reports on research conducted where religious believers are given a set of factual assertions purportedly from scholarship (e.g., the Dead Sea Scrolls). These assertions on the face of it flatly contradict their religious beliefs in some deep fundamental way. But when confronted with facts which seem to logically refute the coherency of their beliefs, they often still accept the validity of the scholarship before them. When asked about the impact on their beliefs? Respondents generally asserted that the new facts strengthened their beliefs.

This is one reason that cognitive anthropologists term religious ‘reasoning’ quasi-propositional. It takes the general form of analysis from axioms, but ultimately the rationality is beside the point; it is simply an arrow in the quiver of a broader and deeper cognitive phenomenon.

To give a personal example which illustrates this: many, many years ago I passingly knew a Jewish girl of Modern Orthodox background. She once asserted to me that the event of the Holocaust strengthened her belief in her God. I didn’t follow through on this discussion, as it was too disturbing to me. But it brought home to me that in some way the “reasoning” of many religious people leaves me totally befuddled (and no doubt vice versa).

As it happens, while in the course of writing this post, I found out that Hugo Mercier and Dan Sperber, the authors of the above argument in relation to reason and argumentation, published a book last month, The Enigma of Reason. I encourage readers to get it. I just bought a Kindle copy. Dan Sperber, who I interviewed 12 years ago, is a very deep thinker on the level of Daniel Kahneman. He’s French, and his prose can be somewhat difficult, so I wonder if that’s one reason he’s not nearly as well known.

Ultimately the point of this post actually goes back to genomics and history. Ann Gibbons has an excellent piece in Science, There’s no such thing as a ‘pure’ European—or anyone else. In it she draws on the most recent research in human population genomics to refute antiquated ideas about the purity of any given population. If you have read this blog for the past few years you already know most human populations are complex admixtures; that is, it isn’t a human family tree, but a human family graph.

Gibbons’ piece attacks directly some standard racialist talking points which have been refuted on a factual basis by genetic science:

When the first busloads of migrants from Syria and Iraq rolled into Germany 2 years ago, some small towns were overwhelmed. The village of Sumte, population 102, had to take in 750 asylum seekers. Most villagers swung into action, in keeping with Germany’s strong Willkommenskultur, or “welcome culture.” But one self-described neo-Nazi on the district council told The New York Times that by allowing the influx, the German people faced “the destruction of our genetic heritage” and risked becoming “a gray mishmash.”

In fact, the German people have no unique genetic heritage to protect. They—and all other Europeans—are already a mishmash, the children of repeated ancient migrations, according to scientists who study ancient human origins. New studies show that almost all indigenous Europeans descend from at least three major migrations in the past 15,000 years, including two from the Middle East. Those migrants swept across Europe, mingled with previous immigrants, and then remixed to create the peoples of today.

First, let’s set aside the political question of welcoming on the order of one million refugees to Germany. I will not post comments discussing that.

As a point of fact the truth genetically in relation to Germans is even more complex than what Gibbons asserts. When I worked with FamilyTree DNA I had access to their database and presented at their yearly conference some interesting results from people whose four grandparents were from Germany. In short, Germans tended to fall into three main clusters, one that was strongly skewed toward people from some parts of France, another which was shifted toward Scandinavians, and a third which was very similar to Slavs.

The historical and cultural reasons for this are easy to guess at. The takeaway here is that unlike Finland, or Ireland, and to a great extent Scandinavia and Britain, Germany exhibits a lot of population substructure within it because of assimilation or migration in the last ~1,000 years. This is why genetically saying someone is “German” is very difficult when compared to saying someone is Polish or Swedish. By dint of their cultural expansiveness Germans are everyone and no one set next to other Northern Europeans* (with the exception perhaps of the French…I’m sure Germans will appreciate this comparison!).

The conceit of these sorts of pieces is that racists will confront refutations which will shatter their racist axioms. But since most of the people who write these pieces and read Science are not racists, they won’t have a good intuition for the cognitive processes at work in genuine racists.

This causes problems. As a comparison, many atheists seem to think that refutation of the Athanasian creed will blow Christians away and make them forsake their God (or showing them contradictions in the Bible, admit that you’ve gone through that phase!). Though the Church Father Tertullian’s assertion that he “believed because it is absurd” is more subtle than I often make it out to be, on the face of it it does reflect how outsiders view a normative social group like Christianity.

The emphasis here is on normative. Social or religious movements and sentiments are often about norms, which emerge at the intersection of history, intuition, instinct, and facts. I place facts last in the list, because I think it is a defensible stance to take that facts are the least important variable!

The field of cultural evolution has shown that group cohesion and communal norms have been major drivers of human evolution. Likely there has been gene-cultural coevolution so that group conformity has been selected for as a way to make social units operate more smoothly. Social cognition is a thing; people believe what they believe because other people in their social groups believe something, not because they’ve reasoned to it themselves. Original reasoning is hard. Letting others derive for you, and then plugging and chugging, is easy. As Muhammad stated, the Ummah will not agree upon error! The smarter people are, the better they are at reasoning…but also the better they are at motivated reasoning, ignorance, and rationalization.

When faced with disconfirming evidence some people can dig in and deny the plain facts. Creationists are a straightforward case of this. Then there are evaders. From what I have seen on the political Left in the United States at least over the last 15 years (when I’ve been engaging actively with people on the internet) there has been a consistent pattern of obfuscation and dodging the likely reality of sex differences in many quarters. When pinned down on the fundamentals few deny the principle or the possibility, but they almost always impose an extremely high level of skepticism that is not found in other domains, where their epistemology is far less stringent.

But then there is a third case, where facts that seem to you, at first blush, to refute a belief only strengthen that belief in someone with whom you already disagree. I am generally of the view that the rise of naturalistic science has probably undermined the case for classical supernaturalist theism, which emerged in the pre-modern era. Reasonable people can disagree, as I have smart religious friends who are also scientists. Some of these people, like Francis Collins, will even assert that modern findings which boggle the mind and shock our intuitions confirm and strengthen their belief in pre-modern religious systems!

My point is not to take a strong stance on science and religion. Rather, it is to say that when you present evidence and declare “I refute you thus!”, they may simply respond “Aha! You have proven my point!”

In relation to the Gibbons article the writing has been on the wall for at least three years, and probably longer. In Towards a new history and geography of human genes informed by ancient DNA Pickrell and Reich contend:

…Implicit in this research is the assumption that the geographic locations of people today are informative about the geographic locations of their ancestors in the distant past. However, it is now clear that long-range migration, admixture and population replacement have been the rule rather than the exception in human history. In light of this, we argue that it is time to critically re-evaluate current views of the peopling of the globe and the importance of natural selection in determining the geographic distribution of phenotypes. We specifically highlight the transformative potential of ancient DNA. By accessing the genetic make-up of populations living at archaeologically-known times and places, ancient DNA makes it possible to directly track migrations and responses to natural selection.

Since this was published in spring of 2014 the evidence has gotten stronger and stronger. That is, the distribution of outcomes is getting more consistent and converging to a high confidence truth.

From this, are we to conclude that white nationalism would decline from marginal to non-existent in the past three years? A review of the empirical data does not seem to support that proposition. Therefore, a naive model that white nationalism is predicated on facts about racial purity may be wrong.

The responses that I have seen (often in the form of comments I don’t publish on this weblog) are denial/rejection, confusion, reinterpretation and vindication (along with standard issue racial insults directed toward me, their colored cognitive inferior). As with the religious case I have a difficult time “putting myself” in the shoes of a racialist of any sort, so I don’t totally understand how they’re getting from A to B, but in their own minds they are.

Let’s reaffirm what’s going on here: white racial consciousness in the United States has exploded on the public scene over the past three years, just as scientists have come to the very strong conclusion that the “white European race” as we understand it is an artifact of the last ~5,000 years or so.**

We need to go back to Hume, and the anthropological understanding of what reason is. Reason is a tool to confirm what you already hold to be true and good. If reason falsifies in some way what you hold to be true and good, that does not mean for most people that reason is where they will stand. Likely there will be some subtle reinterpretation, but magically reason will support their presuppositions. Ask the descendants of the followers of William Miller about falsification.

The fact is that very few people in the world know about David Reich and his research. I know this personally because I’m a voluble evangelist, and many geneticists, even human geneticists, are not aware of the revolution in historical population genetics that ancient DNA has wrought. I do not know any Nazis personally, but I suspect that their knowledge of human phylogenomics is not at the same level as a typical geneticist’s.

Of course this sort of logic about logic cuts both ways. Before 2010 I actually assumed, as did most human geneticists who took an interest in these topics, that human populations had been resident in their region of current occupation for tens of thousands of years. When I read Reconstructing Indian Population History by David Reich I was shocked out of my prior model, because the inferences were so ingenious and plausible, and the updated story of how South Asians came to be actually made a lot of anomalies make a lot more sense. When Lazaridis et al. posted Ancient human genomes suggest three ancestral populations for present-day Europeans on bioRxiv in December of 2013 I was far more surprised, because I had always assumed that the thesis that most European ancestry dated to the Pleistocene in any given region was a robust one. Both the phylogeography from mtDNA and Y pointed to a Pleistocene origin.

But the data were compelling. It’s one thing to make inferences on present day genetic distribution, it’s another to actually genotype ancient individuals (remember, I can reanalyze the data myself, and have done so numerous times). Lazaridis et al. and Priya Moorjani’s Genetic Evidence for Recent Population Mixture in India totally changed my personal life. All of a sudden my wife and I were far closer emotionally and spiritually because we understood that the TMRCA of many segments in our autosomal genome was about 5-fold closer than I had assumed!!!***

Actually, the last sentence is a total fiction. The history which changed how I understood my wife and me to be related in a historical population genetic sense had zero impact on our relationship. That’s because we’re not racists, and race doesn’t really impact our relationship too much (the fact that my parents are Muslim, well, that’s a different issue….). Sorry Everyday Feminism. This is not an uncommon view, though perhaps not as common as we’d assumed of late (actually, as someone who has looked at the fascinating interracial dating research, I pretty much understood that what people say is quite different than what they do; anti-racism is the conformist thing to do, so people will play that tune for a while longer).

Just because the state of the world is one particular way, it does not naturally follow that it should be that way, or that it always will be that way. Most ethical religions saw in slavery an aspect of injustice; rational arguments aside, on some level extension of empathy and sympathy makes its injustice self-evident. But they accepted that it was an aspect of the world that was naturally baked into the structure of reality. The de jure abolition of slavery today does not mean it has truly gone away, but its practice has certainly been curtailed, and much of the cruelty diminished. Theories of human nature or necessities of economic production at the end of the day gave way to changing mores and values. Facts about the world became less persuasive when we decided to let them no longer dictate tolerance of slavery.

All that I say above in relation to how humans use reason does not leave scientists or journalists untouched. All humans have their own goals, and even though they see through the glass darkly, they see in the visions beyond what they want to see. The cultural and theoretical structure of modern science is such that some of these impulses are dampened and human intuitions are channeled in a manner so that theories and models of the world seem to correspond to reality. But I believe this is deeply unnatural, and also deeply fragile. When moving outside of their domain of specialty scientists can be quite blind and irrational. Even when one steps away at a mild remove in terms of domain knowledge this becomes clear, such as when Linus Pauling promoted Vitamin C. And motivated reasoning can creep into the actions of even the greatest of scientists, such as when R. A. Fisher rejected the causal connection between tobacco and cancer.****

I will end on a frank and depressing note: I believe that the era of public reason and fealty to empirical standards in at least official capacities is fading. Social cognition, tribal logic, is on the rise. But we have to remember that in the historical perspective social cognition and tribal logic ruled the day. They are the norm. This age when we abide by public reason is the peculiarity in the sea of polemic. Ultimately it may be the fool who fixates on being right or wrong, as opposed to being on the winning team. I hope I’m wrong on this.

Addendum: I have written a form of this post many times.

* The current chancellor of Germany has a Polish paternal grandfather.

** If Middle Easterners are included as white we can extend the time horizon much further back, but that seems to defeat the purpose of white nationalism in the United States….

*** I had assumed that the western affinity in South Asians had diverged from Europeans during the Last Glacial Maximum. It turns out some of it may be as recent as ~4,500 years ago or so.

**** This may have been unconscious as opposed to malicious, as Fisher was keen on tobacco personally.

May 11, 2017

When conquered pre-Greece took captive her rude Hellene conqueror

Filed under: Genetics,Genomics,Greece,History,Migration — Razib Khan @ 12:22 am


When I was a child in the 1980s I was captivated by Michael Wood’s documentary In Search of the Trojan War (he also wrote a book with the same name). I had read a fair amount of Greek mythology, prose translations of the Iliad, as well as ancient history. The contrast between the Classical Greeks and the strangeness of their mythology was always something on the surface of my mind. The reality that Bronze Age Greeks were very different from Classical Greeks resolved this issue to some extent.

Though Classical Greeks were very different from us, to some extent Western civilization began with them, and they are very familiar to us. Rebecca Goldstein’s Plato at the Googleplex was predicated on the thesis that the ancient Greek philosopher had something to tell us, and that if he was alive today he would be a prominent public speaker.

I’m going to dodge the issue of Julian Jaynes’ bicameral mind, and just assert that people of the Bronze Age were fundamentally different from us. And that difference is preserved in aspects of Greek mythology. Though it is fashionable, and correct, to assert that Homer’s world was not that of Mycenaeans, but the barbarian period of the Greek Dark Age, it is not entirely true. Homer clearly preserved traditions where citadels such as Mycenae and Pylos were preeminent, and details such as the boar’s tusk helmets are also present in the Iliad.

But aesthetic details or geopolitics are not what struck me about Greek mythology, but events such as the sacrifice of Iphigenia. Like Abraham’s near sacrifice of his son, this plot element strikes moderns as cruel, barbaric, and unthinking. And though the Classical Greeks did not have our conception of human rights, they had on the whole turned against human sacrifice (and the Romans suppressed the practice when they conquered the Celts). But it seems to have occurred in earlier periods.

The rupture between the world of the Classical Greeks and the strange edifices of Mycenaean Greece was such that scholars were shocked that the Linear B tablets of the Bronze Age were written in Greek when they were finally deciphered. In fact many of the names and deities on these tablets would be familiar to us today; the name Alexander and the goddess Athena are both attested to in Mycenaean tablets.

Preceding the Mycenaeans, who emerge in the period between 1600 and 1400 BCE, are the Minoans, who seem to have developed organically in the Aegean in the 3rd millennium. This culture had relations with Egypt and the Near East, their own system of writing, and deeply influenced the motifs of the successor Mycenaean Greek civilization. The aesthetic similarities between Mycenaeans and Minoans are one reason that many were surprised that the former were Greek, because the Minoan language was likely not.

Mycenaean civilization seems to have been a highly militarized and stratified society. There is a reason that this is sometimes referred to as the “age of citadels.” Allusions to the Greeks, or Achaeans, in the diplomatic missives of the Egyptians and Hittites suggest that the lords of the Hellenes were reaver kings. In 1177 B.C. Eric Cline repeats the contention that a fair portion of the “sea peoples” who ravaged Egypt in the late Bronze Age were actually Greeks.

So when did these Greeks arrive on the shores of Hellas? In The Coming of the Greeks Robert Drews argued that the Greeks were part of a broader movement of mobile charioteers who toppled antique polities and took them over. The Hittites and Mitanni were two examples of Indo-European ruling elites who took over a much more advanced civilizational superstructure and made it their own. While the Hittites and other Indo-Europeans, such as the Luwians and Armenians, slowly absorbed the non-Indo-European substrate of Anatolia, the Indo-Aryan Mitanni elite were linguistically absorbed by their non-Indo-European Hurrian subjects. Indo-Aryan elements persisted only in their names, their gods, and tellingly, in a treatise on training horses for charioteers.

Drews’ thesis is that the Greek language percolated down from the warlords of the citadels and their retinues over the Bronze Age, with the relics who did not speak Greek persisting into the Classical period as the Pelasgians. Set against this is the thesis of Colin Renfrew that Greek was one of the first Indo-European languages, as Indo-European languages began in Anatolia.

The most recent genetic data suggest to me that both theses are likely to be wrong. The data are presented in two preprints The Population Genomics Of Archaeological Transition In West Iberia and The Genomic History Of Southeastern Europe. The two papers cover lots of different topics. But I want to focus on one aspect: gene flow from steppe populations into Southern Europe.

We know that in the centuries after 2900 BCE there was a massive eruption of individuals from the steppe fringe of Eastern Europe, and Northern Europe from Ireland to Poland was genetically transformed. Though there was some assimilation of indigenous elements, it looks as though the majority element in Northern Europe was descended from migrants.

For various reasons this was always less plausible for Southern Europe. The first reason is that Southern Europeans shared a lot of genetic similarities to Sardinians, who resembled Neolithic farmers. Admixture models generally suggested that in the peninsulas of Southern Europe the steppe-like ancestry was the minority component, not the majority, as was the case in Northern Europe.
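To make the idea of an admixture model concrete, here is a minimal sketch. It is not the method used in either preprint. It assumes a simple two-source model in which the allele frequency of a target population at each site is a weighted average of two source populations, and it estimates the weight by least squares. The function name and every frequency below are made up for illustration; real analyses use hundreds of thousands of sites and far more careful statistics.

```python
def estimate_admixture_fraction(target, source_a, source_b):
    """Least-squares estimate of alpha in: target ~ alpha*source_a + (1 - alpha)*source_b.

    Each argument is a list of allele frequencies (0..1), one per site.
    This is a toy two-source model, assumed here for illustration only.
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("all populations need the same number of sites")
    numerator = 0.0
    denominator = 0.0
    for t, a, b in zip(target, source_a, source_b):
        diff = a - b                      # how far apart the two sources are at this site
        numerator += (t - b) * diff
        denominator += diff * diff
    if denominator == 0.0:
        raise ValueError("sources are identical; the fraction is not identifiable")
    alpha = numerator / denominator
    return min(1.0, max(0.0, alpha))      # a mixing fraction must lie in [0, 1]


# Hypothetical allele frequencies at five sites (invented numbers).
steppe = [0.10, 0.80, 0.30, 0.60, 0.90]
farmer = [0.70, 0.20, 0.50, 0.10, 0.40]

# Simulate a "Southern European" population with a minority steppe component.
true_alpha = 0.25
southern = [true_alpha * s + (1 - true_alpha) * f for s, f in zip(steppe, farmer)]

estimated = estimate_admixture_fraction(southern, steppe, farmer)
print(round(estimated, 3))
```

Because the simulated target is an exact mixture with no sampling noise, the estimate should come out very close to the 0.25 that was put in. With real data the sources are themselves estimates, drift adds noise, and the fit is never this clean.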

These data confirm it. The Bronze Age in Portugal saw a shift toward steppe-inflected populations, but it was not a large shift. There seems to have been later gene flow too. But by and large the Iberian populations exhibit some continuity with late Neolithic populations. This is not the case in Northern Europe.

In The Genomic History Of Southeastern Europe the authors note that steppe-like ancestry could be found sporadically during early periods, but that there was a notable increase in the Bronze Age, and later individuals in the Bronze Age had a higher fraction. Nevertheless, by and large it looks as if the steppe-like gene flow in the southerly Balkans (focusing on Bulgarian samples) was modest in comparison to the northern regions of Europe. Unfortunately I do not see any Greek Bronze Age samples, but it seems likely that steppe-like influence came into these groups after they arrived in Bulgaria, which is more northerly.

Down to the present day a non-Indo-European language, Basque, is spoken in Spain. Paleo-Sardinian survived down to the Classical period, and it too was not Indo-European. Similarly, non-Indo-European Pelasgian communities continued down to the period of city-states in Greece.

These long periods of coexistence point to the demographic equality (or even superiority) of the non-Indo-European populations. The dry climate of the Mediterranean peninsulas is not as suitable for cattle-based agro-pastoralism. This may have limited the spread and dominance of Indo-Europeans. Additionally, the Mediterranean peninsulas were likely touched by Indo-European migrations relatively late. Much of the early zeal for expansion may have already dissipated by then. The high frequency of likely Indo-European R1b lineages among the Basques is curious, and may point to the spreading of male patronage networks, and their assimilation into non-Indo-European substrates where necessary. R1b is also found in Sardinia, and in high frequencies in much of Italy.

The interaction and synthesis between native and newcomer was likely intensive in the Mediterranean. For example, of the gods of the Greek pantheon only Zeus is indubitably of Indo-European origin. Some, such as Artemis, have clear Near Eastern antecedents. But other Greek gods may come down from the pre-Greek inhabitants of what became Greece.

Ultimately these copious interactions and transformations should not be a great surprise. The sunny lands of the Mediterranean attracted Northern European tribes during Classical antiquity. The Cimbri invasion of Italy, Galatians in Thrace and Anatolia, the folk wandering of Vandals and Goths into Iberia, are all instances of population movements southward. These likely moved the needle ever so slightly toward convergence between Northern and Southern Europe in terms of genetic content.

In relation to the more general spread of Indo-Europeans, I believe there are a few areas like Northern Europe, where replacement was preponderant (e.g., the Tarim basin). But I also believe there were many more which presented a Southern European model of synthesis and accommodation.

When conquered pre-Greece took captive her rude Hellene conqueror

Filed under: Genetics,Genomics,Greece,History,Migration — Razib Khan @ 12:22 am


When I was a child in the 1980s I was captivated by Michael Wood’s documentary In Search of the Trojan War (he also wrote a book with the same name). I had read a fair amount of Greek mythology, prose translations of the Iliad, as well as ancient history. The contrast between the Classical Greeks, the strangeness of their mythology was always something that on the surface of my mind. The reality that Bronze Age Greeks were very different from Classical Greeks resolved this issue to some extent.

Though Classical Greeks were very different from us, to some extent Western civilization began with them, and they are very familiar to us. Rebecca Goldstein’s Plato at the Googleplex was predicated on the thesis that the ancient Greek philosopher had something to tell us, and that if he was alive today he would be a prominent public speaker.

I’m going to dodge the issue of Julian Jaynes’ bicameral mind, and just assert that people of the Bronze Age were fundamentally different from us. And that difference is preserved in aspects of Greek mythology. Though it is fashionable, and correct, to assert that Homer’s world was not that of Mycenaeans, but the barbarian period of the Greek Dark Age, it is not entirely true. Homer clearly preserved traditions where citadels such as Mycenae and Pylos were preeminent, and details such as the boar’s tusk helmets are also present in the Iliad.

But aesthetic details or geopolitics are not what struck me about Greek mythology, but events such as the sacrifice of Iphigenia. Like Abraham’s near sacrifice of his son, this plot element strikes moderns as cruel, barbaric, and unthinking. And though the Classical Greeks did not have our conception of human rights, they had turned against human sacrifice (and the Romans suppressed the practice when they conquered the Celts) on the whole, but it seems to have occurred in earlier periods.

The rupture between the world of the Classical Greeks and the strange edifices of Mycenaean Greece were such that scholars were shocked that the Linear B tablets of the Bronze Age were written in Greek when they were finally deciphered. In fact many of the names and deities on these tablets would be familiar to us today; the name Alexander and the goddess Athena are both attested to in Mycenaean tablets.

Preceding the Mycenaeans, who  emerge in the period between 1400-1600 BCE, are the Minoans, who seem to have developed organically in the Aegean in the 3rd millennium. This culture had relations with Egypt and the Near East, their own system of writing, and deeply influenced the motifs of the successor Mycenaean Greek civilization. The aesthetic similarities between Mycenaeans and Minoans is one reason that many were surprised that the former were Greek, because the Minoan language was likely not.

Mycenaean civilization seems to have been a highly militarized and stratified society. There is a reason that this is sometimes referred to as the “age of citadels.” Allusions to the Greeks, or Achaeans, in the diplomatic missives of the Egyptians and Hittites suggests that the lords of the Hellenes were reaver kings. In 1177 B.C. Eric Cline repeats the contention that a fair portion of the “sea peoples” who ravaged Egypt in the late Bronze Age were actually Greeks.

So when did these Greeks arrive to the shores of Hellas? In The Coming of the Greeks Robert Drews argued that the Greeks were part of a broader movement of mobile charioteers who toppled antique polities and turned them into their own. The Hittites and Mitanni were two examples of Indo-European ruling elites who took over a much more advanced civilizational superstructure and made it their own. While the Hittites and other Indo-Europeans, such as the Luwians and Armenians, slowly absorbed the non-Indo-European substrate of Anatolia, the Indo-Aryan Mitanni elite were linguistically absorbed by their non-Indo-European Hurrian subjects. Indo-Aryan elements persisted only their names, their gods, and tellingly, in a treatise on training horses for charioteers.

Drews’ thesis is that the Greek language percolated down from the warlords of the citadels and their retinues over the Bronze Age, with the relics who did not speak Greek persisting into the Classical period as the Pelasgians. Set against this is the thesis of Colin Renfrew that Greece was one of the first Indo-European languages, as Indo-European languages began in Anatolia.

The most recent genetic data suggest to me that both theses are likely to be wrong. The data are presented in two preprints The Population Genomics Of Archaeological Transition In West Iberia and The Genomic History Of Southeastern Europe. The two papers cover lots of different topics. But I want to focus on one aspect: gene flow from steppe populations into Southern Europe.

We know that in the centuries after 2900 BCE there was a massive eruption of individuals from the steppe fringe of Eastern Europe, and Northern Europe from Ireland to Poland was genetically transformed. Though there was some assimilation of indigenous elements, it looks as if most of the ancestry in Northern Europe descended from migrants.

For various reasons this was always less plausible for Southern Europe. The first reason is that Southern Europeans shared a lot of genetic similarities to Sardinians, who resembled Neolithic farmers. Admixture models generally suggested that in the peninsulas of Southern Europe the steppe-like ancestry was the minority component, not the majority, as was the case in Northern Europe.

These data confirm it. The Bronze Age in Portugal saw a shift toward steppe-inflected populations, but it was not a large shift. There seems to have been later gene flow too. But by and large the Iberian populations exhibit some continuity with late Neolithic populations. This is not the case in Northern Europe.

In The Genomic History Of Southeastern Europe the authors note that steppe-like ancestry could be found sporadically during early periods, but that there was a notable increase in the Bronze Age, and later individuals in the Bronze Age had a higher fraction. Nevertheless, by and large it looks as if the steppe-like gene flow in the southerly Balkans (focusing on Bulgarian samples) was modest in comparison to the northern regions of Europe. Unfortunately I do not see any Greek Bronze Age samples, but it seems likely that steppe-like ancestry reached Greece after it arrived in Bulgaria, which is more northerly.

Down to the present day a non-Indo-European language, Basque, is spoken in Spain. Paleo-Sardinian survived down to the Classical period, and it too was not Indo-European. Similarly, non-Indo-European Pelasgian communities continued down to the period of city-states in Greece.

These long periods of coexistence point to the demographic equality (or even superiority) of the non-Indo-European populations. The dry climate of the Mediterranean peninsulas is not as suitable for cattle-based agro-pastoralism. This may have limited the spread and dominance of Indo-Europeans. Additionally, the Mediterranean peninsulas were likely touched by Indo-European migrations relatively late. Much of the early zeal for expansion may have already dissipated by then. The high frequency of likely Indo-European R1b lineages among the Basques is curious, and may point to the spreading of male patronization networks, and their assimilation into non-Indo-European substrates where necessary. R1b is also found in Sardinia, and in high frequencies in much of Italy.

The interaction and synthesis between native and newcomer was likely intensive in the Mediterranean. For example, of the gods of the Greek pantheon only Zeus is indubitably of Indo-European origin. Some, such as Artemis, have clear Near Eastern antecedents. But other Greek gods may come down from the pre-Greek inhabitants of what became Greece.

Ultimately these copious interactions and transformations should not be a great surprise. The sunny lands of the Mediterranean attracted Northern European tribes during Classical antiquity. The Cimbri invasion of Italy, Galatians in Thrace and Anatolia, the folk wandering of Vandals and Goths into Iberia, are all instances of population movements southward. These likely moved the needle ever so slightly toward convergence between Northern and Southern Europe in terms of genetic content.

In relation to the more general spread of Indo-Europeans, I believe there are a few areas like Northern Europe, where replacement was preponderant (e.g., the Tarim basin). But I also believe there were many more which presented a Southern European model of synthesis and accommodation.

May 9, 2017

The Beaker is breaking!

The link is up, The Beaker Phenomenon And The Genomic Transformation Of Northwest Europe, but the paper is still processing:

I’ll update the post when I can read the paper.

May 6, 2017

Synergistic epistasis as a solution for human existence

Filed under: epistasis,Evolution,Evolutionary Genetics,Genetics,Genomics — Razib Khan @ 12:16 am

Epistasis is one of those terms in biology which has multiple meanings, to the point that even biologists can get turned around (see this 2008 review, Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems, for a little background). Most generically epistasis is the interaction of genes in terms of producing an outcome. But historically its meaning is derived from the fact that early geneticists noticed that crosses between individuals segregating for a Mendelian characteristic (e.g., smooth vs. curly peas) produced results conditional on the genotype of a secondary locus.

Molecular biologists tend to focus on a classical, and often mechanistic view, whereby epistasis can be conceptualized as biophysical interactions across loci. But population geneticists utilize a statistical or evolutionary definition, where epistasis describes the extent of deviation from additivity and linearity, with the “phenotype” often being fitness. This goes back to early debates between R. A. Fisher and Sewall Wright. Fisher believed that in the long run epistasis was not particularly important. Wright eventually put epistasis at the heart of his enigmatic shifting balance theory, though according to Will Provine in Sewall Wright and Evolutionary Biology even he had a difficult time understanding the model he was proposing (e.g., Wright couldn’t remember what the different axes on his charts actually meant all the time).

These different definitions can cause problems for students. A few years ago I was a teaching assistant for a genetics course, and the professor, a molecular biologist, asked a question about epistasis. The only answer on the key was predicated on a classical/mechanistic understanding. But some of the students were obviously giving the definition from an evolutionary perspective (e.g., they were bringing up non-additivity and fitness)! Luckily I noticed this early on and the professor approved the alternative answer, so that graders would not mark down those using a non-molecular answer.

My interest in epistasis was fed to a great extent in the middle 2000s by my reading of Epistasis and the Evolutionary Process. Unfortunately not too many people read this book. I believe this is so because when I just went to look at the Amazon page it told me that “Customers who viewed this item also viewed” Robert Drews’ The End of the Bronze Age. As it happened I read this book at about the same time as Epistasis and the Evolutionary Process…and to my knowledge I’m the only person who has a very deep interest in statistical epistasis and Mycenaean Greece (if there is someone else out there, do tell).

In any case, when I was first focused on this topic genomics was in its infancy. Papers with 50,000 SNPs in humans were all the rage, and the HapMap paper had literally just been published. A lot has changed.

So I was interested to see this come out in Science, Negative selection in humans and fruit flies involves synergistic epistasis (preprint version). Since the authors are looking at humans and Drosophila and because it’s 2017 I assumed that genomic methods would loom large, and they do.

And as always on the first read through some of the terminology got confusing (various types of statistical epistasis keep getting renamed every few years it seems to me, and it’s hard to keep track of everything). So I went to Google. And because it’s 2017 a citation of the paper and further elucidation popped up in Google Books in Crumbling Genome: The Impact of Deleterious Mutations on Humans. Weirdly, or not, the book has not been published yet. Since the author is the second to last author on the above paper it makes sense that it would be cited in any case.

So what’s happening in this paper? Basically they are looking for reduced variance in the number of really bad mutations per individual, which is what you expect if a particular type of epistasis amplifies their deleterious impact (fitness is almost always really hard to measure, so you want to look at proxy variables).

Because de novo mutations are rare, they estimate about 7 are in functional regions of the genome (I think this may be high actually), and that the distribution should be Poisson. This distribution just tells you that the mean number of mutations and the variance of the number of mutations should be the same (e.g., the mean should be 5 and the variance should be 5).
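To make the mean-equals-variance property concrete, here is a minimal sketch that sums the Poisson probabilities directly instead of simulating. The function names and the cutoff `max_k` are my own choices for illustration, not anything from the paper.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def mean_and_variance(lam, max_k=100):
    """Sum the distribution term by term; max_k cuts off a negligible tail."""
    probs = [poisson_pmf(k, lam) for k in range(max_k)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

for lam in (5, 7):
    mean, var = mean_and_variance(lam)
    print(f"lambda={lam}: mean={mean:.4f}, variance={var:.4f}")
```

Any departure of the observed variance from the observed mean is therefore a departure from the Poisson null, which is the signal the paper is looking for.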

Epistasis refers (usually) to interactions across loci. That is, different genes at different locations in the genome. Synergistic epistasis means that the total cumulative fitness after each successive mutation drops faster than the sum of the negative impact of each mutation. In other words, the negative impact is greater than the sum of its parts. In contrast, antagonistic epistasis produces a situation where new mutations on the tail of the distributions cause a lower decrement in fitness than you’d expect through the sum of its parts (diminishing returns on mutational load when it comes to fitness decrements).
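One way to picture the three cases is a toy fitness function. This is only a sketch under an assumption of mine: fitness with n mutations is exp(-s * n**k), where the exponent k is a hypothetical knob (k = 1 for independent effects, k > 1 for synergistic, k < 1 for antagonistic). The paper does not commit to this functional form, and the values of s and k are purely illustrative.

```python
import math

def fitness(n, s=0.05, k=1.0):
    """Relative fitness with n deleterious mutations (assumed form exp(-s * n**k))."""
    return math.exp(-s * n ** k)

def marginal_cost(n, s=0.05, k=1.0):
    """Proportional fitness drop caused by adding mutation number n + 1."""
    return 1 - fitness(n + 1, s, k) / fitness(n, s, k)

models = (("independent", 1.0), ("synergistic", 2.0), ("antagonistic", 0.5))
for label, k in models:
    # Cost of the 1st, 4th and 7th mutation under each model.
    costs = [round(marginal_cost(n, k=k), 3) for n in (0, 3, 6)]
    print(label, costs)
```

Under the independent model every new mutation costs the same proportion of fitness; under the synergistic model each additional mutation costs more than the last, and under the antagonistic model it costs less.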

These two dynamics have an effect on the linkage disequilibrium (LD) statistic. This measures the association of two different alleles at two different loci. When populations are recently admixed (e.g., Brazilians) you have a lot of LD because racial ancestry results in lots of distinctive alleles being associated with each other across genomic segments in haplotypes. It takes many generations for recombination to break apart these associations so that allelic state at one locus can’t be used to predict the odds of the state at what was an associated locus. What synergistic epistasis does is disassociate deleterious mutations. In contrast, antagonistic epistasis results in increased association of deleterious mutations.
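For readers who have not computed LD by hand, here is a small sketch using the textbook definitions: D is the observed frequency of the AB haplotype minus the product of the two allele frequencies, and D shrinks by a factor of (1 - r) each generation under random mating with recombination rate r. The haplotype counts are made up for illustration.

```python
def linkage_disequilibrium(haplotype_counts):
    """Return (D, r_squared) for two biallelic loci from counts of AB, Ab, aB, ab."""
    total = sum(haplotype_counts.values())
    freq_ab_hap = haplotype_counts["AB"] / total
    freq_a = (haplotype_counts["AB"] + haplotype_counts["Ab"]) / total
    freq_b = (haplotype_counts["AB"] + haplotype_counts["aB"]) / total
    d = freq_ab_hap - freq_a * freq_b
    r_squared = d ** 2 / (freq_a * (1 - freq_a) * freq_b * (1 - freq_b))
    return d, r_squared

def ld_after(d0, recomb_rate, generations):
    """Expected D after some generations of random mating: D0 * (1 - r)**t."""
    return d0 * (1 - recomb_rate) ** generations

# Hypothetical counts: alleles A and B travel together more often than chance.
associated = {"AB": 40, "Ab": 10, "aB": 10, "ab": 40}
d, r2 = linkage_disequilibrium(associated)
print(d, r2)
print(ld_after(d, recomb_rate=0.01, generations=100))
```

A negative D between deleterious alleles means they co-occur less often than chance, which is the "disassociation" that synergistic epistasis is expected to produce.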

Why? Because of selection. If a greater number of mutations means huge fitness hits, then there will be strong selection against individuals who randomly segregate out with higher mutational loads. This means that the variance of the mutational load is going to be lower than the value of the mean.

How do they figure out mutational load? They focus on the distribution of loss-of-function (LoF) mutations. These are extremely deleterious mutations which are the most likely to be a major problem for function and therefore a huge fitness hit. What they found was that the distribution of LoF mutations exhibited a variance which was 90-95% of that expected under a null Poisson distribution. In other words, there was stronger selection against high mutation counts, as one would predict due to synergistic epistasis.
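The logic of the test can be sketched numerically. Assume (my assumption, not the paper's method) that mutation counts start out Poisson and that survival is proportional to exp(-s * n**k), with k a hypothetical epistasis exponent. Reweighting the distribution by fitness and recomputing the variance-to-mean ratio shows which way each kind of epistasis pushes it. The values of lam, s and k are illustrative only.

```python
import math

def poisson_pmf(n, lam):
    return math.exp(-lam) * lam ** n / math.factorial(n)

def variance_to_mean_after_selection(lam=7.0, s=0.02, k=1.0, max_n=100):
    """Variance/mean of mutation counts among survivors of one round of selection."""
    counts = range(max_n)
    # Weight each count by its Poisson probability times its assumed fitness.
    weights = [poisson_pmf(n, lam) * math.exp(-s * n ** k) for n in counts]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(n * p for n, p in zip(counts, probs))
    var = sum((n - mean) ** 2 * p for n, p in zip(counts, probs))
    return var / mean

for label, k in (("independent", 1.0), ("synergistic", 2.0), ("antagonistic", 0.5)):
    print(label, round(variance_to_mean_after_selection(k=k), 4))
```

With independent effects the survivors are still Poisson-distributed, so the ratio stays at 1. Synergistic epistasis pulls it below 1 (underdispersion, the direction reported in the paper), and antagonistic epistasis pushes it above 1.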

They conclude:

Thus, the average human should carry at least seven de novo deleterious mutations. If natural selection acts on each mutation independently, the resulting mutation load and loss in average fitness are inconsistent with the existence of the human population (1 − e^−7 > 0.99). To resolve this paradox, it is sufficient to assume that the fitness landscape is flat only outside the zone where all the genotypes actually present are contained, so that selection within the population proceeds as if epistasis were absent (20, 25). However, our findings suggest that synergistic epistasis affects even the part of the fitness landscape that corresponds to genotypes that are actually present in the population.
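The number in the parenthesis of that quote is easy to check. Under independent selection the classic result is that the mutation load is 1 − e^−U, where U is the number of new deleterious mutations per genome per generation. The function name below is mine.

```python
import math

def mutation_load(u):
    """Proportional loss in mean fitness, 1 - exp(-U), with independent selection."""
    return 1 - math.exp(-u)

for u in (1, 3, 7):
    print(f"U={u}: load={mutation_load(u):.5f}")
```

At U = 7 the load is above 0.999, meaning fewer than one in a thousand offspring would be as fit as a mutation-free individual, which is why the authors call independent selection inconsistent with our existence.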

Overall this is fascinating, because evolutionary genetic questions which were still theoretical a little over ten years ago are now being explored with genomic methods. This is part of why I say genomics did not fundamentally revolutionize how we understand evolution. There were plenty of models and theories. Now we are testing them extremely robustly and thoroughly.

Addendum: Reading this paper reinforces to me how difficult it is to keep up with the literature, and how important it is to know the literature in a very narrow area to get the most out of a paper. Really the citations are essential reading for someone like me who just “drops” into a topic after a long time away….

Citation: Science, Negative selection in humans and fruit flies involves synergistic epistasis.

May 4, 2017

Africa’s great demographic transformation

Filed under: Bantu Expansion,Genetics,Genomics — Razib Khan @ 9:56 pm

Stonehenge has been a preoccupation for moderns since the Victorian period. It was built over 5,000 years ago, and its usage in some fashion continued down to about 2,500 years ago. For a long while it had been associated with the Celts, but more recently there has been some suspicion that its roots must be pre-Celtic.

And that is almost certainly true. The original site of Stonehenge had a wooden structure. But during the arrival of the Bell Beaker culture it was extensively rebuilt, and eventually stone monoliths were erected in the fashion we are used to seeing today.

Bernard Cornwell’s novel Stonehenge deals with this period. There is no major focus on physical conflict between the native populations, and the Bell Beaker groups. Rather, the plot centers around the cultural tumult and innovation that was triggered by the arrival of the newcomers.

In Stonehenge the Bell Beakers occupied more marginal, out of the way, territory. The novel presumed that ultimately there would be cultural fusion between the two groups, as there was a lot of interpersonal interaction among the characters of the two groups. We now know that the reality was likely one of near total replacement. From the abstract of a paper on the Bell Beakers that is to be presented shortly:

British individuals associated with Beakers are genetically indistinguishable from continental individuals associated with the same material culture and genetically nearly completely discontinuous with the previously resident population.

This is not entirely surprising. Ancient Ireland seems to have been characterized by discontinuity with the arrival of Bell Beakers genetically.

Ancient DNA is not magic. But it can literally put some flesh on the bones of cultural shifts that archaeologists have seen in the material culture. One key element here is that the predominant ancestry across the British Isles today derives from migrations that date to the early Bronze Age.* I do not know if this has any relevance as to the arrival of the Celtic languages in Britain and Ireland, but I suspect it does.

This was percolating in my mind because there’s a new paper which attempts to explore in more detail the Bantu expansions which occurred between 1000 BCE and 500 CE. It’s pretty incredible that from Gabon to Cape Town Africans speak one language family, with similarities at least as close as those within the Romance language family.

But then is it incredible? Indo-European languages span the North Sea to the Bay of Bengal. The Bantu expansion in some ways serves as a template for the argument in First Farmers, as an agricultural revolution triggered a demographic expansion which did not stop until they reached their geographic limits.

The paper in Science, which is open access, Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America, focuses on two issues. First, the demographic history and phylogenomics of the Bantu populations. Second, using population genomic methods it explores the dynamics of natural selection in these peoples. They utilize an extensive SNP data set, with more than 500,000 markers in their core analyses.

In general I think there are lots of interesting results in this paper. But the one angle I was unsatisfied by was their purported increase in coverage. As you can see it’s highly localized to a few countries. This is probably common sense since much of Africa is not accessible due to political issues (e.g., sampling in the Democratic Republic of Congo is treacherous right now). But one always has to be careful of the limitations of the data when making inferences. Though they have samples from the southwest (Angola, Namibia), the African Great Lakes region around Uganda, and South Africa, huge zones between are missing. And they are highly oversampled in and around Gabon.

With all that said, I think with a variety of methods they probably have confirmed a major aspect of Bantu migration. I’ll quote:

Two hypotheses have been proposed concerning the dispersal of Bantu-speaking populations across sub-Saharan Africa (2–4). According to the “early-split” hypothesis, the western and eastern branches split early, within the Bantu heartland, into separate migration routes. By contrast, the “late-split” model suggests an initial spread southward from the Bantu homeland into the equatorial rainforest (i.e., Gabon/Angola), followed by expansions toward the rest of the subcontinent. We tested these hypotheses by determining whether eBSPs and seBSPs were genetically closer to wBSPs from the southern part, relative to wBSPs from the northern part, of western central Africa….

…Although additional sampling of African populations may further refine these patterns, our results, together with previous genetic data supporting the late-split model (2, 3), indicate that BSPs [Bantu-speaking peoples] first moved southward through the rainforest before migrating toward eastern and southern Africa, where they admixed with local populations. This model is further supported by linguistics (15) and archaeoclimate data (16), suggesting that a climatic crisis ~2500 years ago fragmented the rainforest into patches and facilitated the early movements of BSPs farther southward from their original homeland.

That being said, their sample limitations produce interesting assertions. E.g., “The GLOBETROTTER method estimated that eBSPs resulted from two consecutive admixture events (P < 0.05) occurring 1000 to 1500 years ago and 150 to 400 years ago between a wBSP (~75% contribution) and an Afroasiatic-speaking population from Ethiopia (~10% contribution).” GLOBETROTTER is powerful, but too often people use it in a manner where they assume that the inferences it generates from the data it has are the truth, as opposed to the closest GLOBETROTTER can get to the truth with the tools it’s given.

In this case I would contend that because there aren’t any Nilotic samples it leaves a major hole in their power to be able to accurately infer what really happened. The presence of pastoralist Nilotic people in close proximity to Bantu agriculturalists has been one of the major dynamics which define the East African landscape. The admixture into eastern Bantu agriculturalists therefore is almost certainly from Nilotic peoples, though there has been Afro-Asiatic (Cushitic) influence as far south as Tanzania, evident in enigmatic peoples such as the Sandawe.

The point here is that just because the GLOBETROTTER method inferred gene flow from a population in the sample set, it does not mean that the gene flow was necessarily from that population. The sampling of the region is sparse, so obviously this is only a first approximation. To some extent I assume the authors assume the readers will connect the dots, but often this sort of thing gets lost in translation, and then it gets into the media….

Though it is difficult to make out in the admixture plot above, there are subtle differences in the eastern Bantu groups. The Luhya, who are from Kenya, do not show evidence of the blue component which is clearly Eurasian, while the Bakiga from Rwanda do. But even in the Bakiga the ratio of the violet element, which seems to be associated with an indigenous African component distinct from that of the Bantu, to the blue Eurasian element is far higher than in the Afro-Asiatic populations in their data set (this does not mean they don’t have Eurasian ancestry, since admixture plots aren’t perfect proxies).

Because of the nature of the sampling and the utilization of admixture to frame their results I do feel that we don’t get a good sense of the variation among the Bantu across their full range. Granted, the between population genetic distance is actually quite low across this zone, on the order of 0.01, because of the recent shared ancestry. Africans may have much greater total diversity than Eurasians in their genomes, but their between population distance is actually not much different or even lower than Eurasians because of the recent demographic expansions. But did the Bantu expand into empty lands? The Khoisan, Pygmy and Nilotic (I’m sure that’s what it is) contribution to the Bantus across their range is clear, but that’s because we have close enough reference populations to model this contribution. What about areas like Tanzania? Or Mozambique? Were they empty? I suspect the issue here is that we don’t have any non-Bantu indigenous groups as they’ve all been absorbed.
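To give a feel for what a between-population distance "on the order of 0.01" means, here is a sketch of Wright's Fst at a single biallelic locus, computed as (Ht − Hs) / Ht from allele frequencies in equally weighted populations. The frequencies are invented for illustration; real estimates average over many SNPs and correct for sample size, which this toy version does not.

```python
def fst(allele_freqs):
    """Wright's Fst at one biallelic locus for equally weighted populations."""
    mean_freq = sum(allele_freqs) / len(allele_freqs)
    # Expected heterozygosity if everyone were one population.
    h_total = 2 * mean_freq * (1 - mean_freq)
    # Average expected heterozygosity within each population.
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / len(allele_freqs)
    if h_total == 0:
        return 0.0  # locus is fixed everywhere, so there is no differentiation
    return (h_total - h_within) / h_total

print(round(fst([0.5, 0.6]), 4))  # hypothetical pair of closely related groups
print(round(fst([0.2, 0.8]), 4))  # hypothetical pair of strongly diverged groups
```

An allele at 50% in one group and 60% in another is enough to give an Fst of about 0.01, so values that low imply allele frequencies that are very similar across the Bantu range.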

But it is in the selection component that they offer a possible way to ascertain non-Bantu ancestry from ghost populations in the future. They found lots and lots of selection around immune genes. This is not surprising. There were local diseases which they had to adapt to. Therefore, “the HLA region in wBSPs showed a strong excess of ancestry from rainforest hunter-gatherers, at 38%, 6.74 SD higher than the genome-wide average of 16%…..”

In places like Mozambique it would be curious if the regions known to be under selection or enriched for indigenous ancestry in other areas where there are still indigenous populations exhibited a higher Fst against other groups. That is, the Mozambique ghost populations should leave an inordinate impact on regions of the genome associated with immunological function.

Which brings me back to Stonehenge. We do have ancient genomes. But not that many. Especially further back. Apparently the names of rivers and mountains often have very deep histories. For example, the river Humber has a name which may date back to pre-Celtic times (consider the Mississippi river, which has an American Indian origin). These serve as shadows of cultures long gone and replaced. The Bantu expansion is close enough to the margins of history that we don’t have so much time interposed between it and concrete records. We can skein out its outlines with more rigor and surety. And the patterns we see among the Bantus can give us a sense of how past demographic-cultural expansions may have occurred.

* The papers coming out of the PoBI project suggest that a significant minority of the ancestry in eastern England is Anglo-Saxon. But only there.

Addendum: I can’t find the data to download and test some things myself.

May 1, 2017

So what’s the point of demographic models which leave you scratching your head

Filed under: Genomics,History,Human Genetics,Selection,Tibetans — Razib Khan @ 10:45 pm


There’s a new paper on Tibetan adaptation to high altitudes, Evolutionary history of Tibetans inferred from whole-genome sequencing. The focus of the paper is on the fact that more genes than have previously been analyzed seem to be the targets of natural selection. And I buy most of their analyses (not sure about the estimate of Denisovan ancestry being 0.4%…these sorts of things can be tricky).

But they fancy it up with a ∂a∂ model of population history, as well as using MSMC to account for gene flow. I don’t understand why they didn’t use something simpler like TreeMix, which can also handle more complex models. I guess because they wanted to focus on only a few populations?

Years ago I asked the developer of MSMC, Stephan Schiffels, if assuming an admixed population is not admixed might cause weird inferences. Why yes, it would. For example, admixed populations might show higher effective population since they’re pooling the histories of two separate populations. As for ∂a∂, the model above leaves me literally scratching my head.

…predicted that the initial divergence between Han and Tibetan was much earlier, at 54 kya (bootstrap 95% C.I 44 kya to 58 kya). However, for the first 45 ky, the two populations maintained substantial gene flow (6.8×10^-4 and 9.0×10^-4 per generation per chromosome). After 9.4 kya (bootstrap 95% C.I 8.6 kya to 11.2 kya), the gene flow rate dramatically dropped (1.3×10^-11 and 4×10^-7 per generation per chromosome), which is consistent with the estimate from MSMC.

Mystifying. The separation between Chinese and Tibetans is pretty much immediately after modern humans arrive in East Asia. Then there’s a lot of reciprocal gene flow…which ends during the Holocene.

We’re being told here that there are two populations which persisted in some form for ~45,000 years. Is this believable? That these two populations maintained some sort of continuity, and, remained in close proximity to engage in gene flow. And then ~10,000 years ago the ancestors of the Tibetans separated from the ancestors of the modern Han Chinese.

The latter scenario I can imagine. It’s this ~45,000 year dance I’m confused by. If there is substantial gene flow between the two groups why did they keep enough distinctive drift to be separate populations?

With what we know about ancient DNA from Europe, if we posited such a model for that continent we’d be way off. There have been too many population turnovers. Is East Asia different? I’m moderately skeptical of that. I think perhaps researchers should be very aware of the limitations of ∂a∂ when it comes to fine-grained population genomic analyses.

Note: This is a cool paper, and this small section is not entirely relevant. Which is why I’m confused about it since it seems the weakest part of the analysis in terms of originality, and the least believable.

April 28, 2017

Beyond “Out of Africa” and multiregionalism: a new synthesis?

Filed under: Africa,Evolution,Genetics,Genomics,Human Evolution,Human Genetics — Razib Khan @ 4:14 pm

For several decades before the present era there have been debates between proponents of the recent African origin of modern humans, and the multiregionalist model. Though molecular methods in a genetic framework have come to the fore of late, these were originally paleontological theories, with Chris Stringer and Milford Wolpoff being the two most prominent public exponents of the respective paradigms.

Oftentimes the debate got quite heated. If you read books from the 1990s, when multiregionalism in particular was on the defensive, there were arguments that the recent out of Africa model was more inspirational in regards to our common humanity. As a riposte the multiregionalists asserted that those suggesting recent African origins with total replacement were saying that our species came into being through genocide.

Though some had long warned against this, the dominant perception outside of population genetics was that results such as the “mitochondrial Eve” had given strong support to the recent African origin of modern humans, to the exclusion of other ancestry. 2002’s Dawn of Human Culture took it for granted that the recent African origin of modern humans to the total exclusion of other hominin lineages was established fact.

In 2008 I went to a talk where Svante Paabo presented some recent Neanderthal ancient mtDNA work. It was rather ho-hum, as Paabo showed that the Neanderthal lineages were highly diverged from modern ones, and did not leave any descendants. Though of course most modern human lineages did not leave any descendants from that period, Paabo took this as evidence supporting the proposition that Neanderthals did not contribute to the modern human gene pool.

When his lab reported autosomal Neanderthal admixture in 2010, it was after initial skepticism and shock internally. I know Milford Wolpoff felt vindicated, while Chris Stringer began to emphasize that the recent African origin of modern humanity also was defined by regional assimilation of other lineages. The data have ultimately converged to a position somewhere between the extreme models of total replacement or balanced and symmetrical gene flow.

This is not surprising. Extreme positions are often rhetorically useful and popular when there’s no data. But reality does not usually conform to our prejudices, so ultimately one has to come down at some point.

The data for non-Africans is rather unequivocal. The vast majority (>90%) of the ancestry of non-Africans seems to go back to a small number of common ancestors ~60,000 years ago. Perhaps in the range of ~1,000 individuals. These individuals seem to be a node within a phylogenetic tree where all the other branches are occupied by African populations. Between this period and ~15,000 years ago these non-Africans underwent a massive range expansion, until modern humans were present on all continents except Antarctica. Additionally, after the start of the Holocene some of these non-African groups also experienced huge population growth due to intensive agricultural practice.

To give a sense of what I’m getting at, the bottleneck and common ancestry of non-Africans goes back ~60,000 years, but the shared ancestry of Khoisan peoples and non-Khoisan peoples goes back ~150,000-200,000 years. A major lacuna of the current discussion is that often the dynamics which characterize non-Africans are assumed to be applicable to Africans. But they are not.

A 2014 paper illustrates one major difference by inferring effective population from whole genomes: African populations have not gone through the major bottleneck which is imprinted on the genomes of all non-African populations. The Khoisan peoples, the most famous of which are the Bushmen of the Kalahari, have the largest long term effective populations of any human group. The Yoruba people of Nigeria have a history where they were subject to some population decline, but not to the same extent as non-Africans.

What do we take away from this?

One thing is that we have to consider that the assimilationist model which seems to be necessary for non-Africans also applies to Africans. For years some geneticists have been arguing that some proportion of African ancestry as well is derived from lineages outside of the main line leading up to anatomically modern humans. Without the smoking gun of ancient genomes this will probably remain a speculative hypothesis. I hope that Lee Berger’s recent assertion that they’ve now dated Homo naledi to ~250,000 years before the present may offer up the possibility that ancient DNA will help resolve the question of African archaic admixture (i.e., is naledi related to the “ghost population”?).

The second dynamic is that the bottleneck-then-range-expansion which is so important in defining the recent prehistory of non-Africans is not as relevant to Africans during the Pleistocene. The very deep split dates being inferred from whole genome analysis of African populations makes me wonder if multiregional evolution is actually much more important within Africa in the development of modern humans in the last few hundred thousand years. Basically, the deep split dates may highlight that there was recurrent gene flow over hundreds of thousands of years between different closely related hominin populations in Africa.

Ultimately, it doesn’t seem entirely surprising that the “Out of Africa” model does not quite apply within Africa.

Addendum: Over the past ~5,000 years we have seen the massive expansion of agricultural populations within the continent. The “deep structure” therefore may have been erased to a great extent, with Pygmies, Khoisan, and Hadza being the tip of the iceberg in terms of the genetic variation which had characterized Africa during the Pleistocene.
