Razib Khan One-stop-shopping for all of my content

May 2, 2018

The Insight, Episode 19: Roberta Estes, the Golden State Killer, and forensic genetics

Filed under: Podcast,The Insight — Razib Khan @ 11:26 pm


Last week Spencer & I took a break from The Insight. We’re at 71 iTunes ratings. I would appreciate it if readers of this weblog could help us make it to 100 (then I’ll stop pestering you). Also, we only have 5 reviews on Stitcher.

This week we’re talking to Roberta Estes about the arrest of the suspect in the “Golden State Killings”. We kind of put this together really quickly since it seemed relevant, and Roberta, Spencer and I have some competency in this area (we’ve all been talking to science journalists). The biggest takeaway from our conversation is that we were a little surprised that it took this long to apply 21st century genomics to forensics.

When I first heard about the arrest I told my wife that it probably was due to a relative match on something like GEDMatch. After the media reported that it was a “new method” I dismissed my supposition because relative matches aren’t a new or novel thing. Well, it turned out that’s exactly what they were talking about!

A lot of the story here is how law enforcement snapped a bunch of pieces together that were out there. The horse has left the barn, and everyone is trying to figure out how to deal with it.

The genetics of forensic identification

Filed under: criminology,dna-sequencing,Forensics,Genetics — Razib Khan @ 4:03 pm

The arrest of a suspect in the infamous “Golden State Killer” case was notable for how he was identified: through DNA evidence. The media attention the case has garnered means that forensic genetics has come to public attention again in 2018. Not that the public has been unaware of the power of genetics in legal and criminal contexts: The Innocence Project famously leveraged DNA results to show that some individuals had been falsely convicted, eliminating them because their DNA did not match samples from the crime scene.

But the recent illustration of the power of 21st century genomics, with researchers digging through public databases to search for relatives of the potential suspect, was revelatory for much of the world, which had not kept up with the breakneck pace of change in genetics.

The first human genome cost $3,000,000,000 and took more than a decade to complete. Today a good quality human genome sequence can be had for less than $1,000, and generated in around a day in a pinch. The field has been subject to massive changes in the last 10 years, crashing through Moore’s Law and transforming what geneticists are capable of in the present.
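To put the comparison with Moore’s Law in rough numbers, here is a back-of-the-envelope sketch. The two prices come from the paragraph above. The 15-year span (roughly 2003 to 2018) and the two-year doubling period for Moore’s Law are assumptions added for illustration.

```python
import math

# Figures from the text: the first genome vs. a genome today.
first_genome_cost = 3_000_000_000
current_genome_cost = 1_000

# Assumptions for illustration only (not from the text).
years_elapsed = 15          # roughly 2003 to 2018
moore_doubling_years = 2.0  # conventional statement of Moore's Law


def halvings(start_cost, end_cost):
    """Number of times the cost had to halve to fall from start to end."""
    return math.log2(start_cost / end_cost)


n_halvings = halvings(first_genome_cost, current_genome_cost)
observed_halving_years = years_elapsed / n_halvings
moore_years_needed = n_halvings * moore_doubling_years

print(f"{n_halvings:.1f} halvings, one every {observed_halving_years:.2f} years")
print(f"At a Moore's Law pace this would take about {moore_years_needed:.0f} years")
```

Under these assumptions the price halved many times faster than a two-year doubling schedule would predict, which is the sense in which sequencing “crashed through” Moore’s Law.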

The arrest of a suspect in killings that date back four decades has awakened the public to the 21st century reality that geneticists have already been living in. It’s like Clark Kent transforming into Superman.

Obviously using genetics to resolve legal disputes is not new at all. Blood group inheritance patterns were understood early in the 20th century, and brought to bear in cases such as paternity disputes. But blood groups are only a small number of traits, with a limited amount of variation. In a huge number of cases inheritance patterns wouldn’t resolve anything. If ~25% of the population had blood group A, then finding that wouldn’t allow for narrowing across a broad cross-section of the population even if it would be useful in specific cases.

But even techniques as primitive as blood group inheritance illustrate the power of genetic techniques in the 20th century: they could eliminate a large number of possibilities. ~75% of the population does not have blood group A, so if you are looking at a large number of suspects then removing three out of four possibilities might be worth it.

By the latter decades of the 20th century, forensic genetics took this to the next step. With the molecular revolution in biology, geneticists didn’t have to focus on blood groups — rather, they could look at variation at specific genes that they obtained from various types of biological samples. With the development of new techniques of amplifying DNA from infinitesimally small samples in the 1990s, the amount of genetic material needed declined greatly, making it feasible to revisit cases where DNA analysis was previously deemed impossible.

The combination of molecular biology and genetics in the late 1990s was a forensic “killer app”, but there was still the problem that geneticists needed to target loci that had enough variation that they could differentiate individuals. If, for example, scientists tackled a genetic position where 99% of the population has one variant, and 1% the other, in most cases there wouldn’t be much novel information that one could use.

Because forensic labs could only focus on a specific number of genes, they quickly realized that the biggest “bang-for-the-buck” was in highly variable regions. In particular they looked at “short tandem repeats” (STRs). These are regions of the genome subject to expansion or contraction in the number of repeat units during DNA replication, thus generating usable repetitive variation. Where “single nucleotide polymorphisms” (SNPs) are limited to four different bases (A, C, G, and T, and typically only two of the four occur at a given site), STR loci can differ over many different copy number variants. Because STR loci mutate rapidly, they are more polymorphic and vary a great deal even across families.
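A small sketch of why more alleles per locus means more discriminating power: the number of distinct diploid genotypes grows roughly with the square of the number of alleles. The count of 10 STR alleles below is a hypothetical value chosen for illustration, not a figure for any real CODIS marker.

```python
def genotype_count(n_alleles: int) -> int:
    """Distinct unordered diploid genotypes at a locus with n_alleles alleles."""
    return n_alleles * (n_alleles + 1) // 2


snp_alleles = 2   # a typical SNP has two alleles in the population
str_alleles = 10  # hypothetical number of repeat-count alleles at one STR locus

print(genotype_count(snp_alleles))  # 3
print(genotype_count(str_alleles))  # 55
```

A single biallelic SNP sorts people into only three bins, while one STR locus with ten alleles sorts them into 55, so a handful of STRs can do the work of many SNPs.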

All this is why they are at the heart of CODIS, the Combined DNA Index System, a governmental database used by law enforcement, and centralized at the federal level since 1998. Originally starting with 13 markers, today CODIS uses 20. Because of the high level of variation in these markers, random matches are rare. Though some geneticists dispute the statistics, the FBI estimates that a given CODIS profile should match a random person in about 1 in 10 million cases. That means that there should be more than 30 matches to a profile just based on chance in the United States. Obviously not all of these individuals would be a suspect. All but one would be false positives of the DNA test.
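The arithmetic behind the chance-match figure is just population size times match probability. This sketch uses the 1 in 10 million estimate cited above and the round figure of 300 million Americans used elsewhere in this post; the function name is made up for illustration.

```python
def expected_chance_matches(population: int, match_probability: float) -> float:
    """Expected number of people who match a profile purely by chance."""
    return population * match_probability


us_population = 300_000_000                # round figure, as used in this post
random_match_probability = 1 / 10_000_000  # FBI estimate cited in the text

matches = expected_chance_matches(us_population, random_match_probability)
print(round(matches))
```

With the round population figure this comes to about 30. The actual population is somewhat larger, hence “more than 30.”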

With limited markers, false positives — or more precisely the inability to distinguish between individuals — are always going to be an issue. Just by chance some people will match others within a subset of the genome, even at these highly variable positions. In contrast, the lack of the match eliminates someone from the pool of suspects.

This is why CODIS was useful for exonerating people: if one did not match the DNA sample, one knew that this was not a statistical fluke. A negative match gives a certain conclusion: the individuals are different.* A positive match gives a probability: the individuals are likely the same.

But CODIS is 1990s genetics. The apprehension of the suspect in the rapes and killings from the 1970s and 1980s in California was done with state of the art genetics. While CODIS focuses on 20 markers at most, by 2010 tens of thousands, and today tens of millions, of people were getting large swaths of their genome genotyped, usually at 500,000 to 1,000,000 SNP positions. CODIS relied on STRs because of the expense of genotyping genetic positions in the 1990s.

But today “SNP-chips” cost less than $50 and return nearly a million markers. Data constraints are no longer an issue, and aligning patterns of SNPs across each chromosome allows for highly accurate assessment of relationships between people. Instead of returning the result that two individuals are probably siblings or parent-offspring, one can now conclude that two individuals are siblings, and share 46.5% of their genome in common! (including what segments of each chromosome they share)
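As a rough sketch of how a shared fraction maps onto a relationship, the table below lists the textbook expected fraction of autosomal DNA shared for a few relationships and picks the closest one. This is a simplification and not how any particular service does it: real tools also look at the number and length of shared segments, which is what separates siblings from parent-offspring pairs. The function and dictionary names are hypothetical.

```python
# Expected average fraction of autosomal DNA shared, by relationship.
EXPECTED_SHARING = {
    "parent/child or full sibling": 0.5,
    "grandparent, half sibling, aunt/uncle": 0.25,
    "first cousin": 0.125,
    "second cousin": 0.03125,
    "third cousin": 0.0078125,
}


def closest_relationship(observed_fraction: float) -> str:
    """Return the relationship whose expected sharing is nearest the observed value."""
    return min(
        EXPECTED_SHARING,
        key=lambda rel: abs(EXPECTED_SHARING[rel] - observed_fraction),
    )


print(closest_relationship(0.465))
print(closest_relationship(0.03))
```

A pair sharing 46.5% lands in the first category, and a pair sharing about 3% lands near second cousins.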

With individual DNA data no longer being in short supply, what was needed was a database. CODIS may have about a million profiles, but those are not genotyped on modern DNA technology. Consumer genomics firms such as Ancestry, 23andMe, and Family Tree DNA do have SNP databases of more than a million (Ancestry has more than 10 million), but these are not accessible to law enforcement without a subpoena. However, there are public databases available with SNP genotype profiles. GEDMatch is one of those, with ~1 million entries.

The combination of hundreds of thousands of genetic markers across millions of individuals is powerful. Bringing these together unleashes the ability to look into the pedigrees of thousands of individuals who weren’t tested with just a single sample. There are ~300 million Americans. If GEDMatch has ~1 million samples in its database it is likely that the vast majority of Americans will have matches. Obviously the vast majority of people will not have a perfect match, but because modern methods use hundreds of thousands of variable positions a perfect match is not just a probability anymore, but a surety (barring identical twins there will be only one perfect match in the database per person at most). Matches with 2nd cousins and closer are also ones that can be made with very high confidence. This means people who descend from common great-grandparents — but even without that many people can make matches with people more distantly related; the suspect in the case above shared common great-great-grandparents with people in the GEDMatch database.
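A toy model of why a database covering a small slice of the population can still reach most people: if each of your detectable relatives is independently in the database with probability equal to the database’s share of the population, the chance that at least one is present rises quickly with the number of relatives. The independence assumption and the relative counts below are illustrative assumptions, not estimates from the text.

```python
def prob_at_least_one_match(db_fraction: float, n_relatives: int) -> float:
    """Chance that at least one of n_relatives is in the database,
    assuming each is present independently with probability db_fraction."""
    return 1.0 - (1.0 - db_fraction) ** n_relatives


# ~1 million GEDMatch entries out of ~300 million Americans (figures from the text).
db_fraction = 1_000_000 / 300_000_000

for n_relatives in (10, 100, 1000):  # hypothetical counts of detectable relatives
    p = prob_at_least_one_match(db_fraction, n_relatives)
    print(f"{n_relatives:>5} relatives -> {p:.2f}")
```

Close relatives are few, so they are rarely in the database, but the number of distant cousins a person has is large enough that some match becomes likely.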

Genetic genealogists have become adept at looking at patterns of probabilistic matches that are quite distant, and triangulating them with other pieces of data to establish high confidence genealogical connections. Once those connections are made, obtaining DNA from suspects would yield a result that law enforcement could have near-perfect confidence in.

Law enforcement, the media, and the public are living in the genetic 1990s. The future is actually happening in the present, led by consumer genomics databases and “citizen scientists.” The lesson we can distill from the headlines is that genetic privacy may, in many ways, now be a 20th century novelty in the eyes of the law.

* There are exceptions to this when it comes to genetic mosaicism.


The genetics of forensic identification was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

May 1, 2018

Hui have a lot of West Eurasian Y chromosomes

Filed under: China genetics,Hui — Razib Khan @ 1:26 am
O C R1a R1b R2 E1b G H I1 I2 J1 J2 L N Q T Total N
Han 258 12 2 2 2 1 1 2 1 1 7 9 2 300
Hui 24 7 21 1 9 1 3 1 1 4 1 11 1 3 14 4 106
Tibetan 49 11 18 1 1 3 3 3 3 7 1 100

It’s been a while since I checked in on the genetics of the Hui people. I found the paper, Analysis of 17 Y‐STR loci haplotype and Y‐chromosome haplogroup distribution in five Chinese ethnic groups. About 50% of the Hui Y chromosomal haplogroups are ones normally classified as “West Eurasian” (R, E, G, I and J). But curiously a fraction of the Han have these too, as do some Tibetans.
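The “about 50%” figure can be checked directly from the Hui row of the table. This sketch assumes the 16 Hui counts line up one-to-one with the 16 haplogroup columns, which is consistent with them summing to the stated total of 106. The Han and Tibetan rows are left out because their empty cells were lost and the counts cannot be assigned to columns with confidence.

```python
# Hui row from the table above, mapped onto the header in order.
hui_counts = {
    "O": 24, "C": 7, "R1a": 21, "R1b": 1, "R2": 9, "E1b": 1, "G": 3, "H": 1,
    "I1": 1, "I2": 4, "J1": 1, "J2": 11, "L": 1, "N": 3, "Q": 14, "T": 4,
}

# Haplogroups treated as "West Eurasian" in this post: R, E, G, I and J.
WEST_EURASIAN_PREFIXES = ("R", "E", "G", "I", "J")

total = sum(hui_counts.values())
west = sum(
    n for hg, n in hui_counts.items() if hg.startswith(WEST_EURASIAN_PREFIXES)
)

print(total, west, round(west / total, 2))  # 106 52 0.49
```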

Additionally, I know that some Mongols also have R1a1a. It’s hard to differentiate different periods of admixture. But to me the presence of R2 and J2 points to a Central/South Asian origin of a lot of the Hui R1a as well.

April 30, 2018

Is American genetic diversity enough?

Filed under: Historical Population Genetics,Human Genetic Variation — Razib Khan @ 8:51 pm


In the nearly 20 years since the draft of the human genome was completed,* we’ve moved on to bigger and better things. In particular, researchers are looking to diversify their panels of human genetic diversity, because differences between groups matter. You can’t just substitute them for each other genetically.

There have been efforts to diversify the population panels recently, but that prompts the question whether American population coverage is sufficient. My first thought is that the genetic diversity in the USA is probably getting us 90% of the way there. Consider Spencer’s comment about Queens: it’s the most ethnically diverse large conurbation in the country.

There are some gaps though. In Who We Are David Reich points out the distinctiveness of Indian population genetics. The subcontinent has lots of populations with large census sizes in which deleterious alleles have drifted upward in frequency due to long-term endogamy. And many of these populations don’t have a strong representation in the Diaspora.

In contrast, much of the rest of the world is panmictic enough that an American panel can pick up most of the variation. American Chinese are skewed toward Guangdong and Fujian, but a substantial number of people from other parts of China have arrived in the last generation. Regional structure is not so strong that you’ll miss out on too much, aside from very rare variants, which are more a matter of extended-pedigree scale than population scale.

There are small populations such as Hadza, Khoikhoi, and Pygmies in Africa which are probably going to be missed by American population panels, but the total census size of these groups is pretty low (for comparison, there are 1 million Pulayar Dalits in the state of Kerala alone). Much of the rest of Africa is West African variation, well represented in African Americans, plus Bantu and Nilotic variation, probably captured by immigrant communities.

I’d propose supplementing American genetic diversity with sampling Cape Coloureds in South Africa.

* No discussions about how the genome isn’t totally complete. I know that.

Diaspora cultures are often more conservative

Filed under: Culture,History — Razib Khan @ 8:45 am

Zach made a comment below about conservatism and Diaspora cultures. There are two trends one has to highlight here. On the one hand Diaspora cultures often exhibit synthesis with host cultures and can be quite novel and innovative.

But there is another trend which is a cultural universal: Diaspora cultures often exhibit archaism and crystallize old-fashioned norms and practices. To give a concrete example, foot-binding persisted the longest, down to the 1970s, in the Chinese communities of Borneo. The French of Quebec is peculiar in part because it preserves characteristics of older French dialects. The same is true of some Anglo-American English dialects.

April 29, 2018

Open Thread, 04/29/2018

Filed under: Open Thread — Razib Khan @ 10:05 pm

One of the strange things about getting old is that your friends start to become kind of a big deal. Matthew Hahn has a new book out, Molecular Population Genetics. If there is one single reason I keep blogging, it’s to get awareness of the field of population genetics to spread beyond the small circle who are “in the know.” I joked on Twitter that buying this textbook is like spending money to talk to Matt about pop-gen, and that’s surely worth it.

Another one for the stack!

Speaking of worth it, Kyle Harper’s The Fate of Rome: Climate, Disease, and the End of an Empire is definitely worth a read. Not done, and I’m not sure it’s better than The Fall of Rome: And the End of Civilization. Perhaps my issue is that exogenous shocks are to be expected in my view of the world. Though the details in The Fate of Rome are novel, the general thesis and framework were what I’d assumed were taken for granted.

What Happens When Geneticists Talk Sloppily About Race. I don’t think that David Reich was sloppy…though the op-ed was edited in a way that was confusing. That being said I’ve heard through the grapevine that some prominent human population geneticists may write a response to David’s op-ed, which is something I want to see. Part of me still thinks that these vigorous public discussions are important (another part of me just thinks that when Sulla or Marius take over all this old-fashioned fixation on truth will be irrelevant).

One thing stated in the piece above is that regular people have a Platonic model of race. This is true. But it is also a fact that geneticists have not done a good job of explaining to the educated public what population structure is, and why it’s not trivial or arbitrary. I know this from personal experience over 15 years interacting with people about genetics online (some of the funniest interactions are on Facebook where a person of professional class background/status “genetics-splains” me about how I don’t understand the extent [lack] of human genetic variation and how arbitrary population cluster identity is).

With The Genomic Formation of South and Central Asia I obviously think we have the broad outlines of the peopling of South Asia in hand. There will be lots of detailed elaborations of how/what happened, but I think the big picture is nailed down.

That being said some of the objections remind me a lot of Creationist tendencies. Creationists often focus on weak points and hammer in on them over and over.

One of the weird things about Indian genetics is that a lot of people think new research will overturn Hindu nationalism. But I know several Hindu nationalists, and privately they tell me that most Hindu nationalists don’t care about these abstruse issues, and many of the more intellectual ones don’t have a major problem with the science.

GEDmatch, Ysearch and the Golden State Killer.

Anthropogenic habitat alteration leads to rapid loss of adaptive variation and restoration potential in wild salmon populations.

Bracketing phenotypic limits of mammalian hybridization.

A few people have asked about the podcast. We skipped a week, but we’ll be back. Taking some feedback in relation to various aspects of the show. A common issue seems to be that my voice is too quiet though Spencer’s is “just right.”

Again, if you use Stitcher or iTunes please remember to give us positive reviews and 5-stars!

If you have ideas for shows, we’re game.

April 27, 2018

Why Bronze Age steppe people replaced the farmers they conquered

Filed under: Historical Genetics,History,steppe — Razib Khan @ 9:59 pm

One of the major revisions in my own mind about the demographic and historical processes of the Holocene in relation to humans has been the reality that large and dense agglomerations of agriculturalists could be marginalized by later peoples, to the point of having a smaller genetic footprint in the future than anyone might have imagined. If you had asked me ten years ago I just wouldn’t have believed that the first farmers of Europe or South Asia wouldn’t account for the vast majority of the ancestry of the contemporary populations of the region. By “first farmers” I don’t even mean migrants. At that point, I had assumed a primarily Pleistocene indigenous hypothesis for the origin of Europeans and South Asians, with farming diffusing through a mixture of a few migrants along a demographic wave of advance.

That’s not what it looks like according to ancient DNA. In Northern Europe, it seems that around half or more of the ancestry is due to the incursions of a pastoralist steppe population during the Bronze Age. In Southern Europe and South Asia, the fraction is closer to 10-25%. But even in the latter case, the fraction of steppe ancestry is far higher than I had expected.

I had assumed that the steppe migrants would contribute 1-5% of the ancestry of Europeans and South Asians and that the spread of Indo-European languages was a matter of elite transmission and emulation. Think of the Hungarians as an example of what I had assumed.

So what explains what really happened?

During the Mongol conquest of Northern China Genghis Khan reputedly wanted to turn the land that had been the heart of the Middle Kingdom into pasture, first by exterminating the whole population. Part of the motive was to punish the Chinese for resisting his armies, and part of it was to increase his wealth. One of his advisors, Yelu Chucai, a functionary from the Khitai people, dissuaded him from this path through appealing to his selfishness. Chinese peasants taxed on their surplus would enrich Genghis Khan far more than enlarging his herds. Rather than focus on primary production, Genghis Khan could sit atop a more complex economic system and extract rents.

Most of you at this point can see the general framework then. For thousands of years, pastoralist people of the Inner Asian steppe and forest would extract rents out of the oikoumene by threatening them with force. The reason the East Roman Empire did not face the Hunnic onslaught during the lifetime of Attila is that they paid the horde tribute. Imperial China did the same during some periods. In other instances, civilized states found in the barbarians of the steppe useful confederates. The Tang dynasty did not collapse during the 750s because of the intervention of the Uyghurs, who suppressed the rebellion of An Lushan. In 9th century Baghdad the rise of the Turks was enabled by their usefulness in court politics and distance from any given faction.

The rise of the “gunpowder empires” during the 16th century and the eventual closing of the Inner Asian frontier with the crushing of the last embers of the Oirat confederacy between the Russian and Chinese Empires in the 18th century marked the end of thousands of years of interaction between the farmland and pasture.

But this makes us ask: when did this dynamic begin? I don’t think it was primordial. It was invented and developed over time through trial and error. I believe that the initial instinct of pastoralists was to turn farmland into pasture for their herds. This was Genghis Khan’s instinct. Rude barbarian that he was, he had not grown up in the extortive system which more civilized barbarians, such as the Khitai, had been habituated to.

In these situations where pastoralists expropriated the land, there wouldn’t have been an opportunity for the farmer to raise a family. Barbarian warlords throughout history have aspired to be rich by plundering the civilized peoples…but would the earliest generations have understood the complexity of the institutions that they would have to extract rents out of if there wasn’t a precedent?

Instead of conventional historical dynamics of predatory elites and static peasantry, a better way to understand what occurred with the incursion of steppe pastoralists during the Bronze Age might be a simple ecological model of intra-specific competition. In a pre-state society defined by clan and tribal ties, steppe elites may have seen the farmers who were earlier residents in the territories which they were expanding into as competitors rather than resources from which a life of leisure might be obtained. In other words, instead of conquest, the dynamic was of animal competition.

Of course, pre-modern societies did not have totalitarian states and deadly technology. Rapid organized genocide in a way that we would understand was unlikely to have happened. Rather, in a world on the Malthusian margin, a few generations of deprivation may have resulted in the rapid demographic extinction of whole cultures. You don’t need to kill them if they starve because they were driven off their land.

In fact, we have some precedent of this historically. The Spaniards were intent on extracting rents out of the native peoples of the New World and living a life of leisure, but in many areas disease and exploitation resulted in demographic collapse. Imagine a conquest elite as vicious as the Spaniards, but without thousands of years of precedent that conquered peoples were more useful alive rather than dead. 

Addendum: The fraction of haplogroup M, which is probably derived from Pleistocene South Asians, is greater than 50% in places like Sindh. This indicates that the steppe migrations were strongly male biased in the initial generations.

Closing the genetic chapter

Filed under: Genetics — Razib Khan @ 8:51 am

Indus Valley People Did Not Have Genetic Contribution From The Steppes: Head Of Ancient DNA Lab Testing Rakhigarhi Samples:

In other words, the preprint observes that the migration from the steppes to South Asia was the source of the Indo-European languages in the subcontinent. Commenting on this, Rai said, “any model of migration of Indo-Europeans from South Asia simply cannot fit the data that is now available.”

Some more comments at my other weblog.

At this point, we need to move to other things. I think the broad genetic framework is pretty clear.

1) The Indus Valley Civilization (IVC) people were a mix of eastern West Asian (from modern Iran) people and native South Asian peoples (~80% of South Asian mtDNA are haplogroup M).

2) ~1500 BC a major incursion from the steppe occurred and overlaid upon #1 to various extents as a function of region, language, and caste.

3) ~0 to 500 AD the strong endogamy that characterizes modern South Asians seems to have established itself.

Rakhigarhi sample doesn’t have steppe ancestry (probably “Indus Periphery”)

Filed under: India Genetics,India genomics,Rakhigarhi — Razib Khan @ 12:02 am

We’ve been waiting for two years now, and it looks like they’re about to pull the trigger, Indus Valley People Did Not Have Genetic Contribution From The Steppes: Head Of Ancient DNA Lab Testing Rakhigarhi Samples:

Niraj Rai, the head of the Ancient DNA Laboratory at Lucknow’s Birbal Sahni Institute of Palaeosciences (BSIP), where the DNA samples from the Harappan site of Rakhigarhi in Haryana are being analysed, has revealed that a forthcoming paper on the work will show that there is no steppe contribution to the DNA of the Harappan people….

“It will show that there is no steppe contribution to the Indus Valley DNA,” Rai said. “The Indus Valley people were indigenous, but in the sense that their DNA had contributions from near eastern Iranian farmers mixed with the Indian hunter-gatherer DNA, that is still reflected in the DNA of the people of the Andaman islands.” He added that the paper based on the examination of the Rakhigarhi samples would soon be published on bioRxiv (pronounced “bio-archive”), a preprint repository of papers in the life sciences.

At this point none of this is surprising. I also wonder if this preprint was hastened by the release of The Genomic Formation of South and Central Asia. It seems that the results here are totally consonant with what came before. My expectation is that the lone sample that they got genetic material out of will be similar to the “Indus Periphery” (InPe) individuals in the earlier preprint: a mix of West Asian with ancestry strongly shifted toward eastern Iran, and indigenous South Asian “hunter-gatherer.”  That’s pretty much what Niraj Rai states in the piece. I think genetically the individual won’t be that different from the Chamars of modern day Punjab.

In fact, Rai, the lead researcher, ends by twisting the knife:

In other words, the preprint observes that the migration from the steppes to South Asia was the source of the Indo-European languages in the subcontinent. Commenting on this, Rai said, “any model of migration of Indo-Europeans from South Asia simply cannot fit the data that is now available.”

A major caveat here is that we’re talking about one sample from the eastern edge of the Indus Valley Civilization (IVC). I’m not sure that this should adjust our probabilities that much. From all the other things we know, as well as copious ancient DNA from Central Asia, our probability for the model which the Rakhigarhi result aligns with should already be quite high.

Again, since it’s one sample, we need to be cautious…but I bet once we have more samples from the IVC the Rakhigarhi individual will probably be enriched for AASI relative to other samples from the IVC. The InPe samples in The Genomic Formation of South and Central Asia exhibited some variation, and it’s likely that the IVC region was genetically heterogeneous.

But, this is going to be a DNA sample from an individual who lived 4,600 years ago within the orbit of the IVC when it was in its mature phase. That’s still a big deal. As most of you know the IVC is prehistory because we haven’t deciphered the seals which are associated with this civilization. But, the IVC clearly had relationships with West Asia and Central Asia, with parts of eastern Iran and the BMAC culture both being influenced by and interacting with it. Traders who were likely from the IVC seem to be mentioned in Mesopotamian records.

Additionally, the genetics of one individual can be highly informative if it’s high-quality whole-genome data (I’m skeptical of that in this case). One could possibly even identify the time period that admixture between West Asian and AASI components occurred from a single genome, by looking at ancestry tract lengths.
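A crude sketch of the tract-length idea: recombination chops ancestry tracts shorter every generation, so under a simple single-pulse admixture model the mean tract length is very roughly 100 cM divided by the number of generations since mixing. The function names, the 2 cM input, and the 28-year generation time are all hypothetical values for illustration. Real methods fit the full tract-length distribution and account for admixture proportions, so this is a rule of thumb and not how any published analysis was done.

```python
def generations_since_admixture(mean_tract_cm: float) -> float:
    """Rule-of-thumb estimate: mean tract length is about 100 / generations (in cM)."""
    return 100.0 / mean_tract_cm


def years_since_admixture(mean_tract_cm: float,
                          years_per_generation: float = 28.0) -> float:
    """Convert the generation estimate to years; the generation time is an assumption."""
    return generations_since_admixture(mean_tract_cm) * years_per_generation


mean_tract_cm = 2.0  # hypothetical mean ancestry tract length from one genome
print(generations_since_admixture(mean_tract_cm))  # 50.0
print(years_since_admixture(mean_tract_cm))        # 1400.0
```

Shorter tracts imply older admixture, which is why a single high-quality genome can carry dating information.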

A single sample isn’t going to falsify the idea held by some that steppe peoples were long present within the IVC. Perhaps they’ll show up in other samples? That’s possible, and it’s what I would argue if I held their position, but I think the constellation of evidence on the balance now does suggest that a relatively late incursion into South Asia is likely. The steppe ancestry with Northern European affinities shows up in BMAC only around 4,000 years ago. It is hard to imagine it was in South Asia before it was in Central Asia.

As I’ve been saying for a while it seems that though there will be more genetic work written on India in the near future, the real analysis is going to have to come out of archaeology and mythology.

It’s pretty clear that in Northern Europe the arrival of the Corded Ware peoples from the steppe zone resulted in great tumult. A linguistic analysis suggests that the languages of Northern Europe have words related to agriculture with a non-Indo-European origin, of common provenance.  But we don’t have much in the way of mythos about the arrival of the Corded Ware.

In contrast, India has a rich mythos which seems to date to the early period of the arrival of the Indo-Aryans. One interpretation has been that since these myths seem to take as a given that Indo-Aryans were autochthonous to India, they were. But the genetic data seem to be strongly suggesting that the arrival of pastoralists occurred in South Asia concomitant with their arrival in West Asia, and somewhat after their expansion westward into Europe. Indian tradition and mythos could actually be a window into the general process of how these pastoralists dealt with native peoples and an illustration of the sort of cultural synthesis that often occurred.

April 26, 2018

The genetic future is here when it comes to finding relatives of suspects

Filed under: Forensics,Personal Genetics — Razib Khan @ 11:05 pm

You may have heard that a suspect was arrested who is alleged to be the “Golden State Killer.” DNA played an important role: Relative’s DNA from genealogy websites cracked East Area Rapist case, DA’s office says.

I think Alexander Kim’s supposition is probably right. It wasn’t a direct to consumer company that you know of that uses a genome-wide analysis, but probably old-fashioned Y STR matching which allowed the researchers to converge on the suspect. The public databases for this are extensive enough now that they might yield something, and law enforcement is comfortable with STR tests. This is really a preview of what’s to come. If researchers routinely extract DNA from remains that are tens of thousands of years old it seems clear that a lot more material will come out of old rape kits.

That’s one dimension. The other dimension is that we have many more markers to work with now. Even without whole-genome analysis, you can identify relatives with reasonable precision out to 2nd cousins (it gets a little dicier beyond that).

But the most important variable happens to be with numbers. If you read Alon Keinan’s piece, Crowdsourcing big data research on human history and health: from genealogies to genomes and back again, you know that probably nearly 20 million people have taken advantage of genome-wide consumer testing. Assuming 10 million are in the United States, a substantial number of “cold cases” could probably be closed by just looking for matches within these databases and establishing the pedigrees which suspects come from.

Of course, the genomics companies are not just going to open their databases to law enforcement. But I’m not sure that that will be necessary. There are enough genealogy enthusiasts that public forums and services to facilitate matches will probably suffice. If only a few percent of the American population is in these forums, then that might get us 90% of the way there.
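The intuition that a few percent coverage goes a long way can be sketched with a back-of-the-envelope calculation. This is a toy model, not an estimate from real data: it assumes each relative is in the database independently with the same probability, and the number of detectable relatives per person is a hypothetical input that I made up for illustration.

```python
def prob_at_least_one_match(coverage: float, n_relatives: int) -> float:
    """Chance that at least one of n_relatives is in a database.

    coverage    -- fraction of the population in the database (0 to 1)
    n_relatives -- hypothetical count of relatives close enough to detect
    Assumes relatives join the database independently of one another.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    return 1.0 - (1.0 - coverage) ** n_relatives


if __name__ == "__main__":
    # The relative counts below are illustrative assumptions only.
    for coverage in (0.01, 0.02, 0.05):
        for n_relatives in (50, 100, 200):
            p = prob_at_least_one_match(coverage, n_relatives)
            print(f"coverage {coverage:.0%}, {n_relatives} relatives: {p:.1%}")
```

Under these assumptions, 2% coverage and 100 detectable relatives already gives roughly an 87% chance of at least one match. Relatives tend to test together, so independence is optimistic, but the shape of the curve is the point.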

Addendum: There has been some work in forensic genetics “predicting” physical appearance. A lot of this is not primetime, but one area where a lot could be done: fine-scale ancestral analysis. Using haplotype-based methods and looking for matches within public datasets one could probably narrow down the ethnic background of a suspect pretty well from DNA. If the test tells you someone is Northern European in Minnesota that might not help, but if it tells you that they are around half Lithuanian, that might be very useful….

The Ancient Neanderthal Mariner

Filed under: Human Evolution,Human Population Genetics — Razib Khan @ 10:35 pm

More recent stuff on Neanderthals of interest, Neandertals, Stone Age people may have voyaged the Mediterranean:

A decade ago, when excavators claimed to have found stone tools on the Greek island of Crete dating back at least 130,000 years, other archaeologists were stunned—and skeptical. But since then, at that site and others, researchers have quietly built up a convincing case for Stone Age seafarers—and for the even more remarkable possibility that they were Neandertals, the extinct cousins of modern humans.

But a growing inventory of stone tools and the occasional bone scattered across Eurasia tells a radically different story. (Wooden boats and paddles don’t typically survive the ages.) Early members of the human family such as Homo erectus are now known to have crossed several kilometers of deep water more than a million years ago in Indonesia, to islands such as Flores and Sulawesi. Modern humans braved treacherous waters to reach Australia by 65,000 years ago. But in both cases, some archaeologists say early seafarers might have embarked by accident, perhaps swept out to sea by tsunamis.

The effective population size of Australian people is just too large for me to imagine that it was only a few individuals swept out on driftwood. There was some sort of sea-going craft which mediated migration to Sahul from Sundaland. Just because we have only recent evidence of sea-going craft doesn’t mean that they weren’t around for tens of thousands of years before that.

I’ve been hearing about Neanderthal tools on islands like Crete, which were never connected with the European mainland, for a while now. It seems that people are finally convinced that this is the real deal, as the stratigraphy came together to confirm dates. One thing that seems obvious from this, as well as Neanderthal “art”, is that the differences between modern humans and Neanderthals were more quantitative than qualitative. Differences of degree, not of kind.

It is hard to deny that modern human expansion between 60 and 15 thousand years ago is sui generis. Hominins didn’t make it to the New World or Sahul, what later became Oceania, until our own kind. There’s also a fair amount of evidence that our lineage pushed the northern frontier of human habitation beyond what Neanderthals ever did. But in the process of marking off our distinctiveness, it seems to me that we’ve overemphasized the differences between us and Neanderthals, and dismissed or ignored evidence of “human-like” “advanced” behaviors from them.

I’ll still go with the prediction that we’ll never find a singular gene which marks us off from other human lineages.

April 25, 2018

Beyond cultural parochialism

Filed under: History — Razib Khan @ 11:21 pm

A major personal peeve of mine is that the past few centuries of Western colonialism have overshadowed so much that moderns are often unequipped to understand the vast tapestry of human historical and geographical diversity. If you are a modern Indian or Chinese or African person you know your own culture and its history…and its relationship to the modern West. This is a shadow of a bygone age which is now in its terminal stage.

Presuming that the audience of this weblog is mostly South Asian, here are some very broad surveys which I think the audience might find interesting:

The Classical World: An Epic History from Homer to Hadrian

China: A History

Africa: A Biography of the Continent

The Russian Moment in World History

Strange Parallels…Southeast Asia in Global Context, c.800-1830

History of Japan

A History of the Ancient Near East, ca. 3000-323 BC

When Baghdad Ruled the Muslim World: The Rise and Fall of Islam’s Greatest Dynasty

The Great Sea: A Human History of the Mediterranean

A History of Iran: Empire of the Mind

Aboriginal Australians: A History Since 1788

If anyone can recommend a good survey of Latin American history, I’m game.

Whales and complex speciation

Filed under: Evolutionary Genomics,Speciation — Razib Khan @ 10:07 pm


“Reader request”, what’s going on with this new crazy baleen whale paper, Whole-genome sequencing of the blue whale and other rorquals finds signatures for introgressive gene flow. First, putting “blue whale” in the title is genius, since blue whales are awesome and people will read the paper with that in the title (most people don’t know what “rorquals” are). Second, this paper was interesting because it highlighted the importance of thinking across different ecologies when attempting to understand evolutionary processes.

Since I don’t think too much about speciation, a lot of my thought is derived from the fifteen-year-old book Speciation (good book, too bad it doesn’t seem to be in print anymore!). The authors of Speciation are evolutionary geneticists and emphasize allopatric speciation and the biological species concept. They’re instrumentalists. Basically, you separate populations and they eventually diverge until they’re no longer interfertile. Then you get species.

The problem is that in the oceans allopatric speciation isn’t as straightforward; the seas are open three-dimensional spaces, after all. This opens up the likelihood that a lot of oceanic speciation is sympatric speciation (think cichlid fish). Something like this seems to apply to large non-toothed whales in this study.

Though the gray whale is phenotypically very distinct from others in the study above, it turns out that phylogenetically they are within the rorqual clade. The authors suggest that the gray whale distinctiveness is a function of adaptation to a benthic lifestyle. They’re bottom feeders.

The topology at the top of this post illustrates that there seems to have been a lot of complexity and gene flow as the rorquals diverged early on so that it’s not really a simple bifurcating phylogenetic tree. We’ve seen this story before. Remember Genetic evidence for complex speciation of humans and chimpanzees?

I think the moral of the story is that large mammalian species which are the basis of the biological species concept don’t really fit under that paradigm too easily. Even this study is probably not going to be the last word on rorqual phylogenetics.

DNA, from genetics to genomics

Filed under: DNA,Genetics,Genomics,science — Razib Khan @ 11:59 am

In the early 1950s scientists established that the molecular structure of DNA was a double helix. They had discovered the physical substrate of heredity. With this discovery the field of molecular genetics was born (and eventually a Nobel Prize given!).

And yet we also know that Gregor Mendel discovered the laws of heredity, the “law of segregation” and the “law of independent assortment”, nearly a century before the discovery of DNA.

It was literally the product of a garden.

The mature field of genetics itself developed fifty years before the discovery of the structure of DNA, as a host of scientists stumbled upon Mendelian insights simultaneously. Most were biologists who worked with plants, flies, or even algebra — no need for a powerful microscope or structural models of molecules.

Though DNA has been the key to many of the discoveries of the past fifty years, it is important to remember that the field of genetics is predicated on an abstract understanding of how inheritance works across pedigrees, as opposed to the biophysical basis of that transmission. Before DNA, before chromosomes, what Mendel and his heirs understood is that inheritance occurs through a process where discrete units of heredity, “genes”, are passed down from generation to generation.

These genes usually come in two copies, ‘alleles,’ for many organisms.

Recessive expression patterns of a trait, where parents do not express a characteristic found in their offspring, become comprehensible when a Mendelian model is adopted. Prior to this many had an intuitive “blending” understanding of inheritance, where the characteristics of the parents mixed together to produce offspring. The ultimate problem with blending inheritance is that it had difficulty in explaining how variation persisted over time. That problem was solved by the Mendelian insight that genetic variation never disappeared…it simply rearranged itself every generation!
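The contrast between the two models can be made concrete with a small deterministic sketch. It assumes random mating in a very large population, with no selection or mutation. Under blending, each child is the average of its parents, so trait variance halves every generation. Under the Mendelian model, a single gene with two alleles keeps the same allele frequency, and so the same heterozygosity. The function names are mine.

```python
def blending_variance(initial_variance: float, generations: int) -> list[float]:
    """Trait variance per generation if offspring are the parental average.

    With random mating, Var((X + Y) / 2) = Var(X) / 2, so variance halves.
    """
    return [initial_variance / 2 ** t for t in range(generations + 1)]


def next_allele_frequency(p: float) -> float:
    """Allele frequency after one round of random mating (no selection)."""
    homozygous = p * p              # frequency of genotype AA
    heterozygous = 2 * p * (1 - p)  # frequency of genotype Aa
    # Every AA gamete carries A; half of the Aa gametes do.
    return homozygous + heterozygous / 2


def mendelian_heterozygosity(p0: float, generations: int) -> list[float]:
    """Expected heterozygosity 2pq per generation under Mendelian rules."""
    values = []
    p = p0
    for _ in range(generations + 1):
        values.append(2 * p * (1 - p))
        p = next_allele_frequency(p)
    return values


if __name__ == "__main__":
    print("blending: ", blending_variance(1.0, 5))
    print("mendelian:", mendelian_heterozygosity(0.3, 5))
```

The blending series shrinks toward zero within a handful of generations, while the Mendelian series stays flat. That is the sense in which variation is rearranged rather than lost.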

Genetics was born on the backs of Drosophila

Between the reemergence of Mendelian thought around 1900 and the discovery of DNA in the 1950s much research occurred in the field of genetics. The Neo-Darwinian Synthesis built upon the mathematical foundations of population genetics, which took the Mendelian framework and formalized and extended it, to create a model of evolutionary biology for the 20th century. Medical geneticists began to understand the patterns of inheritance of rare diseases in humans with the aim of preventing illness. Those researchers working with fruit flies discovered many of the phenomena which define modern genetics, such as recombination. Finally, biochemists established that heredity and nucleic acids were intimately connected.

Just as an understanding of the discrete basis of inheritance in a Mendelian framework opened up the systematic scientific study of heredity, so the understanding of the double helical structure of DNA paved the way for the molecular revolution of the second half of the 20th century, and the genomic revolution of the 21st. An understanding of DNA as the mode of inheritance allowed for the development of techniques that traced transmission of variation at the level of genes themselves, as opposed to expressed traits.

Illumina sequencing machine

And while in the 20th century we spoke of genetics, and specific genes, today we speak of genomes and the whole set of genes organisms possess. That revolution cannot be understood without the knowledge of DNA as the mode of inheritance. If classical Mendelian genetics is pattern recognition across pedigrees, 21st century genomics is a synthesis of classical genetics, post-DNA era biophysics, and cutting-edge computing. Genomics is as much engineering as it is science; and “big data” as much as information theory.

The understanding of DNA created the world where genetics transformed itself from an esoteric science of probabilities, to a mass market product of possibilities.

Classical genetics tells you that your relatedness to your brother or sister is expected to be 0.50. Modern genomics might tell you that your relatedness to your brother or sister is shared across 46.24% of your genome. A fuzzy probability becomes a crisp reality. As a science, genetics can be imagined without DNA. It was born and matured decades before we understood the importance of the double helix, but as a part of our lives, one can’t imagine genetics without DNA.
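Why would a measured value like 46.24% differ from the expected 0.50? A rough simulation gives the flavor. This is a sketch under a deliberately crude assumption: the genome is treated as a fixed number of independently inherited blocks, and the default of 80 blocks is a made-up illustrative number, not a measured recombination parameter. For each block, two siblings receive the same copy from a given parent with probability one half.

```python
import random
import statistics


def simulate_sibling_sharing(n_segments: int, rng: random.Random) -> float:
    """Realized fraction of the genome two full siblings share.

    Each block contributes one paternal and one maternal comparison;
    each comparison matches with probability 1/2.
    """
    matches = 0
    for _ in range(n_segments):
        matches += rng.random() < 0.5  # same paternal copy inherited?
        matches += rng.random() < 0.5  # same maternal copy inherited?
    return matches / (2 * n_segments)


def simulate_many(n_pairs: int, n_segments: int = 80, seed: int = 0) -> list[float]:
    """Simulate many sibling pairs; n_segments is an illustrative assumption."""
    rng = random.Random(seed)
    return [simulate_sibling_sharing(n_segments, rng) for _ in range(n_pairs)]


if __name__ == "__main__":
    values = simulate_many(2000)
    print(f"mean sharing: {statistics.mean(values):.4f}")
    print(f"std dev:      {statistics.pstdev(values):.4f}")
    print(f"range:        {min(values):.4f} to {max(values):.4f}")
```

The mean sits near one half, but individual pairs scatter around it. The pedigree gives you the expectation; the genome gives you the particular draw.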



DNA, from genetics to genomics was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

April 24, 2018

Why do Indians care about OIT/AIT

Filed under: AIT,Culture,OIT — Razib Khan @ 1:17 pm

From my blog:

Razib: I follow your super feed and read your postings here and on Brown Pundits. The subject of the ancestry of South Asians comes up frequently. It seems to have a political valence that I, as an outsider, do not understand.

Can you explain it? or point us to an explanation?

My response is “British colonialism and modern-day culture wars.” I could say more, but honestly, I don’t care that much. The science is more interesting to me, and it’s a lot to keep track of. Can readers comment?

(Related: there are some Pakistanis who try and pretend as if they are descended from Persians, Turks, or even Arabs. The explanation is pretty straightforwardly summarized as “self-hatred”, though we could all elaborate on that).

Open Thread, 4/24/2018

Filed under: Open Thread — Razib Khan @ 12:23 am

Finished She Has Her Mother’s Laugh: The Powers, Perversions, and Potential of Heredity. To be honest I was pleasantly surprised that the narrative wasn’t overly fixated on the ‘perversions.’ Sometimes it’s hard to move past that.

I think different people will benefit from reading the book differently. If you are a layperson a serial reading from front to back is optimal. She Has Her Mother’s Laugh is a long book, so this will take a while. But you need to do this to get situated. If you are a geneticist, you may benefit from jumping around chapters, and sampling what people in other fields are doing. Additionally, some geneticists would actually benefit from reading the historical chapters.

Started reading The Fate of Rome: Climate, Disease, and the End of an Empire. Yes, it’s very good. Will see if it’s better than The Fall of Rome: And the End of Civilization after I’ve finished.

Thanks to whoever reviewed the podcast I cohost on iTunes and Stitcher. If you haven’t done so, please do so!

Appreciate the feedback so far.

Found out today that India Today posted my review of Who We Are a few weeks ago! Pretty funny I didn’t see it.

Meanwhile, The Genetic History of Indians: Are We What We Think We Are? It looks like Indian scientists are bending before reality: “How do I say it? See, I am a nationalist,” Rai says over the phone. “People will be upset. But that’s how it is. All the studies are showing that people came here from elsewhere.”

A friend asked again “how do I learn population genetics?” My opinion has not changed in the 15 years since I became interested in the field: read Principles of Population Genetics. If you need a gentle introduction, Population Genetics: A Concise Guide is probably that. But I read Principles of Population Genetics in 2004 without any formal training in the field. It’s not that difficult if you put time into it.

Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia. Gotta do it on flies first!

California, Coffee and Cancer: One of These Doesn’t Belong. The cancer warnings in California are treated as a joke by the population. Unfortunately, there are real carcinogens out there.

Genomic SEM Provides Insights into the Multivariate Genetic Architecture of Complex Traits.

April 23, 2018

The water rises and Canute drowns

Filed under: Genetics — Razib Khan @ 11:49 pm

The Genetic History of Indians: Are We What We Think We Are? The answer is that people of all races have always been what they always were. What we think about what we were…well, that changes.

“I KNOW PEOPLE won’t be happy to hear this,” geneticist Niraj Rai says over the phone from Lucknow. “But I don’t think we can refute it anymore. A migration into [ancient] India did happen.” As head of the Ancient DNA Lab at Lucknow’s Birbal Sahni Institute of Palaeosciences (BSIP), he earlier worked at the CCMB in Hyderabad and has been part of several studies that employed genetics to examine lineages. “It is clear now more than ever before,” he says, “that people from Central Asia came here and mingled with [local residents]. Most of us, in varying degrees, are all descendants of those people.”

Some researchers, even those associated with the current study like Shinde, aren’t quite convinced that an ancient influx of people into the subcontinent from the northwest has finally been established by the latest findings. Shinde does not like the word ‘migration’. “It is better to say movement,” he says, implying a two-way pattern. “Everyone back then was moving to and fro. Some people were moving here and some were moving out. There was contact, yes. There was trade. But local people were involved in the development of several things. So I am not very sure of the interpretation.”

As Rai points out, the analysis of the DNA sample they will present will be of a period before the Steppe people supposedly arrived in India. If R1a is absent in the Indus Valley sample, it suggests that it was brought into South Asia, perhaps by a proto-Indo-European speaking group, from elsewhere. “How do I say it? See, I am a nationalist,” Rai says over the phone. “People will be upset. But that’s how it is. All the studies are showing that people came here from elsewhere.”

I’ve been hearing from Indian journalists that some of these researchers have only “evolved” over the last few months. First, it’s a credit to them if they changed their views in light of the new data. If the above is correct they got usable DNA from one Rakhigarhi sample. I predict it will be like “Indus Periphery”, but with more AASI. It seems rather clear they’re going to submit a preprint within a month or so (that’s the plan, but it’s been the plan for a year!), and the results are being written up now.

Meanwhile, the ancient DNA tsunami is going to come in further waves in the near future. Various groups have huge data sets from Central Eurasia that are going to surface. Unfortunately, samples are going to be thin on the ground from India, but we have enough now that in broad sketches most people are now falling in line with what happened demographically from the northwest. The “AASI” ancestry is deeply rooted in South Asia, and it doesn’t look like there’s much of an impact of this outside of the subcontinent aside from nearby regions.

The real action is now in understanding the cultural and archaeological processes involved in the perturbation in the years after 2000 BCE. I’ve talked to a few of the geneticists working in this area over the past month or so, and they agree.

April 21, 2018

There were possibly late archaic introgression events in Eurasia

Filed under: Human Population Genetics — Razib Khan @ 12:14 am

A few weeks ago I posted on the strong likelihood that there were at least two Denisovan admixture events in Eurasia into modern humans. That’s probably the floor, not the ceiling. We have an Altai Denisovan genome, but the proportion is so low in most of South and Southeast Asia I don’t think we have a good grasp of how that component differs from the Oceanian fraction, which is much higher.

At the AAPA meeting last week I noticed something strange in one of the presentations: introgressed Denisovan variants which were present among East Asian populations, but lacking elsewhere. The fractions were not >50%, but they were >10%. The Denisovan variants were nearly absent outside of this core zone of East Asians.

There are two possible reasons for this distribution. One reason is that Denisovan variants were segregating in East Asians for thousands of years, and a common bottleneck, or, more likely, selection, drove them up in frequency. Another, not exclusive, explanation is that admixture occurred in East Asia relatively late. The Denisovan signature is totally absent in the New World. Either that’s selection or drift eliminating variation, or it’s the fact that this admixture event happened in East Asia less than about 30,000 years ago, when Native American populations’ East Asian-like source population began to diverge from that of East Asians.

One thing that we know from paleontology is that species exist before the remains we find, and persist after the remains we find. It’s quite possible that small relic populations of Denisovans persisted for thousands of years after modern humans came to dominate the East Asian landscape.

April 20, 2018

Is everyone racist and I’m not aware?

Filed under: race,Racism — Razib Khan @ 2:36 pm
Me, proudly culturally appropriating

The expulsion of two young black men from Starbucks is in the news, and people are sharing their experiences. To be honest I’m not surprised that this happened to young black men. What I am surprised by are South Asians who express their own fear of being seen to not buy anything (in part to highlight the privileges that white people have).

I’m a pretty standard looking brown person. Most people realize that I’m South Asian (or “Indian”) when they meet me. Sometimes when I have a very close buzzcut I’m pretty sure people assume I’m a black American (when I got burritos at a Mexican place someone referred to me as the “black guy” in Spanish once when my head was shaved). And a reasonable amount of time people have wondered if I’m a Mexican American, though less and less over the years.

I’ve also spent a fair amount of time in Starbucks. When I’m traveling I always go to a Starbucks because it’s familiar (when I’m not traveling I rarely do anymore). Sometimes I’ll hang out for a while before someone shows up without buying anything. There have even been times where I never bought anything, but just met up with someone. I’ve never felt in any danger of being kicked out.

In fact, in the United States, my main worry about my race is in a very specific context: airports. Since I fly a fair amount I have a routine down. Always shave. Always get there way earlier. Prepare ahead so you don’t seem stressed or uncertain. It’s not super onerous, but I am conscious that I’m probably under more scrutiny.

All that being said I’ve never had a problem in American airports. I have had problems in European airports, after a fashion. An example might be a flight in Germany when security was stopping every young non-white male, whether black, brown or Asian, before we got on the flight (after we’d made it through the checkpoints). And, when I was in Italy in 2010 on a trip the racism was more palpable. At one point I was denied service by a street vendor, and when I was at a bookstore my wife (then girlfriend) told me I was getting suspicious looks, and there was a misunderstanding with one of the clerks (I don’t speak Italian). I definitely felt there was more racism in Europe day to day than I’ve experienced in this country, and I speak as someone who grew up in eastern Oregon.

And yet I’m not here to deny the racism that other South Asian Americans face. Their experience is their experience, and so is mine. What’s the difference here? Are people giving me dirty looks that I don’t even notice? Or are other people hyper-aware of what’s going on around them and perceive slights that might not be intended?

I should add that this tendency is common in my family. We don’t seem to perceive racism around us. Perhaps we’re just oblivious?

What do I think though? Honestly, I think there are different levels and types of racism. If you are South or East Asian you are not going to be under the same scrutiny as a black male. Certainly, there is white privilege in relation to being a brown person. Or at least I’m told there is…I’m not white and can’t pass as a white person, so I can only trust people like Linda Sarsour who are nonwhite by choice that life is a lot easier for whites.

I do a real good SJW impersonation because I have good verbal skills and “present” as nonwhite. But it always seems fake to me. I’ve experienced racism in this country, but it’s not pervasive. I felt under more scrutiny in the Middle East to keep to my lane, and that’s despite my “Muslim name.”

I’m curious as to other peoples’ experiences. The above are just mine.

Good night Avicii, you lived before you died

Filed under: Culture — Razib Khan @ 11:58 am

Like many people I didn’t know much about Avicii when he was alive, though I know much more now that he has died. His stuff played while I was on the computer in the lab, or when I was working out. Avicii for me was the anti-Kardashian, as I had no idea who “he” was (I wasn’t sure of gender, though I assumed he was male) or where he was from. He was just a DJ who made music, and I enjoyed the music. He wasn’t famous to me, but his music was famous.
