Razib Khan One-stop-shopping for all of my content

August 28, 2017

When journalists get out of their depth on genetic genealogy

Filed under: DTC personal genomics,Genealogy,Genetics,Personal genomics — Razib Khan @ 7:39 pm

For some reason The New York Times tasked Gina Kolata to cover genetic genealogy and its societal ramifications, in “With a Simple DNA Test, Family Histories Are Rewritten.” The problem here is that to my knowledge Kolata doesn’t cover this as part of her beat, and so isn’t well equipped to write an accurate and in-depth piece on the topic in relation to the science.

This is a general problem in journalism. I notice it most often when it comes to genetics (a topic I know a lot about for professional reasons) and the Middle East and Islam (topics I know a lot about because I’m interested in them). It’s unfortunate, but it has also made me a lot more skeptical of journalists whose track record I’m unfamiliar with.* To give a contrasting example, Christine Kenneally is a journalist without a background in genetics who nevertheless is immersed in genetic genealogy, so that she could have written this sort of piece without objection from the likes of me (she did write a book on the topic, The Invisible History of the Human Race: How DNA and History Shape Our Identities and Our Futures, which I had a small role in fact-checking).

What are the problems with the Kolata piece? I think the biggest issue is that she didn’t go in to test any particular proposition, and leaned on the wrong person for the science. She quotes Joe Pickrell, who knows this stuff like the back of his hand. But more space is given to Jonathan Marks, an anthropologist who is quite opinionated and voluble, and so probably a “good source” for any journalist.

Marks seems well respected in anthropology from what I can tell, but he’s also the person who put up a picture of L. L. Cavalli-Sforza juxtaposed with a photo of Josef Mengele in the late 1990s during a presentation at Stanford. Perhaps this is why anthropologists respect him, I don’t know, but I do not like him because of his nasty tactics (I wouldn’t be surprised if Marks had power he would make sure people like me were put in political prison camps, his rhetoric is often so unhinged).

Marks’ quotes wouldn’t be much of an issue if Kolata could figure out when he’s making sense, and when he’s just bullshitting. But she can’t. For example:

…“tells me I’m 95 percent Ashkenazi Jewish and 5 percent Korean, is that really different from 100 percent Ashkenazi Jewish and zero percent Korean?”

The precise numbers offered by some testing services raise eyebrows among genetics researchers. “It’s all privatized science, and the algorithms are not generally available for peer review,” Dr. Marks said.

The part about precise numbers is an issue, though a lot less of an issue with high density SNP-chips (the real issue is sensitivity to reference population and other such parameters). But if a modern test says you are 95 percent Ashkenazi Jewish and 5 percent Korean it really is different from 100% Ashkenazi. Someone who comes up as 5% Korean against an Ashkenazi Jewish background is most definitely of some East Asian heritage. In the early 2000s with ancestrally informative markers and microsatellite based tests you’d get somewhat weird results like this, but with the methods used by the major DTC companies (and in academia) today these sorts of proportions are just not reported as false positives. Marks may not know because this isn’t his area, but Pickrell would have. Kolata probably did not think to double-check with him, but that’s because she isn’t able to smell out tendentious assertions. She has no feel for the science, and is flying blind.

Second, Marks notes that the science is privatized, and it isn’t totally open. But it’s just false that the algorithms are not generally available for peer review. Not all the details of the pipeline are downloadable on GitHub, but the core ancestry estimation methods are well known. Eric Durand, who wrote the original 23andMe ancestry composition methodology, presented on it at ASHG 2013. I know because I was there during his session.

You can find a white paper for 23andMe’s method and Ancestry’s. Not everything is as transparent as open science would dictate (though there are scientific papers and publications which also mask or hide elements which make reproducibility difficult), but most geneticists with domain experience can figure out what’s going on and whether it is legitimate. It is. The people who work at the major DTC companies often come out of academia, and are known to academic scientists. This isn’t blackbox voodoo science like “soccer genomics.”

Then Marks says this really weird thing:

“That’s why their ads always specify that this is for recreational purposes only: lawyer-speak for, ‘These results have no scientific standing.’”

Actually, it’s lawyer-speak for “do not sue us, as we aren’t providing you actionable information.” Perhaps I’m ignorant, but lawyers don’t get to define “scientific standing”.

The problem, which is real, is that the public is sometimes not entirely clear on what the science is saying. This is a problem of communication from the companies to the public. I’ve even been in scientific sessions where geneticists who don’t work in population genomics have weak intuition on what the results mean!

Earlier Kolata states:

Scientists simply do not have good data on the genetic characteristics of particular countries in, say, East Africa or East Asia. Even in more developed regions, distinguishing between Polish and, for instance, Russian heritage is inexact at best.

This is not totally true. We have good data now on China and Japan. Korea also has some data. Using haplotype-based methods you can do a lot of interesting things, including distinguish someone who is Polish from Russian. But these methods are computationally expensive and require lots of information on the reference samples (Living DNA does this for British people). The point is that the science is there. Reading this sort of article is just going to confuse people.

On the other hand a lot of Kolata’s piece is more human interest. The standard stuff about finding long lost relatives, or discovering your father isn’t your father. These are fine and not objectionable factually, though they’ve been done extensively before and elsewhere. I actually enjoyed the material in the second half of the piece, which had only a tenuous connection to scientific detail. I just wish these sorts of articles represented the science correctly.

Addendum: Just so you know, three journalists who regularly cover topics I can make strong judgments on, and are always pretty accurate: Carl Zimmer, Antonio Regalado, and Ewen Callaway.

* I don’t follow Kolata very closely, but to be frank I’ve heard from scientist friends long ago that she parachutes into topics, and gets a lot of things wrong. Though I can only speak on this particular piece.

January 23, 2012

Personal genomics and adoption

Filed under: Genealogy,Genetics,Genomics,Personal genomics — Razib Khan @ 9:23 pm

With DNA Testing, Suddenly They Are Family:

Several companies provide tests that can confirm whether adoptees are related to individuals they already know. Others cast a wider net by plugging DNA results into databases that contain tens of thousands of genetic samples, provided mostly by people searching for their ancestral roots. The tests detect genetic markers that reveal whether people share a common ancestor or relative.

Some experts on adoption and genetics have criticized ancestry and genealogy testing companies, saying they are, at times, connecting people whose genetic links are tenuous — in effect stretching the definition of a relative. Nevertheless, the growing popularity of the tests, combined with social media sites that connect people day to day, has given some adoptees a sense of family that feels tangible, intimate and immediate.


I think that these tenuous connections and slivers of information are better than nothing. This isn’t rocket science. And naturally many adopted people also couldn’t care less. This is a deeply personal issue, and the valence is going to be private. I suspect that those of us who aren’t adopted, and take for granted knowledge of our own family background, have a hard time imagining the value which even a 3rd or 4th cousin could give someone.

Additionally, though finding very close relatives is not that common (first cousins, let alone first order relatives), knowledge of more distant relations can still help you triangulate aspects of family history if you begin with nothing. To give a personal example I know someone whose paternal grandparents were immigrants from Germany. The maternal side is much more mixed, and some of the genealogical records hit dead-ends in the mid 19th century in the USA. It turns out that one of the individuals that this person is closest to on 23andMe is an African American (both maternal and paternal lineages are clearly African). What does this mean? The lead hasn’t been followed up, but combining family histories might be very informative in this case.

January 24, 2011

The genomic heritage of French Canadians

Image Credit: Anirudh Koul

One of the great things about the mass personal genomic revolution is that it allows people to have direct access to their own information. This is important for the more than 90% of the human population which has sketchy genealogical records. But even with genealogical records there are often omissions and biases in transmission of information. This is one reason that HAP, Dodecad, and Eurogenes BGA are so interesting: they combine what people already know with scientific genealogy. This intersection can often be very inferentially fruitful.

But what if you had a whole population with rich robust conventional genealogical records? Combined with the power of the new genomics you could really crank up the level of insight. Where to find these records? A reason that Jewish genetics is so useful and interesting is that there is often a relative dearth of records when it comes to the lineages of American Ashkenazi Jews. Many American Jews even today are often sketchy about the region of the “Old Country” from which their forebears arrived. Jews have been interesting from a genetic perspective ...

September 12, 2010

The confusions of genetic relatedness

Filed under: 23andMe,Genealogy,Genetics,Genomics,Personal genomics — Razib Khan @ 11:12 am

Last spring I posted ‘Beyond visualization of data in genetics’ in the hopes that people wouldn’t take PCA too far in assuming that the method was a reflection of reality in a definite fashion. Remember, PCA visualizations are showing you two, and at most three, dimensions in genetic variation within the data set at any given time. The fine print is important; e.g., “PC 1 15%”, “PC 2 4.5%”, etc., which points to the magnitude of the dimensions within the data. You see the largest, and likely historically most significant on a population wide scale, genetic variances, but there’s still a large remainder left over. But when I look at referrals from message boards people obviously aren’t careful with what PCA is telling them.

As an illustration, in the 23andMe user interface you can “compare genes” across people with whom you “share genes.” This comparison operates over ~550,000 single nucleotide polymorphisms out of 3 billion base pairs (you can constrain it to traits, but I’m going to talk about the comparison to the whole data set below). For example, a man of European descent shares 83.2% with his daughter, who is Eurasian (the mother is Burmese, with some recent Indian admixture). Another man of European descent shares 84% with his daughter, whose mother is also European (in fact, both parents are western European). The “gene sharing” of these two men with other people of European descent is in the 74-75% range (for reference, a Chinese person is 71%, and a Nigerian 68.5%). On the PCA plot the European and his Eurasian daughter are very far apart, while the European man and his European daughter cluster together. What you’re seeing on the PCA chart is population level information, not the genetic uniqueness within families and across parents and offspring.
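To make the idea of a pairwise “gene sharing” score concrete, here is a minimal sketch of one common way to score two people across a set of SNPs: average identity by state. This is an assumption for illustration only. The post does not say how 23andMe computes its percentage, and the function name and the eight toy genotypes are invented, so the outputs will not match the figures quoted above.

```python
def fraction_shared(genotypes_a, genotypes_b):
    """Mean per-SNP allele sharing (identity by state) between two people.

    Each genotype is the count (0, 1 or 2) of one chosen allele at a SNP.
    Per-SNP sharing is 1 - |a - b| / 2, so identical genotypes score 1.0
    and opposite homozygotes score 0.0.
    """
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype lists must cover the same SNPs")
    total = sum(1 - abs(a - b) / 2 for a, b in zip(genotypes_a, genotypes_b))
    return total / len(genotypes_a)


# Hypothetical toy data: eight SNPs per person.
father = [0, 1, 2, 1, 0, 2, 1, 1]
daughter = [1, 1, 1, 0, 0, 2, 2, 1]   # never opposite-homozygous to her father
stranger = [2, 0, 0, 1, 2, 0, 1, 2]

print(fraction_shared(father, daughter))  # 0.75
print(fraction_shared(father, stranger))  # 0.375
```

A child always carries one allele from each parent at every SNP, so a parent and child can never be opposite homozygotes. That is why a score like this stays high within a family whatever the population background of the other parent.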

To further explore this issue, I thought it would be interesting to revisit my own genetic data. If you read my previous post, you will know it is not boring. As an ethnic Bengali my ancestry comes from the northeast of the Indian subcontinent, so in addition to the “Asian” fraction which most South Asians have in the 23andMe “ancestry painting” (around 25% on average, with a range from 10-35% probably the extremes within two standard deviations from what I can tell), I likely have some southeast Asian ancestry from Burma. 23andMe has three “reference” populations it uses from the HapMap:

Asian = Chinese/Japanese
European = Northwest European
African = Yoruba

All of us get an ancestry painting which is a combination of these three. Unfortunately unless you’re a relatively straightforward combination of these three groups it isn’t always too informative. So if you’re African American you should be in luck since the two ancestral populations which you derive from are included as reference populations. On the other hand, unadmixed Native Americans tend to be about 25% European and 75% Asian, while unadmixed South Asians are 75% European and 25% Asian. That’s because the allele frequencies in these two populations have some relationship to both the reference groups, even if there hasn’t been any recent admixture (additionally, the painting presumably misses a lot that is distinctive to these groups, though 23andMe has a feature which allows people to explore possible Native American ancestry specifically).
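The way a population with no recent admixture can still come out as a fractional blend of two references can be sketched with a toy least-squares fit on allele frequencies. This is not 23andMe’s ancestry painting algorithm, whose details are not given here. The function name and all the frequencies are made up, and the third population’s frequencies are simply chosen to sit between the two references.

```python
def european_weight(target, ref_european, ref_asian):
    """Least-squares weight w so that target ~ w*ref_european + (1-w)*ref_asian.

    All three arguments are per-SNP allele frequencies in the same SNP order.
    The result is clipped to the interval [0, 1].
    """
    num = sum((t - a) * (e - a) for t, e, a in zip(target, ref_european, ref_asian))
    den = sum((e - a) ** 2 for e, a in zip(ref_european, ref_asian))
    if den == 0:
        raise ValueError("reference populations are identical at these SNPs")
    return min(1.0, max(0.0, num / den))


# Hypothetical allele frequencies at four SNPs.
ref_european = [0.9, 0.1, 0.8, 0.2]
ref_asian = [0.1, 0.9, 0.2, 0.6]
third_population = [0.7, 0.3, 0.65, 0.3]  # not descended from either reference

w = european_weight(third_population, ref_european, ref_asian)
print(round(w, 2), round(1 - w, 2))  # 0.75 0.25
```

The fit only reports where the third population’s frequencies fall relative to the two references. It says nothing about whether any actual mixing took place.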

As I told you before my ancestry is 57% European and 43% Asian. This is a very large Asian fraction for a South Asian, and after comparing notes with other South Asian 23andMe customers I’m pretty sure that my large fraction is due to having admixture from Burmese and/or Tibeto-Burman or Austro-Asiatic “Hill Tribes” to the north, south and east of Bengal. Since my family is from the east of Bengal that is not too surprising.

You know from my previous post that on the PCA plot I am near, but outside, of the main South Asian cluster. But there’s some interesting data from the gene comparison feature too. For reasons of privacy I’m obviously not going to give you names, but I will label people by geographical origin if I know that aspect of the individual’s information. Additionally, below the comparison is mostly to Indians, and so I’m going to substitute names of Indian states for those where I have that level of specificity. I also restandardized the gene sharing value, so that the nearest individual with whom I’m sharing is 0, and the furthest on the plot is 1 (74.5% to 73.04% if you’re curious). To add a wrinkle, I’ve added the % Asian calculated from 23andMe’s ancestry painting on the Y axis. The two images below show the results, the first includes some East Asians and a European, while the second includes only South Asians.
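The restandardization described here amounts to a min-max rescaling with the direction flipped, so that the highest sharing value becomes distance 0 and the lowest becomes distance 1. A minimal sketch follows. Only the two endpoints (74.5% and 73.04%) come from the post. The intermediate values and the function name are hypothetical.

```python
def sharing_to_distance(sharing_values):
    """Rescale sharing percentages so the highest maps to 0 and the lowest to 1."""
    hi, lo = max(sharing_values), min(sharing_values)
    if hi == lo:
        raise ValueError("need at least two distinct sharing values")
    return [(hi - v) / (hi - lo) for v in sharing_values]


# Endpoints from the post; the two middle values are invented for illustration.
sharing = [74.5, 74.0, 73.77, 73.04]
distances = sharing_to_distance(sharing)
print([round(d, 2) for d in distances])
```

Because the rescaling is linear, ratios between distances on the plot mean the same thing as ratios between differences in the raw percentages.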

The first image is of more interest. Two points:

1 – Unlike most South Asians I have greater gene sharing identity with East Asians than with Europeans. The South Asian to whom I am closest does not exhibit my own pattern, as they are closer to some Europeans than they are to some Chinese. In contrast, I not only unequivocally share more genes with East Asians than Europeans, but I share more genes with some East Asians than I do with the individual from Iran, one South Asian from the northwest of the subcontinent, and another from southern India. This last pattern is very peculiar from what I’ve been told (the other Bangladeshi has the same tendency, though not to the same extent).

2 – There is a woman from Burma with whom I am sharing genes. Her father, who died when she was young, had Indian ancestry, and reputedly spoke Tamil. She is ~20% European, which would make her father ~40% European. I have not seen a South Indian who is less than 65% European, so I believe that he had native Burmese admixture. If his mother was Burmese that would make his father ~80% European, which I have seen in a few South Indians, though their usual range seems to be 65-75%. Note that I am closer to her than I am to most South Asians. In contrast, the Bangladeshi with whom I am sharing genes, and who has the second highest percentage of Asian in their ancestry, is about as far from this woman as he is from the Punjabis in terms of distance (in contrast, the Punjabis are about 2.5 times further than she is from my own genetic state).
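The back-of-the-envelope arithmetic in the second point can be written out explicitly. The sketch assumes that a child’s painting fraction is the average of the two parents’ fractions, and that the Burmese mother and the hypothesized Burmese grandmother each score 0% European. Both assumptions are implied by the numbers above rather than stated, and the function name is invented.

```python
def other_parent_fraction(child_fraction, known_parent_fraction):
    """Solve child = (known + unknown) / 2 for the unknown parent's fraction."""
    unknown = 2 * child_fraction - known_parent_fraction
    if not 0.0 <= unknown <= 1.0:
        raise ValueError("inputs are inconsistent with simple averaging")
    return unknown


daughter = 0.20                                   # ~20% European, from the post
father = other_parent_fraction(daughter, 0.0)     # mother assumed 0% European
grandfather = other_parent_fraction(father, 0.0)  # grandmother assumed 0% European

print(round(father, 2), round(grandfather, 2))  # 0.4 0.8
```

Each generation back with a 0% partner doubles the inferred fraction, which is how ~20% in the daughter leads to ~80% in her paternal grandfather.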

If I did the same plot of % Asian with gene sharing for the European man and his Eurasian daughter I would see a pattern whereby for most of the data there would be a noticeable linear pattern, the more Asian, the less gene sharing. The exception would be his daughter, who would be greatly Asian, but would be the closest by this genetic distance measure. Similarly, the Burmese woman with some Indian admixture is an outlier on my plot. The South Asians follow a southeast-to-northwest range of distance from me, with a rough, but not perfect, correspondence with Asian ancestry. Among the South Asians the individual from Bihar is an exception, just as the Burmese woman is. Why? From previous comments I’ve made I have indicated that there is a high probability of recent Burmese ancestry on my paternal lineage (specifically, my paternal grandfather, whose physical appearance is always described as atypical for a Bengali. My paternal grandmother was from a Hindu family which converted, and she looked stereotypically Bengali). Additionally, I know my mother’s maternal grandfather is from the Indian state of Uttar Pradesh, specifically, the region of Delhi. But I also know that before they were Muslim my maternal grandfather’s family were of the Hindu Kayastha caste. The individual from Bihar is a Kayastha, and for those of you who do not know, Bihar is the state just to the west of Bengal. I do not know if the Kayasthas share any deep genetic affinity or not, but I recall that Reich et al. observed a high degree of genetic evidence of endogamy in South Asia. So, just as I believe that I share Burmese-specific genetic variants with the woman of predominant Burmese origin which are not showing up in the simple ancestry estimates based on the global reference populations, I may also share Kayastha-specific variants which result in my genetic closeness to the Bihari individual. But my confidence in the latter conjecture is far weaker than in the former case.

In reviewing all I’ve said so far I suppose the moral of the story is not to trust too deeply in one set of data visualizations or summary statistics. Granted, some people have axes to grind and can find what they want in the science; my posts on Jewish genetics indicate that very strongly. But if you’re genuinely interested in patterns of variation, and your own place within the broader framework, you need to open different windows on the same data to get a truly fully-fleshed out understanding of the nature of things. If you are of an understudied population, and of somewhat mixed background, as I am, tread lightly and carefully. If you are of a well studied and characterized population, then learning you are 100% European is basically worthless (though some of the more detailed PCAs can tell you some things).
