Razib Khan One-stop-shopping for all of my content

January 23, 2012

Personal genomics and adoption

Filed under: Genealogy,Genetics,Genomics,Personal genomics — Razib Khan @ 9:23 pm

With DNA Testing, Suddenly They Are Family:

Several companies provide tests that can confirm whether adoptees are related to individuals they already know. Others cast a wider net by plugging DNA results into databases that contain tens of thousands of genetic samples, provided mostly by people searching for their ancestral roots. The tests detect genetic markers that reveal whether people share a common ancestor or relative.

Some experts on adoption and genetics have criticized ancestry and genealogy testing companies, saying they are, at times, connecting people whose genetic links are tenuous — in effect stretching the definition of a relative. Nevertheless, the growing popularity of the tests, combined with social media sites that connect people day to day, has given some adoptees a sense of family that feels tangible, intimate and immediate.


I think that these tenuous connections and slivers of information are better than nothing. This isn’t rocket science. And naturally many adopted people couldn’t care less. This is a deeply personal issue, and the valence is going to be private. I suspect that those of us who aren’t adopted, and who take knowledge of our own family background for granted, have a hard time imagining the value which even a 3rd or 4th cousin could give someone.

Additionally, though finding very close relatives is not that common (first cousins, let alone first order relatives), knowledge of more distant relations can still help you triangulate aspects of family history if you begin with nothing. To give a personal example I know someone whose paternal grandparents were immigrants from Germany. The maternal side is much more mixed, and some of the genealogical records hit dead-ends in the mid 19th century in the USA. It turns out that one of the individuals that this person is closest to on 23andMe is an African American (both maternal and paternal lineages are clearly African). What does this mean? The lead hasn’t been followed up, but combining family histories might be very informative in this case.

January 24, 2011

The genomic heritage of French Canadians

Image Credit: Anirudh Koul

One of the great things about the mass personal genomic revolution is that it allows people to have direct access to their own information. This is important for the more than 90% of the human population which has sketchy genealogical records. But even with genealogical records there are often omissions and biases in transmission of information. This is one reason that HAP, Dodecad, and Eurogenes BGA are so interesting: they combine what people already know with scientific genealogy. This intersection can often be very inferentially fruitful.

But what about if you had a whole population with rich robust conventional genealogical records? Combined with the power of the new genomics you could really crank up the level of insight. Where to find these records? A reason that Jewish genetics is so useful and interesting is that there is often a relative dearth of records when it comes to the lineages of American Ashkenazi Jews. Many American Jews even today are often sketchy about the region of the “Old Country” from which their forebears arrived. Jews have been interesting from a genetic perspective ...

September 12, 2010

The confusions of genetic relatedness

Filed under: 23andMe,Genealogy,Genetics,Genomics,Personal genomics — Razib Khan @ 11:12 am

Last spring I posted ‘Beyond visualization of data in genetics’ in the hopes that people wouldn’t take PCA too far in assuming that the method was a reflection of reality in a definite fashion. Remember, PCA visualizations are showing you two, and at most three, dimensions in genetic variation within the data set at any given time. The fine print is important; e.g., “PC 1 15%”, “PC 2 4.5%”, etc., which points to the magnitude of the dimensions within the data. You see the largest, and likely historically most significant on a population wide scale, genetic variances, but there’s still a large remainder left over. But when I look at referrals from message boards people obviously aren’t careful with what PCA is telling them.
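To make the fine print concrete, here is a minimal sketch of where labels like “PC 1 15%” come from: each component’s share of the total variance in the data set. The genotype matrix below is random toy data (an assumption for illustration, not anyone’s real results), and the function name is hypothetical. With the example labels above, 15% + 4.5% would leave roughly 80.5% of the variance off a two-dimensional plot.

```python
import numpy as np

def explained_variance_ratio(genotypes):
    """Return the fraction of total variance captured by each principal component."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                  # center each SNP column
    s = np.linalg.svd(X, compute_uv=False)  # singular values, largest first
    variances = s ** 2                      # proportional to per-component variance
    return variances / variances.sum()

# Hypothetical toy data: 40 individuals x 200 SNPs, coded 0/1/2 copies of an allele.
rng = np.random.default_rng(0)
toy = rng.integers(0, 3, size=(40, 200))

ratios = explained_variance_ratio(toy)
remainder = 1.0 - ratios[:2].sum()          # variance a 2-D PCA plot does not show
print(f"PC 1 {ratios[0]:.1%}, PC 2 {ratios[1]:.1%}, remainder {remainder:.1%}")
```

The remainder is the part of the variation that a two-axis plot never displays, which is why the percentages on the axes matter.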

As an illustration, in the 23andMe user interface you can “compare genes” across people with whom you “share genes.” This comparison operates over ~550,000 single nucleotide polymorphisms out of 3 billion base pairs (you can constrain it to traits, but I’m going to talk about the comparison to the whole data set below). For example, a man of European descent shares 83.2% with his daughter, who is Eurasian (the mother is Burmese, with some recent Indian admixture). Another man of European descent shares 84% with his daughter, whose mother is also European (in fact, both parents are western European). The “gene sharing” of these two men with other people of European descent is in the 74-75% range (for reference, a Chinese person is 71%, and a Nigerian 68.5%). On the PCA plot the European and his Eurasian daughter are very far apart, while the European man and his European daughter cluster together. What you’re seeing on the PCA chart is population level information, not the genetic uniqueness within families and across parents and offspring.

To further explore this issue, I thought it would be interesting to revisit my own genetic data. If you read my previous post, you will know it is not boring. As an ethnic Bengali my ancestry comes from the northeast of the Indian subcontinent, so in addition to the “Asian” fraction which most South Asians have in the 23andMe “ancestry painting” (around 25% on average, with a range from 10-35% probably the extremes within two standard deviations from what I can tell), I likely have some southeast Asian ancestry from Burma. 23andMe has three “reference” populations it uses from the HapMap:

Asian = Chinese/Japanese
European = Northwest European
African = Yoruba

All of us get an ancestry painting which is a combination of these three. Unfortunately unless you’re a relatively straightforward combination of these three groups it isn’t always too informative. So if you’re African American you should be in luck since the two ancestral populations which you derive from are included as reference populations. On the other hand, unadmixed Native Americans tend to be about 25% European and 75% Asian, while unadmixed South Asians are 75% European and 25% Asian. That’s because the allele frequencies in these two populations have some relationship to both the reference groups, even if there hasn’t been any recent admixture (additionally, the painting presumably misses a lot that is distinctive to these groups, though 23andMe has a feature which allows people to explore possible Native American ancestry specifically).

As I told you before my ancestry is 57% European and 43% Asian. This is a very large Asian fraction for a South Asian, and after comparing notes with other South Asian 23andMe customers I’m pretty sure that my large fraction is due to having admixture from Burmese and/or Tibeto-Burman or Austro-Asiatic “Hill Tribes” to the north, south and east of Bengal. Since my family is from the east of Bengal that is not too surprising.

You know from my previous post that on the PCA plot I am near, but outside of, the main South Asian cluster. But there’s some interesting data from the gene comparison feature too. For reasons of privacy I’m obviously not going to give you names, but I will label people by geographical origin if I know that aspect of the individual’s information. Additionally, the comparison below is mostly to Indians, and so I’m going to substitute names of Indian states for those where I have that level of specificity. I also restandardized the gene sharing value, so that the nearest individual with whom I’m sharing is 0, and the furthest on the plot is 1 (74.5% to 73.04% if you’re curious). To add a wrinkle, I’ve added the % Asian calculated from 23andMe’s ancestry painting on the Y axis. The two images below show the results: the first includes some East Asians and a European, while the second includes only South Asians.
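The restandardization described above amounts to a reversed min-max rescaling. A minimal sketch, assuming a plain list of sharing percentages (the function name and the middle value are hypothetical; only the 74.5% and 73.04% endpoints come from the post):

```python
def rescale_sharing(percentages):
    """Map gene sharing to a distance: highest sharing -> 0, lowest sharing -> 1."""
    highest, lowest = max(percentages), min(percentages)
    if highest == lowest:
        raise ValueError("need at least two distinct sharing values")
    return [(highest - p) / (highest - lowest) for p in percentages]

# Endpoints from the post; 73.77 is an invented midpoint for illustration.
distances = rescale_sharing([74.5, 73.77, 73.04])
print(distances)
```

Note how narrow the raw range is: the whole 0-to-1 axis covers less than 1.5 percentage points of sharing, so small differences in the raw figures look large once rescaled.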

The first image is of more interest. Two points:

1 – Unlike most South Asians I have greater gene sharing identity with East Asians than with Europeans. The South Asian to whom I am closest does not exhibit my own pattern, as they are closer to some Europeans than they are to some Chinese. In contrast, I not only unequivocally share more genes with East Asians than Europeans, but I share more genes with some East Asians than I do with the individual from Iran, one South Asian from the northwest of the subcontinent, and another from southern India. This last pattern is very peculiar from what I’ve been told (the other Bangladeshi has the same tendency, though not to the same extent).

2 – There is a woman from Burma with whom I am sharing genes. Her father, who died when she was young, had Indian ancestry, and reputedly spoke Tamil. She is ~20% European, which would make her father ~40% European. I have not seen a South Indian who is less than 65% European, so I believe that he had native Burmese admixture. If his mother was Burmese that would make his father ~80% European, which I have seen in a few South Indians, though their usual range seems to be 65-75%. Note that I am closer to her than I am to most South Asians. In contrast, the Bangladeshi with whom I am sharing genes, and who has the second highest percentage of Asian in their ancestry, is about as far from this woman as he is from the Punjabis in terms of distance (whereas the Punjabis are about 2.5 times further than she is from my own genetic state).
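The back-of-the-envelope reasoning in point 2 can be sketched as follows. It assumes a child’s ancestry-painting fraction is the average of the two parents’ fractions, and that a Burmese parent scores roughly 0% European in the painting; both are simplifying assumptions, and the function name is hypothetical.

```python
def other_parent_fraction(child, known_parent):
    """If child = (known_parent + other_parent) / 2, solve for other_parent."""
    return 2 * child - known_parent

BURMESE_EUROPEAN = 0.0  # assumption: a Burmese parent is ~0% "European" in the painting

# The woman is ~20% European and her mother is Burmese, so her father is ~40%.
father = other_parent_fraction(0.20, BURMESE_EUROPEAN)

# If the father's own mother was Burmese, his father would be ~80%.
paternal_grandfather = other_parent_fraction(father, BURMESE_EUROPEAN)

print(f"father ~{father:.0%}, paternal grandfather ~{paternal_grandfather:.0%}")
```

Each generation back doubles the fraction, which is why ~40% falls well below the South Indian range while ~80% lands close to the top of it.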

If I did the same plot of % Asian with gene sharing for the European man and his Eurasian daughter I would see a pattern whereby for most of the data there would be a noticeable linear pattern: the more Asian, the less gene sharing. The exception would be his daughter, who would be greatly Asian, but would be the closest by this genetic distance measure. Similarly, the Burmese woman with some Indian admixture is an outlier on my plot. The South Asians follow a southeast-to-northwest range of distance from me, with a rough, but not perfect, correspondence with Asian ancestry. Among the South Asians the individual from Bihar is an exception, just as the Burmese woman is. Why? From previous comments I’ve made I have indicated that there is a high probability of recent Burmese ancestry on my paternal lineage (specifically, my paternal grandfather, whose physical appearance is always described as atypical for a Bengali. My paternal grandmother was from a Hindu family which converted, and she looked stereotypically Bengali). Additionally, I know my mother’s maternal grandfather is from the Indian state of Uttar Pradesh, specifically, the region of Delhi. But I also know that before they were Muslim my maternal grandfather’s family were of the Hindu Kayastha caste. The individual from Bihar is a Kayastha, and for those of you who do not know, Bihar is the state just to the west of Bengal. I do not know if the Kayasthas share any deep genetic affinity or not, but I recall that Reich et al. observed strong genetic evidence of endogamy in South Asia. So, just as I believe that I share Burmese-specific genetic variants with the woman of predominantly Burmese origin, variants which are not showing up in the simple ancestry estimates based on the global reference populations, I may also share Kayastha-specific variants which result in my genetic closeness to the Bihari individual. But my confidence in the latter conjecture is far weaker than in the former case.

In reviewing all I’ve said so far I suppose the moral of the story is not to trust too deeply in one set of data visualizations or summary statistics. Granted, some people have axes to grind and can find what they want in the science; my posts on Jewish genetics indicate that very strongly. But if you’re genuinely interested in patterns of variation, and your own place within the broader framework, you need to open different windows on the same data to get a fully fleshed-out understanding of the nature of things. If you are of an understudied population, and of somewhat mixed background, as I am, tread lightly and carefully. If you are of a well studied and characterized population, then learning you are 100% European is basically worthless (though some of the more detailed PCAs can tell you some things).
