Razib Khan One-stop-shopping for all of my content

December 12, 2012

A lighter shade of brown: the Dan MacArthur chronicles, not a Romani

Filed under: Anthropology,Daniel MacArthur,Human Genetics,Human Genomics — Razib Khan @ 9:25 am

Pakistani honor guard

A few days ago I suggested that Dr. Daniel MacArthur might have South Asian ancestry. When confronted with a surprise, the best option is to stick with your prior assumption unless the surprise is powerful enough to make you “update” your model. After a few days of further analysis I will update: I do think Dan MacArthur has South Asian ancestry. Dienekes dug further and noticed hallmarks of “Ancestral South Indian” ancestry along the first 2/3 or so of chromosome 10. Remember, though, that this genomic region is only half South Asian: one of Dan’s two copies of chromosome 10 carries the South Asian segment, and the other copy is European.

In any case, one question some people brought up: perhaps MacArthur has Romani heritage? I’m skeptical of this, partly because:

1) There weren’t that many Romani in Britain in the 19th century

2) The British Romani are already very highly admixed

Another friend, who is a population genomicist himself, was skeptical that such a long segment could have survived the generations without being broken up by recombination. My only moderately informed answer is this: we’d only notice the long segments, because a very small region of ‘exotic’ ancestry embedded within the dominant ancestral component probably would not show up on some of these tests (or we’d assume it was noise). Dan has another segment of South Asian ancestry, but it is much smaller. There may be other such regions that we could find with better reference populations.
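To put rough numbers on both the objection and the answer, here is a minimal sketch under the textbook simplification that crossovers fall as a Poisson process at about one per Morgan per meiosis, with no interference and chromosome ends ignored. Under that assumption a segment inherited from one ancestor g meioses back has a length that is roughly exponential with mean 100/g centimorgans. The function names are hypothetical, and this is an illustration of the general idea, not the analysis behind the post.

```python
import math


def mean_tract_cm(generations: int) -> float:
    """Mean length (cM) of a randomly chosen segment inherited from one ancestor.

    Simplifying assumptions: crossovers occur as a Poisson process at about one per
    Morgan per meiosis, with no interference, and chromosome ends are ignored. After
    `generations` meioses the segment length is then roughly exponential with rate
    `generations` per Morgan, i.e. mean 100 / generations cM.
    """
    return 100.0 / generations


def prob_tract_at_least(length_cm: float, generations: int) -> float:
    """P(segment length >= length_cm) under the same exponential approximation."""
    return math.exp(-generations * length_cm / 100.0)


if __name__ == "__main__":
    g = 6  # a great-great-great-great-grandparent is 6 meioses back
    print(f"mean segment length: {mean_tract_cm(g):.1f} cM")
    for length in (5, 20, 50, 80):
        print(f"P(length >= {length} cM) = {prob_tract_at_least(length, g):.4f}")
```

For g = 6 the mean comes out near 17 cM, and under this model fewer than 1% of segments exceed 80 cM. Relating that to an 80 Mb region needs a cM-per-Mb conversion (roughly 1 cM per Mb is a common rule of thumb, but it varies along chromosomes), so treat it as an order-of-magnitude check. It also shows why most segments from such an ancestor would be short and easy to miss, so the ones we notice are the long tail.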

Here’s what I tentatively want to do with Dan’s data now. First, take the 80 Mb or so that has South Asian ancestry and phase it. That way I’d have a South Asian chromosome and a European one, and we could look for matches to the South Asian one alone. But being busy, I didn’t have time to do this. What I did have time to do was narrow the chromosomal region under consideration and then run an identity-by-state (IBS) distance analysis against a private data set I have. This is a crude, but not always uninformative, analysis. Looking at the relationships, I can now conclude that Dan MacArthur probably does not have Romani ancestry. Why? Because the Romani are of Northwest Indian heritage, and MacArthur’s match pattern using the diploid genotype (South Asian + European) does not match what I would expect to emerge from such a combination.
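For readers unfamiliar with IBS distance, a minimal sketch of the usual allele-sharing version follows, assuming genotypes coded as 0/1/2 copies of a reference allele. The post does not say which software or exact formula was used, so these function names are hypothetical. The rescaling in `rank_matches` is also an assumption: the “Standardized distance” column in the table looks roughly consistent with each distance divided by the largest one (the Yoruba at the bottom) times 100, but that is my reading of the numbers, not something stated in the post.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of a reference allele; None = missing


def ibs_distance(g1: Sequence[Genotype], g2: Sequence[Genotype]) -> float:
    """IBS (allele-sharing) distance between two individuals.

    At each SNP typed in both, the pair shares 2 - |a - b| alleles identical by
    state, so the distance contribution is |a - b| / 2 (0 = same genotype,
    1 = opposite homozygotes). The result is the mean over shared SNPs.
    """
    diffs = [abs(a - b) / 2 for a, b in zip(g1, g2) if a is not None and b is not None]
    if not diffs:
        raise ValueError("no SNPs typed in both individuals")
    return sum(diffs) / len(diffs)


def rank_matches(
    query: Sequence[Genotype], panel: Sequence[Tuple[str, Sequence[Genotype]]]
) -> List[Tuple[str, float, float]]:
    """Sort (label, genotypes) panel entries by IBS distance to `query`.

    Returns (label, distance, standardized) tuples, where `standardized` rescales
    distances so that the most distant panel sample scores 100 (an assumed reading
    of the table's 'Standardized distance' column).
    """
    if not panel:
        raise ValueError("empty reference panel")
    scored = sorted(
        ((label, ibs_distance(query, geno)) for label, geno in panel),
        key=lambda item: item[1],
    )
    max_d = scored[-1][1]
    scale = 100.0 / max_d if max_d > 0 else 0.0
    return [(label, d, d * scale) for label, d in scored]
```

Because Dan’s diploid genotype mixes a South Asian and a European copy in this region, a distance like this can only place him between reference groups, which is why the post treats it as crude.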

The full table is below. To me, the fact that so many of his top matches are Northwest Indian populations (e.g., Brahui and Burusho) is evidence that his South Asian ancestry was not Northwest Indian. If it were, the mix of a Northwest Indian chromosome and a European one would sit even closer to Europeans, and he would be matching Utah whites (CEU samples) more often. Rather, someone with a mix of more conventional South Asian ancestry and European ancestry often resembles some of the less South Asian populations of South Asia (e.g., Brahui) in these crude measures. In fact, one of the closest matches to Dan’s IBS profile is my own mother. She is a rather vanilla ethnic Bengali, so I think there is a strong chance that his Indian ancestry is similar. But this weak genetic evidence isn’t really my primary reason: the British East India Company operated out of Bengal for much of its history, and there are simply a lot of Bengalis.

There’s a lot more that can be done here. Since I don’t have time, here’s the pedigree file if anyone wants to play with the data (Dan is DGM001).
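If you want to pull Dan’s row out of the pedigree file, here is a minimal sketch assuming it is a standard PLINK-style .ped file (I have not confirmed the exact layout; the function name is hypothetical). In that format the first six whitespace-separated columns are family ID, individual ID, paternal ID, maternal ID, sex and phenotype, followed by two allele columns per SNP, with SNP names and positions in a companion .map file.

```python
from typing import Iterable, List, Tuple


def read_ped_sample(lines: Iterable[str], sample_id: str) -> List[Tuple[str, str]]:
    """Return one individual's genotypes from PLINK .ped-formatted lines.

    Assumes the standard whitespace-delimited .ped layout: family ID, individual ID,
    paternal ID, maternal ID, sex, phenotype, then two allele columns per SNP
    ('0' = missing allele). SNP names and positions live in the companion .map file.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[1] != sample_id:
            continue
        alleles = fields[6:]
        if len(alleles) % 2:
            raise ValueError(f"odd number of allele columns for {sample_id}")
        # Pair adjacent allele columns into one genotype per SNP, e.g. ('A', 'G').
        return list(zip(alleles[0::2], alleles[1::2]))
    raise KeyError(sample_id)
```

Usage would look like `read_ped_sample(open(path), "DGM001")`, where `path` is wherever you saved the file.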

Population | Genetic distance from Dan | Standardized distance
Brahui | 0.253 | 81.268
Burusho | 0.257 | 82.736
Razib’s Mother | 0.258 | 82.783
CEU | 0.258 | 82.993
Burusho | 0.258 | 83.024
CEU | 0.26 | 83.547
Sakilli | 0.26 | 83.555
Brahui | 0.261 | 83.831
Brahui | 0.261 | 83.857
GIH | 0.261 | 83.955
CEU | 0.261 | 83.972
CEU | 0.261 | 83.985
CEU | 0.262 | 84.043
North Kannadi | 0.262 | 84.169
CEU | 0.262 | 84.207
CEU | 0.262 | 84.318
CEU | 0.262 | 84.33
CEU | 0.263 | 84.391
Paniya | 0.263 | 84.408
CEU | 0.263 | 84.437
CEU | 0.263 | 84.445
CEU | 0.263 | 84.488
CEU | 0.263 | 84.606
CEU | 0.263 | 84.609
CEU | 0.264 | 84.691
Brahui | 0.264 | 84.709
CEU | 0.264 | 84.752
CEU | 0.264 | 84.764
Brahui | 0.264 | 84.822
GIH | 0.264 | 84.826
Burusho | 0.264 | 84.841
CEU | 0.264 | 84.898
CEU | 0.264 | 84.975
North Kannadi | 0.264 | 84.992
CEU | 0.265 | 85.087
Paniya | 0.265 | 85.212
CEU | 0.265 | 85.226
CEU | 0.265 | 85.25
CEU | 0.265 | 85.25
CEU | 0.265 | 85.278
CEU | 0.265 | 85.299
North Kannadi | 0.265 | 85.3
Burusho | 0.265 | 85.309
Burusho | 0.266 | 85.328
CEU | 0.266 | 85.363
CEU | 0.266 | 85.409
North Kannadi | 0.266 | 85.412
CEU | 0.266 | 85.436
Burusho | 0.266 | 85.446
Bene Israel | 0.266 | 85.508
CEU | 0.266 | 85.521
GIH | 0.266 | 85.618
GIH | 0.267 | 85.661
CEU | 0.267 | 85.696
CEU | 0.267 | 85.722
CEU | 0.267 | 85.732
Brahui | 0.267 | 85.777
GIH | 0.267 | 85.793
CEU | 0.267 | 85.799
CEU | 0.267 | 85.816
Cochin Jews | 0.267 | 85.85
CEU | 0.267 | 85.943
Brahui | 0.268 | 85.996
CEU | 0.268 | 86.005
Cochin Jews | 0.268 | 86.011
CEU | 0.268 | 86.08
CEU | 0.268 | 86.115
CEU | 0.268 | 86.18
GIH | 0.268 | 86.229
Cochin Jews | 0.268 | 86.234
CEU | 0.268 | 86.244
Burusho | 0.268 | 86.265
CEU | 0.268 | 86.277
CEU | 0.268 | 86.278
CEU | 0.269 | 86.288
CEU | 0.269 | 86.291
CEU | 0.269 | 86.318
CEU | 0.269 | 86.325
CEU | 0.269 | 86.326
GIH | 0.269 | 86.327
CEU | 0.269 | 86.329
CEU | 0.269 | 86.354
CEU | 0.269 | 86.387
CEU | 0.269 | 86.463
CEU | 0.269 | 86.515
CEU | 0.269 | 86.517
CEU | 0.269 | 86.55
CEU | 0.27 | 86.609
Paniya | 0.27 | 86.682
CEU | 0.27 | 86.687
CEU | 0.27 | 86.696
CEU | 0.27 | 86.717
CEU | 0.27 | 86.733
Sakilli | 0.27 | 86.74
CEU | 0.27 | 86.866
Malayan | 0.27 | 86.879
North Kannadi | 0.27 | 86.883
CEU | 0.271 | 86.937
Brahui | 0.271 | 86.952
Burusho | 0.271 | 86.956
CEU | 0.271 | 86.957
CEU | 0.271 | 86.977
North Kannadi | 0.271 | 86.995
GIH | 0.271 | 87.018
CEU | 0.271 | 87.042
CEU | 0.271 | 87.066
CEU | 0.271 | 87.07
Brahui | 0.271 | 87.09
Bene Israel | 0.271 | 87.094
Sakilli | 0.271 | 87.141
CEU | 0.271 | 87.2
CEU | 0.271 | 87.24
North Kannadi | 0.272 | 87.253
CEU | 0.272 | 87.297
Burusho | 0.272 | 87.307
CEU | 0.272 | 87.327
GIH | 0.272 | 87.353
CEU | 0.272 | 87.355
Cochin Jews | 0.272 | 87.381
CEU | 0.272 | 87.384
CEU | 0.272 | 87.5
CEU | 0.272 | 87.535
CEU | 0.273 | 87.594
Malayan | 0.273 | 87.676
CEU | 0.273 | 87.702
CEU | 0.273 | 87.741
Burusho | 0.273 | 87.806
CEU | 0.273 | 87.846
Cambodians | 0.274 | 87.932
North Kannadi | 0.274 | 87.951
CEU | 0.274 | 87.951
Burusho | 0.274 | 88.03
CEU | 0.274 | 88.047
CEU | 0.274 | 88.081
CEU | 0.274 | 88.089
CEU | 0.274 | 88.101
CEU | 0.274 | 88.179
CEU | 0.274 | 88.19
North Kannadi | 0.275 | 88.243
CEU | 0.275 | 88.32
GIH | 0.275 | 88.325
CEU | 0.275 | 88.349
Brahui | 0.275 | 88.393
CEU | 0.275 | 88.402
CEU | 0.275 | 88.457
Bene Israel | 0.276 | 88.552
CEU | 0.276 | 88.577
CEU | 0.276 | 88.603
CEU | 0.276 | 88.647
CEU | 0.276 | 88.7
CEU | 0.276 | 88.729
CEU | 0.276 | 88.814
CEU | 0.276 | 88.85
Brahui | 0.276 | 88.855
CEU | 0.277 | 88.923
GIH | 0.277 | 88.99
Paniya | 0.277 | 89.082
CEU | 0.277 | 89.118
CEU | 0.277 | 89.15
CEU | 0.277 | 89.151
CEU | 0.277 | 89.17
CEU | 0.278 | 89.184
Cambodians | 0.278 | 89.208
Cambodians | 0.278 | 89.233
Cambodians | 0.278 | 89.383
CEU | 0.278 | 89.45
CEU | 0.278 | 89.493
Cambodians | 0.279 | 89.522
CEU | 0.279 | 89.595
CEU | 0.279 | 89.679
CEU | 0.279 | 89.753
CEU | 0.279 | 89.762
CEU | 0.279 | 89.807
Cambodians | 0.28 | 89.942
GIH | 0.28 | 90.085
CEU | 0.281 | 90.178
Brahui | 0.281 | 90.364
Cambodians | 0.282 | 90.543
Cambodians | 0.282 | 90.559
Cambodians | 0.282 | 90.77
Cambodians | 0.283 | 90.898
CEU | 0.283 | 90.956
CEU | 0.284 | 91.316
CHD | 0.289 | 92.952
Sakilli | 0.29 | 93.103
Bene Israel | 0.29 | 93.122
CHD | 0.291 | 93.619
CHD | 0.291 | 93.663
CHD | 0.293 | 94.125
CHD | 0.293 | 94.248
CHD | 0.294 | 94.451
CHD | 0.294 | 94.629
CHD | 0.296 | 94.965
CHD | 0.296 | 95.279
Yorubas | 0.297 | 95.298
CHD | 0.297 | 95.368
CHD | 0.297 | 95.438
CHD | 0.297 | 95.441
Yorubas | 0.297 | 95.567
CHD | 0.298 | 95.678
CHD | 0.298 | 95.828
CHD | 0.299 | 96.032
CHD | 0.299 | 96.127
CHD | 0.3 | 96.349
CHD | 0.3 | 96.403
CHD | 0.3 | 96.443
CHD | 0.3 | 96.508
CHD | 0.3 | 96.523
CHD | 0.3 | 96.533
CHD | 0.301 | 96.575
CHD | 0.301 | 96.598
CHD | 0.301 | 96.624
CHD | 0.301 | 96.625
CHD | 0.301 | 96.738
CHD | 0.301 | 96.758
CHD | 0.301 | 96.869
Yorubas | 0.302 | 97.106
CHD | 0.303 | 97.37
CHD | 0.303 | 97.41
Yorubas | 0.304 | 97.681
CHD | 0.304 | 97.713
CHD | 0.304 | 97.747
Yorubas | 0.304 | 97.829
CHD | 0.304 | 97.838
CHD | 0.305 | 98.106
CHD | 0.306 | 98.309
Yorubas | 0.307 | 98.499
CHD | 0.307 | 98.546
CHD | 0.307 | 98.547
CHD | 0.307 | 98.606
CHD | 0.307 | 98.764
CHD | 0.307 | 98.78
CHD | 0.307 | 98.803
Yorubas | 0.308 | 98.947
Yorubas | 0.308 | 99.03
Yorubas | 0.309 | 99.411
Yorubas | 0.309 | 99.417
CHD | 0.309 | 99.452
CHD | 0.31 | 99.624
Yorubas | 0.311 | 100

December 10, 2012

Is Daniel MacArthur ‘desi’?

My initial inclination in this post was to discuss a recent ordering snafu that left many of my friends quite peeved at 23andMe. But browsing through their new ‘ancestry composition’ feature, I thought I had to discuss it first, because of some nerd-level intrigue. Though I agree with many of Dienekes’ concerns about this new feature, I have to admit that at least this method doesn’t give out positively misleading results. For example, I had complained earlier that ‘ancestry painting’ gave literally crazy results when they weren’t trivial. It said I was ~60 percent European, which makes some coherent sense given their non-optimal reference population set, but then it stated that my daughter was >90 percent European. Since 23andMe did confirm that she is 50% identical by descent with me, these results didn’t make sense; some readers suggested that their algorithms had a strong bias toward assigning ambiguous genomic segments to ‘European’ heritage (this was a problem for East Africans too).
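The inconsistency can be made explicit with a back-of-the-envelope bound that uses only the numbers above. Half of my daughter’s genome comes from me, and in the most generous case her other parent contributes a 100% ‘European’ half. This ignores the random fluctuation in which half of my genome she actually received, so it is a rough expectation, not an exact bound:

```latex
f_{\text{daughter}} \approx \tfrac{1}{2} f_{\text{father}} + \tfrac{1}{2} f_{\text{mother}}
  \le \tfrac{1}{2}(0.60) + \tfrac{1}{2}(1.00) = 0.80 < 0.90
```

Here f is the fraction of a genome labeled ‘European’ by ancestry painting, and “mother” stands for the daughter’s other parent.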

Here’s my daughter’s new chromosome painting:

One aspect of 23andMe’s new ancestry composition feature is that it is very Eurocentric. But most of the customers are white, and presumably the reference populations they used (which are drawn from customers) are also white. Though there are plenty of public-domain non-white data sets they could have used, I assume they’d prefer to eat their own dog food and use their own data in this case. But that’s really a minor gripe in the grand scheme of things. This is a huge upgrade from what came before. It’s not telling me, as a South Asian, very much. But it’s not telling me ludicrous things anymore either!

As for omissions, I am curious why this new feature rates my family as only ~3% East Asian, when other analyses put us in the 10-15% range. The problem with very high values is that South Asians often have some residual ‘eastern’ signal, which I suspect is not real admixture but an artifact. Nevertheless, northeast Indians, including Bengalis, often have genuine East Asian admixture. On PCA plots my family is shifted considerably toward East Asians, so the signal those other analyses pick up probably isn’t noise. Almost every apportionment of East Asian ancestry I’ve seen for my family yields a greater value for my mother, and that holds here. It’s just that the values are implausibly low.

In any case, that’s not the strangest thing I saw. I was clicking around people who I had “shared” genomes with, and I stumbled upon this:

As you can guess from the screenshot, this is Daniel MacArthur’s profile. And according to it, ~25% of chromosome 10 is South Asian! At first blush this seemed totally nonsensical to me, so I clicked around the profiles of other people of similar Northern European background…and I didn’t see anything equivalent.

What to do? It’s going to take more evidence than this to shake my prior assumptions, so I downloaded Dr. MacArthur’s genotype. Then I merged it with three HapMap populations: the Utah whites (CEU), the Gujaratis (GIH), and the Chinese from Denver (CHD). The last was basically a control. I pulled out chromosome 10. I also added Dan’s wife Ilana to the data set, since I believe she was typed on the same Illumina chip and is of similar ethnic background (i.e., very white). It is important to note that only 28,000 SNPs remained in the data set. But 10,000 SNPs are usually more than sufficient for model-based clustering when the variation is at an intercontinental scale.
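The first step, pulling one chromosome out of a personal genotype file and keeping only SNPs the reference panel also has, might look like the minimal sketch below. It assumes the 23andMe-style raw download (‘#’ comment lines, then tab-separated rsid, chromosome, position, genotype, with ‘--’ for a no-call); the function name is hypothetical, and a real merge with HapMap would also need to reconcile strand and allele coding, which this deliberately skips.

```python
from typing import Dict, Iterable, Set, Tuple


def chromosome_calls(
    raw_lines: Iterable[str], reference_rsids: Set[str], chrom: str = "10"
) -> Dict[str, Tuple[int, str]]:
    """Keep one chromosome's genotype calls that are also typed in a reference panel.

    Assumes 23andMe-style raw data: '#' comment lines, then tab-separated
    rsid, chromosome, position, genotype, with '--' marking a no-call.
    Strand and allele-coding differences between files are NOT handled here.
    """
    kept: Dict[str, Tuple[int, str]] = {}
    for line in raw_lines:
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.rstrip("\n").split("\t")[:4]
        if chromosome == chrom and genotype != "--" and rsid in reference_rsids:
            kept[rsid] = (int(position), genotype)
    return kept
```

The shrinking overlap between a consumer chip and the reference panel is one reason only 28,000 SNPs survived the merge.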

I did two things:

1) I ran ADMIXTURE at K = 3, unsupervised

2) I ran a multidimensional scaling (MDS) analysis, which visualizes the genetic variation in multiple dimensions
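The post does not say which tool produced the MDS, so here is only a generic sketch of classical (Torgerson) MDS, assuming the input is a symmetric matrix of pairwise genetic distances between samples (for example IBS distances). It double-centres the squared distances and takes the top eigenvectors; the function name is hypothetical.

```python
import numpy as np


def classical_mds(dist, n_dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed samples so Euclidean distances approximate `dist`."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, with J = I - 11'/n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues (non-Euclidean noise) to zero before the sqrt.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)
```

Plotting the two returned columns, colored by population, gives the kind of two-dimensional picture the post describes; the sign of each axis is arbitrary.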

Before I go on, here is what I found: these methods supported 23andMe’s inference. On chromosome 10, Dr. MacArthur seems to have an affinity with South Asians (i.e., this is his ‘curry chromosome’). Here are the median values for each population in tabular format, with MacArthur and his wife shown for comparison.

ADMIXTURE results for chromosome 10
Population | K 1 | K 2 | K 3
CEU | 0.04 | 0.02 | 0.93
GIH | 0.87 | 0.05 | 0.08
CHD | 0.01 | 0.97 | 0.01
Daniel MacArthur | 0.29 | 0.07 | 0.64
Ilana Fisher | 0.01 | 0.06 | 0.94
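Summaries like the one above can be produced from ADMIXTURE’s .Q output, which has one row per individual (in the same order as the input .fam file) with K whitespace-separated ancestry fractions. The minimal sketch below assumes you supply a matching list of population labels; the function names are hypothetical. It computes per-population medians and pulls out one component for one population, the kind of distribution used to check how high CEU individuals go on the South Asian component.

```python
from statistics import median
from typing import Dict, List, Sequence


def parse_q(q_lines: Sequence[str]) -> List[List[float]]:
    """Parse ADMIXTURE .Q rows: whitespace-separated ancestry fractions per individual."""
    return [[float(x) for x in line.split()] for line in q_lines if line.strip()]


def median_q_by_population(q_lines: Sequence[str], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Median of each ADMIXTURE component within each population label."""
    q_rows = parse_q(q_lines)
    if len(q_rows) != len(labels):
        raise ValueError("need exactly one population label per Q row")
    groups: Dict[str, List[List[float]]] = {}
    for label, row in zip(labels, q_rows):
        groups.setdefault(label, []).append(row)
    # zip(*rows) turns per-individual rows into per-component columns.
    return {label: [median(col) for col in zip(*rows)] for label, rows in groups.items()}


def component_values(
    q_lines: Sequence[str], labels: Sequence[str], population: str, component: int
) -> List[float]:
    """All values of one component (0-based index) for one population, e.g. for a histogram."""
    return [row[component] for label, row in zip(labels, parse_q(q_lines)) if label == population]
```

Which component corresponds to which ancestry is not fixed by ADMIXTURE; you identify it afterwards from which reference population dominates it.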

You probably want a distribution. None of the non-founder CEU samples went above 20% South Asian. It did surprise me that a few were that high, which made it somewhat more plausible to me that MacArthur’s result on chromosome 10 was a fluke:

And here’s the MDS with the two largest dimensions:

Again, it’s evident that this chromosome 10 is shifted toward South Asians. If I had more time right now, I would probably extract that specific chromosomal segment, phase it, and then compare it to various South Asian populations. But I don’t have time now, so I checked the results from the Interpretome instead. I cranked up the settings to reduce the noise, so that it would only report the most robust and significant results. As you can see, chromosome 10 again comes up as the one that isn’t quite like the others.

Is there a plausible explanation for this? Perhaps Dr. MacArthur can call up a helpful relative? From what I recall his parents are immigrants from the United Kingdom, and it isn’t unheard of for white Britons to have South Asian ancestry dating back to the 19th century. To be totally honest, though, I’m rather agnostic about all this right now. This genotype has been “out” for years, so how is it that no one has noticed this peculiarity? Perhaps everyone was looking at the genome-wide average, where it just doesn’t rise to the level of notice? What I really want to do is look at the distribution across all chromosomes and see how Daniel MacArthur’s chromosome 10 stacks up. It might be a random act of nature yet.
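One simple way to ask how chromosome 10 “stacks up” is a leave-one-out z-score: compare each chromosome’s estimated South Asian fraction with the mean and spread of that person’s other chromosomes. This is a hypothetical sketch of that idea, not an analysis from the post. A fuller check would also compare against the same chromosome in many other genomes (such as the CEU distribution), since some chromosomes may be noisier than others.

```python
import math
from statistics import mean, stdev
from typing import Dict


def chromosome_outlier_scores(fractions: Dict[str, float]) -> Dict[str, float]:
    """Leave-one-out z-score of each chromosome's ancestry fraction.

    `fractions` maps chromosome name -> estimated fraction of one ancestry
    (e.g. South Asian) for a single person. Each chromosome is compared with the
    mean and sample standard deviation of the *other* chromosomes, so a single
    outlier does not inflate its own baseline.
    """
    if len(fractions) < 3:
        raise ValueError("need at least three chromosomes")
    scores: Dict[str, float] = {}
    for chrom, value in fractions.items():
        others = [v for c, v in fractions.items() if c != chrom]
        mu, sd = mean(others), stdev(others)
        if sd > 0:
            scores[chrom] = (value - mu) / sd
        else:
            # All other chromosomes identical: any deviation is infinitely unusual.
            scores[chrom] = 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return scores
```

The input fractions would come from whatever per-chromosome ancestry estimates you trust; the function only measures how unusual one chromosome is relative to the rest.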

Also, I guess I should add that a genome-wide South Asian fraction of ~1.5% would be consistent with one of MacArthur’s great-great-great-great-grandparents being Indian. Assuming 25-year generation times, that puts them in the mid-19th century. Of course, at such a low proportion the variance is going to be high, so it is quite possible that the real date of admixture should be pushed one generation back or one generation forward.
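The arithmetic behind that estimate is short: one ancestor g generations back contributes 1/2^g of the genome in expectation, and a great-great-great-great-grandparent is 6 generations back, so 1/64 ≈ 1.56%. The sketch below (hypothetical function names, fixed 25-year generation time as in the text) also prints the one-generation-earlier and one-generation-later alternatives mentioned above. It gives years before MacArthur’s own generation, not calendar dates.

```python
import math


def ancestor_fraction(generations: int) -> float:
    """Expected genome share from one ancestor `generations` back (parent = 1)."""
    return 0.5 ** generations


def generations_for_fraction(fraction: float) -> int:
    """Whole number of generations whose single-ancestor share is closest on a log scale."""
    return round(math.log2(1.0 / fraction))


def years_back(generations: int, generation_time: float = 25.0) -> float:
    """How far back that ancestor lived, assuming a fixed generation time."""
    return generations * generation_time


if __name__ == "__main__":
    g = generations_for_fraction(0.015)  # ~1.5% South Asian
    for gen in (g - 1, g, g + 1):  # the +/- one generation the post allows for
        print(gen, f"{ancestor_fraction(gen):.2%}", years_back(gen))
```

These are expectations only; the actual share inherited from any single ancestor that far back varies a lot, which is the variance the paragraph above warns about.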
