Razib Khan One-stop-shopping for all of my content

March 17, 2011

Looking for relatedness in the HapMap Gujaratis

Filed under: Genetics,Genomics,Gujaratis — Razib Khan @ 10:34 pm

Recently I was looking at a 3-D PCA animation which Zack generated from the Harappa Ancestry Project data set. Click the link and come back. Notice the outlier clusters? The Burusho are straightforward; they seem to have low levels of Tibetan admixture. But what about the Gujarati cluster? Again, we see what we’ve seen before: the fractioning out of the Gujaratis in PCA into two groups, one a tight cluster, and the other relatively widely distributed. This prompted me to look more closely at the HapMap Gujarati sample. Today I was exploring the question with Plink’s identity-by-descent feature. First I’ll start out with a smaller data set: my family (father, mother, sibling 1, sibling 2, and myself), plus an Indian (from Uttar Pradesh) and a Pakistani as unrelated individuals. I merged our 23andMe-derived genotypes, and with ~900,000 markers calculated pairwise IBD:

./plink --bfile IBDControl --genome

Here are the relevant results:

Individual 1  Individual 2  Z0     Z1     Z2     PI_HAT  DST    PPC    RATIO
Indian        Father        0.768  0.027  0.205  0.218   0.760  0.160  1.940
Indian        Mother        0.782  0.010  0.209  0.214   0.759  0.026  1.886
Indian        Razib         0.767  0.032  0.202  0.218   0.759  0.500  2.000
Indian        Sibling1      0.769  0.025  0.206  0.219   0.760  0.198  1.949
Indian        Sibling2      0.766  0.032  0.203  0.219   0.760  0.685  2.030
Indian        Pakistani     0.781  0.017  0.203  0.211   0.758  0.533  2.005
Father        Mother        0.776  0.018  0.207  0.215   0.759  0.284  1.965
Father        Razib         0.002  0.777  0.221  0.610   0.851  1.000  450.800
Father        Sibling1      0.001  0.785  0.214  0.606   0.850  1.000  898.800
Father        Sibling2      0.002  0.779  0.220  0.609   0.851  1.000  643.143
Father        Pakistani     0.778  0.019  0.203  0.213   0.758  0.201  1.950
Mother        Razib         0.002  0.788  0.211  0.605   0.849  1.000  639.429
Mother        Sibling1      0.002  0.781  0.218  0.608   0.850  1.000  639.857
Mother        Sibling2      0.002  0.782  0.216  0.607   0.850  1.000  447.900
Mother        Pakistani     0.779  0.020  0.201  0.211   0.758  0.052  1.904
Razib         Sibling1      0.183  0.408  0.409  0.613   0.866  1.000  11.386
Razib         Sibling2      0.194  0.432  0.374  0.590   0.858  1.000  11.491
Razib         Pakistani     0.781  0.016  0.203  0.211   0.758  0.933  2.095
Sibling1      Sibling2      0.236  0.412  0.351  0.557   0.849  1.000  9.413
Sibling1      Pakistani     0.777  0.024  0.199  0.211   0.758  0.327  1.973
Sibling2      Pakistani     0.774  0.024  0.202  0.214   0.758  0.443  1.991

You can infer some things without even knowing what the columns mean. Notice that there are differences between parent-child, sibling-sibling, and unrelated comparisons. The distance measure, DST, is basically exactly the same as the genome-wide comparison in 23andMe. Either the web app is running Plink, or, it’s using the ...
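For anyone who does want to know what the columns mean: Z0, Z1, and Z2 are Plink's estimated probabilities that a pair shares 0, 1, or 2 alleles identical by descent, and PI_HAT is P(IBD=2) + 0.5 × P(IBD=1). Here is a minimal Python sketch that recomputes PI_HAT for a few rows copied from the table and sorts the pairs into the three visible groups. The cutoffs in `rough_label` are my own assumption, picked by eye for this one table. They are not anything Plink reports, and they should not be reused on other data.

```python
# A few rows copied from the table: (individual 1, individual 2, Z0, Z1, Z2, PI_HAT)
ROWS = [
    ("Indian", "Pakistani", 0.781, 0.017, 0.203, 0.211),
    ("Father", "Mother", 0.776, 0.018, 0.207, 0.215),
    ("Father", "Razib", 0.002, 0.777, 0.221, 0.610),
    ("Mother", "Sibling1", 0.002, 0.781, 0.218, 0.608),
    ("Razib", "Sibling1", 0.183, 0.408, 0.409, 0.613),
    ("Sibling1", "Sibling2", 0.236, 0.412, 0.351, 0.557),
]


def pi_hat(z1, z2):
    """Proportion IBD as Plink defines it: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def rough_label(z0, z1):
    """Group a pair by its Z0/Z1 pattern.

    Hypothetical cutoffs, chosen by eye for this table only.
    """
    if z0 < 0.05 and z1 > 0.7:
        return "parent-child"  # almost never IBD=0, mostly IBD=1
    if z0 < 0.5 and z1 > 0.3:
        return "siblings"      # probability spread over IBD=0, 1, and 2
    return "unrelated"         # mostly IBD=0


if __name__ == "__main__":
    for ind1, ind2, z0, z1, z2, reported in ROWS:
        recomputed = pi_hat(z1, z2)
        print(f"{ind1:9s} {ind2:9s} reported={reported:.3f} "
              f"recomputed={recomputed:.3f} {rough_label(z0, z1)}")
```

The recomputed values agree with the reported PI_HAT to within rounding of the three-decimal inputs.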

February 14, 2011

Who are those Houston Gujus?

The figure to the left is a three dimensional representation of principal components 1, 2, and 3, generated from a sample of Gujaratis from Houston, and Chinese from Denver. When these two populations are pooled together the Chinese form a very homogeneous cluster. They don’t vary much across the three top explanatory dimensions of genetic variance. In contrast, the Gujaratis do vary. This is not surprising. In the supplements of Reconstructing Indian population history it was notable that the Gujaratis did tend to shake out into two distinct clusters in the PCAs. This is a finding you see over and over when you manipulate the HapMap Gujarati data set. In reality, there aren’t two equivalent clusters. Rather, there’s one “tight” cluster, which I will label “Gujarati_B” from now on in my data set, and another cluster, “Gujarati_A,” which really just consists of all the individuals who are outside of Gujarati_B cluster. Even when compared to other South Asian populations these two distinct categories persist in the HapMap Gujaratis.

Zack has already identified a major difference between the two clusters: Gujarati_A has some individuals with much more “West Eurasian” ancestry. ...
