Razib Khan One-stop-shopping for all of my content

June 27, 2018

South Asian Genotype Project, Summer 2018 Update

Filed under: India Genetics,India genomics,South Asian Genotype Project — Razib Khan @ 12:54 am


I’ve put up another update on the South Asian Genotype Project. If you’ve contributed since March, check it out.

Again, if you are interested: send me a 23andMe, Ancestry, MyHeritage, or Family Tree DNA raw genotype file to contactgnxp -at- gmail.com.

In the subject please put:

  1. “South Asian Genotype Project”
  2. The state/province your family is from
  3. Ethnolinguistic group
  4. If applicable, caste

I decided to do some poking around with some of the higher quality samples people have given me: 180,000 SNPs with almost no genotyping error. I also removed “relatives.” That means that a lot of Muslim groups from Pakistan had individuals dropping out. In the PCA above you can see 4 Burushos left! Not too many Pathans either.
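
For readers curious what removing “relatives” can look like in practice, here is a minimal sketch. It is not the pipeline used for this post. It assumes genotypes coded as 0/1/2 copies of one allele, uses a crude identity-by-state sharing score as a stand-in for a proper relatedness estimate, and the 0.9 cutoff and the toy samples A, B, and C are made up for illustration.

```python
def ibs_similarity(a, b):
    """Mean allele sharing (0..1) over SNPs typed in both samples.

    Genotypes are coded as 0, 1 or 2 copies of one allele;
    None marks a missing call and is skipped.
    """
    shared, typed = 0, 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        shared += 2 - abs(x - y)  # 2, 1 or 0 alleles shared at this SNP
        typed += 1
    return shared / (2 * typed) if typed else 0.0


def drop_relatives(genotypes, threshold):
    """Greedy filter: keep a sample only if it is not too similar
    to any sample that has already been kept."""
    kept = []
    for sample_id, calls in genotypes.items():
        if all(ibs_similarity(calls, genotypes[k]) <= threshold for k in kept):
            kept.append(sample_id)
    return kept


# Toy data (hypothetical): B is nearly identical to A, C is not.
toy = {
    "A": [0, 1, 2, 1, 0, 2, 1, 0],
    "B": [0, 1, 2, 1, 0, 2, 1, 1],
    "C": [2, 0, 0, 1, 2, 0, 1, 2],
}
unrelated = drop_relatives(toy, threshold=0.9)  # illustrative cutoff
print(unrelated)
```

The greedy pass keeps whichever member of a related pair comes first, which is why small, closely related groups can lose most of their individuals.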

First, I decided to look at the Brahmin samples I had.

– Uttar Pradesh, Bihar, and the Gujarati Brahmin(s) I had are one cluster
– South Indian Brahmins (mostly Iyer) are another

To my surprise, the two Maharashtra Brahmins that I have are firmly in the South Indian cluster. The Bengali Brahmin is more like the North Indians. But there is a subtle skew toward the distant Bangladesh cluster. This individual seems less East Asian than even the typical Bengali Brahmin, but I think Bengali Brahmins can be modeled as North Indian Brahmin with non-Brahmin (and therefore East Asian) ancestry.
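
The “modeled as” idea above is a two-way mixture, and the arithmetic can be sketched in a few lines. The function name and all three input fractions below are hypothetical placeholders chosen for illustration, not estimates from the project data.

```python
def mixture_fraction(target, source_a, source_b):
    """Fraction of ancestry from source_b under a two-way linear mix.

    Solves target = (1 - f) * source_a + f * source_b for f, where each
    argument is the share of one ancestry component (here, East Asian).
    """
    if source_a == source_b:
        raise ValueError("sources must differ in the component used")
    return (target - source_a) / (source_b - source_a)


# Hypothetical East Asian component fractions, for illustration only.
north_indian_brahmin = 0.00
bengali_non_brahmin = 0.14
bengali_brahmin = 0.05

f = mixture_fraction(bengali_brahmin, north_indian_brahmin, bengali_non_brahmin)
print(f"{f:.2f}")
```

With these made-up inputs, f is the share of Bengali non-Brahmin ancestry needed to account for the East Asian component in the target. A real estimate would need proper reference populations and more than one component.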

Next, I wanted to look at Gujaratis. The 1000 Genomes has a large number of this population…but there’s no group identity. Years ago Zack Ajmal of Harappa DNA concluded that a large and relatively related cluster in these data were “Patels.” Someone who is a Bohra Muslim of presumably Patel background sent me their data. They did not fall in the Patel cluster. Rather, they were in the “Gujurati_ANI_1” group, which is more like Pakistanis than other Gujaratis. In fact, the Gujarati Brahmin is not in this cluster. An individual who is Solanki seems to be more ASI-shifted, like the Patels and Gujurati_ANI_4.

Overall, Gujarat has a lot of population structure in a rather small state (yes, I can’t spell Gujarat as you can see in my population labels).

From Maharashtra, right to the south of Gujarat in western India, I have two Brahmins and one Kayastha. For non-South Asians, my understanding is that Kayasthas are literate non-Brahmin castes. In Bengal, they take the places of the Kshatriya in the caste hierarchy, and with Brahmins formed the traditional Hindu educated classes. I have seen Bengali Kayastha genotypes, and they look rather like other Bengalis (my mother’s father’s family is from a Kayastha family before their conversion to Islam judging from their customary surname).

There are Kayasthas in other parts of South Asia. I have a Kayastha sample from Maharashtra. Curiously on the PCA this individual is in the same position as the two Brahmins from the region, and South Indian Brahmins. I don’t know what this means.

Next, some odds and ends from the northwest of the subcontinent. I have a few Jatts who are not related. This group from Punjab is quite ANI-shifted. Someone who claims to be a Rajput from Rajasthan is where they should be on account of geography. The Punjabi 1000 Genomes group is quite diverse. I have a Ramgarhia individual who seems to be somewhere between Punjabi_ANI_1 and Punjabi_ANI_2. The Jatt are on the edge (ANI-shifted) of Punjabi_ANI_1.

I have two individuals who claim to be Kashmiri. A Butt and a Syed. I have no idea what that means. But both are Punjabi_ANI_2…but they look somewhat East Asian shifted. This is not surprising. The curious thing about Kashmiris is that they are culturally and geographically quite distinct from Indians to their south. But genetically they are not so different. In fact, they are “more South Asian” (ASI) than Jatt, and considerably more than Iranian speaking groups like Pathans.

Finally, there is a Marwari individual. This community is from Rajasthan, though they occupy a mercantile role across the subcontinent. Strangely (or not?) they are very close to the Patels. Much more ASI-enriched than the Rajput.

Shifting to South Indian samples, I plotted the Chamar samples, who I believe were collected from Uttar Pradesh in the north. These Dalits actually seem to cluster with a subset of the 1000 Genomes Tamil and Telugu samples I believe are Scheduled Caste (Dalit) as well. The Chamar are somewhat distinct. They are more ANI-shifted. But notice that the bulk of Tamils and Telugus are still more ANI-shifted than the Chamars are! This surprised me.

I have some Velama individuals, as well as a Reddy from Andhra Pradesh and a Padmashali. All these individuals are in the main distribution of South Indians. I do have a Mudaliar Tamil sample, and this individual is placed among the Chamars. Though not really in the Tamil Scheduled Caste group.

Finally, some odds & ends. The Nasrani samples from Kerala are between the South Indian Brahmins and middle caste South Indians. I suspect this is due to the origin of the Nasranis in the Nair community, who have mixed some with Brahmins. The Vania sample from Gujarat is clustered with South Indian Brahmins. The Dusadhs, an agricultural group from Uttar Pradesh and Bihar that is depressed in some manner in relation to the dominant groups, are not quite Chamars, but they are quite ASI-shifted.

Some of you will be asking about admixture. I ran K = 4 unsupervised on the data set. You can find it here.

January 20, 2018

South Asian Genotype Project, update

Filed under: South Asian Genetics,South Asian Genotype Project — Razib Khan @ 10:08 pm


I’ve been working on the South Asian Genotype Project. Again, if you are interested: send me a 23andMe, Ancestry, or Family Tree DNA raw genotype file to contactgnxp -at- gmail.com.

In the subject please put:

  1. “South Asian Genotype Project”
  2. The state/province your family is from
  3. Ethnolinguistic group
  4. If applicable, caste

I changed the reference populations because the earlier ones were too complicated. You can see the population averages from public data sets for some groups.  The results for project members are here. I re-ran everyone who has sent data in so far. I’ll leave commentary for later.

At this point, I think the easiest way to update project members is to create a mailing list. If you have submitted genotypes, please join:

Subscribe to the South Asian Genotype Project



January 6, 2018

South Asian Genotype Project update

Just a quick update. I know I haven’t been responsive, but I’ve been traveling and spending time with the family and working a lot for the past few weeks. I’m going to make some revisions to my pipeline as well. I will get back to generating results soon (as in a week or so). So please keep sending data to contactgnxp@gmail.com.

December 2, 2017

South Asian Genotype Project

Filed under: Personal genomics,South Asian Genotype Project — Razib Khan @ 6:02 pm


It’s been a few years since I’ve done any serious “Genome Blogging.” Mostly I’ve been very busy and there isn’t much low-hanging fruit left as it is. But today I want to announce that I’ll be running the generically titled “South Asian Genotype Project.”

The way it works is simple: send me a 23andMe, Ancestry, or Family Tree DNA raw genotype file to contactgnxp -at- gmail.com (though 23andMe’s new chip has far less overlap with other platforms than the earlier ones did, so it’s probably best if you were typed before August 2017).
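
To make the chip-overlap point concrete, here is a minimal sketch of counting markers shared between two raw genotype exports. It assumes tab-separated text with the marker ID in the first column, optional “#” comment lines, and possibly a column-header row. Real exports differ by vendor and version, and the marker IDs below are invented.

```python
import io


def read_rsids(handle):
    """Collect marker IDs from a tab-separated raw genotype export.

    Assumes '#' comment lines and the marker ID in the first column.
    """
    ids = set()
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        first = line.split("\t")[0]
        if first.lower() == "rsid":  # skip a column-header row if present
            continue
        ids.add(first)
    return ids


# Toy stand-ins for two exports (hypothetical markers and layouts).
old_chip = io.StringIO(
    "# toy export\nrs1\t1\t100\tAA\nrs2\t1\t200\tAG\nrs3\t2\t300\tCC\n"
)
new_chip = io.StringIO(
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs2\t1\t200\tA\tG\nrs9\t3\t50\tT\tT\n"
)

a, b = read_rsids(old_chip), read_rsids(new_chip)
shared = a & b
print(len(shared), len(shared) / len(a))
```

The fewer markers two platforms share, the smaller the SNP set that can be used once samples from different services are merged.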

In the subject please put:

  1. “South Asian Genotype Project”
  2. The state/province your family is from
  3. Ethnolinguistic group
  4. If applicable, caste

In the body of the email you can put Y and mtDNA and any other information you want. Obviously your data is confidential and I won’t identify you by name, just ethnolinguistic group and such.

Since the last time I did this I have some scripts that make this a lot easier, so hopefully I’ll be adding individuals to this spreadsheet every few days. I’ll give project members an ID and try to email them when the results are up.

The main motivator for this project on my part is that people still ask me questions about Sinhalese, Nasrani Christians, and other assorted groups which we don’t have answers to because current research projects haven’t focused on them.

Since Zack worked on the Harappa Ancestry Project we know a lot more about South Asian ancestry. Basically, there is an ANI-ASI cline, and some South Asians have exogenous ancestry off this cline. Indian Jews have Middle Eastern ancestry, while Bengalis have East Asian ancestry, and some groups in Pakistan have African ancestry. With that in mind I’ll be testing a smaller number of populations. The marker set is 240,000 SNPs by the way.

Below are some representative results. You can see that my results from three DTC services are basically the same. Also, some South Indian groups (see Pulliyar) show “Dai” ancestry, when I’m pretty sure it’s just that I didn’t sample as much on the extreme portion of the ASI-cline.

ID                           Armenians  Belorussian  C_India  Dai  Nigerian  NWIndia  S_India  YemeniteJews
Balochi                      34%        1%           0%       0%   0%        66%      0%       0%
Bangladesh_Razib (23andMe)   0%         0%           14%      14%  0%        15%      57%      0%
Bangladesh_Razib (Ancestry)  0%         0%           14%      14%  0%        15%      57%      0%
Bangladesh_Razib (ftDNA)     0%         0%           13%      14%  0%        15%      58%      0%
Chenchus                     0%         0%           1%       1%   0%        0%       98%      0%
Dharkars                     0%         0%           16%      2%   0%        38%      44%      0%
Dusadh                       0%         0%           21%      1%   0%        2%       76%      0%
Iranians                     65%        2%           1%       2%   0%        20%      0%       10%
Kallar                       0%         0%           0%       0%   0%        0%       100%     0%
Kurumba                      0%         0%           0%       0%   0%        4%       96%      0%
Meghawal                     0%         0%           10%      0%   0%        26%      64%      0%
MumbaiJews                   18%        0%           4%       0%   0%        39%      28%      11%
Naga                         0%         0%           0%       90%  0%        0%       10%      0%
NorthKannadi                 0%         0%           0%       2%   0%        0%       98%      0%
Pakistani                    3%         7%           19%      6%   0%        38%      23%      4%
Pathan                       12%        3%           1%       1%   0%        80%      3%       0%
TamilNadu_Iyer               0%         1%           2%       0%   0%        42%      54%      0%
TamilNadu_Nadar              0%         0%           0%       1%   0%        0%       99%      0%
UP_Kayastha                  0%         0%           17%      2%   0%        42%      39%      0%
WestBengal_Kayastha          0%         2%           15%      6%   0%        14%      64%      0%
Pulliyar                     0%         0%           0%       7%   0%        0%       93%      0%
DalitTN                      0%         0%           0%       1%   0%        0%       99%      0%
Velama                       0%         0%           9%       0%   0%        22%      68%      0%
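
As a quick check on the claim that the three DTC services give basically the same result, the sketch below takes the three Bangladesh_Razib rows from the results above and reports the largest per-component gap between platforms. The percentages are copied from the post. The helper name is just for illustration.

```python
COMPONENTS = ["Armenians", "Belorussian", "C_India", "Dai",
              "Nigerian", "NWIndia", "S_India", "YemeniteJews"]

# Percentages for the same person typed on three platforms (from the post).
razib = {
    "23andMe":  [0, 0, 14, 14, 0, 15, 57, 0],
    "Ancestry": [0, 0, 14, 14, 0, 15, 57, 0],
    "ftDNA":    [0, 0, 13, 14, 0, 15, 58, 0],
}


def max_spread(rows):
    """Per component, the max-minus-min gap across the given rows."""
    spreads = {}
    for i, name in enumerate(COMPONENTS):
        values = [row[i] for row in rows.values()]
        spreads[name] = max(values) - min(values)
    return spreads


spreads = max_spread(razib)
worst = max(spreads.values())
print(worst)  # largest gap, in percentage points
```

The largest gap is one percentage point (C_India and S_India on the ftDNA file), which is within what rounding alone could produce.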

