Razib Khan One-stop-shopping for all of my content

March 23, 2017

Your ancestry inference is precise and accurate(ish)

Filed under: 23andMe,Ancestry,Culture,Family Tree DNA,Genetics,Genomics — Razib Khan @ 6:29 am

For about three years I consulted for Family Tree DNA. It was a great experience, and I met a lot of cool people through that connection. But perhaps the most interesting aspect was that I came to understand the various pressures that direct-to-consumer genomics firms face from the demand side. The science is one thing, but when you are working on a consumer-facing product, other variables come into play of which you are not cognizant when you are thinking about it from a point of pure analysis. I’m pretty sure that my insights from working with Family Tree DNA generalize to the other firms as well (23andMe, Ancestry, and Genographic*).

The science behind the ancestry inference elements of the product on offer is not particularly controversial or complex, but the customer aspect of how these results are received can become an intractable nightmare. The basic theory was outlined in the year 2000 in Pritchard et al.’s Inference of Population Structure Using Multilocus Genotype Data. You have lots of data thanks to better genomic technology (e.g., 300,000 SNPs). You have computers to analyze that data. And you have scientific models of population history and dynamics which you can test that data against. The shape of the data will determine the parameters of the model, and it is those parameters that yield “your ancestry.”
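The core idea of fitting proportions to data can be sketched in a few lines. This is a simplification and not any company’s pipeline or the STRUCTURE program itself. It assumes the allele frequencies of each reference population are already known and held fixed (a “supervised” setup) and estimates only one person’s ancestry proportions by expectation-maximization. Genotypes are coded as 0/1/2 copies of a counted allele, and all numbers in the usage lines are made up.

```python
def estimate_proportions(genotypes, freqs, iterations=200):
    """Estimate ancestry proportions q for one person by EM.

    genotypes: 0/1/2 copies of the counted allele at each SNP.
    freqs: freqs[k][snp] is that allele's frequency in reference population k
           (assumed known and strictly between 0 and 1).
    Returns K proportions that sum to 1.
    """
    K = len(freqs)
    q = [1.0 / K] * K  # start from equal proportions
    for _ in range(iterations):
        expected = [0.0] * K  # expected allele copies attributed to each population
        for snp, g in enumerate(genotypes):
            # E-step: posterior ancestry of a copy carrying the counted allele
            w1 = [q[k] * freqs[k][snp] for k in range(K)]
            # ...and of a copy carrying the other allele
            w0 = [q[k] * (1.0 - freqs[k][snp]) for k in range(K)]
            s1, s0 = sum(w1), sum(w0)
            for k in range(K):
                expected[k] += g * w1[k] / s1 + (2 - g) * w0[k] / s0
        # M-step: new proportions are each population's share of allele copies
        total = sum(expected)
        q = [e / total for e in expected]
    return q


# Made-up example: population A carries the counted allele at 90%, B at 10%.
freq_a = [0.9] * 50
freq_b = [0.1] * 50
print(estimate_proportions([2] * 50, [freq_a, freq_b]))  # close to [1, 0]
print(estimate_proportions([1] * 50, [freq_a, freq_b]))  # about [0.5, 0.5]
```

When the reference populations barely differ in their frequencies, the per-SNP weights for each population are nearly equal and the estimate is driven mostly by noise. That is the “raw material” problem in another form.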

In broad sketches the results make sense for most people. It’s in the finer details that the confusions emerge. To the left you see my son’s 23andMe ancestry deconvolution. The color coding is such that you can tell that his maternal and paternal chromosomes have very different ancestry profiles (mostly Northern European and South Asian, respectively).

But his “Northern European” chromosomes also are more richly colored, with alternative segments denoting ancestry from different parts of Northern Europe. So in terms of proportions I am told my son is about 15 percent French and German, and 10 percent Scandinavian and 10 percent British and Irish. This is reasonable. On the other side he’s nearly 50 percent “broadly South Asian.” The balance is accounted for by my East Asian ancestry, which is correct, as my South Asian ethnicity is from Bengal, where there is a fair amount of East Asian ancestry (my family’s origin is on the eastern edge of Bengal itself).

And it is here that the non-scientific concerns of consumer genomics come into focus. The genetic differences and distances between various South Asian groups are far greater than those between various Northern European groups. Depending on the statistical measure you use, intra-South Asian variation is about one order of magnitude greater than intra-Northern European variation. This is due to geographic partitioning, the caste system, and differential admixture in South Asians between extremely diverged ancestral elements (about half of South Asian ancestry is very similar to that of Europeans and Middle Easterners, and the other half is extremely different, so how far you are from the 50 percent mark determines a lot).
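One statistic often used for comparisons like this is Fst. Below is a minimal sketch of Hudson’s estimator, combined across SNPs as a ratio of averages. It assumes you already have allele frequencies per SNP and sample sizes (in chromosomes) for two groups. The frequencies in the example are invented for illustration and are not real South Asian or Northern European data.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst between two groups, as a ratio of averages over SNPs.

    p1, p2: allele frequencies per SNP in each group.
    n1, n2: number of sampled chromosomes in each group (must be > 1).
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # squared frequency difference, minus a correction for sampling noise
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # expected heterozygosity between the two groups
        den += a * (1 - b) + b * (1 - a)
    return num / den


# Invented frequencies: one pair of groups that barely differ, one that differs more.
close = hudson_fst([0.50, 0.30, 0.70], [0.52, 0.28, 0.71], 2000, 2000)
far = hudson_fst([0.50, 0.30, 0.70], [0.70, 0.10, 0.40], 2000, 2000)
print(close, far)  # a value near zero, then a much larger one
```

Because of the sampling correction, the estimate can dip slightly below zero for groups that are essentially identical.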

Broadly South Asian

In Northern Europe there is very little genetic variation from the British Isles all the way to the Baltic. The reason for this is historical: massive population turnover in the region 4,500 years ago means that much of the genetic divergence between the groups dates only to the Bronze Age. It is this genetic divergence, the variation, that is the raw material for the inferences and proportions you see in ancestry calculators. There’s just not that much raw material for Northern Europeans.

Remember, the methods require lots of variation in the data as a raw input. If you don’t have that much variation, you’re making the inference machine work real hard to produce a reasonably robust result. In contrast to the situation with Northern Europeans, with South Asians the companies are leaving raw material on the table and just combining diverse groups together.

What’s going on here? As you might have guessed this is an economically motivated decision. Most South Asians know their general heritage due to caste and regional origins (though many Bengalis exhibit some lacunae about their East Asian ancestry). In contrast, many Americans of Northern European ancestry with an interest in genealogy are extremely curious about explicit proportional breakdowns between Northern European nationalities. The direct-to-consumer genomic firms attempt to cater to this demand as best as they can.

As I have stated many times, racial background is to various extents both biological and social. When it comes to the difference between Lithuanians and Nigerians, the biological differences due to evolutionary history are straightforward, clear, and distinct. You can generate a phylogenetic history and perform a functional analysis of the differences. You also have to note that social differences exist, but they are not straightforward. Like Lithuanians, Nigerians of Igbo background are generally Roman Catholic, while most other Nigerians are not. And the differences between Nigerian languages are great enough that it is defensible to suggest that the Afro-Asiatic dialects of Hausa speakers are closer in their phylogenetic history to Lithuanian than to the dialects of the Yoruba.

A Lithuanian American

Contrast this with the situation where you differentiate Lithuanians from the French. To any European the differences here are incredibly huge. The history of France, what was Roman Gaul, goes back 2,000 years. After the collapse of the Western Roman Empire, by any measure the people who became French were at the center of European history. In contrast, the Lithuanians were a marginal tribe who did not enter Christian civilization until the late 14th century. In social-cultural terms, due to history, the differences between French and Lithuanians are extremely salient to people of French and Lithuanian ancestry. But genetically the differences are modest at best.

If a direct-to-consumer genetic testing company tells you that you are 90 percent Northern European and 10 percent West African, that is a robust result that has a clear historical genetic interpretation. The two elements of one’s ancestry have been relatively distinct for on the order of 100,000 years, with the Northern European element really just a proxy for non-Africans (though it is easy to drill down within Eurasia). In contrast, notice how 23andMe, with some of the best scientists in the business, tells people they are “French-German,” and not French or German. What the hell is a “French-German”? Someone from Alsace-Lorraine? A German descendant of Huguenots? Obviously not.

“French-German” is a cluster almost certainly because there are no clear and distinct genetic differences between French and Germans. Yes, there is a continuum of allele frequencies between these two groups, but having looked at a fair number of people of French and German background in Family Tree DNA’s database I can tell you that France and Germany have a lot of local structure even among people of indigenous ancestry. Germans from the Rhineland are quite often genetically closer to French from Normandy than they are to Germans from eastern Saxony. Some of this is due to gene flow between neighboring regions, but some of this is due to cultural fluidity as to who exactly is German. It is clear that some Germans from the eastern regions are Germanized Slavs. Some Germans from the north exhibit strong affinities to Scandinavians, while Germans from Bavaria and Austria are classically Central European (whatever that means). The average German is distinct from the average French person, but the genetic clustering of the two groups is not clear and distinct.

Remember earlier I explained that the science is predicated on aligning data and models. The cultural model of Northern Europeans is conditioned on diversity and difference which has been very salient for the past few thousand years since the rise and fall of Rome. But the evolutionary genetic history is one where there are far fewer differences. The data do not fit a model that makes much sense to the average consumer (e.g., “you descend from a mix of Bronze Age migrants from the west-central steppe of Eurasia and Mesolithic indigenous hunter-gatherers and Neolithic farmers”). What makes sense to the average American consumer are histories of nationalities, so direct-to-consumer genetic companies try to satisfy this need. Because the needs of the consumer and their cultural expectations are poorly served by the data (genetic variation) and models of population history, you have a lot of awkward kludges and strange results.

Imagine, for example, you want to estimate how “German” someone is. What do you use for your reference population of Germans? Looking at the data, there are clearly three major clusters within Germany when you weight the numbers appropriately, with affinities to the northern French, Slavs, and Scandinavians, and various proportions in between. Your selection of your sample is going to mean that some Germans are going to be more German than other Germans. If you select an eastern German sample, then western Germans, whose ancestors have been speaking a Germanic language far longer than those of eastern Germans, are going to come out as less German. Or you could just pick all of these disparate groups…in which case lots of Northern Europeans become “German.”

Consumers want genetic tests to reflect strong cultural memories that were forged in the fires of the rapid, protean, distinction-making process of cultural evolution. But biological and cultural evolution exhibit different modes (the latter generates huge between-group differences) and tempos (those differences emerge fast). The ancestry results many people get are the outcomes of compromises to thread the needle and square the circle.

All the above is half the story. Next I’ll explain why “deep history” has to be massaged to make recent history informative and comprehensible….

* Also, I have a little historical perspective because of my friendship with the person who arguably created this sector, Spencer Wells.

January 8, 2013

Using your 23andMe data: exploring with MDS

Filed under: 23andMe,Genomics — Razib Khan @ 2:09 am

Note: please read the earlier post on this topic if you haven’t.

The above image is from 23andMe. It’s from a feature which seems to have been marginalized a bit with their ancestry composition. Basically it is projecting 23andMe customers on a visualization of genetic variation from the HGDP data set. This is actually a rather informative sort of representation of variation. But there has always been an issue with the 23andMe representation: you are projected onto their invariant data set. In other words, you can’t mix & match the populations so as to explore different relationships. The nature of the algorithm and representation produces strange results, so varying the population sets is often useful in smoking out the true shape of things.

With the MDS feature I wrote about yesterday you can now compute positions with different weights of populations and mixes. This post will focus on how to manipulate the overall data set. You should have PHYLO from the earlier post. Open up the .fam file. It should look like this:

Malayan A382 0 0 1 -9
Paniya D36 0 0 1 -9
BiakaPygmies HGDP00479 0 0 1 -9
BiakaPygmies HGDP00985 0 0 1 -9
BiakaPygmies HGDP01094 0 0 1 -9
MbutiPygmies HGDP00982 0 0 1 -9
Mandenkas HGDP00911 0 0 1 -9
Mandenkas HGDP01202 0 0 1 -9
Yorubas HGDP00927 0 0 1 -9
BiakaPygmies HGDP00461 0 0 1 -9
BiakaPygmies HGDP00986 0 0 1 -9
MbutiPygmies HGDP00449 0 0 1 -9
Mandenkas HGDP00912 0 0 1 -9
Mandenkas HGDP01283 0 0 1 -9
Yorubas HGDP00928 0 0 2 -9

And so forth. PHYLO has 1,500+ individuals. This is a bit much, which is why the --genome command took so long. To ask particular questions it is often useful to prune the population down. I have a friend who is 1/4 Filipino who is curious as to whether his ancestry was more Chinese or native Filipino. How to answer this?

- You want a range of East Asian populations, north to south.

- You want a good out group. I’ll use the Utah whites.

All you need to do is go through the .fam file and keep only those lines you want, and put them into a new file, keep.txt. Then you run this command:

plink --noweb --bfile PHYLO --keep keep.txt --make-bed --out PHYLONARROW
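Picking lines by hand works fine for a small subset, but a short script can do the same filtering. The sketch below assumes, as in the excerpt above, that the first column of PHYLO’s .fam file is the population label and the second is the individual ID. It writes just those two columns, the family and individual IDs that plink’s --keep and --remove work from. The population names in the commented usage line are hypothetical placeholders, so check the labels actually present in your .fam file.

```python
def select_keep_lines(fam_lines, populations):
    """Return 'FID IID' lines for individuals whose family ID (here the
    population label) is in `populations`."""
    kept = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in populations:
            kept.append(f"{fields[0]} {fields[1]}")
    return kept


def write_keep_file(fam_path, keep_path, populations):
    """Read a .fam file and write the selected IDs to keep_path."""
    with open(fam_path) as fam:
        kept = select_keep_lines(fam, populations)
    with open(keep_path, "w") as out:
        out.write("\n".join(kept) + "\n")
    return len(kept)


# Hypothetical label set; the real labels in PHYLO may be spelled differently.
# write_keep_file("PHYLO.fam", "keep.txt", {"Dai", "Han", "Cambodians", "Utah"})
```

The same keep.txt works unchanged with --remove, which is how one list can both carve out a subset and delete that subset from the full set.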

So I’ve now made a new pedigree data set which is a subset of the original. Next, I merged my friend’s and my daughter’s genotypes into this data set. What if I wanted to remove some individuals, for example, the ones in keep.txt? You do it like so:

plink --noweb --bfile PHYLO --remove keep.txt --make-bed --out PHYLOAFEWGONE

With --keep and --remove, and making files drawn from the .fam file(s), you can customize your own data set for your own purposes. Again you want to produce an MDS, so run:

plink --noweb --bfile PHYLONARROW --genome

plink --noweb --bfile PHYLONARROW --read-genome plink.genome --cluster --mds-plot 6
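The --genome step compares every pair of individuals, so its cost grows with the square of the sample size. The sketch below shows that count together with a simplified identity-by-state (IBS) similarity of the kind plink reports per pair (roughly its DST column). It is an illustration, not plink’s implementation. It assumes genotypes coded as 0/1/2 allele counts with None for missing calls, and the subset size of 150 is a hypothetical example.

```python
def pair_count(n):
    """Distinct pairs of individuals that --genome must compare."""
    return n * (n - 1) // 2


def ibs_similarity(g1, g2):
    """Average identity-by-state over SNPs typed in both people.

    Genotypes are 0/1/2 allele counts, None for a missing call. At a biallelic
    SNP the number of alleles shared IBS is 2 - |a - b|; it is scaled to 0..1.
    """
    shared = 0.0
    used = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += (2 - abs(a - b)) / 2.0
        used += 1
    return shared / used if used else float("nan")


def ibs_distance_matrix(genotypes):
    """Symmetric matrix of 1 - IBS similarity, the kind of input MDS works on."""
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = 1.0 - ibs_similarity(genotypes[i], genotypes[j])
    return dist


print(pair_count(1500))  # 1124250 pairs for the full PHYLO set
print(pair_count(150))   # 11175 pairs for a hypothetical pruned subset
```

Cutting the sample by a factor of ten cuts the pairwise work by roughly a factor of a hundred.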

This time --genome will run very fast, because there are far fewer individuals. Here is my plot of the result (my friend is “RF,” my daughter is “RD”):

Note that RF is aligned straight toward the “Dai” population, an ethnic group from South China, but not Han (they are related to the Thai). It seems plausible that my friend is of mixed Chinese and Filipino background. My daughter’s minimal East Asian ancestry is indeed Southeast Asian, and this is clear from this plot, as she is shifted further toward the Cambodians (this may be due to South Asian affinities as well).

The point is not to rely on one plot, but to generate many so as to explore the possibilities and develop an intuition.
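plink documents --mds-plot as classical (metric) multidimensional scaling on the pairwise IBS distances. To show what the plot’s axes represent, here is a from-scratch sketch of classical MDS in plain Python. It double-centers the squared distances and then extracts the top eigenvectors by power iteration. It is not plink’s code, it is only practical for small matrices, and the four-point example is made up.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS of a symmetric distance matrix.

    Returns one list of `dims` coordinates per individual. Eigenvectors are
    found by power iteration with deflation, which is fine for toy inputs only.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, a matrix of inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue belonging to the converged vector.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = v[i] * scale
        # Deflate so the next pass finds the next-largest component.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Made-up points at the corners of a 2 x 1 rectangle.
points = [(0, 0), (2, 0), (0, 1), (2, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]
for row in classical_mds(dist):
    print([round(x, 3) for x in row])
```

The rectangle comes back up to rotation and reflection. The first axis, which has the largest spread, holds the ±1 coordinates and the second holds ±0.5, with arbitrary signs. The sign and orientation of MDS axes are not meaningful, and only relative positions are.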

December 11, 2012

$99 for 1 million markers

Filed under: 23andMe,Personal genomics — Razib Khan @ 8:18 am

Looks like 23andMe has a new $99 price point. If so, that’s 100 markers per cent! (here’s the press release)
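Taking the post’s round numbers at face value ($99 and 1,000,000 markers), the markers-per-cent figure checks out:

```python
markers = 1_000_000
price_cents = 99 * 100  # $99 expressed in cents
print(markers / price_cents)  # a little over 101 markers per cent
```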

1) Privacy: Yes, this is a privacy risk. 23andMe is fundamentally an IT company, and IT companies mess up. But I am confident that within 10-15 years genetic information is going to be pretty easy to get anyhow. Your data will be in too many places for any expectation of privacy.

2) Cost/worth it: That is dependent on your income. If you are willing to spend $100 on a nice meal, I think $100 for 1 million markers is an excellent proposition. The markers never depreciate, though in the near future you will get sequence data which will supersede them.

August 9, 2012

23andMe discount code (again)

Filed under: 23andMe,Personal genomics — Razib Khan @ 10:57 pm

At this point if you have spare cash why not shell out $300 for a raw copy of your genotype? (yes, I know 23andMe provides other services) I’m sure many readers spend $100 on nice meals now and then. That’s one day. Your genotype won’t ‘depreciate’ in a literal sense, and more practically until whole-genome sequencing gets affordable within the next decade (i.e., < 10 years) 1 million SNPs is a pretty good deal. And not to be morbid, but it is probably best to get older family members typed now (though if they have had hospital stays you can probably later retrieve genetic material, it will be a bureaucratic pain).

The reason I’m posting this now though is that I received a notification about a $50 discount code from 23andMe. Here it is: YHPRD7. It’s valid for the next few days. $50 isn’t trivial for most people, so perhaps it will prompt a few here to go and purchase.

April 17, 2012

Ancestry painting: true but trivial, or interesting but inaccurate

Filed under: 23andMe — Razib Khan @ 8:22 pm

23andMe has done some great things, and I highly recommend its service to friends. But I’m really glad that CeCe Moore is being consulted by them in regards to improving their ancestry feature set. Below are the “ancestry paintings” for myself & my daughter.

According to 23andMe I’m 40% Asian, and she is 8% Asian. Obviously something is off here. The situation easily resolved itself when I tuned my parameters and increased my sampled populations in Interpretome. But it just goes to show you the limits of this sort of thing without fine-grained control of the details of the analysis.

January 1, 2012

23andMe controversies in the genetic genealogy community

Filed under: 23andMe,Personal genomics — Razib Khan @ 3:11 pm

A few readers have pointed me to controversies having to do with 23andMe’s “terms of use”. You can read about it over at Your Genetic Genealogist, who has two posts up on the issues. I think the crux is that the early enthusiasts for personal genomics in the genetic genealogy community cannot support the revenue needs of a firm like 23andMe. The question for the firm is how to expand its reach more fully into the domain of personalized healthcare, where the big money and mainstream impact are, without alienating these early adopters, who are not bashful about spreading bad buzz all over the blogosphere.

From what I can tell there’s a lot of confusion as to what’s going on. Myself, I don’t care about the details too much. My main interest is getting the raw data, I don’t pay that much attention to the various health & genealogy services that 23andMe provides. But I can understand why others feel differently. I also know that 23andMe is not irrational, and is trying to run a firm which can generate a profit. They’re not a charity.

The key is how they can make the “person on the street” more interested. I have purchased eight accounts in their system, most of them with the monthly personal genome service fees. It’s pretty clear that most of the people I’ve purchased these accounts for don’t pay close attention to the results. Yes, they were curious, but they haven’t kept up with the health report updates, or explored the other services. Obviously I’m going to cancel the subscriptions for that reason, as I’m not interested in paying for a service that’s not being utilized.

I wish 23andMe, and all the new personal genomics firms, the best of luck. This is a time of great change, and I think in 2020 this sort of service is going to be a seamless part of our lives. But working out the details isn’t always going to be without error (my own suggestion would be a reversion to more fine-grained service with the subscriptions). Life comes at you fast….

August 2, 2011

23andMe $50 off coupon code

Filed under: 23andMe,Personal genomics — Razib Khan @ 8:15 am

From 23andMe: “To show our appreciation and to encourage others to join in this research revolution we are giving you a $50 coupon that you can share with as many people as you like. This coupon expires in 7 days (August 9, 2011) so make sure you get the word out fast.” At current prices that works out to 24% off the yearly price ($9/month × 12 months + $99).
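The 24% figure follows from the prices quoted in the post, treating the first year as the $99 kit plus twelve monthly fees:

```python
monthly_fee = 9
months = 12
kit_price = 99
coupon = 50
first_year = monthly_fee * months + kit_price  # $207 for the first year
print(first_year, round(100 * coupon / first_year, 1))  # 207 24.2
```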

(this is for “new customers only”)

June 15, 2011

Two public genotypes

Filed under: 23andMe,Personal genomics,Public Genotypes — Razib Khan @ 11:50 am

First, Sam Snyder. Here’s the link to the file in Dropbox.

Second, Heather Frawley. I’ve uploaded her text file, as well as a pedigree-format version, to RapidShare as a zip file. Click “Free Download” at the bottom right of the page. It’ll take about 5 minutes to pull down the 10 MB file.

Remember, if you want to have your public genotype posting publicized or want me to upload and format it, email me at contactgnxp -at- gmail -dot- com.

June 13, 2011

The ethnic breakdown of 23andMe customers

Filed under: 23andMe,Personal genomics — Razib Khan @ 7:45 pm

According to Your Genetic Genealogist, it is:

1,000 African American
3,500 Latino/Hispanic
5,500 East Asian
3,400 South Asian
4,900 Southern European
6,200 Ashkenazi Jewish
56,000 Northern European
1,000 First generation from two continents
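Converting the counts to shares makes the imbalance easier to see. The total here is simply the sum of the listed categories, which may not be 23andMe’s entire customer base at the time.

```python
customers = {
    "African American": 1_000,
    "Latino/Hispanic": 3_500,
    "East Asian": 5_500,
    "South Asian": 3_400,
    "Southern European": 4_900,
    "Ashkenazi Jewish": 6_200,
    "Northern European": 56_000,
    "First generation from two continents": 1_000,
}
total = sum(customers.values())  # 81,500 across the listed categories
for group, count in sorted(customers.items(), key=lambda kv: -kv[1]):
    print(f"{group:38s} {count:7,d} {100 * count / total:5.1f}%")
```

On these figures Northern Europeans make up about 69% of the listed customers, and African Americans a little over 1%.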

I’m kind of surprised that there are so few African Americans, since the marginal return on ancestry matching technologies for the black American community is going to be higher than for other groups. If these numbers are true, then I have on the order of 10% of the 23andMe genotypes for black Americans in the African Ancestry Project. Zack Ajmal, referring to the over 3,000 South Asians, quips: “Now if 10-20% of them would participate in Harappa Ancestry Project!” My main concern is that if HAP gets better known, Zack will have hundreds of Tamil Brahmins sending him pretty much duplicate genotypes.

April 20, 2011

23andMe, Stanford, personal genomics study

Filed under: 23andMe,Genetics,Genomics,Open genomics,Razib Khan — Razib Khan @ 11:43 pm

Call to Participate in a New Study on Social Networking and Personal Genomics:

Do you share your information with others? How has your personal genetic information influenced your lifestyle and the way you approach your health and medical decisions? Can genetic information create new communities and connections?

The Social Networking and Personal Genomics Study at the Center for Biomedical Ethics invites participants between the ages of 18 and 75 to spend approximately 2 hours with us in a focus group setting. Participants must have purchased direct-to-consumer personal genetic information from 23andMe, Inc., shared their information with others, and be willing to discuss their perspectives and experiences. Focus group members will receive a $50 gift card for their participation and childcare will be available on an as-needed basis at no cost. For additional information or to enroll, please contact Simone Vernez, Project Manager, by email at svernez@stanford.edu or by telephone at (650) 723-9364. For more information on the study itself, including specific research aims and funding, please visit http://bioethics.stanford.edu/research/SocialNetworkingandPersonalGenomics.html. For general information about participant rights, contact 1-866-680-2906.


I released my 1,000,000 SNPs into the public domain yesterday. Why? To borrow a line from William Jefferson Clinton: because I could. And ...

April 15, 2011

Checking for Alzheimer’s risk with 23andMe

Filed under: 23andMe,Alzheimer's Disease,Health,Human Health — Razib Khan @ 10:21 am

Dr. Daniel MacArthur at Genomes Unzipped:

23andMe announced yesterday that it will now be releasing information on Alzheimer’s disease risk markers in the APOE gene to customers who purchased their recently upgraded v3 test. The APOE markers are famously associated with a major increase in risk for late-onset Alzheimer’s, with individuals carrying two copies of the ε4 version of the gene being around 15 times more likely than average to develop the disease. Customers who have been tested on the v3 platform will be able to access their APOE status after “unlocking” it; customers on earlier versions of the test will need to upgrade to get access. You can see screenshots of the unlocking and results pages here.


I don’t put much weight on 23andMe’s disease risk estimates since I have a relatively large pedigree, and my four grandparents all made it at least to age 75 (one made it to 100, and two to 80+), so I have some sense of my odds of late onset diseases. But, I will admit I was still a little anxious when “unlocking” my results for this locus. This is a classic “tail risk” event which hooks into all the ...
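For readers who want to look at the locus in their own raw data: the ε2, ε3, and ε4 alleles of APOE are defined by two SNPs, rs429358 and rs7412. ε2 carries T at both, ε3 carries T at rs429358 and C at rs7412, and ε4 carries C at both. The sketch below assumes both genotypes are reported on the forward strand as pairs of T/C letters and that the rare ε1 allele is absent. Because the genotypes are unphased, a C/T call at both SNPs is reported as ε2/ε4, although ε1/ε3 is also possible. This is an illustration, not 23andMe’s method.

```python
def apoe_genotype(rs429358, rs7412):
    """Most likely APOE genotype from two unphased SNP calls such as "CT".

    Assumes forward-strand T/C calls and no ε1 allele: each C at rs429358
    marks an ε4 copy, each T at rs7412 marks an ε2 copy, the rest are ε3.
    """
    a, b = rs429358.upper(), rs7412.upper()
    if len(a) != 2 or len(b) != 2 or set(a + b) - {"C", "T"}:
        return "no call or unexpected alleles"
    n4 = a.count("C")
    n2 = b.count("T")
    n3 = 2 - n4 - n2
    if n3 < 0:
        return "implies the rare ε1 allele; check the raw calls"
    return "/".join(["ε2"] * n2 + ["ε3"] * n3 + ["ε4"] * n4)


print(apoe_genotype("TT", "CC"))  # ε3/ε3
print(apoe_genotype("CT", "CC"))  # ε3/ε4
print(apoe_genotype("CC", "CC"))  # ε4/ε4
print(apoe_genotype("TT", "CT"))  # ε2/ε3
```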

April 10, 2011

23andMe Sale Tomorrow (April 11th, 2011)

Filed under: 23andMe,DNA Day,Genetics — Razib Khan @ 9:02 am

23andMe Sale tomorrow:

For a limited time, you can order a 23andMe kit for $0 up front, plus a 12-month commitment to our Personal Genome Service® at $9/month. This is down from the regular price of $199 plus $9/month.

This promotional price will be available from 12:00AM PST until 11:59PM PST on Monday 4/11/11, or while supplies last!

Update: Sale is a go right now. 5 kits per person.

March 30, 2011

Personal genomics gets very personal

Filed under: 23andMe,Personal genomics — Razib Khan @ 10:58 pm

Dan MacArthur points me to this nice post over at Daily Kos, Our Genome Decoded: How Companies Like 23andMe Are Advancing the Field of Personal Genomics:

…However, in the past few years several private biotech companies have started offering a “personal genome service” that involves sequencing the most variable portions of our DNA. The goals are straightforward – to give individuals information about their ancestry and inherited traits. While there are definite limitations – both technically and bioethically – to the amount and type of information that can be obtained from personal genome sequencing, in my case the service answered a lingering question about something important to me, and thus was well worth it.

In this article, I’m going to tell the story about why I chose to purchase a personal genome service, briefly explain how it works, show my interesting results, and finally, provide some commentary on how these services will impact the fields of genomics and medicine.

One step at a time. I also appreciate that Michelle keeps posting on her ADMIXTURE results.

March 9, 2011

Your genes, your rights – FDA’s Jeffrey Shuren misleading testimony under oath

Filed under: 23andMe,FDA,Genetics,Genomics,Jeffrey Shuren,Select Post — Razib Khan @ 12:05 pm

Over the past few days I’ve been very disturbed…and angry. The reason is that I’ve been reading Misha Angrist and Dr. Daniel MacArthur. First, watch this video:

In the very near future you may be forced to go through a “professional” to get access to your genetic information. Professionals who will be well paid to “interpret” a complex morass of statistical data which they barely comprehend. Let’s be real here: someone who regularly reads this blog (or Dr. Daniel MacArthur or Misha’s blog) knows much more about genomics than 99% of medical doctors. And yet someone reading this blog does not have the guild certification in the eyes of the government to “appropriately” understand their own genetic information. Someone reading this blog will have to pay, either out of pocket, or through insurance, someone else for access to their own information. Let me repeat: the government and professional guilds which exist to defend the financial interests of their members are proposing that they arbitrate what you can know about your genome. A friend with a background in genomics emailed me today: “If they succeed in ramming this through, then you will not be able to access your ...

February 8, 2011

Dodecad open for submissions

Since I know plenty of friends are getting, or just got, their V3 results, I thought I’d pass this on, Open-ended submission opportunity for 23andMe data (#2):

Who is eligible

Everyone who is of European, Asian, or North African ancestry and all four of his/her grandparents are from the same European, Asian, or North African ethnic group or the same European, Asian, or North African country.

Also, Zack has more than 30 individuals in HAP. The “cow belt” is still way underrepresented. The only Bengalis in the data set are my parents.

February 3, 2011

Why siblings differ differently

The Pith: In this post I examine how looking at genomic data can clarify exactly how closely related siblings really are, instead of just assuming that they’re about 50% similar. I contrast this randomness among siblings to the hard & fast deterministic nature of parent-child inheritance. Additionally, I detail how the idealized, spare concepts of genetics from 100 years ago are modified by what we now know about how genes are physically organized and reorganized. Finally, I explain how this clarification allows us to potentially understand with greater precision the nature of inheritance of complex traits which vary within families, and across the whole population.

Humans are diploid organisms. We have two copies of each gene, inherited from each parent (the exception here is for males, who have only one X chromosome inherited from the mother, and lack many compensatory genes on the Y chromosome inherited from the father). Our own parents have two copies of each gene, one inherited from each of their parents. Therefore, one can model a grandchild from two pairs of grandparents as a mosaic of the genes of the four ancestral grandparents. But, the relationship between ...
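A toy simulation shows why siblings differ differently. The model is deliberately crude, and its assumptions are mine rather than the post’s. It uses 22 autosomes of 160 cM each, roughly matching the total length of the human autosomal genetic map, and places crossovers independently with a 1% chance per centimorgan. Sharing is measured as the fraction of positions where both siblings inherited the same grandparental copy from a given parent.

```python
import random


def transmitted_origin(length_cm, rng):
    """Grandparental origin (0 or 1) of the copy a child receives at each
    1-cM step along one chromosome from one parent."""
    origin = rng.randint(0, 1)
    path = []
    for _ in range(length_cm):
        if rng.random() < 0.01:  # about one crossover per 100 cM
            origin = 1 - origin
        path.append(origin)
    return path


def sibling_sharing(rng, n_chrom=22, length_cm=160):
    """Fraction of parental copies two full siblings share (expected 0.5)."""
    shared = 0
    total = 0
    for _ in range(n_chrom):
        for _parent in range(2):  # father's side, then mother's side
            a = transmitted_origin(length_cm, rng)
            b = transmitted_origin(length_cm, rng)
            shared += sum(1 for x, y in zip(a, b) if x == y)
            total += length_cm
    return shared / total


rng = random.Random(2011)
values = [sibling_sharing(rng) for _ in range(200)]
mean = sum(values) / len(values)
print(f"mean {mean:.3f}, min {min(values):.3f}, max {max(values):.3f}")
```

The average lands near 50%, but individual sibling pairs scatter around it, because crossovers break chromosomes into only a few large blocks that are either shared or not. A parent and child, by contrast, always share exactly one copy at every autosomal position.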

February 1, 2011

My family’s Neandertal genes, ii

Last week I reported that one of my siblings carries a possible Neandertal haplotype on the dystrophin gene. To review, it seems likely that ~3% of the average non-African’s genome is derived from Neandertal populations. But by and large this ancestral quantum seems broadly dispersed through the genome of individuals, so that there isn’t a particular set of loci which are Neandertal, as such. As an analogy, about 20-25% of the genome of an average black American is derived from Europe because of white American ancestry. But you can’t usually predict from that at which loci the “white” alleles will be found. The main exception will be loci where you might suspect selection is operative, such as those implicated in malaria defense (some of which have negative consequences).

The dystrophin haplotype, though, has higher frequencies in some populations than expected: ~9% in non-Africans as a whole, and higher in some groups. So there was a reasonable expectation that people snooping through their genomes might find that they carried it. Now that my parents (RF and RM) have come through, as well as sibling #2 (RS2), I can show you this:

SNPs   rs1456740   rs6628685   rs331370    rs2854965   rs6653863   rs331369    rs331368    rs331367    rs331366
RF     A           A           T           G           A           A           T           T           T
RS1    A/G         A/A         C/T         G/G         A/G         A/C         T/T         T/T         G/T
Razib  G           A           C           G           (not typed) C           T           T           G
RM     A/G         A/A         C/T         G/G         A/G         A/C         T/T         T/T         G/T
RS2    A           A           (no ...
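With family data like this, checking whether someone could carry a particular haplotype is mostly bookkeeping. The sketch below assumes calls are written as in the table: a single letter for a male’s X, “A/G” for an unphased female genotype, and None for an untyped SNP. The Neandertal-associated reference haplotype itself is not given here, so the usage lines compare family members with each other. For a real test, substitute the published marker alleles.

```python
def alleles(call):
    """'A/G' -> {'A', 'G'}; a hemizygous call such as 'A' -> {'A'}."""
    return set(call.split("/"))


def carries_haplotype(calls, haplotype):
    """True if the haplotype's allele appears in every typed genotype call.
    For an unphased female this is necessary, not sufficient, for carrying it."""
    for call, allele in zip(calls, haplotype):
        if call is None or allele is None:
            continue  # untyped SNP: no evidence either way
        if allele not in alleles(call):
            return False
    return True


def other_haplotype(calls, known):
    """Given unphased diploid calls and one known haplotype (for example the
    one passed to a son), return the complementary haplotype."""
    result = []
    for call, allele in zip(calls, known):
        if call is None or allele is None:
            result.append(None)
            continue
        pair = call.split("/")
        if len(pair) != 2 or allele not in pair:
            raise ValueError(f"{allele} is not part of genotype {call}")
        result.append(pair[1] if pair[0] == allele else pair[0])
    return result


# Rows from the table above (Razib is male, so his X calls are single alleles).
rm = ["A/G", "A/A", "C/T", "G/G", "A/G", "A/C", "T/T", "T/T", "G/T"]
razib = ["G", "A", "C", "G", None, "C", "T", "T", "G"]
print(carries_haplotype(rm, razib))  # True: his X came from his mother
print(other_haplotype(rm, razib))    # ['A', 'A', 'T', 'G', None, 'A', 'T', 'T', 'T']
```

A son’s single X effectively phases his mother. Subtracting his haplotype from her genotypes reveals her other X, which here matches RF’s haplotype at every SNP where both are known.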

January 29, 2011

“Asian” in all the right places


mtDNA haplogroup G1a2

The pith: In this post I examine the most recent results from 23andMe for my family in the context of familial and regional (Bengal) history. I also use these results to offer up a framework for the ethnogenesis of the eastern Bengali people within the last 1,000 years, and their relationship to other South Asian and Southeast Asian populations.

Since I received my 23andMe results last May I’ve been blogging about it a fair amount. In a recent post I inferred that perhaps I had a recent ancestor who was an ethnic Burman or some related group. My reasoning was that this explained a pattern of elevated matches on chromosomal segments with populations from southwest China in the HGDP data set. But now we have more than my genome to go on. This week I got the first V3 chip results from a sibling. And finally, yesterday the results from my parents came in. One thing that I immediately found interesting was my father’s mtDNA haplogroup assignment, G1a2. This came from his maternal grandmother, and as you can see it has a distribution which ...

January 27, 2011

Neandertal (haplotype) in the family!

There is pretty much a 100% probability that I carry genes of Neandertal origin, since I’m Eurasian. That being said, I hadn’t looked too closely into the matter in regards to my own genome, because the whole “which SNPs are Neandertal” issue has been pretty dicey. But after the “Neandertal dystrophin” paper, sniffing for whether you carry a specific Neandertal haplotype got a whole lot easier. The authors provided the markers and their associated haplotypes within the paper. So if the B006 haplotype is Neandertal, by looking at your markers in 23andMe through the browse raw data feature you can figure out what your lineage is, and see if you are indeed “Neandertal” at that locus. Since it’s on the X chromosome, males will carry only one copy of the gene. On the other hand, if you’re a woman you’ll have two copies, so ascertaining what specific combination of markers you have spanning a particular genomic segment can be more difficult (the results are not “phased,” so you don’t know if the allele is from the mother or father at any given genotype). But inferring the sequence of markers on a strand of DNA is much easier if ...
