Razib Khan One-stop-shopping for all of my content

December 18, 2012

Unveiling the genealogical lattice

To understand nature in all its complexity we have to cut the riotous variety down to size. For ease of comprehension we formalize with math, verbalize with analogies, and visualize with representations. These approximations of reality are not reality, but when we look through the glass darkly they give us filaments of essential insight. Dalton’s model of the atom is false in important details (e.g., atoms turn out to be divisible, and protons and neutrons are in turn composed of quarks), but it still has conceptual utility.

Likewise, the phylogenetic trees popularized by L. L. Cavalli-Sforza in The History and Geography of Human Genes are still useful in understanding the shape of the human demographic past. But it seems that the bifurcating model of the tree must now be strongly tinted by the shades of reticulation. In a stylized sense inter-specific phylogenies, which assume the approximate truth of the biological species concept (i.e., little gene flow across lineages), mislead us when we think of the phylogeny of species on the microevolutionary scale of population genetics. On an intra-specific scale gene flow is not just a nuisance parameter in the model, it is an essential phenomenon which must be accommodated into the framework.


This is on my mind because of the emergence of packages such as TreeMix and AdmixTools. Using software such as these on the numerous public data sets allows one to perceive the reality of admixture, and overlay lateral gene flow upon the tree as a natural expectation. But perhaps a deeper result is that the character of the tree itself is torn asunder. The figure above is from a new paper, Efficient moment-based inference of admixture parameters and sources of gene flow, which debuts MixMapper. The authors bring a lot of mathematical heft to their exposition, and I can’t say I follow all of it (though some of the details are very similar to Pickrell et al.’s). But in short it seems that in comparison to TreeMix, MixMapper allows for more powerful inference on a narrower set of populations, selected for exploring very specific questions. In contrast, TreeMix explores the whole landscape with minimal supervision. Having used the latter, I can testify that this is true.

The big result from MixMapper is that it extends the result of Patterson et al., and confirms that modern Europeans seem to be an admixture between a “north Eurasian” population, and a vague “west Eurasian” population. Importantly, they find evidence of admixture in Sardinians, which implies that Patterson et al.’s original methods were not sensitive to admixture in putative reference populations (note that Patterson is a coauthor on this paper as well). The rub, as noted in the paper, is that it is difficult to estimate admixture when you don’t have “pure” ancestral reference populations. And yet here the takeaway for me is that we may need to rethink our whole conception of pure ancestral populations, and imagine a human phylogenetic tree as a series of lattices in eternal flux, with admixed nodes periodically expanding so as to generate the artifice of a diversifying tree. The closer we look, the more likely it seems that most of the populations which have undergone demographic expansion in the past 10,000 years are also the products of admixture. Any story of the past 10,000 years, and likely the past 100,000 years, must give space at the center of the narrative arc to lateral gene flow across populations.
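To make the idea of moment-based admixture tests a bit more concrete, here is a minimal sketch of the f3 statistic from the AdmixTools family, f3(C; A, B), taken as the average of (c - a)(c - b) over SNPs. This is not MixMapper's own procedure, and it omits the finite-sample bias correction and block-jackknife standard errors that real analyses need. The allele frequencies are simulated, and every name in the block is hypothetical.

```python
import numpy as np

def f3(c, a, b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    Inputs are arrays of population allele frequencies at the same SNPs.
    A clearly negative value is the classic signal that C is a mixture
    of populations related to A and B.
    """
    c, a, b = (np.asarray(x, dtype=float) for x in (c, a, b))
    return float(np.mean((c - a) * (c - b)))

# Simulated (hypothetical) allele frequencies: A and B drift apart
# independently from a shared root population.
rng = np.random.default_rng(0)
root = rng.uniform(0.1, 0.9, size=5000)
a = np.clip(root + rng.normal(0.0, 0.1, root.size), 0.0, 1.0)
b = np.clip(root + rng.normal(0.0, 0.1, root.size), 0.0, 1.0)

# C is built as a 30/70 mixture of A and B, with no drift after mixing.
c = 0.3 * a + 0.7 * b

f3_admixed = f3(c, a, b)    # target really is admixed: negative
f3_unadmixed = f3(a, b, c)  # A is not a mixture of B and C: positive
print(f3_admixed, f3_unadmixed)
```

In this idealized setup the admixed value is exactly -0.3 × 0.7 times the mean squared frequency difference between A and B, which is why it has to come out negative. Drift after the admixture event pushes the statistic back toward positive values, so a non-negative f3 does not rule admixture out.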

Cite: arXiv:1212.2555 [q-bio.PE]

December 10, 2012

Is Daniel MacArthur ‘desi’?

My initial inclination in this post was to discuss a recent ordering snafu which resulted in many of my friends being quite peeved at 23andMe. But browsing through their new ‘ancestry composition’ feature I thought I had to discuss it first, because of some nerd-level intrigue. Though I agree with many of Dienekes’ concerns about this new feature, I have to admit that at least this method doesn’t give out positively misleading results. For example, I had complained earlier that ‘ancestry painting’ gave literally crazy results when they weren’t trivial. It said I was ~60 percent European, which makes some coherent sense in their non-optimal reference population set, but then stated that my daughter was >90 percent European. Since 23andMe did confirm she was 50% identical by descent with me these results didn’t make sense; some readers suggested that there was a strong bias in their algorithms to assign ambiguous genomic segments to ‘European’ heritage (this was a problem for East Africans too).

Here’s my daughter’s new chromosome painting:

One aspect of 23andMe’s new ancestry composition feature is that it is very Eurocentric. But, most of the customers are white, and presumably the reference populations they used (which are from customers) are also white. Though there are plenty of public domain non-white data sets they could have used, I assume they’d prefer to eat their own data dog-food in this case. But that’s really a minor gripe in the grand scheme of things. This is a huge upgrade from what came before. Now, it’s not telling me, as a South Asian, very much. But, it’s not telling me ludicrous things anymore either!

But in regards to omission I am curious to know why this new feature rates my family as only ~3% East Asian, when other analyses put us in the 10-15% range. The problem with very high values is that South Asians often have some residual ‘eastern’ signal, which I suspect is not real admixture, but is an artifact. Nevertheless, northeast Indians, including Bengalis, often have genuine East Asia admixture. On PCA plots my family is shifted considerably toward East Asians. The signal they are picking up probably isn’t noise. Almost every apportionment of East Asian ancestry I’ve seen for my family yields a greater value for my mother, and that holds here. It’s just that the values are implausibly low.

In any case, that’s not the strangest thing I saw. I was clicking around people who I had “shared” genomes with, and I stumbled upon this:

As you can guess from the screenshot this is Daniel MacArthur’s profile. And according to this ~25% of chromosome 10 is South Asian! On first blush this seemed totally nonsensical to me, so I clicked around other profiles of people of similar Northern European background…and I didn’t see anything equivalent.

What to do? It’s going to take more evidence than this to shake my prior assumptions, so I downloaded Dr. MacArthur’s genotype. Then I merged it with three HapMap populations, the Utah whites (CEU), the Gujaratis (GIH), and the Chinese from Denver (CHD). The last was basically a control. I pulled out chromosome 10. I also added Dan’s wife Ilana to the data set, since I believe she got typed with the same Illumina chip, and is of similar ethnic background (i.e., very white). It is important to note that only 28,000 SNPs remained in the data set. But usually 10,000 is more than sufficient on SNP data for model-based clustering with inter-continental scale variation.

I did two things:

1) I ran ADMIXTURE at K = 3, unsupervised

2) I ran an MDS, which visualized the genetic variation in multiple dimensions
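For readers who want a feel for the second step, here is a rough sketch of classical (metric) MDS on a genotype matrix. It is not the pipeline used for this post: the genotypes are simulated, the two populations are hypothetical, and all names are my own. It only illustrates how pairwise genetic distances get turned into plotting coordinates.

```python
import numpy as np

def classical_mds(genotypes, n_dims=2):
    """Classical MDS on an (individuals x SNPs) matrix coded 0/1/2."""
    g = np.asarray(genotypes, dtype=float)
    # Squared Euclidean distance between every pair of individuals.
    sq = np.sum(g ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (g @ g.T)
    # Double-center the squared distances to get a Gram matrix.
    n = g.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # Top eigenvectors, scaled by sqrt(eigenvalue), are the coordinates.
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:n_dims]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Simulated data: two hypothetical populations with diverged frequencies.
rng = np.random.default_rng(1)
n_snps, n_per_pop = 2000, 10
freq_1 = rng.uniform(0.1, 0.9, n_snps)
freq_2 = np.clip(freq_1 + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
pop_1 = rng.binomial(2, freq_1, size=(n_per_pop, n_snps))
pop_2 = rng.binomial(2, freq_2, size=(n_per_pop, n_snps))

coords = classical_mds(np.vstack([pop_1, pop_2]))
print(coords[:, 0])  # the first dimension should split the two groups
```

With a single chromosome from one individual of interest, you would add that individual as an extra row and see where the point falls relative to the reference clusters.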

Before I go on, I will state what I found: these methods supported the inference from 23andMe. On chromosome 10 Dr. MacArthur seems to have an affinity with South Asians (i.e., this is his ‘curry chromosome’). Here are the average (median) values in tabular format, with MacArthur and his wife presented for comparison.

ADMIXTURE results for chromosome 10

Population          K1      K2      K3
CEU                 0.04    0.02    0.93
GIH                 0.87    0.05    0.08
CHD                 0.01    0.97    0.01
Daniel MacArthur    0.29    0.07    0.64
Ilana Fisher        0.01    0.06    0.94

You probably want a distribution. Out of the non-founder CEU sample none went above 20% South Asian. Though it did surprise me that a few were that high, making it more plausible to me that MacArthur’s results on chromosome 10 were a fluke:

And here’s the MDS with the two largest dimensions:

Again, it’s evident that this chromosome 10 is shifted toward South Asians. If I had more time right now what I’d do is probably get that specific chromosomal segment, phase it, and then compare it to various South Asian populations. But I don’t have time now, so I went and checked out the results from the Interpretome. I cranked up the settings to reduce the noise, and so that it would only spit out the most robust and significant results. As you can see, again chromosome 10 comes up as the one which isn’t quite like the others.

Is there a plausible explanation for this? Perhaps Dr. MacArthur can call up a helpful relative? From what I recall his parents are immigrants from the United Kingdom, and it isn’t unheard of for white Britons to have South Asian ancestry which dates back to the 19th century. Though to be totally honest I’m rather agnostic about all this right now. This genotype has been “out” for years now, so how is it that no one has noticed this peculiarity? Perhaps the issue is that everyone was looking at the genome-wide average, and it just doesn’t rise to the level of notice? What I really want to do is look at the distribution of all chromosomes and see how Daniel MacArthur’s chromosome 10 then stacks up. It might be a random act of nature yet.

Also, I guess I should add that at ~1.5% South Asian that would be consistent with one of MacArthur’s great-great-great-great grandparents being Indian. Assuming 25 year generation times that puts them in the mid-19th century. Of course, at such a low proportion the variance is going to be high, so it is quite possible that you need to push the real date of admixture one generation back, or one generation forward.
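The arithmetic behind that estimate is simple enough to sketch. The expected genome-wide contribution of a single ancestor g generations back is (1/2)^g. The 25-year generation time comes from the paragraph above, while the descendant's birth year in the snippet is a made-up placeholder and not anyone's actual birth year.

```python
GENERATION_YEARS = 25  # generation time assumed in the text

def expected_fraction(generations_back):
    """Expected share of the genome from one ancestor g generations back."""
    return 0.5 ** generations_back

def approx_ancestor_birth_year(descendant_birth_year, generations_back,
                               generation_years=GENERATION_YEARS):
    """Rough birth year of an ancestor, given a fixed generation time."""
    return descendant_birth_year - generations_back * generation_years

# Parents are 1 generation back, grandparents 2, and so on, which makes
# great-great-great-great grandparents 6 generations back.
HYPOTHETICAL_BIRTH_YEAR = 1980  # placeholder for illustration only
for g in range(4, 9):
    share = expected_fraction(g)
    year = approx_ancestor_birth_year(HYPOTHETICAL_BIRTH_YEAR, g)
    print(f"{g} generations back: {share:.2%} expected, born around {year}")
```

Six generations back gives 1/64, or about 1.56%. Because recombination passes ancestry down in chunks, the realized share from any one ancestor that far back scatters widely around that expectation, which is why shifting the estimate by a generation in either direction is reasonable.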

December 1, 2012

Northern Europeans and Native Americans are not more closely related than previously thought

A new press release is circulating on the paper which I blogged a few months ago, Ancient Admixture in Human History. Unlike the paper, the title of the press release is misleading, and unfortunately I notice that people are circulating it, and probably misunderstanding what is going on. Here’s the title and first paragraph:

Native Americans and Northern Europeans More Closely Related Than Previously Thought

Released: 11/30/2012 2:00 PM EST
Source: Genetics Society of America

Newswise — BETHESDA, MD – November 30, 2012 — Using genetic analyses, scientists have discovered that Northern European populations—including British, Scandinavians, French, and some Eastern Europeans—descend from a mixture of two very different ancestral populations, and one of these populations is related to Native Americans. This discovery helps fill gaps in scientific understanding of both Native American and Northern European ancestry, while providing an explanation for some genetic similarities among what would otherwise seem to be very divergent groups. This research was published in the November 2012 issue of the Genetics Society of America’s journal GENETICS

 

The reality is that Native Americans and Northern Europeans are not more “closely related” genetically than they were before this paper. There has been no great change to standard genetic distance measures or phylogeographic understanding of human genetic variation. A measure of relatedness is to a great extent a summary of historical and genealogical processes, and as such it collapses a great deal of disparate elements together into one description. What the paper in Genetics outlined was the excavation of specific historically contingent processes which result in the summaries of relatedness which we are presented with, whether they be principal component analysis, Fst, or model-based clustering.

What I’m getting at can be easily illustrated by a concrete example. To the left is a 23andMe chromosome 1 “ancestry painting” of two individuals. On the left is me, and on the right is a friend. The orange represents “Asian” ancestry, and the blue represents “European” ancestry. We are each ~50% of both ancestral components. This is a correct summary of our ancestry, as far as it goes. But you need some more information. My friend has a Chinese father and a European mother. In contrast, I am South Asian, and the end product of an ancient admixture event. You can’t tell that from a simple recitation of ancestral quanta. But it is clear when you look at the distribution of ancestry on the chromosomes. My components have been mixed and matched by recombination, because there have been many generations between the original admixture and myself. In contrast, my friend has not had any recombination events between his ancestral components, because he is the first generation of that combination.
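The contrast between the two paintings can be reduced to a toy calculation. In the sketch below each chromosome copy is a string of per-window ancestry labels ("A" for Asian, "E" for European). The strings are invented for illustration and are not real 23andMe output. Both individuals come out at 50% for each component, but only one of them shows switches between ancestries along a chromosome.

```python
def ancestry_summary(copy_1, copy_2):
    """Return (fraction labeled 'A', number of ancestry switches).

    Each argument is one chromosome copy, written as a string of
    per-window ancestry labels read from one end to the other.
    """
    windows = list(copy_1) + list(copy_2)
    frac_asian = windows.count("A") / len(windows)
    # A switch is any place where adjacent windows on the same copy differ.
    switches = sum(
        left != right
        for copy in (copy_1, copy_2)
        for left, right in zip(copy, copy[1:])
    )
    return frac_asian, switches

# First-generation mix: one copy wholly from each parent, no switches.
first_generation = ("A" * 20, "E" * 20)

# Old admixture: recombination has shuffled segments on both copies.
ancient_admixture = ("AAEEAEEEAAAEAEEAAEAE", "EAAEEAEAAEEEAAEAEEAA")

print(ancestry_summary(*first_generation))
print(ancestry_summary(*ancient_admixture))
```

The proportion alone cannot separate the two cases. The number of switches, or equivalently the length of unbroken ancestry segments, is what carries the information about how long ago the mixing happened.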

So what the paper publicized in the press release does is present methods to reconstruct exactly how patterns of relatedness came to be, rather than reiterating well understood patterns of relatedness. With the rise of whole-genome sequencing and more powerful computational resources to reconstruct genealogies we’ll be seeing much more of this to come in the future, so it is important that people are not misled as to the details of the implications.

October 10, 2012

A plea for population genetics

Filed under: Population genetics — Razib Khan @ 9:31 pm

The title here is somewhat misleading. This is not just a plea for population genetics, but for quantitative genetics as well. Genetics is a big field. But today it is defined by and large by DNA, the concrete entity in which the abstraction of the gene is embedded. Look at the header of this website, or the background to my Twitter account. Mind you, I’m pathetically informed about molecular genetics, and don’t have a strong interest in the topic! I did consider using the H.W.E. or the breeder’s equation for the header, but in the end I judged it too abstruse and unfamiliar to most readers. DNA dominates when it comes to the modern mental conception of genetics, and we have to live with it to some extent.
But there is also great value in the genetics which has intellectual roots in the pre-DNA Mendelians and biometricians. This genetics exhibits a symbiotic, but not necessary, association with genetics as a branch of biophysics. Yet I come here not to insult or impugn my friends who toil in the trenches of the molecular wars. Rather, I simply want ...

September 27, 2012

Paleopopulation Genetics

Filed under: Genetics,Population genetics — Razib Khan @ 9:57 pm

It seems a new field is being born! Jeff Wall & Monty Slatkin have a pretty thorough review out, Paleopopulation Genetics:

Paleopopulation genetics is a new field that focuses on the population genetics of extinct groups and ancestral populations (i.e., populations ancestral to extant groups). With recent advances in DNA sequencing technologies, we now have unprecedented ability to directly assay genetic variation from fossils. This allows us to address issues, such as past population structure, changes in population size, and evolutionary relationships between taxa, at a much greater resolution than can traditional population genetics studies. In this review, we discuss recent developments in this emerging field as well as prospects for the future.

Nothing very new for close readers of this weblog, but the references are useful for later mining.

August 28, 2012

Evolutionary & population genetics preprints – Haldane’s Sieve

OK, perhaps I can help with that. Dr. Coop speaks of the collaboration between himself & Dr. Joseph Pickrell, Haldane’s Sieve, which I added to my RSS days ago (and you can see me pushing it to my Pinboard). From the “About”:

As described above, most posts to Haldane’s Sieve will be basic descriptions of relevant preprints, with little to no commentary. All posts will have comment sections where discussion of the papers will be welcome. A second type of post will be detailed comments on a preprint of particular interest to a contributor. These posts could take the style of a journal review, or may simply be some brief comments. We hope they will provide useful feedback to the authors of the preprint. Finally, there will be posts by authors of preprints in which they describe their work and place it in broader context.

We ask the commenters to remember that by submitting articles to preprint servers the authors (often biologists) are taking a somewhat unusual step. Therefore, comments should be phrased in a constructive manner to aid the authors.

It might be helpful if other evolution/genetics bloggers ...

July 22, 2012

What is inbreeding?

Filed under: Inbreeding,Population genetics — Razib Khan @ 9:23 pm

I’ve put up a bunch of posts relating to inbreeding recently (1, 2, 3, 4). But I haven’t really defined it. First, let’s stipulate what inbreeding is not: it is not the same as incest. Acts of incest can include individuals who have no blood relationship to each other (e.g., Hamlet). Additionally, there are instances of inbreeding which are not necessarily incestuous. If a population is highly inbred, then individuals who are not relations by social custom may still be so genetically similar to a point where the pairing can not credibly be stated as an outcross. But still, what do I mean? To refresh myself I re-read the section on inbreeding in Hartl & Clark. And I think that helped clarify one implicit assumption which I have which may not be clear to everyone, and I’ll get to that.

In any case, first, what’s the deal with inbreeding? The short answer is that inbreeding is a measure of the probability of identity by descent of two alleles at a given locus in a given individual. This concise definition itself is the problem. These are all abstract concepts, close to being ...
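One way to make "probability of identity by descent" less abstract is the standard path-counting rule from pedigree analysis: for each loop connecting an individual's two parents through a common ancestor, add (1/2)^n × (1 + F_A), where n is the number of individuals in the loop (counting both parents and the common ancestor) and F_A is the common ancestor's own inbreeding coefficient. The sketch below is just that rule with hypothetical function names, and is not taken from Hartl & Clark.

```python
def inbreeding_coefficient(paths):
    """Path-counting estimate of the inbreeding coefficient F.

    `paths` holds one (n_individuals, f_ancestor) pair per loop through
    a common ancestor, where n_individuals counts everyone on the path
    from one parent, up to the common ancestor, and down to the other
    parent.
    """
    return sum(0.5 ** n * (1.0 + f_anc) for n, f_anc in paths)

# Parents are first cousins: two shared grandparents, and each loop runs
# parent -> their parent -> shared grandparent -> other parent's parent
# -> other parent, which is 5 individuals.
f_first_cousins = inbreeding_coefficient([(5, 0.0), (5, 0.0)])

# Parents are full siblings: two shared parents, each loop is
# sibling -> shared parent -> sibling, which is 3 individuals.
f_full_sibs = inbreeding_coefficient([(3, 0.0), (3, 0.0)])

print(f_first_cousins, f_full_sibs)  # 0.0625 0.25
```

So the child of first cousins has a 1 in 16 chance that the two alleles at any given locus are copies of the same ancestral allele, assuming the shared grandparents were not themselves inbred.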

July 1, 2012

On theoretical evolutionary genetics

Filed under: Population genetics,theoretical evolutionary genetics — Razib Khan @ 10:13 pm

Joe Felsenstein in the comments:

The books you have listed are good ones, by fine people. But may I immodestly suggest a book of mine? If you want to work your way through the theory of theoretical population genetics, I have set of notes for my Genome 562 course, a textbook. It is a freely-downloadable PDF (start with my website by clicking on my name in this comment). It’s not for everyone but I think those interested in knowing how the theory actually works in more detail will benefit from it. As it’s free, I have no monetary interest in calling your attention to it, just pure ego. (And if you want a one-locus population genetics simulation program, try PopG at my lab’s website too — Google “Felsenstein PopG”).

Many of the books I recommended below are rather expensive. Theoretical Evolutionary Genetics (PDF) is not. Unfortunately much of the discourse of contemporary science is beyond the financial means of much of the world’s population, whether it be in university press textbooks or gated journals. So I’m quite happy in putting up a link to this text-in-progress.

Learning population and evolutionary genetics

Filed under: Population genetics — Razib Khan @ 11:44 am

A reader emailed me to ask what I thought would be a good way to better understand some of the more technical posts I put up.

First, two course notes which I’ve found useful as personal references:

- EEB 5348 — Population Genetics

- Evolutionary Quantitative Genetics, Uppsala University (if you are ambitious, bookmark this too)

Some people might argue that John Gillespie’s Population Genetics: A Concise Guide (Kindle edition) is a touch too abstruse and cryptic for the introductory reader. It’s short, and the mathematics isn’t challenging, but because of its concision the author can sometimes unleash upon you nearly cryptic formalism, perhaps defeating the purpose of a soft introduction in the first place. To get the most out of this book you probably, ironically, have to have a more thorough textbook on hand to clear up those particular points which you find confusing. But to get the general logic of population genetics and establish familiarity this seems to be the right entry point (assuming you’re not too terrified by algebra).

Of course most readers of this weblog are focused ...

March 7, 2012

Where the wild clines aren’t

Filed under: Anthroplogy,Human Genetics,Human Genomics,Population genetics,race — Razib Khan @ 7:38 pm

In the recent ‘do human races exist’ controversy Nick Matzke’s post Continuous geographic structure is real, “discrete races” aren’t has become something of a touchstone (perhaps akin to Cosma Shalizi’s post on I.Q. and heritability).* In the post Matzke emphasized the idea of clines, roughly a continuous gradient of genetic change over space. Fair enough. But in the map above I traced two linear transects. I would suggest that anyone who has a general understanding of the demographics of South-Central Eurasia would immediately anticipate that these transects would reveal a relatively sharp break in allele frequencies. True, there are intermediate populations between the two end points, in Nepal, and on the fringes of India’s northeastern states. But clearly about halfway through the southwest-northeast transect you’ll see a rapid shift in allele frequencies. The blue transect is different, insofar as the change occurs very near its eastern pole. In Bengal, 85% of the length of the transect from its western terminus, the populations will still be far closer genetically to those on the western pole than those just to the east!

 

I thought of this when I saw that Zack had posted a Tibetan data set from Qinghai. As the crow flies Qinghai is closer to the plains of North India than peninsular South India, but Zack found Tibetans from this region to be only ~1 percent South Asian. That’s likely to be close to noise. I assume this does not surprise anyone. Despite the fact that North India is very populous in relation to Tibet, it turns out that geographical barriers are very strong in discouraging gene flow (note that Tibet and North India are actually culturally related; Tibetan Buddhism has its origins in the Tantric Buddhism of Bengal). This is one of my major “beefs” with the idea that “race does not exist” because of clines. I think this is a robust point when it comes to there being no Middle Eastern race vs. Scandinavian race. The clines are real and gradual between these two population sets. But I do think there has been strong differentiation between populations from the antipodes of Eurasia. I suspect that the emergence of more flexible lifestyles (e.g., oasis agriculture, horse nomadism) has in fact resulted in far greater connections between the isolated zones of Western and Eastern Eurasia over the past 10,000 years than before. In fact, one can conceptualize it as a two-fold process. On the one hand you had very powerful expansions from small initial founder groups across macro-regions such as Western Eurasia and the Far East. This resulted in a decrease of genetic difference within these zones through the power of homogenization, though increased Fst in the few zones of direct contact across the zones. But the “empty zones” of Central Eurasia may also have filled up with “proto-Silk Road” centers over the past ~10,000 years, resulting in more frequent long term connections between the macro-regions than had heretofore been possible.

* I guess I should divulge that I have socialized with Nick Matzke and that we share common friends.

February 12, 2012

The social and biological construction of race

Filed under: Anthroplogy,Hispanics,Latinos,Population genetics,race — Razib Khan @ 2:45 pm

Many of our categories are human constructions which map upon patterns in nature which we perceive rather darkly. The joints about which nature turns are as they are, our own names and representations are a different thing altogether. This does not mean that our categories have no utility, but we should be careful of confusing empirical distributions, our own models of those distributions, and reality as it is stripped of human interpretative artifice.

I have argued extensively on this weblog that:

1) Generating a phylogeny of human populations and individuals within those populations is trivial. You don’t need many markers, depending on the grain of your phylogeny (e.g., to differentiate West Africans vs. Northern Europeans you actually can use one marker!).

2) These phylogenies reflect evolutionary history, and the trait differences are not just superficial (i.e., “skin deep”).

The former proposition I believe is well established. A group such as “black American” has a clear distribution of ancestries in a population genetic sense. The latter proposition is more controversial and subject to contention. My own assumption is that we will know the truth of the matter within the generation.


A black American

But that is the biological construction of race. Subject to fudge and fuzziness, but mapping upon a genuine reality. What about the social construction? Due to its flexibility this is a much more difficult issue to characterize in a succinct manner. Consider the cultural conditionals which render G. K. Butterfield “black” and Luis Guzman “Hispanic.” Both individuals are products of an admixture between people of mixed African and European ancestry (and likely some Amerindian in Guzman’s case). It turns out that the genes have segregated out such that Butterfield reflects more his European ancestry in traits. Guzman’s phenotype is more mixed. The perception of these two individuals is weighted by two different strains in modern American racial ideology. First, that of hypodescent, where one drop of black blood means that an individual is black, without equivocation. Halle Berry appealed to this framework to argue why her daughter, who is less than 1/4 African in ancestry (Berry’s African American father almost certainly had some European ancestry) was black. No matter that hypodescent’s origins were to buttress white racial supremacy and purity. Today black Americans espouse it for purposes of community solidarity (the black American community as we know it is partly a product of hypodescent, which forced mixed-race blacks into the African American community).


Not a black American

The second issue, which has crystallized in our time, but has roots back decades, is the peculiar position of “Hispanics/Latinos” in the American racial system. As A. D. Powell has observed Hispanics seem to be able to evade the one drop rule, unless their African features are extremely dominant (e.g., pre-skin whitening Sammy Sosa). I’ve looked at the genotypes of enough Latin Americans to assume that some level of African ancestry (e.g., ~5%) is present in the vast majority of those who are not the children of recent European immigrants or from indigenous communities. For example, Mexico’s large slave population seems to have been totally absorbed, to the point where their past existence has been nearly forgotten. Mexicans of mestizo or white identity routinely have African ancestry, they just don’t know it, nor is it part of their racial identity. And it isn’t just Latinos. People of Middle Eastern ancestry, in particular Arabs, often have some African ancestry. But they are not classified as black (unlike Hispanics/Latinos they don’t have their own ethnic category, but are put into the “white” box, irrespective of their race, from Afro-Arab to Syrian).

This broader coexistence of frameworks persists on the implicit level. We don’t usually explicitly flesh out these details. Rather, we take these social constructions as givens. The major problem is when the problems and artificialities of these social constructions begin to bleed over into attempts to understand patterns of biological variation. Because of America’s fixation on the black-white dichotomy rooted in skin color people routinely offer up the fact that the human phylogeny is not well correlated with pigmentation as a refutation of the concept of race. What biology is doing is refuting a peculiar social construction of race. It is not negating the reality of human population substructure. Sociology and cultural anthropology are empires of imagination to a much greater extent than human biology.

I’m thinking of this because with the birth of my daughter I confronted the bleeding over of the social into the biological. For medical purposes her race had to be assessed. One side of her ancestry was not problematic; white European. But I had to argue for why her other half should not be listed as “Asian.” For sociological purposes I have no great issue with the term Asian American which is inclusive of South and East Asians (I am not denying that this is a recent political identity, I am saying that I do not personally find it objectionable and routinely enter my race as “Asian American” into public forms). But for biological purposes this is an incoherent and misleading classification. I know when my sister was born my parents put her race as “Asian,” which even at the time I felt was totally without purpose as far as biological taxonomy went. At the end of it all my daughter had “South Asian” entered in by hand. Better that her information be discarded than aggregated into a data set in a misleading fashion.

Obviously disentangling the social and biological is not necessarily impossible. Rather, it takes a little care and explicitness, as it is easy to move between the two domains so fluidly that their differences are elided. And to some extent they do inform each other. Personal genomics is adding a new twist, but the general problem is as old as human systematics. The only cure is care.

Image credit: Wikipedia

January 26, 2012

1 migrant needed to prevent genetic divergence

Filed under: 1 migrant rule,Conservative Genetics,Population genetics — Razib Khan @ 2:09 am

In the survey below I asked if you knew about how many migrants per generation were needed to prevent divergence between populations. About 80 percent of you stated you did not know the answer. That was not totally surprising to me. The reason I asked is that the result is moderately obscure, but also rather surprisingly simple and fruitful. The rule of thumb is that 1 migrant per generation is needed to prevent divergence.*

It doesn’t tell you much in and of itself of course. But if you think about it you can inject that fact into all sorts of other population genetic phenomena. For example, to have selection across two populations which is not reducible to selection within those populations (i.e., inter-demic selection) you need group-level genetic differences. These differences can be measured by the Fst statistic. In short the value of Fst tells you the proportion of variation which can be attributed to between-group differences (e.g., Fst across human races is ~0.15). For natural selection to have any adaptive effect you also need heritable variation. If you have lots of heritable variation selection can be weaker, while if you have little heritable variation selection has to be very strong (see response to selection). Fst is a rough gauge of heritable variation when you are evaluating group level differences. An Fst of 1.0 would imply that the groups are nearly perfectly distinct at the loci of interest, while an Fst of 0.0 would imply that the groups are not genetically distinct at all. With no distinction selection would have no efficacy in terms of driving adaptation. All this is a long way of saying that the 1 migrant rule is one reason that evolutionary biologists take a skeptical position in relation to group selection. Even a trickle of migration tends to quickly erase the between-group variation which group selection depends upon.

 

To make it concrete here is the equation which you use to generate the equilibrium F statistic:

In this formula N = the population size, and m = the proportion of migrants within the population within a given generation. Nm then works out to be the number of migrants in any given generation. So 1 migrant per generation would mean for 1,000 individuals m = 0.001. For 100, the m = 0.01. To see the power of a given number of migrants per generation on long term Fst, the measure of between population difference, I’ve plotted some computed results below (Fst y-axis, Nm on the x-axis).
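For anyone who wants to reproduce numbers like these, here is a minimal Python sketch. It assumes the equilibrium formula is Wright's island-model approximation, Fst ≈ 1/(4Nm + 1), which is the standard textbook form in terms of N and m; the function name and the list of Nm values are my own choices for illustration.

```python
def equilibrium_fst(n_migrants: float) -> float:
    """Equilibrium Fst under the island model (assumed form): 1 / (4Nm + 1)."""
    if n_migrants < 0:
        raise ValueError("Nm cannot be negative")
    return 1.0 / (4.0 * n_migrants + 1.0)


# Nm is the number of migrants per generation (population size times migration rate).
for nm in (0, 0.1, 0.5, 1, 2, 5, 10):
    print(f"Nm = {nm:>4}: Fst = {equilibrium_fst(nm):.3f}")
```

Under this assumed formula, no migration gives an Fst of 1.0 and a single migrant per generation already pulls the equilibrium down to 0.2, whatever the population size.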

 

This should make intuitive sense. If there is no migration (gene flow) between populations then over the long term they become perfectly distinct. As you increase migration naturally that is going to homogenize differences between populations. But I suspect the question you may still have is how is it that only a few individuals are necessary in even large populations to prevent differentiation?

Here the intuition is simple. In a neutral scenario between-population differences emerge as gene frequencies change over time. The magnitude of the generation to generation change shrinks as population size grows. This is simply the sampling variance or transmission noise. The variance of the change is going to be proportional to 1/N, where N is the population (1/2N for diploids). As N gets rather large you converge upon zero. So as the population gets very large there is less and less divergence which may occur in one given generation. In contrast you have a lot of generation to generation variation, and rapid change in frequency, in a small population. So why only 1 migrant? In a large population 1 migrant does not effect much change, but much change is not necessary. In a small population it has much more impact, but the generation to generation change is also much bigger. These two dynamics work at cross purposes so that the number of migrants needed remains relatively insensitive to population size.
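The cancellation can be sketched numerically. Assuming a Wright-Fisher population of N diploids, the one-generation sampling variance of an allele frequency p is p(1 − p)/(2N), while one migrant per generation (m = 1/N) shifts the frequency toward the source population by m × (p_source − p). Both quantities scale as 1/N, so their ratio does not depend on N. The allele frequencies and function names below are hypothetical, chosen only for illustration.

```python
def drift_variance(p: float, n: int) -> float:
    """Binomial sampling variance of an allele frequency in one generation (2N gene copies)."""
    return p * (1.0 - p) / (2.0 * n)


def migration_shift(p: float, p_source: float, n: int, migrants: float = 1.0) -> float:
    """Expected one-generation change in frequency from `migrants` immigrants per generation."""
    m = migrants / n  # migration rate implied by a fixed number of migrants
    return m * (p_source - p)


p, p_source = 0.3, 0.7  # hypothetical local and source allele frequencies
for n in (100, 1_000, 10_000):
    var = drift_variance(p, n)
    shift = migration_shift(p, p_source, n)
    print(f"N = {n:>6}: drift variance = {var:.2e}, "
          f"migration shift = {shift:.2e}, ratio = {shift / var:.2f}")
```

The absolute numbers shrink as N grows, but the ratio of the homogenizing push to the drift noise stays the same in every row.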

* This is the result derived from population genetics; some ecological geneticists have made the case that you may actually need 10 migrants, 1 being the lower boundary.

Image credit: Wikipedia

December 18, 2011

Nature really is real

Filed under: Human Genetics,Population genetics — Razib Khan @ 2:30 am

I generated the figure at left from table 9.6 in The Genetics of Human Populations. This book was published in 1971, but I purchased the 1999 edition (which was simply a republication of the original text by Dover) in 2005.* At the time I recall reading the section on inferring the number of genetic loci implicated in the variation in pigmentation with some mild skepticism. The authors, L. L. Cavalli-Sforza and W. F. Bodmer pegged the black-white difference as due to ~4 genes. Their data set consisted of individuals of various races in Liverpool; whites, blacks, people with one white parent and one black parent (F1 hybrids), people with three grandparents of one race and one of another (“backcrosses,” where you take an F1 and mate them with one of the parental lines), and finally, F2 individuals who are the product of pairings of F1s.


To come to the estimate the authors made some assumptions. For example, they assumed that blacks and whites were disjoint on the genes which encoded skin color in terms of their variants. Because these two populations lay at the opposite poles of the phenotypic distribution for humans it’s a natural assumption, but they had nothing to go on besides their hunch at the time. It turns out though that to a good first approximation this is actually a valid assumption. If you assume that the two populations are fixed at the allelic variants, that they don’t have segregating alleles which encode variation, then whites and blacks should exhibit the same variance due to environmental forces. This is what the authors saw. Using skin reflectance measures it seems that blacks and whites varied the same amount about their mean. If the two populations are approximately homozygous then the F1 generation, which are heterozygotes, should be between the two parental populations in trait value, but not exhibit much greater variance. Recall that they’d inherit a black and white copy at every locus. Therefore, all the variance in this population should also be environmental, rather than genetic. The real action comes in the backcrosses and the F2 generation. In these two populations segregation will result in a genetic variance component which will inflate the total variance. Therefore, genetic variance on this trait can be estimated like so:

Genetic variance = Total variance – Environmental variance

Recall that we already estimated environmental variance earlier. So genetic variance can be inferred by subtraction. Why do we see this pattern? Think about what happens when F1s cross at a single locus. They’re heterozygotes. 50% of their offspring will be of like genotype. But 50% will revert back to one of the parental genotypes. This naturally results in increased variance. Similarly, with a backcross 50% of the offspring will be heterozygotes, while 50% will be homozygotes.

Now let’s focus on a term, a, which defines the additive effect of a variant. It turns out that for their data set they didn’t see any dominance effect, so the model is rather simple here where you have an environmental component, and an additive genetic component. The variance of this is:

$V_a \approx \frac{1}{2} \sum a^2$ (that is, 1/2 × the sum of the squared a values)
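To see where the factor of 1/2 comes from, consider a single locus in the F2 generation. This is only a sketch under the purely additive model used here: I code the two homozygotes as −a and +a and the heterozygote as 0 (my own choice of coding), and take the 1:2:1 Mendelian ratio from an F1 × F1 cross. The variance of the genotypic values then works out to exactly a²/2, and summing over independent loci gives the expression above.

```python
from fractions import Fraction


def f2_genetic_variance(a: Fraction) -> Fraction:
    """Variance of genotypic values among F2 offspring at one additive locus."""
    # Genotypic values and their Mendelian frequencies from a heterozygote x heterozygote cross.
    genotypes = [
        (-a, Fraction(1, 4)),          # homozygous for one parental allele
        (Fraction(0), Fraction(1, 2)), # heterozygote, midway between the parents
        (a, Fraction(1, 4)),           # homozygous for the other parental allele
    ]
    mean = sum(value * freq for value, freq in genotypes)
    return sum(freq * (value - mean) ** 2 for value, freq in genotypes)


a = Fraction(1, 10)  # hypothetical additive effect, for illustration only
print(f2_genetic_variance(a), a ** 2 / 2)
```

The two printed values are identical, whatever value of a is plugged in.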

As per your model you can replace the sum by a multiplicative factor, the number of genes which produce the additive effect, k, and turn the additive effect into a mean. So you have:

$2V_a \approx k \times \mathrm{mean}(a)^2$

Now, recall that we have the mean values for whites and blacks. 1/2 of the difference between these is equal to:

k × mean(a)

In this system so far we know Va, and we know the mean values for the parental populations. What we don’t know is k. So we need to set up the equation so that k is the unknown which we’re computing with the values we have. Some algebra leads to this formula:

$k = \frac{\left[\frac{1}{2}(\text{mean value white} - \text{mean value black})\right]^2}{2V_a}$ (if you put k × mean(a) into the numerator in the right spot you can get 2Va to work out)

From their values:

$k = \frac{(0.098)^2}{2 \times 0.001215} \approx 4$
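The arithmetic is easy to check. The sketch below just plugs the two quoted values into the formula above; the function and argument names are mine, not the authors'.

```python
def number_of_loci(half_mean_difference: float, additive_variance: float) -> float:
    """Estimate k = (half the parental mean difference)^2 / (2 * Va)."""
    return half_mean_difference ** 2 / (2.0 * additive_variance)


# 0.098 is half the difference between the parental means, 0.001215 is Va (values quoted above).
k = number_of_loci(0.098, 0.001215)
print(f"k = {k:.2f}")
```

The result comes out just under 4, which is where the rounded estimate comes from.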

k ~ 4 means that they estimate from the variance of effects there are 4 genes. When I first saw this I thought that the result was rather crude. But it turns out they were about right! In some ways they got lucky; pigmentation is a notoriously large-effect ‘polygenic’ trait. But it’s still rather awesome to see that old genetic methods can yield answers which are validated in our time.

Addendum: Just to be clear, some of the data here is really rough & ready. The inference of 4 genes has a huge error due to small sample sizes in some of the sets (e.g., F2). And yet it turned out they were about right! Some of this may have been luck, but in this case the trait really was only barely polygenic.

* I just pretended that the trait was normally distributed, which probably overestimated the standard deviation, producing more overlap than is empirically justifiable.

December 16, 2011

James F. Crow in Genetics

Filed under: James F. Crow,Population genetics — Razib Khan @ 12:03 am

At 95 James F. Crow is not only an eminent population geneticist, but he knew the figures who were responsible for the whole field. The journal Genetics has commissioned a series of essays and perspectives in his honor. The first is by Daniel Hartl. I thought this was funny:

Soon after joining the program I asked Professor Crow whether I could join his lab as a graduate student. He thought for a moment and then said, “Yes, Dan, provided you understand that population genetics is a recondite field that will never be of great interest except to a small group of specialists.” I remember this because afterward I hurried to look up “recondite” in the dictionary. His admonition made population genetics seem like some variety of monasticism, which, being an admirer of Gregor Mendel, was all right by me. Little did either of us foresee that genetics would be transformed in our lifetimes by genomic sequencing on a population scale and the development of computer technologies capable of analyzing terabytes of data and that population genetics would become a key approach for understanding human evolutionary history as well as for identifying genetic risk factors for common diseases.

I had the privilege of interviewing Crow in 2006. My email requesting an interview was sent only on the smallest probability of a reply, but he replied immediately! And when I sent my questions again the reply was nearly immediate. My favorite of Crow’s answers: “In my view it is wrong to say that research in this area — assuming it is well done — is out of order. I feel strongly that we should not discourage a line of research because someone might not like a possible outcome.” At his age he’s seen many fashions come and go. But nature abides and persists.

November 26, 2011

On the real possibility of human differences

I have discussed the reality that many areas of psychology are susceptible enough to false positives that the ideological preferences of the researchers come to the fore. CBC Radio contacted me after that post, and I asked them to consider that in 1960 psychologists discussed the behavior of homosexuality as if it was a pathology. Is homosexuality no longer a pathology, or have we as a society changed our definitions? In any given discipline when confronted with the specter of false positives which happen to meet statistical significance there is the natural tendency to align the outcome so that it is socially and professionally optimized. That is, the results support your own ideological preferences, and, they reinforce your own career aspirations. Publishing preferred positive results furthers both these ends, even if at the end of the day many researchers may understand on a deep level the likelihood that a specific set of published results are not robust.

This issue is not endemic to social sciences alone. I have already admitted this issue in medical sciences, where there is a lot of money at stake. But it crops up in more theoretical biology as well. In the early 20th century Charles Davenport’s research which suggested the inferiority of hybrids between human races was in keeping with the ideological preferences of the era. In our age Armand Leroi extols the beauty of hybrids, who have masked their genetic load through heterozygosity (a nation like Britain which once had a public norm against ‘mongrelization’ now promotes racial intermarriage in the dominant media!). There are a priori biological rationales for both positions, hybrid breakdown and vigor (for humans from what I have heard and seen there seems to be very little evidence overall for either once you control for the deleterious consequences of inbreeding). In 1900 and in 2000 there are very different and opposing social preferences on this issue (as opposed to individual preferences). The empirical distribution of outcomes will vary in any given set of cases, so researchers are incentivized to seek the results which align well with social expectations (here’s an example of heightened fatality due to mixing genetic backgrounds; it seems the exception rather than the rule).

Thinking about all this made me reread James F. Crow’s Unequal by nature: a geneticist’s perspective on human differences. Crow is arguably the most eminent living population geneticist (see my interview from 2006). Born in 1916, he has seen much come and go. For those of us who wonder how anyone could accept ideas which seem shocking or unbelievable today, I suspect Crow could give an answer. He was there. In any case, on an editorial note I think the essay should have been titled “Different by nature.” Inequality tends to connote a rank order of superiority or inferiority, though in the context of the essay the title is obviously accurate. Here is the most important section:

Two populations may have a large overlap and differ only slightly in their means. Still, the most outstanding individuals will tend to come from the population with the higher mean. The implication, I think, is clear: whenever an institution or society singles out individuals who are exceptional or outstanding in some way, racial differences will become more apparent. That fact may be uncomfortable, but there is no way around it.

The fact that racial differences exist does not, of course, explain their origin. The cause of the observed differences may be genetic. But it may also be environmental, the result of diet, or family structure, or schooling, or any number of other possible biological and social factors.

My conclusion, to repeat, is that whenever a society singles out individuals who are outstanding or unusual in any way, the statistical contrast between means and extremes comes to the fore. I think that recognizing this can eventually only help politicians and social policymakers.

You can, and should, read the whole thing. Let’s make it concrete. Imagine the following trait with two distributions (i.e., two populations):

- Mean = 100 and 105 (average value)
- Standard deviation = 15 (measure of dispersion)
- Let’s assume a normal distribution

Let’s plot the two distributions:

Observe the close overlap between the two distributions. Most of the variance occurs within both sets of populations. Now let’s impose a cut-off of ~130 on the curves:

Now the similarity between the two curves is not as striking. As you move to the tails of the distribution they begin to diverge. In other words, the average of the two populations is pretty much interchangeable, but the values at the tails differ. Now let’s move the cut-off to 145:

The difference is now even more stark. Let’s compare the ratios of the area under the curve for the two populations as defined by the cut-offs:

Value at 100 = 1.26 (any given individual in the blue population is 1.26 times more likely to be above 100 than in the red population)
Value at 130 = 2.1
Value at 145 = 2.8
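These ratios can be recomputed directly. The sketch below assumes exactly normal distributions with means of 100 (red) and 105 (blue), a shared standard deviation of 15, and cut-offs at exactly 100, 130 and 145. The color labels follow the parenthetical above; if the cut-offs drawn on the curves were only approximate, the output will differ somewhat from figures read off the plots.

```python
from statistics import NormalDist

red = NormalDist(mu=100, sigma=15)   # population with the lower mean
blue = NormalDist(mu=105, sigma=15)  # population with the higher mean


def tail_ratio(cutoff: float) -> float:
    """Ratio of P(blue > cutoff) to P(red > cutoff)."""
    return (1.0 - blue.cdf(cutoff)) / (1.0 - red.cdf(cutoff))


for cutoff in (100, 130, 145):
    print(f"cut-off {cutoff}: ratio = {tail_ratio(cutoff):.2f}")
```

The ratio climbs steadily as the cut-off moves further into the tail, which is the whole point of the exercise.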

A major caveat: quantitative traits are only approximately normally distributed, and there tends to be a “fat tail” dynamic, where deviation from the normal increases as one moves away from the mean. Concretely, this means that the ratios at the tails are probably not quite as extreme, as there are more individuals in all populations at the tails than you’d expect.

What does this entail concretely? As Crow noted above if you sample from the tails of the distribution then very modest differences between groups become rather salient. Consider long distance running. To be successful in international competitions one presumably has to be many, many, standard deviations above the norm. One can’t be a 1 out of 100, or 1 out of 1,000. Rather, presumably one should be 1 out of hundreds of thousands, at a minimum. This would be the fastest ~100,000 or so people in the world (out of 7 billion). With this in mind, we should not be surprised a priori at the success of the Kalenjin people of Kenya in this domain. They may have both the biological and social preconditions which allow their distribution of talent to be moderately above that of the human norm. Even a marginal shift can make a huge difference at the tails. 1 out of 100,000 is 4.26 deviation units above the mean. Increasing the mean of a population by half a standard deviation (e.g., if 100 is the mean, 15 is the standard deviation, then for the population with the higher mean you’d be at 107.5) results in a disproportion in ratio of above 8:1 at 4.26 units (as measured in the first population). This is modest, about 1 order of magnitude, but consider possible gene-environment correlations and synergies that might ensue when you have a critical mass of very fast individuals. This could amplify the effect of a difference in distributions on a single variate (more importantly I suspect, consider that virtuosity in many domains requires an intersection of aptitudes many units deviated from the norm across many traits).
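Both of those figures can be checked with a few lines of Python. This is a sketch that assumes the trait is exactly normal in both populations (subject to the fat-tail caveat above), working in standard deviation units of the first population.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Threshold (in SD units) exceeded by 1 person in 100,000 in the baseline population.
z = std_normal.inv_cdf(1 - 1e-5)

# Upper-tail probability at that same threshold for a population whose mean is 0.5 SD higher.
# By symmetry, P(Z > x) equals cdf(-x), which avoids subtracting from 1.
shifted_tail = std_normal.cdf(-(z - 0.5))

ratio = shifted_tail / 1e-5
print(f"threshold = {z:.2f} SD, ratio = {ratio:.1f}")
```

The threshold lands at about 4.26 units and the ratio a little above 8, in line with the numbers quoted.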

In the early 2000s James F. Crow was responding to the Human Genome Project. As has been thoroughly covered elsewhere human genomics has probably underwhelmed in terms of outcomes 10 years out. But it is often the case that with new technologies we overestimate the short-term change which they will effect and underestimate their long-term consequences. I believe with the rise of mass genomics, a radical increase in population coverage and full genome sequencing, we may finally start to adduce the underpinnings of quantitative traits. We already have indirect methods, but I believe that by 2020 we will have direct means at our disposal. We’ll have a good sense how deeply humans are commensurable on a population genetic level. I doubt it will change much in our values, but it may entail some rhetorical adjustments.

October 6, 2011

When did population genetics emerge?

Filed under: Population genetics — Razib Khan @ 11:49 am

I recently heard an eminent geneticist declare that population genetics began with Theodosius Dobzhansky’s Genetics and the Origin of Species in 1937. My immediate reflex was to be skeptical of this, at least going by Will Provine’s treatment in The Origins of Theoretical Population Genetics, which seemed to push back the timing to the 1920s.

So I looked up “population genetics” in Ngram viewer.

 

 
These results are not consistent with my expectations. Looks like my intuition was wrong. At least for the term population genetics. Score one for experience and wisdom.

September 18, 2011

The words of the father

Over at A Replicated Typo they are talking about a short paper in Science, Mother Tongue and Y Chromosomes. In it Peter Forster and Colin Renfrew observe that “A correlation is emerging that suggests language change in an already-populated region may require a minimum proportion of immigrant males, as reflected in Y-chromosome DNA types.” But there’s a catch: they don’t calculate a correlation in the paper. Rather, they’re making a descriptive verbal observation. This observation seems plausible on the face of it. In addition to the examples offered, one can add the Latin American case, where mestizo populations tend to have European Y chromosomal profiles and indigenous mtDNA.


In one of the more nuanced instances cited by Renfrew and Forster they note that though there is a fair penetration of both Austronesian mtDNA and Y chromosomes amongst Austronesian speaking inhabitants of New Guinea, Austronesian Y chromosomes are nearly absent from those populations which speak Papuan. In other words, the inference is that the native indigenous male elite perpetuated the Papuan language! I think the key issue here is social dominance, and the role of male cultural units. This is clear when you have a group like African Americans. Though mostly West African in ancestry this ethnic group clearly has ~20% European ancestry, mostly mediated through European males. This is simply a function of the social dominance of European males in relation to African males. And, it may be telling that African Americans are an English speaking Christian population, albeit with their own distinctive accent.

In the paper the authors note:

…It may be that during colonization episodes by emigrating agriculturalists, men generally outnumbered women in the pioneer colonizing groups and took wives from the local community. When the parents have different linguistic backgrounds, it may often be the language of the father that is dominant within the family group….

There is certainly some truth to this. But what I think this narrative misses are the coarser and larger scale social units which expansive male cultural communities operate through. The spread of European males in Latin America was not uncoordinated. Often small groups of males decapitated indigenous male power elites, sometimes literally! Long distance travel, such as what the Austronesians engaged in, almost certainly must have entailed a minimal level of political and ideological commitment. They cite the example of the Genghis Khan haplotype, but that wasn’t perpetuated through a process of gradual demic expansion.

September 17, 2011

Out of Africa’s end?

The BBC has a news report up gathering reactions to a new PLoS ONE paper, The Later Stone Age Calvaria from Iwo Eleru, Nigeria: Morphology and Chronology. This paper reports on remains found in Nigeria which date to ~13,000 years B.P. that exhibit a very archaic morphology. In other words, they may not be anatomically modern humans. A few years ago this would have been laughed out of the room, but science moves. Here is Chris Stringer in the BBC piece:

“[The skull] has got a much more primitive appearance, even though it is only 13,000 years old,” said Chris Stringer, from London’s Natural History Museum, who was part of the team of researchers.

“This suggests that human evolution in Africa was more complex… the transition to modern humans was not a straight transition and then a cut off.”

Prof Stringer thinks that ancient humans did not die away once they had given rise to modern humans.

They may have continued to live alongside their descendants in Africa, perhaps exchanging genes with them, until more recently than had been thought.


In the broad outlines most people still seem to hold that within the last ~100,000 years there was a major demographic pulse which swept out of Africa and populated the rest of the world. Something special did happen. Oceania and the New World were settled by the descendants of anatomically modern humans, whom we can trace back to Africa. The key modifications to the old model seem to be two-fold:

1) The possibility of admixture with other lineages on the way out

2) The sublocalization of the “Out of Africa” scenario, and further admixture with lineages within Africa

There have long been debates about an East or South Africa ur-heimat for the first anatomically modern humans. Others are now even positing a North African origin! To a great extent I wonder if a West or Central African origin is forgone in part due to the paucity of fossil remains entailed by the unfavorable conditions for preservation.

However the details shake out, the story seems to be getting more, not less, complicated. This makes for less pithy one liners for the media, but also more work for scientists. Figuring out stuff can be fun!

August 29, 2011

Tutsi probably differ genetically from the Hutu


Paul Kagame with Barack and Michelle Obama

I first heard about Rwanda in the 1980s in relation to Dian Fossey’s work with mountain gorillas. The details around this were tragic enough, but obviously what happened in 1994 washed away the events dramatized in Gorillas in the Mist in terms of their scale and magnitude. That period was a time when the idea of “ancient hatreds” leading to internecine conflict was in the air. It was highlighted by the series of wars in the former Yugoslavia, and the Tutsi-Hutu civil wars in Rwanda, Burundi, and Congo. Of the latter the events in 1994 in Rwanda were only the most prominent and well known.

After having read Dancing in the Glory of Monsters: The Collapse of the Congo and the Great War of Africa I am relatively conscious of the broader canvas of what occurred in Central and East Africa in the 1990s. Not only was there a conflict between Tutsi and Hutu in Rwanda, but a similar dynamic also flared up in Burundi. The tensions are more complex in Congo and Uganda, in large ...
