Razib Khan One-stop-shopping for all of my content

February 1, 2011

A genomic map of human variation, where we’re at

Zack has started exploring the K’s of his merged data set for HAP. A commenter suggests that:

As you have begun interpreting the reference results, let me make a friendly warning: you have to keep in mind that most of the reference populations of ethnic groups are extremely limited in sample size (with only between 2 and 25 individuals) and from very obscure sources, and you should keep away from drawing conclusions about millions of people based on such limited number of individuals.

This seems a rather reasonable caution. But I don’t think such a vague piece of advice really adds any value. These sorts of caveats are contingent upon:

- The scope of the question being asked (i.e., how fine a grain is the variation you are attempting to measure going to be)

- The sample size

- The representativeness

- The thickness of the marker set (10 autosomal markers vs. 500,000 SNPs)


This isn’t a qualitative issue, easy to divide into “right” and “wrong.” Sometimes an N = 1 is very insightful. That’s why the whole genome of one Bushman was very useful. In fact, the whole genome of any random Sub-Saharan African, and the whole genome of any random non-African ...

January 9, 2011

Of association & evolution

Two of the main avenues of research which I track rather closely in this space are genome-wide association studies (GWAS), which attempt to establish a connection between a trait/disease and particular genetic markers, and inquiries into the evolutionary parameters which shape the structure of variation within the human genome, often with specific relation to a particular trait/disease. By evolutionary parameters I mean stochastic and deterministic forces: mutation, migration, random drift, and natural selection. These two angles are obviously connected. Both focus on phenomena which are proximate in relation to the broader evolutionary principle: the ultimate raison d’être, replication. Stochastic forces such as random genetic drift reflect the error of sampling of genes from generation to generation during the process of reproduction, while adaptation through natural selection is an outcome of the variation of reproductive fitness as a function of variation of heritable traits. Both of these forces have been implicated in diseases and traits which come under the purview of GWAS (and linkage mapping).
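To make the contrast between sampling error and selection concrete, here is a minimal sketch of a Wright-Fisher style simulation. It is not drawn from any paper discussed here: the function name, the haploid form of selection, and every parameter value are assumptions chosen for illustration.

```python
import random

def wright_fisher(p0, n_copies, generations, s=0.0, seed=0):
    """Track one allele's frequency across generations.

    p0: starting allele frequency.
    n_copies: gene copies sampled each generation (2N for a diploid population).
    s: selection coefficient favouring the allele (0.0 means pure drift).
    All names and the haploid selection model are illustrative assumptions.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Deterministic step: selection shifts the expected frequency.
        expected = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Stochastic step: each copy in the next generation is drawn independently,
        # which is the "error of sampling" that produces drift.
        count = sum(1 for _ in range(n_copies) if rng.random() < expected)
        p = count / n_copies
        trajectory.append(p)
    return trajectory

if __name__ == "__main__":
    drift_only = wright_fisher(0.5, 100, 50)
    with_selection = wright_fisher(0.5, 100, 50, s=0.05)
    print(drift_only[-1], with_selection[-1])
```

With s = 0.0 the frequency wanders purely through sampling; a positive s adds a directional push on top of the same noise. Smaller values of n_copies make the wandering more violent, which is why bottlenecks matter.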

GWAS are regularly in the news because of their relevance in identifying the causal genetic factors for specific diseases. For example, schizophrenia. But they can be useful in a non-disease context as well. Human pigmentation is a character whose genetic architecture has been well elucidated thanks to a host of recent association studies. The common disease-common variant model has yielded spectacular results for pigmentation; it does seem a few common variants are responsible for most of the variation on this trait. But this has been the exception rather than the rule.

One reason for this disjunction between the promise of GWAS and the concrete tangible outcomes is that many traits/diseases of interest may be polygenic and quantitative. This implies that variation in phenotype is controlled by variation across many genes, and that the variation itself exhibits gradual continuity (a continuity which can be modeled as a normal distribution of values). The power of GWAS to detect correlated variation across genes and traits of small marginal effect is obviously limited. In contrast, it seems that about half a dozen genes can explain most of the between population variation in pigmentation. One SNP is able to account for 25-40% of the difference in shade between Europeans and Africans. This SNP is fixed in Europeans, nearly absent in Africans and East Asians, and segregating in both ancestral and derived variants in groups such as South Asians and African Americans. Traits such as schizophrenia and height are different: though they are substantially heritable, so that much of the variation at the population level of the trait is explainable by variation in genes, the effect size at any given locus may be small, or the variation may be accumulated through the sum of larger effect variants of low frequency. In other words, many common variants of small effect, or numerous distinctive rare variants of large effect.


These nuances of genetic architecture are not irrelevant to the possible evolutionary arc of the traits in question. One model of the adaptation leading to the high frequency of a trait or disease is that a novel mutation rapidly “sweeps” to fixation, or nearly to fixation. In other words, it shifts from nearly 0% to nearly 100% frequency in the population of alleles at that locus, driven by positive selection. This sort of rapid “hard sweep” would also result in “hitchhiking” of associated variants in the genomic regions adjacent to the originally favored mutant, producing regions of high linkage disequilibrium in the genome and haplotype blocks of associated alleles across loci. Such a model does seem possible in the case of some of the variants which are responsible for diversity of pigmentation. But this neat dovetailing between the strong association of a few variants with trait variance, and signatures of positive selection being driven by adaptation, is not so easy to come by in many instances.

There are other evolutionary possibilities in terms of what could drive a high frequency of particular alleles. Population bottlenecks and inbreeding can crank up the frequency of a variant simply through chance. This may be the origin of many traits and diseases expressed recessively or in quasi-Mendelian form which run in specific populations. Let’s set such stochastic possibilities to the side for now. The well of natural selection is not quite tapped out simply by models of positive selection drawing upon singular new mutations. Another model is that of “soft sweeps” operating upon standing genetic variation. Consider for example a trait which has a heritability of 0.50: 50% of the variance in trait value can be explained by variance in genes. Selection correlated with trait value can rapidly change the distribution of the trait within the population, as modeled by the breeder’s equation. But no new mutations are necessary in this model; rather, the frequencies of extant alleles change over time. In fact, as the proportions shift, novel combinations of alleles which were once too rare to be found together in the same individual will emerge, and so offer up the possibility that the mean trait value in generation t + n may be outside of the range of trait values at t = 0.
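For those who want to see the breeder’s equation in action, R = h² × S, where R is the response to selection (the change in the mean trait value in one generation), h² is the narrow-sense heritability, and S is the selection differential. The sketch below uses the 0.50 heritability from the example above; the starting mean, the selection differential, and the assumption that both h² and S stay constant across generations are made up for illustration.

```python
def response_to_selection(h2, selection_differential):
    """Breeder's equation: R = h^2 * S."""
    return h2 * selection_differential

def project_mean(mean0, h2, selection_differential, generations):
    """Project the trait mean forward, assuming h^2 and S stay constant.

    That constancy is a simplifying assumption: sustained selection
    tends to erode the genetic variance it feeds on.
    """
    means = [mean0]
    for _ in range(generations):
        means.append(means[-1] + response_to_selection(h2, selection_differential))
    return means

if __name__ == "__main__":
    # Hypothetical numbers: parents who reproduce average 2 units above the
    # population mean of 100, and heritability is 0.50.
    print(project_mean(100.0, 0.5, 2.0, 3))  # [100.0, 101.0, 102.0, 103.0]
```

Half of the parental advantage shows up in the offspring each generation, and no new mutation is required for the mean to keep moving.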

Over time such selection on a quantitative trait theoretically exhausts its own fuel, genetic variation. But quite often this is not practically operative, because such traits are subject to a background level of novel mutation and balancing selection. Stabilizing selection around a median phenotype, as well as frequency dependence and shifting environmental pressures, may produce a circumstance where adaptation never moves beyond the transient flux toward a new equilibrium. The element of the eternal race is at the heart of the Red Queen’s Hypothesis, where pathogen and host engage in an evolutionary war, and host immune responses are subject to negative frequency dependence. As the frequency of an allele rises, its relative fitness declines. As its frequency declines, its fitness rises.

Naturally such complex evolutionary models, subject to contingency and less powerful in their generality, only become appealing when simple hard sweep models no longer suffice. But it seems highly plausible that the genetic architecture of some traits, those which seem plagued by ‘missing heritability,’ are going to necessitate somewhat more baroque evolutionary models to explain their ultimate emergence & persistence. A new paper in PLoS Genetics tackles this complexity by looking at the patterns of variation of SNPs implicated in GWAS in the HGDP data set. Genome-Wide Association Study SNPs in the Human Genome Diversity Project Populations: Does Selection Affect Unlinked SNPs with Shared Trait Associations? First, the abstract:

Genome-wide association studies (GWAS) have identified more than 2,000 trait-SNP associations, and the number continues to increase. GWAS have focused on traits with potential consequences for human fitness, including many immunological, metabolic, cardiovascular, and behavioral phenotypes. Given the polygenic nature of complex traits, selection may exert its influence on them by altering allele frequencies at many associated loci, a possibility which has yet to be explored empirically. Here we use 38 different measures of allele frequency variation and 8 iHS scores to characterize over 1,300 GWAS SNPs in 53 globally distributed human populations. We apply these same techniques to evaluate SNPs grouped by trait association. We find that groups of SNPs associated with pigmentation, blood pressure, infectious disease, and autoimmune disease traits exhibit unusual allele frequency patterns and elevated iHS scores in certain geographical locations. We also find that GWAS SNPs have generally elevated scores for measures of allele frequency variation and for iHS in Eurasia and East Asia. Overall, we believe that our results provide evidence for selection on several complex traits that has caused changes in allele frequencies and/or elevated iHS scores at a number of associated loci. Since GWAS SNPs collectively exhibit elevated allele frequency measures and iHS scores, selection on complex traits may be quite widespread. Our findings are most consistent with this selection being either positive or negative, although the relative contributions of the two are difficult to discern. Our results also suggest that trait-SNP associations identified in Eurasian samples may not be present in Africa, Oceania, and the Americas, possibly due to differences in linkage disequilibrium patterns. This observation suggests that non-Eurasian and non-East Asian sample populations should be included in future GWAS

And now the author summary:

Natural selection exerts its influence by changing allele frequencies at genomic polymorphisms. Alleles associated with harmful traits decrease in frequency while those associated with beneficial traits become more common. In a simple case, selection acts on a trait controlled by a single polymorphism; a large change in allele frequency at this polymorphism can eliminate a deleterious phenotype from a population or fix a beneficial one. However, many phenotypes, including diseases like Type 2 Diabetes, Crohn’s disease, and prostate cancer, and physiological traits like height, weight, and hair color, are controlled by multiple genomic loci. Selection may act on such traits by influencing allele frequencies at a single associated polymorphism or by altering allele frequencies at many associated polymorphisms. To search for cases of the latter, we assembled groups of genomic polymorphisms sharing a common trait association and examined their allele frequencies across 53 globally distributed populations looking for commonalities in allelic behavior across geographical space. We find that variants associated with blood pressure tend to correlate with latitude, while those associated with HIV/AIDS progression correlate well with longitude. We also find evidence that selection may be acting worldwide to increase the frequencies of alleles that elevate autoimmune disease risk.

This is a paper where jumping to the methods might be useful. Though I’m sure that the authors did not intend it, sometimes it felt as if you were following the marble being manipulated by the carnival tender. Since I was not familiar with some of the terms for the statistics, a simple allusion to the methods without elaborating in detail did not suffice. In any case, the key here is that they focused on the set of SNPs which have been associated with trait variance in GWAS, and compared those to the total SNPs found in the HGDP data set of 53 populations. Note that not all SNPs in GWAS were in the HGDP SNP panel. But for the general questions being asked the intersection of SNPs sufficed. Additionally, they generated a further subset of SNPs which were highly likely to be associated with trait variance. These were SNPs where other SNPs of related function were within 1 Mb, or SNPs which were found in more than one GWAS.

There were four primary statistics within the paper: Delta, Fst, LLC, and iHS. Fst and iHS are familiar. Fst measures the extent of between population variance across a set of populations. High Fst means a great deal of population structure, while Fst ~ 0 means basically no population structure. iHS is a test to detect the probability of natural selection based on patterns of linkage disequilibrium in the genome. Basically the important thing for the purposes of this paper is that iHS tends to be good at detecting alleles at moderate frequencies still presumably going through sweeps. This is in contrast to the older EHH test, which only detects sweeps which are nearly complete. If the authors are focusing on polygenic traits and soft sweeps the likelihood of that showing up on EHH is low since that is predicated on hard, nearly complete, sweeps. LLC measures the correlation between the frequency of a trait-associated variant and latitude and longitude. Presumably this would be useful for smoking out those traits driven by ecological pressures (an obvious example in a general sense is the consistent change in area-to-volume ratio across taxa as organisms proceed from warmer to colder climes). Finally, Delta measures the allele frequency difference across the set of populations. The sign of Delta is simply a function of whether the allele frequency in question is higher in the first or second population in the comparison.
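Two of these statistics are simple enough to sketch for a single biallelic SNP. What follows is a textbook-style illustration, not the estimators the authors actually used (which I have not reproduced here): Fst is computed as (H_T - H_S) / H_T from unweighted population allele frequencies, and Delta as a plain frequency difference. The function names and the input frequencies are made up.

```python
def delta(freq_first, freq_second):
    """Allele frequency difference; positive if higher in the first population."""
    return freq_first - freq_second

def fst(freqs):
    """Simple Fst for one biallelic SNP from per-population allele frequencies.

    Uses Fst = (H_T - H_S) / H_T with equal weight per population,
    an assumption made for illustration only.
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    # Expected heterozygosity if all populations were pooled into one.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # The SNP is monomorphic everywhere: no structure to measure.
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / n
    return (h_t - h_s) / h_t

if __name__ == "__main__":
    print(fst([0.0, 1.0]))   # alternate alleles fixed in each population -> 1.0
    print(fst([0.5, 0.5]))   # identical frequencies -> 0.0
    print(delta(1.0, 0.0))   # 1.0
```

A SNP fixed for different alleles in two populations sits at the Fst ceiling, which is roughly the situation described for some of the pigmentation variants, while a SNP at the same frequency everywhere contributes nothing.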

In doing their comparisons the authors did not simply compare across all 53 populations in a pairwise fashion. Rather, they often pooled continental or regional groups. To the left is a slice of table 1. It shows the populations used to generate the Delta values, and how they were pooled. The HGDP populations are broken down by region in a rather straightforward manner. But also note that some of the comparisons are between populations within regions, and those with different lifestyles. I assume that the comparisons highlighted within the paper were performed with the aim of squeezing maximal informative juice in such an exploratory endeavor. There are no obligate hunter-gatherers within the Eurasian populations in the HGDP data set to my knowledge, so a comparison between agriculturalists and hunter-gatherers would not be possible. There is such a comparison available in the African data set. The authors generated p-values by comparing the GWAS SNPs to random SNPs within the HGDP data set. In particular, they were looking for signatures of distinctiveness among the HGDP data set.
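The logic of generating p-values against a background of random SNPs can be sketched as an empirical, rank-based p-value. This is only my guess at the general shape of such a procedure, not the authors’ actual pipeline; the function name, the add-one correction, and the scores are all assumptions.

```python
def empirical_p_value(observed, background_scores):
    """Fraction of background SNP scores at least as extreme as the observed one.

    The +1 in numerator and denominator is a common convention (an assumption
    here) that keeps the p-value from ever being exactly zero.
    """
    as_extreme = sum(1 for score in background_scores if score >= observed)
    return (as_extreme + 1) / (len(background_scores) + 1)

if __name__ == "__main__":
    # Hypothetical Fst values for four random SNPs and one GWAS SNP scoring 0.9.
    background = [0.1, 0.2, 0.3, 0.95]
    print(empirical_p_value(0.9, background))
```

The more random SNPs in the background set, the finer the resolution of the p-value, which is one reason a dense SNP panel like the HGDP's is useful for this sort of comparison.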

Such distinctiveness is expected. The set of SNPs associated with diseases and traits of note are not likely to be a representative subset of the SNPs across the whole genome. Remember that a neutral model of molecular evolution means that we should expect most genetic variation within the genome is going to be due to stochastic forces. Panel A of figure 1 shows that in fact the SNPs derived from GWAS did exhibit a different pattern from the total set of SNPs in the HGDP panel. Observe that the distribution of minor allele frequency (MAF) is somewhat skewed toward higher values for the GWAS SNPs. If the logic of GWAS is geared toward “common variants” which will be frequent enough within the population to generate an effect which is powerful enough to be picked up by the studies given their sample sizes, the bias toward more common variants (higher MAF) is understandable.

To the left are some SNPs and traits which had low p-values (i.e., they deviated from expectation beyond what you’d expect from random noise). Not very surprisingly they found that pigmentation related SNPs tended to show up strongly in all the measures of population differentiation and variation. rs28777 is found in SLC45A2, a locus which differentiates Europeans from non-Europeans. rs1834640 is in SLC24A5, which differentiates Europeans + Middle Easterners + Central/South Asians from other populations. rs12913832 is a “blue eye” related variant. That is, it’s one of the markers associated with blue vs. non-blue eye color differences in Europeans.

Seeing that pigmentation has been one of the few traits which has been well elucidated by the current techniques, it should be expected that more subtle and thorough methods aimed at detecting genetic variation across and within populations should stumble upon those markers first. The authors note that “SNPs and study groups associated with pigmentation and immunological traits made up a majority of those that reached significance in our analysis.” There has long been a tendency toward finding signatures of selection around pigmentation and disease related loci.

One pattern which was also evident in terms of geography in the patterns of low p-values was the tendency for Eurasian groups to be enriched. This is illustrated in figure 2. Most of the SNPs from the GWAS studies were derived from study populations which were European. Because of this there is probably a bias in the set of SNPs being evaluated which are particularly informative for Europeans and related populations. Additionally, it may also be that Eurasians were subject to different selective pressures as they left the ancestral African environment ~150,000-50,000 years B.P. In any case, for purposes of medical analysis the authors did find that using SNPs from East Asian populations produced somewhat different results than using those from European populations. Though some studies have shown a broad applicability of SNPs across populations, there are no doubt many variants in non-European populations which have simply not been detected because GWAS studies are not particularly focused on non-European populations. Consider:

… However, our results indicate that SNPs associated with pigmentation in GWAS display unusual allele frequency patterns almost exclusively in Europe, the Middle East, and Central Asia. This suggests to us that there may be SNPs, perhaps in or near genes other than SLC45A2, IRF4, TYR, SLC24A4, HERC2, MC1R, and ASIP, which are associated with pigmentation in non-Eurasian populations, but which have yet to be identified by GWAS. GWAS for pigmentation traits carried out using non-European subjects are needed to explore this possibility further.

There are two major other classes of trait/disease which were found to vary systematically across the HGDP populations:

- High blood pressure associated variants seemed to decrease with latitude

- Infectious and autoimmune disease SNPs had elevated scores. Specifically, there were some HIV related SNPs associated with Europeans which seem to confer resistance

The first set of traits would naturally come out of GWAS derived SNPs, since so much medical research goes into identifying risk and treating high blood pressure and other circulatory ailments. A consistent pattern where geography and not ancestry predict variation is an excellent tell for exogenous selective pressures. The physical nature of the earth is such that as mammals spread away from the equator their physiques will be reshaped by different sets of ecological parameters. Siberian populations have developed adaptations to cold stress, and there seem to be consistent cross-taxa shifts in body form to maximize or minimize heat radiation among mammals.

In the second case you have resistance to disease cropping up again, as well as pleiotropy, whereby genetic changes can have multiple downstream consequences. Often this is temporally simultaneous; consider the tame silver foxes. But sometimes you have a change in the past which has a subsequent consequence later in time due to different selective pressures. It is not that surprising that immunological responses can be multi-purpose, so even though Europeans did not develop resistance to HIV as a general selective pressure, similar pressures seem to have resulted in responses with general utility and now a specific use in relation to HIV. Selection can often be a blunt instrument, interposing itself into a network of interactions with multiple consequences, reshaping many traits simultaneously in the process of maximizing local fitness. This is most clear when you have a trait such as sickle-cell disease, which emerges only because the fitness benefit of heterozygosity is so great. But no doubt when it comes to many traits the byproducts are more subtle, or may seem cryptic to us. We still do not know why EDAR was driven to higher frequency in East Asians (less body odor and thick straight hair seem implausible targets for selection).

And just as natural selection can be blunt and rude in its impact on the covariance of genes and traits, so its relaxation may remove a suffocating vice. Consider the possibilities with blood pressure: perhaps the reason that northern Eurasians have lower blood pressure is that selection for other correlated traits associated with higher values was relaxed, allowing for fitness to be maximized in this particular dimension. Similarly, African Americans have a lower frequency of sickle-cell disease than their ~80% West African ancestry would entail, because without the pressure of endemic malaria selection for the heterozygote was removed, allowing for the purging of the allele from the gene pool.

Nevertheless, the authors do conclude:

Despite our broad-based approach, we found only a few examples of what may be a polygenic response to a single selective pressure. We did use stringent significance criteria which might mean that additional examples can be found among the study groups that did not quite meet our threshold of significance. It may also be that there is something about “GWAS” traits and their underlying genetics that served to undermine our approach.

They have several suggestions for why this didn’t pan out:

- The GWAS variants aren’t the primary source of the variation. It could be copy number variants, rare large effect variants (“synthetic”)

- Epistasis. Gene-gene interaction, which would mask or confound linear associations between variants and traits

- Low impact of selection on GWAS SNPs, or, balancing or negative selection

They finish:

In summary, we have examined 1,336 trait-associated SNPs in the 53 CEPH-HGDP populations looking for individual SNPs and groups of SNPs with unusual allele frequency patterns and elevated iHS scores. We identified 13 different traits with an associated SNP or study group that produced a significantly elevated score for at least one delta, Fst, LLC, or iHS measure, a small percentage of the total number of traits analyzed. We believe that the limited number of positive results could be due to our stringent significance criteria or to features of the genetic architecture of the traits themselves. Specifically, the roles of rare variants, epistasis, and pleiotropy in human complex traits are, although areas of active inquiry, still generally not well understood. Our measures may also not be optimal for detecting all types of selection acting on GWAS traits. It has been speculated that variants underlying complex traits will be influenced primarily by negative or balancing selection, which may not produce extreme values for our measures, particularly if these forces are relatively uniform across populations or are acting on many regions in the genome.

If selective pressures on polygenic traits are so common perhaps genomicists are going to be thumbing through Introduction to Quantitative Genetics. These are traits and evolutionary processes which lack clear distinction. In many ways modeling positive selection and hard sweeps resembles the economics of equilibriums. When it comes to continuous and quantitative traits subject to the effect of many genes a different way of thinking has to come to the fore. The transient no longer becomes a punctuation between the stasis, but the thing in and of itself. There are for example HLA genes in humans which are found in chimpanzees, because the nature of the eternal race between host and pathogen means that all the old tricks are preserved, at least at low frequencies. Human variation in intelligence, height, and all sorts of other liabilities and characteristics, may have always been with us, being buffeted continuously by a swarm of selective pressures. The question is, can our crude statistical methods ever get a grip on this diffuse but all-powerful net?

Citation: Casto AM, & Feldman MW (2011). Genome-Wide Association Study SNPs in the Human Genome Diversity Project Populations: Does Selection Affect Unlinked SNPs with Shared Trait Associations? PLoS Genetics. doi:10.1371/journal.pgen.1001266

December 13, 2010

To study humankind, AAA responds

This morning I received an email from the communication director of the American Anthropological Association. The contents are on the web:

AAA Responds to Public Controversy Over Science in Anthropology

Some recent media coverage, including an article in the New York Times, has portrayed anthropology as divided between those who practice it as a science and those who do not, and has given the mistaken impression that the American Anthropological Association (AAA) Executive Board believes that science no longer has a place in anthropology. On the contrary, the Executive Board recognizes and endorses the crucial place of the scientific method in much anthropological research. To clarify its position the Executive Board is publicly releasing the document “What Is Anthropology?” that was, together with the new Long-Range Plan, approved at the AAA’s annual meeting last month.

The “What Is Anthropology?” statement says, “to understand the full sweep and complexity of cultures across all of human history, anthropology draws and builds upon knowledge from the social and biological sciences as well as the humanities and physical sciences. A central concern of anthropologists is the application of knowledge to the solution of human problems.” Anthropology is a holistic and expansive discipline that covers the full breadth of human history and culture. As such, it draws on the theories and methods of both the humanities and sciences. The AAA sees this pluralism as one of anthropology’s great strengths.

Changes to the AAA’s Long Range Plan have been taken out of context and blown out of proportion in recent media coverage. In approving the changes, it was never the Board’s intention to signal a break with the scientific foundations of anthropology – as the “What is Anthropology?” document approved at the same meeting demonstrates. Further, the long range plan constitutes a planning document which is pending comments from the AAA membership before it is finalized.

Anthropologists have made some of their most powerful contributions to the public understanding of humankind when scientific and humanistic perspectives are fused. A case in point is the AAA’s $4.5 million exhibit, “RACE: Are We So Different?” The exhibit, and its associated website at www.understandingRACE.org, was developed by a team of anthropologists drawing on knowledge from the social and biological sciences and humanities. Science lays bare popular myths that races are distinct biological entities and that sickle cell, for example, is an African-American disease. Knowledge derived from the humanities helps to explain why “race” became such a powerful social concept despite its lack of scientific grounding. The widely acclaimed exhibit “shows the critical power of anthropology when its diverse traditions of knowledge are harnessed together,” said Leith Mullings, AAA’s President-Elect and the Chair of the newly constituted Long-Range Planning Committee.


Up until the last paragraph this is an anodyne statement. Who could disagree with: “Anthropology is a holistic and expansive discipline that covers the full breadth of human history and culture. As such, it draws on the theories and methods of both the humanities and sciences. The AAA sees this pluralism as one of anthropology’s great strengths.” But the explosion of anger from biologically and scientifically oriented anthropologists on the web is drawn from a deeper layer of lived experience. On a raw level many of them feel that some factions in cultural anthropology are obscurantists who are fluent in rhetoric which they utilize in power-plays and politics. There are anthropologists who do deny the deep insights of the scientific method in illuminating reality. In fact, they reject the naive realism at the heart of science as it is practiced. For them science is a swear-word, and connotes an affinity with oppression and all the negative abstractions in fashion at a given time (e.g., patriarchy, heteronormativity, capitalism, Eurocentrism, etc.). Of course as I note above scientific anthropologists are not given to tolerating the verbal circumlocutions and incantations of their non-scientific colleagues with much grace themselves. There is a deep cultural chasm, and these sorts of arguments over words in obscure institutional documents are only triggers for a persistent roiling debate.

As for the last paragraph, it illustrates the selectivity of a discipline which attempts to contextualize, and often has a skeptical relationship toward a positive framework. I believe that race is a social construct. The Hispanic identity, which consists of people of indigenous Amerindian, European, and African ancestry, and all their combinations, has been racialized. The Islamic identity has also been racialized. Benjamin Franklin stupidly contended that only the English and Saxons were true whites, with all other Europeans, including Nordics, being swarthy.

But just because a construct has a social element does not mean it is only a social construct. Because of the Left-liberal anti-racist egalitarian bias of anthropology, the academy in general, and the dominant narrative of Western society as a whole, there is a strong tendency to assert flatly that “race does not exist” as a biological concept. There is no interrogation of the concept of race except to refute its utility. This is not a case of agnostic skepticism washing away illusions, but a case of skepticism applied in a fashion to obtain a clear and distinct objective result which corresponds to reality. When it comes to race many become naive realists who accept that biological concepts can be falsified or verified in a simple and straightforward fashion. There is all of a sudden one Way of Knowing which presents us with indubitable truths.

Here is L. L. Cavalli-Sforza (my question in italics):

7) Question #3 hinted at the powerful social impact your work has had in reshaping how we view the natural history of our species. One of the most contentious issues of the 20th, and no doubt of the unfolding 21st century, is that of race. In 1972 Richard Lewontin offered his famous observation that 85% of the variation across human populations was within populations and 15% was between them. Regardless of whether this level of substructure is of note or not, your own work on migrations, admixtures and waves of advance depicts patterns of demographic and genetic interconnectedness, and so refutes typological conceptions of race. Nevertheless, recently A.W.F. Edwards, a fellow student of R.A. Fisher, has argued that Richard Lewontin’s argument neglects the importance of differences of correlation structure across the genome between populations and focuses on variance only across a single locus. Edwards’ argument about the informativeness of correlation structure, and therefore the statistical salience of between-population differences, was echoed by Richard Dawkins in his most recent book. Considering the social import of the question of interpopulational differences as well as the esoteric nature of the mathematical arguments, what do you believe the “take home” message of this should be for the general public?

Edwards and Lewontin are both right. Lewontin said that the between populations fraction of variance is very small in humans, and this is true, as it should be on the basis of present knowledge from archeology and genetics alike, that the human species is very young. It has in fact been shown later that it is one of the smallest among mammals. Lewontin probably hoped, for political reasons, that it is TRIVIALLY small, and he has never shown to my knowledge any interest for evolutionary trees, at least of humans, so he did not care about their reconstruction. In essence, Edwards has objected that it is NOT trivially small, because it is enough for reconstructing the tree of human evolution, as we did, and he is obviously right.
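Edwards’ point about correlation structure can be made concrete with a toy simulation. What follows is a minimal sketch in Python, not anything taken from Edwards or Lewontin. It assumes two hypothetical populations, independent loci, and allele frequencies of 0.7 and 0.3 at every locus, which puts the per-locus between-population share of variance at about 16%, in the neighborhood of Lewontin’s figure. The function names and the majority-vote classifier are illustrative choices of mine.

```python
import random

FREQ_A = 0.7  # assumed frequency of allele "1" in hypothetical population A
FREQ_B = 0.3  # assumed frequency of allele "1" in hypothetical population B


def per_locus_fst(p_a, p_b):
    """Between-population share of variance at one biallelic locus,
    assuming two populations of equal size."""
    p_bar = (p_a + p_b) / 2
    var_between = ((p_a - p_bar) ** 2 + (p_b - p_bar) ** 2) / 2
    return var_between / (p_bar * (1 - p_bar))


def classification_accuracy(n_loci, n_individuals=2000, seed=0):
    """Fraction of simulated individuals assigned to their true population.

    Each individual is a haploid set of independent loci. The rule is a
    simple majority vote: more than half "1" alleles means population A.
    Use an odd n_loci to avoid ties (ties are assigned to population B).
    """
    rng = random.Random(seed)
    correct = 0
    for i in range(n_individuals):
        from_a = i % 2 == 0  # alternate between the two populations
        freq = FREQ_A if from_a else FREQ_B
        ones = sum(rng.random() < freq for _ in range(n_loci))
        guessed_a = ones > n_loci / 2
        correct += guessed_a == from_a
    return correct / n_individuals


print(per_locus_fst(FREQ_A, FREQ_B))
print(classification_accuracy(1))
print(classification_accuracy(101))
```

Under these assumptions a single locus should assign an individual to the right population only about 70% of the time in expectation, while a hundred or so loci should assign nearly everyone correctly, even though no individual locus has become any more informative. That is the sense in which a small per-locus between-population fraction is not trivially small.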

L. L. Cavalli-Sforza contends that between-population genetic variation is not trivially small. This is clear from the fact that one can discern village-to-village genetic distinctions in Europe. Human variation exists, and it is not trivial. It is useful for phylogenetics, and it significantly impacts salient phenotypes and risks for particular diseases. The social construction of race has real biological raw materials. At one end, the transformation of white European converts to Islam, through changes in personal appearance, into de facto “People of Color” is a matter of social construction in totality. In contrast, the blackness of a Dinka from Sudan is a matter of biological categorization. The categorization of Egyptian Arabs with obvious African admixture as “white” in the US Census is a matter of social construction due to bureaucratic contingency, and illustrates the intersection of biological reality and social fluidity. It is well known that when foreign Arabs with obvious black admixture visited the American South there was often a debate as to whether they were subject to segregation, illustrating the tensions between social norms (which would have coded them as black), bureaucratic function (which usually coded them as non-black), and biological reality (where they were an amalgam of a minor black African component with a dominant white Arab component).

Of course it is true that on any given trait variation can span populations. But even in the case given above, of sickle cell, the correlations with ancestry and population are striking. A lower boundary value is that 75% of sickle cell sufferers are of mostly African ancestry, despite only 15% of the world’s population being of mostly African ancestry. These statistics refute a platonic model of race, but they do not refute the population-thinking which is at the heart of much of modern biology, pure and applied.
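The arithmetic behind that claim is worth spelling out. Here is a minimal sketch in Python that takes the two figures above at face value (75% of cases, 15% of the world’s population) and asks how much higher the per-capita rate is among people of mostly African ancestry than among everyone else. The function name is mine, for illustration only.

```python
def rate_ratio(share_of_cases, share_of_population):
    """Per-capita rate in a group relative to the rate in everyone else."""
    rate_in_group = share_of_cases / share_of_population
    rate_elsewhere = (1 - share_of_cases) / (1 - share_of_population)
    return rate_in_group / rate_elsewhere


# Figures taken from the paragraph above; they are a stated lower bound.
ratio = rate_ratio(share_of_cases=0.75, share_of_population=0.15)
print(round(ratio, 1))
```

With those inputs the ratio works out to about 17, and since the 75% figure is a lower bound, so is the ratio.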

All that said, the word “race” is fraught with a lot of historical baggage. Therefore to study population-wide variation you need to focus on “fine-scale population structure” and what not. This trend would be something of interest for cultural anthropologists of science to study. Race is just a word. Even a term as widely accepted as species exhibits a fair amount of flexibility on the margins. But the underlying biological patterns, and the instrumental utility of those patterns, cannot be denied.

Addendum: I often use “human” or “humankind” where earlier norms would be to use “man” or “mankind.” My main rationale is I don’t want annoying comments objecting to the term. The concept which I’m pointing to is the same no matter the pointer, and so I don’t mind changing it to facilitate my intent to communicate clearly and without undue extraneous baggage.

August 18, 2010

Which population is most genetically distant from Africans?

A comment below:

Razib, I don’t know much about genetics but is it true that these people of Melanesia are among the least related people (even more so than Europeans) to sub-saharan Africans genetically??

This is a common question. The typical scientifically curious intelligent person is generally aware that on the order of 100,000 years ago there was a movement of anatomically modern humans from Africa. They know that Africans have the most genetic variation of any human population, and that in fact Africa has more genetic variation than the rest of the world combined. It would stand to reason then that the further you are from Africa, the more genetically distant you are. Simply because of recent admixture of Sub-Saharan African ancestry in much of the Middle East there is some truth to this, but I think it misses the “big picture.”

To the best of my knowledge the current consensus on the origin and expansion of modern humans goes like so:


1) Anatomically modern humans emerge in Africa first ~200,000 years ago. This population is a sister lineage to the various Eurasian hominins, Neandertals, X-woman, etc.

2) Between 50,000 and 200,000 years ago a subset of the African population left Africa.

3) Sometime between the exit-from-Africa event and the present the anatomically modern humans replaced all other lineages (with some assimilation) and diversified.

My confidence in any specific aspect of the “orthodox consensus” is very high, though the joint probability of the details is more modest. #3 for example had to be modified a bit recently because of the possible existence of Neandertal admixture in Eurasians. So back to my question, assuming this model, which population is most genetically distant from Africans? The answer is really none. Here are some figures from Xing et al., which get at why the answer is “none of the above”:

[Figure: Africa vs. non-Africa, from Xing et al.]

Here’s the text for the figure:

Figure 3. Population relationships between the 40 populations. A) Neighbor-joining tree. Populations are color-coded based on their continental origins. The hypothetical ancestral population is shown. Bootstrap support values for most branches are larger than 95% (the bootstrap consensus tree is shown in Supp. Figure S1). B) Principal components analysis. First two principal components (PCs) are shown. Each individual is represented by one dot and the color label corresponding to their regional origin. The percentage of variance explained by each PC is shown on the axis. C) Individual grouping inferred by ADMIXTURE. Results from K = 4 and K = 12 are shown. Each individual’s genome is represented by a vertical bar composed of colored sections, where each section represents the proportion of an individual’s ancestry derived from one of the K ancestral populations. Individuals are arrayed horizontally and grouped by population as indicated.

The tree makes it clear: all non-Africans form their own independent branch from Africans. In the PCA you see that along the biggest component of variation in the genetic data the non-African groups are about the same distance from Africans. And in the ADMIXTURE analysis, when you assume four ancestral populations, the Africans and non-Africans separate out cleanly, excluding groups which have a high likelihood of European or Arab admixture. Remember the part about how Africans have more genetic diversity than all non-Africans combined? That’s also part of the puzzle. In some ways all non-Africans can be thought of as a subset of the genetic variation of Africans. Those humans who reside outside of Africa are simply a diversified branch of Africans. From what I can tell the data is converging on the likelihood that there was only one migration out of Africa which resulted in the branches of non-African humanity. That means that those of us of non-African ancestry are all equally distant from the African root.
