Razib Khan One-stop-shopping for all of my content

June 27, 2017

Why you should learn some population genetics

Filed under: Population genetics — Razib Khan @ 10:03 pm

From reader surveys I know a substantial portion of the people who will see this post are financially well off (of those who aren’t, a large number are students). Therefore, you can invest in some books.

Often people ask me questions related to population genetics in the comments (sometimes I get emails). That is all well and good. But it is always better to be able to fish than have to ask for fish. Additionally, learning some population and quantitative genetics allows you to develop some tacit schemas through which you can process information coming at you, and through which you can develop some general intuition.

If you have a modest level of mathematical fluency and the disposable income, here are three indispensable books which are like the keys to the kingdom:

* Elements of Evolutionary Genetics
* Principles of Population Genetics
* Introduction to Quantitative Genetics.

If you don’t have the cash to spare, there are online notes which are pretty good:

* Graham Coop’s Population Genetics notes
* Joe Felsenstein’s Theoretical Evolutionary Genetics

There are other online resources, but they are not as comprehensive. John Gillespie’s Population Genetics: A Concise Guide is good as very gentle introductions go, but if you are going to spend money, I think just plumping down for a more comprehensive textbook (which will have more genomics in it) is better over the long run.

The goal of getting these books isn’t to make you a population geneticist, but, if you are interested in evolutionary questions it gives you a powerful toolkit. Really nothing in evolutionary process makes sense except in the light of population genetics.

April 25, 2017

Dost thou know the equilibrium at panmixia?

Filed under: Genetics,Population genetics — Razib Khan @ 3:58 pm

If you read a blog about Biblical criticism from a Christian perspective it would probably be best if you were familiar with the Bible. You don’t have to have read much scholarly commentary, rather, just the New Testament. Barring that, at least the synoptic gospels!

At this point, with over 400 individuals responding to the reader survey, it is strange to consider that more people believe they have a handle on what Fst is than the Hardy-Weinberg Equilibrium. First, Fst is a more subtle concept than people often think it is. Second, the HWE is so easy, important, and foundational to population genetics. I mean p^2 + 2pq + q^2 = 1. Could it be simpler???
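To make the HWE concrete, here is a minimal Python sketch, assuming a single locus with two alleles at frequencies p and q = 1 - p in a random mating population. The function name `hardy_weinberg` and the example value p = 0.7 are mine, chosen only for illustration.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) for allele frequency p.

    Assumes one locus, two alleles, and random mating (panmixia).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


# Illustrative allele frequency, not taken from any data set.
freq_AA, freq_Aa, freq_aa = hardy_weinberg(0.7)
print(freq_AA, freq_Aa, freq_aa)        # roughly 0.49, 0.42, 0.09
print(freq_AA + freq_Aa + freq_aa)      # the three terms sum to 1 (up to rounding)
```

The three terms are just p^2, 2pq, and q^2, so they sum to (p + q)^2 = 1 whatever value of p you start with.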

So a quick ask. If you are one of the people who doesn’t understand HWE or why it is important, please get yourself a copy of John Gillespie’s Population Genetics: A Concise Guide. I understand that not everyone has the time, interest, or money for Principles of Population Genetics, or any of the more “hardcore” texts. But Population Genetics: A Concise Guide will surely suffice to follow anything on this blog.

Or, barring that, please review the online resources which you have available. Two examples:

Graham Coop’s Notes on Population Genetics or Joe Felsenstein’s unpublished textbook Theoretical Evolutionary Genetics.

April 23, 2017

Why the rate of evolution may only depend on mutation

Filed under: Evolutionary Genetics,Genetics,Population genetics — Razib Khan @ 10:07 pm

Sometimes people think evolution is about dinosaurs.

It is true that natural history plays an important role in inspiring and directing our understanding of evolutionary process. Charles Darwin was a natural historian, and evolutionary biologists often have strong affinities with the natural world and its history. Though many people exhibit a fascination with the flora and fauna around us during childhood, often the greatest biologists retain this wonderment well into adulthood (if you read W. D. Hamilton’s collections of papers, Narrow Roads of Gene Land, which have autobiographical sketches, this is very evidently true of him).

But another aspect of evolutionary biology, which began in the early 20th century, is the emergence of formal mathematical systems of analysis. So you have fields such as phylogenetics, which have gone from intuitive and aesthetic trees of life, to inferences made using the most new-fangled Bayesian techniques. And, as told in The Origins of Theoretical Population Genetics, in the 1920s and 1930s a few mathematically oriented biologists constructed much of the formal scaffold upon which the Neo-Darwinian Synthesis was constructed.

The product of evolution

At the highest level of analysis evolutionary process can be described beautifully. Evolution is beautiful, in that its end product generates the diversity of life around us. But a formal mathematical framework is often needed to clearly and precisely model evolution, and so allow us to make predictions. R. A. Fisher’s aim when he wrote The Genetical Theory of Natural Selection was to create for evolutionary biology something equivalent to the laws of thermodynamics. I don’t really think he succeeded in that, though there are plenty of debates around something like Fisher’s fundamental theorem of natural selection.

But the revolution of thought that Fisher, Sewall Wright, and J. B. S. Haldane unleashed has had real yields. As geneticists they helped us reconceptualize evolutionary process as more than simply heritable morphological change, but an analysis of the units of heritability themselves, genetic variation. That is, evolution can be imagined as the study of the forces which shape changes in allele frequencies over time. This reduces a big domain down to a much simpler one.

Genetic variation is a concrete currency with which one can track evolutionary process. Initially this was done via inferred correlations between marker traits and particular genes in breeding experiments. Ergo, the origins of “the fly room”.

But with the discovery of DNA as the physical substrate of genetic inheritance in the 1950s the scene was set for the revolution in molecular biology, which also touched evolutionary studies with the explosion of more powerful assays. Lewontin & Hubby’s 1966 paper triggered an order of magnitude increase in our understanding of molecular evolution through both theory and results.

The theoretical side occurred in the form of the development of the neutral theory of molecular evolution, which also gave birth to the nearly neutral theory. Both of these theories hold that most of the variation within and between species at polymorphic sites is due to random processes. In particular, genetic drift. As a null hypothesis neutrality was very dominant for the past generation, though in recent years some researchers are suggesting that selection has been undervalued as a parameter for various reasons.

Setting aside the live scientific debate, which continues to this day, one of the predictions of neutral theory is that the rate of evolution will depend only on the rate of mutation. More precisely, the rate of substitution of new mutations (where the allele goes from a single copy to fixation of ~100%) is proportional to the rate of mutation of new alleles. Population size doesn’t matter.

The algebra behind this is straightforward.

First, remember that the frequency of a new mutation within a population is \frac{1}{2N}, where N is the population size (the 2 is because we’re assuming diploid organisms with two gene copies). This is also the probability of fixation of a new mutation in a neutral scenario; its probability is just equal to its initial frequency (it’s a random walk process between 0 and 1.0 proportions). The rate of mutations is defined by \mu, the number of expected mutations at a given site per generation (this is a pretty small value, for humans it’s on the order of 10^{-8}). Again, there are 2N gene copies in the population, so you have 2N\mu to count the number of new mutations.

The probability of fixation of a new mutation multiplied by the number of new mutations is:

    \[ \left( \frac{1}{2N} \right) \times 2N\mu = \mu \]

So there you have it. The rate of fixation of these new mutations is just a function of the rate of mutation.
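If you want to see the cancellation numerically, here is a small Python sketch of the same arithmetic. The function name and the population sizes are my own illustrative choices; the mutation rate of 10^{-8} is the order of magnitude quoted above.

```python
def neutral_substitution_rate(pop_size, mu):
    """Rate of neutral substitution per site per generation.

    pop_size is the number of diploid individuals N; mu is the
    per-site, per-generation mutation rate.
    """
    new_mutations = 2 * pop_size * mu         # 2N gene copies, each mutating at rate mu
    fixation_probability = 1 / (2 * pop_size)  # neutral case: equals initial frequency
    return new_mutations * fixation_probability


# Population sizes chosen only for illustration.
for n in (100, 10_000, 1_000_000):
    print(n, neutral_substitution_rate(n, 1e-8))
```

Each line prints a rate of about 1e-8 no matter what N is: a bigger population generates more new mutations, but each one has a correspondingly smaller chance of fixing.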

Simple formalisms like this have a lot more gnarly math that extends them and from which they derive. But they’re often pretty useful to gain a general intuition of evolutionary processes. If you are genuinely curious, I would recommend Elements of Evolutionary Genetics. It’s not quite a core dump, but it is a way you can borrow the brains of two of the best evolutionary geneticists of their generation.

Also, you will be able to answer the questions on my survey better the next time!

April 14, 2017

Why overdominance probably isn’t responsible for much polymorphism

Filed under: Genetics,Population genetics — Razib Khan @ 10:54 pm

Hybrid vigor is a concept that many people have heard of, because it is very useful in agricultural genetics, and makes some intuitive sense. Unfortunately it often gets deployed in a variety of contexts, and its applicability is often overestimated. For example, many people seem to think (from personal communication) that it may somehow be responsible for the genetic variation around us.

This is just not so. As you may know each human carries tens of millions of genetic variants within their genome. Populations have various levels of polymorphism at particular positions in the genome. How’d they get there? In the early days of population genetics there were two broad schools, the “balance” and “classical.” The former made the case for the importance of balancing selection in maintaining variation. The latter suggested that the variation we see around us is simply a transient between fixation of a favored mutation from a low frequency or extinction of a disfavored variant (perhaps environmental conditions changed and a high frequency variant is now disfavored). Arguably the rise of neutral theory and empirical results from molecular evolution supported the classical model more than the balance framework (at least this was Richard Lewontin’s argument, and I follow his logic here).

But even in relation to alleles which are maintained at polymorphism through balancing selection, overdominance isn’t going to be the major player.

Sickle cell disease is a classic consequence of overdominance; the heterozygote is more fit than either the wild-type homozygote or the mutant homozygote, which suffers the recessive disease. Obviously polymorphism is maintained despite the decreased fitness of the mutant homozygote because the heterozygote is so much more fit than the wild type. The final proportion of the alleles segregating in the population will be conditional on the fitness drag of the homozygote in the mutant type, because as per HWE it will be present in the population at a frequency of ~q^2.

The problem is that this is clearly not going to scale across loci. That is, even if the fitness drag is more minimal than is the case with the sickle cell locus, one can imagine a cumulative situation. The segregation load is just going to be too high. Overdominance is probably a transient strategy which fades away as populations evolve more efficient ways to adapt that don’t have such a fitness load.
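A rough Python sketch of why this doesn’t scale, under the textbook single-locus model where the two homozygotes have fitnesses 1 - s and 1 - t relative to a heterozygote fitness of 1. The selection coefficients below are hypothetical, and the multi-locus step assumes independent loci with multiplicative effects, which is my simplifying assumption rather than anything established above.

```python
def next_generation(q, s, t):
    """One round of selection with fitnesses AA = 1 - s, Aa = 1, aa = 1 - t.

    q is the frequency of the a allele; returns (new q, mean fitness).
    """
    p = 1.0 - q
    mean_fitness = p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)
    new_q = (p * q + q * q * (1.0 - t)) / mean_fitness
    return new_q, mean_fitness


def equilibrium_frequency(s, t, q0=0.01, generations=5000):
    """Iterate the recursion from a rare allele until it settles."""
    q = q0
    for _ in range(generations):
        q, _mean_fitness = next_generation(q, s, t)
    return q


def segregation_load(s, t):
    """Reduction in mean fitness at equilibrium: s*t / (s + t)."""
    return s * t / (s + t)


# Hypothetical selection coefficients, chosen only for illustration.
s, t = 0.1, 0.8
print(equilibrium_frequency(s, t), s / (s + t))  # iteration vs. analytic s / (s + t)

# Assuming independent loci with multiplicative effects (an assumption),
# mean fitness relative to an all-heterozygote genotype falls off quickly.
load = segregation_load(0.01, 0.01)
for n_loci in (1, 100, 1000):
    print(n_loci, (1.0 - load) ** n_loci)
```

Even with a weak 1% disadvantage for each homozygote, the per-locus load of 0.5% compounds across a thousand overdominant loci into a very large drop in mean fitness, which is the scaling problem in a nutshell.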

So how does balancing selection still lead to variation without heterozygote advantage? W. D. Hamilton argued that much of it was due to negative frequency dependent selection. Co-evolution with pathogens is the best case of this. As strategies get common pathogens adapt, so rare strategies encoded by rare alleles gain in fitness. As these alleles increase in frequency their fitness decreases as pathogens evolve resistance to them. Their frequency declines, and eventually the pathogens lose the ability to resist them, and their frequency increases again.

April 8, 2017

Why only one migrant per generation keeps divergence at bay

The best thing about population genetics is that because it’s a way of thinking and modeling the world it can be quite versatile. If Thinking Like An Economist is a way to analyze the world rationally, thinking like a population geneticist allows you to have the big picture on the past, present, and future, of life.

I have some personal knowledge of this as a transformative experience. My own background was in biochemistry before I became interested in population genetics as an outgrowth of my lifelong fascination with evolutionary biology. It’s not exactly useless knowing all the steps of the Krebs cycle, but it lacks in generality. In his autobiography I recall Isaac Asimov stating that one of the main benefits of his background as a biochemist was that he could rattle off the names on medicine bottles with fluency. Unless you are an active researcher in biochemistry your specialized research is quite abstruse. Population genetics tends to be more applicable to general phenomena.

In a post below I made a comment about how one migrant per generation or so is sufficient to prevent divergence between two populations. This is an old heuristic which goes back to Sewall Wright, and is encapsulated in the formalism Fst ≈ 1/(4mN + 1). Basically the divergence, as measured by Fst, is proportional to the inverse of 4 times the proportion of migrants times the total population + 1. The mN is equivalent to the number of migrants per generation (proportion times the total population). As the mN becomes very large, the Fst converges to zero.

The intuition is pretty simple. Imagine you have two populations which separate at a specific time. For example, sea level rise, so now you have a mainland and island population. Since before sea level rise the two populations were one random mating population their initial allele frequencies are the same at t = 0. But once they are separated random drift should begin to subject them to divergence, so that more and more of their genes exhibit differences in allele frequencies (ergo, Fst, the between population proportion of genetic variation, increases from 0).
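To put rough numbers on how fast drift alone builds up divergence, here is a hedged Python sketch. It uses the standard idealized expectation F_t = 1 - (1 - 1/(2N))^t for completely isolated populations of constant size N with no migration, mutation, or selection. Treating this as the Fst between the mainland and the island is my simplification, and the population sizes are made up.

```python
def expected_fst_under_drift(pop_size, generations):
    """Expected divergence after t generations of isolation.

    Idealized model (an assumption): constant diploid size N,
    no migration, mutation, or selection.
    """
    return 1.0 - (1.0 - 1.0 / (2 * pop_size)) ** generations


# Hypothetical island (N = 100) and mainland (N = 10,000) sizes.
for n in (100, 10_000):
    for t in (0, 10, 100, 1000):
        print(n, t, round(expected_fst_under_drift(n, t), 4))
```

Divergence starts at zero at t = 0 and climbs much faster for the small population than for the large one.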

Now add to this the parameter of migration. Why is one migrant per generation sufficient to keep divergence low? The two extreme scenarios are like so:

  1. Large populations change allele frequency very slowly due to drift, so only a small proportion of migration is needed to prevent them from diverging
  2. Small populations change allele frequency very fast due to drift, so a larger proportion of migration is needed to prevent them from drifting

Within a large population one migrant is a small proportion, but drift is occurring very slowly. Within a small population drift is occurring fast, but one migrant is a relatively large proportion of a small population.
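The way population size cancels out can be checked with a few lines of Python. This is a minimal sketch of Wright’s approximation Fst ≈ 1/(4mN + 1); the function name and the particular values of N and m are illustrative assumptions.

```python
def equilibrium_fst(pop_size, migration_rate):
    """Approximate Fst at migration-drift balance: 1 / (4 * m * N + 1)."""
    return 1.0 / (4.0 * migration_rate * pop_size + 1.0)


# One migrant per generation (m * N = 1) in a small and a large population.
print(equilibrium_fst(100, 0.01))        # small population, 1% migrants
print(equilibrium_fst(10_000, 0.0001))   # large population, 0.01% migrants

# Varying the number of migrants per generation in a population of 1,000.
for migrants in (0.1, 1, 10):
    print(migrants, equilibrium_fst(1_000, migrants / 1_000))
```

Both of the first two lines come out at about 0.2, because only the product mN matters. Going from 1 to 10 migrants per generation drops the value to roughly 0.02, which gives some sense of why the more cautious conservation geneticists prefer the higher number.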

Obviously this is a stylized fact with many details which need elaborating. Some conservation geneticists believe that the focus on one migrant is wrongheaded, and the number should be set closer to 10 migrants.

But it still gets at a major intuition: gene flow is extremely powerful and effective at reducing differences between groups. This is why most geneticists are skeptical of sympatric speciation. Though the focus above is on drift, the same intuition applies to selective divergence. Gene flow between populations works at cross-purposes with selection which drives two groups toward different equilibrium frequencies.

This is why it was surprising when results showed that Mesolithic hunter-gatherers and farmers in Europe were extremely genetically distinct in close proximity for on the order of 1,000 years. That being said, strong genetic differentiation persists between Pygmy peoples and their agriculturalist neighbors, despite a long history of living near each other (Pygmies do not have their own indigenous languages, but speak the tongue of their farmer neighbors). In the context of animals physical separation is often necessary for divergence, but for humans cultural differences can enforce surprisingly strong taboos. Culture is as strong a phenomenon as mountains or rivers….

December 18, 2012

Unveiling the genealogical lattice

To understand nature in all its complexity we have to cut the riotous variety down to size. For ease of comprehension we formalize with math, verbalize with analogies, and visualize with representations. These approximations of reality are not reality, but when we look through the glass darkly they give us filaments of essential insight. Dalton’s model of the atom is false in important details (e.g., fundamental particles turn out to be divisible into quarks), but it still has conceptual utility.

Likewise, the phylogenetic trees popularized by L. L. Cavalli-Sforza in The History and Geography of Human Genes are still useful in understanding the shape of the human demographic past. But it seems that the bifurcating model of the tree must now be strongly tinted by the shades of reticulation. In a stylized sense inter-specific phylogenies, which assume the approximate truth of the biological species concept (i.e., little gene flow across lineages), mislead us when we think of the phylogeny of species on the microevolutionary scale of population genetics. On an intra-specific scale gene flow is not just a nuisance parameter in the model, it is an essential phenomenon which must be accommodated into the framework.


This is on my mind because of the emergence of packages such as TreeMix and AdmixTools. Using software such as these on the numerous public data sets allows one to perceive the reality of admixture, and overlay lateral gene flow upon the tree as a natural expectation. But perhaps a deeper result is that the character of the tree itself is torn asunder. The figure above is from a new paper, Efficient moment-based inference of admixture parameters and sources of gene flow, which debuts MixMapper. The authors bring a lot of mathematical heft to their exposition, and I can’t say I follow all of it (though some of the details are very similar to Pickrell et al.’s). But in short it seems that in comparison to TreeMix MixMapper allows for more powerful inference of a narrower set of populations, selected for exploring very specific questions. In contrast, TreeMix explores the whole landscape with minimal supervision. Having used the latter I can testify that that is true.

The big result from MixMapper is that it extends the result of Patterson et al., and confirms that modern Europeans seem to be an admixture between a “north Eurasian” population, and a vague “west Eurasian” population. Importantly, they find evidence of admixture in Sardinians, which implies that Patterson et al.’s original analyses were not sensitive to admixture in putative reference populations (note that Patterson is a coauthor on this paper as well). The rub, as noted in the paper, is that it is difficult to estimate admixture when you don’t have “pure” ancestral reference populations. And yet here the takeaway for me is that we may need to rethink our whole conception of pure ancestral populations, and imagine a human phylogenetic tree as a series of lattices in eternal flux, with admixed nodes periodically expanding so as to generate the artifice of a diversifying tree. The closer we look, the more likely it seems that most of the populations which have undergone demographic expansion in the past 10,000 years are also the products of admixture. Any story of the past 10,000 years, and likely the past 100,000 years, must give space at the center of the narrative arc to lateral gene flow across populations.

Cite: arXiv:1212.2555 [q-bio.PE]

December 10, 2012

Is Daniel MacArthur ‘desi’?

My initial inclination in this post was to discuss a recent ordering snafu which resulted in many of my friends being quite peeved at 23andMe. But browsing through their new ‘ancestry composition’ feature I thought I had to discuss it first, because of some nerd-level intrigue. Though I agree with many of Dienekes’ concerns about this new feature, I have to admit that at least this method doesn’t give out positively misleading results. For example, I had complained earlier that ‘ancestry painting’ gave literally crazy results when they weren’t trivial. It said I was ~60 percent European, which makes some coherent sense in their non-optimal reference population set, but then stated that my daughter was >90 percent European. Since 23andMe did confirm she was 50% identical by descent with me these results didn’t make sense; some readers suggested that there was a strong bias in their algorithms to assign ambiguous genomic segments to ‘European’ heritage (this was a problem for East Africans too).

Here’s my daughter’s new chromosome painting:

One aspect of 23andMe’s new ancestry composition feature is that it is very Eurocentric. But, most of the customers are white, and presumably the reference populations they used (which are from customers) are also white. Though there are plenty of public domain non-white data sets they could have used, I assume they’d prefer to eat their own data dog-food in this case. But that’s really a minor gripe in the grand scheme of things. This is a huge upgrade from what came before. Now, it’s not telling me, as a South Asian, very much. But, it’s not telling me ludicrous things anymore either!

But in regards to omission I am curious to know why this new feature rates my family as only ~3% East Asian, when other analyses put us in the 10-15% range. The problem with very high values is that South Asians often have some residual ‘eastern’ signal, which I suspect is not real admixture, but is an artifact. Nevertheless, northeast Indians, including Bengalis, often have genuine East Asian admixture. On PCA plots my family is shifted considerably toward East Asians. The signal they are picking up probably isn’t noise. Almost every apportionment of East Asian ancestry I’ve seen for my family yields a greater value for my mother, and that holds here. It’s just that the values are implausibly low.

In any case, that’s not the strangest thing I saw. I was clicking around people who I had “shared” genomes with, and I stumbled upon this:

As you can guess from the screenshot this is Daniel MacArthur’s profile. And according to this ~25% of chromosome 10 is South Asian! On first blush this seemed totally nonsensical to me, so I clicked around other profiles of people of similar Northern European background…and I didn’t see anything equivalent.

What to do? It’s going to take more evidence than this to shake my prior assumptions, so I downloaded Dr. MacArthur’s genotype. Then I merged it with three HapMap populations, the Utah whites (CEU), the Gujaratis (GIH), and the Chinese from Denver (CHD). The last was basically a control. I pulled out chromosome 10. I also added Dan’s wife Ilana to the data set, since I believe she got typed with the same Illumina chip, and is of similar ethnic background (i.e., very white). It is important to note that only 28,000 SNPs remained in the data set. But usually 10,000 is more than sufficient on SNP data for model-based clustering with inter-continental scale variation.

I did two things:

1) I ran ADMIXTURE at K = 3, unsupervised

2) I ran an MDS, which visualized the genetic variation in multiple dimensions

Before I go on, I will state what I found: these methods supported the inference from 23andMe. On chromosome 10 Dr. MacArthur seems to have an affinity with South Asians (i.e., this is his ‘curry chromosome’). Here are the average (median) values in tabular format, with MacArthur and his wife presented for comparison.

ADMIXTURE results for chromosome 10

Population / individual   K 1    K 2    K 3
CEU                       0.04   0.02   0.93
GIH                       0.87   0.05   0.08
CHD                       0.01   0.97   0.01
Daniel MacArthur          0.29   0.07   0.64
Ilana Fisher              0.01   0.06   0.94

You probably want a distribution. Out of the non-founder CEU sample none went above 20% South Asian. Though it did surprise me that a few were that high, making it more plausible to me that MacArthur’s results on chromosome 10 were a fluke:

And here’s the MDS with the two largest dimensions:

Again, it’s evident that this chromosome 10 is shifted toward South Asians. If I had more time right now what I’d do is probably get that specific chromosomal segment, phase it, and then compare it to various South Asian populations. But I don’t have time now, so I went and checked out the results from the Interpretome. I cranked up the settings to reduce the noise, and so that it would only spit out the most robust and significant results. As you can see, again chromosome 10 comes up as the one which isn’t quite like the others.

Is there a plausible explanation for this? Perhaps Dr. MacArthur can call up a helpful relative? From what I recall his parents are immigrants from the United Kingdom, and it isn’t unheard of that white Britons do have South Asian ancestry which dates back to the 19th century. Though to be totally honest I’m rather agnostic about all this right now. This genotype has been “out” for years now, so how is it that no one has noticed this peculiarity??? Perhaps the issue is that everyone was looking at the genome wide average, and it just doesn’t rise to the level of notice? What I really want to do is look at the distribution of all chromosomes and see how Daniel MacArthur’s chromosome 10 then stacks up. It might be a random act of nature yet.

Also, I guess I should add that at ~1.5% South Asian that would be consistent with one of MacArthur’s great-great-great-great grandparents being Indian. Assuming 25 year generation times that puts them in the mid-19th century. Of course, at such a low proportion the variance is going to be high, so it is quite possible that you need to push the real date of admixture one generation back, or one generation forward.
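The back-of-the-envelope arithmetic here is easy to sketch in Python. The function names are mine, and the calculation is only the genealogical expectation (each generation halves the expected contribution of a given ancestor); as noted, the share actually inherited varies around it.

```python
def expected_ancestry_fraction(generations_back):
    """Expected genomic share from a single ancestor g generations back."""
    return 0.5 ** generations_back


def years_before_birth(generations_back, years_per_generation=25):
    """Rough time depth of that ancestor, given a fixed generation time."""
    return generations_back * years_per_generation


# A great-great-great-great grandparent is 6 generations back.
print(expected_ancestry_fraction(6))   # 0.015625, i.e. about 1.5%
print(years_before_birth(6))           # 150 years with 25-year generations

# One generation either side, to show how much the expectation shifts.
for g in (5, 6, 7):
    print(g, expected_ancestry_fraction(g))
```

One generation closer doubles the expected share to about 3%, and one generation further halves it to under 1%, which is why an estimate this small can’t pin the date down tightly.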

December 1, 2012

Northern Europeans and Native Americans are not more closely related than previously thought

A new press release is circulating on the paper which I blogged a few months ago, Ancient Admixture in Human History. Unlike the paper, the title of the press release is misleading, and unfortunately I notice that people are circulating it, and probably misunderstanding what is going on. Here’s the title and first paragraph:

Native Americans and Northern Europeans More Closely Related Than Previously Thought

Released: 11/30/2012 2:00 PM EST
Source: Genetics Society of America

Newswise — BETHESDA, MD – November 30, 2012 — Using genetic analyses, scientists have discovered that Northern European populations—including British, Scandinavians, French, and some Eastern Europeans—descend from a mixture of two very different ancestral populations, and one of these populations is related to Native Americans. This discovery helps fill gaps in scientific understanding of both Native American and Northern European ancestry, while providing an explanation for some genetic similarities among what would otherwise seem to be very divergent groups. This research was published in the November 2012 issue of the Genetics Society of America’s journal GENETICS

The reality is that Native Americans and Northern Europeans are not more “closely related” genetically than they were before this paper. There has been no great change to standard genetic distance measures or phylogeographic understanding of human genetic variation. A measure of relatedness is to a great extent a summary of historical and genealogical processes, and as such it collapses a great deal of disparate elements together into one description. What the paper in Genetics outlined was the excavation of specific historically contingent processes which result in the summaries of relatedness which we are presented with, whether they be principal component analysis, Fst, or model-based clustering.

What I’m getting at can be easily illustrated by a concrete example. To the left is a 23andMe chromosome 1 “ancestry painting” of two individuals. On the left is me, and the right is a friend. The orange represents “Asian ancestry,” and the blue represents “European” ancestry. We are both ~50% of both ancestral components. This is a correct summary of our ancestry, as far as it goes. But you need some more information. My friend has a Chinese father and a European mother. In contrast, I am South Asian, and the end product of an ancient admixture event. You can’t tell that from a simple recitation of ancestral quanta. But it is clear when you look at the distribution of ancestry on the chromosomes. My components have been mixed and matched by recombination, because there have been many generations between the original admixture and myself. In contrast, my friend has not had any recombination events between his ancestral components, because he is the first generation of that combination.

So what the paper publicized in the press release does is present methods to reconstruct exactly how patterns of relatedness came to be, rather than reiterating well understood patterns of relatedness. With the rise of whole-genome sequencing and more powerful computational resources to reconstruct genealogies we’ll be seeing much more of this to come in the future, so it is important that people are not misled as to the details of the implications.

October 10, 2012

A plea for population genetics

Filed under: Population genetics — Razib Khan @ 9:31 pm

The title here is somewhat misleading. This is not just a plea for population genetics, but for quantitative genetics as well. Genetics is a big field. But today it is defined by and large by DNA, the concrete entity in which the abstraction of the gene is embedded. Look at the header of this website, or the background to my Twitter account. Mind you, I’m pathetically informed about molecular genetics, and don’t have a strong interest in the topic! I did consider using the H.W.E. or the breeder’s equation for the header, but in the end I judged it too abstruse and unfamiliar to most readers. DNA dominates when it comes to the modern mental conception of genetics, and we have to live with it to some extent.
But there is also great value in the genetics which has intellectual roots in the pre-DNA Mendelians and biometricians. This genetics exhibits a symbiotic, but not necessary, association with genetics as a branch of biophysics. Yet I come here not to insult or impugn my friends who toil in the trenches of the molecular wars. Rather, I simply want ...

September 27, 2012

Paleopopulation Genetics

Filed under: Genetics,Population genetics — Razib Khan @ 9:57 pm

It seems a new field is being born! Jeff Wall & Monty Slatkin have a pretty thorough review out, Paleopopulation Genetics:

Paleopopulation genetics is a new field that focuses on the population genetics of extinct groups and ancestral populations (i.e., populations ancestral to extant groups). With recent advances in DNA sequencing technologies, we now have unprecedented ability to directly assay genetic variation from fossils. This allows us to address issues, such as past population structure, changes in population size, and evolutionary relationships between taxa, at a much greater resolution than can traditional population genetics studies. In this review, we discuss recent developments in this emerging field as well as prospects for the future.

Nothing very new for close readers of this weblog, but the references are useful for later mining.

August 28, 2012

Evolutionary & population genetics preprints – Haldane’s Sieve

OK, perhaps I can help with that. Dr. Coop speaks of the collaboration between himself & Dr. Joseph Pickrell, Haldane’s Sieve, which I added to my RSS days ago (and you can see me pushing it to my Pinboard). From the “About”:

As described above, most posts to Haldane’s Sieve will be basic descriptions of relevant preprints, with little to no commentary. All posts will have comment sections where discussion of the papers will be welcome. A second type of post will be detailed comments on a preprint of particular interest to a contributor. These posts could take the style of a journal review, or may simply be some brief comments. We hope they will provide useful feedback to the authors of the preprint. Finally, there will be posts by authors of preprints in which they describe their work and place it in broader context.

We ask the commenters to remember that by submitting articles to preprint servers the authors (often biologists) are taking a somewhat unusual step. Therefore, comments should be phrased in a constructive manner to aid the authors.

It might be helpful if other evolution/genetics bloggers ...

July 22, 2012

What is inbreeding?

Filed under: Inbreeding,Population genetics — Razib Khan @ 9:23 pm

I’ve put up a bunch of posts relating to inbreeding recently (1, 2, 3, 4). But I haven’t really defined it. First, let’s stipulate what inbreeding is not: it is not the same as incest. Acts of incest can include individuals who have no blood relationship to each other (e.g., Hamlet). Additionally, there are instances of inbreeding which are not necessarily incestuous. If a population is highly inbred, then individuals who are not relations by social custom may still be so genetically similar to a point where the pairing can not credibly be stated as an outcross. But still, what do I mean? To refresh myself I re-read the section on inbreeding in Hartl & Clark. And I think that helped clarify one implicit assumption which I have which may not be clear to everyone, and I’ll get to that.

In any case, first, what’s the deal with inbreeding? The short answer is that inbreeding is a measure of the probability of identity by descent of two alleles at a given locus in a given individual. This concise definition itself is the problem. These are all abstract concepts, close to being ...

July 1, 2012

On theoretical evolutionary genetics

Filed under: Population genetics,theoretical evolutionary genetics — Razib Khan @ 10:13 pm

Joe Felsenstein in the comments:

The books you have listed are good ones, by fine people. But may I immodestly suggest a book of mine? If you want to work your way through the theory of theoretical population genetics, I have set of notes for my Genome 562 course, a textbook. It is a freely-downloadable PDF (start with my website by clicking on my name in this comment). It’s not for everyone but I think those interested in knowing how the theory actually works in more detail will benefit from it. As it’s free, I have no monetary interest in calling your attention to it, just pure ego. (And if you want a one-locus population genetics simulation program, try PopG at my lab’s website too — Google “Felsenstein PopG”).

Many of the books I recommended below are rather expensive. Theoretical Evolutionary Genetics (PDF) is not. Unfortunately much of the discourse of contemporary science is beyond the financial means of much of the world’s population, whether it be in university press textbooks or gated journals. So I’m quite happy in putting up a link to this text-in-progress.

Learning population and evolutionary genetics

Filed under: Population genetics — Razib Khan @ 11:44 am

A reader emailed me to ask what I thought would be a good way to better understand some of the more technical posts I put up.

First, two course notes which I’ve found useful as personal references:

- EEB 5348 — Population Genetics

- Evolutionary Quantitative Genetics, Uppsala University (if you are ambitious, bookmark this too)

Some people might argue that John Gillespie’s Population Genetics: A Concise Guide (Kindle edition) is a touch too abstruse and cryptic for the introductory reader. It’s short, and the mathematics isn’t challenging, but because of its concision the author can sometimes unleash upon you nearly cryptic formalism, perhaps defeating the purpose of a soft introduction in the first place. To get the most out of this book you probably, ironically, have to have a more thorough textbook on hand to clear up those particular points which you find confusing. But to get the general logic of population genetics and establish familiarity this seems to be the right entry point (assuming you’re not too terrified by algebra).

Of course most readers of this weblog are focused ...

March 7, 2012

Where the wild clines aren’t

Filed under: Anthroplogy,Human Genetics,Human Genomics,Population genetics,race — Razib Khan @ 7:38 pm

In the recent ‘do human races exist’ controversy Nick Matzke’s post Continuous geographic structure is real, “discrete races” aren’t has become something of a touchstone (perhaps like Cosma Shalizi’s post on I.Q. and heritability).* In the post Matzke emphasized the idea of clines, roughly a continuous gradient of genetic change over space. Fair enough. But in the map above I traced two linear transects. I would suggest that anyone who has a general understanding of the demographics of South-Central Eurasia would immediately anticipate that these transects would reveal a relatively sharp break in allele frequencies. True, there are intermediate populations between the two end points, in Nepal, and on the fringes of India’s northeastern states. But clearly about halfway through the southwest-northeast transect you’ll see a rapid shift in allele frequencies. The blue transect is different, insofar as the change occurs very near its eastern pole. In Bengal, 85% of the length of the transect from its western terminus, the populations will still be far closer genetically to those on the western pole than those just to the east!

 

I thought of this when I saw that Zack had posted a Tibetan data set from Qinghai. As the crow flies Qinghai is closer to the plains of North India than peninsular South India, but Zack found Tibetans from this region to be only ~1 percent South Asian. That’s likely to be close to noise. I assume this does not surprise anyone. Despite the fact that North India is very populous in relation to Tibet, it turns out that geographical barriers are very strong in discouraging gene flow (note that Tibet and North India are actually culturally related; Tibetan Buddhism has its origins in the Tantric Buddhism of Bengal). This is one of my major “beefs” with the idea that “race does not exist” because of clines. I think this is a robust point when it comes to there being no Middle Eastern race vs. Scandinavian race. The clines are real and gradual between these two population sets. But I do think there has been strong differentiation between populations from the antipodes of Eurasia. I suspect that the emergence of more flexible lifestyles (e.g., oasis agriculture, horse nomadism) has in fact resulted in far greater connections between the isolated zones of Western and Eastern Eurasia over the past 10,000 years than before. In fact, one can conceptualize it as a twofold process. On the one hand you had very powerful expansions from small initial founder groups across macro-regions such as Western Eurasia and the Far East. This resulted in a decrease of genetic difference within these zones through the power of homogenization, though it increased Fst in the few zones of direct contact across the zones. But the “empty zones” of Central Eurasia may also have filled up with “proto-Silk Road” centers over the past ~10,000 years, resulting in more frequent long term connections between the macro-regions than had heretofore been possible.

* I guess I should divulge that I have socialized with Nick Matzke and that we share common friends.

February 12, 2012

The social and biological construction of race

Filed under: Anthroplogy,Hispanics,Latinos,Population genetics,race — Razib Khan @ 2:45 pm

Many of our categories are human constructions which map upon patterns in nature which we perceive rather darkly. The joints about which nature turns are as they are, our own names and representations are a different thing altogether. This does not mean that our categories have no utility, but we should be careful of confusing empirical distributions, our own models of those distributions, and reality as it is stripped of human interpretative artifice.

I have argued extensively on this weblog that:

1) Generating a phylogeny of human populations and individuals within those populations is trivial. You don’t need many markers, depending on the grain of your phylogeny (e.g., to differentiate West Africans vs. Northern Europeans you actually can use one marker!).

2) These phylogenies reflect evolutionary history, and the trait differences are not just superficial (i.e., “skin deep”).

The former proposition I believe is well established. A group such as “black American” has a clear distribution of ancestries in a population genetic sense. The latter proposition is more controversial and subject to contention. My own assumption is that we will know the truth of the matter within the generation.


A black American

But that is the biological construction of race. Subject to fudge and fuzziness, but mapping upon a genuine reality. What about the social construction? Due to its flexibility this is a much more difficult issue to characterize in a succinct manner. Consider the cultural conditionals which render G. K. Butterfield “black” and Luis Guzman “Hispanic.” Both individuals are products of an admixture between people of mixed African and European ancestry (and likely some Amerindian in Guzman’s case). It turns out that the genes have segregated out such that Butterfield reflects more his European ancestry in traits. Guzman’s phenotype is more mixed. The perception of these two individuals is weighted by two different strains in modern American racial ideology. First, that of hypodescent, where one drop of black blood means that an individual is black, without equivocation. Halle Berry appealed to this framework to argue why her daughter, who is less than 1/4 African in ancestry (Berry’s African American father almost certainly had some European ancestry), was black. No matter that hypodescent’s origins were to buttress white racial supremacy and purity. Today black Americans espouse it for purposes of community solidarity (the black American community as we know it is partly a product of hypodescent, which forced mixed-race blacks into the African American community).


Not a black American

The second issue, which has crystallized in our time, but has roots back decades, is the peculiar position of “Hispanics/Latinos” in the American racial system. As A. D. Powell has observed Hispanics seem to be able to evade the one drop rule, unless their African features are extremely dominant (e.g., pre-skin whitening Sammy Sosa). I’ve looked at the genotypes of enough Latin Americans to assume that some level of African ancestry (e.g., ~5%) is present in the vast majority of those who are not the children of recent European immigrants or from indigenous communities. For example, Mexico’s large slave population seems to have been totally absorbed, to the point where their past existence has been nearly forgotten. Mexicans of mestizo or white identity routinely have African ancestry, they just don’t know it, nor is it part of their racial identity. And it isn’t just Latinos. People of Middle Eastern ancestry, in particular Arabs, often have some African ancestry. But they are not classified as black (unlike Hispanics/Latinos they don’t have their own ethnic category, but are put into the “white” box, irrespective of their race, from Afro-Arab to Syrian).

This broader coexistence of frameworks persists on the implicit level. We don’t usually explicitly flesh out these details. Rather, we take these social constructions as givens. The major problem is when the problems and artificialities of these social constructions begin to bleed over into attempts to understand patterns of biological variation. Because of America’s fixation on the black-white dichotomy rooted in skin color people routinely offer up the fact that the human phylogeny is not well correlated with pigmentation as a refutation of the concept of race. What biology is doing is refuting a peculiar social construction of race. It is not negating the reality of human population substructure. Sociology and cultural anthropology are empires of imagination to a much greater extent than human biology.

I’m thinking of this because with the birth of my daughter I confronted the bleeding over of the social into the biological. For medical purposes her race had to be assessed. One side of her ancestry was not problematic; white European. But I had to argue for why her other half should not be listed as “Asian.” For sociological purposes I have no great issue with the term Asian American which is inclusive of South and East Asians (I am not denying that this is a recent political identity, I am saying that I do not personally find it objectionable and routinely enter my race as “Asian American” into public forms). But for biological purposes this is an incoherent and misleading classification. I know when my sister was born my parents put her race as “Asian,” which even at the time I felt was totally without purpose as far as biological taxonomy went. At the end of it all my daughter had “South Asian” entered in by hand. Better that her information be discarded than aggregated into a data set in a misleading fashion.

Obviously disentangling the social and biological is not necessarily impossible. Rather, it takes a little care and explicitness, as it is easy to move between the two domains so smoothly as to elide their differences. And to some extent they do inform each other. Personal genomics is adding a new twist, but the general problem is as old as human systematics. The only cure is care.

Image credit: Wikipedia

January 26, 2012

1 migrant needed to prevent genetic divergence

Filed under: 1 migrant rule,Conservative Genetics,Population genetics — Razib Khan @ 2:09 am

In the survey below I asked if you knew about how many migrants per generation were needed to prevent divergence between populations. About 80 percent of you stated you did not know the answer. That was not totally surprising to me. The reason I asked is that the result is moderately obscure, but also rather surprisingly simple and fruitful. The rule of thumb is that 1 migrant per generation is needed to prevent divergence.*

It doesn’t tell you much in and of itself of course. But if you think about it you can inject that fact into all sorts of other population genetic phenomena. For example, to have selection across two populations which is not reducible to selection within those populations (i.e., inter-demic selection) you need group-level genetic differences. These differences can be measured by the Fst statistic. In short the value of Fst tells you the proportion of variation which can be attributed to between-group differences (e.g., Fst across human races is ~0.15). For natural selection to have any adaptive effect you also need heritable variation. If you have lots of heritable variation selection can be weaker, while if you have little heritable variation selection has to be very strong (see response to selection). Fst is a rough gauge of heritable variation when you are evaluating group level differences. An Fst of 1.0 would imply that the groups are perfectly distinct at the loci of interest, while an Fst of 0.0 would imply that the groups are not genetically distinct at all. With no distinction selection would have no efficacy in terms of driving adaptation. All this is a long way of saying that the 1 migrant rule is one reason that evolutionary biologists take a skeptical position in relation to group selection. It tends to quickly erase the variation which group selection depends upon.
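To see what “proportion of variation between groups” means in practice, here is a minimal sketch for one biallelic locus in two equally sized populations. It assumes the textbook heterozygosity form Fst = (Ht - Hs)/Ht; the function names and the example allele frequencies are hypothetical, chosen only for illustration.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Fst = (Ht - Hs) / Ht for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_total = (p1 + p2) / 2.0
    h_total = heterozygosity(p_total)  # pooled population
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # subpopulation mean
    if h_total == 0:
        return 0.0  # same allele fixed everywhere, so nothing to partition
    return (h_total - h_within) / h_total


print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(1.0, 0.0))  # fixed for different alleles
print(fst_two_populations(0.7, 0.3))  # intermediate case
```

Identical frequencies give 0, fixation for alternate alleles gives 1, and frequencies of 0.7 versus 0.3 give roughly 0.16, close to the ~0.15 quoted above.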

 

To make it concrete, here is the equation which you use to generate the equilibrium F statistic:

Fst = 1/(4Nm + 1)

In this formula N = the population size, and m = the proportion of migrants within the population within a given generation. Nm then works out to be the number of migrants in any given generation. So 1 migrant per generation would mean that for 1,000 individuals m = 0.001. For 100 individuals, m = 0.01. To see the power of a given number of migrants per generation on long term Fst, the measure of between population difference, I’ve plotted some computed results below (Fst on the y-axis, Nm on the x-axis).
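The computed results can be reproduced with a few lines. This is a minimal sketch assuming the standard island-model approximation, Fst ≈ 1/(4Nm + 1), with N and m as defined above; the function name `equilibrium_fst` and the particular values of N and Nm are illustrative choices.

```python
def equilibrium_fst(n, m):
    """Equilibrium Fst under the island-model approximation 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * n * m + 1.0)


# One migrant per generation gives the same equilibrium at any population size.
for n in (100, 1000, 10000):
    print(n, round(equilibrium_fst(n, 1.0 / n), 3))

# Fst as a function of the number of migrants per generation (N held at 1,000).
for nm in (0, 0.1, 0.5, 1, 2, 5, 10):
    print(nm, round(equilibrium_fst(1000, nm / 1000.0), 3))
```

With Nm = 1 the equilibrium is 0.2 whatever the value of N, and with Nm = 0 it is 1.0. Most of the drop in Fst happens over the first few migrants per generation.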

 

This should make intuitive sense. If there is no migration (gene flow) between populations then over the long term they become perfectly distinct. As you increase migration naturally that is going to homogenize differences between populations. But I suspect the question you may still have is how is it that only a few individuals are necessary in even large populations to prevent differentiation?

Here the intuition is simple. In a neutral scenario between-population differences emerge as gene frequencies change over time. The variance of the generation to generation change is inversely proportional to population size. This is simply the sampling variance, or transmission noise. It is going to be proportional to 1/N, where N is the population (2N for diploid). As N gets rather large you converge upon zero. So as the population gets very large there is less and less divergence which may occur in one given generation. In contrast you have a lot of generation to generation variation, and rapid change in frequency, in a small population. So why only 1 migrant? In a large population 1 migrant does not effect much change, but much change is not necessary. In a small population it has much more impact, but the generation to generation change is also much bigger. These two dynamics work at cross purposes so that the number of migrants needed remains relatively insensitive to population size.
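The way the two forces scale together can be checked with a crude calculation. The sketch below assumes binomial sampling variance p(1 - p)/(2N) for drift in a diploid population and a deterministic pull of m × (migrant frequency - local frequency) for migration, with one migrant per generation (m = 1/N). The allele frequencies 0.3 and 0.5 are hypothetical. It is a scaling check, not a derivation of the equilibrium.

```python
def drift_variance(p, n):
    """One-generation sampling variance of allele frequency (diploid, size n)."""
    return p * (1.0 - p) / (2.0 * n)


def migration_shift(p, p_migrants, m):
    """Expected frequency change when a fraction m of the population is migrants."""
    return m * (p_migrants - p)


p_local, p_migrants = 0.3, 0.5
for n in (100, 1000, 10000):
    m = 1.0 / n  # exactly one migrant per generation
    drift = drift_variance(p_local, n)
    pull = migration_shift(p_local, p_migrants, m)
    print(n, drift, pull, drift / pull)
```

Both quantities shrink as 1/N, so their ratio stays the same at every population size. That is why the number of migrants, rather than the migration rate, is what matters.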

* This is the result derived from population genetics; some ecological geneticists have made the case that you may actually need 10 migrants, with 1 being the lower boundary.

Image credit: Wikipedia

December 18, 2011

Nature really is real

Filed under: Human Genetics,Population genetics — Razib Khan @ 2:30 am

I generated the figure at left from table 9.6 in The Genetics of Human Populations. This book was published in 1971, but I purchased the 1999 edition (which was simply a republication of the original text by Dover) in 2005.* At the time I recall reading the section on inferring the number of genetic loci implicated in the variation in pigmentation with some mild skepticism. The authors, L. L. Cavalli-Sforza and W. F. Bodmer, pegged the black-white difference as due to ~4 genes. Their data set consisted of individuals of various races in Liverpool: whites, blacks, people with one white parent and one black parent (F1 hybrids), people with three grandparents of one race and one of another (“backcrosses,” where you take an F1 and mate them with one of the parental lines), and finally, F2 individuals who are the product of pairings of F1s.


To come to the estimate the authors made some assumptions. For example, they assumed that blacks and whites were disjoint on the genes which encoded skin color in terms of their variants. Because these two populations lay at the opposite poles of the phenotypic distribution for humans it’s a natural assumption, but they had nothing to go on besides their hunch at the time. It turns out though that to a good first approximation this is actually a valid assumption. If you assume that the two populations are fixed at the allelic variants, that they don’t have segregating alleles which encode variation, then whites and blacks should exhibit the same variance due to environmental forces. This is what the authors saw. Using skin reflectance measures it seems that blacks and whites varied the same amount about their mean. If the two populations are approximately homozygous then the F1 generation, which are heterozygotes, should be between the two parental populations in trait value, but not exhibit much greater variance. Recall that they’d inherit a black and a white copy at every locus. Therefore, all the variance in this population should also be environmental, rather than genetic. The real action comes in the backcrosses and the F2 generation. In these two populations segregation will result in a genetic variance component which will inflate the total variance. Therefore, genetic variance on this trait can be estimated like so:

Genetic variance = Total variance – Environmental variance

Recall that we already estimated environmental variance earlier. So genetic variance can be inferred by subtraction. Why do we see this pattern? Think about what happens when F1s cross at a single locus. They’re heterozygotes. 50% of their offspring will be of like genotype. But 50% will revert back to one of the parental genotypes. This naturally results in increased variance. Similarly, with a backcross 50% of the offspring will be heterozygotes, while 50% will be homozygotes.

Now let’s focus on a term, a, which defines the additive effect of a variant. It turns out that for their data set they didn’t see any dominance effect, so the model is rather simple here where you have an environmental component, and an additive genetic component. The variance of this is:

Va ~ 1/2 ∑a², i.e., 1/2 × the sum of a²
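The 1/2 in front of each a² can be checked by enumerating genotypes at a single locus. This is a sketch under the purely additive model described above, with trait values measured relative to the heterozygote, so the two homozygotes sit at -a and +a. The helper `genotype_variance` and the choice of a = 1 are illustrative assumptions.

```python
from fractions import Fraction


def genotype_variance(distribution):
    """Variance of trait values given a {value: probability} mapping."""
    mean = sum(value * prob for value, prob in distribution.items())
    return sum(prob * (value - mean) ** 2 for value, prob in distribution.items())


a = Fraction(1)  # additive effect, in arbitrary units

# F1: every individual is a heterozygote, so there is no genetic variance.
f1 = {Fraction(0): Fraction(1)}
# F2 (F1 x F1): 1/4 one homozygote, 1/2 heterozygote, 1/4 the other homozygote.
f2 = {-a: Fraction(1, 4), Fraction(0): Fraction(1, 2), a: Fraction(1, 4)}
# Backcross (F1 x parental line): half heterozygotes, half parental homozygotes.
backcross = {Fraction(0): Fraction(1, 2), a: Fraction(1, 2)}

print(genotype_variance(f1))         # 0
print(genotype_variance(f2))         # 1/2, i.e. a^2 / 2
print(genotype_variance(backcross))  # 1/4, i.e. a^2 / 4
```

Summing the F2 result over independent loci gives the 1/2 × sum of a² form used here. Under the same assumptions the backcross contributes half as much per locus.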

As per your model you can replace the sum by a multiplicative factor, the number of genes which produce the additive effect, k, and turn the additive effect into a mean. So you have:

2Va ~ k × mean(a)²

Now, recall that we have the mean values for whites and blacks. 1/2 of the difference between these is equal to:

k × mean(a)

In this system so far we know Va, and we know the mean values for the parental populations. What we don’t know is k. So we need to set up the equation so that k is the unknown which we’re computing with the values we have. Some algebra leads to this formula:

k = [1/2(mean value white - mean value black)]²/(2Va) (if you put k × mean(a) into the numerator in the right spot you can get 2Va to work out)

From their values:

k = (0.098)²/(2 × 0.001215) ~ 4
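The arithmetic is easy to verify. The sketch below plugs the two quoted values into the formula above, taking 0.098 as the halved difference between the parental means (as the formula implies) and 0.001215 as Va. The function name is an illustrative assumption.

```python
def estimate_gene_count(half_parental_difference, additive_variance):
    """k = (half the difference between parental means)^2 / (2 * Va)."""
    return half_parental_difference ** 2 / (2.0 * additive_variance)


k = estimate_gene_count(0.098, 0.001215)
print(round(k, 2))  # 3.95
```

The estimate goes up with the square of the parental difference and down with Va, so modest errors in either input move k quite a bit.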

k ~ 4 means that, from the variance of effects, they estimate there are 4 genes. When I first saw this I thought that the result was rather crude. But it turns out they were about right! In some ways they got lucky; pigmentation is a notoriously large-effect ‘polygenic’ trait. But it’s still rather awesome to see that old genetic methods can yield answers which are validated in our time.

Addendum: Just to be clear, some of the data here is really rough & ready. The inference of 4 genes has a huge error due to small sample sizes in some of the sets (e.g., F2). And yet it turned out they were about right! Some of this may have been luck, but in this case the trait really was only barely polygenic.

* I just pretended that the trait was normally distributed, which probably overestimated the standard deviation, producing more overlap than is empirically justifiable.

December 16, 2011

James F. Crow in Genetics

Filed under: James F. Crow,Population genetics — Razib Khan @ 12:03 am

At 95 James F. Crow is not only an eminent population geneticist, but he knew the figures who were responsible for the whole field. The journal Genetics has commissioned a series of essays and perspectives in his honor. The first is by Daniel Hartl. I thought this was funny:

Soon after joining the program I asked Professor Crow whether I could join his lab as a graduate student. He thought for a moment and then said, “Yes, Dan, provided you understand that population genetics is a recondite field that will never be of great interest except to a small group of specialists.” I remember this because afterward I hurried to look up “recondite” in the dictionary. His admonition made population genetics seem like some variety of monasticism, which, being an admirer of Gregor Mendel, was all right by me. Little did either of us foresee that genetics would be transformed in our lifetimes by genomic sequencing on a population scale and the development of computer technologies capable of analyzing terabytes of data and that population genetics would become a key approach for understanding human evolutionary history as well as for identifying genetic risk factors for common diseases.

I had the privilege of interviewing Crow in 2006. My email requesting an interview was sent only on the smallest probability of a reply, but he replied immediately! And when I sent my questions again the reply was nearly immediate. My favorite of Crow’s answers: “In my view it is wrong to say that research in this area — assuming it is well done — is out of order. I feel strongly that we should not discourage a line of research because someone might not like a possible outcome.” At his age he’s seen many fashions come and go. But nature abides and persists.
