Razib Khan One-stop-shopping for all of my content

May 13, 2010

Life is One, universal common ancestry supported

Filed under: Common Descent,Genetics,Genomics,Statistics — Razib Khan @ 2:01 am

One of the notions implicit in most evolutionary models is that the tree of life has a common root. In other words, all individuals of all species represent end points of lineages which ultimately coalesce back to the original common ancestor. The first Earthling, so to speak. I say implicit because common ancestry isn’t necessary for evolution to be valid; after all, we presumably accept that evolutionary process is operative in an exobiological context, if such a context exists. Therefore it is possible that modern extant lineages are derived from separate independent antecedents. A “multiple garden” model. This has seemed less and less plausible as the molecular basis of biology has been elucidated; it looks like the basic toolkit is found all across the tree of life. But with a newfound awareness of the power of processes such as horizontal gene transfer, the open-and-shut case is faced with a new element of ambiguity. Or perhaps not?

Here’s a post from Wired, Life on Earth Arose Just Once:

The idea that life forms share a common ancestor is “a central pillar of evolutionary theory,” says Douglas Theobald, a biochemist at Brandeis University in Waltham, Massachusetts. “But recently there has been some mumbling, especially from microbiologists, that it may not be so cut-and-dried.”

Because microorganisms of different species often swap genes, some scientists have proposed that multiple primordial life forms could have tossed their genetic material into life’s mix, creating a web, rather than a tree of life.

To determine which hypothesis is more likely correct, Theobald put various evolutionary ancestry models through rigorous statistical tests. The results, published in the May 13 Nature, come down overwhelmingly on the side of a single ancestor.

A universal common ancestor is at least 10^2,860 times more probable than having multiple ancestors, Theobald calculates.
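To get a feel for the size of that number, read it as a ratio of 10 raised to the power 2,860. A ratio that large cannot be stored as an ordinary floating-point value, so comparisons like this are carried out on a log scale. The sketch below is only arithmetic on the quoted exponent; the variable names are my own, and it makes no claim about how the paper computed the figure.

```python
import math

# Exponent taken from the quoted figure (10 to the power 2,860).
log10_ratio = 2860

# The raw ratio overflows a 64-bit float, which is why one works in logs.
overflows = math.isinf(float("1e2860"))

# Convert from base-10 log units to natural-log units: ln(10^x) = x * ln(10).
ln_ratio = log10_ratio * math.log(10)

print(overflows)           # whether the raw ratio is too large for a float
print(round(ln_ratio, 1))  # the same ratio expressed in natural-log units
```

In natural-log units the ratio comes to a difference of roughly 6,585, a scale on which scores of this kind are easier to compare.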

The paper is now on the Nature website, A formal test of the theory of universal common ancestry. The analysis looked specifically at 23 very conserved proteins across 12 taxa from the three domains of life (those being eukaryotes, bacteria, and archaea). Here’s where the author explains the philosophy behind the statistical technique:

When choosing among several competing scientific models, two opposing factors must be taken into account: the goodness of fit and parsimony. The fit of a model to data can be improved arbitrarily by increasing the number of free parameters. On the other hand, simple hypotheses (those with as few ad hoc parameters as possible) are preferred. Model selection methods weigh these two factors statistically to find the hypothesis that is both the most accurate and the most precise.
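To make the trade-off between fit and parsimony concrete, here is a minimal sketch using the Akaike information criterion (AIC), one widely used model selection score. This is an illustration, not the paper's calculation: the two models and all of their numbers are invented, and the names `aic`, `simple`, and `complex_model` are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Invented numbers for illustration only.
# The complex model fits slightly better (higher log-likelihood)
# but pays for ten extra free parameters.
simple = aic(log_likelihood=-1000.0, n_params=10)
complex_model = aic(log_likelihood=-995.0, n_params=20)

preferred = "simple" if simple < complex_model else "complex"
print(simple, complex_model, preferred)
```

With these made-up values the simpler model wins even though it fits a little worse, because the gain in likelihood does not cover the penalty for the additional parameters.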

The sorts of models compared are illustrated by figure 2. On the left you have the universal common descent model, and on the right the bacteria have an independent origin. The lines represent connections between the 23 conserved protein sequences, either through horizontal transfer or vertical transmission.


As noted in the Wired piece, there’s no contest here. Universal common descent is strongly supported. I’ll let the author finish:

What property of the sequence data supports common ancestry so decisively? When two related taxa are separated into two trees, the strong correlations that exist between the sequences are no longer modelled, which results in a large decrease in the likelihood. Consequently, when comparing a common-ancestry model to a multiple-ancestry model, the large test scores are a direct measure of the increase in our ability to accurately predict the sequence of a genealogically related protein relative to an unrelated protein. The sequence correlations between a given clade of taxa and the rest of the tree would be eliminated if the columns in the sequence alignment for that clade were randomly shuffled. In such a case, these model-based selection tests should prefer the multiple-ancestry model. In fact, in actual tests with randomly shuffled data, the optimal estimate of the unified tree (for both maximum likelihood and Bayesian analyses) contains an extremely large internal branch separating the shuffled taxa from the rest. In all cases tried, with a wide variety of evolutionary models (from the simplest to the most parameter rich), the multiple-ancestry models for shuffled data sets are preferred by a large margin over common ancestry models (LLR on the order of a thousand), even with the large internal branches. Hence, the large test scores in favour of UCA models reflect the immense power of a tree structure, coupled with a gradual Markovian mechanism of residue substitution, to accurately and precisely explain the particular patterns of sequence correlations found among genealogically related biological macromolecules.
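The shuffling control described in that passage can be sketched on a toy alignment. Everything here is an assumption made for illustration: the four sequences are invented, the function names are hypothetical, and I am reading "columns shuffled for that clade" as one shared permutation applied to every row in the clade. That reading keeps the clade internally consistent while breaking its site-by-site correspondence with the rest of the alignment.

```python
import random

def shuffle_clade_columns(alignment, clade, seed=0):
    """Permute the columns of the rows in `clade` with one shared permutation.

    Rows outside the clade are returned unchanged.
    """
    length = len(next(iter(alignment.values())))
    order = list(range(length))
    random.Random(seed).shuffle(order)
    return {
        name: "".join(seq[i] for i in order) if name in clade else seq
        for name, seq in alignment.items()
    }

def identity(a, b):
    """Fraction of aligned positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Invented toy alignment: four closely related sequences of equal length.
toy = {
    "taxon_A": "MKVLAAGIVGLRSTPEDQHNWYFC",
    "taxon_B": "MKVLAAGIVGLKSTPEDQHNWYFC",
    "taxon_C": "MKILAAGIVGLRSTPEEQHNWYFC",
    "taxon_D": "MKILAAGLVGLRSTPEEQHNWYFC",
}

shuffled = shuffle_clade_columns(toy, clade={"taxon_C", "taxon_D"})

# Within the shuffled clade the sequences still line up with each other...
print(identity(shuffled["taxon_C"], shuffled["taxon_D"]))
# ...but their correspondence with the untouched taxa is scrambled.
print(identity(toy["taxon_A"], toy["taxon_C"]),
      identity(shuffled["taxon_A"], shuffled["taxon_C"]))
```

The shuffled rows keep their amino acid composition and their similarity to each other; what is lost is the column-wise correlation with the other taxa, which is the signal the passage says a single tree is able to model.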

Citation: Theobald, Douglas L., A formal test of the theory of universal common ancestry, Nature, doi:10.1038/nature09014
