Razib Khan One-stop-shopping for all of my content

March 18, 2019

The evolution of languages

Filed under: Diversity,Evolution,Language,Linguistics — Razib Khan @ 2:56 pm
Map of language families of the world today

The story in the Bible about the “Tower of Babel” was the explanation that the ancient Hebrews gave for why there was so much linguistic diversity in the world around them. Ancient people were curious and observant enough to notice that their neighbors did not speak like them. The word “barbarian” comes from the ancient Greek perception of what non-Greeks sounded like to Greeks.

Sometimes linguistic differences can be more subtle, but still critical to life and death. The meaning of the term shibboleth comes out of the context where different ancient Israelite groups pronounced s differently and used that to identify members of an enemy tribe. The limits of your language are often the limits of your tribe.

But evolutionary genetics tells us humans share a common ancestor. That we are one tribe in our genealogy. In fact, the most recent common ancestors of all human populations lived within the last 200,000 years. Outside of Africa, they lived within the last 50,000 years. And, in North and South America it is within the last 15,000 years. We are a young species.

And yet you have a situation such as in the highlands of New Guinea where people who live in different valleys positioned next to each other speak two totally different languages. In North America, Europeans encountered thousands of languages and many language families. And yet we know that most of the ancestors of the natives of North America arrived within the last 15,000 years!

The situation in the Americas may have been the norm in the recent past. Today 40% of the world’s population speak Indo-European languages, but 6,000 years ago it is likely that very few Europeans or Indians spoke Indo-European languages. The spread of English, Arabic, and Chinese occurred in historical time. Their rise to dominance is due to social and political realities of the last 2,000 years.

The ancient world points to incredible linguistic diversity which faded with the rise of the “empires of the word.” Over four thousand years ago in Mesopotamia, what is now modern Iraq, many of the people spoke Semitic dialects related to Arabic and Hebrew. But Sumerian flourished at the time in the south, a language unrelated to any we know of today. In the far north, the people spoke Hurrian, again, a language unrelated to any which flourish today. In the mountains to the east there lived the Guti and Kassites, who seem to have spoken languages unrelated to any spoken today as well.

Etruscans spoke a non-Indo-European language, but influenced the Romans

The Romans record the presence of Etruscans, who influenced their culture, and spoke a language which was not Indo-European. Further north, there were Ligurians, hugging the coast around modern Genoa, while in the hills there lived tribal Samnites and Oscans. To the south, there were Greek cities and obscure native peoples such as the Sicels. The island of Sardinia was inhabited by speakers of what we now term “Paleo-Sardinian,” perhaps related to Basque. The ancient world was one great Babel.

What this highlights is that while genetic evolution proceeds slowly, gradually, and continuously, linguistic evolution can be riotous, rapid, and proliferate at light speed toward unintelligibility.

Just by physical inspection, one can tell that Finns and Swedes share common ancestors. That they are genetically related. But linguistically they are as different as can be. Finnish is no closer to Swedish phylogenetically than it is to Bantu or Chinese! Swedish as a language is most definitely closer to Bengali, Spanish, or ancient Hittite, than it is to Finnish.

Evolution simply describes a change in characteristics which can be defined on a phylogenetic tree. This can be biological, as with genetic evolution, or, it can be cultural. But clearly, the mechanisms matter here. Mendel’s laws impose constraints and regularity to biological evolution which culture lacks. Half of your genetic material comes from each parent. There is no such constraint with culture. In fact, your cultural inheritance may come from someone who is not your biological parent.

Whereas genetic evolution can be traced through modern scientific methods to billions of years in the past, elements of cultural evolution shift so fast that most researchers are skeptical of the possibility of going more than ten thousand years in the past. We have a Neanderthal genome, but it is unlikely we will ever be able to reconstruct the Neanderthal languages (there were certainly many!).

The diversity of languages of North and South America illustrates how a small number of people, perhaps a few thousand genetically, can give rise to thousands of languages hundreds of generations later. The diversity we see around us today in the modern era is but a shadow of what was likely the human norm for most of our species’ history. It is as if a massive process of selection has winnowed the languages spoken down to a few huge families.

And yet we can still discern similarities across many languages separated by history and large geographical distances. This is most famously illustrated by the “Indo-European” languages.

The affinities between Indian languages and those of Europe were discerned by Sir William Jones in the 18th century. After the fact, the similarities are clear to native speakers. A focus on core words that were more likely to be preserved gave rise to “Swadesh lists.”

Here is the number “nine” in various languages:

Finnish: yhdeksän, Hungarian: kilenc, Basque: bederatzi, Swedish: nio, Czech: devět, French: neuf, Armenian: inn, Bengali: naẏa, Arabic: tis3a, Turkish: dokuz

Even if you are not a linguist or philologist, peculiar similarities may jump out at you (as well as discordances). This is because a large number of languages in that list are Indo-European, and share a common origin within the last ~5,000 years. Paired with them are nearby languages which are non-Indo-European.

It is almost certainly the case that most of those languages above are spoken by people who share ancestors within the last ~50,000 years…but the evolution of vocabulary is fast enough that the signal of shared ancestry is lost much faster than in genetic evolution.
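To get a feel for how quickly the lexical signal decays, here is a rough sketch in Python. It assumes a constant retention rate for core vocabulary of about 86% per thousand years, the figure classically associated with Swadesh's glottochronology. That rate, the constant-rate premise (which many linguists dispute), and the function name `expected_shared_fraction` are illustrative assumptions, not something established in this post.

```python
def expected_shared_fraction(millennia_since_split, retention_per_millennium=0.86):
    """Expected fraction of a core word list still shared by two sister languages.

    Assumption (not from the post): each lineage independently keeps a fixed
    fraction of its core vocabulary per thousand years, so two lineages that
    split t millennia ago share about retention ** (2 * t) of the list.
    """
    if millennia_since_split < 0:
        raise ValueError("time since split must be non-negative")
    if not 0.0 < retention_per_millennium <= 1.0:
        raise ValueError("retention rate must be in (0, 1]")
    return retention_per_millennium ** (2 * millennia_since_split)


if __name__ == "__main__":
    # Time depths in thousands of years, chosen to bracket the ~5,000 and
    # ~50,000 year figures mentioned in the text.
    for t in (1, 5, 10, 50):
        share = expected_shared_fraction(t)
        print(f"{t:>3} thousand years: {share:.2e} of core words still shared")
```

Under these assumptions roughly a fifth of the list is still shared after 5,000 years, and effectively nothing after 50,000, which is the contrast the paragraph above is drawing.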

This is why many historical linguists focus on grammar, rather than vocabulary. Just going by a count of the words within the lexicon you might conclude that English is a Latinate language, like French, Spanish or Italian. But if you look at grammar, it is clear that English is a Germanic language. Vocabulary is something that is easily shared, and quite protean. Consider how quickly different generations develop their own slang and preferred terms.

Grammar is much more conservative, and non-standard speech is often indicative that someone learned English as an adult, and retained the grammar of the language in which they were raised.

Vocabulary evolves fast and responds to selection. People who live in a forested environment may have many ways to describe types of trees. Those who live on a grassland may not. But grammar is part of the deep structure of any language and is evolutionarily conserved. If Noam Chomsky is correct, all grammar is a local expression of “universal grammar,” which is hardwired into our species on the deepest levels.

And yet all of this fascinating research and knowledge is constrained by the fact that most of the world’s languages are disappearing. This mass extinction is happening due to globalization, trade, and the advantages of speaking an ‘international’ language. Of the world’s 7,000 living languages, nearly half are in danger of going extinct.

With the extinction of a language, a people’s whole memory fades into oblivion, as well as the record of human diversity from which we can make inferences about the power and range of evolutionary processes in culture.

March 5, 2019

The dearth of diversity in genomics

Filed under: Diversity,Genetics,GWAS,Medicine — Razib Khan @ 11:24 pm
Citation: Martin, Alicia R., et al. bioRxiv(2019): 441261.

One of the curious things about genomics is that the field has grown so fast in the 21st century, with such an increase in power, that it is hard to keep up if you blink. The first human genome cost $3,000,000,000. Now human genomes can be had for $1,000 or less. Whereas thirty years ago geneticists were debating whether you could even map the human genome, today we have hundreds of thousands of whole-genome sequences.

But some things don’t change as much as you might think. The chart at the top of this post illustrates the proportion of various ethnicities in “genome-wide association” (GWAS) studies over the past twelve years. The logic of GWAS studies is straightforward: you are searching within the genome for genetic variation that explains variation within the population. You are looking for genes that cause diseases and traits.

The chart to the left illustrates the heritability, the proportion of variation which is genetic, for a range of traits. Many of these, such as obesity, heart attack, and schizophrenia, are obviously extremely relevant in a medical context. It would be best to understand the genetic basis of these diseases within the population and in the individual.

Therefore, in an ideal world, you could look at the specific genes you carry and construct a “polygenic risk score” (PRS) which predicts your lifetime probability of developing the disease in relation to the broader population. But we do not live in an ideal world.
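As a minimal sketch of the idea, a polygenic score is commonly computed as a weighted sum: for each variant, the number of risk alleles you carry times a per-allele weight estimated in a GWAS. The variant names, weights, and the function `polygenic_risk_score` below are hypothetical and only illustrate the arithmetic. Turning such a score into a lifetime probability would need calibration against a reference population, which is left out here.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive score: sum over variants of (risk-allele count * per-allele weight).

    dosages: dict mapping variant id -> copies of the risk allele carried (0, 1, or 2)
    effect_sizes: dict mapping variant id -> weight, e.g. a log odds ratio from a GWAS
    Variants absent from `dosages` are treated as 0 copies (a simplification).
    """
    score = 0.0
    for variant, weight in effect_sizes.items():
        count = dosages.get(variant, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1, or 2, got {count}")
        score += weight * count
    return score


# Hypothetical weights and genotype, made up purely for illustration.
weights = {"variant_a": 0.25, "variant_b": -0.5, "variant_c": 0.125}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 1}

if __name__ == "__main__":
    print(polygenic_risk_score(person, weights))
```

The sketch also shows where the trouble starts: the weights come from whichever population the GWAS was run in, and any variant missing from that table contributes nothing to the score.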

Because most GWAS are performed in European populations, PRS values for individuals not of European ancestry are far less accurate. This phenomenon is caused by several factors. One of the major ones is that each population has genetic variations that cause diseases special and unique to a given population (“private alleles” in the jargon). Studies which use only Europeans cannot detect unique variation in non-European populations by definition. Those variants are not found in Europeans! Additionally, sometimes genetic variants even give different risks in Europeans than non-Europeans because of interactions of genes. The predictions in one population do not transfer to another.

The brains of schizophrenics are different in neuroimaging

One prediction that one could have made, and one that I did, is that the incredible cheapness of modern genomic technology would mean that people of diverse ethnicities would be included in studies over time. We wouldn’t need to do anything special, the magic of technology would solve the problem for us.

But instead, we’ve seen a process of the “rich getting richer.” European nations have robust healthcare infrastructures geared toward collecting the information needed for GWAS. Additionally, there are statistical reasons that GWAS are more powerful for homogeneous populations…as Europeans have the larger sample sizes, to begin with, many researchers continue to stay with the population with larger sample sizes!

What’s the path going forward? First, researchers are now proactively going and reanalyzing populations within datasets in the Western world with large numbers of people of non-European ancestry. Previously, to obtain homogeneous datasets as noted above, these individuals would be discarded from the analysis. Second, there are now proactive efforts to obtain diversity from regions outside of Europe. The African Genome Variation Project is one of these cases. Finally, private consumer genomic firms are now assembling databases of quite a large size, and a substantial number of their customers are non-European.

The past has taught us that we can’t be complacent, and simply expect the “laws of genomics” to solve the issues in relation to genomic diversity. Rather, researchers and the public have to proactively address these issues, so as to allow all of us to make the best decisions within our own lives in terms of what our genes bring to the table.



The dearth of diversity in genomics was originally published in Insitome on Medium, where people are continuing the conversation by highlighting and responding to this story.

May 5, 2011

Why the Amazon Rainforest is species rich

Filed under: Amazon Rainforest,Amazonia,Diversity,Ecology,Environment,Speciation — Razib Khan @ 3:08 pm


A monkey frog

The Pith: The Amazon Rainforest has a lot of species because it’s been around for a very long time.

I really don’t know much about ecology, alas. So my understanding of evolution framed in its proper ecological context is a touch on the coarse side. When I say I don’t know much about ecology, I mean that I lack a thick network of descriptive detail. So that means that I have some rather simple models in my head, which upon closer inspection turn out to be false in many specific instances. That’s what you get for relying on theory. Today I ran into a paper which presented me with some mildly surprising results.

The question: why is the Amazon Rainforest characterized by such a diversity of species? If you’d asked me that question 1 hour ago I would have said that it was a matter of physics. That is, the physical parameters of a high but consistent rainfall and temperature regime. This means the basic energetic inputs into the biome are high, and its consistency allows the organisms to plan their life schedule efficiently, maximizing the inputs. All ...

April 21, 2011

The Court Jester and the averaging fallacy

The Pith: Climatic and biological evolutionary pressures on an ecosystem complement each other at different scales. Neither is “dominant,” as that framing is not even wrong.

Yesterday I alluded to the Court Jester hypothesis of evolutionary change, which is often contrasted with the Red Queen hypothesis. The main embarrassment for me as a person who fancies himself a fan of evolutionary process is that I hadn’t ever heard of the Court Jester Hypothesis before yesterday. Therefore I went back to the paper which outlined many of the basic ideas of the model in 2001, Distinguishing the effects of the Red Queen and Court Jester on Miocene mammal evolution in the northern Rocky Mountains. To be fair, the hypothesis itself is a tightening of a range of ideas which were long in the air. I did know, for example, about the Turnover-pulse hypothesis. These are all a set of models which emphasize the abiotic selective pressures on life forms, as opposed to the biotic ones. An abiotic pressure would be something like the Younger Dryas cold snap. A biotic pressure might be an exotic invasive species spreading through the ...

July 15, 2010

Linguistic diversity = poverty

Filed under: Culture,Diversity,Economic Growth,Economics,Language,Norms,Values — Razib Khan @ 3:05 pm

In yesterday’s link dump I expressed some dismissive attitudes toward the idea that loss of linguistic diversity, or more precisely the extinction of rare languages, was a major tragedy. Concretely, many languages are going extinct today as the older generation of last native speakers is dying. This is an issue that is embedded in a set of norms, values which you hold to be ends, so I thought I could be a little clearer as to what I’m getting at. I think there are real reasons outside of short-term hedonic utility why people would want to preserve their own linguistic tradition, and that is because I am no longer a total individualist when it comes to human identity. I have much more sympathy for the French who wish to preserve their linguistic identity against the expansion of English than I had a few years ago.

Language is history and memory. When the last speaker of English dies, or, when English is transmuted to such an extent that it is no longer English as we today understand it, our perception of the past and historical memory, our understanding of ourselves, will change. There is a qualitative difference when Shakespeare becomes as unintelligible as Beowulf. Though I tend to lean toward the proposition that all languages are a means toward the same ends, communication, I agree that there are subtleties of nuance and meaning which are lost in translation when it comes to works of literature and other aspects of collective memory. Those shadings are the sort of diversity which gives intangible aesthetic coloring to the world. A world where everyone spoke the same language would lose a great deal of color, and I acknowledge that.


But we need to look at the other side of the ledger. First, we’re not talking about the extinction of English, French, or Cantonese. We’re talking about the extinction of languages with a few thousand to a dozen or so speakers. The distribution of languages and the number of speakers they have follows a power law trend: the vast majority of languages have very few speakers, and these are the ones which are going extinct. We are then losing communal identity; a thousand oral Shakespeares are turning into Beowulfs and Epics of Gilgamesh, specific stories which have to be reduced to their universal human elements because a living native speaking community is gone. Let me acknowledge that there is some tragedy here. But this ignores the costs to those who do not speak world languages with a high level of fluency. The cost of collective color and diversity may be their individual poverty (i.e., we who speak world languages gain, but incur no costs).

Over the arc of human history individuals and communities have shifted toward languages with a more numerous following. Sometimes, as in the case of the marginalization of the dialects of France for standard French in the 19th century, there was a top-down push. In other cases there needed to be no top-down push, because people want to integrate themselves into networks of trade, communication and participate in the family of nations on equal footing. Losing the languages of your ancestors means that your ancestors are made to disappear, their memory fades, and is replaced by other fictive ancestors. Modern Arabs outside of Arabia will often acknowledge that they are the products of Arabization (this is most obvious in the case of regions like Egypt or Mesopotamia which have long and glorious historical traditions pre-dating Islam). But they also in particular circumstances conceive of themselves as descendants of Ishmael, because they are Arab. A similar sort of substitution occurs when peoples change religions. The early medieval European monarchies, such as the Merovingians and the House of Wessex, traced their ancestry to Germanic pagan gods. Later European dynasties tended to establish fictive ties to the House of David.

But letting one’s ancestors die also means that one can live with other human beings, and participate clearly and with a high level of fluency. You may object that this does not entail monolingualism. And certainly it does not, but over the generations there will be a shift toward a dominant language if there is economic, social and cultural integration. The way we can preserve local traditions and languages in the face of the homogenizing power of languages and cultures of greater scope is to put up extremely high barriers to interaction. The Amish have preserved their German dialect and religious traditions, but only through opting out of the mainstream to an extreme extent (and the Amish are bilingual too).

On a deeper cognitive level some readers point out that there are hints that the Sapir-Whorf hypothesis may be correct. This is still not a strong enough reason for the perpetuation of linguistic traditions which are not widely subscribed. Humans have a finite amount of time in their lives, and the choices they make may not be perfectly rational, but quite often in the aggregate they are. When it comes to some aspects of cultural diversity, such as dress and religion, the importance we place on these traits is imbued by aspects of human psychology. Not so with language. Communication is of direct utilitarian importance.

Now that I’ve addressed, at least minimally, the tensions on the macro and micro level when it comes to linguistic preference, I want to address the aggregate gains to linguistic uniformity. My family is from Bangladesh, which had a “language movement”, which served as the seeds for the creation of that nation from a united Pakistan. Though there was a racial and religious component to the conflict I don’t think it would have matured and ripened to outright civil war without the linguistic difference. Language binds us to our ancestors, and to our peers, but also can separate us from others. A common language may not only be useful in a macroeconomic context, reducing transaction costs and allowing for more frictionless flow of information, but it also removes one major dimension of intergroup conflict.

So if only everyone spoke the same language there would be peace and prosperity? Perhaps not. Recently I have been convinced that it is best to have an oligopoly of languages so that “group-think” doesn’t impact the whole world in the same way. I’m basically repeating Jared Diamond’s argument in Guns, Germs, and Steel, as to why Europe was more culturally creative in the early modern period than China. Institutional barriers can allow for more experimentation, and prevent “irrational herds” from taking the whole system into dead-ends. Another way to think of it is portfolio diversity. Though linguistic diversity will introduce frictions to communication, on the margins some friction is useful to prevent memetic contagion which might occur due to positive feedback loops.

Below I present my model in graphical form. On the X axis is a diversity index. Imagine it goes from 1 to 0. 1 is the state where everyone speaks a different language, and 0 is the one where everyone speaks the same language. A state of high linguistic diversity converges upon 1, and one of low diversity upon 0. I believe that as linguistic diversity decreases one gains economies of scale, but there are diminishing returns. And, beyond a certain point I suspect that there are decreases to utility because of the systematic problem of irrational herds. I didn’t put a scale on the X axis because I don’t have a really clear sense of when we’re hitting the point of negative returns on homogeneity, though I don’t think we’re there yet.

[Chart: lingdiv, with the linguistic diversity index on the X axis running from 1 to 0]

Note: My confidence in the hypothesis that there are negative returns at some point is modest at best, and I have a high level of uncertainty as to its validity. But, I have a high confidence about the shape of the left side of the chart below, that very high linguistic diversity is not conducive to economic growth, social cooperation, and amity more generally scaled beyond the tribe.
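The post does not say which diversity index the X axis uses. One common measure with the same 0-to-1 behavior is Greenberg's index, the probability that two people drawn at random from a population speak different languages. The sketch below assumes that index. The function name `diversity_index` and the speaker counts are made up for illustration.

```python
def diversity_index(speakers_by_language):
    """Greenberg-style index: 1 - sum(p_i ** 2), with p_i each language's speaker share.

    Returns 0.0 when everyone speaks the same language and approaches 1.0 as
    speakers are spread evenly over more and more languages.
    """
    counts = list(speakers_by_language.values())
    if any(n < 0 for n in counts):
        raise ValueError("speaker counts must be non-negative")
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one speaker")
    return 1.0 - sum((n / total) ** 2 for n in counts)


if __name__ == "__main__":
    # Hypothetical populations, not real census figures.
    one_language = {"lang_1": 1000}
    two_equal_languages = {"lang_1": 500, "lang_2": 500}
    a_language_per_person = {f"lang_{i}": 1 for i in range(1000)}

    for label, population in [
        ("one shared language", one_language),
        ("two equal languages", two_equal_languages),
        ("one language per person", a_language_per_person),
    ]:
        print(f"{label}: {diversity_index(population):.3f}")
```

With this definition the two ends of the axis match the description above: a single shared language gives exactly 0, and a population where every person speaks a different language gives a value just short of 1.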
