Razib Khan One-stop-shopping for all of my content

April 13, 2012

Verbal intelligence by demographic

Filed under: Data Analysis,Demographics,GSS,Intelligence,WORDSUM — Razib Khan @ 7:43 pm

A few years ago I put up a post, WORDSUM & IQ & the correlation, as a “reference” post. Basically, if anyone objected to my use of WORDSUM, a variable in the General Social Survey (GSS), I would point to that post and observe that the correlation between WORDSUM and general intelligence is 0.71. That makes sense: WORDSUM is a vocabulary test, and verbal ability is well correlated with intelligence.

But I realized that although I've written many posts using the GSS and WORDSUM over the years, I've never explicitly laid out the distribution of WORDSUM scores, which range from 0 (0 out of 10 words correct) to 10 (10 out of 10). I’ve used categories like “stupid, interval 0-4,” but often mentioned the corresponding percentiles only in the comments, after prompting from a reader. This post fixes that problem for good and will serve as a reference in the future.

First, please keep in mind that I limited the sample to the year 2000 and later. The N is ~7,000, but far lower for some of the variables crossed. Therefore, I invite you to replicate my results. After the charts I will list all the variables, so if you care you should be able to ...
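For readers who want to replicate the distribution, here is a minimal Python sketch of the tabulation: the percent of respondents at each WORDSUM score and the cumulative percent at or below it, which is where the percentiles for cutoffs like 0-4 come from. It assumes you have already extracted the scores into a plain list; the toy scores are invented for illustration, and the sketch ignores the GSS sampling weights.

```python
from collections import Counter

def wordsum_distribution(scores):
    """Return {score: (percent, cumulative_percent)} for every score from 0 to 10."""
    counts = Counter(scores)
    n = len(scores)
    table = {}
    running = 0
    for score in range(11):  # WORDSUM runs from 0 to 10 words correct
        running += counts.get(score, 0)
        table[score] = (100 * counts.get(score, 0) / n, 100 * running / n)
    return table

# Toy scores for illustration only; these are NOT GSS data.
toy = [3, 5, 6, 6, 7, 7, 7, 8, 9, 10]
for score, (pct, cum) in wordsum_distribution(toy).items():
    print(f"{score:2d}  {pct:5.1f}%  cumulative {cum:5.1f}%")
```

The cumulative column at a score of 4 is the share falling in the 0-4 interval.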

December 29, 2011

Vocabulary score by race, ethnicity, and region

Filed under: Data Analysis,Demographics,GSS,WORDSUM — Razib Khan @ 10:22 pm

Mike the Mad Biologist has a post up, A Modest Proposal: Alabama Whites Are Genetically Inferior to Massachusetts Whites (FOR REALZ!). The post is obviously tongue-in-cheek, but it’s actually an interesting question: what’s the difference between whites in various regions of the United States? I’ve looked at this before, but I thought I’d revisit it for new readers.

First, I use the General Social Survey (GSS). Second, I use the WORDSUM variable, a 10-question vocabulary test that has a correlation of 0.71 with general intelligence. I'm curious about differences across white ethnic groups by region. To get at this I use the ETHNIC variable, which asks respondents which nation their ancestors came from. I omitted some nations because of small sample sizes, and amalgamated others.

Here are my amalgamations:

German = Austria, Germany, Switzerland

French = French Canada, France

Eastern Europe = Lithuania, Poland, Hungary, Yugoslavia, Russia, Czechoslovakia (many respondents were surveyed before 1992), Romania

Scandinavian = Denmark, Norway, Sweden, Finland (yes, I know that Finland is not part of Scandinavia, Jaakkeli!)

British = England, Wales, Scotland
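The amalgamations above amount to a lookup table. Here is a minimal Python sketch, assuming country names as plain strings (the GSS ETHNIC variable actually stores numeric codes, which are not reproduced here). Countries not listed, such as Ireland, simply keep their own label, since the post doesn't spell out which nations were dropped.

```python
# Amalgamated groups as listed in the post. Country names are plain strings here;
# the real ETHNIC variable uses numeric codes (assumption: you have already decoded them).
AMALGAMATIONS = {
    "German": ["Austria", "Germany", "Switzerland"],
    "French": ["French Canada", "France"],
    "Eastern Europe": ["Lithuania", "Poland", "Hungary", "Yugoslavia",
                       "Russia", "Czechoslovakia", "Romania"],
    "Scandinavian": ["Denmark", "Norway", "Sweden", "Finland"],
    "British": ["England", "Wales", "Scotland"],
}

# Invert the table into a country -> group lookup.
COUNTRY_TO_GROUP = {
    country: group
    for group, countries in AMALGAMATIONS.items()
    for country in countries
}

def ethnic_group(country):
    """Return the amalgamated group, or the country itself if it was not merged."""
    return COUNTRY_TO_GROUP.get(country, country)

print(ethnic_group("Finland"), ethnic_group("Ireland"))
```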

Next we need to break it down by region. The REGION variable uses the Census divisions. You can see them to the left. I combined a few of these to create the following classes:

Northeast = New England, Middle Atlantic

Midwest = East North Central, West North Central

South = West South Central, East South Central, South Atlantic

West = Pacific, Mountain
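Likewise, the regional classes are a many-to-one mapping from the nine Census divisions. This Python sketch spells out the division names in full; translating the GSS's own REGION codes into these names is left as an assumption.

```python
# Nine Census divisions collapsed into the four classes used in this post.
# Assumption: REGION codes have already been translated into division names.
DIVISION_TO_REGION = {
    "New England": "Northeast",
    "Middle Atlantic": "Northeast",
    "East North Central": "Midwest",
    "West North Central": "Midwest",
    "West South Central": "South",
    "East South Central": "South",
    "South Atlantic": "South",
    "Pacific": "West",
    "Mountain": "West",
}

def region_of(division):
    """Map a Census division name to Northeast, Midwest, South, or West."""
    return DIVISION_TO_REGION[division]

# Sanity check: every one of the nine divisions is assigned exactly once.
assert len(DIVISION_TO_REGION) == 9
print(region_of("South Atlantic"))
```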

The key method I used is to look at mean vocabulary test scores by ethnicity and region. I also later broke down some of these ethnic groups by religion. Finally, all bar plots show 95 percent confidence intervals, which should give you a sense of the sample size behind each combination.
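The post doesn't say exactly how the intervals were computed. As a hedged illustration, here is a normal-approximation 95 percent interval (mean ± 1.96 × standard error) per group in Python. It shows why the bars widen as a group's sample shrinks, since the standard error scales with 1/√n. The records are toy values, not GSS data, and survey weights are ignored.

```python
import math
import statistics
from collections import defaultdict

def mean_ci_by_group(records, z=1.96):
    """records: iterable of (group, score). Returns {group: (n, mean, lo, hi)}.

    Normal-approximation interval: mean +/- z * sd / sqrt(n).
    """
    by_group = defaultdict(list)
    for group, score in records:
        by_group[group].append(score)
    out = {}
    for group, scores in by_group.items():
        n = len(scores)
        mean = statistics.fmean(scores)
        # The sample standard deviation needs at least two observations.
        se = statistics.stdev(scores) / math.sqrt(n) if n > 1 else float("nan")
        out[group] = (n, mean, mean - z * se, mean + z * se)
    return out

# Toy records for illustration only.
records = [("Northeast", 7), ("Northeast", 8), ("Northeast", 6), ("Northeast", 7),
           ("South", 5), ("South", 6), ("South", 7)]
for group, (n, mean, lo, hi) in mean_ci_by_group(records).items():
    print(f"{group:10s} n={n}  mean={mean:.2f}  95% CI [{lo:.2f}, {hi:.2f}]")
```

A t-based interval would be somewhat wider for very small groups; with cells in the hundreds the difference is negligible.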

First let’s break it down by race/ethnicity and compare it by region to get a reference:


Next, the main course:

Finally, let’s separate by religion for Germans and Eastern Europeans:

I include the last plot because self-reported nationality can mask underlying structure. Americans whose ancestors came from Poland fall into two large categories: people of Jewish heritage whose identity as ethnic Poles was contested (recall that Jews often spoke Yiddish, a Germanic language, as their first language), and Roman Catholic Slavs. I suspect many of those in the “None” category are also Jews by culture, if not religion.

Second: people of all ethnic groups tend to have lower vocabulary scores if they are from the South or Midwest. In many cases the regional gap falls outside the 95 percent confidence intervals. It’s especially striking in the three groups with huge sample sizes in all regions: Germans, Irish, and British. Irish here includes both Scots-Irish and those of Irish Catholic background. Not only are the sample sizes for these groups large, but their roots in some of these regions go back a long way. In particular, the divide between Northerners and Southerners of British ancestry goes back centuries.
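One way to make “outside the confidence interval” concrete is to put an interval on the regional gap itself. Here is a minimal Python sketch using a normal approximation for the difference of two independent means. The scores are toy values, not GSS data, and survey weights and design effects are ignored.

```python
import math
import statistics

def diff_ci(a, b, z=1.96):
    """Normal-approximation 95% CI for mean(a) - mean(b), two independent samples.

    Each sample needs at least two observations for the variance estimate.
    """
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, diff - z * se, diff + z * se

# Toy vocabulary scores for one ethnic group in two regions (illustrative only).
north = [7, 8, 6, 7, 8, 9]
south = [5, 6, 7, 5, 6, 6]
d, lo, hi = diff_ci(north, south)
print(f"gap = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

If the interval excludes zero, sampling noise alone is an unlikely explanation. Checking whether two separate 95 percent bars overlap is a more conservative test: bars can overlap slightly even when the difference itself is significant.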

How to understand this? There are a lot of complicating factors. But as outlined in Albion’s Seed and The Cousins’ Wars, the divisions between Anglo-Celtic folkways run deep and long. If a time traveler from the 18th century arrived in the United States today and were asked which region was the heart of intellectual ferment, they would correctly guess New England. Early Puritan New England was the first universal-literacy society in the world, and this was to some extent a matter of conscious planning. The leaders of the New England colonies restricted who could immigrate to their dominion. Religious exclusions and persecutions in this region are well known, but there was also a policy of turning away settlers who were perceived as likely burdens upon the community. New England thus selected for a middle-class migration out of East Anglia and the port towns of southwest England. But the fathers of the early colony also refused to transplant the privileges of the hereditary nobility from the mother country, thereby throwing up a barrier to the migration of the aristocracy.

In contrast, the lowland South received a more representative cross-section of British social classes. The younger sons of the British nobility and self-styled gentlemen arrived to make their mark, as did those who became indentured servants and even slaves. A class society on the model of southwestern England recapitulated itself in this region. As for the uplands, what became Appalachia, an influx of Scots-Irish came to dominate the scene by the middle of the 18th century, disembarking in Philadelphia and pushing south along the spine of the high country into the Deep South.

Conflicts between these “Anglo” groups framed the terms of debate over the 18th and 19th centuries, and were to some extent at the root of the Age of Sectionalism. Today, because of the salience of race and the prominence of the later wave of migration in the late 19th and early 20th centuries, which is still vivid in living memory, these early divisions have moved out of sight. But they remain. The difference between the Germans of Texas and the Anglos of Southern extraction persists to this day, but note that Germans exhibit the same regional differences in vocabulary score as Anglos. Why? This may be a case where the original cultural substratum has an outsized impact (the dialect of eastern New England, made famous by the Catholic Irish of Boston, is descended from East Anglian English!).

Of course, there might be a genetic difference. Intelligence is a quantitative trait, so it would be trivial to generate two populations that are genetically similar but very different in trait value, simply through selection. In the 1630s ~20,000 Puritans settled New England. For various reasons there was very little migration over the next century and a half. By 1780 New England’s population was 700,000, almost all of it through natural increase (not only was New England the world’s first universal-literacy society, but its fertility was the highest in the late 17th century).
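Those population figures imply a remarkable growth rate. As a quick check, here is the constant annual rate needed to go from ~20,000 to 700,000. The sketch assumes the 1630s influx was complete by 1640, giving 140 years, and ignores the small later migration.

```python
import math

def implied_growth(start_pop, end_pop, years):
    """Constant annual growth rate and doubling time implied by two population counts."""
    rate = (end_pop / start_pop) ** (1 / years) - 1
    doubling = math.log(2) / math.log(1 + rate)
    return rate, doubling

# Assumption: ~20,000 settlers by 1640 (end of the 1630s) and 700,000 by 1780.
rate, doubling = implied_growth(20_000, 700_000, 1780 - 1640)
print(f"{rate:.2%} per year, doubling every {doubling:.1f} years")
```

This works out to roughly 2.6 percent a year, a doubling about every 27 years, sustained for well over a century.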

Finally, there’s the issue of disease and pathogen load. Endemic hookworm infection does seem likely to have made Southerners, of both races, relatively indolent and lethargic in comparison to Northerners. Who knows what pathogens simply fall below our radar?

Overall I think that a more fine-grained and detailed exploration of these topics is warranted. Our public discussion is too coarse and too thin on data.

January 21, 2011

The stupid rich and poor smart do exist

Filed under: Data Analysis,I.Q.,WORDSUM — Razib Khan @ 12:56 am

WORDSUM is a variable in the General Social Survey. It is a 10-word vocabulary test: a score of 10 is perfect, and a score of 0 means you didn’t know any of the vocabulary words. WORDSUM has a correlation of 0.71 with general intelligence. In other words, variation in WORDSUM can explain about 50% of the variation in general intelligence (0.71 squared is about 0.50). To the left is a distribution of WORDSUM results from the 2000s. As you can see, a score of 7 is modal. In the treatment below I will label 0-4 “Dumb,” 5-7 “Not Dumb,” and 8-10 “Smart.” Who says I’m not charitable?

You also probably know that general intelligence has some correlation with income and wealth. But to what extent? One way to look at this is to inspect the SEI (socioeconomic index) variable in the GSS, which combines both monetary and non-monetary status and achievement, and see how it relates to WORDSUM. The correlation is 0.38. It’s there, but not that strong.
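Both steps in the paragraph above are simple enough to write down. This Python sketch bins a score into the post's three labels and computes r squared, the share of variance one variable linearly accounts for in the other. The function names are my own.

```python
def wordsum_label(score):
    """Bins used in this post: 0-4 'Dumb', 5-7 'Not Dumb', 8-10 'Smart'."""
    if not 0 <= score <= 10:
        raise ValueError("WORDSUM ranges from 0 to 10")
    if score <= 4:
        return "Dumb"
    if score <= 7:
        return "Not Dumb"
    return "Smart"

def variance_explained(r):
    """Share of variance linearly shared by two variables with correlation r."""
    return r ** 2

print(f"WORDSUM vs. g:   r = 0.71 -> r^2 = {variance_explained(0.71):.2f}")
print(f"WORDSUM vs. SEI: r = 0.38 -> r^2 = {variance_explained(0.38):.2f}")
```

For SEI, a correlation of 0.38 corresponds to only about 14 percent of the variance, which is what “there, but not that strong” means in numbers.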

To further explore the issue I want to focus on two GSS variables, WEALTH and INCOME. WEALTH was asked in 2006, and it has a lot of categories of ...

May 4, 2010

WORDSUM & IQ & the correlation

Filed under: Blog,data,Data Analysis,GSS,IQ,WORDSUM — Razib Khan @ 2:08 pm

Every time I use the WORDSUM variable from the GSS, people complain that a score on a 10-question vocabulary test is not a good measure of intelligence. The reality is that “good” is too imprecise a term. The correlation between adult IQ and WORDSUM is 0.71. The source for this number is a 1980 paper, The Enduring Effects of Education on Verbal Skills. I’ve reproduced the relevant table…

Estimated Correlations for Variables in a Model of Enduring Effects of Education for White, Native-Born People 25 to 72 Years Old in the Contemporary [1970s] United States

Variable       Child IQ  Age     Sex     Father's Educ  Father's SEI  Educ    Adult IQ  WORDSUM
Child IQ       -         0       0       0.31           0.30          0.51    0.80      -
Age            -         -       0.026   -0.304         -0.130        -0.304  -0.42     -0.005
Sex            -         -       -       -0.054         0.058         0.050   0         -0.121
Father's Educ  -         -       -       -              0.488         0.469   0.30      0.302
Father's SEI   -         -       -       -              -             0.347   0.31      0.285
Educ           -         -       -       -              -             -       0.66      0.511
Adult IQ       -         -       -       -              -             -       -         0.71
WORDSUM        -         -       -       -              -             -       -         -
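To see which variables track WORDSUM most closely, here is a small Python sketch that transcribes the WORDSUM column of Wolfle's table and ranks the entries by absolute correlation, alongside r squared. Child IQ is left out because that cell could not be estimated.

```python
# Correlations with WORDSUM, transcribed from the WORDSUM column of the table
# (Wolfle 1980). Child IQ is omitted: WORDSUM was not given to children.
WORDSUM_CORR = {
    "Age": -0.005,
    "Sex": -0.121,
    "Father's Educ": 0.302,
    "Father's SEI": 0.285,
    "Educ": 0.511,
    "Adult IQ": 0.71,
}

def ranked_by_strength(corrs):
    """Sort variables by |r|, strongest first, returning (name, r, r squared)."""
    ordered = sorted(corrs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(name, r, r * r) for name, r in ordered]

for name, r, r2 in ranked_by_strength(WORDSUM_CORR):
    print(f"{name:14s} r = {r:+.3f}  r^2 = {r2:.3f}")
```

Adult IQ comes first (r squared about 0.50), with education a distant second (about 0.26).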

Obviously, since the WORDSUM test was not given to those under 18, you can’t calculate the correlation between childhood IQ and WORDSUM score. Additionally, I suspect that since 1980 there has been somewhat more cognitive stratification by education. I notice in the GSS sample that there are many older people, especially women, who have high WORDSUM scores but no college education. In the younger age cohorts this pattern is less evident, because if you are intelligent the probability is much higher that you’ll obtain a university education.

A correlation of 0.71 is not mind-blowing; WORDSUM shares only about half of its variance with adult IQ (0.71 squared is about 0.50), so the two are far from interchangeable. But I think it’s good enough to show that WORDSUM is a serviceable substitute for a more rigorous measure of g in the absence of alternatives, and not so clumsy a proxy as to be useless. That call is up to you, though, and readers are free to disagree with the methodology of the model used to obtain this correlation. Additionally, I would point out that WORDSUM is a subset of the vocabulary subsection of the Wechsler Adult Intelligence Scale. WORDSUM is in effect a slice of an IQ test.

I am bookmarking this post so that in the future I can simply place a link in the comment threads in response to objections to WORDSUM.

Note: Thanks to Bryan Caplan for pointing me to this paper.

Citation: Lee M. Wolfle, “The Enduring Effects of Education on Verbal Skills,” Sociology of Education, Vol. 53, No. 2 (Apr., 1980), pp. 104-114
