Razib Khan One-stop-shopping for all of my content

December 30, 2011

Vocab by ethnicity, region, and education

Filed under: data,Data Analysis,GSS,I.Q.,Regionalism — Razib Khan @ 12:58 pm

A questioner below was curious whether vocabulary test differences by ethnicity and region persist across income. There are two problems with this. First, the INCOME variable isn’t very fine-grained (there is a catchall $30,000 or greater category). Second, it doesn’t seem to control for inflation. But there is a variable, DEGREE, which asks the highest level of education attained. I used this to create a “college” and “non-college” category (i.e., do you have a bachelor’s degree or not). Because of sample size considerations I removed some of the ethnic groups, but otherwise replicated the earlier analysis.

Below are two tables. One shows the mean vocab score by region and ethnicity (for whites) for those without college educations, and the other shows the same for those with college educations. For each ethnicity I also generated a correlation between its row in the first table and its row in the second, even though it sure isn’t useful as a quantitative statistical measure because of the small number of data points. Rather, I just wanted a summary of the qualitative result. The short answer is that the average vocabulary difference seems to persist across educational levels (the exception here is the “German” ethnicity).

Mean WORDSUM Score by Ethnicity and Region
No college education




German 6.05 5.81 5.79 6.11
Eastern Europe 6.17 6.16 6.18 6.29
Scandinavian 6.35 5.97 6.23 6.35
British 6.6 6.21 6.02 6.57
Irish 6.66 5.83 5.69 6.58
Italian 6 5.85 5.8 6.18

College educated




German 8.03 7.48 7.63 7.33
Eastern Europe 7.7 7.37 7.5 8.09
Scandinavian 8.5 7.82 7.86 7.92
British 8.44 8.06 7.76 7.95
Irish 8.03 7.79 7.39 7.59
Italian 7.45 7.75 7.6 7.87

Correlation of college and non-college
German 0.08
Eastern Europe 0.92
Scandinavian 0.57
British 0.70
Irish 0.57
Italian 0.40
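
As a rough check on the list above, here is a minimal Python sketch that recomputes the correlation for each ethnicity between its non-college row and its college row. It assumes the statistic is a plain Pearson correlation across the four regional columns, which the post does not state outright. The `pearson` helper and the dictionary names are illustrative, not anything from the GSS.

```python
from math import sqrt

# Mean WORDSUM by region, copied from the two tables above.
# The column order is simply the order in the tables; it only has to
# match between the two dictionaries.
NON_COLLEGE = {
    "German":         [6.05, 5.81, 5.79, 6.11],
    "Eastern Europe": [6.17, 6.16, 6.18, 6.29],
    "Scandinavian":   [6.35, 5.97, 6.23, 6.35],
    "British":        [6.60, 6.21, 6.02, 6.57],
    "Irish":          [6.66, 5.83, 5.69, 6.58],
    "Italian":        [6.00, 5.85, 5.80, 6.18],
}
COLLEGE = {
    "German":         [8.03, 7.48, 7.63, 7.33],
    "Eastern Europe": [7.70, 7.37, 7.50, 8.09],
    "Scandinavian":   [8.50, 7.82, 7.86, 7.92],
    "British":        [8.44, 8.06, 7.76, 7.95],
    "Irish":          [8.03, 7.79, 7.39, 7.59],
    "Italian":        [7.45, 7.75, 7.60, 7.87],
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumed method)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# One correlation per ethnicity: non-college row against college row.
ROW_CORRELATIONS = {
    group: pearson(NON_COLLEGE[group], COLLEGE[group]) for group in NON_COLLEGE
}

for group, r in ROW_CORRELATIONS.items():
    print(f"{group:15s} {r:5.2f}")
```

Under that assumption the sketch gives roughly 0.08 for German and 0.92 for Eastern Europe, in line with the list. With only four points per row, a single cell can swing the result a lot, which is why the number works as a qualitative summary and little more.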

April 25, 2011

Sectionalism submerged

Filed under: Culture,Regionalism — Razib Khan @ 12:42 pm

Aside from transient memes such as Jesusland, sectional sentiment tends to be implicit and remain below the surface in the United States today, especially outside of “Dixie”. In a nation the size of a continent and populated by over 300 million we first start with an aggregation, as if we’re just another nation-state. This is evident when we compare how the United States is doing compared to…France, or the United Kingdom, or Denmark. Except for the Russian Federation, the proper point of comparison for all the large European nations is probably California. If we do disaggregate the United States, we generally start with race, and then perhaps move on to politics. But many of these variables are rooted in deeper sectional identities, which were much more salient in the early republic. Many of the arguments about whether the Civil War was “about” slavery or economics or states’ rights miss the bigger picture: all of these issues contributed to, and emerged out of, an organic historical process in which the new republic crystallized as a divergent set of regional interests which predate the founding.

Here is a fascinating section from the ...

July 1, 2010

Region matters, don’t you forget it

Filed under: Culture,Data Analysis,Regionalism — Razib Khan @ 3:48 am

Over at Ezra Klein’s weblog his research assistant had a post up on the black-white academic achievement gap by state. This section was of interest:

…Among southern states, the deep South, where one might expect to see the largest gaps, does not stand out, with Alabama and Mississippi doing roughly as well as the Carolinas or Tennessee. Hawaii and West Virginia report the smallest gaps in both surveys. Both have notably small black populations, which provide less opportunity for de facto school segregation.

Why might one expect the largest gaps there? Obviously because of the history of Southern racial polarization, and the exceptional nature of the subjugation of blacks by whites. But one thing that I have seen in the General Social Survey over the years is that it is in the American South that blacks and whites exhibit the least cultural difference in attitudes and outlook. This should not be that surprising: local culture matters a great deal, implicitly, in a way which is only evident once you leave your familiar context. The Second Great Awakening, which reshaped the South religiously as a region where evangelical Protestantism was dominant among the population, influenced both black slaves and non-elite whites disproportionately.

These realities would be less surprising to people if they were more conscious of the patterns in the United States which derive from different streams in Anglo-Saxon folkways, as outlined in books such as Albion’s Seed and The Cousins’ Wars. Going back to educational attainment, the GSS vocabulary test, Wordsum, has the lowest scores in the South for both blacks and whites. On a more fine-grained level, let’s look at mean Wordsum score broken down by region and ethnicity. The regions are as defined by the Census divisions. I wanted to look at ethnicity, as well as race. White and black categories are straightforward. British = Scottish, Welsh and English ancestry. Irish might seem straightforward, but I think it’s pretty obvious that it throws into one category two social-cultural groups, the Catholic Irish and the Scots-Irish. First, here’s a line graph illustrating the variation by region for mean Wordsum score:


I chose a line graph so you could see that all the groups track each other. If you’ve followed my writing over the years you’ll have seen the pattern before; New England is the most academically inclined region, and the Gulf Coast of the Deep South the least. Below is a table with the mean scores by region, broken down by ethnic and racial group.


Group    New England  Mid Atlantic  E N Central  W N Central  S Atlantic  E S Central  W S Central  Mountain  Pacific
British  7.42         7.1           6.63         6.85         6.53        6.18         6.89         6.9       7.14
Irish    7.12         7.03          6.14         6.48         6.11        5.64         6.01         6.51      6.95
German   7.66         6.31          6.02         6.37         6.19        5.84         6.15         6.41      6.39
Black    5.19         5.19          5            5.29         4.61        4.38         4.74         4.39      5.21
White    6.64         6.52          6.11         6.36         6.06        5.46         5.95         6.47      6.44

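Black – White Gap
         New England  Mid Atlantic  E N Central  W N Central  S Atlantic  E S Central  W S Central  Mountain  Pacific
Gap %    22%          20%           18%          17%          24%         20%          20%          32%       19%
Gap Abs  1.45         1.33          1.11         1.07         1.45        1.08         1.21         2.08      1.23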

A quick note: sample sizes for blacks in the Mountain region and New England are very small, so ignore those two columns. Rather, observe that the black-white gap is about the same in the Mid Atlantic and the E S Central on a proportional scale (in both, the gap is about 20% of the white mean), but it is smaller in an absolute sense in the latter. That’s because whites in that region of the country do rather badly, not because blacks do well.
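
To make the two gap measures concrete, here is a small Python sketch that recomputes them from the White and Black rows of the table. It assumes "Gap %" is the absolute gap divided by the white mean, which is the definition that reproduces the figures in the table. The names `REGIONS`, `gaps`, and `GAPS` are illustrative.

```python
# Mean Wordsum by Census division, copied from the White and Black rows above.
REGIONS = [
    "New England", "Mid Atlantic", "E N Central", "W N Central", "S Atlantic",
    "E S Central", "W S Central", "Mountain", "Pacific",
]
WHITE = [6.64, 6.52, 6.11, 6.36, 6.06, 5.46, 5.95, 6.47, 6.44]
BLACK = [5.19, 5.19, 5.00, 5.29, 4.61, 4.38, 4.74, 4.39, 5.21]


def gaps(white, black):
    """Return (absolute gap, gap as a whole percent of the white mean) per region."""
    result = []
    for w, b in zip(white, black):
        diff = w - b
        # Assumption: the percentage is taken relative to the white mean.
        result.append((round(diff, 2), round(100 * diff / w)))
    return result


GAPS = dict(zip(REGIONS, gaps(WHITE, BLACK)))

for region, (abs_gap, pct_gap) in GAPS.items():
    print(f"{region:13s} abs = {abs_gap:.2f}   pct = {pct_gap}%")
```

The Mid Atlantic and the E S Central both come out at 20% on the proportional measure, while the absolute gaps are 1.33 and 1.08. The same proportion on a lower white baseline gives a smaller absolute difference.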

The correlation between black and white mean values of Wordsum is 0.63, which means that one can predict 40% of the variation in the regional differences of one race by the variation of the other. Here’s a correlation matrix with the ethnic groups:

Mean Wordsum by region correlation matrix

         Irish  German  Black
British  0.92   0.79    0.67
Irish           0.72    0.71
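
For readers who want to check these figures, here is a minimal Python sketch that computes the pairwise correlations across the nine Census divisions from the table of mean scores. It assumes plain Pearson correlations on the regional means, unweighted by sample size; the post does not say whether any weighting was applied. The helper names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Mean Wordsum by Census division, copied from the table of mean scores.
SCORES = {
    "British": [7.42, 7.10, 6.63, 6.85, 6.53, 6.18, 6.89, 6.90, 7.14],
    "Irish":   [7.12, 7.03, 6.14, 6.48, 6.11, 5.64, 6.01, 6.51, 6.95],
    "German":  [7.66, 6.31, 6.02, 6.37, 6.19, 5.84, 6.15, 6.41, 6.39],
    "Black":   [5.19, 5.19, 5.00, 5.29, 4.61, 4.38, 4.74, 4.39, 5.21],
    "White":   [6.64, 6.52, 6.11, 6.36, 6.06, 5.46, 5.95, 6.47, 6.44],
}


def pearson(xs, ys):
    """Unweighted Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def corr(a, b):
    """Correlation between two groups' regional means."""
    return pearson(SCORES[a], SCORES[b])


# Print r and r^2 (the share of variation "predicted") for every pair.
for a, b in combinations(SCORES, 2):
    r = corr(a, b)
    print(f"{a:8s}{b:8s} r = {r:.2f}   r^2 = {r * r:.2f}")
```

The Black and White pair should come out near r = 0.63, and squaring that gives about 0.40, which is where the "40% of the variation" figure comes from.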


This makes concrete in a simple statistic what you saw on the line graph; the between-region patterns are significant and transcend ancestry groups (though I suspect that the high correlation between the British and Irish categories in the GSS has to do in large part with the fact that the two overlap a great deal today because of intermarriage). Of course you might wonder if this applies to anything else. As I said, it is clear in relation to cultural issues that region matters a lot. Abortion has been one issue with relatively little change over the years in the GSS. The correlation in opinion by region between blacks and whites as to whether women should be able to have an abortion for any reason is about 0.80. Here are the correlations for the ethnic groups:

Abortion on demand correlation matrix

         Irish  German  Black
British  0.86   0.72    0.62
Irish           0.67    0.80


More plainly, you can see support for abortion on demand by region for whites and blacks, and how much it varies within races and between regions:

Support for abortion on demand
Region        White  Black
New England   51     41
Mid Atlantic  47     44
E N Central   36     40
W N Central   34     34
S Atlantic    38     36
E S Central   26     21
W S Central   34     26
Mountain      42     52
Pacific       54     48
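
As a quick check on the table, the sketch below computes the black-white correlation across regions and the spread within each race. It assumes the table values are percentages supporting abortion for any reason and that the correlation is an unweighted Pearson correlation on the nine regional figures. The names are illustrative.

```python
from math import sqrt

# Support for abortion on demand by region, copied from the table above
# as (white, black) pairs; assumed to be percentages.
SUPPORT = {
    "New England":  (51, 41),
    "Mid Atlantic": (47, 44),
    "E N Central":  (36, 40),
    "W N Central":  (34, 34),
    "S Atlantic":   (38, 36),
    "E S Central":  (26, 21),
    "W S Central":  (34, 26),
    "Mountain":     (42, 52),
    "Pacific":      (54, 48),
}

white = [w for w, _ in SUPPORT.values()]
black = [b for _, b in SUPPORT.values()]


def pearson(xs, ys):
    """Unweighted Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


R_BLACK_WHITE = pearson(white, black)
WHITE_SPREAD = max(white) - min(white)   # regional range among whites
BLACK_SPREAD = max(black) - min(black)   # regional range among blacks

print(f"black-white correlation across regions: {R_BLACK_WHITE:.2f}")
print(f"regional spread: whites {WHITE_SPREAD} points, blacks {BLACK_SPREAD} points")
```

The correlation lands at about 0.79, consistent with the "about 0.80" quoted earlier. The regional spread within each race (28 points for whites, 31 for blacks) is far larger than the typical gap between the races within a region.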
