Razib Khan One-stop-shopping for all of my content

May 3, 2019

Asian American interracial marriage statistics

Filed under: data — Razib Khan @ 11:55 am

I really don’t know what to make of some of the contentions in Zach’s post below, What’s wrong with fetishizing white men? (Also, posting videos in Hindi means I have no idea what specifically is going on in the video.) Some of this is probably due to differences between the UK and the USA. But there are some relevant statistics from the 2010 US Census.

The website Asian Nation has tabulated the outmarriage rates by generation (foreign-born vs. US-raised) and various Asian American ethnicities. You can see the results below as I’ve repackaged them to focus on inmarriage of various subgroups, stratified by sex.

Some notes

1) “Asian Indian” only includes people who are Indian nationals or whose ancestors were from the Republic of India. It excludes other South Asian nationalities (I am not “Asian Indian” for example). But since other South Asian nationalities are a very small number in the USA I think that’s fine.

2) The statistics are generated from subsamples of the Census. I would be a bit cautious on outmarriage rates for groups like Asian Indians and Koreans, where in 2010 the number of those born or raised in the USA was still rather small compared to the foreign-born/raised population. One reason Indian Americans showed extremely low outmarriage rates in the early 2000 Census results is that there was a massive swell of Indian immigration in the Clinton era, so the foreign-born immigrants overwhelmed the signal.

3) Both the Japanese and Chinese have multi-generational communities in the United States. There are large numbers of highly assimilated Japanese and Chinese Americans whose roots in East Asia are as far back as their grandparents, or even earlier. I think it is noticeable that there is sex balance here.

4) I know a lot of you like bullshitting. I will be doing other things for a while so not monitoring comments much. But if I come back and have to see 1,000-word personal thoughts which are factless and emotional I will just delete them, even if you are a long-time commenter.

April 14, 2019

Indian Muslims are more latitudinarian than Pakistani Muslims

Filed under: data,Religion — Razib Khan @ 11:13 pm

There is a lot of talk on this weblog. Probably because this is a South Asian-focused weblog, and we tend to be a loquacious people on the whole (some more than others). But I decided to look in the World Values Survey in regards to the question of whether believers believed their religion was the only acceptable religion.

Before some of you ask about methods and cross-tabs, the website has a late 1990s interface. You too can use it and look up facts!

(also, Hindu intolerance surprised me a bit, though not too much)

 

April 5, 2019

Unleash the data kraken!

Filed under: data,Population genetics — Razib Khan @ 9:15 pm


The Reich lab has done a mitzvah and released a huge merged dataset of their modern and ancient populations in a big tarball. Actually, there are two files. One has a larger number of individuals with 600,000 SNPs (includes “Human Origins Array”) and the other has 1,200,000 SNPs, but fewer individuals. It is in EIGENSTRAT format.

For the convenience of readers who are more comfortable in PLINK/PEDIGREE format, I’ve converted them, and replaced the family ID column with population labels. The links take you to a zip file that has the three files for the binary format.

July 11, 2018

Tutorial to run PCA, Admixture, Treemix and pairwise Fst in one command

Filed under: Admixture,data,Fst,PCA,PLINK,Population genetics,TreeMix — Razib Khan @ 11:50 pm


Today on Twitter I stated that “if the average person knew how to run PCA with plink and visualize with R they wouldn’t need to ask me anything.” What I meant by this is that the average person often asks me “Razib, is population X closer to population Y than Z?” To answer this sort of question I dig through my datasets and run a few exploratory analyses, and get back to them.

I’ve been meaning to write up and distribute a “quickstart” for a while to help people do their own analyses. So here I go.

The audience of this post is probably two-fold:

  1. “Trainees” who are starting graduate school and want to dig in quickly into empirical data sets while they’re really getting a handle on things. This tutorial will probably suffice for a week. You should quickly move on to three population and four population tests, and Eigensoft and AdmixTools, as well as fineStructure.
  2. The larger audience is technically oriented readers who are not, and never will be, geneticists professionally. 

What do you need? First, you need to be able to work in a Linux or similar environment. I work both in Ubuntu and on a Mac, but this tutorial and these scripts were tested on Ubuntu. They should work OK on a Mac, but you may need to modify the bash scripts and such.

Assuming you have a Linux environment, you need to download this zip or tar.xz file. Once you open this file it should decompress into a folder, ancestry/.

There are a bunch of files in there. Some of them are scripts I wrote. Some of them are output files that aren’t cleaned up. Some of them are packages that you’ve heard of. Of the latter:

  • admixture
  • plink
  • treemix

You can find these online too, though these versions should work out of the box on Ubuntu. If you have a Mac, you need the Mac versions. Just put the Mac versions into the folder ancestry/ in place of the Linux ones. You may need some libraries installed into Ubuntu too if you recompile yourself. Check the errors and make search engines your friends.

You will need to install R (or R Studio). If you are running Mac or Ubuntu on the command line you know how to get R. If not, Google it.

I also put some data in the file. In particular, a plink set of files Est1000HGDP. These are merged from the Estonian Biocentre, HGDP, and 1000 Genomes. There are 4,899 individuals in the data, with 135,000 high quality SNPs (very low missingness).

If you look in the “family” file you will see an important part of the structure. So do:

less Est1000HGDP.fam

You’ll see something like this:
Abhkasians abh154 0 0 1 -9
Abhkasians abh165 0 0 1 -9
Abkhazian abkhazian1_1m 0 0 2 -9
Abkhazian abkhazian5_1m 0 0 1 -9
Abkhazian abkhazian6_1m 0 0 1 -9
AfricanBarbados HG01879 0 0 0 -9
AfricanBarbados HG01880 0 0 0 -9

There are 4,899 rows, one for each individual. I have used the first column to label the ethnic/group identity. The second column is the individual ID. You can ignore the last 4 columns.
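Before picking groups it helps to see which labels exist and how many individuals carry each one. The excerpt above already shows two spellings (Abhkasians and Abkhazian), so a quick tally is worth it. Here is a minimal sketch; count_groups is a hypothetical helper of mine for illustration, not something in the download.

```shell
# Tally individuals per group label (column 1 of a PLINK .fam file).
# count_groups is a hypothetical helper, not one of the bundled scripts.
count_groups() {
    awk '{ n[$1]++ } END { for (g in n) print g, n[g] }' "$1" | sort
}

# Run it on the tutorial data only if the file is actually present.
if [ -f Est1000HGDP.fam ]; then
    count_groups Est1000HGDP.fam
fi
```

Each output line is a label followed by its count, sorted by label, so near-duplicate spellings end up next to each other.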

There is no way you want to analyze all the different ethnic groups. Usually, you want to look at a few. For that, you can use lots of commands, but what you need is a subset of the rows above. The grep command matches and returns rows with particular patterns. It’s handy. Let’s say I want just Yoruba, British (who are in the group GreatBritain), Gujarati, Han Chinese, and Druze. The command below will work (note that Han matches HanBeijing, Han_S, Han_N, etc.). The -E flag is needed so that grep treats | as “or” rather than as a literal character.

grep -E "Yoruba|Great|Guj|Han|Druze" Est1000HGDP.fam > keep.txt

The file keep.txt has the individuals you want. Now you put it through plink to generate a new file:

./plink --bfile Est1000HGDP --keep keep.txt --make-bed --out EstSubset

This new file has only 634 individuals. That’s more manageable. But more important is that there are far fewer groups for visualization and analysis.
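One caveat on the grep step: it matches the pattern anywhere on the line, so an individual ID that happens to contain "Han" would be kept too. A sketch of a stricter variant that tests only the group column is below; select_groups is a hypothetical helper, and I'm assuming the labels you want contain these strings, as they do in the grep version.

```shell
# Keep only rows whose group label (column 1) matches the pattern.
# select_groups is a hypothetical helper, not one of the bundled scripts.
select_groups() {
    # $1: alternation of label fragments, e.g. "Yoruba|Great|Guj|Han|Druze"
    # $2: path to the .fam file
    awk -v pat="$1" '$1 ~ pat' "$2"
}

# Rebuild keep.txt only if the tutorial data is present.
if [ -f Est1000HGDP.fam ]; then
    select_groups "Yoruba|Great|Guj|Han|Druze" Est1000HGDP.fam > keep.txt
fi
```

The resulting keep.txt is used with plink --keep exactly as before.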

As for that analysis, I have a Perl script with a bash script within it (and some system commands). Here is what they do:

1) they perform PCA to 10 dimensions
2) then they run admixture on the number of K clusters you want (unsupervised), and generate a .csv file you can look at
3) then I wrote a script to do pairwise Fst between populations, and output the data into a text file
4) finally, I create the input file necessary for the treemix package and then run treemix with the number of migrations you want

There are lots of parameters and specifications for these packages. You don’t get those unless you edit the scripts or make them more extensible (I have versions that are more flexible but I think newbies will just get confused so I’m keeping it simple).

Assuming I create the plink file above, running the following command means that admixture does K = 2 and treemix does 1 migration edge (that is, -m 1). The PCA and pairwise Fst run automatically.

perl pairwise.perl EstSubset 2 1

Just walk away from your box for a while. The admixture will take the longest. If you want to speed it up, figure out how many cores you have, and edit the file makecluster.sh, go to line 16 where you see admixture. If you have 4 cores, then type -j4 as a parameter. It will speed admixture up and hog all your cores.

There is a .csv that has the admixture output: EstSubset.admix.csv. If you open it you see something like this:
Druze HGDP00603 0.550210 0.449790
Druze HGDP00604 0.569070 0.430930
Druze HGDP00605 0.562854 0.437146
Druze HGDP00606 0.555205 0.444795
GreatBritain HG00096 0.598871 0.401129
GreatBritain HG00097 0.590040 0.409960
GreatBritain HG00099 0.592654 0.407346
GreatBritain HG00100 0.590847 0.409153

Column 1 will always be the group, column 2 the individual, and all subsequent columns will be the K’s. Since K = 2, there are two columns. Space separated. You should be able to open the .csv or process it however you want to process it.
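If you would rather see one line per group than one per individual, you can average the K columns within each group. This is a rough sketch that assumes the space-separated layout described above (group, individual, then the K fractions); admix_means is a hypothetical helper, not part of the download.

```shell
# Average each K column within each group of an admixture output file.
# admix_means is a hypothetical helper, not one of the bundled scripts.
admix_means() {
    awk '{
        if (!($1 in n)) order[++g] = $1      # remember first-seen group order
        n[$1]++
        for (i = 3; i <= NF; i++) sum[$1, i] += $i
        if (NF > nf) nf = NF
    }
    END {
        for (j = 1; j <= g; j++) {
            name = order[j]
            printf "%s %d", name, n[name]    # group label and sample size
            for (i = 3; i <= nf; i++) printf " %.4f", sum[name, i] / n[name]
            printf "\n"
        }
    }' "$1"
}

# Run it on the tutorial output only if the file is present.
if [ -f EstSubset.admix.csv ]; then
    admix_means EstSubset.admix.csv
fi
```

Each output line is the group, the number of individuals, and the mean fraction for each K. Group means hide within-group variation, so still eyeball the individual rows for outliers.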

You’ll also see two other files: plink.eigenval and plink.eigenvec. These are generic output files for the PCA. The .eigenvec file has the individuals along with the values for each PC. The .eigenval file shows the magnitude of each dimension. It looks like this:
68.7974
38.4125
7.16859
3.3837
2.05858
1.85725
1.73196
1.63946
1.56449
1.53666

Basically, this means that PC 1 explains nearly twice as much of the variance as PC 2 (68.8 vs. 38.4). Beyond PC 4 it looks like they’re really bunched together. You can open up the .eigenvec file as a .csv and visualize it however you like. But I gave you an R script. It’s RPCA.R.
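To turn the eigenvalues into something easier to compare, you can express each as a share of the total. A minimal sketch is below; eigen_share is a hypothetical helper. Note the assumption: the file only lists the 10 PCs you asked for, so these are shares among those 10, not of all the variance in the data.

```shell
# Print each eigenvalue as a percentage of the sum of the listed eigenvalues.
# eigen_share is a hypothetical helper, not one of the bundled scripts.
eigen_share() {
    awk '{ v[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++)
                 printf "PC%d %.1f%%\n", i, 100 * v[i] / total
         }' "$1"
}

# Run it on the PCA output only if the file is present.
if [ -f plink.eigenval ]; then
    eigen_share plink.eigenval
fi
```

A sharp drop after the first few PCs is the usual hint that the later dimensions carry little population structure.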

You need to install some packages. First, open R or R studio. If you want to go command line at the terminal, type R. Then type:
install.packages("ggplot2")
install.packages("reshape2")
install.packages("plyr")
install.packages("ape")
install.packages("igraph")

Once those packages are installed you can use the script:
source("RPCA.R")

Then, to generate the plot at the top of this post:
plinkPCA()

There are some useful parameters in this function. The plot to the left adds some shape labels to highlight two populations. A third population I label by individual ID. This second option is important if you want to do outlier pruning, since there are mislabels, or just plain outlier individuals, in a lot of data (including in this). I also zoomed in.

Here’s how I did that:
plinkPCA(subVec = c("Druze","GreatBritain"),labelPlot = c("Lithuanians"),xLim=c(-0.01,0.0125),yLim=c(0.05,0.062))

To look at stuff besides PC 1 and PC 2 you can do plinkPCA(PC=c("PC3","PC6")).

I put the PCA function in the script, but to remove individuals you will want to run the PCA manually:

./plink --bfile EstSubset --pca 10

You can remove individuals manually by creating a remove file. What I like to do though is something like this:
grep "randomID27 " EstSubset.fam >> remove.txt

The >> operator appends to the remove.txt file, so you can add individuals in the terminal in one window while running PCA and visualizing with R in the other (Eigensoft has an automatic outlier removal feature). Once you have the individuals you want to remove, then:

./plink --bfile EstSubset --remove remove.txt --make-bed --out EstSubset
./plink --bfile EstSubset --pca 10

Then visualize!
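The grep trick for building remove.txt leans on the trailing space to avoid catching, say, randomID271 when you meant randomID27. If you want an exact match on the ID column instead, here is a sketch; mark_for_removal is a hypothetical helper and the file names are just the ones used above.

```shell
# Append the .fam row for one exact individual ID to a remove list.
# mark_for_removal is a hypothetical helper, not one of the bundled scripts.
mark_for_removal() {
    # $1: individual ID, $2: .fam file, $3: remove list to append to
    # Concatenating "" forces a string comparison, so "01" does not equal "1".
    awk -v id="$1" '($2 "") == (id "")' "$2" >> "$3"
}

# Example call, only if the subset file is present.
if [ -f EstSubset.fam ]; then
    mark_for_removal randomID27 EstSubset.fam remove.txt
fi
```

The appended row starts with the group label and the individual ID, the same two columns the grep version hands to plink --remove.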

To make use of the pairwise Fst you need the fst.R script. If everything is set up right, all you need to do is type:
source("fst.R")

It will load the file and generate the tree. You can modify the script so you have an unrooted tree too.

The R script is what generates the FstMatrix.csv file, which has the matrix you know and love.

So now you have the PCA, Fst and admixture. What else? Well, there’s treemix.

I set the number of SNPs for the blocks to be 1000. So -k 1000. As well as global rearrangement. You can change the details in the perl script itself. Look to the bottom. I think the main utility of my script is that it generates the input files. The treemix package isn’t hard to run once you have those input files.

Also, as you know treemix comes with R plotting functions. So run treemix with however many migration edges (you can have 0), and then when the script is done, load R.

Then:
>source("src/plotting_funcs.R")
>plot_tree("TreeMix")

But actually, you don’t need to do the above. I added a script to generate a .png file with the treemix plot in pairwise.perl. It’s called TreeMix.TreeMix.Tree.png.

OK, so that’s it.

To review:

Download zip or tar.xz file. Decompress. All the packages and scripts should be in there, along with a pretty big dataset of modern populations. If you are on a non-Mac Linux you are good to go. If you are on a Mac, you need the Mac versions of admixture, plink, and treemix. I’m going to warn you compiling treemix can be kind of a pain. I’ve done it on Linux and Mac machines, and gotten it to work, but sometimes it took time.

You need R and/or R Studio (or something like R Studio). Make sure to install the packages or the scripts for visualizing results from PCA and pairwise Fst won’t work.*

There is already a .csv output from admixture. The PCA also generates expected output files. You may want to sort, so open it in a spreadsheet.

This is potentially just the start. But if you are a layperson with a nagging question and can’t wait for me, this should get you where you need to go!

* I wrote a lot of these things piecemeal and often a long time ago. It may be that not all the packages are even used. Don’t bother to tell me.

June 9, 2017

India = Nigeria + Italy in terms of fertility

Filed under: data,Total Fertility Rate — Razib Khan @ 6:57 pm


The map above shows the most recent district level fertility rates in India. It is immediately clear why comparing India to Pakistan and Bangladesh (let alone Nepal, Sri Lanka, or Bhutan) is a major error.

In some of the northern regions of the Hindi-speaking “cow belt” as well as the lightly populated Northeast the total fertility rate is similar to what you find in Nigeria, between 5 and 6 children per woman. For comparison the TFR for Saudi Arabia is 2.75. For Bangladesh it is 2.20 and for Pakistan it is 3.6. In contrast, much of the South, Punjab, and West Bengal have below replacement fertility.

Here is 2017 data by state:

State/UT Fertility rate 2017
Sikkim 1.2
Andaman & Nicobar 1.5
Chandigarh 1.6
Kerala 1.6
Punjab 1.6
Puducherry 1.7
Goa 1.7
Daman & Diu 1.7
Tripura 1.7
Delhi 1.7
Tamil Nadu 1.7
Karnataka 1.8
Andhra Pradesh 1.8
Lakshadweep 1.8
West Bengal 1.8
Telangana 1.8
Maharashtra 1.9
Himachal Pradesh 1.9
Gujarat 2
Jammu and Kashmir 2
Arunachal Pradesh 2.1
Haryana 2.1
Uttarakhand 2.1
Odisha 2.1
Chhattisgarh 2.2
Assam 2.2
 India 2.2
Mizoram 2.3
Dadra Nagar Haveli 2.3
Madhya Pradesh 2.3
Rajasthan 2.4
Manipur 2.6
Jharkhand 2.6
Uttar Pradesh 2.7
Nagaland 2.7
Meghalaya 3
Bihar 3.4

August 6, 2014

Conservatives respect atheists less

Filed under: data,GSS — David Hume @ 2:18 am

This clip by S. E. Cupp is making the rounds. I often find Cupp to be glib, so it’s no surprise that I disagree with many of the details of what she is saying. In particular it struck me as strange to listen to her talk about how conservatives respect atheists. Atheists are held in low esteem by the American public as a whole, let alone by conservatives. The General Social Survey has a question, SPKATH, which states:

There are always some people whose ideas are considered bad or dangerous by other people. For instance, somebody who is against churches and religion… a. If such a person wanted to make a speech in your (city/town/community) against churches and religion, should he be allowed to speak, or not?

Here are the fractions who would or would not allow this person to speak in 1972-1990:

charts

Here are the fractions who would or would not allow this person to speak in 2000-2012:

charts2

Liberals tend to be more accepting of atheists making a speech than conservatives. Interestingly, even in the 2000s ~20 percent of self-identified extreme liberals would still not allow an atheist to speak, as opposed to ~40 percent of self-identified extreme conservatives.

Addendum: To be clear about the intent behind this post, I’m all about keeping it real. I think it is acceptable to be an atheist on the Right. A substantial proportion of libertarians are atheists. Even among non-libertarian conservatives it’s an acceptable position. But this is really mostly relevant at the elite level of pundits and policy professionals. Atheists just aren’t popular at the grass roots. There aren’t that many conservative atheists or atheist conservatives.

February 11, 2013

American Born or Raised Indian American outmarriage rates don’t change

Filed under: data — Razib Khan @ 7:16 pm

In the early-to-mid 2000s I had a discussion with friends who were involved in the Sepia Mutiny blog about the trends for outmarriage rates in the Indian American community. Now that we have Census 2010 data we can compare.

US born or US raised Indian Americans married with US born or US raised
Ethnic identity of spouses
Indian Other Asian White Black Hispanic Others
Men 2010 62.4 4.5 25.6 0.7 3.5 3.4
Men 2000 65.2 4.3 27.3 0 4.3
Women 2010 52 2.9 37.8 2.8 2.1 2.4
Women 2000 54.2 0 39.1 4.3 4.2

August 9, 2012

What is the distribution of offspring per individual?

Filed under: data,Data Analysis — Razib Khan @ 7:58 am

A commenter below notes:

Also, in modern society, doesn’t just about everyone reproduce, such that not only is any particular advantage competing against other countervailing pressures as you note, but also that the “less fit” genomes are not removed from the overall population, but rather are added back to the mix? In other words, the less-preferred short males don’t die and have zero kids, they also get married and their genes get thrown back into the pot.

First, let’s not get caught in the assumption that for genes to be disfavored one has to have zero fitness in individuals carrying those genes. If, for example, in a situation of demographic expansion you had individuals who had eight children vs. those who had one child, there would be selection for the traits which were passed by those with eight children in relation to those who had one child. But, it did make me realize I wasn’t intuitively aware of the distribution of number of offspring in the population. I assumed that the median was around two, but that’s about it.

So, I looked at the GSS CHILDS variable for individuals born in 1950 or earlier from the year 2000 on (COHORT and ...

July 17, 2012

Women wanted more children in 2000s, but had fewer

Filed under: data,Demographics — Razib Khan @ 10:04 pm

As someone with mild concerns about dysgenic (albeit, with a normative lens that high intelligence and good looks are positive heritable traits) trends, I’m quite heartened that Marissa Mayer is pregnant. Of course she’s batting well below the average of some of her sisters, but you take what you can get in the game of social statistics. Quality over quantity thanks to assortative mating.

This brings me to a follow up of my post from yesterday, People wanted more children in 2000s, but had fewer. A reader was curious about limiting the data set to females. Therefore, I did. The same general pattern seems to apply (the limitations/constraints were the same). The only thing I’ll note is that there were only ~40 women in the data set with graduate degrees in the 1970s who were also asked these particular questions, so take this with a grain of salt.


Realized 1970s 1980s 1990s 2000s
< HS 2.73 3.19 3.02 2.79
HS 2.67 2.91 2.59 2.22
Junior College 3 2.75 2.38 2.06
Bachelor 2.31 2.47 2.11 1.71
Graduate 2.11 2.07 1.89 1.56
< $20 K 2.52 2.89 2.57 2.23
$20-40 K 2.57 2.9 2.46 2.02
$40-80 K 2.91 2.95 2.49 1.99
> $80 K 3.08 2.86 2.35 1.95

Ideal 1970s 1980s 1990s 2000s
< HS 3.08 2.96 2.73 2.85
HS 3.04 2.89 2.61 2.97
Junior College 2.58 2.8 2.95 3.31
Bachelor 3.01 2.95 2.86 3.15
Graduate 2.73 2.52 3.63 3.02
< $20 K 3 2.84 2.79 3.04
$20-40 K 3.04 3.01 2.69 2.96
$40-80 K 3.06 2.83 2.89 3.06
> $80 K 3.13 2.87 2.84 3.06

 

Addendum: ...

June 24, 2012

Higher vocabulary ~ higher income

Filed under: data,Data Analysis,GSS,Income,IQ — Razib Khan @ 7:54 pm

Prompted by a comment below I was curious as to the correlation between intelligence and income. To indicate intelligence I used the GSS’s WORDSUM variable, which has a ~0.70 correlation with IQ. For income, I used REALINC, which is indexed to 1986 values (so it is inflation adjusted) and aggregates the household income. Finally, I limited my sample to non-Hispanic whites over the age of 30 (for what it’s worth, this choice also limited the data set to respondents from the year 2000 and later).

The results don’t get at the commenter’s assertions, because 10 out of 10 on WORDSUM does not imply that you’re that smart really. But the trendline is suggestive. Note that I aggregated scores 0-4 because the sample size at the lower values is small indeed.

June 23, 2012

Attitudes toward genetically modified crops & science

Filed under: data,Data Analysis — Razib Khan @ 11:20 am

In the further interests of putting quantitative data out there instead of vague impressions, I noticed two GSS variables which might be of interest. One queries the impression of effect on the environment of genetically modified crops. The second asks about whether science does more harm than good. The latter question exhibited almost no year to year variation of note, so I just threw them in a pot together. But for the environment and genetically modified crop question I show responses for the years 2000 and 2010. As you can see there is a modest difference in regards to the first where liberals are more skeptical.

June 21, 2012

Left vs. right in anti-science

Filed under: data,Data Analysis — Razib Khan @ 8:31 pm

In the comments Chad says:

The Right is not inherently anti-science. Yes there are some morons out there who glorify in their ignorance, but lets recognize them for who they are, extremist idiots. This does not describe the majority of those on the Right. It doesn’t even describe the majority of creationists who are for the most part more concerned with work and children to be bothered to think about the origins of life in an average week. One can also point to similar kooks on the Left. Not just the genetic denialism described here, but also rejection of animal research, genetic engineering, organic farming, anti-vaccinations, etc.

First, I’m going to reiterate something: the majority of the human race consists of individuals who are not very smart. This is not meant as an insult, but it’s basically the truth. We may not be talking about idiots, but the average person on the street can not come close to reasoning like W. V. O. Quine. But the main issue I have with these equivalences is that though there is a valid point here, the reality is that it seems to be that the political Right in the USA has taken a ...

June 18, 2012

Trust in science, 1998 vs. 2008 (no difference)

Filed under: data,Data Analysis — Razib Khan @ 5:42 pm

A few weeks ago Robert Wright had a post up, Creationists vs. Evolutionists: An American Story. Here’s the crux:

A few decades ago, Darwinians and creationists had a de facto nonaggression pact: Creationists would let Darwinians reign in biology class, and otherwise Darwinians would leave creationists alone. The deal worked. I went to a public high school in a pretty religious part of the country–south-central Texas–and I don’t remember anyone complaining about sophomores being taught natural selection. It just wasn’t an issue.

A few years ago, such biologists as Richard Dawkins and PZ Myers started violating the nonaggression pact. [Which isn't to say the violation was wholly unprovoked; see my update below.] I don’t just mean they professed atheism–many Darwinians had long done that; I mean they started proselytizing, ridiculing the faithful, and talking as if religion was an inherently pernicious thing. They not only highlighted the previously subdued tension between Darwinism and creationism but depicted Darwinism as the enemy of religion more broadly.

If the only thing this Darwinian assault did was amp up resistance to teaching evolution in public schools, the damage, though regrettable, would be limited. My fear is that the damage is broader–that fundamentalist Christians, upon being maligned ...

May 20, 2012

Education encourages integration?

Filed under: data,Demographics,race — Razib Khan @ 10:36 am

It is sometimes fashionable to assert that higher socioeconomic status whites are the sort who will impose integration on lower socioeconomic status whites, all the while sequestering themselves away. I assumed this was a rough reflection of reality. But after looking at the General Social Survey I am not sure that this chestnut of cynical wisdom has a basis in fact. Below are the proportions of non-Hispanic whites who have had a black friend or acquaintance over for dinner recently by educational attainment:

35% – Less than high school
36% – High school
47% – Junior College
45% – Bachelor
59% – Graduate

I thought this might have been a fluke, so I played around with the GSS’s multiple regression feature, using a logistic model. To my surprise socioeconomic status was positively associated with having a black person over for dinner, and age negatively associated. These two variables in fact tended to exhibit equal magnitude values in opposition, and always remained statistically significant. Just to be clear, I created a variable Non-South vs. South below (being Southern increases likelihood of having had a black person over for dinner). All the individuals surveyed are non-Hispanic whites for the year 2000 and ...

April 29, 2012

Comparing American conservative Protestants & Muslims

Filed under: data,Data Analysis,Public opinion — Razib Khan @ 7:09 pm

A few years ago a book came out, American Taliban: How War, Sex, Sin, and Power Bind Jihadists and the Radical Right. The title clearly was aimed to push copies, but the gist of the title has moderately wide circulation. The rough sketch is that conservative American Protestants are roughly equivalent to conservative Muslims. I have always held that this is a qualitatively misleading analogy. The reason is that, from all I can gather, the social views of mainstream American conservative Protestants are actually in the moderate range of opinion amongst Muslims. But apples-to-apples comparisons are rather difficult in this domain.

But then I realized that the World Values Survey could allow me to do exactly such comparisons. The method is simple. First, you can subsample the data sets, so I could look at Protestants in the United States who identified as political conservatives. I compared these to the view of Muslims in a selection of nations (the WVS doesn’t cover much of the world, and some questions are not asked in some countries).

The results below range from 1, never justifiable, to 10, always justifiable. There is some strangeness in the results below, but they show the general qualitative result: American ...

March 26, 2012

How income, class, religion, etc. relate to political party

Filed under: data,Data Analysis,Demographics,GSS,Politics — Razib Khan @ 9:11 pm

Update: There was a major coding error. I’ve rerun the analysis. No qualitative change.

As is often the case a 10 minute post using the General Social Survey is getting a lot of attention. Apparently circa 1997 web interfaces are so intimidating to people that extracting a little data goes a long way. Instead of talking and commenting I thought as an exercise I would go further, and also be precise about my methodology so that people could replicate it (hint: this is a chance for readers to follow up and figure something out on their own, instead of tossing out an opinion I don’t care about).

 

Just like below I limited the sample to non-Hispanic whites after the year 2000. Here’s how I did it: YEAR(2000-*), RACE(1), HISPANIC(1)

Next I want to compare income, with 1986 values as a base, with party identification. To increase sample sizes I combined all Democrats and Republicans into one class; the social science points to the reality that the vast majority of independents who “lean” in one direction are actually usually reliable voters for that party. So I feel no guilt about this. I suppose Americans simply like the conceit of being independent? I know I do. ...

March 25, 2012

The upper class is more Republican

Filed under: data,Data Analysis,Demographics — Razib Khan @ 2:31 pm

A few months ago I listened to Frank Newport of Gallup tell Kai Ryssdal of Marketplace that upper class Americans tend to be Democrats. Ryssdal was skeptical, but Newport reiterated himself, and explained that’s just how the numbers shook out. This is important because Newport shows up every now and then to offer up numbers from Gallup to get a pulse of the American nation.

Frankly, Newport was just full of crap. I understand that Thomas Frank wrote an impressionistic book which is highly influential, What’s the Matter with Kansas, while more recently Charles Murray has come out with the argument in Coming Apart that the elites tend toward social liberalism. I’m of the opinion that Frank is just wrong on the face of it, but that’s OK because he’s an impressionistic journalist, and I don’t expect much from that set beyond what I might expect from a sports columnist for ESPN. Murray presents a somewhat different case, as outlined by Andrew Gelman, in that his “upper class” is modulated in a particular manner so as to fall within the purview of his framework. Neither of these qualifications apply to Frank Newport, who is purportedly presenting straightforward unadorned data.

When the “average person on the street” thinks upper class they think first and foremost of money. This is not all they think about, but in the rank order of criteria this is certainly first on the list. We can argue till the cows come home as to whether a wealthy small business owner in Iowa who is a college dropout is more or less elite than a college professor in New York City who is bringing home a modest upper middle class income (very modest adjusting for cost of living). But to a first approximation when we look at aggregates we had better look at the bottom line of money. After that we can talk details. And the first approximation is incredibly easy to ascertain. Below is a table and chart which illustrate the proportion of non-Hispanic whites after 2000 who align with a particular party as a function of family income, with family income being indexed to a 1986 value (so presumably $80,000 here means what $80,000 would buy in 1986, not the aughts).

 

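| Family Income | Strong Dem | Dem | Lean Dem | Ind | Lean Rep | Rep | Strong Rep |
|---|---|---|---|---|---|---|---|
| Less than $20,000 | 12 | 15 | 12 | 24 | 9 | 15 | 12 |
| $20-$40,000 | 12 | 15 | 10 | 18 | 11 | 19 | 15 |
| $40-$80,000 | 11 | 14 | 10 | 13 | 11 | 24 | 18 |
| More than $80,000 | 12 | 12 | 10 | 11 | 11 | 23 | 21 |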

The results are straightforward: the more income a family has, the more likely they are to be Republican. There is a lot of nuance and geographical detail to be fleshed out in these results. But these facts are where we need to start.
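To make the pattern easier to see, here is a minimal Python sketch that folds the seven columns of the table into three classes, with leaners counted toward the party they lean to. The numbers are copied from the table; the variable and function names are mine.

```python
# Percentages copied from the table above, in column order:
# Strong Dem, Dem, Lean Dem, Ind, Lean Rep, Rep, Strong Rep.
# Rows may not sum to exactly 100 because of rounding.
table = {
    "Less than $20,000": [12, 15, 12, 24, 9, 15, 12],
    "$20-$40,000": [12, 15, 10, 18, 11, 19, 15],
    "$40-$80,000": [11, 14, 10, 13, 11, 24, 18],
    "More than $80,000": [12, 12, 10, 11, 11, 23, 21],
}


def collapse(row):
    """Sum the three Democratic and three Republican columns."""
    return {"Dem": sum(row[0:3]), "Ind": row[3], "Rep": sum(row[4:7])}


collapsed = {income: collapse(row) for income, row in table.items()}

for income, shares in collapsed.items():
    print(f"{income:>18}: Dem {shares['Dem']}  Ind {shares['Ind']}  Rep {shares['Rep']}")
```

Collapsed this way, the Republican share rises from 36 to 45 to 53 to 55 across the four income brackets, while the Democratic share drifts down from 39 to 37 to 35 to 34.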

Andrew Gelman has much more as usual. For example, this chart:

 

 

Why do I keep posting this stuff? Because facts matter. That’s my hope, my faith. Tell people facts, and they will open their eyes. Tell your friends, tell your family. Have whatever opinion you want to have, but start with the facts we know. Look up facts, calculate facts, analyze facts. They are there for us, we just need to go look. Google is your friend, Wikipedia is your friend. The General Social Survey is your friend.

December 30, 2011

Vocab by ethnicity, region, and education

Filed under: data,Data Analysis,GSS,I.Q.,Regionalism — Razib Khan @ 12:58 pm

A questioner below was curious whether vocabulary test differences by ethnicity and region persist across income. There are two problems with this. First, the INCOME variable isn’t very fine-grained (there is a catchall $30,000 or greater category). Second, it doesn’t seem to control for inflation. But there is a variable, DEGREE, which records the highest level of education attained. I used this to create a “college” and “non-college” category (i.e., do you have a bachelor’s degree or not). Because of sample size considerations I removed some of the ethnic groups, but otherwise replicated the earlier analysis.

Below are two tables. One shows the mean vocab score by region and ethnicity (for whites) for those without college educations, and the other shows the same for those with college educations. For each ethnicity I also generated a correlation between its non-college and college rows (four regional means each), even though it sure isn’t useful as a quantitative statistical measure because of the small number of data points. Rather, I just wanted a summary of the qualitative result. The short answer is that the average vocabulary difference seems to persist across educational levels (the exception here is the “German” ethnicity).

Mean WORDSUM Score by Ethnicity and Region

No college education

| Ethnicity | Northeast | Midwest | South | West |
|---|---|---|---|---|
| German | 6.05 | 5.81 | 5.79 | 6.11 |
| Eastern Europe | 6.17 | 6.16 | 6.18 | 6.29 |
| Scandinavian | 6.35 | 5.97 | 6.23 | 6.35 |
| British | 6.6 | 6.21 | 6.02 | 6.57 |
| Irish | 6.66 | 5.83 | 5.69 | 6.58 |
| Italian | 6 | 5.85 | 5.8 | 6.18 |

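College educated

| Ethnicity | Northeast | Midwest | South | West |
|---|---|---|---|---|
| German | 8.03 | 7.48 | 7.63 | 7.33 |
| Eastern Europe | 7.7 | 7.37 | 7.5 | 8.09 |
| Scandinavian | 8.5 | 7.82 | 7.86 | 7.92 |
| British | 8.44 | 8.06 | 7.76 | 7.95 |
| Irish | 8.03 | 7.79 | 7.39 | 7.59 |
| Italian | 7.45 | 7.75 | 7.6 | 7.87 |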

Correlation of college and non-college

| Ethnicity | Correlation |
|---|---|
| German | 0.08 |
| Eastern Europe | 0.92 |
| Scandinavian | 0.57 |
| British | 0.70 |
| Irish | 0.57 |
| Italian | 0.40 |
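If you want to check the correlations yourself, here is a short Python sketch. It assumes they are ordinary Pearson correlations between each ethnicity's four regional means in the non-college table and the matching four in the college table; under that assumption the values above come out the same after rounding to two decimals. The helper name `pearson` is my own.

```python
from math import sqrt

# Mean WORDSUM scores copied from the two tables above.
# Region order in each list: Northeast, Midwest, South, West.
no_college = {
    "German": [6.05, 5.81, 5.79, 6.11],
    "Eastern Europe": [6.17, 6.16, 6.18, 6.29],
    "Scandinavian": [6.35, 5.97, 6.23, 6.35],
    "British": [6.6, 6.21, 6.02, 6.57],
    "Irish": [6.66, 5.83, 5.69, 6.58],
    "Italian": [6, 5.85, 5.8, 6.18],
}
college = {
    "German": [8.03, 7.48, 7.63, 7.33],
    "Eastern Europe": [7.7, 7.37, 7.5, 8.09],
    "Scandinavian": [8.5, 7.82, 7.86, 7.92],
    "British": [8.44, 8.06, 7.76, 7.95],
    "Irish": [8.03, 7.79, 7.39, 7.59],
    "Italian": [7.45, 7.75, 7.6, 7.87],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


correlations = {
    group: round(pearson(no_college[group], college[group]), 2)
    for group in no_college
}

for group, r in correlations.items():
    print(f"{group:>15}: {r:.2f}")
```

With only four points per ethnicity these numbers are very noisy, which is why they are best read as a qualitative summary.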