Razib Khan One-stop-shopping for all of my content

February 10, 2011

Counting beans the proper way

Filed under: Admixture,Analysis,Genetics,Genomics,Health — Razib Khan @ 9:46 am

Apropos of several of my recent posts, The New York Times has an interesting article up, “Counting by Race Can Throw Off Some Numbers.” It outlines the difficulty of counting racial and ethnic groups for different purposes in an increasingly diverse and racially mixed USA. Numbers matter when it comes to apportioning resources, and current methods are often quite coarse (though some interest groups prefer it that way, because coarse categories bolster their numbers). Let’s focus on the point most relevant to this weblog:

The National Center for Health Statistics collects vital statistics from the states to document the health of the population. When it comes to collecting birth certificate information, though, the center encounters a problem: 38 states and the District of Columbia report race data in the new and more expansive manner that allows for the recording of more than one race. But a dozen states do not, because they still use old data systems and outdated forms. As a result, the center cannot produce consistent national data for what it calls “medical and health purposes only.”

To get around that problem, the center reclassifies mixed-race births using a complex algorithm. For example, a birth ...

December 9, 2010

Does majoring in science make a difference?

Filed under: Analysis,data,Data Analysis,GSS — Razib Khan @ 2:16 pm

On occasion I get queries about what distinguishes people with science backgrounds from those without them. An anecdote might illustrate the sort of difference I have in mind. Back in undergrad I was having lunch with my lab partner when a friend saw us and decided to chat while we ate. This friend is now an academic with a doctorate in a humanistic field (something like Comparative Literature; I forget). She had read something about transgenic organisms and evidently felt that this was the time and place to go on a rant about them. She knew that I was totally comfortable with the idea of transgenic organisms, so she recounted the fish-genes-in-a-tomato patent story to my lab partner to illustrate how gross the outcome could be. My lab partner was a pre-med math major. She just shrugged, explained that she’d done biomedical research the previous summer and so understood the practical necessity of such methods, and admitted that it would take more than a story about “fish genes” in a tomato to freak her out.

Kevin Drum’s post about the lack of Republican scientists makes me want to revisit the issue of science vs. non-science. I think the explanation for the lack of Republican scientists is pretty straightforward. First, there’s a clear cultural gap: the Republican party emphasizes its conservative Christian component, which turns off secular scientists, even libertarian-leaning ones. Second, agencies like the NSF and NIH are often attacked by fiscal conservatives, and many scientists in academia and government depend on their funding. Sarah Palin’s attack on “fruit fly” research combined the two threads, neatly and unfortunately.

In any case, the GSS has a closely related variable, MAJORCOL, which records college major. The sample sizes are not the best, but at least the question was asked recently, in 2006 and 2008. I looked at three sets: those with “natural science” degrees, those with “cs & engineering” degrees, and the total pot (inclusive of the first two classes). The last is a snapshot of everyone with at least a college degree (the sample is restricted to those who completed their degree).

In the tables below, each cell gives the percentage of the row group that falls in the column class. So in the first table, 79% of CS & engineering degree holders are male, and 22% of CS & engineering degree holders are Roman Catholic.
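To make the row-percentage convention concrete, here is a minimal Python sketch of how a table like these can be built from one record per respondent. The function name `row_percentages` and the six toy records are hypothetical illustrations, not GSS data, and the figures in the tables were not produced with this code.

```python
from collections import Counter, defaultdict

def row_percentages(records):
    """Return {row_group: {column_class: percent}}.

    `records` is an iterable of (row_group, column_class) pairs, one per
    respondent. Each row is normalized separately, so a cell is the share
    of that row group falling in the column class, rounded to a whole percent.
    """
    counts = defaultdict(Counter)
    for row, col in records:
        counts[row][col] += 1
    table = {}
    for row, col_counts in counts.items():
        total = sum(col_counts.values())
        table[row] = {col: round(100 * n / total) for col, n in col_counts.items()}
    return table

# Hypothetical toy records, not real survey responses.
toy = [
    ("CS & Engineering", "Male"),
    ("CS & Engineering", "Male"),
    ("CS & Engineering", "Male"),
    ("CS & Engineering", "Female"),
    ("Natural Science", "Male"),
    ("Natural Science", "Female"),
]
print(row_percentages(toy))
# {'CS & Engineering': {'Male': 75, 'Female': 25}, 'Natural Science': {'Male': 50, 'Female': 50}}
```

Because each row is normalized on its own, a cell answers “of this group, what share falls in this class?” That is why 79% in the first table means 79% of CS & engineering degree holders are male, not that 79% of men hold such degrees.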

Basic Demographics (% of row)

                      Sex   Race                 Religion
                      Male  White  Black  Other  Protestant  Catholic  No Religion
Natural Science       57    80     5      15     39          24        29
CS & Engineering      79    79     3      18     50          22        18
All Degree Holders    43    86     6      8      44          27        17

                      Ideology                    Party          Abortion on Demand  2004 Vote
                      Liberal  Moderate  Conserv  Dem  Ind  Rep  Yes                 Bush
Natural Science       43       27        30       47   16   37   70                  43
CS & Engineering      30       27        43       37   13   50   54                  58
All Degree Holders    33       29        38       48   10   42   52                  52

                      Bible is…                              Humans evolved  Attitude about GMO food
                      Word of God  Inspired  Book of Fables  Yes             Not concerned  Won’t eat  Atheist & Agnostic  Know God Exists
Natural Science       18           36        44              81              30             4          23                  35
CS & Engineering      11           64        24              75              30             8          16                  48
All Degree Holders    16           59        23              64              17             27         10                  51

Verbal intelligence (WORDSUM vocab test score)

                      Dull (0-5)  Not dull (6-8)  Smart (9-10)
Natural Science       8           70              22
CS & Engineering      20          66              14
All Degree Holders    20          57              24

I assume no one is too surprised by these results. Here’s the recode I used for the majors:

MAJORCOL(r: 8,11,24,33,41,51 "Natural Science"; 14,18 "CS and Engineering"; 1-98 "Full Sample")

I counted biology, chemistry, geology, physics and mathematics as natural sciences; math is probably a stretch. Computer science and engineering went into the second category. There is obviously more you could do. For example, among men, 49% of those with natural science degrees voted for George W. Bush in 2004, versus 60% of those with cs & engineering degrees and 57% of all male degree holders.
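Following the description above, in which the full sample is inclusive of the first two classes, the recode amounts to assigning each respondent to every group whose code list contains their MAJORCOL value. Here is a minimal Python sketch, assuming MAJORCOL is available as an integer code per respondent. The constant and function names are hypothetical, and the code lists are copied from the recode line rather than mapped to specific subjects.

```python
# Code lists copied from the MAJORCOL recode above. Which subject each
# code stands for is defined in the GSS codebook and is not assumed here.
NATURAL_SCIENCE = {8, 11, 24, 33, 41, 51}
CS_ENGINEERING = {14, 18}
FULL_SAMPLE = set(range(1, 99))  # codes 1 through 98 inclusive

def major_groups(code):
    """Return every analysis group that a MAJORCOL code belongs to.

    The groups overlap on purpose: a natural science or CS/engineering
    code is also counted in the full sample.
    """
    groups = []
    if code in NATURAL_SCIENCE:
        groups.append("Natural Science")
    if code in CS_ENGINEERING:
        groups.append("CS and Engineering")
    if code in FULL_SAMPLE:
        groups.append("Full Sample")
    return groups

print(major_groups(8))   # ['Natural Science', 'Full Sample']
print(major_groups(14))  # ['CS and Engineering', 'Full Sample']
print(major_groups(99))  # [] (outside every code list)
```

Because the groups overlap, the rows of the tables are not mutually exclusive, so their counts should not be summed.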

Many of the sample sizes are small, but the results align with our intuition, which perhaps makes them less than interesting.
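To put “small” in perspective, here is a minimal Python sketch of the textbook normal-approximation 95% margin of error for a proportion, 1.96 × √(p(1 − p)/n). The group sizes are hypothetical round numbers, since the post does not report the cell counts, and the sketch assumes simple random sampling, ignoring survey weights and the GSS’s clustered sample design.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    p is the sample proportion (0-1), n the number of respondents, and
    z = 1.96 gives roughly 95% coverage. Returns percentage points.
    Assumes simple random sampling.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Hypothetical group sizes; p = 0.43 echoes the 43% Bush vote among
# natural science degree holders in the table above.
for n in (50, 100, 400):
    print(n, round(margin_of_error(0.43, n), 1))
# 50 13.7
# 100 9.7
# 400 4.9
```

At roughly 50 respondents, a 43% figure is compatible with anything from about 29% to 57%, so gaps of a few points between rows could easily be noise.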

August 23, 2010

Just pushing buttons

Filed under: Analysis,Genetics,PCA,Tools — Razib Khan @ 11:05 pm

Mike the Mad Biologist, whose bailiwick is the domain of the small, asks in the comments:

I don’t mean to bring up a tangential point to the post, but why does the field of human genetics use PCA to visualize relationships? When I see plots like those shown here that have a ‘geometric pattern’ to them (the sharp right angles; another common pattern is a Y-shape), that tells me that there are lots of samples with zeros for many of the Y-variables (i.e., alleles that are unique to certain populations). Thus, the spatial arrangement of the points is largely an artifact of an inappropriate method: how does one calculate a correlation matrix when many of the things one is correlating have values of zero?

If one really was keen on using PCA, one could calculate a pairwise distance matrix and then use that instead of the correlation matrix (Principal Coordinates Analysis).
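Both points in the comment can be illustrated with a short NumPy sketch. First, a variable that is zero in every sample has zero variance, so its row and column of the correlation matrix are undefined (NumPy fills them with nan and emits a warning). Second, Principal Coordinates Analysis avoids the correlation matrix by working from pairwise distances: square them, double-center, and keep the top eigenvectors scaled by the square roots of their eigenvalues (classical PCoA, also called classical multidimensional scaling). The genotype values, the function name `pcoa`, and the choice of Euclidean distance are hypothetical; a real analysis would pick a genetic distance, which the comment leaves open.

```python
import warnings
import numpy as np

def pcoa(dist, k=2):
    """Classical Principal Coordinates Analysis (Gower's method).

    dist: (n, n) symmetric matrix of pairwise distances.
    Returns an (n, k) array of sample coordinates on the top k axes.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]
    # Clip small negative eigenvalues (rounding or non-Euclidean
    # distances) to zero before taking square roots.
    return vecs * np.sqrt(np.clip(vals, 0, None))

# Hypothetical genotypes: rows are 4 individuals, columns are allele
# counts (0, 1 or 2) at 3 sites; the last allele is absent from everyone.
geno = np.array([[0, 0, 0],
                 [1, 1, 0],
                 [2, 1, 0],
                 [2, 2, 0]], dtype=float)

# Point 1: correlating sites across individuals fails for the all-zero
# column, because its standard deviation is zero.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    corr = np.corrcoef(geno, rowvar=False)
print(np.isnan(corr[2]).all())  # True

# Point 2: PCoA only needs distances between individuals.
dist = np.linalg.norm(geno[:, None, :] - geno[None, :, :], axis=-1)
coords = pcoa(dist, k=2)
print(coords.shape)  # (4, 2)
```

Because these toy distances are Euclidean, the two PCoA axes reproduce the original pairwise distances exactly, which is the sense in which PCoA on Euclidean distances matches PCA on centered data; with other distance measures the two can give different pictures. The sketch shows only the degenerate all-zero case for correlations and says nothing about whether sparse but non-constant variables distort real PCA plots, which is the question put to readers here.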

Since I know some human geneticists do read this weblog, I thought it was worth throwing the question out there.
