Using Regression in Voting Research: Ecological v. Individual Data

Several students have chosen to explain voting behavior using data from the states file.

• If you choose to do this, you must pay attention to the level of analysis inherent in that file.
• The states file contains data on people in the aggregate: for example, it reports what proportion of all votes in each state went to Bush.
• Sociology uses the term "ecology" for the study of groups.
• Hence this type of data is known in social research as ecological data.
• Because the states file has observations on states, not on individual voters, you will be explaining the voting behavior of states, not voters.
• Accordingly, you can make statements like the following:
    • "The correlation between percent black and percent vote for Reagan in 1984 across all American states was -.56, which indicates that 31% of the variance in the states' votes for Reagan can be explained by the racial composition of the state." (The 31% is the square of the correlation: -.56 × -.56 = .3136.)
• From the same correlation, you cannot make this statement:
    • "The correlation between percent black and percent vote for Reagan in 1984 across all American states was -.56, which indicates that 31% of the variance in voting choice for Reagan can be explained by the racial characteristics of American voters."
• If you make such a statement, you are committing an ecological fallacy: making an unsupported generalization from group data to individual behavior.
• If you wish to predict the voting behavior of individual citizens, you must use survey data on individual respondents, such as these files:
    • vote00
    • vote96
    • vote92
    • vote88
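The "31% of the variance" figure above is just the squared correlation. The minimal Python sketch below shows that arithmetic; the function name variance_explained is hypothetical, used only for illustration. Note that the result describes variance among the units that were correlated (here, states), not among individual voters.

```python
def variance_explained(r: float) -> float:
    """Share of variance in y accounted for by a linear fit on x (r squared).

    This is a property of whatever units were correlated: states, not voters,
    when the input is the state-level correlation.
    """
    return r ** 2


r_states = -0.56  # correlation across American states, from the example above
share = variance_explained(r_states)
print(f"r = {r_states}, r^2 = {share:.4f}, about {round(share * 100)}% of variance")
```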

This example illustrates the ecological fallacy in operation:

Consider this set of ecological data for three communities:

Community   % Republican vote   % over $50,000
A           25                  25
B           50                  50
C           75                  75

Across the three communities, r = 1.00.
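To see where r = 1.00 comes from, the sketch below computes Pearson's r across the three communities, treating each community as a single observation. It is a minimal Python sketch; pearson_r is a hypothetical helper written out by hand rather than a call into any statistics library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Ecological data: one value per community (A, B, C).
pct_over_50k = [25, 50, 75]
pct_republican = [25, 50, 75]

r_ecological = pearson_r(pct_over_50k, pct_republican)
print(f"ecological r = {r_ecological:.2f}")
```

With only three points lying exactly on a line, the coefficient is a perfect 1.00, which says nothing yet about how any individual voter behaves.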

Observing that r = 1.00, one might be tempted to draw this erroneous conclusion:

There is a perfect relationship between making over $50,000 and voting Republican.

In truth, the correlation between income and Republican vote among individuals in the same three communities could have been r = -.33.

Consider three separate samples of 100 voters, one taken in each community:

Community A
                voted Rep   voted Dem   Total
Under $50,000   25          50          75 (75%)
Over $50,000    0           25          25 (25%)
Total           25 (25%)    75 (75%)    100

Community B
                voted Rep   voted Dem   Total
Under $50,000   50          0           50 (50%)
Over $50,000    0           50          50 (50%)
Total           50 (50%)    50 (50%)    100

Community C
                voted Rep   voted Dem   Total
Under $50,000   25          0           25 (25%)
Over $50,000    50          25          75 (75%)
Total           75 (75%)    25 (25%)    100
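One way to confirm that these individual-level tables are consistent with the ecological data is to expand each table into one record per voter and then aggregate back up to community percentages. The sketch below assumes a 0/1 coding (1 = over $50,000, 1 = voted Republican); the names SAMPLES, expand, and community_percentages are hypothetical.

```python
# Cell counts per community, taken from the three tables:
# (income group, vote) -> number of the 100 sampled voters.
SAMPLES = {
    "A": {("under", "Rep"): 25, ("under", "Dem"): 50, ("over", "Rep"): 0, ("over", "Dem"): 25},
    "B": {("under", "Rep"): 50, ("under", "Dem"): 0, ("over", "Rep"): 0, ("over", "Dem"): 50},
    "C": {("under", "Rep"): 25, ("under", "Dem"): 0, ("over", "Rep"): 50, ("over", "Dem"): 25},
}


def expand(counts):
    """Turn a table of cell counts into one (over_50k, voted_rep) record per voter, coded 0/1."""
    records = []
    for (income, vote), n in counts.items():
        records.extend([(int(income == "over"), int(vote == "Rep"))] * n)
    return records


def community_percentages(records):
    """Aggregate individual records back to (% Republican vote, % over $50,000)."""
    n = len(records)
    pct_rep = 100 * sum(rep for _, rep in records) / n
    pct_over = 100 * sum(over for over, _ in records) / n
    return pct_rep, pct_over


for name, counts in SAMPLES.items():
    recs = expand(counts)
    print(name, len(recs), community_percentages(recs))
```

Each community's aggregate reproduces its row in the ecological table (25/25, 50/50, 75/75), even though within each community the wealthier voters are never more likely than the others to vote Republican.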

Now let's combine the three samples into one, looking only at the survey data for individuals:

Combined sample (all three communities)
                voted Rep   voted Dem   Total
Under $50,000   100         50          150
Over $50,000    50          100         150
Total           150         150         300

The correlation for these survey data on individuals is r = -.33.
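For two yes/no variables, Pearson's r can be computed directly from the 2x2 table; this special case is known as the phi coefficient. A minimal Python sketch using the pooled counts from the combined table (the function name phi_coefficient is our own, not from any library):

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Pearson r for two 0/1 variables, from a 2x2 table laid out as

                    voted Rep   voted Dem
        over            a           b
        under           c           d
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Pooled counts for all 300 voters.
over_rep, over_dem = 50, 100
under_rep, under_dem = 100, 50

r_individual = phi_coefficient(over_rep, over_dem, under_rep, under_dem)
print(f"individual-level r = {r_individual:.2f}")
```

Here (50 × 50 - 100 × 100) / 22,500 = -1/3. Equivalently, coding each of the 300 voters 0/1 on both variables and computing an ordinary Pearson r gives the same value.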

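CONCLUSION: Ecological correlations are unreliable indicators of individual correlations. In this example the ecological correlation is r = 1.00, while the correlation among the same individuals is r = -.33: the relationship even reverses direction.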