Using Regression in Voting Research: ecological v. individual data


Several students have chosen to explain voting behavior, using data from the States file.

  • If you choose to do this, you must pay attention to the level of analysis inherent in that file.
    • The states file contains data on people in the aggregate--it reports what proportion of all votes went to Bush.
    • Sociology uses the term ecology for the study of groups.
    • So data of this type are known in social research as ecological data.
  • Because the states file has observations on states, not on individual voters, you will be explaining the voting behavior of states, not voters.
    • Accordingly, you can make statements like the following:
      • The correlation between percent black and percent vote for Reagan in 1984 across all American states was -.56. Squaring it gives r² = .31, which indicates that 31% of the variance in the states' votes for Reagan can be explained by the racial composition of the state.
    • From the same correlation, you cannot make this statement:
      • "... which indicates that 31% of the variance in voting choice for Reagan can be explained by the racial characteristics of American voters."
  • If you make such a statement, you are committing an ecological fallacy: making an unsupported generalization from group data to individual behavior.
  • If you wish to predict the voting behavior of individual citizens, you must use survey data on individual respondents, such as
    • vote00
    • vote96
    • vote92
    • vote88


This example illustrates the ecological fallacy in operation:

Consider this set of [ecological] data for three communities:

Community    % Republican vote    % over $50,000
A            25                   25
B            50                   50
C            75                   75

r = 1.00
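
The r = 1.00 reported for the three communities can be checked by hand or with a few lines of code. The following is a minimal sketch, not part of the original notes: the function name `pearson_r` and the variable names are illustrative, and the only data used are the six percentages in the table above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# One observation per community (A, B, C): these are ecological data.
pct_republican = [25, 50, 75]
pct_over_50k = [25, 50, 75]

ecological_r = pearson_r(pct_over_50k, pct_republican)
print(f"ecological r = {ecological_r:.2f}")
```

Note that the unit of analysis here is the community, so the result describes how communities differ from one another, not how individual voters behave.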

 

Observing that r=1.0, one might be tempted to draw this erroneous conclusion:

There is a perfect relationship between making over $50,000 and voting Republican.


In truth, the correlation between wealth and Republican vote for individuals in the same three communities could have been r = -.33.

Consider three separate samples of 100 voters each, one taken in each community:

Community A
                  voted Rep    voted Dem    Total    % of sample
Under $50,000     25           50           75       75%
Over $50,000      0            25           25       25%
% of sample       25%          75%

Community B
                  voted Rep    voted Dem    Total    % of sample
Under $50,000     50           0            50       50%
Over $50,000      0            50           50       50%
% of sample       50%          50%

Community C
                  voted Rep    voted Dem    Total    % of sample
Under $50,000     25           0            25       25%
Over $50,000      50           25           75       75%
% of sample       75%          25%

Now let's combine the three samples into one, looking at only the survey data for individuals:

                  voted Rep    voted Dem    Total
Under $50,000     100          50           150
Over $50,000      50           100          150
Total             150          150          300

The correlation between income and Republican vote for these survey data on individuals is r = -.33.
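
The individual-level figure can be reproduced from the cell counts. The sketch below is an illustration under stated assumptions rather than the procedure used in the original notes: it pools the three community samples cell by cell, then computes the phi coefficient, which equals Pearson's r when both variables are coded 0/1. The names `community_tables` and `phi` are hypothetical, and the cell counts are the ones given in the three community tables above.

```python
from math import sqrt

# Cell counts per community, in the order
# (under_rep, under_dem, over_rep, over_dem), for 100 sampled voters each.
community_tables = {
    "A": (25, 50, 0, 25),
    "B": (50, 0, 0, 50),
    "C": (25, 0, 50, 25),
}

# Pool the three samples by adding the tables cell by cell.
under_rep, under_dem, over_rep, over_dem = (
    sum(cells) for cells in zip(*community_tables.values())
)


def phi(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]].

    With both variables coded 0/1, phi equals Pearson's r.
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Coding assumption: over $50,000 = 1 and voted Republican = 1,
# so the first row is the "over" row and the first column is "Rep".
individual_r = phi(over_rep, over_dem, under_rep, under_dem)

print("pooled counts:", under_rep, under_dem, over_rep, over_dem)
print(f"individual r = {individual_r:.2f}")
```

The same 300 voters therefore yield r = 1.00 when grouped by community and a negative correlation when treated as individuals, which is the point of the example.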


CONCLUSION: Ecological correlations are unreliable indicators of individual correlations.