SPSS crosstabs


IX: Topics and Readings: Discrete Variables

Observing relationships and allowing for chance (review)

Two types of "chance" to be considered
- Sampling error (which has served as the model for discussing "significance" so far)
  - Occurs when making inferences from samples to populations
  - Typically involved when dealing with survey data, such as the vote00 data set.
  - The question here: does the population reveal the same relationship between variables as shown in the sample?
- Systematic or random causal processes for non-sample data
  - Suppose you are not using a sample but the entire population, e.g., all the states, or all members of the U.S. Senate
  - Is an observed relationship (e.g., between religion and region) reflective of some systematic causation, or is it merely the result of random causal factors?
Testing for "statistical significance" regardless of the type of chance considered
- Tests for statistical significance are made against a null hypothesis that asserts that there is NO systematic relationship between the variables.
  - Testing against a null hypothesis is a clever foil to get around the problem that statistics can never prove causation; they can only disprove it.
  - If the null hypothesis of no causation is rejected, then some support is lent to the contrary hypothesis of causation.
- Type I and Type II errors
  - Type I and II errors cause even trained statisticians to stop and think; the ideas are not intuitively clear
  - Both types of errors refer to probabilities of making wrong decisions:
    - rejecting a null hypothesis when it is in fact true
    - accepting a null hypothesis when it is in fact false
- Type I error
  - Rejecting a true null hypothesis
  - Guarded against by demanding a highly significant result
  - Controlled by setting an extremely small rejection "area" in the sampling distribution of the statistic.
  - This rejection area is called "alpha" and is a probability value
- Type II error
  - Accepting a false null hypothesis
  - Can't easily be guarded against
  - Is called a "beta" error
  - Is inversely related to the alpha error
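The inverse relation between alpha and beta can be illustrated with a coin-flip test. The sketch below is only an illustration under stated assumptions: the sample size of 100 flips, the "true" bias of 0.6, and the cutoffs are hypothetical values chosen for the example, not figures from the course materials.

```python
from math import comb


def binom_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_FLIPS = 100   # hypothetical number of coin flips
P_NULL = 0.5    # null hypothesis: the coin is fair
P_TRUE = 0.6    # assumed true bias, needed only to compute beta

results = []
for cutoff in (56, 59, 62, 65):
    # Type I error: rejecting the null when the coin really is fair
    alpha = binom_tail(N_FLIPS, cutoff, P_NULL)
    # Type II error: accepting the null when the coin is really biased
    beta = 1 - binom_tail(N_FLIPS, cutoff, P_TRUE)
    results.append((cutoff, alpha, beta))
    print(f"reject if heads >= {cutoff}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

As the cutoff rises, the rejection area (alpha) shrinks while beta grows. Beta can only be computed here because a specific alternative (0.6) was assumed, which is one reason Type II error is hard to guard against in practice.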

Testing statistical significance using chi-square

Chi-square tests for statistical independence between two variables
- The variables may be nominal level or higher, but chi-square is best suited for discrete variables with a limited number of categories.
- Chi-square is sensitive to any departure from chance relationship
- Strictly speaking, chi-square is only a test of significance, not a measure of a relationship between variables
  - I'll discuss this more later
Consider the relationship between two discrete variables from the 2000 National Election Study
- Feelings about the Bible is the dependent variable
- Region of the country is the independent variable

Crosstabulation: Feeling about the Bible * Region where interview occurred

(Columns: Region where interview occurred)

| Feeling about the Bible | Northeast | North Central | South | Border | Mountain | West | Total |
|---|---|---|---|---|---|---|---|
| It's THE WORD OF GOD | 73 | 141 | 239 | 47 | 30 | 70 | 600 |
| Don't take LITERALLY | 168 | 243 | 233 | 49 | 49 | 127 | 869 |
| IS NOT God's word | 59 | 51 | 49 | 11 | 23 | 54 | 247 |
| Total | 300 | 435 | 521 | 107 | 102 | 251 | 1716 |

Note that 239 of the 521 respondents from the South (46%) say that the Bible is the word of God, but only 73 of 300 respondents in the Northeast (24%) agree.
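The percentages quoted above are column percentages: each cell count divided by its region's total. A minimal Python sketch of that arithmetic follows; the variable names are illustrative and the counts are copied from the crosstabulation.

```python
REGIONS = ["Northeast", "North Central", "South", "Border", "Mountain", "West"]

# Observed counts from the crosstabulation (rows: feeling about the Bible)
OBSERVED = {
    "It's THE WORD OF GOD": [73, 141, 239, 47, 30, 70],
    "Don't take LITERALLY": [168, 243, 233, 49, 49, 127],
    "IS NOT God's word": [59, 51, 49, 11, 23, 54],
}

# Column totals: number of respondents interviewed in each region
column_totals = [sum(col) for col in zip(*OBSERVED.values())]

# Percentage of each region's respondents who call the Bible the word of God
pct_word_of_god = {
    region: 100 * count / total
    for region, count, total in zip(
        REGIONS, OBSERVED["It's THE WORD OF GOD"], column_totals
    )
}

for region, pct in pct_word_of_god.items():
    print(f"{region:14s} {pct:5.1f}%")
```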

Are these differences between region and response likely to be due to chance?

Calculation of chi-square: Formula in Schmidt, page 340, and in the Users' Guide, page 67 (both are the same):

$$X^2 = \sum \frac{(O - E)^2}{E}$$

Where: O = Observed frequencies, and E = Expected frequencies
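As a worked illustration for a single cell (South, "word of God"), assume the standard rule that the expected frequency under independence is the row total times the column total divided by N. That rule is not spelled out in these notes, but it agrees with the SPSS "expected" values.

$$E = \frac{600 \times 521}{1716} \approx 182.2, \qquad \frac{(O - E)^2}{E} = \frac{(239 - 182.2)^2}{182.2} \approx 17.7$$

Chi-square is the sum of such contributions over all 18 cells of the table.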
Here is the same table from SPSS 10 as above, but this time "expected" was checked in the "options" box.

Crosstabulation: Feeling about the Bible * Region where interview occurred

(Columns: Region where interview occurred)

| Feeling about the Bible | | Northeast | North Central | South | Border | Mountain | West | Total |
|---|---|---|---|---|---|---|---|---|
| It's THE WORD OF GOD | Count | 73 | 141 | 239 | 47 | 30 | 70 | 600 |
| | Expected | 104.9 | 152.1 | 182.2 | 37.4 | 35.7 | 87.8 | |
| Don't take LITERALLY | Count | 168 | 243 | 233 | 49 | 49 | 127 | 869 |
| | Expected | 151.9 | 220.3 | 263.8 | 54.2 | 51.7 | 127.1 | |
| IS NOT God's word | Count | 59 | 51 | 49 | 11 | 23 | 54 | 247 |
| | Expected | 43.2 | 62.6 | 75 | 15.4 | 14.7 | 36.1 | |
| Total | Count | 300 | 435 | 521 | 107 | 102 | 251 | 1716 |

Here's the chi-square table generated for the table above (you're responsible ONLY for the chi-square line)
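As a cross-check on the chi-square line, the statistic can be recomputed by hand from the observed counts. The sketch below is a plain-Python illustration, not SPSS's own procedure; the function names are made up for the example.

```python
# Observed counts: rows are the three Bible responses, columns the six regions
OBSERVED = [
    [73, 141, 239, 47, 30, 70],
    [168, 243, 233, 49, 49, 127],
    [59, 51, 49, 11, 23, 54],
]


def expected_counts(table):
    """Expected frequencies under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(OBSERVED)
chi2 = chi_square(OBSERVED)
print([round(e, 1) for e in expected[0]])  # compare with the first "Expected" row
print(round(chi2, 2))
```

The first row of expected values should agree with the SPSS "Expected" row, and the statistic should come out near the value of about 75 discussed in the interpretation below.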

Interpretation of chi-square
- The magnitude of the chi-square value must be judged against a table of values of the chi-square distribution.
- One must enter this table using the appropriate degrees of freedom, calculated according to the size of the table:
- X² degrees of freedom = df
  - = (rows - 1) (columns - 1)
  - = (3 - 1) (6 - 1)
  - = 2 x 5 = 10.
- Given the same degrees of freedom, the larger the chi-square value, the more "significant" it is.
- The critical values in the chi-square table are listed under "alpha" levels of significance, e.g., .050 or .010 or .001
- A chi-square as large as 75 for 10 degrees of freedom would be expected by chance fewer than 1 time in 1000.
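The "fewer than 1 time in 1000" claim can be checked without a printed table. The sketch below uses a closed-form expression for the upper-tail probability of the chi-square distribution that holds only when the degrees of freedom are even (as they are here, df = 10); the function names are illustrative, and a statistics library would be the usual choice for general df.

```python
from math import exp


def degrees_of_freedom(n_rows, n_cols):
    """df for a crosstabulation: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_tail_even_df(x, df):
    """P(chi-square >= x) for EVEN df only.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form is valid only for positive even df")
    half = x / 2
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return exp(-half) * total


df = degrees_of_freedom(3, 6)
p_value = chi2_tail_even_df(75.0, df)
print(df, p_value)
```

For a chi-square of 75 on 10 degrees of freedom the tail probability is far below .001, consistent with the statement above.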

Chi-square tests for the "significance" of a relationship

The "level of significance" to be used in making a statistical decision must be set in advance by the researcher--this is the alpha value.
A chi-square value may be computed for any crosstabulation and checked for significance by comparing it with values in a chi-square table.
- Enter the table in the column for the proper significance level.
- Enter the table in the row for the proper "degrees of freedom".
- If the chi-square computed for the cross-tabulation is larger than the value listed in the table, the relationship is "significant" at the specified "level of significance."
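The lookup steps above amount to a simple comparison. The sketch below hard-codes the row of a standard chi-square table for 10 degrees of freedom; the dictionary and function names are hypothetical, and only the three alpha levels mentioned in these notes are included.

```python
# Critical values of chi-square for df = 10 (one row of a standard table)
CRITICAL_DF10 = {0.05: 18.307, 0.01: 23.209, 0.001: 29.588}


def is_significant(chi2, alpha, critical=CRITICAL_DF10):
    """True if the computed chi-square exceeds the tabled value for alpha."""
    return chi2 > critical[alpha]


ALPHA = 0.001          # level of significance, set in advance by the researcher
COMPUTED_CHI2 = 75.0   # approximate value for the Bible-by-region table

print(is_significant(COMPUTED_CHI2, ALPHA))
```

A hypothetical chi-square of 20 on the same degrees of freedom would be significant at .05 but not at .01, which is why the alpha level has to be fixed before looking at the result.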

Chi-square itself says nothing about the strength of a relationship, only its significance.

The magnitude of chi-square is a function of the number of cases, N.
Given the same strength of relationship, increasing N will increase chi-square.
- A caution for small N: X² is unreliable when expected cell frequencies are less than 5; SPSS refers to this in the output
General rule: the larger the number of cases,
- the easier it is to achieve significance;
- i.e., to show a departure from a random pattern
  - getting 6 heads in 10 coin flips is likely by chance, but not 60 heads in 100 flips.
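Both points can be illustrated numerically. In the sketch below, the 2x2 table is hypothetical (invented for the example), and multiplying every cell by 10 keeps the percentages, and hence the strength of the relationship, unchanged while N grows. The coin-flip probabilities are exact binomial tail probabilities for a fair coin.

```python
from math import comb


def chi_square(table):
    """Sum of (O - E)^2 / E, with E = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )


def scale(table, factor):
    """Multiply every cell by the same factor (same percentages, larger N)."""
    return [[cell * factor for cell in row] for row in table]


def prob_at_least(heads, flips):
    """P(at least `heads` heads in `flips` tosses of a fair coin)."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2**flips


SMALL = [[30, 20], [20, 30]]   # hypothetical table, N = 100
LARGE = scale(SMALL, 10)       # same pattern, N = 1000

print(chi_square(SMALL), chi_square(LARGE))
print(prob_at_least(6, 10), prob_at_least(60, 100))
```

Chi-square grows in proportion to N even though the pattern in the table is identical, and 60% heads is far less probable in 100 flips than in 10.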

The search for a measure of association based on chi-square

Desirable characteristics of measure of association:
- Equal 0 when there is no relationship between the variables.
- Equal 1.0 when the relationship is perfect.
- Intermediate values between 0 and 1.0 should be interpretable.
Alternative measures adjusting for the number of cases
- PHI COEFFICIENT, $\phi = \sqrt{X^2 / N}$
  - Varies from 0.0 to 1.0
  - But suitable only for 2x2 tables
- CONTINGENCY COEFFICIENT, $C = \sqrt{X^2 / (X^2 + N)}$
  - Has lower limit of 0
  - But does not have upper limit of 1.0
  - Intermediate values of C are not mathematically interpretable
- CRAMER'S V, $V = \sqrt{X^2 / (N \cdot \min(r-1, c-1))}$
  - Where: Min (r-1,c-1) is the minimum of the two values
  - Has lower limit of 0
  - Upper limit of 1.0
  - But intermediate values of V are also non-interpretable
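To see how these measures behave on the Bible-by-region table, the sketch below applies the standard textbook formulas for C and V (phi is skipped because the table is not 2x2). These formulas are assumed here rather than taken from the notes, and the function names are illustrative.

```python
from math import sqrt

# Observed counts: rows are the three Bible responses, columns the six regions
OBSERVED = [
    [73, 141, 239, 47, 30, 70],
    [168, 243, 233, 49, 49, 127],
    [59, 51, 49, 11, 23, 54],
]


def chi_square(table):
    """Sum of (O - E)^2 / E, with E = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )


def contingency_coefficient(table):
    """C = sqrt(X2 / (X2 + N))."""
    x2 = chi_square(table)
    n = sum(sum(row) for row in table)
    return sqrt(x2 / (x2 + n))


def cramers_v(table):
    """V = sqrt(X2 / (N * min(r - 1, c - 1)))."""
    x2 = chi_square(table)
    n = sum(sum(row) for row in table)
    smaller_dim = min(len(table) - 1, len(table[0]) - 1)
    return sqrt(x2 / (n * smaller_dim))


c_value = contingency_coefficient(OBSERVED)
v_value = cramers_v(OBSERVED)
print(round(c_value, 3), round(v_value, 3))
```

Both values come out small even though the chi-square is highly significant, which illustrates the earlier point that significance and strength of relationship are separate questions.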