IX: Topics and Readings: Discrete Variables

Observing relationships and allowing for chance (review)

•  Two types of "chance" to be considered
•  Sampling error (which has served as the model for discussing "significance" so far)
• Occurs when making inferences from samples to populations
• Typically involved when dealing with survey data, such as the vote00 data set.
• The question here: does the population reveal the same relationship between variables as shown in the sample?
• Systematic or random causal processes for non-sample data
• Suppose you are not using a sample but the entire population, e.g., all the states or all members of the U.S. Senate
• Is an observed relationship (e.g., between religion and region) reflective of some systematic causation, or is it merely the result of random causal factors?
• Testing for "statistical significance" regardless of the type of chance considered
• Tests for statistical significance are made against a null hypothesis that asserts that there is NO systematic relationship between the variables.
• The technique of testing against a null hypothesis is a clever foil to get around the problem that statistics can never prove cause; they can only disprove it.
• If the null hypothesis of no causation is rejected, then some support is lent to the contrary hypothesis of causation.
•  Type I and Type II errors
• Type I and II errors cause even trained statisticians to stop and think; the ideas are not intuitively clear
• Both types of errors refer to probabilities of making wrong decisions:
• rejecting a null hypothesis when it is in fact true
• accepting a null hypothesis when it is in fact false
• Type I error
• Rejecting a true null hypothesis
• Guarded against by demanding a highly significant result
• Controlled by setting an extremely small rejection "area" in the sampling distribution of the statistic.
• This rejection area is called "alpha" and is a probability value: the chance of rejecting the null hypothesis when it is true
• Type II error
• Accepting a false null hypothesis
• Can't easily be guarded against
• Is called a "beta" error
• For a fixed number of cases, is inversely related to the alpha error: making alpha smaller makes beta larger
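The trade-off between alpha and beta can be made concrete with a coin-flipping test. The setup below is hypothetical and not from the readings: 100 flips, a null hypothesis of a fair coin, a one-sided rule "reject if at least k heads appear", and an assumed true bias of 0.6 that is used only to compute beta. It is a sketch with exact binomial probabilities from Python's standard library.

```python
from math import comb


def upper_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_FLIPS = 100  # hypothetical number of coin flips
P_NULL = 0.5   # null hypothesis: the coin is fair
P_TRUE = 0.6   # assumed true bias, used only to compute beta

# One-sided decision rule: reject the null if we see at least k heads.
for k in (58, 60, 62, 64):
    alpha = upper_tail(N_FLIPS, P_NULL, k)     # P(reject | null true): Type I error
    beta = 1 - upper_tail(N_FLIPS, P_TRUE, k)  # P(accept | null false): Type II error
    print(f"reject if heads >= {k}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Raising k shrinks the rejection area, so alpha falls while beta rises; with the number of flips held fixed, there is no way to lower both at once.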

Testing statistical significance using chi-square

•  Chi-square tests for statistical independence between two variables
• The variables may be nominal level or higher, but the test is best suited to discrete variables with a limited number of categories.
• Chi-square is sensitive to any departure from a chance relationship
• Strictly speaking, chi-square is only a test of significance, not a measure of a relationship between variables
• I'll discuss this more later
• Consider the relationship between two discrete variables from the 2000 National Election Study
• Feelings about the Bible is the dependent variable
• Region of the country is the independent variable

Crosstabulation: Feeling about the Bible * Region where interview occurred

Feeling about the Bible   Northeast  North Central  South  Border  Mountain  West  Total
It's THE WORD OF GOD             73            141    239      47        30    70    600
Don't take LITERALLY            168            243    233      49        49   127    869
IS NOT God's word                59             51     49      11        23    54    247
Total                           300            435    521     107       102   251   1716

• Note that 239 of the 521 respondents from the South (46%) say that the Bible is the word of God,
• but only 73 of 300 respondents in the Northeast (24%) agree.
• Are these differences between region and response likely to be due to chance?
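• Calculation of chi-square (formula in Schmidt, page 340, and in the Users' Guide, page 67; both are the same): $\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over every cell of the table, where O = observed frequencies and E = expected frequencies. Each cell's expected frequency is (row total x column total) / N, the count that would occur if the two variables were independent.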
• Here is the same table from SPSS 10 as above, but this time "expected" was checked in the "options" box.

Crosstabulation: Feeling about the Bible * Region where interview occurred

Feeling about the Bible         Northeast  North Central  South  Border  Mountain   West  Total
It's THE WORD OF GOD  Count            73            141    239      47        30     70    600
                      Expected      104.9          152.1  182.2    37.4      35.7   87.8
Don't take LITERALLY  Count           168            243    233      49        49    127    869
                      Expected      151.9          220.3  263.8    54.2      51.7  127.1
IS NOT God's word     Count            59             51     49      11        23     54    247
                      Expected       43.2           62.6   75.0    15.4      14.7   36.1
Total                 Count           300            435    521     107       102    251   1716
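The expected counts and the chi-square sum can be reproduced in a few lines of plain Python. This is a minimal sketch (the function names are illustrative, not part of SPSS): it applies E = row total x column total / N to each cell and then sums (O - E)^2 / E over the observed counts of the Bible-by-region crosstabulation.

```python
def expected_counts(table):
    """E = row total * column total / N for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum of (O - E)**2 / E over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Observed counts; rows: Word of God, Don't take literally, Not God's word
# columns: Northeast, North Central, South, Border, Mountain, West
bible_by_region = [
    [73, 141, 239, 47, 30, 70],
    [168, 243, 233, 49, 49, 127],
    [59, 51, 49, 11, 23, 54],
]

print(round(expected_counts(bible_by_region)[0][0], 1))  # 104.9, matching SPSS
print(round(pearson_chi_square(bible_by_region), 1))     # 75.2
```

The largest contributions come from the South's "Word of God" cell and the West's "not God's word" cell, where observed and expected counts differ most relative to the expected count.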

• Here's the chi-square table generated for the table above (you're responsible ONLY for the chi-square line)
•  Interpretation of chi-square
• The magnitude of the chi-square value must be judged against a table of values of the chi-square distribution:
• One must enter this table using the appropriate degrees of freedom, which depend on the size of the table:
• $df = (\text{rows} - 1)(\text{columns} - 1) = (3 - 1)(6 - 1) = 2 \times 5 = 10$
• Given the same degrees of freedom, the larger the chi-square value, the more "significant" it is.
• Each entry in a chi-square table is the value that a computed chi-square must exceed, at a given number of degrees of freedom, to be significant at a specified "alpha" level, e.g., .050, .010, or .001
• A chi-square as large as 75 for 10 degrees of freedom would be expected by chance fewer than 1 time in 1000 (the tabled value for 10 df at .001 is 29.588).
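The table lookup can also be done numerically. Statistical libraries provide this directly (for example, `scipy.stats.chi2.sf`); to stay dependency-free, this sketch assumes an even number of degrees of freedom, for which the upper-tail probability has an exact closed form, $P(\chi^2 \ge x) = e^{-x/2} \sum_{j=0}^{df/2 - 1} (x/2)^j / j!$. The degrees of freedom follow the rule df = (rows - 1)(columns - 1).

```python
from math import exp


def chi_square_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for an r x c crosstabulation."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_upper_tail(x: float, df: int) -> float:
    """P(chi-square >= x); exact closed form, valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)**j / j! one factor at a time
        total += term
    return exp(-half) * total


df = chi_square_df(3, 6)             # 3 responses x 6 regions
p = chi_square_upper_tail(75.0, df)  # chance of a chi-square this large
print(df, p < 0.001)                 # 10 True
```

For an odd number of degrees of freedom this shortcut does not apply, and a library routine is the practical choice.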

Chi-square tests for the "significance" of a relationship

• The "level of significance" to be used in making a statistical decision must be set in advance by the researcher--this is the alpha value.
• A chi-square value may be computed for any crosstabulation and checked for significance by comparing it with values in a chi-square table.
• Enter the table in the column for the proper significance level.
• Enter the table in the row for the proper "degrees of freedom".
• If the chi-square computed for the cross-tabulation is larger than the value listed in the table, the relationship is "significant" at the specified "level of significance."
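The same decision rule can be sketched in code: fix alpha in advance, find the critical value for the degrees of freedom, then compare. This sketch reuses the even-df closed form (an assumption that holds for the 3 x 6 table, where df = 10) and finds the critical value by bisection; `critical_value` is a hypothetical helper, not a library function.

```python
from math import exp


def upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square >= x) for even df (closed form)."""
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def critical_value(alpha: float, df: int, lo: float = 0.0, hi: float = 1000.0,
                   tol: float = 1e-6) -> float:
    """Chi-square value whose upper-tail probability equals alpha, by bisection."""
    # The upper-tail probability falls steadily as x grows, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail_even_df(mid, df) > alpha:
            lo = mid  # still too likely by chance: the cutoff lies further right
        else:
            hi = mid
    return (lo + hi) / 2


ALPHA = 0.05     # level of significance, set in advance by the researcher
DF = 10          # (3 - 1) x (6 - 1)
COMPUTED = 75.2  # chi-square for the Bible-by-region crosstabulation
crit = critical_value(ALPHA, DF)
print(f"critical value {crit:.3f}; significant at {ALPHA}: {COMPUTED > crit}")
```

For alpha = .05 and 10 degrees of freedom the search should land near the printed table value of 18.307, far below the computed chi-square.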

Chi-square itself says nothing about the strength of a relationship, only its significance.

• The magnitude of chi-square is a function of the number of cases, N
• Given the same strength of relationship, increasing N increases chi-square proportionally: doubling every cell count doubles chi-square
•  Chi-square and expected cell frequencies less than 5: chi-square becomes unreliable when expected cell counts are small, and SPSS reports in its output how many cells have an expected count less than 5
• General rule: the larger the number of cases,
• the easier it is to achieve significance;
• i.e., to detect a departure from a random pattern
• Getting 6 heads in 10 coin flips is likely by chance, but getting 60 heads in 100 flips is not.
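Both points about N can be checked directly. The sketch below (helper names are hypothetical) multiplies every cell of the Bible-by-region table by 10, which leaves every percentage unchanged, and compares the two chi-squares. It then treats the coin example as a one-degree-of-freedom goodness-of-fit test against a fair coin, using the identity P(chi-square >= x) = erfc(sqrt(x / 2)) for 1 df, which holds because such a variable is a squared standard normal.

```python
from math import erfc, sqrt


def pearson_chi_square(table):
    """Sum of (O - E)**2 / E, with E = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, r_tot in zip(table, row_totals):
        for obs, c_tot in zip(row, col_totals):
            exp_count = r_tot * c_tot / n
            chi2 += (obs - exp_count) ** 2 / exp_count
    return chi2


def coin_chi_square(heads: int, flips: int) -> float:
    """Goodness of fit of heads and tails against a fair coin (1 df)."""
    expected = flips / 2
    tails = flips - heads
    return ((heads - expected) ** 2 + (tails - expected) ** 2) / expected


def upper_tail_1df(x: float) -> float:
    """P(chi-square with 1 df >= x)."""
    return erfc(sqrt(x / 2))


bible_by_region = [
    [73, 141, 239, 47, 30, 70],
    [168, 243, 233, 49, 49, 127],
    [59, 51, 49, 11, 23, 54],
]
# Same percentages, ten times the cases.
scaled = [[10 * cell for cell in row] for row in bible_by_region]
print(pearson_chi_square(scaled) / pearson_chi_square(bible_by_region))

for heads, flips in ((6, 10), (60, 100)):
    x = coin_chi_square(heads, flips)
    print(f"{heads} of {flips}: chi-square = {x:.1f}, p = {upper_tail_1df(x):.3f}")
```

The ratio comes out at 10 (up to floating-point rounding). The coin chi-square is 0.4 for 6 of 10 heads (p about .53) but 4.0 for 60 of 100 heads (p just under .05): the same 60% split, judged very differently because N is ten times larger.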

The search for a measure of association based on chi-square

•  Desirable characteristics of a measure of association:
• Equal 0 when there is no relationship between the variables.
• Equal 1.0 when the relationship is perfect.
• Intermediate values between 0 and 1.0 should be interpretable.
•  Alternative measures adjusting for the number of cases
•  PHI COEFFICIENT: $\phi = \sqrt{\chi^2 / N}$
• Varies from 0.0 to 1.0
• But suitable only for 2x2 tables
• CONTINGENCY COEFFICIENT: $C = \sqrt{\chi^2 / (\chi^2 + N)}$
• Has lower limit of 0
• But does not have upper limit of 1.0
• Intermediate values of C are not mathematically interpretable
•  CRAMER'S V: $V = \sqrt{\chi^2 / (N \cdot \min(r-1, c-1))}$, where min(r-1, c-1) is the smaller of (rows - 1) and (columns - 1)

• Has lower limit of 0
• Upper limit of 1.0
• But intermediate values of V are also non-interpretable
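All three measures rescale the same chi-square by N: phi = sqrt(chi-square / N), C = sqrt(chi-square / (chi-square + N)), and V = sqrt(chi-square / (N x min(r - 1, c - 1))). The sketch below computes them for the Bible-by-region table and for a hypothetical 3 x 3 table with a perfect relationship, added only to show the limits described in the bullets; the function names are illustrative.

```python
from math import sqrt


def pearson_chi_square(table):
    """Sum of (O - E)**2 / E, with E = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, r_tot in zip(table, row_totals):
        for obs, c_tot in zip(row, col_totals):
            exp_count = r_tot * c_tot / n
            chi2 += (obs - exp_count) ** 2 / exp_count
    return chi2


def association_measures(table):
    """Return (phi, C, Cramer's V) for an r x c table of counts."""
    chi2 = pearson_chi_square(table)
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1  # min(r - 1, c - 1)
    phi = sqrt(chi2 / n)
    c = sqrt(chi2 / (chi2 + n))
    v = sqrt(chi2 / (n * k))
    return phi, c, v


bible_by_region = [
    [73, 141, 239, 47, 30, 70],
    [168, 243, 233, 49, 49, 127],
    [59, 51, 49, 11, 23, 54],
]
perfect_3x3 = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]  # hypothetical perfect relationship

for name, table in (("Bible by region", bible_by_region), ("perfect 3x3", perfect_3x3)):
    phi, c, v = association_measures(table)
    print(f"{name}: phi = {phi:.3f}, C = {c:.3f}, V = {v:.3f}")
```

For the Bible-by-region table V comes out near .15. For the perfect table V reaches 1.0, while C stops at about .82 and phi exceeds 1, which is why phi is reserved for 2x2 tables and C lacks an upper limit of 1.0.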