Observing relationships and allowing for chance
(review)
- Two types of
"chance" to be considered
- Sampling error (which has served as the model for
discussing "significance" so far)
- Occurs when making
inferences from samples to populations
- Typically involved
when dealing with survey data, such as the
vote00 data set.
- The question here:
does the population reveal the same
relationship between variables as shown in the
sample?
- Systematic or
random causal processes for non-sample
data
- Suppose you are not using a sample but the entire population, e.g.,
all the states or all members of the U.S. Senate
- Does an observed relationship (e.g., between religion and region)
reflect some systematic causation, or is it
merely the result of random causal factors?
- Testing for "statistical
significance" regardless of the type of chance
considered
- Tests for statistical
significance are made against a null hypothesis
that asserts that there is NO systematic relationship
between the variables.
- The technique of
testing against a null hypothesis is a clever foil
to get around the problem that statistics can never
prove cause; they can only disprove it.
- If the null
hypothesis of no causation is rejected, then some
support is lent to the contrary hypothesis of
causation.
- Type I and Type
II errors
- Type I and II
errors cause even trained statisticians to stop and
think; the ideas are not intuitively
clear
- Both types of
errors refer to probabilities of making wrong
decisions:
- rejecting a
null hypothesis when it is in fact
true
- accepting a
null hypothesis when it is in fact
false
- Type I error
- Rejecting a
true null hypothesis
- Guarded against by
demanding a highly significant
result
- Controlled by
setting an extremely small rejection "area" in the
sampling distribution of the statistic.
- This rejection
area is called "alpha" and is a probability
value
- Type II error
- Accepting a
false null hypothesis
- Can't easily be
guarded against
- Is called a "beta"
error
- Is inversely
related to the alpha error
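
The inverse relation between alpha and beta can be illustrated with a simple coin-flip test. The sketch below is only an illustration under assumed values: the 100 flips, the alternative probability of 0.7, the cutoffs, and the function names are hypothetical choices, not part of these notes.

```python
from math import comb

def binom_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def error_rates(n, p_null, p_alt, cutoff):
    """Decision rule: reject the null when heads >= cutoff. Returns (alpha, beta)."""
    alpha = binom_tail(n, p_null, cutoff)      # Type I: rejecting a true null
    beta = 1.0 - binom_tail(n, p_alt, cutoff)  # Type II: accepting a false null
    return alpha, beta

# Assumed setup: null = fair coin (0.5), alternative = biased coin (0.7)
results = [(c, *error_rates(100, 0.5, 0.7, c)) for c in (55, 60, 65)]
for cutoff, alpha, beta in results:
    print(f"cutoff={cutoff}  alpha={alpha:.4f}  beta={beta:.4f}")
```

Raising the cutoff shrinks the rejection area (smaller alpha) but makes it more likely that a false null is accepted (larger beta).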
Testing statistical significance using
chi-square
- Chi-square tests
for statistical independence between two variables
- The variables may be
nominal level or higher, but chi-square is best suited for
discrete variables with a limited number of
categories.
- Chi-square is
sensitive to any departure from chance
relationship
- Strictly speaking,
chi-square is only a test of significance, not a
measure of a relationship between variables
- I'll discuss this
more later
- Consider the
relationship between two discrete variables from the 2000
National Election Study
- Feelings about the
Bible is the dependent variable
- Region of the country
is the independent variable
Crosstabulation: Feeling about the Bible * Region where interview occurred

| Feeling about the Bible | Northeast | North Central | South | Border | Mountain | West | Total |
|-------------------------|-----------|---------------|-------|--------|----------|------|-------|
| It's THE WORD OF GOD    | 73        | 141           | 239   | 47     | 30       | 70   | 600   |
| Don't take LITERALLY    | 168       | 243           | 233   | 49     | 49       | 127  | 869   |
| IS NOT God's word       | 59        | 51            | 49    | 11     | 23       | 54   | 247   |
| Total                   | 300       | 435           | 521   | 107    | 102      | 251  | 1716  |

- Note that 239 of the 521 respondents from the
South (46%) say that the Bible is the word of
God,
- but only 73 of the 300 respondents in the Northeast
(24%) agree.
- Are these regional differences in response
likely to be due to chance?
- Calculation of
chi-square: Formula in Schmidt, page 340, and in the
Users' Guide, page 67 (both are the same):
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
- Where:
O = Observed frequencies, and E =
Expected frequencies, summed over all cells of the table
- Here is the same table
from SPSS 10 as above, but this time "expected" was
checked in the "options" box.
Crosstabulation: Feeling about the Bible * Region where interview occurred

| Feeling about the Bible |          | Northeast | North Central | South | Border | Mountain | West  | Total |
|-------------------------|----------|-----------|---------------|-------|--------|----------|-------|-------|
| It's THE WORD OF GOD    | Count    | 73        | 141           | 239   | 47     | 30       | 70    | 600   |
|                         | Expected | 104.9     | 152.1         | 182.2 | 37.4   | 35.7     | 87.8  |       |
| Don't take LITERALLY    | Count    | 168       | 243           | 233   | 49     | 49       | 127   | 869   |
|                         | Expected | 151.9     | 220.3         | 263.8 | 54.2   | 51.7     | 127.1 |       |
| IS NOT God's word       | Count    | 59        | 51            | 49    | 11     | 23       | 54    | 247   |
|                         | Expected | 43.2      | 62.6          | 75.0  | 15.4   | 14.7     | 36.1  |       |
| Total                   | Count    | 300       | 435           | 521   | 107    | 102      | 251   | 1716  |

- Here's the chi-square table generated for the table
above (you're responsible ONLY for the chi-square
line)
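
The expected counts and the chi-square value for the crosstabulation can also be reproduced by hand. The following is a minimal sketch in plain Python, assuming the usual Pearson formula (the sum over all cells of (O - E)^2 / E, with E = row total x column total / N); the function names are illustrative and are not SPSS terms.

```python
# Observed counts from the crosstabulation (rows: response, columns: region)
observed = [
    [73, 141, 239, 47, 30, 70],    # It's THE WORD OF GOD
    [168, 243, 233, 49, 49, 127],  # Don't take LITERALLY
    [59, 51, 49, 11, 23, 54],      # IS NOT God's word
]

def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

expected = expected_counts(observed)
print([round(e, 1) for e in expected[0]])  # first row of expected counts
print(round(chi_square(observed), 2))      # chi-square for the whole table
```

The first printed row should match the "Expected" line for "It's THE WORD OF GOD" in the table, and the chi-square should come out at roughly 75.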
- Interpretation of
chi-square
- The magnitude of the
chi-square value must be judged against a table of
values of the chi-square distribution:
- One must enter this
table using the appropriate degrees of freedom:
calculated according to the size of the
table:
- Chi-square degrees of freedom (df)
- = (rows - 1) x (columns - 1)
- = (3 - 1) x (6 - 1)
- = 2 x 5 = 10.
- Given the same
degrees of freedom, the larger the chi-square
value, the more "significant" it is.
- The entries in the
chi-square table for a given chi-square are matched
with "alpha" levels at specified levels of
significance, e.g., .050 or .010 or .001
- A chi-square as large
as 75 for 10 degrees of freedom would be expected by
chance fewer than 1 time in 1000.
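
The claim about a chi-square of 75 with 10 degrees of freedom can be checked without a printed table. This is a sketch under one stated assumption: it uses a closed-form shortcut for the upper-tail probability that holds only when df is even (as it is here). The function names are illustrative.

```python
from math import exp, factorial

def degrees_of_freedom(n_rows, n_cols):
    """df for a crosstabulation = (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_upper_tail_even_df(x, df):
    """P(chi-square >= x) for even df, using the closed form
    exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))

df = degrees_of_freedom(3, 6)                 # 3 responses by 6 regions
p_value = chi2_upper_tail_even_df(75.0, df)   # chance of a value this large
print(df, p_value, p_value < 0.001)
```

For odd degrees of freedom a statistics library or a printed chi-square table would be needed instead.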
Chi-square tests for the "significance" of a
relationship
- The "level of
significance" to be used in making a statistical decision
must be set in advance by the researcher--this is the
alpha value.
- A chi-square value may
be computed for any crosstabulation and checked for
significance by comparing it with values in a chi-square
table.
- Enter the table in
the column for the proper significance
level.
- Enter the table in
the row for the proper "degrees of
freedom".
- If the chi-square
computed for the cross-tabulation is larger
than the value listed in the table, the relationship
is "significant" at the specified "level of
significance."
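
The table lookup described above amounts to a single comparison. Here is a minimal sketch, assuming a small excerpt of standard chi-square critical values for df = 1, 2, and 10 only; the dictionary and function names are hypothetical.

```python
# Excerpt of a chi-square table: critical values by df and alpha level
CRITICAL_VALUES = {
    1: {0.05: 3.841, 0.01: 6.635, 0.001: 10.828},
    2: {0.05: 5.991, 0.01: 9.210, 0.001: 13.816},
    10: {0.05: 18.307, 0.01: 23.209, 0.001: 29.588},
}

def is_significant(chi_square, df, alpha):
    """True when the computed chi-square exceeds the tabled critical value."""
    critical = CRITICAL_VALUES[df][alpha]  # row = df, column = alpha
    return chi_square > critical

print(is_significant(75.0, 10, 0.001))  # a chi-square of 75 with 10 df
print(is_significant(15.0, 10, 0.05))   # a smaller, made-up chi-square
```

The alpha passed to the function stands for the level the researcher fixed in advance, not one chosen after seeing the result.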
Chi-square itself says nothing about the strength of
a relationship, only its significance.
- The magnitude of
chi-square is a function of the number of cases,
N
- Given the same strength
of relationship, increasing the N will increase
chi-square:
- Chi-square and expected cell frequencies less than 5: SPSS
refers to this in the output
- General rule: the
larger the number of cases,
- the easier it is to
achieve significance;
- i.e., to show a departure from
a random pattern
- getting 6 heads in
10 coin flips is likely by chance, but not 60 heads
in 100 flips.
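
Both points, that chi-square grows with N at a fixed strength of relationship and that the coin-flip result depends on the number of flips, can be checked numerically. The sketch below uses a made-up 2x2 table and exact binomial probabilities for a fair coin; the tables and function names are illustrative assumptions.

```python
from math import comb

def chi_square(table):
    """Pearson chi-square: sum of (O - E)^2 / E with E = row * column / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )

def prob_at_least(heads, flips):
    """P(at least `heads` heads in `flips` tosses of a fair coin)."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2**flips

small = [[30, 20], [20, 30]]                            # hypothetical table, N = 100
large = [[cell * 10 for cell in row] for row in small]  # same percentages, N = 1000

print(chi_square(small), chi_square(large))
print(prob_at_least(6, 10), prob_at_least(60, 100))
```

The percentages in the two tables are identical, so the strength of the relationship is the same; only N differs. Likewise, 60% heads is unremarkable in 10 flips but unlikely in 100.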
The search for a measure of association based on
chi-square