Observing relationships and allowing for chance
(review)
 Two types of
"chance" to be considered
 sampling
error (which has served as the model for
discussing "significance" so far)
 Occurs when making
inferences from samples to populations
 Typically involved
when dealing with survey data, such as the
vote00 data set.
 The question here:
does the population reveal the same
relationship between variables as shown in the
sample?
 Systematic or
random causal processes for nonsample
data
 Suppose you are
not using a sample but the entire population, e.g.,
all the states or all members of the U.S.
Senate
 Is an observed
relationship (e.g., between religion and region)
reflective of some systematic causation, or is it
merely the result of random causal
factors?
 Testing for "statistical
significance" regardless of the type of chance
considered
 Tests for statistical
significance are made against a null hypothesis
that asserts that there is NO systematic relationship
between the variables.
 The technique of
testing against a null hypothesis is a clever foil
to get around the problem that statistics can never
prove cause; they can only disprove
it.
 If the null
hypothesis of no causation is rejected, then some
support is lent to the contrary hypothesis of
causation.
 Type I and Type
II errors
 Type I and II
errors cause even trained statisticians to stop and
think; the ideas are not intuitively
clear
 Both types of
errors refer to probabilities of making wrong
decisions:
 rejecting a
null hypothesis when it is in fact
true
 accepting a
null hypothesis when it is in fact
false
 Type I error
 Rejecting a
true null hypothesis
 Guarded against by
demanding a highly significant
result
 Controlled by
setting an extremely small rejection "area" in the
sampling distribution of the statistic.
 This rejection
area is called "alpha" and is a probability
value
 Type II error
 Accepting a
false null hypothesis
 Can't easily be
guarded against
 Is called a "beta"
error
 Is inversely
related to the alpha error
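The trade-off between the two error types can be illustrated with a small calculation. The sketch below is an illustration only, not part of the course materials: it imagines testing whether a coin is fair (null hypothesis: probability of heads = 0.5) with 100 flips, rejecting the null when the number of heads reaches a cutoff. To compute beta it has to assume some specific alternative, here a true probability of heads of 0.6. That figure, the cutoffs, and the function names are all hypothetical.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def error_rates(n, cutoff, p_null=0.5, p_alt=0.6):
    """Return (alpha, beta) for the rule 'reject the null if heads >= cutoff'.

    alpha: chance of rejecting a true null (Type I error).
    beta:  chance of accepting a false null (Type II error),
           assuming the true probability of heads is p_alt (hypothetical).
    """
    alpha = binom_tail(n, cutoff, p_null)
    beta = 1 - binom_tail(n, cutoff, p_alt)
    return alpha, beta

# A stricter cutoff shrinks the rejection area (alpha) but raises beta.
for cutoff in (56, 59, 62, 65):
    alpha, beta = error_rates(100, cutoff)
    print(f"reject if heads >= {cutoff}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Reading down the printed rows, alpha falls as the cutoff rises while beta climbs, which is the inverse relation noted above.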
Testing statistical significance using
chi-square
 Chi-square tests
for statistical independence between two variables
 The variables may be
nominal level or higher, but the test is best suited for
discrete variables with a limited number of
categories.
 Chi-square is
sensitive to any departure from a chance
relationship
 Strictly speaking,
chi-square is only a test of significance, not a
measure of a relationship between variables
 I'll discuss this
more later
 Consider the
relationship between two discrete variables from the 2000
National Election Study
 Feelings about the
Bible is the dependent variable
 Region of the country
is the independent variable
Crosstabulation: Feeling about the Bible * Region where interview occurred

                         Region where interview occurred
Feeling about the Bible  Northeast  North Central  South  Border  Mountain   West  Total
It's THE WORD OF GOD            73            141    239      47        30     70    600
Don't take LITERALLY           168            243    233      49        49    127    869
IS NOT God's word               59             51     49      11        23     54    247
Total                          300            435    521     107       102    251   1716

 Note that 239 of the 521 respondents from the
South (46%) say that the Bible is the word of
God,
 but only 73 of 300 respondents in the Northeast
(24%) agree.
 Are these differences in response across regions
likely to be due to chance?
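The percentages quoted above are column percentages: each cell count divided by its region's total. A minimal sketch of that arithmetic, using the counts from the crosstabulation (the variable and function names are illustrative, not SPSS output):

```python
regions = ["Northeast", "North Central", "South", "Border", "Mountain", "West"]
observed = {
    "It's THE WORD OF GOD": [73, 141, 239, 47, 30, 70],
    "Don't take LITERALLY": [168, 243, 233, 49, 49, 127],
    "IS NOT God's word":    [59, 51, 49, 11, 23, 54],
}

def column_percentages(table):
    """Percentage of each column (region) falling in each row (response)."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        label: [100 * row[j] / col_totals[j] for j in range(n_cols)]
        for label, row in table.items()
    }

pct = column_percentages(observed)
for region, value in zip(regions, pct["It's THE WORD OF GOD"]):
    print(f"{region:14s} {value:5.1f}% say the Bible is the word of God")
```

Percentaging within each region (the independent variable) is what makes the regions comparable despite their different sample sizes.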
 Calculation of
chi-square: Formula in Schmidt, page 340, and in the
Users' Guide, page 67 (both are the same):
 X^{2} = \sum [ (O - E)^{2} / E ], summed over all cells of the table
 Where:
O = Observed frequencies, and E =
Expected frequencies
 Here is the same table
from SPSS 10 as above, but this time "expected" was
checked in the "options" box.
Crosstabulation: Feeling about the Bible * Region where interview occurred

                                   Region where interview occurred
Feeling about the Bible            Northeast  North Central  South  Border  Mountain   West  Total
It's THE WORD OF GOD     Count            73            141    239      47        30     70    600
                         Expected      104.9          152.1  182.2    37.4      35.7   87.8
Don't take LITERALLY     Count           168            243    233      49        49    127    869
                         Expected      151.9          220.3  263.8    54.2      51.7  127.1
IS NOT God's word        Count            59             51     49      11        23     54    247
                         Expected       43.2           62.6   75.0    15.4      14.7   36.1
Total                    Count           300            435    521     107       102    251   1716

 Here's the chi-square table generated for the table
above (you're responsible ONLY for the chi-square
line)
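The chi-square value can also be reproduced by hand from the counts. The sketch below assumes the usual rule that a cell's expected frequency under independence is (row total x column total) / N, which agrees with the "Expected" rows in the table above, and then sums (O - E)^2 / E over all 18 cells. The function names are illustrative.

```python
observed = [
    [73, 141, 239, 47, 30, 70],    # It's THE WORD OF GOD
    [168, 243, 233, 49, 49, 127],  # Don't take LITERALLY
    [59, 51, 49, 11, 23, 54],      # IS NOT God's word
]

def expected_frequencies(obs):
    """Expected count per cell if rows and columns were independent."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(obs):
    """Sum of (O - E)^2 / E over every cell of the table."""
    exp = expected_frequencies(obs)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )

expected = expected_frequencies(observed)
print([round(e, 1) for e in expected[0]])  # expected counts, first row
print(round(chi_square(observed), 1))      # chi-square for the whole table
```

The result comes out at roughly 75, the figure used in the interpretation that follows. The largest single contributions come from the South and Northeast columns, where observed and expected counts differ most.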
 Interpretation of
chi-square
 The magnitude of the
chi-square value must be judged against a table of
values of the chi-square distribution:
 One must enter this
table using the appropriate degrees of freedom,
calculated according to the size of the
table:
 X^{2}
degrees of freedom = df
 = (rows - 1)
(columns - 1)
 = (3 - 1) (6 - 1)
 = 2 x 5 =
10.
 Given the same
degrees of freedom, the larger the chi-square
value, the more "significant" it is.
 The entries in the
chi-square table are critical values, each matched
with an "alpha" level of
significance, e.g., .050 or .010 or .001
 A chi-square as large
as 75 for 10 degrees of freedom would be expected by
chance fewer than 1 time in 1000.
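The table lookup can be cross-checked numerically. The sketch below computes the degrees of freedom from the table's dimensions, then the probability of a chi-square at least as large as 75. It relies on a closed-form expression for the upper-tail probability that holds only when the degrees of freedom are even (true here, df = 10). That shortcut and the function names are assumptions of this example; a statistics package would normally do this work.

```python
from math import exp

def degrees_of_freedom(n_rows, n_cols):
    """df = (rows - 1)(columns - 1) for a crosstabulation."""
    return (n_rows - 1) * (n_cols - 1)

def chi_square_tail(x, df):
    """P(chi-square >= x) for EVEN df only.

    Uses the identity exp(-x/2) * sum over i < df/2 of (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term            # add (x/2)^i / i!
        term *= half / (i + 1)   # advance to the next term
    return exp(-half) * total

df = degrees_of_freedom(3, 6)
p = chi_square_tail(75, df)
print(df, p, p < 0.001)
```

The probability is far below .001, consistent with the "fewer than 1 time in 1000" reading from the printed table.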
Chi-square tests for the "significance" of a
relationship
 The "level of
significance" to be used in making a statistical decision
must be set in advance by the researcher; this is the
alpha value.
 A chi-square value may
be computed for any crosstabulation and checked for
significance by comparing it with values in a chi-square
table.
 Enter the table in
the column for the proper significance
level.
 Enter the table in
the row for the proper "degrees of
freedom".
 If the chi-square
computed for the crosstabulation is larger
than the value listed in the table, the relationship
is "significant" at the specified "level of
significance."
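The lookup procedure amounts to a comparison against a critical value. A minimal sketch, assuming a small excerpt of a standard chi-square table (only df = 1, 2, and 10 are included, and the dictionary and function names are hypothetical):

```python
# Excerpt of a chi-square table: CRITICAL[df][alpha] -> critical value.
CRITICAL = {
    1:  {0.050: 3.841,  0.010: 6.635,  0.001: 10.828},
    2:  {0.050: 5.991,  0.010: 9.210,  0.001: 13.816},
    10: {0.050: 18.307, 0.010: 23.209, 0.001: 29.588},
}

def is_significant(chi_square, df, alpha):
    """True if the computed chi-square exceeds the tabled critical value."""
    try:
        critical = CRITICAL[df][alpha]  # row = df, column = alpha
    except KeyError:
        raise ValueError("df or alpha not in this table excerpt") from None
    return chi_square > critical

# The Bible-by-region table: chi-square of about 75 with 10 df.
print(is_significant(75, df=10, alpha=0.001))
# A made-up smaller value, for contrast only.
print(is_significant(15, df=10, alpha=0.050))
```

Because alpha is an argument chosen before the comparison, the code mirrors the requirement that the level of significance be set in advance rather than picked after seeing the result.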
Chi-square itself says nothing about the strength of
a relationship, only its significance.
 The magnitude of
chi-square is a function of the number of cases,
N
 Given the same strength
of relationship, increasing the N will increase
chi-square:
 X^{2}
and expected cell frequencies less than 5: SPSS
refers to this in the output
 General rule: the
larger the number of cases,
 the easier it is to
achieve significance;
 i.e., to detect a departure from
a random pattern
 getting 6 heads in
10 coin flips is likely by chance, but not 60 heads
in 100 flips.
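The coin-flip comparison can be checked with a little arithmetic. The sketch below rests on two assumptions not stated in these notes: it treats heads and tails as a one-way table and applies the same (O - E)^2 / E sum to it (a goodness-of-fit use of chi-square, rather than the crosstab test above), and it computes the exact binomial probability of getting at least that many heads from a fair coin. The function names are hypothetical.

```python
from math import comb

def chi_square_fair_coin(heads, flips):
    """Sum of (O - E)^2 / E over heads and tails, with E = flips / 2 each."""
    expected = flips / 2
    tails = flips - heads
    return ((heads - expected) ** 2 + (tails - expected) ** 2) / expected

def prob_at_least(heads, flips):
    """Exact probability of 'heads' or more heads in 'flips' fair flips."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

for heads, flips in [(6, 10), (60, 100)]:
    x2 = chi_square_fair_coin(heads, flips)
    p = prob_at_least(heads, flips)
    print(f"{heads}/{flips} heads: chi-square = {x2:.1f}, P(at least {heads}) = {p:.3f}")
```

Both samples show 60% heads, so the strength of the pattern is identical, yet chi-square is ten times larger with ten times as many flips. With 1 degree of freedom, only the larger sample exceeds the .050 critical value of 3.841.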
The search for a measure of association based on
chi-square
