Bivariate Distributions: Strength, Form, Significance
Statistical methods to use when both variables are DISCRETE and NONORDERABLE
- Suitable SPSS procedure, under the Analyze Menu and then Descriptive Statistics, is crosstabs
- The "Cells" button in
crosstabs computes percentages for cell entries
based on three different totals:
- by Column
-- appropriate when the independent variable is in
the columns--the usual case
- by Row --
appropriate when the row variable is treated as the
independent variable
- by Total of
all the cases in the table--used only under special
circumstances of analysis
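A minimal sketch of the same three percentage bases in Python rather than SPSS, using pandas crosstab on a small made-up dataset (the variables "gender" and "vote" are hypothetical):

```python
# Reproducing crosstabs cell percentages with pandas on toy data.
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "F", "M", "F", "M"],
    "vote":   ["yes", "no", "yes", "yes", "no", "no", "yes", "yes"],
})

# Column percentages: each column sums to 100% (the usual case,
# with the independent variable in the columns).
print(pd.crosstab(df["vote"], df["gender"], normalize="columns") * 100)

# Row percentages: each row sums to 100%.
print(pd.crosstab(df["vote"], df["gender"], normalize="index") * 100)

# Total percentages: all cells sum to 100%.
print(pd.crosstab(df["vote"], df["gender"], normalize="all") * 100)
```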
- The "Statistics"
button in crosstabs leads to boxes for several
types of statistics:
- chi-square
for tests of independence between the variables in
the table
- familiar Pearson
bivariate correlations between the
variables
  - Measures of association for nominal data
    - contingency coefficient
    - Phi and Cramer's V
    - Lambda
    - Uncertainty coefficient (which we did not cover)
  - Measures of association for ordinal data
    - Gamma
    - Somers' d
    - Kendall's Tau-b
    - Kendall's Tau-c
  - A measure of association for interval data by nominal classification: eta
  - Other assorted statistics that we didn't cover.
- Strength of the relationship can be measured by these measures, none of which we really studied (a computational sketch follows this list).
  - Lambda, which offers the best PRE interpretation for predicting the dependent variable from knowledge of the independent
  - Contingency Coefficient, C, which is based on chi-square but has no operational interpretation, cannot reach 1.0, and cannot be compared across tables of different size
  - Cramer's V, another chi-square measure, which can be compared across tables of different size and ranges between 0 and 1.0 but has no PRE interpretation
  - Phi, yet another chi-square based measure that does range between 0 and 1.0 but is suitable only for 2x2 tables
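A minimal sketch of the chi-square-based strength measures in Python, computed from their standard formulas on a hypothetical 2x2 table (the counts are made up for illustration):

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 10],
                  [15, 25]])
n = table.sum()

chi2, p, dof, expected = chi2_contingency(table, correction=False)

phi = np.sqrt(chi2 / n)            # suitable only for 2x2 tables
C = np.sqrt(chi2 / (chi2 + n))     # contingency coefficient; cannot reach 1.0
k = min(table.shape)               # smaller of (rows, columns)
V = np.sqrt(chi2 / (n * (k - 1)))  # Cramer's V; comparable across table sizes

print(f"chi2 = {chi2:.3f}, phi = {phi:.3f}, C = {C:.3f}, V = {V:.3f}")
```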
- Form of the relationship between two nominal variables can be determined only by inspection of the cell entries to see where cases cluster and where they are absent.
- Significance of the relationship is best determined by
  - Chi-square, χ², which measures the difference between observed and expected cell frequencies. Significance is tested by entering the chi-square table for the appropriate degrees of freedom (sketched below).
  - The significance of lambda is calculated by SPSS, using a complex formula for lambda's standard error.
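A minimal sketch of the chi-square test of independence in Python rather than SPSS, on a hypothetical table of observed counts:

```python
# scipy compares observed to expected frequencies and looks up the
# p-value for the table's degrees of freedom, (rows - 1) * (columns - 1).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 15, 5],
                     [10, 25, 25]])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")
print("expected frequencies:\n", expected.round(2))
```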
Statistics for variables that are CONTINUOUS or DISCRETE ORDERABLE
- Suitable SPSS procedures
  - crosstabs (requesting the Pearson correlation coefficient)
  - correlate
  - scatterplot under the Graph Menu --> Interactive
- Strength can be measured by the Pearson product moment correlation, r, which can be calculated by several formulas (see the sketch after this list).
  - Actually, the strength of a correlation is expressed by r², called the coefficient of determination.
  - It can be interpreted as the proportion of variance in the dependent variable that is "explained" by the independent variable.
  - Of course, correlation does not mean causation, so explanation is assessed theoretically rather than proved.
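A minimal sketch in Python, with made-up x and y values, showing r and the coefficient of determination r²:

```python
from scipy.stats import pearsonr

x = [2, 4, 5, 7, 9, 11, 12, 15]
y = [1, 3, 4, 6, 8, 9, 12, 14]

r, p = pearsonr(x, y)
# r squared: proportion of variance in y "explained" by x.
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, p = {p:.4f}")
```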
- Form can be measured by the regression equation for the raw data (see the sketch after this list).
  - If the data are in standardized form, the b coefficient becomes a standardized beta coefficient and is equal to the correlation coefficient.
  - The intercept then becomes zero and drops out of the equation.
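A minimal sketch with made-up data showing that standardizing both variables (z-scores) makes the slope equal to r and drops the intercept to zero:

```python
import numpy as np
from scipy.stats import linregress, zscore

x = np.array([2, 4, 5, 7, 9, 11, 12, 15], dtype=float)
y = np.array([1, 3, 4, 6, 8, 9, 12, 14], dtype=float)

# Regression on the raw data: y = a + bx.
raw = linregress(x, y)
print(f"raw:          y = {raw.intercept:.3f} + {raw.slope:.3f}x, r = {raw.rvalue:.3f}")

# Regression on standardized data: beta equals r, intercept is zero.
std = linregress(zscore(x), zscore(y))
print(f"standardized: beta = {std.slope:.3f}, intercept = {std.intercept:.3f}")
```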
- Significance can be tested easily against the null hypothesis by calculating a t statistic (see the sketch after this list).
  - The test is much more complicated when a non-zero r is hypothesized, but we did not take up that situation.
  - In that case the observed and expected r's must be transformed into something called "Fisher's Z" and a different test used.
  - You do not need to know this procedure.
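A minimal sketch of the t-test for r against the null hypothesis of zero correlation, using the standard formula t = r√(n-2)/√(1-r²); the r and n values are made up for illustration:

```python
from math import sqrt
from scipy.stats import t as t_dist

r, n = 0.45, 30
t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
df = n - 2
p = 2 * t_dist.sf(abs(t_stat), df)  # two-tailed p-value
print(f"t = {t_stat:.3f}, df = {df}, p = {p:.4f}")
```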
Statistical methods used when the dependent variable is CONTINUOUS or ORDERED DISCRETE and the independent variable is NOMINAL
- Suitable SPSS procedures under the Analyze Menu, then Compare Means
  - one-sample t-test
  - means
  - One-Way ANOVA
- Significance can be determined by the
  - t-test, when the nominal variable is a dichotomy and the test becomes one for the difference between two means (see the sketch after this list).
    - The appropriate form of the t-test depends on
      - the independence of the two samples: for independent samples, use the Independent Samples T Test
      - if the cases are "matched" and then tested, use the Paired Samples T Test
      - the "equality" (or "homogeneity") of the variance for the dependent variable in each sample
        - Levene's test for equality of variances directs you to consult
          - either the line for equal variances, which uses a "pooled" variance estimate
          - or the line for unequal variances, which uses a "separate" estimate
          - usually, the two estimates will produce similar results regarding the null hypothesis.
  - F-test, which follows from the analysis of variance as a generalization of the difference-of-means test (t-test) to k groups (also sketched below).
    - In fact, F = t² for 1 df between groups.
    - The F-test is the ratio of the mean sum of squares calculated between groups to that calculated within groups.
    - Put another way, it is the ratio of the mean between SS to the mean within SS.
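A minimal sketch of these tests in Python rather than SPSS, on made-up group scores: Levene's test, the independent-samples t-test with pooled and separate variance estimates, and the one-way ANOVA F-test, including the F = t² check for two groups:

```python
from scipy.stats import levene, ttest_ind, f_oneway

g1 = [4, 5, 6, 6, 7, 8]
g2 = [6, 7, 7, 8, 9, 10]
g3 = [8, 9, 9, 10, 11, 12]

# Levene's test: a small p-value suggests unequal variances.
print("Levene:", levene(g1, g2))

# Equal variances assumed (pooled estimate) vs. not assumed (separate).
print("pooled:  ", ttest_ind(g1, g2, equal_var=True))
print("separate:", ttest_ind(g1, g2, equal_var=False))

# The F-test generalizes the difference-of-means test to k groups;
# for two groups, F equals t squared.
t, p_t = ttest_ind(g1, g2, equal_var=True)
F, p_f = f_oneway(g1, g2)
print(f"t^2 = {t**2:.3f}, F = {F:.3f}")

print("ANOVA over 3 groups:", f_oneway(g1, g2, g3))
```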
- Strength can be measured by eta-squared, which is a measure of the explained variation (between-group SS) divided by the total variation (total SS). Thus it is analogous to the product moment correlation and is indeed equal to it in the special case of a dichotomous variable being correlated with an interval variable (see the sketch below).
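A minimal sketch computing eta-squared directly from its definition, between-group SS over total SS, on made-up group data:

```python
import numpy as np

groups = [np.array([4, 5, 6, 6, 7, 8.0]),
          np.array([6, 7, 7, 8, 9, 10.0]),
          np.array([8, 9, 9, 10, 11, 12.0])]

all_y = np.concatenate(groups)
grand_mean = all_y.mean()

# Total SS: squared deviations of every case from the grand mean.
ss_total = ((all_y - grand_mean) ** 2).sum()
# Between-group SS: squared deviations of group means from the grand mean,
# weighted by group size.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

eta_squared = ss_between / ss_total
print(f"eta-squared = {eta_squared:.3f}")
```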
- Form can be determined only by graphing the means of the dependent variable for each category and seeing which go up and which go down.