Theory, Measurement, and Univariate Statistics
I. Theoretical statements make assertions about broad social and political processes.
- They are concerned with the general situation rather than particular occurrences.
- They involve salient causal factors, but not the only causal factors.
- Therefore, they are probabilistic rather than deterministic statements.
- Statistical analysis offers a rigorous methodology for testing such theoretical statements.
- The first step in the methodology is to formalize the theory.
  - Define the concepts and interrelate them in an abstract propositional statement.
  - Operationalize the abstract concepts by linking them to measurable variables representing the concepts, separated into the
    - Dependent variable: the phenomenon to be explained, and the
    - Independent variables: the explanatory factors.
  - Using these variables, restate the proposition in the form of a research hypothesis, preferably a directional hypothesis.
    - Nondirectional hypotheses merely assert that two or more variables are "related."
    - Directional hypotheses specify whether they are positively or negatively related.
  - Formulate a contradictory or null hypothesis for testing.
    - If the research hypothesis is nondirectional, the corresponding null hypothesis requires a two-tailed test.
    - If it is directional, a one-tailed test is more likely to reject the null hypothesis.
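For illustration, here is a minimal Python sketch (using scipy rather than SPSS, with a made-up test statistic) of why a one-tailed test is more likely to reject the null: the same z statistic yields half the p-value under a directional hypothesis.

    from scipy.stats import norm

    z = 1.80  # hypothetical observed test statistic (standard normal under the null)

    # Two-tailed test: the rejection region is split between both tails.
    p_two_tailed = 2 * (1 - norm.cdf(abs(z)))

    # One-tailed test for a directional hypothesis predicting a positive effect:
    # the entire rejection region sits in one tail, so the p-value is halved.
    p_one_tailed = 1 - norm.cdf(z)

    print(f"two-tailed p = {p_two_tailed:.3f}")  # about .072 -- not significant at .05
    print(f"one-tailed p = {p_one_tailed:.3f}")  # about .036 -- significant at .05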
II. Measurement in statistical analysis:
- The process of operationalizing abstract concepts by linking them to concrete variables involves some form of measurement.
  - For example, the concept of "party identification" is measured by asking questions of respondents in a sample survey.
  - The result is a seven-point scale of "Republicanness" ranging from 0 for "Strong Democrat" to 6 for "Strong Republican."
  - Because measuring a concept typically involves a series of procedures or operations, the measurement process is known as operationalization.
- The types of statistical analysis that can be performed depend on the types of measurement used for the dependent and independent variables.
- There are several approaches to measurement; here are two of the most important views:
  - S.S. Stevens' "levels" of measurement:
    - Nominal: arbitrary numbers pinned to classes, with no magnitudes intended
    - Ordinal: the numbers are orderable in magnitude, but the distances between values are not known
    - Interval: the numbers are orderable and the distances are known, but there is no true zero point
    - Ratio: the numbers are orderable, the distances are known, and there is a true zero point
  - An overlapping distinction: discrete and continuous variables:
    - Discrete variables can assume only a countable (often small) number of values, which can be orderable or nonorderable.
      - Nonorderable discrete corresponds to "nominal" measurement.
      - Orderable discrete corresponds to "ordinal" measurement.
    - Continuous variables can assume any of an infinite number of values on a number line.
      - Time, which can be measured in infinitely small units, would qualify.
      - Strictly speaking, a variable like income, measured in small but discrete units like cents, would not be considered a "continuous" variable.
      - Practically speaking, variables with many "countable" units (e.g., income) are treated as continuous and are sometimes called "quasi-continuous."
- Relevance of these distinctions to statistical analysis:
  - The "purist" would always respect Stevens' four "levels" of measurement in choosing a statistical test.
  - The "pragmatist" would draw the line of permissible operations between orderable and nonorderable discrete variables.
    - The pragmatist would argue that people regularly treat "ordinal" data as interval anyway -- e.g., in computing GPA.
    - Labovitz's research on alternative scoring schemes for ordinal variables shows that -- as long as monotonicity is maintained in the scoring schemes -- even random numbers pinned on ordered categories produce very high intercorrelations among the alternative scoring schemes.
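A rough Python simulation in the spirit of Labovitz's finding (the data and scoring schemes below are invented): several monotone scorings of the same ordered categories, including a randomly generated one, correlate almost perfectly.

    import numpy as np

    rng = np.random.default_rng(0)
    cats = rng.integers(0, 5, size=1000)  # hypothetical ordinal variable, five ordered categories

    # Alternative scoring schemes; each is monotone (higher category -> higher score).
    equal = np.array([0, 1, 2, 3, 4])
    lumpy = np.array([1, 2, 5, 9, 20])
    random_monotone = np.sort(rng.uniform(0, 100, size=5))

    scored = [scheme[cats] for scheme in (equal, lumpy, random_monotone)]

    # The Pearson intercorrelations among the alternative scorings are typically above .95.
    print(np.corrcoef(scored))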
  - Much of what passes for "interval" data in statistical analysis is, in fact, "ordinal," due to the presence of substantial measurement error -- e.g., city population, military spending, percent Hispanic.
    - In 1980, Chicago had 3,005,000 people; New York had 7,072,000.
    - In fact, these figures are not exact, and the actual populations may be off by tens of thousands of people.
    - So the true "interval" between the populations of Chicago and New York is not 4,067,000 but unknown.
    - Like a number pinned on an ordinal category, the population interval is somewhere around 4 million.
  - Moreover, a great deal of professional research treats ordinal data as interval in practice.
  - There is a body of "log-linear" techniques of data analysis that take an entirely different approach to this problem, but they are beyond the scope of this course (see David Knoke and Peter J. Burke, Log-Linear Models (Sage Publications, 1980)).
III. Univariate statistics
- SPSS Procedures for computing univariate statistics:
  - Available under the Analyze menu, then under "Descriptive Statistics"
  - Frequencies for discrete variables
    - Gives counts and percents for each value of a variable.
    - A "Graph" option offers three types of graphs:
      - Bar charts for discrete variables
      - Pie charts, also for discrete variables
      - Histograms for continuous variables
    - A "Statistics" option offers various summary statistics:
      - Three measures of central tendency: mean, median, and mode
      - Three measures of dispersion: standard deviation, variance, and range
    - You can suppress the frequency table for continuous or "quasi-continuous" variables, or choose
  - Descriptives for continuous variables
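As a rough analogue (in Python/pandas, not SPSS; the data frame and column names are made up), Frequencies-style and Descriptives-style output can be produced like this:

    import pandas as pd

    # Hypothetical data: a discrete 0-6 party identification scale and quasi-continuous income.
    df = pd.DataFrame({"party_id": [0, 2, 6, 3, 3, 5, 1, 6, 4, 0],
                       "income": [21000, 35000, 80000, 44000, 52000,
                                  61000, 18000, 95000, 40000, 27000]})

    # Frequencies: counts and percents for each value of a discrete variable.
    counts = df["party_id"].value_counts().sort_index()
    print(pd.DataFrame({"count": counts, "percent": 100 * counts / len(df)}))

    # Descriptives: mean, standard deviation, minimum, maximum, etc. for a continuous variable.
    print(df["income"].describe())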
- Summarizing distributions of single variables
  - Measures of central tendency:
    - Mode: the most frequent value; applies to all types of data
    - Median: the value that divides the cases into two halves; applies to all but nominal data
    - Mean: the sum of all values divided by the number of cases; strictly speaking, it applies only to interval and ratio data, but in practice it is used for discrete orderable data also
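A quick sketch of the three measures on a small set of made-up values:

    from statistics import mean, median, mode

    x = [1, 2, 2, 3, 7]                 # hypothetical orderable values
    print(mode(x), median(x), mean(x))  # mode = 2, median = 2, mean = 15/5 = 3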
  - Measures of dispersion:
    - Measures for nominal data:
      - "Variation" in nominal variables is usually determined by inspecting the frequency distribution.
      - Summary measures of variation are typically based on the proportion of cases in the mode.
      - One such measure is the variation ratio.
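The variation ratio is simply 1 minus the proportion of cases in the modal category; a minimal sketch with hypothetical nominal data:

    from collections import Counter

    # Hypothetical nominal variable: region of residence for 20 respondents.
    region = ["South"] * 9 + ["West"] * 5 + ["Northeast"] * 3 + ["Midwest"] * 3

    counts = Counter(region)
    modal_share = max(counts.values()) / len(region)
    print(1 - modal_share)  # variation ratio = 1 - 9/20 = 0.55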
    - Easy-to-understand measures for continuous variables:
      - Range
      - (Semi-)interquartile range
      - Average (mean) deviation
    - Harder-to-understand measures, but the most important ones:
      - Sums of squares
      - Variance
      - Standard deviation
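A sketch of the computational chain from sums of squares to variance to standard deviation (the values are hypothetical; the sample formula with n - 1 matches what SPSS reports):

    import math

    x = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical interval-level values
    n = len(x)
    xbar = sum(x) / n                                 # mean = 6

    data_range = max(x) - min(x)                      # range = 6
    mean_dev = sum(abs(v - xbar) for v in x) / n      # average (mean) deviation = 1.5
    ss = sum((v - xbar) ** 2 for v in x)              # sum of squared deviations = 28
    variance = ss / (n - 1)                           # sample variance = 4
    std_dev = math.sqrt(variance)                     # standard deviation = 2

    print(data_range, mean_dev, ss, variance, std_dev)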
  - Standardizing the dispersion among variables: expressing raw values in terms of their means and standard deviations to produce z-scores:
    - Computation: z = (X - X̄) / s, i.e., subtract the mean from each raw value and divide by the standard deviation.
    - Properties of z-scores: they have a mean of 0 and a standard deviation of 1 (and therefore a variance of 1).
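A minimal sketch of z-scoring and its two properties, using made-up values:

    import statistics

    x = [4, 8, 6, 5, 3, 7, 9, 6]      # hypothetical raw values
    xbar = statistics.mean(x)
    s = statistics.stdev(x)           # sample standard deviation

    z = [(v - xbar) / s for v in x]   # subtract the mean, divide by the standard deviation

    print(statistics.mean(z))         # 0.0 -> z-scores have a mean of 0
    print(statistics.stdev(z))        # 1.0 -> and a standard deviation (and variance) of 1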
IV. Shapes of univariate distributions
- The normal distribution
  - Bell-shaped
  - Properties: contains a known % of cases within specified standard deviations of the mean:
    ± 1 s.d. embraces _____ % of the cases
    ± 2 s.d. embraces _____ % of the cases
    ± 3 s.d. embraces _____ % of the cases
  - These percentages are reported in tables of areas under the normal curve, and they are used in testing observed statistics against expectations under the null hypothesis.
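The same percentages can be read from a normal table or computed directly; a short scipy sketch (Python, not SPSS):

    from scipy.stats import norm

    for k in (1, 2, 3):
        # Area under the standard normal curve between -k and +k standard deviations.
        area = norm.cdf(k) - norm.cdf(-k)
        print(f"within +/- {k} s.d.: {100 * area:.1f}% of the cases")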
- Non-normal distributions:
  - Asymmetrical distributions are skewed.
    - Defined as positively or negatively skewed by the location of the tail.
    - Measured by the "third moment" of deviation from the mean.
  - Symmetrical distributions that are too flat (platykurtic) or too peaked (leptokurtic) are measured by kurtosis.
    - Measured by the "fourth moment" of deviation from the mean.
  - Both skewness and kurtosis can be computed by Frequencies and are expressed as deviations from 0.
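A brief scipy sketch of both statistics on a made-up right-skewed variable (scipy's small-sample corrections differ slightly from SPSS, but both express kurtosis as a deviation from 0):

    import numpy as np
    from scipy.stats import skew, kurtosis

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=1.0, size=5000)  # hypothetical right-skewed variable

    print(skew(x))      # positive -> long right tail (third moment)
    print(kurtosis(x))  # excess kurtosis, expressed as a deviation from 0 (fourth moment)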
  - Some types of nonlinear transformations help "normalize" a non-normal distribution.
    - Squaring X helps normalize a distribution skewed to the left.
    - Taking the square root of X helps normalize a distribution skewed to the right.
    - Taking the logarithm of X also helps normalize a right-skewed distribution.
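A final sketch (with simulated, hypothetical data) showing the square-root and log transformations pulling a right-skewed variable toward symmetry:

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(2)
    x = rng.lognormal(mean=10, sigma=1.0, size=5000)  # hypothetical right-skewed incomes

    print(skew(x))           # strongly positive
    print(skew(np.sqrt(x)))  # smaller: the square root reduces right skew
    print(skew(np.log(x)))   # near 0: the log roughly normalizes this particular variable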