Theory, Measurement, and Univariate Statistics
I. Theoretical statements
 - Theoretical statements make assertions about broad social and political processes.
 - They are concerned with the general situation rather than particular occurrences.
 - They involve salient causal factors, but not the only causal factors.
 - Therefore, they are probabilistic rather than deterministic statements.
 - Statistical analysis offers a rigorous methodology for testing such theoretical statements.
 - The first step in the methodology is to formalize the theory:
   - Define the concepts and interrelate them in an abstract propositional statement.
   - Operationalize the abstract concepts by linking them to measurable variables representing the concepts, separated into:
     - the dependent variable: the phenomenon to be explained, and
     - the independent variables: the explanatory factors.
   - Using these variables, restate the proposition in the form of a research hypothesis, preferably a directional hypothesis.
     - Nondirectional hypotheses merely assert that two or more variables are "related."
     - Directional hypotheses specify whether they are positively or negatively related.
   - Formulate a contradictory or null hypothesis for testing.
     - If the research hypothesis is nondirectional, the corresponding null hypothesis requires a two-tailed test.
     - If it is directional, a one-tailed test is more likely to reject the null hypothesis.
II. Measurement in statistical analysis
 - The process of operationalizing abstract concepts by linking them to concrete variables involves some form of measurement.
 - For example, the concept of "party identification" is measured by asking questions of respondents in a sample survey.
   - The result is a seven-point scale of "Republicanness" ranging from 0 for "Strong Democrat" to 6 for "Strong Republican."
 - Because measuring a concept typically involves a series of procedures or operations, the measurement process is known as operationalization.
 - The types of statistical analysis that can be performed depend on the types of measurement used for the dependent and independent variables.
 - There are several approaches to measurement; here are two of the most important views:
 - S. S. Stevens' "levels" of measurement:
   - Nominal: arbitrary numbers pinned to classes, no magnitudes intended.
   - Ordinal: the numbers are orderable in magnitude, but the distances between values are not known.
   - Interval: the numbers are orderable and the distances are known, but there is no zero point.
   - Ratio: the numbers are orderable, the distances are known, and there is a zero point.
 - An overlapping distinction: discrete and continuous variables:
   - Discrete variables can assume only a countable (often small) number of values, which can be orderable or nonorderable.
     - Nonorderable discrete corresponds to "nominal" measurement.
     - Orderable discrete corresponds to "ordinal" measurement.
   - Continuous variables can assume any of an infinite number of values on a number line.
     - Time, which can be measured in infinitely small units, would qualify.
     - Strictly speaking, a variable like income, measured in small but discrete units like cents, would not be considered a "continuous" variable.
     - Practically speaking, variables with many "countable" values (e.g., income) are treated as continuous and sometimes called "quasi-continuous."
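The "levels" can be made concrete as a lookup table. The mapping below is my own illustrative summary of statistics conventionally considered permissible at each of Stevens' levels, not something specified in these notes:

```python
# Illustrative mapping (an assumption for demonstration, not an official list)
# of Stevens' levels to conventionally permissible summary statistics.
PERMISSIBLE = {
    "nominal":  {"mode"},
    "ordinal":  {"mode", "median"},
    "interval": {"mode", "median", "mean", "standard deviation"},
    "ratio":    {"mode", "median", "mean", "standard deviation", "ratios"},
}

def purist_allows(level, statistic):
    # The "purist" consults the level of measurement before choosing a statistic.
    return statistic in PERMISSIBLE[level]

print(purist_allows("ordinal", "mean"))    # False: the purist objects
print(purist_allows("interval", "mean"))   # True
```

The pragmatist's position, discussed next, amounts to collapsing the ordinal/interval boundary in this table.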
 - Relevance of these distinctions to statistical analysis:
   - The "purist" would always respect Stevens' four "levels" of measurement in choosing a statistical test.
   - The "pragmatist" would draw the line of permissible operations between orderable and nonorderable discrete variables.
     - The pragmatist would argue that people regularly treat "ordinal" data as interval anyway, e.g., in computing GPA.
     - Labovitz's research on alternative scoring schemes for ordinal variables shows that, as long as monotonicity is maintained in the scoring schemes, even random numbers pinned on ordered categories produce very high intercorrelations among alternative scoring schemes.
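Labovitz's point can be reproduced in a small simulation. This is a sketch under my own assumptions (1,000 simulated cases, five ordered categories, one random but monotone scoring scheme), not his original study:

```python
import random
from statistics import mean, pstdev

random.seed(1)

def pearson(x, y):
    # Pearson product-moment correlation, computed from population moments.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# 1,000 hypothetical cases in five ordered categories (illustration only).
cases = [random.randint(1, 5) for _ in range(1000)]

# Scheme A: conventional integer scoring of the ordered categories.
scheme_a = {c: c for c in range(1, 6)}

# Scheme B: random numbers, sorted so that monotonicity is preserved.
scheme_b = dict(zip(range(1, 6), sorted(random.uniform(0, 100) for _ in range(5))))

x = [scheme_a[c] for c in cases]
y = [scheme_b[c] for c in cases]

r = pearson(x, y)
print(r > 0.7)  # the two scorings correlate very highly
```

Because the random scores are sorted before assignment, monotonicity holds, and the correlation between the two scorings is typically very high.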
 - Much of what passes for "interval" data in statistical analysis is, in fact, "ordinal," due to the presence of substantial measurement error, e.g., city population, military spending, percent Hispanic.
   - In 1980, Chicago had 3,005,000 people; New York had 7,072,000.
   - In fact, these figures are not exact, and the actual populations may be off by tens of thousands of people.
   - So the true "interval" between the populations of Chicago and New York is not exactly 4,067,000 but unknown.
   - Like a number pinned on an ordinal category, the population interval is only known to be somewhere around 4 million.
 - Moreover, a great deal of professional research treats ordinal data as interval in practice.
 - There is a body of "log-linear" techniques of data analysis that take an entirely different approach to this problem, but they are beyond the scope of this course (see David Knoke and Peter J. Burke, Log-Linear Models (Sage Publications, 1980)).
III. Univariate statistics
 - SPSS procedures for computing univariate statistics,
   - available under the Analyze menu, then under "Descriptive Statistics":
   - Frequencies, for discrete variables:
     - Gives counts and percents for each value of the variable.
     - A "Graph" option offers three types of graphs:
       - Bar charts, for discrete variables.
       - Pie charts, also for discrete variables.
       - Histograms, for continuous variables.
     - A "Statistics" option offers various summary statistics:
       - Three measures of central tendency: mean, median, and mode.
       - Three measures of dispersion: standard deviation, variance, and range.
     - You can suppress the frequency table for continuous or "quasi-continuous" variables, or choose
   - Descriptives, for continuous variables.
 - Summarizing distributions of single variables:
   - Measures of central tendency:
     - Mode: the most frequent value; applies to all types of data.
     - Median: the value that divides the cases into two halves; applies to all but nominal data.
     - Mean: the sum of all values divided by the number of cases; strictly speaking, it applies only to interval and ratio data, but in practice it is used for discrete orderable data also.
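A quick sketch of the three measures, using hypothetical scores on the seven-point party-identification scale described earlier (the particular values are invented for illustration):

```python
from statistics import mean, median, mode

# Hypothetical 7-point party-identification scores
# (0 = Strong Democrat, 6 = Strong Republican).
scores = [0, 1, 1, 2, 3, 3, 3, 4, 5, 6]

print(mode(scores))    # 3    (most frequent value)
print(median(scores))  # 3.0  (middle of the ordered cases)
print(mean(scores))    # 2.8  (sum of values / number of cases)
```

Note that using the mean here already treats this ordinal scale as interval, which is exactly the pragmatist's practice discussed above.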
   - Measures of dispersion:
     - Measures for nominal data:
       - "Variation" in nominal variables is usually determined by inspecting the frequency distribution.
       - Summary measures of variation are typically based on the proportion of cases in the mode.
       - One such measure is the variation ratio.
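A minimal sketch of the variation ratio, assuming the usual definition (the proportion of cases outside the modal category); the region data are invented for illustration:

```python
from collections import Counter

# Hypothetical nominal data: region of residence (illustration only).
regions = ["South", "South", "Midwest", "West", "South", "Northeast", "Midwest"]

counts = Counter(regions)
n = len(regions)
f_mode = max(counts.values())  # frequency of the modal category ("South": 3)

variation_ratio = 1 - f_mode / n  # proportion of cases outside the mode
print(round(variation_ratio, 3))  # 0.571
```

The larger the share of cases outside the mode, the closer the ratio is to 1, i.e., the more "spread out" the nominal distribution.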
     - Easy-to-understand measures for continuous variables:
       - Range
       - (Semi-)interquartile range
       - Average (mean) deviation
     - Harder-to-understand measures, but the most important ones:
       - Sums of squares
       - Variance
       - Standard deviation
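The three "harder" measures build on one another: the sum of squared deviations from the mean, divided by the number of cases, gives the variance, and its square root is the standard deviation. A sketch with invented values (note SPSS's Descriptives divides by n - 1 for the sample variance; the population version is used here for simplicity):

```python
from math import sqrt

# Hypothetical interval-level values (illustration only).
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)
x_bar = sum(x) / n  # mean = 5.0

sum_of_squares = sum((v - x_bar) ** 2 for v in x)  # total squared deviation
variance = sum_of_squares / n                      # population variance
std_dev = sqrt(variance)                           # back in the original units

print(sum_of_squares, variance, std_dev)  # 32.0 4.0 2.0
```

The standard deviation is preferred for interpretation because, unlike the variance, it is expressed in the variable's original units.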
 - Standardizing the dispersion among variables: subtracting the mean from each raw value and dividing by the standard deviation produces z-scores:
   - Computation: z = (x - mean) / standard deviation
   - Properties of z-scores: they
     - have a mean of 0, and a
     - standard deviation of 1 (therefore a variance of 1).
IV. Shapes of univariate distributions
 - The normal distribution:
   - Bell-shaped.
   - Properties: contains known percentages of cases within specified standard deviations of the mean:
     - ± 1 s.d. embraces _____ % of the cases
     - ± 2 s.d. embraces _____ % of the cases
     - ± 3 s.d. embraces _____ % of the cases
   - These percentages are reported in tables of areas under the normal curve, and they are used in testing observed statistics against expectations under the null hypothesis.
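Instead of looking the areas up in a table, they can be computed directly: for a normal distribution, the proportion of cases within ± k standard deviations of the mean is erf(k / √2). A sketch using only the standard library:

```python
from math import erf, sqrt

def within(k):
    # Proportion of a normal distribution within +/- k standard
    # deviations of the mean: P(|Z| < k) = erf(k / sqrt(2)).
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within(k), 1))
# prints:
# 1 68.3
# 2 95.4
# 3 99.7
```

These are the values that belong in the blanks above.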
 - Non-normal distributions:
   - Asymmetrical distributions are skewed.
     - Defined as positively or negatively skewed by the location of the tail.
     - Measured by the "third moment" of deviation from the mean.
   - Symmetrical distributions that are too flat (platykurtic) or too peaked (leptokurtic) are measured by kurtosis.
     - Measured by the "fourth moment" of deviation from the mean.
   - Both skewness and kurtosis can be computed by Frequencies and are expressed as deviations from 0.
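A sketch of the moment idea (using the simple population moments; SPSS applies small-sample corrections, so its figures will differ slightly). The skewness statistic standardizes the third moment, and excess kurtosis standardizes the fourth moment and subtracts 3 so that a normal curve scores 0:

```python
def moment(x, k):
    # k-th moment of deviation from the mean.
    m = sum(x) / len(x)
    return sum((v - m) ** k for v in x) / len(x)

def skewness(x):
    # Third standardized moment: 0 for a symmetric distribution,
    # positive when the long tail is on the right.
    return moment(x, 3) / moment(x, 2) ** 1.5

def kurtosis_excess(x):
    # Fourth standardized moment minus 3 (so the normal curve scores 0);
    # negative = platykurtic (flat), positive = leptokurtic (peaked).
    return moment(x, 4) / moment(x, 2) ** 2 - 3

print(skewness([1, 2, 3, 4, 5]))       # 0.0: symmetric
print(skewness([1, 1, 2, 2, 3, 10]) > 0)  # True: long right tail
```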
 - Some types of nonlinear transformations help "normalize" a non-normal distribution:
   - Squaring X helps normalize a distribution skewed to the left.
   - Taking the square root helps normalize a distribution skewed to the right.
   - Taking the logarithm of X also helps normalize a right-skewed distribution.
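The effect of the log transformation can be checked with the moment-based skewness measure from the previous section; the income-like values below are invented for illustration:

```python
from math import log

def skewness(x):
    # Third standardized moment (population version).
    m = sum(x) / len(x)
    m2 = sum((v - m) ** 2 for v in x) / len(x)
    m3 = sum((v - m) ** 3 for v in x) / len(x)
    return m3 / m2 ** 1.5

incomes = [20, 25, 30, 35, 40, 60, 250]  # strongly right-skewed (one long tail)
logged = [log(v) for v in incomes]

print(skewness(incomes) > skewness(logged))  # True: the log pulls in the tail
```

The logarithm compresses large values more than small ones, which is why it (like the square root) tames a long right tail.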
