Theory, Measurement, and Univariate Statistics

I. Theoretical statements:

• Theoretical statements are concerned with the general situation rather than particular occurrences.
• They involve salient causal factors, but not the only causal factors.
• Therefore, they are probabilistic rather than deterministic statements.
• Statistical analysis offers a rigorous methodology for testing such theoretical statements.
• The first step in the methodology is to formalize the theory.
• Define the concepts and interrelate them in an abstract propositional statement.
• Operationalize the abstract concepts by linking them to measurable variables representing the concepts, separated into
• Dependent variable: the phenomenon to be explained, and the
• Independent variables: the explanatory factors.
• Using these variables, restate the proposition in the form of a research hypothesis, preferably a directional hypothesis.
• Nondirectional hypotheses merely assert that two or more variables are "related."
• Directional hypotheses specify whether they are positively or negatively related.
• Formulate a contradictory or null hypothesis for testing.
• If the research hypothesis is nondirectional, the corresponding null hypothesis requires a two-tailed test.
• If it is directional, a one-tailed test is more likely to reject the null hypothesis.
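The one-tailed versus two-tailed distinction can be sketched numerically. This is a minimal illustration with made-up numbers (a hypothetical survey mean tested against an invented null value), not an example from the course data:

```python
# Sketch of one- vs two-tailed tests of a null hypothesis, using only the
# standard library. The sample mean, null mean, s.d., and n are invented.
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_test(sample_mean, null_mean, sd, n):
    """Return the z statistic and its one- and two-tailed p-values."""
    z = (sample_mean - null_mean) / (sd / math.sqrt(n))
    p_two = 2.0 * (1.0 - normal_cdf(abs(z)))  # nondirectional hypothesis
    p_one = 1.0 - normal_cdf(z)               # directional: mean > null_mean
    return z, p_one, p_two

z, p_one, p_two = z_test(sample_mean=3.4, null_mean=3.0, sd=2.0, n=100)
print(round(z, 2), round(p_one, 4), round(p_two, 4))
```

Because the one-tailed p-value is half the two-tailed p-value for the same z, a directional hypothesis crosses any given significance threshold more easily, which is why it is more likely to reject the null.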

II. Measurement in statistical analysis:

• The process of operationalizing abstract concepts by linking them to concrete variables involves some form of measurement.
• For example, the concept of "party identification" is measured by asking questions of respondents in a sample survey.
• The result is a seven-point scale of "Republicanness" ranging from 0 for "Strong Democrat" to 6 for "Strong Republican."
• Because measuring a concept typically involves a series of procedures or operations, the measurement process is known as operationalization.
• The types of statistical analysis that can be performed depend on the types of measurement used for the dependent and independent variables.
• There are several approaches to measurement; here are two of the most important views:
• S.S. Stevens' "levels" of measurement:
• nominal: arbitrary numbers pinned to classes, no magnitudes intended
• ordinal: the numbers are orderable in magnitude, but the distances between values are not known
• interval: the numbers are orderable and the distances are known, but there is no zero point
• ratio: the numbers are orderable, distances known, there is a zero point
• An overlapping distinction: discrete and continuous variables:
• discrete variables can assume only a countable (often small) number of values, which can be orderable or nonorderable.
• nonorderable discrete corresponds to "nominal" measurement.
• orderable discrete corresponds to "ordinal" measurement.
• continuous variables can assume any of an infinite number of values on a number line.
• Time, which can be measured in infinitely small units, would qualify.
• Strictly speaking, a variable like income, measured in small but discrete units like cents, would not be considered a "continuous" variable.
• Practically speaking, variables with many "countable" units (e.g., income) are treated as continuous and sometimes called "quasi-continuous."
• Relevance of these distinctions to statistical analysis:
• The "purist" would always respect Stevens' four "levels" of measurement in choosing a statistical test.
• The "pragmatist" would draw the line of permissible operations between orderable and nonorderable discrete variables.
• The pragmatist would argue that people regularly treat "ordinal" data as interval anyway -- e.g., in computing GPA.
• Labovitz's research on alternative scoring schemes for ordinal variables shows that
• -- as long as monotonicity is maintained in the scoring schemes
• -- even random numbers pinned on ordered categories produce very high intercorrelations among alternative scoring schemes.
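Labovitz's point can be illustrated with simulated data. This sketch (invented scores, not Labovitz's actual data) compares equal-interval scoring of five ordered categories with an arbitrary but monotone scoring scheme:

```python
# Why the pragmatist is relaxed about scoring ordinal categories: any
# monotone scoring scheme correlates very highly with equal-interval scoring.
# The cases and scoring schemes below are invented for illustration.
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

random.seed(1)
cases = [random.randint(1, 5) for _ in range(500)]  # 5 ordered categories

equal_interval = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
arbitrary_monotone = {1: 3, 2: 17, 3: 18, 4: 40, 5: 41}  # ordered, wildly unequal

r = pearson([equal_interval[c] for c in cases],
            [arbitrary_monotone[c] for c in cases])
print(round(r, 3))  # high despite the unequal spacing
```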
• Much of what passes for "interval" data in statistical analysis is, in fact, "ordinal," due to the presence of substantial measurement error -- e.g., city population, military spending, percent Hispanic.
• In fact, these figures are not exact, and the actual population may be off by tens of thousands of people.
• So, the true "interval" between the populations of Chicago and New York is not exactly 4,067,000 but unknown.
• Like a number pinned on an ordinal category, the population interval is only somewhere around 4 million.
• Moreover, a great deal of professional research treats ordinal data as interval in practice.
• There is a body of "log-linear" techniques of data analysis that take an entirely different approach to this problem, but they are beyond the scope of this course (see David Knoke and Peter J. Burke, Log-Linear Models (Sage Publications, 1980)).

III. Univariate statistics

• SPSS Procedures for computing univariate statistics,
• available under the Analyze Menu, then under "Descriptive Statistics"
• Frequencies for discrete variables
• Gives counts and percents for each value of variable.
• A "Graph" option offers three types of graphs
• Bar charts for discrete variables
• Pie charts also for discrete variables
• Histograms for continuous variables.
• A "statistics" option offers various summary statistics:
• Three measures of central tendency: mean, median, and mode
• Three measures of dispersion: standard deviation, variance, and range
• You can suppress the frequency table for continuous or "quasi-continuous" variables, or choose
• Descriptives for continuous variables
• Summarizing distributions of single variables
• Measures of central tendency:
• Mode: most frequent value, applies to all types of data
• Median: value that divides the cases into two halves, applies to all but nominal data
• Mean: sum of all values divided by the number of cases; strictly speaking, it applies only to interval and ratio data, but in practice it is used for orderable discrete data as well
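The three measures can be computed directly with the standard library. The sample below is an invented set of 0-6 party-identification scores, not course data:

```python
# The three measures of central tendency for an invented sample of
# 0-6 party-identification scores, using the standard library.
from statistics import mean, median, mode

party_id = [0, 0, 1, 2, 2, 2, 3, 4, 5, 6, 6]  # hypothetical respondents

pid_mode = mode(party_id)      # most frequent value
pid_median = median(party_id)  # middle case (6th of 11)
pid_mean = mean(party_id)      # arithmetic average
print(pid_mode, pid_median, pid_mean)
```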
• Measures of dispersion:
• Measures for nominal data:
• "Variation" in nominal variables is usually determined by inspecting the frequency distribution.
• Summary measures of variation are typically based on the proportion of cases in the mode.
• One such measure is the Variation Ratio: v = 1 − (f_mode / N), where f_mode is the number of cases in the modal category
• Easy to understand measures for continuous variables
• Range: highest value minus lowest value
• (Semi-) Interquartile Range: Q = (Q3 − Q1) / 2, half the distance between the third and first quartiles
• Average (mean) deviation: AD = Σ|X − X̄| / N
• Harder to understand measures, but most important ones
• Sums of squares: SS = Σ(X − X̄)²
• Variance: s² = SS / N = Σ(X − X̄)² / N (with N − 1 in the denominator for a sample estimate)
• Standard deviation: s = √(variance)
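These dispersion measures can all be computed in a few lines. The data are invented, and the definitional (population, N-denominator) formulas are used rather than sample (N − 1) estimates:

```python
# The dispersion measures above, computed for an invented sample using the
# definitional (population, N-denominator) formulas.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
xbar = sum(data) / n  # mean = 5.0

value_range = max(data) - min(data)                 # range
s = sorted(data)
q1, q3 = s[n // 4], s[3 * n // 4]                   # rough quartiles, for illustration
siqr = (q3 - q1) / 2                                # semi-interquartile range
mean_dev = sum(abs(x - xbar) for x in data) / n     # average (mean) deviation
ss = sum((x - xbar) ** 2 for x in data)             # sum of squares
variance = ss / n
std_dev = variance ** 0.5
print(value_range, siqr, mean_dev, ss, variance, std_dev)
```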

• Standardizing the dispersion among variables: subtracting the mean from each raw value and dividing by the standard deviation produces z-scores:
• Computation:
z-score = (X − X̄) / s
• Properties of z-scores: they
• have a mean of 0 and a
• standard deviation of 1 (therefore a variance of 1)
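Both properties are easy to verify numerically. The sample below is invented for illustration:

```python
# Computing z-scores for an invented sample and verifying their properties:
# mean of 0, variance (and standard deviation) of 1.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
xbar = sum(data) / n
sd = (sum((x - xbar) ** 2 for x in data) / n) ** 0.5

z_scores = [(x - xbar) / sd for x in data]
z_mean = sum(z_scores) / n
z_var = sum((z - z_mean) ** 2 for z in z_scores) / n
print(round(z_mean, 10), round(z_var, 10))
```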

IV. Shapes of univariate distributions

• The normal distribution
• Bell-shaped
• symmetrical
• unimodal
• Properties: contains known % of cases within specified standard deviations of the mean
± 1 s.d. embraces 68.3% of the cases
± 2 s.d. embraces 95.4% of the cases
± 3 s.d. embraces 99.7% of the cases
• These % are reported in tables of areas under the normal curve and they are used in testing observed statistics against expectations under the null hypothesis.
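These areas can be reproduced from the error function in the standard library rather than looked up in a table:

```python
# Percentages of cases within 1, 2, and 3 standard deviations of the mean
# of a normal distribution, computed from the error function.
import math

within = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}  # P(-k < Z < +k)
for k, area in within.items():
    print(f"± {k} s.d.: {100 * area:.2f}% of the cases")
```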
• Non-normal distributions:
• Asymmetrical distributions are skewed.
• Defined as positively or negatively skewed by location of tail.
• Measured by the "third moment" of deviation from the mean.
• Symmetrical distributions that are too flat (platykurtic) or too peaked (leptokurtic) are measured by kurtosis.
• Measured by the "fourth moment" of deviation from the mean.
• Both skewness and kurtosis can be computed by the Frequencies procedure and are expressed as deviations from 0.
• Some types of nonlinear transformations help "normalize" a non-normal distribution.
• Squaring X helps normalize a left-skewed (negatively skewed) distribution.
• Taking the square root of X helps normalize a right-skewed (positively skewed) distribution.
• Taking the logarithm of X also helps normalize a right-skewed distribution.
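The effect of a log transformation can be seen by computing the standardized "third moment" before and after. The income-like figures below are invented to produce a long right tail:

```python
# Moment-based skewness (the standardized "third moment") for an invented
# right-skewed income-like sample, before and after a log transformation.
import math

def skewness(xs):
    """Standardized third moment about the mean (population formula)."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - m) / sd) ** 3 for x in xs) / n

incomes = [12, 15, 18, 20, 22, 25, 30, 40, 60, 150]  # long right tail
logged = [math.log(x) for x in incomes]

# The log transform pulls in the right tail, reducing positive skewness.
print(round(skewness(incomes), 2), round(skewness(logged), 2))
```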