 What is
a "normal" distribution?
 The
distribution of equally likely events in the long run
through accidents of nature.
 In an
infinite amount of time, a random process could
ultimately generate structured results: e.g., a group
of monkeys seated at typewriters could peck out all
the great works of literature.
 This
would be an extremely rare event, but it is
conceivable.
 The
normal curve is a mathematical formula that assigns
probabilities to the occurrence of rare events.
 Statistically
speaking, it is a probability distribution for a
continuous random variable:
 The
ordinate represents the probability density for the
occurrence of a value.
 The
baseline represents the values.
 The
exact shape of the curve is given by a complicated
formula that you do NOT need to know.
 The
area under the curve is interpreted as representing
all occurrences of the variable, X.
 We
can consider the area as representing 100 PERCENT
of the occurrences; in PROPORTIONS this is
expressed as 1.0.
 We
can then interpret AREAS under the curve as
representing certain PROPORTIONS of occurrences or
"probabilities".
 We
cannot assign a probability to any point, but we
can attach probabilities to INTERVALS on the
baseline associated with AREAS under the curve:
e.g., 50% of the cases lie on each side of the
mean.


Special
properties of the normal distribution:
 Its
shape is such that it
 Embraces
68.26% of the cases within 1 s.d.
around the mean.
 Embraces
95.44% of the cases within 2 s.d.
around the mean.
 Embraces
99.74% of the cases within 3 s.d.
around the mean.
 More
roughly speaking, 68%, 95%, and 99.7% of
the cases are embraced within 1, 2, and
3 standard deviations from the mean in a normal
distribution.
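
The percentages above can be checked numerically. The following is a minimal sketch in Python, not part of the original notes; the function name `area_within` is hypothetical. It relies on the fact that, for a standard normal variable, the area within k standard deviations of the mean equals erf(k / sqrt(2)). Results may differ in the second decimal place from figures taken from four-place printed tables, because of rounding.

```python
import math


def area_within(k: float) -> float:
    """Proportion of a normal distribution within k s.d. of the mean."""
    # For a standard normal Z, P(-k <= Z <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Express each proportion as a percentage of all cases.
    print(f"within {k} s.d. of the mean: {area_within(k) * 100:.2f}%")
```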
 Determining
whether a distribution is "normal"
 The
"Eyeball" test
 Is
the distribution unimodal?
 Is
the distribution symmetrical?
 More
exacting mathematical tests: measured according to
"moments" or "deviations" from the mean
 FIRST
MOMENT: $\sum (X - \bar{X}) / N = 0$ (you know this
already)
 SECOND
MOMENT: $\sum (X - \bar{X})^2 / N$ = variance (you had this)
 THIRD
MOMENT: $\sum (X - \bar{X})^3 / (N s^3)$ = skewness, where $s$ is
the standard deviation
 Formula
calculated within SPSS
 Positive
values mean
right skew
 FOURTH
MOMENT: $\sum (X - \bar{X})^4 / (N s^4) - 3$ = kurtosis
 Positive
values mean more peaked (LEPTOKURTIC)
than the normal curve
 Negative
values mean flatter (PLATYKURTIC)
 If
skewness and kurtosis values tend toward 0, then the
distribution approximates a normal
distribution.
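
As a rough illustration of the four moments, here is a minimal Python sketch. The function name `moment_summary` and the sample data are hypothetical, and the sketch assumes the simple "divide by N" definitions. Statistical packages such as SPSS typically apply small-sample corrections, so their printed skewness and kurtosis may differ somewhat from these values.

```python
def moment_summary(xs):
    """Return (first moment, variance, skewness, excess kurtosis) about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]           # deviations from the mean
    m1 = sum(dev) / n                      # first moment: always 0
    m2 = sum(d ** 2 for d in dev) / n      # second moment: variance
    sd = m2 ** 0.5
    skew = sum(d ** 3 for d in dev) / (n * sd ** 3)      # third moment, standardized
    kurt = sum(d ** 4 for d in dev) / (n * sd ** 4) - 3  # fourth moment, minus 3
    return m1, m2, skew, kurt


symmetric = [1, 2, 3, 4, 5]       # hypothetical data, symmetric about 3
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail

print(moment_summary(symmetric))     # skewness is 0 for symmetric data
print(moment_summary(right_skewed))  # skewness is positive (right skew)
```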
 Suppose
the distribution is not normal?
 No
matter how the original observations are
distributed, the mean plus or minus two standard
deviations will include at least 75% of the
observations.
 No
matter how the original observations are
distributed, the mean plus or minus three standard
deviations will include 89% or more (Freeman, p.
62).
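
These two guarantees match what is usually called Chebyshev's inequality: at least 1 - 1/k^2 of the observations lie within k standard deviations of the mean, whatever the shape of the distribution. The source does not name the inequality, so treat that label as an assumption. The Python sketch below uses hypothetical function names and a made-up, deliberately skewed data set.

```python
import statistics


def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of cases within k s.d. of the mean, for any distribution."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def share_within(xs, k: float) -> float:
    """Observed proportion of xs lying within k population s.d. of the mean."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    inside = [x for x in xs if abs(x - mean) <= k * sd]
    return len(inside) / len(xs)


skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]  # hypothetical, clearly non-normal data

for k in (2, 3):
    print(f"k={k}: guaranteed at least {chebyshev_lower_bound(k):.3f}, "
          f"observed {share_within(skewed, k):.3f}")
```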


Using the Table of Areas under the Normal Curve: The
z-score
 One
determines the probability of occurrence of a random
event in a normal distribution by consulting a
table
of areas under a normal curve (e.g., Table
D.2, pp. 702-705 in Kirk).
 Tables
of the normal curve have a mean of 0 and a standard
deviation of 1.
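 To use
the table, you must convert your data to have a mean
of 0 and standard deviation of 1.
 This is
done by transforming your raw values into z-scores,
according to this formula:
 z-score = $(X - \bar{X}) / s$, where $X$ is the raw score, $\bar{X}$ the mean, and $s$ the standard deviation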
Example:
percent black in Washington, D.C. in 1980
(Note: not 1990)

D.C.'s raw score = 70.3
Mean for all states = 10.3
Standard deviation = 12.5

(70.3 - 10.3) = 60
60 / 12.5 = 4.8

z-score for D.C. = 4.8

Example:
D.C.'s percent vote for Reagan in 1984

D.C.'s raw score = 13
Mean for all states = 60
Standard deviation = 8.8

(13 - 60) = -47
-47 / 8.8 = -5.3

z-score for D.C. = -5.3

Comparison
with Florida: percent vote for Reagan in 1984

Florida's percent black is 13.8
z-score = (13.8 - 10.3) / 12.5 = .28

Florida's percent for Reagan was 65
z-score = (65 - 60) / 8.8 = .57
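
The worked examples above can be reproduced with a few lines of Python. This is only a sketch; the function name `z_score` is hypothetical, and the means and standard deviations are the ones quoted in the examples.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd


# (label, raw score, mean for all states, standard deviation), from the examples
cases = [
    ("D.C. percent black, 1980", 70.3, 10.3, 12.5),
    ("D.C. percent for Reagan, 1984", 13, 60, 8.8),
    ("Florida percent black", 13.8, 10.3, 12.5),
    ("Florida percent for Reagan, 1984", 65, 60, 8.8),
]

for label, raw, mean, sd in cases:
    print(f"{label}: z = {z_score(raw, mean, sd):.2f}")
```

A negative result, as for D.C.'s Reagan vote, means the observation lies below the mean.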


 Computing
z-scores for raw data: Transforming raw scores into z-scores
standardizes the data, which makes it easier to
compare values in different distributions.

 Limitations
of raw data values
 Raw
scores of individual cases do not disclose how they
vary from the central tendency of the
distribution
 One
needs to know also the mean of the distribution and
its variability to determine if any given score is
"far" from the mean
 Properties
of the z-score transformation
 It is a
LINEAR transformation: it does not alter the relative
positions of observations in the distribution, nor does it
change the shape of the original
distribution.

 The
transformed observations have positive AND negative
DECIMAL values expressed in STANDARD DEVIATION
UNITS.
 The
SIGN of the z-score tells whether the observation
is above or below the mean.
 The
VALUE of the z-score tells how far above or below
it is.
 When transformed into z-scores, all distributions are
standardized
 The
mean of the transformed distribution is equal to
0.
 The
standard deviation of the distribution is equal to
1.
 The
variance of the distribution is equal to 1. (Old
Chinese proverb.)
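
The standardizing effect can be verified on any data set. The Python sketch below uses a hypothetical function name (`standardize`) and made-up scores, and assumes the population ("divide by N") standard deviation. It shows the transformed values having mean 0 and standard deviation 1 while keeping their original order.

```python
import statistics


def standardize(xs):
    """Convert raw scores to z-scores using the population standard deviation."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores
z = standardize(raw)

print("z-scores:", [round(v, 2) for v in z])
print("mean of z-scores:", round(statistics.mean(z), 6))
print("s.d. of z-scores:", round(statistics.pstdev(z), 6))
print("variance of z-scores:", round(statistics.pvariance(z), 6))
```

Because the transformation is linear, sorting the raw scores and sorting the z-scores puts the cases in the same order.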
 When
subjected to a z-score transformation, any set of raw
scores that conforms to a normal distribution will
conform exactly to the table of areas under the normal
curve.
 That
is, the likelihood of observing z-scores of certain
magnitudes can be read directly from Table D.2.
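
When a printed table is not at hand, the same areas can be computed directly. The following Python sketch is an illustration only. The function names are hypothetical, and it assumes the table reports two quantities that such tables commonly give: the area between the mean and z, and the area beyond z. The actual layout of Table D.2 is not reproduced here.

```python
import math


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_mean_to_z(z: float) -> float:
    """Area between the mean (z = 0) and the given z-score."""
    return abs(normal_cdf(z) - 0.5)


def area_beyond_z(z: float) -> float:
    """Area in the tail beyond the given z-score (on the same side as z)."""
    return 1 - normal_cdf(abs(z))


for z in (0.28, 0.57, 1.0, 1.96):
    print(f"z = {z}: mean to z = {area_mean_to_z(z):.4f}, "
          f"beyond z = {area_beyond_z(z):.4f}")
```

For any z, the two areas add up to 0.5, which is the half of the curve lying on one side of the mean.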
 Using
z-scores to read a table of areas under the normal
curve
