Frequency
Distributions

Use of SPSS in
statistical analysis: Frequency
Distributions
The number of variables in
the analysis determines the possible type of
analysis.
 Univariate refers
to the analysis of a single variable.
 Bivariate
indicates analyzing the joint occurrence or
covariation of two variables.
 Multivariate
describes the joint covariation of more than two
variables.
Univariate statistics
at the simplest level merely constitute representations of
distributions, which can be divided into
 FULL
representations of raw values of observations
 GROUPED
representations of raw values in collapsed
categories
 SUMMARY
representations of raw values, which do not
display complete information about the distribution of
cases.

Full
representations of distributions:
raw counts of observed values

 Frequency
distributions: simple counts of observations
presorted into numerically ordered categories
 These can be produced
with the Frequencies procedure in SPSS, which
lies under the Analyze Menu.

 Frequencies is
suitable for nominal-scale variables: i.e., non-orderable
discrete variables.
 It is also suitable for
most ordinal variables: i.e., orderable discrete
variables with limited numbers of categories.

 An example:
FREQUENCIES for DIVISION in the
states2000 data:

DIVISION   SUBDIVISIONS WITHIN REGIONS

                                                    Valid      Cum
Value Label            Value  Frequency  Percent  Percent  Percent

NEW ENGLAND                1          6     11.5     11.8     11.8
MIDDLE ATLANTIC            2          3      5.8      5.9     17.6
EAST NORTH CENTRAL         3          5      9.6      9.8     27.5
WEST NORTH CENTRAL         4          7     13.5     13.7     41.2
SOUTH ATLANTIC             5          9     17.3     17.6     58.8
EAST SOUTH CENTRAL         6          4      7.7      7.8     66.7
WEST SOUTH CENTRAL         7          4      7.7      7.8     74.5
MOUNTAIN                   8          8     15.4     15.7     90.2
PACIFIC                    9          5      9.6      9.8    100.0
                           .          1      1.9  Missing

                       Total         52    100.0    100.0

Valid cases 51   Missing cases 1
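
The Percent, Valid Percent, and Cum Percent columns follow directly from the raw counts. The following is a minimal sketch in Python, not SPSS syntax; the function name `frequency_table` is hypothetical, and the counts are the DIVISION frequencies shown above. It assumes that Percent uses all cases as its base, while Valid Percent and Cum Percent use only the non-missing cases.

```python
def frequency_table(counts, n_missing=0):
    """Return rows of (value, frequency, percent, valid percent, cum percent).

    counts: dict mapping each category value to its frequency.
    n_missing: number of cases with a missing value.
    """
    n_valid = sum(counts.values())
    n_total = n_valid + n_missing
    rows = []
    running = 0  # running count of valid cases, used for the cumulative column
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        rows.append((
            value,
            freq,
            round(100 * freq / n_total, 1),     # Percent: base includes missing cases
            round(100 * freq / n_valid, 1),     # Valid Percent: base excludes missing cases
            round(100 * running / n_valid, 1),  # Cum Percent: running share of valid cases
        ))
    return rows


# Frequencies for DIVISION (values 1-9), copied from the table above.
division_counts = {1: 6, 2: 3, 3: 5, 4: 7, 5: 9, 6: 4, 7: 4, 8: 8, 9: 5}

for row in frequency_table(division_counts, n_missing=1):
    print(row)
```

The cumulative column is computed from the running count rather than by adding the rounded percentages, which avoids accumulating rounding error.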

 Frequencies is
even suitable for interval- or ratio-scaled variables IF
the number of categories is not great.
 An example:
bill96 (I'm using this in lieu of bush2000,
but the point is the same).
 Strictly speaking,
bill96 is a discrete, orderable variable.
 Practically speaking,
bill96 can be considered a continuous
variable, for it assumes so many values,
 or a
ratio-scaled variable, for it has intervals
of known width (percentage points) and an absolute
zero (0 votes).
Frequencies for bill96:

BILL96   % of vote for Clinton in 1996

                                             Valid      Cum
Value Label     Value  Frequency  Percent  Percent  Percent

                   33          2      3.8      3.9      3.9
                   35          1      1.9      2.0      7.8
                   36          1      1.9      2.0      9.8
                   37          1      1.9      2.0     11.8
                   40          2      3.8      3.9     15.7
                   41          1      1.9      2.0     17.6
                   42          1      1.9      2.0     19.6
                   43          2      3.8      3.9     23.5
                   44          6     11.5     11.8     35.3
                   45          1      1.9      2.0     37.3
                   46          2      3.8      3.9     41.2
                   47          3      5.8      5.9     47.1
                   48          3      5.8      5.9     52.9
                   49          3      5.8      5.9     58.8
                   50          2      3.8      3.9     62.7
                   51          4      7.7      7.8     70.6
                   52          5      9.6      9.8     80.4
                   53          1      1.9      2.0     82.4
                   54          4      7.7      7.8     90.2
                   57          1      1.9      2.0     92.2
                   59          1      1.9      2.0     94.1
                   60          1      1.9      2.0     96.1
                   62          1      1.9      2.0     98.0
                   85          1      1.9      2.0    100.0
                    .          1      1.9  Missing

                Total         52    100.0    100.0

 FREQUENCIES
is not useful for interval- or ratio-scaled
variables when the number of categories is
large.
 An
example: billvote, the number of popular votes
cast for Clinton in 1996, by state.
Because each
state cast a different number of votes for Clinton, there
are 51 values, one for each state.

BILLVOTE   Total vote for Clinton in 1996

                                             Valid      Cum
Value Label     Value  Frequency  Percent  Percent  Percent

                66508          1      1.9      2.0      2.0
                77897          1      1.9      2.0      3.9
               106405          1      1.9      2.0      5.9
               138400          1      1.9      2.0      7.8
               139295          1      1.9      2.0      9.8
               140209          1      1.9      2.0     11.8
               152031          1      1.9      2.0     13.7
               165545          1      1.9      2.0     15.7
               167169          1      1.9      2.0     17.6
               203388          1      1.9      2.0     19.6
               205012          1      1.9      2.0     21.6
               220197          1      1.9      2.0     23.5
               220592          1      1.9      2.0     25.5
               231906          1      1.9      2.0     27.5
               245260          1      1.9      2.0     29.4
               252215          1      1.9      2.0     31.4
               311092          1      1.9      2.0     33.3
               324394          1      1.9      2.0     35.3
               326099          1      1.9      2.0     37.3
               384399          1      1.9      2.0     39.2
               385005          1      1.9      2.0     41.2
               469164          1      1.9      2.0     43.1
               488102          1      1.9      2.0     45.1
               495878          1      1.9      2.0     47.1
               612412          1      1.9      2.0     49.0
               615732          1      1.9      2.0     51.0
               635804          1      1.9      2.0     52.9
               664503          1      1.9      2.0     54.9
               670854          1      1.9      2.0     56.9
               712603          1      1.9      2.0     58.8
               874668          1      1.9      2.0     60.8
               899645          1      1.9      2.0     62.7
               905599          1      1.9      2.0     64.7
               924284          1      1.9      2.0     66.7
               928983          1      1.9      2.0     68.6
              1024817          1      1.9      2.0     70.6
              1047214          1      1.9      2.0     72.5
              1070990          1      1.9      2.0     74.5
              1071859          1      1.9      2.0     76.5
              1096355          1      1.9      2.0     78.4
              1099132          1      1.9      2.0     80.4
              1567223          1      1.9      2.0     82.4
              1599932          1      1.9      2.0     84.3
              1941126          1      1.9      2.0     86.3
              2100690          1      1.9      2.0     88.2
              2206241          1      1.9      2.0     90.2
              2455735          1      1.9      2.0     94.1
              2533502          1      1.9      2.0     96.1
              3513191          1      1.9      2.0     98.0
              4639935          1      1.9      2.0    100.0
                    .          1      1.9  Missing

                Total         52    100.0    100.0

 This table has little
value, for it simply says that each unique vote total
occurs once.
The key point in using
Frequencies and asking for the frequency table is
whether the number of categories is large, with "large"
somewhat a matter of judgment.

GROUPED representations of raw values in collapsed
categories

 Used when the number of
"raw" values is too large for easy
comprehension
 Most typically, grouping
is suitable for continuous or "quasi-continuous"
variables
 Income
 Votes won in
elections
 Population
 Rules for grouping
continuous variables
 The number of
intervals depends on the RANGE of the values between
the low and high scores
 From 6 to 20
intervals usually provides for adequate
variation
 Interval size is
determined by dividing the range by the number of
intervals
 Remember that each
interval is determined by its upper and lower TRUE
LIMITS:
 The distance on
the measurement scale actually enclosed by an
interval when grouping data
 The upper true
limit is halfway between the interval's apparent
upper limit and the apparent lower limit of the
next-higher interval
 Example: Ages
21-25, 26-30 ... are actually 20.5-25.5 and
25.5-30.5
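
The two grouping rules above (interval size and true limits) reduce to simple arithmetic. The following Python sketch is illustrative only: the function names `interval_width` and `true_limits` are hypothetical, the measurement unit is assumed to be 1 (whole years or whole votes), and the choice of ten intervals for billvote is an arbitrary value within the suggested 6-to-20 range.

```python
def interval_width(low, high, n_intervals):
    """Interval size = range of the values / number of intervals."""
    return (high - low) / n_intervals


def true_limits(apparent_lower, apparent_upper, unit=1):
    """True limits extend half a measurement unit beyond the apparent limits."""
    half = unit / 2
    return (apparent_lower - half, apparent_upper + half)


# Age intervals with apparent limits 21-25 and 26-30.
print(true_limits(21, 25))  # (20.5, 25.5)
print(true_limits(26, 30))  # (25.5, 30.5)

# The true limits 20.5 to 30.5 split into two intervals of 5 years each.
print(interval_width(20.5, 30.5, 2))  # 5.0

# Lowest and highest billvote values, split into an assumed ten intervals.
print(interval_width(66508, 4639935, 10))
```

Because adjacent intervals share a true limit (25.5 here), the intervals cover the measurement scale without gaps.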
 Discrete variables,
whether ordered or not, can be grouped usefully together
when the number of original categories is large:
 Ethnic groups in the
U.S.
 Nations of the world
grouped into regions
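
Collapsing discrete categories amounts to mapping each original category to a broader group and summing the counts. As a hedged sketch in Python, the nine DIVISION codes can be collapsed into four regions. The mapping below follows the standard U.S. Census Bureau grouping of divisions into regions and is an assumption about how the states2000 data define regions; the function name `collapse` is hypothetical.

```python
from collections import Counter

# Assumed mapping of DIVISION codes to the four U.S. Census regions.
DIVISION_TO_REGION = {
    1: "Northeast", 2: "Northeast",      # New England, Middle Atlantic
    3: "Midwest", 4: "Midwest",          # East and West North Central
    5: "South", 6: "South", 7: "South",  # South Atlantic, East and West South Central
    8: "West", 9: "West",                # Mountain, Pacific
}

# Frequencies for DIVISION, copied from the frequency table for that variable.
division_counts = {1: 6, 2: 3, 3: 5, 4: 7, 5: 9, 6: 4, 7: 4, 8: 8, 9: 5}


def collapse(counts, mapping):
    """Sum the frequencies of all original categories that share a group."""
    grouped = Counter()
    for value, freq in counts.items():
        grouped[mapping[value]] += freq
    return dict(grouped)


region_counts = collapse(division_counts, DIVISION_TO_REGION)
print(region_counts)
```

The grouped counts still sum to the 51 valid cases, so no observations are lost, only detail about which division each case came from.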
 Grouped data are often
displayed in graphs, which have distinct advantages over tables of
numbers.
 Graphs are visually
striking and thus easier to interpret and
remember.
 Whereas numbers must be
processed in digital fashion,
lines and areas can be
interpreted spatially, in analog fashion.
 Good graphs are
time-consuming to construct by hand, but they can be
generated easily by computers.
 Types of graphs
available under Frequencies in SPSS:
 HISTOGRAMS for
grouped continuous data: bars should
touch.
 BAR GRAPHS for
categorical data: bars should not touch.
 PIE CHARTS
are also for categorical data.
 First, consider
HISTOGRAMS for the two variables, billvote
and pctblack:
 By default, values
are collected into several equal size intervals for
plotting the histogram.
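
To make the idea of equal-size intervals concrete, the sketch below bins the billvote values listed in the frequency table into equal-width intervals and prints a rough text histogram. It is written in Python rather than SPSS, the function name `equal_width_bins` is hypothetical, and the choice of five intervals is an assumption for illustration; SPSS selects its own number of intervals.

```python
def equal_width_bins(values, n_bins):
    """Count values falling into n_bins equal-size intervals spanning min..max."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would index one past the end, so clamp it
        # into the last interval.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts


# billvote values as listed in the BILLVOTE frequency table.
billvote = [
    66508, 77897, 106405, 138400, 139295, 140209, 152031, 165545, 167169,
    203388, 205012, 220197, 220592, 231906, 245260, 252215, 311092, 324394,
    326099, 384399, 385005, 469164, 488102, 495878, 612412, 615732, 635804,
    664503, 670854, 712603, 874668, 899645, 905599, 924284, 928983, 1024817,
    1047214, 1070990, 1071859, 1096355, 1099132, 1567223, 1599932, 1941126,
    2100690, 2206241, 2455735, 2533502, 3513191, 4639935,
]

edges, counts = equal_width_bins(billvote, n_bins=5)

# One row per interval: lower edge, upper edge, and one asterisk per case.
for i, count in enumerate(counts):
    print(f"{edges[i]:>12,.0f} to {edges[i + 1]:>12,.0f}  {'*' * count}")
```

Most cases fall into the lowest interval, which shows how grouping reveals the skew that the one-case-per-row frequency table hides.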



 BARCHART
produces a graph suitable for DISCRETE
variables.
 Consider the example
for the variable DIVISION
 Note the spaces
between the bars.
They suggest that the
values are discrete, not continuous.


A pie chart for the same variable is more
colorful, but may be too complex for easy understanding.

SUMMARY
representations of raw values do not display complete
information about the distribution of cases.

 They provide only a
single value which attempts to summarize the
distribution.
 Because any summary
throws away information, summary measures are necessarily
imperfect.
 The two major classes of
summary measures:
 Measures of
central tendency
 Measures of
dispersion
Both of these will be taken
up later this week.
