The nature and purpose of analysis of variance
 Analysis of variance is a generalization of the t-test.
 The t-test is used to test the null hypothesis: µ_{1} = µ_{2}
 The t-test is appropriate for only two groups.
 Analysis of variance, and the associated F-test, can test for differences among the means of any number of groups defined by a categorical variable.
 Formally, the null hypothesis is µ_{1} = µ_{2} = µ_{3} ... = µ_{k}, where k is the number of groups.
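One way to see the "generalization" claim is a small sketch, assuming two made-up groups of scores (not the class data): with exactly two groups, the one-way F ratio equals the square of the pooled-variance t statistic. The function names and the scores are illustrative assumptions.

```python
from math import sqrt
from statistics import mean

# Hypothetical scores for two groups (not data from the class example).
group_a = [20.0, 22.0, 25.0, 21.0]
group_b = [26.0, 27.0, 24.0, 29.0]

def pooled_t(a, b):
    """Two-sample t statistic using a pooled variance estimate."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

def oneway_f(groups):
    """One-way analysis of variance F ratio for any number of groups."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    bss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    k, n = len(groups), len(scores)
    return (bss / (k - 1)) / (wss / (n - k))

t = pooled_t(group_a, group_b)
f = oneway_f([group_a, group_b])
print(t ** 2, f)  # the two values agree up to floating-point rounding
```

Unlike the t statistic, oneway_f accepts a list with three or more groups without any change.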
 Example: CLASS by grade on 2/3 Examination in C10 Statistics
 t-test results for a previous class (not this year's class)
 Mean score for undergraduates = 22.3
 Mean score for graduates = 26.7
 The t-test showed that this difference was significant far beyond the .05 level using a one-tailed test.
 Analysis of variance gives a general test for the effect of class on grade.

The underlying structural model of analysis of variance
 Each observation in the combined distribution represents a linear combination of components:
X_{ij} = µ_{grand} + a_{j} + e_{ij}
Where: a_{j} = effect of group j, and e_{ij} = effect of random error on observation i in group j
 If the treatment or "group" effect is 0, then observations will vary around the grand mean depending only on e_{ij} (errors around the mean).
 These errors are assumed to be random, normally distributed, and to average 0.
 This model, which accounts for individual observations in the distribution, can be expressed in terms of the total variation of individual observations from the grand mean.
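The structural model can be sketched as a small simulation. Everything numeric here (a grand mean of 24, the group effects, an error standard deviation of 4) is an assumed value chosen for illustration, and simulate_scores is a hypothetical helper, not part of any package.

```python
import random
from statistics import mean

def simulate_scores(mu_grand, effects, n_per_group, error_sd, seed=0):
    """Draw X_ij = mu_grand + a_j + e_ij, with e_ij ~ Normal(0, error_sd)."""
    rng = random.Random(seed)
    return [
        [mu_grand + a_j + rng.gauss(0.0, error_sd) for _ in range(n_per_group)]
        for a_j in effects
    ]

# Assumed values: three groups whose effects a_j sum to 0.
with_effect = simulate_scores(24.0, [-2.0, 0.0, 2.0], 5000, 4.0)
# Same model with every group effect set to 0.
no_effect = simulate_scores(24.0, [0.0, 0.0, 0.0], 5000, 4.0, seed=1)

print([round(mean(g), 2) for g in with_effect])  # near 22, 24 and 26
print([round(mean(g), 2) for g in no_effect])    # all near 24
```

With no group effect, the only thing separating the group means is random error, which is the situation the null hypothesis describes.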
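 Partitioning the total variation (also known as the sum of squares, SS):
Σ_{j} Σ_{i} (X_{ij} - X̄_{grand})^{2} = Σ_{j} Σ_{i} (X_{ij} - X̄_{j})^{2} + Σ_{j} n_{j} (X̄_{j} - X̄_{grand})^{2}
where X̄_{j} is the mean of group j, X̄_{grand} is the grand mean, and n_{j} is the number of observations in group j.
More succinctly:
TSS = WSS + BSS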
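 Calculation of sums of squares:
Σ (individual score - grand mean)^{2} = total sum of squares (TSS)
Σ (individual score - group mean)^{2} = within group sum of squares (WSS)
Σ (group mean - grand mean)^{2} = between group sum of squares (BSS)
Each sum runs over all individual observations, so in the BSS every squared group deviation is counted once for each member of that group.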
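The three sums of squares can be computed directly from their definitions. The sketch below uses nine made-up scores in three groups (an assumption for illustration only) and checks that the total splits into the within and between parts.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (TSS, WSS, BSS) for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    # Total: every score against the grand mean.
    tss = sum((x - grand) ** 2 for x in scores)
    # Within: every score against its own group mean.
    wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Between: every group mean against the grand mean, once per group member.
    bss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return tss, wss, bss

# Hypothetical scores, three groups of three.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
tss, wss, bss = sums_of_squares(groups)
print(tss, wss, bss)  # 60.0 6.0 54.0
```

Here 60 = 6 + 54, so TSS = WSS + BSS holds for these scores.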

An intuitive approach to what is going on:
 If the group means were really different, then the BSS (explained SS) would be large relative to the WSS (unexplained SS).
 If the group means were not much different, the WSS would be large relative to the BSS.
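To make the intuition concrete, the sketch below sorts the same nine made-up scores into groups in two different ways: once so the group means are far apart, and once so they are identical. The scores and the helper name between_within are assumptions for illustration.

```python
from statistics import mean

def between_within(groups):
    """Return (BSS, WSS) for a list of groups of scores."""
    grand = mean([x for g in groups for x in g])
    bss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return bss, wss

# Same nine scores, grouped so that the group means are 2, 5 and 8.
separated = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
# Same nine scores, grouped so that every group mean is 5.
mixed = [[1.0, 5.0, 9.0], [2.0, 6.0, 7.0], [3.0, 4.0, 8.0]]

bss_sep, wss_sep = between_within(separated)
bss_mix, wss_mix = between_within(mixed)
print(bss_sep, wss_sep)  # 54.0 6.0
print(bss_mix, wss_mix)  # 0.0 60.0
```

The total variation is 60 in both cases; only its split between "explained" and "unexplained" changes with the grouping.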
The uses of BSS and WSS
 They are measures of the sources of the total variation, and we will eventually use them to measure the strength of the relationship between the independent and dependent variables.
 But they have another purpose: to test the hypothesis that the k groups are really random samples from equivalent populations.
 If they are really drawn from the same population, then µ_{1} = µ_{2} = µ_{3} = µ_{4} ... = µ_{k}, as we hypothesized.
 But if they really are drawn from equivalent populations, then also σ_{1} = σ_{2} = σ_{3} = σ_{4} ... = σ_{k}
 To test whether they are from equivalent populations, we actually test for equality of estimates of the population variance.
BSS and WSS as tests of population variance
 Each of these provides the basis for an independent estimate of the population variance.
 One estimates the population variance based on the variance within each of the samples.
 The other estimates the population variance based on the variance between the sample means.
 The F-test is simply a ratio between these two estimates of the population variance:
F = (estimate of variance based on between mean variation) / (estimate of variance based on within group variation)
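 If this ratio is small (close to 1), then the two estimates agree closely, and we conclude that the groups represent random samples from equivalent populations: i.e., same means and variances.
 If the ratio is well above 1, the group means vary more than within-group variation alone would produce.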
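A simulation can illustrate why a ratio near 1 points to equivalent populations. This is a sketch under assumed values (three groups of 20, population standard deviation 4, 1000 repetitions); each estimate is a sum of squares divided by its degrees of freedom, k - 1 between groups and N - k within groups.

```python
import random

def f_ratio(groups):
    """Between-groups variance estimate divided by within-groups estimate."""
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    group_means = [sum(g) / len(g) for g in groups]
    bss = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    wss = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    k, n = len(groups), len(scores)
    return (bss / (k - 1)) / (wss / (n - k))

rng = random.Random(42)

def draw_groups(pop_means, n_per_group=20, sd=4.0):
    """Draw one normally distributed sample per population mean."""
    return [[rng.gauss(m, sd) for _ in range(n_per_group)] for m in pop_means]

# Repeated samples from equivalent populations, then from different ones.
same_pop = [f_ratio(draw_groups([24.0, 24.0, 24.0])) for _ in range(1000)]
diff_pop = [f_ratio(draw_groups([20.0, 24.0, 28.0])) for _ in range(1000)]

mean_f_same = sum(same_pop) / len(same_pop)
mean_f_diff = sum(diff_pop) / len(diff_pop)
print(round(mean_f_same, 2))  # close to 1 when the populations are equivalent
print(round(mean_f_diff, 2))  # much larger when the population means differ
```

Individual F ratios still vary from sample to sample, which is why the observed ratio is compared against the F distribution rather than against 1 directly.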
How do we get these estimates of the population variances?
 BSS and WSS must each be divided by the appropriate degrees of freedom, where k is the number of groups and N is the total number of observations.
 BSS / (k - 1) = between groups mean square (read "mean square" as the mean of the sum of squares)
 WSS / (N - k) = within groups mean square
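As a worked check of these divisions, the sketch below takes the sums of squares reported in the SPSS ONEWAY output in these notes (BSS = 406.9775, WSS = 1563.5169, with k = 5 groups and N = 89 students) and recomputes the mean squares and their ratio. The function name mean_squares is an illustrative assumption.

```python
def mean_squares(bss, wss, k, n):
    """Divide each sum of squares by its degrees of freedom and form F."""
    ms_between = bss / (k - 1)  # between groups mean square
    ms_within = wss / (n - k)   # within groups mean square
    return ms_between, ms_within, ms_between / ms_within

# Sums of squares taken from the SPSS output in these notes.
ms_b, ms_w, f = mean_squares(406.9775, 1563.5169, k=5, n=89)
print(round(ms_b, 4), round(ms_w, 4), round(f, 4))
```

The results should match the MEAN SQUARES and F RATIO columns of that output up to rounding.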
SPSS ONEWAY analysis of variance for class effect on 2/3 exam (from a class in the 1990s)

ONEWAY EXAM2 BY CLASS (1,5) / STATISTICS = DESCRIPTIVES

ANALYSIS OF VARIANCE
                                 SUM OF        MEAN         F        F
SOURCE                 D.F.     SQUARES     SQUARES     RATIO    PROB.
BETWEEN GROUPS [BSS]      4    406.9775    101.7444    5.4662    .0006   <-- The payoff!
WITHIN GROUPS [WSS]      84   1563.5169     18.6133
TOTAL [TSS]              88   1970.4944

                           STANDARD  STANDARD
GROUP    COUNT      MEAN  DEVIATION     ERROR  MINIMUM  MAXIMUM
GRP 1        2   19.0000    14.1421   10.0000        9       29
GRP 2       13   23.1538     3.3128     .9188       17       29
GRP 3       31   22.7097     4.6918     .8427       13       31
GRP 4       16   21.3125     2.9148     .7287       17       28
GRP 5       27   26.6667     4.1324     .7953       15       32
TOTAL       89   23.6404     4.7320     .5016        9       32
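The sums of squares in the ANOVA table can be recovered, up to rounding, from the descriptive statistics alone. The sketch below copies the counts, means and standard deviations from the output above; it assumes the reported standard deviations are sample standard deviations (computed with n - 1), which is consistent with how the numbers work out.

```python
# (count, mean, standard deviation) per group, copied from the output above.
groups = [
    (2, 19.0000, 14.1421),
    (13, 23.1538, 3.3128),
    (31, 22.7097, 4.6918),
    (16, 21.3125, 2.9148),
    (27, 26.6667, 4.1324),
]

n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / n_total

# Between groups: squared distance of each group mean from the grand mean,
# weighted by the number of students in the group.
bss = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)

# Within groups: each group's sample variance times its degrees of freedom.
wss = sum((n - 1) * sd ** 2 for n, _, sd in groups)

print(n_total, round(grand_mean, 4))
print(round(bss, 2), round(wss, 2), round(bss + wss, 2))
```

Small differences from the printed table are expected because the means and standard deviations are rounded to four decimals.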

