Introduction to Analysis of Variance

The nature and purpose of analysis of variance 

  • Analysis of variance is a generalization of the t-test
    • The t-test is used to test the null hypothesis: µ1 = µ2
    • The t-test is appropriate for only two groups 
    • Analysis of variance, and the associated F-test, can test for differences among any number of group means on a categorized variable.
    • Formally, the null hypothesis is µ1 = µ2 = µ3 = µ4 ... = µk
  • Example: grade on the 2/3 Examination in C10 Statistics by CLASS
    • T-Test results for a previous class (not this year's class)
      • Mean score for undergraduates = 22.3
      • Mean score for graduates = 26.7
    • T-Test showed that this was significant far beyond the .05 level using a one-tailed test.
  • Analysis of variance gives a general test for effect of class on grade 
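The claim that analysis of variance generalizes the t-test can be checked numerically: with exactly two groups, the one-way F statistic equals the square of the pooled-variance t statistic. A minimal sketch in Python, using invented scores (not the class data above):

```python
# Minimal sketch: for two groups, one-way ANOVA's F equals the square of
# the pooled-variance t statistic. Scores below are invented for
# illustration, not the class data from the example.
from statistics import mean

undergrads = [19, 21, 22, 24, 23, 25]   # hypothetical exam scores
grads      = [25, 27, 26, 28, 24, 30]

def pooled_t(a, b):
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ssa = sum((x - ma) ** 2 for x in a)          # within-group SS for a
    ssb = sum((x - mb) ** 2 for x in b)          # within-group SS for b
    sp2 = (ssa + ssb) / (na + nb - 2)            # pooled variance estimate
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def oneway_f(*groups):
    all_x = [x for g in groups for x in g]
    grand = mean(all_x)
    bss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    k, n = len(groups), len(all_x)
    return (bss / (k - 1)) / (wss / (n - k))     # F = MS_between / MS_within

t = pooled_t(undergrads, grads)
f = oneway_f(undergrads, grads)
print(t ** 2, f)   # the two values agree
```

With more than two groups, `pooled_t` no longer applies but `oneway_f` still does, which is the sense in which the F-test generalizes the t-test.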

The underlying structural model of Analysis of variance 
  • Each observation in the combined distribution represents a linear combination of components: 
    Xij = µgrand + aj + eij
    Where: aj = effect of group j and eij = random error
  • If the treatment or "group" effect is 0,
    • then observations will vary around the mean depending on ei j (errors around the mean)
    • these errors are assumed to be random, normally distributed, and sum to 0.
  •  This model--which accounts for individual observations in the distribution--can be expressed in terms of the total variation of individual observations from the grand mean. 
  •  Partitioning the total variation (also known as sum of squares, SS):

More succinctly:

TSS = WSS + BSS

  • Calculation of sums of squares:

    Σ (individual score - grand mean)² = total sum of squares (TSS)

    Σ (individual score - group mean)² = within group sum of squares (WSS)

    Σ (group mean - grand mean)² = between group sum of squares (BSS)

    (each sum taken over all N individual observations)
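The partition TSS = WSS + BSS can be verified directly. A minimal sketch in Python with invented scores for three hypothetical groups:

```python
# Minimal sketch of the partition TSS = WSS + BSS, using invented
# scores for three hypothetical groups (not data from the course).
from statistics import mean

groups = [[22, 24, 20], [27, 25, 29, 27], [18, 21, 19]]
all_scores = [x for g in groups for x in g]
grand = mean(all_scores)

# Total: each score's squared deviation from the grand mean
tss = sum((x - grand) ** 2 for x in all_scores)
# Within: each score's squared deviation from its own group mean
wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
# Between: each group mean's squared deviation from the grand mean,
# counted once per observation in the group
bss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)

print(tss, wss + bss)   # the two totals match
```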


An intuitive approach to what is going on:

  • If the group means were really different, then the BSS (explained SS) would be large relative to the WSS (unexplained SS).
  • If the group means were not much different, the WSS would be large relative to the BSS.

 The uses of BSS and WSS

  • They are measures of the source of the total variation, and we will eventually use them to measure the strength of the relationship between the independent and dependent variable.
  • But they have another purpose: to test the hypothesis that the k groups are really random samples from equivalent populations
    • If they are really drawn from the same population, then
      µ1 = µ2 = µ3 = µ4 ... = µk -- as we hypothesized 
    • But if they really are drawn from equivalent populations, then also
      σ²1 = σ²2 = σ²3 = σ²4 ... = σ²k
  • To test if they are from equivalent populations, we actually test for equality of estimates of population variance

BSS and WSS as tests of population variance 

  • Together, these provide the basis for two independent estimates of the population variance 
    • One estimates the population variance from the variance within each of the samples
    • The other estimates the population variance from the variance between the sample means 
  • The F test is simply a ratio between these two estimates of the population variance: 

F = (estimate of variance based on between-mean variation) / (estimate of variance based on within-group variation)

  • If this ratio is close to 1, the two estimates agree, and we conclude that the groups represent random samples from equivalent populations: i.e., same means and variances 
  • If the ratio is much larger than 1, the between-groups estimate has been inflated by real differences among the group means, and we reject the null hypothesis 

How do we get these estimates of the population variances? 

  • Each sum of squares must be divided by its degrees of freedom (k = number of groups, N = total number of observations)
    • BSS / k-1 = between groups mean square (read "mean square" as the mean of the sum of squares)
    • WSS / N-k = within groups mean square  
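These two divisions can be sketched with the figures from the SPSS run reported below (k = 5 class groups, N = 89 students):

```python
# Minimal sketch: mean squares and the F ratio computed from the sums
# of squares in the SPSS output reported below (k = 5 groups, N = 89).
bss, wss = 406.9775, 1563.5169
k, n = 5, 89

ms_between = bss / (k - 1)        # between-groups mean square
ms_within = wss / (n - k)         # within-groups mean square
f_ratio = ms_between / ms_within

print(ms_between, ms_within, f_ratio)
# agrees with the SPSS table: 101.7444, 18.6133, F = 5.4662
```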


SPSS ONEWAY analysis of variance for class effect on 2/3 exam (from a class in the 1990s)

 
ONEWAY EXAM2 BY CLASS (1,5) / STATISTICS = DESCRIPTIVES
        
                            ANALYSIS OF VARIANCE 
  
                                    SUM OF         MEAN             F      F 
         SOURCE            D.F.    SQUARES       SQUARES          RATIO  PROB. 
  
 BETWEEN GROUPS   [BSS]     4      406.9775      101.7444       5.4662  .0006 <--The payoff!
                       
 WITHIN GROUPS    [WSS]    84     1563.5169       18.6133 
  
 TOTAL            [TSS]    88     1970.4944 
  
  
                                  STANDARD   STANDARD 
 GROUP       COUNT        MEAN   DEVIATION      ERROR     MINIMUM     MAXIMUM   
  
 GRP 1           2     19.0000     14.1421    10.0000           9     29
 GRP 2          13     23.1538      3.3128      .9188          17     29
 GRP 3          31     22.7097      4.6918      .8427          13     31 
 GRP 4          16     21.3125      2.9148      .7287          17     28
 GRP 5          27     26.6667      4.1324      .7953          15     32 
  
 TOTAL          89     23.6404      4.7320      .5016           9     32