L19- Introduction to ANOVA

Introduction to Analysis of Variance

The nature and purpose of analysis of variance

Analysis of variance is a generalization of the t-test
- The t-test is used to test the null hypothesis: µ₁=µ₂
- The t-test is appropriate for only two groups
- Analysis of variance, and the associated F-test, can test for differences between any number of means for categorized variables.
- Formally, the null hypothesis for k groups is µ₁ = µ₂ = µ₃ ... = µ_k
Example: CLASS by grade on 2/3 Examination in C10 Statistics
- T-Test results for a previous class (not this year's class)
  - Mean score for undergraduates = 22.3
  - Mean score for graduates = 26.7
- T-Test showed that this was significant far beyond the .05 level using a one-tailed test.
Analysis of variance gives a general test for effect of class on grade
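
To make the "generalization" claim concrete, here is a minimal Python sketch, assuming the equal-variance (pooled) form of the t-test. With exactly two groups, the one-way F ratio equals the square of the pooled t statistic. The scores are hypothetical (not the class data), and the helper names pooled_t and oneway_f are illustrative only.

```python
from statistics import mean

# Hypothetical exam scores for two groups (NOT the class data from the notes).
undergrad = [20, 23, 25, 19, 24, 22]
grad = [27, 25, 29, 26, 28]

def pooled_t(a, b):
    """Two-sample t statistic, assuming equal population variances."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    std_err = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean(a) - mean(b)) / std_err

def oneway_f(groups):
    """F ratio: between-groups mean square over within-groups mean square."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    bss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    k, n = len(groups), len(scores)
    return (bss / (k - 1)) / (wss / (n - k))

t = pooled_t(undergrad, grad)
f = oneway_f([undergrad, grad])
print(f"t = {t:.4f}, t squared = {t * t:.4f}, F = {f:.4f}")
```

The same oneway_f function accepts three or more groups, which is where the t-test no longer applies.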

The underlying structural model of Analysis of variance

Each observation in the combined distribution represents a linear combination of components:
X_ij = µ_grand + a_j + e_ij
Where: a_j = effect of group j and e_ij = effect of random error
If the treatment or "group" effect is 0,
- then observations will vary around the mean depending on e_ij (errors around the mean)
- these errors are assumed to be random, normally distributed, and sum to 0.
This model--which accounts for individual observations in the distribution--can be expressed in terms of the total variation of individual observations from the grand mean.
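
The structural model can be illustrated with a short Python sketch that splits each score into the grand mean, an estimated group effect, and a residual error. The data and group labels are hypothetical, and the sample means stand in for the unknown population values.

```python
from statistics import mean

# Hypothetical scores for three groups (illustrative only).
groups = {"A": [4.0, 6.0, 5.0], "B": [8.0, 9.0, 10.0], "C": [6.0, 7.0, 8.0]}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

rows = []
for name, scores in groups.items():
    group_mean = mean(scores)
    effect = group_mean - grand_mean        # estimate of a_j, the group effect
    for x in scores:
        error = x - group_mean              # estimate of e_ij, the random error
        rows.append((name, x, effect, error))
        print(f"{name}: {x:4.1f} = {grand_mean:.1f} + ({effect:+.1f}) + ({error:+.1f})")
```

Within each group the estimated errors sum to 0, which matches the assumption stated above.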
Partitioning the total variation (also known as sum of squares, SS):

total sum of squares = within group sum of squares + between group sum of squares

More succinctly: TSS = WSS + BSS

Calculation of sums of squares:


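Σ (individual score - grand mean)² = total sum of squares (TSS)
Σ (individual score - group mean)² = within group sum of squares (WSS)
Σ (group mean - grand mean)² = between group sum of squares (BSS)
Each sum runs over all N observations, so in the BSS each group's squared deviation is counted once for every member of that group.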
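
The three sums of squares can be computed directly. This is a minimal Python sketch on hypothetical data (three groups of three scores), not the class data. It weights each group's squared deviation by the group size, which is the same as summing over every observation.

```python
from statistics import mean

# Hypothetical scores for three groups (illustrative only).
groups = [[4.0, 6.0, 5.0], [8.0, 9.0, 10.0], [6.0, 7.0, 8.0]]

scores = [x for g in groups for x in g]
grand = mean(scores)

# Total: every score against the grand mean.
tss = sum((x - grand) ** 2 for x in scores)
# Within: every score against its own group mean.
wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
# Between: every group mean against the grand mean, weighted by group size.
bss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)

print(f"TSS = {tss:.1f}, WSS = {wss:.1f}, BSS = {bss:.1f}")
# prints: TSS = 30.0, WSS = 6.0, BSS = 24.0
```

Here 30 = 6 + 24, so the partition TSS = WSS + BSS holds exactly.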

An intuitive approach to what is going on:

If the group means were really different, then the BSS (explained SS) would be large relative to the WSS (unexplained SS).
If the group means were not much different, the WSS would be large relative to the BSS.
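
A small Python sketch of this intuition, using two hypothetical data sets with identical spread inside each group but different separation between the group means. The function name sums_of_squares is illustrative.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (BSS, WSS) for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    bss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    wss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return bss, wss

# Hypothetical data: same within-group spread, different group means.
close_means = [[6.0, 7.0, 8.0], [7.0, 8.0, 9.0], [6.5, 7.5, 8.5]]
far_means = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0], [13.0, 14.0, 15.0]]

bss_close, wss_close = sums_of_squares(close_means)
bss_far, wss_far = sums_of_squares(far_means)

print(f"close means: BSS = {bss_close:.1f}, WSS = {wss_close:.1f}")
print(f"far means:   BSS = {bss_far:.1f}, WSS = {wss_far:.1f}")
```

Moving the groups apart leaves the WSS unchanged and inflates only the BSS.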

The uses of BSS and WSS

They are measures of the source of the total variation, and we will eventually use them to measure the strength of the relationship between the independent and dependent variable.
But they have another purpose: to test the hypothesis that the k groups are really random samples from equivalent populations
- If they are really drawn from the same population, then
  µ₁ = µ₂ = µ₃ = µ₄ ... = µ_k -- as we hypothesized
- But if they really are drawn from equivalent populations, then also
  σ₁ = σ₂ = σ₃ = σ₄ ... = σ_k
To test if they are from equivalent populations, we actually test for equality of estimates of population variance.

BSS and WSS as tests of population variance

BSS and WSS each provide the basis for an independent estimate of the population variance
- One estimates the population variance based on the variance within each of the samples
- The other estimates the population variance based on variance between the sample means
The F test is simply a ratio between these two estimates of the population variance:

F = (estimate of variance based on between mean variation) / (estimate of variance based on within group variation)

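If this ratio is close to 1, then the two estimates agree closely, and we conclude that the groups represent random samples from equivalent populations: i.e., same means and variances. If the ratio is much larger than 1, the group means vary more than random sampling alone would produce, and we reject the null hypothesis.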

How do we get these estimates of the population variances?

The sums of squares must be divided by the appropriate degrees of freedom, where k is the number of groups and N is the total number of observations
- BSS / (k - 1) = between groups mean square (read "mean square" as the mean of the sum of squares)
- WSS / (N - k) = within groups mean square
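
As a worked check, this Python sketch applies the two divisions to the sums of squares reported in the SPSS ONEWAY table in these notes (k = 5 groups, N = 89 students). The variable names are illustrative.

```python
# Sums of squares and sizes copied from the SPSS ONEWAY table in these notes.
bss = 406.9775      # between groups sum of squares
wss = 1563.5169     # within groups sum of squares
k = 5               # number of groups
n = 89              # total number of observations

df_between = k - 1
df_within = n - k

ms_between = bss / df_between   # between groups mean square
ms_within = wss / df_within     # within groups mean square
f_ratio = ms_between / ms_within

print(f"df = {df_between}, {df_within}")
print(f"mean squares = {ms_between:.4f}, {ms_within:.4f}")
print(f"F = {f_ratio:.4f}")
```

The results agree with the MEAN SQUARES and F RATIO columns of the table (101.7444, 18.6133, and 5.4662).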

SPSS ONEWAY analysis of variance for class effect on 2/3 exam (from a class in the 1990s)

ONEWAY EXAM2 BY CLASS (1,5) / STATISTICS = DESCRIPTIVES
        
                            ANALYSIS OF VARIANCE 
  
                                    SUM OF         MEAN             F      F 
         SOURCE            D.F.    SQUARES       SQUARES          RATIO  PROB. 
  
 BETWEEN GROUPS   [BSS]     4      406.9775      101.7444       5.4662  .0006 <--The payoff!
                       
 WITHIN GROUPS    [WSS]    84     1563.5169       18.6133 
  
 TOTAL            [TSS]    88     1970.4944 
  
  
                                  STANDARD   STANDARD 
 GROUP       COUNT        MEAN   DEVIATION      ERROR     MINIMUM     MAXIMUM   
  
 GRP 1           2     19.0000     14.1421    10.0000           9     29
 GRP 2          13     23.1538      3.3128      .9188          17     29
 GRP 3          31     22.7097      4.6918      .8427          13     31 
 GRP 4          16     21.3125      2.9148      .7287          17     28
 GRP 5          27     26.6667      4.1324      .7953          15     32 
  
 TOTAL          89     23.6404      4.7320      .5016           9     32
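
As a cross-check, the ANOVA table can be approximately rebuilt from the group descriptives alone. This Python sketch assumes the reported standard deviations use the usual n - 1 denominator, so that each group's share of the WSS is (n - 1) times its squared standard deviation. Because the printed means and standard deviations are rounded to four decimals, the results match the table only to about two decimals.

```python
# (count, mean, standard deviation) per group, copied from the SPSS output above.
groups = [
    (2, 19.0000, 14.1421),
    (13, 23.1538, 3.3128),
    (31, 22.7097, 4.6918),
    (16, 21.3125, 2.9148),
    (27, 26.6667, 4.1324),
]

k = len(groups)
n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / n_total

# Between groups: squared distance of each group mean from the grand mean,
# weighted by group size.
bss = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
# Within groups: assumes each sd was computed with an n - 1 denominator.
wss = sum((n - 1) * sd ** 2 for n, _, sd in groups)

ms_between = bss / (k - 1)
ms_within = wss / (n_total - k)
f_ratio = ms_between / ms_within

print(f"N = {n_total}, grand mean = {grand_mean:.4f}")
print(f"BSS = {bss:.2f}, WSS = {wss:.2f}, TSS = {bss + wss:.2f}")
print(f"F = {f_ratio:.2f}")
```

Most of the BSS comes from group 5, whose mean of 26.67 sits about three points above the grand mean.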