Relationship of ANALYSIS OF VARIANCE to REGRESSION ANALYSIS
- Analysis of variance is suited for
  - A continuous dependent variable
  - A discrete independent variable
- Regression analysis is suited for
  - A continuous dependent variable
  - A series of continuous independent variables
  - But these independent variables can be dummy variables
- So-called "dummy" variables take two values, 1 and 0: for example, 1 for SOUTH and 0 for NON-SOUTH.
- The coefficient for a dummy variable can be interpreted as the effect of the attribute when it is present (value 1), compared with when it is absent (value 0).
- What happens if the k categories of a discrete variable in analysis of variance are made into k-1 dummy independent variables and run through regression analysis?
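As a rough illustration of the k-1 coding described above, the sketch below turns a four-category region code into three 0/1 variables. The function name `make_dummies` and the six sample cases are hypothetical; only the region codes (1 = Northeast, 2 = North Central, 3 = South, 4 = West) come from these notes.

```python
# Illustrative sketch: code a k-category variable as k-1 dummy variables.
# make_dummies and the sample data are hypothetical, not part of SPSS.

def make_dummies(values, categories, reference):
    """Return one 0/1 list per non-reference category."""
    dummies = {}
    for cat in categories:
        if cat == reference:
            continue  # the reference category gets no dummy of its own
        dummies[cat] = [1 if v == cat else 0 for v in values]
    return dummies

region = [1, 2, 3, 4, 3, 2]  # made-up region codes for six states
dummies = make_dummies(region, categories=[1, 2, 3, 4], reference=2)

for cat, column in dummies.items():
    print(cat, column)
```

Cases in the reference category (here North Central) are identified by having 0 on every dummy, which is why only k-1 variables are needed.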
- Analysis of variance in vote for Reagan in 1984 by REGION

- - - - - - - - - -  A N A L Y S I S   O F   V A R I A N C E  - - - - - - - -
VALUE | LABEL               | MEAN    | STD DEV | SUM OF SQ | CASES
1     | NORTHEAST           | 57.7778 |  5.6960 |  259.5556 |     9
2     | NORTH CENTRAL       | 60.0833 |  5.9918 |  394.9167 |    12
3     | SOUTH               | 58.2941 | 12.2666 | 2407.5294 |    17
4     | WEST                | 63.5385 |  6.6785 |  535.2308 |    13
      | WITHIN GROUPS TOTAL | 59.9608 |  8.7485 | 3597.2324 |    51
SOURCE         | SUM OF SQUARES | D.F. | MEAN SQUARE | F      | SIG.
BETWEEN GROUPS |       256.6892 |    3 |     85.5631 | 1.1179 | 0.351
WITHIN GROUPS  |      3597.2324 |   47 |     76.5369 |        |

Eta = .2581    Eta squared = .0666
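The ANOVA table can be rebuilt from the group summaries alone, which may help show where each entry comes from. The sketch below is an illustration in Python, not SPSS output. It uses the rounded means printed above, so the results agree with the table only to about two decimal places.

```python
# Sketch: rebuild the one-way ANOVA table from the printed group summaries.
# Each entry is (mean, within-group sum of squares, cases), copied from the table.
groups = {
    "NORTHEAST":     (57.7778, 259.5556, 9),
    "NORTH CENTRAL": (60.0833, 394.9167, 12),
    "SOUTH":         (58.2941, 2407.5294, 17),
    "WEST":          (63.5385, 535.2308, 13),
}

n_total = sum(n for _, _, n in groups.values())
grand_mean = sum(mean * n for mean, _, n in groups.values()) / n_total

# Between groups: squared distance of each group mean from the grand mean,
# weighted by group size. Within groups: the group sums of squares added up.
ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups.values())
ss_within = sum(ss for _, ss, _ in groups.values())

df_between = len(groups) - 1       # k - 1
df_within = n_total - len(groups)  # N - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within
eta_squared = ss_between / (ss_between + ss_within)

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}, eta squared = {eta_squared:.4f}")
```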
- Applying Regression Analysis with Dummy Variables to REAGAN84
  - First, create three variables (NRTHEAST, SOUTH, WEST) and set them to 0. North Central gets no dummy variable; it serves as the reference category.
      COMPUTE NRTHEAST = 0.
      COMPUTE SOUTH = 0.
      COMPUTE WEST = 0.
  - Using the IF command, set each variable equal to 1 if the state is in that region.
      IF (REGION = 1) NRTHEAST = 1.
      IF (REGION = 3) SOUTH = 1.
      IF (REGION = 4) WEST = 1.
  - Run the regression using these dummy variables.
      REGRESSION VARIABLES = REAGAN84 NRTHEAST SOUTH WEST
        /DEPENDENT = REAGAN84
        /METHOD = ENTER.
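To see what the regression step does with dummy variables, here is a minimal sketch in plain Python on made-up data (eight hypothetical states, not the REAGAN84 file). It solves ordinary least squares from the normal equations. The helper `solve` and all of the data values are assumptions for illustration only.

```python
# Sketch of OLS with dummy variables on made-up data (NOT the REAGAN84 file).
# The coefficients are found from the normal equations (X'X) b = X'y.

def solve(a, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical region codes and vote percentages for eight states.
region = [1, 1, 2, 2, 3, 3, 4, 4]
vote = [55.0, 59.0, 60.0, 62.0, 50.0, 64.0, 63.0, 67.0]

# Design matrix columns: constant, NRTHEAST, SOUTH, WEST (North Central = reference).
X = [[1, int(r == 1), int(r == 3), int(r == 4)] for r in region]

k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * y for row, y in zip(X, vote)) for i in range(k)]
constant, b_nrtheast, b_south, b_west = solve(xtx, xty)

print("constant:", constant)
print("NRTHEAST:", b_nrtheast, "SOUTH:", b_south, "WEST:", b_west)
```

In this toy data the group means are 57 (Northeast), 61 (North Central), 57 (South), and 65 (West), so the constant should come out as the North Central mean and each coefficient as that region's difference from it.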
* * * * * * * * * *  M U L T I P L E   R E G R E S S I O N  * * * * * * * * *

EQUATION NUMBER 1    DEPENDENT VARIABLE..  REAGAN84  PCT VOTE FOR REAGAN,

  1..  WEST
  2..  NRTHEAST
  3..  SOUTH     ELEVEN STATES OF THE CONFEDERACY

MULTIPLE R        | 0.25808
R SQUARE          | 0.0666
ADJUSTED R SQUARE | 0.00703
STANDARD ERROR    | 8.74853

ANALYSIS OF VARIANCE | DF | SUM OF SQUARES | MEAN SQUARE
REGRESSION           |  3 |      256.68917 |    85.56306
RESIDUAL             | 47 |     3597.2324  |    76.53686

F = 1.1179    SIGNIF F = .3513
------------------ VARIABLES IN THE EQUATION ------------------
VARIABLE   | B        | SE B    | BETA     | T      | SIG.
WEST       |  3.45513 | 3.50222 |  0.17322 |  0.987 | 0.3289
NRTHEAST   | -2.30556 | 3.85774 | -0.10111 | -0.598 | 0.5529
SOUTH      | -1.78922 | 3.29852 | -0.09703 | -0.542 | 0.5901
(CONSTANT) | 60.08333 | 2.52548 |          | 23.791 | 0.0000
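The SE B and T columns can also be traced back to the ANOVA summaries. With one dummy per region and North Central as the reference, each coefficient is a difference between two group means, so its standard error is the pooled standard deviation times sqrt(1/n_reference + 1/n_region). The sketch below checks this against the printed values. It is an illustration in Python, with all numbers copied from the tables above.

```python
import math

# Sketch: reproduce SE B and T for each dummy from the printed summaries.
s_pooled = 8.74853   # STANDARD ERROR of the regression (square root of the residual mean square)
n_reference = 12     # cases in NORTH CENTRAL, the reference category

coefficients = {     # variable: (B, cases in that region)
    "WEST":     (3.45513, 13),
    "NRTHEAST": (-2.30556, 9),
    "SOUTH":    (-1.78922, 17),
}

results = {}
for name, (b, n) in coefficients.items():
    # Standard error of a difference between two group means, pooled variance.
    se_b = s_pooled * math.sqrt(1 / n_reference + 1 / n)
    results[name] = (se_b, b / se_b)
    print(f"{name:<9} SE B = {se_b:.5f}  T = {b / se_b:.3f}")
```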
- Computing the regression equations: if the state is in the

WEST          | = 60.08333 | + 3.45513 | - 0       | - 0       | = 63.5385
NORTHEAST     | = 60.08333 | + 0       | - 2.30556 | - 0       | = 57.7778
SOUTH         | = 60.08333 | + 0       | - 0       | - 1.78922 | = 58.2941
NORTH CENTRAL | = 60.08333 | + 0       | - 0       | - 0       | = 60.0833
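These computed means are identical to the means produced in the top table from ANOVA. The F ratio (1.1179), its significance (.351), and the explained variance (Eta squared = R SQUARE = .0666) match as well.
- Thus, analysis of variance and regression analysis produce the same results when applied to exactly the same problem, whether it is viewed as a single DISCRETE variable or as k-1 DUMMY variables.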
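The comparison can be checked mechanically. The sketch below, an illustration in Python using only the coefficients and means printed in these notes, evaluates the fitted equation for a state in each region and compares the result with the ANOVA group mean.

```python
# Sketch: evaluate the fitted equation per region and compare with the ANOVA means.
constant = 60.08333
b = {"WEST": 3.45513, "NRTHEAST": -2.30556, "SOUTH": -1.78922}

# Dummy values for a state in each region; North Central is 0 on all three.
region_dummies = {
    "WEST":          {"WEST": 1, "NRTHEAST": 0, "SOUTH": 0},
    "NORTHEAST":     {"WEST": 0, "NRTHEAST": 1, "SOUTH": 0},
    "SOUTH":         {"WEST": 0, "NRTHEAST": 0, "SOUTH": 1},
    "NORTH CENTRAL": {"WEST": 0, "NRTHEAST": 0, "SOUTH": 0},
}

anova_means = {
    "WEST": 63.5385,
    "NORTHEAST": 57.7778,
    "SOUTH": 58.2941,
    "NORTH CENTRAL": 60.0833,
}

predicted = {}
for region, dummies in region_dummies.items():
    predicted[region] = constant + sum(b[name] * dummies[name] for name in b)
    print(f"{region:<13} predicted = {predicted[region]:.4f}  ANOVA mean = {anova_means[region]:.4f}")
```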
Using SPSS to create "dummy" variables
- "Dummy" variables are dichotomous renditions of the absence or presence of some qualitative attribute
  - For example, "region," "sex," "race," "party," etc.
- These qualitative attributes are typically coded "1" to indicate the presence of the trait, and "0" to indicate its absence.
- Then the "dummy" variable can be used in multiple regression as an independent variable.
  - It will function as a switch, turning on if the trait is present (= 1), and turning off if it is absent (= 0).
- SPSS syntax command procedures (you can do the equivalent using the Transform menu and then Compute)
  - Choose a name for your variable and set all cases equal to 0
      COMPUTE SOUTH = 0.
  - Use the "IF" command to code the cases as you desire
      IF (REGION = 3) SOUTH = 1.
  - All cases that are not in Region 3 will be left = 0.
  - SOUTH becomes available to use in any subsequent run.
- You can use this procedure with either of the cross-national data sets to treat any region as a dummy variable.
- Check your results by running Frequencies on all "dummy" variables you create.
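For readers who want to see the same logic outside SPSS, the sketch below mirrors the COMPUTE and IF steps and the Frequencies check in Python. The ten region codes are made up for illustration; only the coding rule (Region 3 = South) comes from these notes.

```python
from collections import Counter

# Hypothetical region codes for ten cases (3 = South, as in the notes).
region = [1, 3, 3, 2, 4, 3, 1, 2, 4, 3]

# COMPUTE SOUTH = 0: start every case at 0.
south = [0] * len(region)

# IF (REGION = 3) SOUTH = 1: switch the dummy on for Region 3 only.
for i, r in enumerate(region):
    if r == 3:
        south[i] = 1

# Frequencies check: the count of 1s should equal the number of Region 3 cases,
# and no value other than 0 or 1 should appear.
freq = Counter(south)
print(dict(freq))
print("Region 3 cases:", region.count(3))
```

If the number of 1s does not match the number of cases in the region, the IF condition was probably mistyped.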