Relationship of ANALYSIS OF VARIANCE to REGRESSION ANALYSIS
 Analysis of variance is suited for:
   A continuous dependent variable
   A discrete independent variable
 Regression analysis is suited for:
   A continuous dependent variable
   A series of continuous independent variables
 But these independent variables can be dummy variables.
 So-called "dummy" variables take two values, 1 and 0: for example, 1 for SOUTH and 0 for NON-SOUTH.
 The coefficient for a dummy variable can be interpreted as the effect of the attribute when it is present (dummy = 1), relative to the omitted reference category.
 What happens if the k categories of a discrete variable in analysis of variance are made into k-1 dummy independent variables and run through regression analysis?
 Analysis of variance in vote for Reagan in 1984 by REGION:


       
              A N A L Y S I S   O F   V A R I A N C E

 VALUE  LABEL                 MEAN    STD DEV    SUM OF SQ   CASES
 1      NORTHEAST          57.7778     5.6960     259.5556       9
 2      NORTH CENTRAL      60.0833     5.9918     394.9167      12
 3      SOUTH              58.2941    12.2666    2407.5294      17
 4      WEST               63.5385     6.6785     535.2308      13
 WITHIN GROUPS TOTAL       59.9608     8.7485    3597.2324      51

 SOURCE            SUM OF SQUARES   D.F.   MEAN SQUARE        F     SIG.
 BETWEEN GROUPS          256.6892      3       85.5631   1.1179    0.351
 WITHIN GROUPS          3597.2324     47       76.5369

 Eta = .2581     Eta squared = .0666
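The F ratio and Eta squared reported above follow directly from the sums of squares and degrees of freedom in the table. As a rough check, the arithmetic can be sketched in Python (used here only for illustration; the handout itself works in SPSS). The variable names are made up for this sketch, and the input numbers are copied from the table.

```python
# Values copied from the ANOVA table (BETWEEN GROUPS and WITHIN GROUPS rows).
ss_between, df_between = 256.6892, 3
ss_within, df_within = 3597.2324, 47

# Mean square = sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of the two mean squares.
f_ratio = ms_between / ms_within

# Eta squared is the share of the total sum of squares lying between groups.
eta_squared = ss_between / (ss_between + ss_within)
eta = eta_squared ** 0.5

print(f"MEAN SQUARE between = {ms_between:.4f}")
print(f"MEAN SQUARE within  = {ms_within:.4f}")
print(f"F                   = {f_ratio:.4f}")
print(f"Eta                 = {eta:.4f}")
print(f"Eta squared         = {eta_squared:.4f}")
```

The printed values can be compared by eye with the MEAN SQUARE, F, and Eta entries in the table.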



 Applying Regression Analysis with Dummy Variables to REAGAN84

 First, create three "variables" (Northeast, South, West) and set them to 0. North Central gets no dummy and serves as the reference category:
     COMPUTE NRTHEAST = 0.
     COMPUTE SOUTH = 0.
     COMPUTE WEST = 0.

 Using the IF command, set the value equal to 1 if the state is in the region:
     IF (REGION = 1) NRTHEAST = 1.
     IF (REGION = 3) SOUTH = 1.
     IF (REGION = 4) WEST = 1.

 Run the regression using these dummy variables:
     REGRESSION VARIABLES = REAGAN84 NRTHEAST SOUTH WEST
       /DEPENDENT = REAGAN84
       /ENTER.

 * * * *   M U L T I P L E   R E G R E S S I O N   * * * *

 EQUATION NUMBER 1    DEPENDENT VARIABLE..  REAGAN84   PCT VOTE FOR REAGAN,
   1..  WEST
   2..  NRTHEAST
   3..  SOUTH      ELEVEN STATES OF THE CONFEDERACY

 MULTIPLE R            0.25808
 R SQUARE              0.0666
 ADJUSTED R SQUARE     0.00703
 STANDARD ERROR        8.74853

 ANALYSIS OF VARIANCE
                 DF    SUM OF SQUARES    MEAN SQUARE
 REGRESSION       3         256.68917       85.56306
 RESIDUAL        47        3597.2324        76.53686

 F = 1.1179       SIGNIF F = .3513

 VARIABLES IN THE EQUATION

 VARIABLE              B       SE B       BETA         T     SIG.
 WEST            3.45513    3.50222    0.17322     0.987   0.3289
 NRTHEAST       -2.30556    3.85774   -0.10111    -0.598   0.5529
 SOUTH          -1.78922    3.29852   -0.09703    -0.542   0.5901
 (CONSTANT)     60.08333    2.52548               23.791   0.0000
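Several entries in this output are simple functions of others: the adjusted R square and standard error come from the residual sum of squares, and each T is B divided by SE B. A minimal Python sketch of that arithmetic follows (Python is an illustration only, not part of the SPSS run). One assumption is labeled in the code: the NRTHEAST and SOUTH coefficients are entered as negative, which is the sign implied by the regional means they reproduce (for example, 60.08333 - 2.30556 = 57.7778).

```python
import math

# Values copied from the regression output.
n_cases = 51                  # total CASES from the ANOVA table
ss_regression = 256.68917
ss_residual = 3597.2324
df_residual = 47

r_square = ss_regression / (ss_regression + ss_residual)
# Adjusted R square penalizes R square for the number of predictors used.
adj_r_square = 1 - (1 - r_square) * (n_cases - 1) / df_residual
# Standard error of the estimate is the square root of the residual mean square.
std_error = math.sqrt(ss_residual / df_residual)

# (B, SE B) pairs from VARIABLES IN THE EQUATION.
# Assumption: NRTHEAST and SOUTH are negative, as implied by the regional means.
coefficients = {
    "WEST": (3.45513, 3.50222),
    "NRTHEAST": (-2.30556, 3.85774),
    "SOUTH": (-1.78922, 3.29852),
    "(CONSTANT)": (60.08333, 2.52548),
}
t_ratios = {name: b / se for name, (b, se) in coefficients.items()}

print(f"R SQUARE          = {r_square:.4f}")
print(f"ADJUSTED R SQUARE = {adj_r_square:.5f}")
print(f"STANDARD ERROR    = {std_error:.5f}")
for name, t in t_ratios.items():
    print(f"T for {name:<11} = {t:.3f}")
```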

 Computing the regression equations, if the state is in the:

 WEST           = 60.08333  + 3.45513  - 0        - 0        = 63.5385
 NORTHEAST      = 60.08333  + 0        - 2.30556  - 0        = 57.7778
 SOUTH          = 60.08333  + 0        - 0        - 1.78922  = 58.2941
 NORTH CENTRAL  = 60.08333  + 0        - 0        - 0        = 60.0833

 These computed means are identical to the means produced in the top table from ANOVA. The F ratio (1.1179) and the explained variance (R SQUARE = Eta squared = .0666) match as well.
 Thus, analysis of variance and regression analysis produce the same results when applied to exactly the same problem, whether it is viewed as a single DISCRETE variable or as k-1 DUMMY variables.
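The equivalence can also be checked mechanically: plug each region's dummy coding into the fitted equation and compare the result with that region's ANOVA mean. The sketch below does this in Python, purely as an illustration alongside the SPSS output. The function name and dictionary layout are hypothetical; the numbers are copied from the two tables, with NRTHEAST and SOUTH entered as negative coefficients, the sign needed to reproduce the regional means.

```python
# Fitted equation: REAGAN84 = constant + b_west*WEST + b_ne*NRTHEAST + b_south*SOUTH
constant = 60.08333
coef = {"WEST": 3.45513, "NRTHEAST": -2.30556, "SOUTH": -1.78922}

# Dummy coding for each region; NORTH CENTRAL is the omitted reference category.
dummies = {
    "NORTHEAST":     {"WEST": 0, "NRTHEAST": 1, "SOUTH": 0},
    "NORTH CENTRAL": {"WEST": 0, "NRTHEAST": 0, "SOUTH": 0},
    "SOUTH":         {"WEST": 0, "NRTHEAST": 0, "SOUTH": 1},
    "WEST":          {"WEST": 1, "NRTHEAST": 0, "SOUTH": 0},
}

# Group means from the analysis of variance table.
anova_means = {
    "NORTHEAST": 57.7778,
    "NORTH CENTRAL": 60.0833,
    "SOUTH": 58.2941,
    "WEST": 63.5385,
}

def predicted_mean(coding):
    """Evaluate the regression equation for one region's dummy coding."""
    return constant + sum(coef[name] * value for name, value in coding.items())

for region, coding in dummies.items():
    print(f"{region:<14}{predicted_mean(coding):9.4f}{anova_means[region]:10.4f}")
```

Each row prints the regression prediction next to the ANOVA mean; the two columns agree to the four decimals shown in the tables.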
Using SPSS to create "dummy" variables
 "Dummy" variables are dichotomous renditions of the absence or presence of some qualitative attribute.
   For example, "region," "sex," "race," "party," etc.
 These qualitative attributes are typically coded "1" to indicate the presence of the trait and "0" to indicate its absence.
 The "dummy" variable can then be used in multiple regression as an independent variable.
 It will function as a switch, turning on if the trait is present (= 1) and turning off if it is absent (= 0).
 SPSS syntax command procedure (you can do the equivalent using the Transform menu and then Compute):
   Choose a name for your variable and set all cases equal to 0:
       COMPUTE SOUTH = 0.
   Use the "IF" command to code the cases as you desire:
       IF (REGION = 3) SOUTH = 1.
   All cases that are not in Region 3 will be left = 0.
   SOUTH becomes available to use in any subsequent run.
 You can use this procedure with either of the cross-national data sets to treat any region as a dummy variable.
 Check your results by running Frequencies on all "dummy" variables you create.

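To make the logic of the COMPUTE, IF, and Frequencies steps concrete, here is a small Python sketch of the same procedure. It is an illustration only, not SPSS syntax. The `region` list is hypothetical stand-in data: it simply repeats each REGION code as many times as the CASES column of the ANOVA table indicates (9, 12, 17, 13), in arbitrary order.

```python
from collections import Counter

# Hypothetical REGION codes; counts per code taken from the CASES column.
region = [1] * 9 + [2] * 12 + [3] * 17 + [4] * 13

# Equivalent of: COMPUTE SOUTH = 0
south = [0] * len(region)

# Equivalent of: IF (REGION = 3) SOUTH = 1
for i, code in enumerate(region):
    if code == 3:
        south[i] = 1

# Equivalent of running Frequencies on SOUTH: count the 0s and 1s.
frequencies = Counter(south)
for value in sorted(frequencies):
    print(f"SOUTH = {value}: {frequencies[value]} cases")
```

The check is the same as in SPSS: the number of cases coded 1 should equal the number of cases in the region, and every other case should be coded 0.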