Regression & ANOVA
Relationship of ANALYSIS OF VARIANCE to REGRESSION ANALYSIS
  • Analysis of variance is suited for
    • A continuous dependent variable
    • A discrete independent variable
  • Regression analysis is suited for
    • A continuous dependent variable
    • A series of continuous independent variables
      • But these independent variables can be dummy variables
      • So-called "dummy" variables take only two values, 1 and 0: for example, 1 for SOUTH and 0 for NON-SOUTH.
      • The coefficient for a dummy variable can be interpreted as the effect of the attribute when it is present, relative to the cases where it is absent (the omitted category).
      • What happens if the k categories of a discrete variable in analysis of variance are made into k-1 dummy independent variables and run through regression analysis?
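As a sketch of the k-1 coding described in the last question, the Python function below turns integer category codes into 0/1 dummies and leaves out one reference category. `dummy_code` is a hypothetical helper written for illustration, not an SPSS feature; the codes 1-4 follow the REGION coding used in this section, and choosing 2 (NORTH CENTRAL) as the reference is an assumption that matches the regression run later on.

```python
def dummy_code(values, categories, reference):
    """Return one dict per case with a 0/1 dummy for every category except `reference`.

    Hypothetical helper for illustration: categories are integer codes such as
    REGION 1-4, and the caller picks which code is the omitted reference category.
    """
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    kept = [c for c in categories if c != reference]
    rows = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category: {v!r}")
        # Each dummy is 1 only for cases in its own category; a case in the
        # reference category has every dummy equal to 0.
        rows.append({c: int(v == c) for c in kept})
    return rows


codes = [1, 2, 3, 4, 3]  # hypothetical REGION codes for five states
for row in dummy_code(codes, categories=[1, 2, 3, 4], reference=2):
    print(row)
```

Dropping one category is what keeps the dummies from duplicating the regression constant: if all k were included, they would sum to 1 for every case.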
Analysis of variance of the vote for Reagan in 1984 by REGION
 
 
- - - - - - - - - A N A L Y S I S - O F - V A R I A N C E - - - - - - - -

VALUE  LABEL                    MEAN    STD DEV   SUM OF SQ   CASES
  1    NORTHEAST             57.7778     5.6960    259.5556       9
  2    NORTH CENTRAL         60.0833     5.9918    394.9167      12
  3    SOUTH                 58.2941    12.2666   2407.5294      17
  4    WEST                  63.5385     6.6785    535.2308      13
       WITHIN GROUPS TOTAL   59.9608     8.7485   3597.2324      51

SOURCE            SUM OF SQUARES  D.F.   MEAN SQUARE        F    SIG.
BETWEEN GROUPS          256.6892     3       85.5631   1.1179   0.351
WITHIN GROUPS          3597.2324    47       76.5369

Eta = .2581     Eta squared = .0666
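The ANOVA figures can be reproduced from the group summaries alone. The sketch below (plain Python, standard library only) takes the cases, means, and within-group sums of squares from the table and recomputes the between-groups sum of squares, F, and eta. Because the printed means are rounded to four decimals, expect agreement only to about the printed precision.

```python
import math

# Group summaries copied from the ANOVA table:
# (label, cases, mean, within-group sum of squares).
GROUPS = [
    ("NORTHEAST", 9, 57.7778, 259.5556),
    ("NORTH CENTRAL", 12, 60.0833, 394.9167),
    ("SOUTH", 17, 58.2941, 2407.5294),
    ("WEST", 13, 63.5385, 535.2308),
]


def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F, eta_squared)."""
    n_total = sum(n for _, n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * mean for _, n, mean, _ in groups) / n_total
    # Between groups: distance of each group mean from the grand mean, weighted by group size.
    ss_between = sum(n * (mean - grand_mean) ** 2 for _, n, mean, _ in groups)
    # Within groups: pooled spread of cases around their own group mean.
    ss_within = sum(ss for _, _, _, ss in groups)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return ss_between, ss_within, df_between, df_within, f_ratio, eta_squared


ssb, ssw, dfb, dfw, f_ratio, eta2 = one_way_anova(GROUPS)
print(f"between SS={ssb:.2f}  within SS={ssw:.2f}  df={dfb},{dfw}")
print(f"F={f_ratio:.4f}  eta={math.sqrt(eta2):.4f}  eta squared={eta2:.4f}")
```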

Applying Regression Analysis with Dummy Variables to REAGAN84
 
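First, create three dummy variables (NRTHEAST, SOUTH, WEST) and set them to 0 for every state. NORTH CENTRAL gets no dummy variable; it is the reference category against which the other three regions are compared.
COMPUTE NRTHEAST = 0.
COMPUTE SOUTH = 0.
COMPUTE WEST = 0.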
 
Next, use the IF command to set each variable to 1 if the state is in that region.
IF (REGION = 1) NRTHEAST = 1.
IF (REGION = 3) SOUTH = 1.
IF (REGION = 4) WEST = 1.
 
Then run the regression with REAGAN84 as the dependent variable and the three dummy variables as independent variables.
REGRESSION VARIABLES = REAGAN84 NRTHEAST SOUTH WEST
  /DEPENDENT = REAGAN84
  /METHOD = ENTER.
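To see what the REGRESSION command does with the dummies, here is a small least-squares sketch in plain Python. The eight data points are made up for illustration (they are not the REAGAN84 data), and solving the normal equations by Gauss-Jordan elimination is a choice made for brevity, not a description of how SPSS computes the fit. The design matrix mirrors the syntax above: a constant plus NRTHEAST, SOUTH, and WEST, with NORTH CENTRAL (REGION 2) as the omitted category.

```python
# Hypothetical toy data, NOT the REAGAN84 data: (REGION code, percent vote).
TOY = [(1, 55.0), (1, 59.0), (2, 60.0), (2, 62.0),
       (3, 50.0), (3, 66.0), (4, 63.0), (4, 65.0)]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Least squares via the normal equations (X'X) b = X'y; adequate for a tiny, well-conditioned X."""
    k = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Columns: constant, NRTHEAST (REGION 1), SOUTH (REGION 3), WEST (REGION 4).
X = [[1.0, float(r == 1), float(r == 3), float(r == 4)] for r, _ in TOY]
y = [v for _, v in TOY]
const, b_ne, b_south, b_west = ols(X, y)
print(f"constant={const:.3f} NRTHEAST={b_ne:.3f} SOUTH={b_south:.3f} WEST={b_west:.3f}")
```

With these toy numbers the group means are 57, 61, 58, and 64 for regions 1-4, so the fit returns a constant of 61 (the NORTH CENTRAL mean) and coefficients of -4, -3, and 3: each B is a region's mean minus the reference mean.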
 
* * * * * * * * * * M U L T I P L E   R E G R E S S I O N * * * * * * * * *

EQUATION NUMBER 1    DEPENDENT VARIABLE..  REAGAN84  PCT VOTE FOR REAGAN,
 1..  WEST
 2..  NRTHEAST
 3..  SOUTH      ELEVEN STATES OF THE CONFEDERACY

MULTIPLE R            0.25808
R SQUARE               0.0666
ADJUSTED R SQUARE     0.00703
STANDARD ERROR        8.74853

ANALYSIS OF VARIANCE
              DF    SUM OF SQUARES    MEAN SQUARE
REGRESSION     3         256.68917       85.56306
RESIDUAL      47         3597.2324       76.53686

F = 1.1179          SIGNIF F = .3513

------------------ VARIABLES IN THE EQUATION ------------------

VARIABLE             B      SE B      BETA        T     SIG.
WEST           3.45513   3.50222   0.17322    0.987   0.3289
NRTHEAST      -2.30556   3.85774  -0.10111   -0.598   0.5529
SOUTH         -1.78922   3.29852  -0.09703   -0.542   0.5901
(CONSTANT)    60.08333   2.52548             23.791   0.0000
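Several figures in this output follow directly from the others. The sketch below assumes only the printed sums of squares, the three B values, and the group sizes from the ANOVA table (13 WEST, 9 NORTHEAST, 17 SOUTH out of 51 states). It recomputes R SQUARE, ADJUSTED R SQUARE, STANDARD ERROR, F, and the BETA weights using the standard formulas BETA = B * s_x / s_y and adjusted R squared = 1 - (1 - R squared)(N - 1)/(N - K - 1).

```python
import math

# Figures taken from the printed output and the ANOVA table.
N = 51   # states (cases)
K = 3    # dummy predictors: WEST, NRTHEAST, SOUTH
SS_REG = 256.68917
SS_RES = 3597.2324

ss_total = SS_REG + SS_RES
r_square = SS_REG / ss_total
adj_r_square = 1 - (1 - r_square) * (N - 1) / (N - K - 1)
ms_res = SS_RES / (N - K - 1)      # residual mean square
std_error = math.sqrt(ms_res)      # STANDARD ERROR of the estimate
f_ratio = (SS_REG / K) / ms_res

# Sample standard deviation of REAGAN84 over all 51 states.
sd_y = math.sqrt(ss_total / (N - 1))


def beta(b, n_ones):
    """Standardized coefficient for a 0/1 dummy that equals 1 for n_ones of the N cases."""
    # Sample variance of a 0/1 variable: n1 * (N - n1) / (N * (N - 1)).
    sd_x = math.sqrt(n_ones * (N - n_ones) / (N * (N - 1)))
    return b * sd_x / sd_y


betas = {
    "WEST": beta(3.45513, 13),
    "NRTHEAST": beta(-2.30556, 9),
    "SOUTH": beta(-1.78922, 17),
}
print(f"R2={r_square:.4f} adjR2={adj_r_square:.5f} SE={std_error:.5f} F={f_ratio:.4f}")
print({name: round(v, 5) for name, v in betas.items()})
```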

Computing the regression equations, where predicted REAGAN84 = 60.08333 + 3.45513(WEST) - 2.30556(NRTHEAST) - 1.78922(SOUTH):

If the state is in the
WEST           = 60.08333 + 3.45513 - 0       - 0       = 63.5385
NORTHEAST      = 60.08333 + 0       - 2.30556 - 0       = 57.7778
SOUTH          = 60.08333 + 0       - 0       - 1.78922 = 58.2941
NORTH CENTRAL  = 60.08333 + 0       - 0       - 0       = 60.0833

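These computed means are identical to the group means in the ANOVA table above. The constant, 60.08333, is the mean for NORTH CENTRAL, the one region without a dummy variable, and each B is the difference between its region's mean and the NORTH CENTRAL mean.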

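Thus, analysis of variance and regression analysis produce the same results when applied to exactly the same problem, whether the independent variable is treated as a single DISCRETE variable with k categories or as k-1 DUMMY variables: the same between-groups (regression) and within-groups (residual) sums of squares, the same F of 1.1179, and an R SQUARE equal to Eta squared (.0666).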
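The arithmetic in the regression equations can also be run in reverse: starting from the ANOVA group means, the constant is the mean of the reference category and each B is a difference in means. This sketch assumes NORTH CENTRAL is the reference, as in the output; because the ANOVA means are rounded to four decimals, the derived B values match the printed ones only to about four decimal places.

```python
# ANOVA group means (rounded to four decimals in the printed table).
MEANS = {
    "NORTHEAST": 57.7778,
    "NORTH CENTRAL": 60.0833,
    "SOUTH": 58.2941,
    "WEST": 63.5385,
}
REFERENCE = "NORTH CENTRAL"  # the region with no dummy variable

# The constant is the reference-category mean; each B is that region's mean minus it.
constant = MEANS[REFERENCE]
b = {region: mean - constant for region, mean in MEANS.items() if region != REFERENCE}


def predicted(region):
    """Constant plus the B of the one dummy switched on (no B for the reference region)."""
    return constant + b.get(region, 0.0)


for region in MEANS:
    print(f"{region:<14} B={b.get(region, 0.0):+.4f}  predicted={predicted(region):.4f}")
```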
Using SPSS to create "dummy" variables
  • "Dummy" variables are dichotomous renditions of the absence or presence of some qualitative attribute
    • For example, "region," "sex," "race," "party," etc.
    • These qualitative attributes are typically coded "1" to indicate the presence of the trait, and "0" to indicate its absence.
    • Then the "dummy" variable can be used in multiple regression as an independent variable.
    • It will function as a switch, turning on if the trait is present (= 1) and off if it is absent (= 0).
  • SPSS syntax commands (you can do the equivalent through the Transform menu, then Compute):
    • Choose a name for your variable and set all cases equal to 0
      COMPUTE SOUTH = 0.
    • Use "IF" command to code the cases as you desire
      IF (REGION = 3) SOUTH = 1.
    • All cases that are not in Region 3 keep the value 0.
    • SOUTH becomes available to use in any subsequent run.
  • You can use this procedure with either of the cross-national data sets to treat any region as a dummy variable.
  • Check your results by running Frequencies on all "dummy" variables you create.
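The COMPUTE/IF pattern and the Frequencies check translate directly into a few lines of Python. The REGION codes below are hypothetical, and the loop only illustrates the logic of the SPSS commands; it is not a replacement for running them.

```python
from collections import Counter

# Hypothetical REGION codes (1=NORTHEAST, 2=NORTH CENTRAL, 3=SOUTH, 4=WEST).
region = [3, 1, 4, 3, 2, 3, 4]

# COMPUTE SOUTH = 0.  -> start every case at 0
south = [0] * len(region)

# IF (REGION = 3) SOUTH = 1.  -> switch on only the cases in region 3
for i, r in enumerate(region):
    if r == 3:
        south[i] = 1

# FREQUENCIES-style check: the count of 1s must equal the number of REGION = 3 cases.
freq = Counter(south)
print(sorted(freq.items()))  # [(0, 4), (1, 3)]
assert freq[1] == Counter(region)[3]
```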