Application of multiple regression to explaining the vote for Reagan by state in 1984:
 The setting of the 1984 election:
 Republican President Ronald Reagan ran against the Democratic candidate, Walter Mondale
 How well would Reagan do in 1984 compared with his election in 1980?
 Walter Mondale had won the nomination handily over Jesse Jackson, who promised to mobilize the black vote in 1984
 What impact would the "Jackson factor" have on turning out the Democratic vote in 1984?
 Mondale's running mate for Vice President was Geraldine Ferraro
 What impact would the "Ferraro factor" have on turning out women to vote for the ticket in 1984?
 Let's include all three variables in a model to explain Reagan's vote in 1984.
 Matrix of correlations:
 Use of Multiple Regression in SPSS 10 to execute this model:
 Under Analyze in the menu, select Regression, then Linear
 In the "Dependent" box, enter the name of the variable you want to explain: Reagan84
 In the "Independent(s)" box, enter the names of the explanatory variables: Reagan80, PctBlack, PctWomen
 For this analysis, select STEPWISE [enters the variables one at a time] in the "Method" box
 Click on the "Statistics" button and check
 "Estimates" for Regression Coefficients and
 "Model Fit"
 Press "Continue"
 Click on "OK"
Here is the series of output boxes for the above SPSS run:
 Interpretation:
 Although three independent variables were offered for inclusion using the stepwise procedure, only Reagan80 was included in Model 1
 The box for the "Excluded Variables" shows that % Black and % Women were not significant at the .05 level and therefore were not selected for inclusion.
Another try: Reconsidering the politics of the 1980 and 1984 elections
 In 1980, Reagan ran against President Jimmy Carter, a southerner
 Southern states have the highest proportions of black voters
 Perhaps Carter in 1980 ran relatively better in the south against Reagan than Minnesotan Mondale did in 1984
 So perhaps we need to control for the south before assessing the impact of the Jackson factor
 Let's redo the analysis entering the variable South, scored 1 if the state is one of the 11 in the deep south and 0 otherwise.
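Coding such a dummy variable is a one-liner in pandas. A small sketch, assuming "the 11 in the deep south" means the 11 states of the former Confederacy (the short state list is just for illustration):

```python
# Score South = 1 for the 11 former Confederate states, 0 otherwise.
import pandas as pd

DEEP_SOUTH = {"AL", "AR", "FL", "GA", "LA", "MS", "NC", "SC", "TN", "TX", "VA"}

states = pd.DataFrame({"state": ["AL", "CA", "GA", "MN", "TX"]})
states["South"] = states["state"].isin(DEEP_SOUTH).astype(int)
print(states)
```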
 Here's the new correlation matrix
 Note that the correlation between "south" and "Reagan80" was negative (as expected) but very small.
CORRELATION:

              REAGAN80   PCTBLACK   PCTWOMEN   SOUTH
  REAGAN84      0.900      0.565      0.513    0.123
  REAGAN80                 0.595      0.537    0.082
  PCTBLACK                            0.522    0.511
  PCTWOMEN                                     0.211
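A matrix like this can be produced directly from a data frame with pandas. A sketch with simulated data (one row per state; the values do not come from the real returns):

```python
# Compute a correlation matrix across all variables in one call.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "Reagan80": rng.normal(50, 8, 50),
    "PctBlack": rng.uniform(0, 35, 50),
    "PctWomen": rng.normal(51, 0.5, 50),
    "South": (rng.random(50) < 0.22).astype(int),  # dummy variable
})
df["Reagan84"] = df["Reagan80"] + rng.normal(0, 3, 50)
print(df.corr().round(3))  # full symmetric matrix; the text shows one triangle
```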

 Still, let's carry out the new stepwise analysis using four independent variables:
Interpreting this stepwise output
 Adding South as a variable completely alters the analysis.
 In Model 2, South is added to Reagan80 as a second explanatory variable, raising the R^2 from .81 to .85.
 In Model 3, once South is controlled for, %Black enters the equation as a significant variable (.05 level), raising the R^2 to .88.
 However, %Women is not added into the equation to further increase the explanation:
 As shown in the Excluded Variables table above, its significance level is .612, far above the .05 threshold, so it is not added.
 In truth, there is relatively little variance among states in % women, making that a weak explanatory variable.
Interpreting the regression coefficients in the REGRESSION printout:
 REGRESSION coefficients
 Regression coefficients are embedded within a regression EQUATION
 "b" coefficients are "unstandardized" coefficients and can be interpreted using the scale of measurement for the raw data.
 "beta" coefficients are "standardized" and must be interpreted in terms of "standard deviation" units.
 Because of the standardization, "beta" coefficients can be compared WITHIN equations; "b" coefficients cannot.
 The regression coefficients in a multiple regression equation are "partial" coefficients because they state the "effect" of a given independent variable on the dependent variable while "controlling" for the effects of other variables in the equation.
 Regression coefficients measure CHANGE in Y, not percent of variance explained.

CORRELATION coefficients:
 Measure the STRENGTH of a relationship, i.e., the fit of observed Y values around the line representing predicted Y values according to a regression equation.
 Simple correlations pertain to bivariate relationships involving only one independent variable and are expressed by r.
 Multiple correlations pertain to multivariate relationships using multiple independent variables and are expressed by R.
 The regression procedure refers to the "partial" correlations of variables not in the equation to choose which variables (if any) are to be added to the equation next.
 Partial correlations computed by regression pertain to the correlation between a dependent variable, Y, and an independent variable, Xi, out of the equation while "controlling for" the effects of the other independent variables already in the equation.
 In essence, partial correlations express the correlation between two sets of residuals: the residuals of Y regressed on the variables already in the equation, and the residuals of Xi regressed on those same variables.
