A problem for multiple regression
- One dependent variable: REAGAN84, the % vote by state for Reagan in 1984
- Three independent variables:
  - % vote for Reagan in 1980 ← past voting history of the state
  - % Black ← the "Jackson factor"
  - % Women ← the "Ferraro factor"
- Correlation matrix: a table showing the intercorrelations among all variables
- Bivariate results
  - As shown previously, the vote for Reagan in 1980 is strongly related to the vote for Reagan in 1984.
  - The vote for Reagan in both 1980 and 1984 is strongly correlated with % Black and % Women, but slightly LESS so in 1984 than in 1980.
  - The two demographic variables, % Black and % Women, are themselves highly correlated.
- Multiple regression is needed to assess the combined explanatory power of all three independent variables, magically (i.e., statistically) adjusting for their intercorrelations.
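A correlation matrix of the kind described can be computed directly. The sketch below uses invented state-level data (not the actual 1984 returns); only the pattern of intercorrelations is the point.

```python
# Hypothetical illustration: a correlation matrix for four state-level
# variables, computed with NumPy. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 50  # 50 states

reagan80 = rng.normal(50, 10, n)                          # % vote for Reagan, 1980
pct_black = rng.normal(10, 8, n)                          # % Black
pct_women = 51 + 0.3 * pct_black + rng.normal(0, 1, n)    # correlated demographics
reagan84 = 0.9 * reagan80 - 0.4 * pct_black + rng.normal(0, 3, n)

data = np.vstack([reagan84, reagan80, pct_black, pct_women])
labels = ["REAGAN84", "REAGAN80", "%BLACK", "%WOMEN"]
corr = np.corrcoef(data)  # 4 x 4 matrix of pairwise Pearson correlations

for label, row in zip(labels, corr):
    print(f"{label:>8}: " + " ".join(f"{r:6.2f}" for r in row))
```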
The explanatory model underlying multiple regression
- Here is a schematic presentation of the simplest model of additive effects (diagram not reproduced).
- Here is the regression model expressed mathematically:
  Y_{i} = a + b_{1}X_{1i} + b_{2}X_{2i} + b_{3}X_{3i} + e_{i}
- Assumptions of this simple additive model
  - Independent variables are not causes of each other.
  - They are not caused by other common variables.
  - Therefore the independent variables are uncorrelated.
- Departures from the additive nature of explanation:
  - In a strict additive model, the variance explained by each variable can be added to that explained by the other variables.
  - In practice, strict additivity rarely obtains, because independent variables tend to be correlated.
  - In practice, every variable added to the equation will increase its explanatory power, but in decreasing amounts.
  - Usually, there is little increase after adding 5 or 6 variables.
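The diminishing-returns pattern can be illustrated with synthetic data: R² keeps rising as correlated predictors are added, but typically by shrinking increments (invented numbers, not real election data).

```python
# Sketch of diminishing returns: R^2 as correlated predictors are added
# one at a time. The pattern, not the particular values, is the point.
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Five intercorrelated predictors built from a shared component
shared = rng.normal(size=n)
X = np.column_stack([shared + rng.normal(size=n) for _ in range(5)])
y = X.sum(axis=1) + rng.normal(scale=2.0, size=n)

def r_squared(X_sub, y):
    """R^2 from an OLS fit with intercept, via least squares."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

r2 = [r_squared(X[:, :k], y) for k in range(1, 6)]
print([round(v, 3) for v in r2])  # R^2 rises, but each variable adds less
```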
Multiple regression as an extension of simple linear regression
- Linear regression model for one independent variable
  - Unstandardized or "raw" data: Y_{i} = a + bX_{i} + e_{i}
  - Standardized data (transformed into z-scores): Z_{Yi} = βZ_{Xi}, where β is a standardized coefficient, not a population parameter
- Extension of the linear model to multiple independent variables
  - Unstandardized model uses b-coefficients: Y_{i} = a + b_{1}X_{1i} + b_{2}X_{2i} + b_{3}X_{3i} + ... + b_{n}X_{ni} + e_{i}
  - Standardized model uses beta-coefficients: Z_{Yi} = β_{1}Z_{X1i} + β_{2}Z_{X2i} + ... + β_{n}Z_{Xni}
- Conceptualization
  - Whereas in simple regression we calculated the best-fitting line that could pass through a series of points in two-dimensional space,
  - in multiple regression we seek the best-fitting "hyperplane" that can pass through a mass of points in (k + 1)-dimensional space, where k is the number of independent variables.
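With k = 2 predictors, the best-fitting hyperplane is an ordinary plane in three-dimensional space. A minimal sketch with NumPy's least-squares solver, using invented data:

```python
# Fit yhat = a + b1*x1 + b2*x2, the best-fitting plane through a cloud of
# points in 3-dimensional space. Data are synthetic.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones for the intercept, then the predictors
A = np.column_stack([np.ones(n), x1, x2])
(a, b1, b2), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(a, 2), round(b1, 2), round(b2, 2))  # close to 1.0, 2.0, -3.0
```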
Computation and interpretation of β and b, the REGRESSION COEFFICIENTS
- These values are computed by the computer; in practice you will not have to calculate them yourself.
- Computation of b
- Relationship of β to b: β = b(s_{X}/s_{Y})
- Interpretation
  - b_{1} is the expected change in Y with a one-unit change in X_{1}, when all other variables are controlled.
    - See the example based on the SPSS Users' Guide, p. 198.
  - β_{1}, however, represents the expected change in Y, in standard deviation units, with a one standard deviation change in X_{1}, when the other independent variables are controlled.
- The b coefficients offer the advantage of interpretation in the original units of measurement.
  - They have the disadvantage of making it difficult to compare the effects of independent variables, for variables may vary widely in their means and standard deviations, and thus in their b values.
  - Consider the effect of income in dollars as an independent variable: a one-unit change (i.e., $1) is likely to have a very small effect on any Y, and thus b would be tiny.
  - If income is measured instead in thousands of dollars, the b coefficient would be larger, for a one-unit change (i.e., $1,000) would have a larger effect on Y.
- Beta (β) coefficients have the advantage of being directly comparable in the relative importance of their effects on Y, but they cannot be interpreted in the original measurement scale.
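The contrast between b and β can be sketched with synthetic data: rescaling income from dollars to thousands of dollars changes b by a factor of 1,000, but leaves β unchanged (all numbers are invented for illustration).

```python
# b depends on the measurement scale of X; beta = b * (s_X / s_Y) does not.
import numpy as np

rng = np.random.default_rng(3)
n = 500
income = rng.normal(50_000, 15_000, n)           # income in dollars
y = 10 + 0.0002 * income + rng.normal(0, 2, n)   # some outcome Y

def simple_b(x, y):
    """Slope from a one-predictor least-squares fit."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

b_dollars = simple_b(income, y)
b_thousands = simple_b(income / 1000, y)         # one unit = $1,000
beta = b_dollars * income.std(ddof=1) / y.std(ddof=1)

print(f"b (dollars):   {b_dollars:.6f}")   # tiny: effect of a $1 change
print(f"b (thousands): {b_thousands:.4f}") # 1,000 times larger
print(f"beta:          {beta:.3f}")        # scale-free
```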
Interpretation of the multiple correlation coefficient, R
- R is the product-moment correlation between the dependent variable and the predicted values generated by the multiple regression equation, using the b-coefficients.
  - See the example that illustrates the meaning of R using data to explain female life expectancy across nations.
- R^{2} as a PRE (proportional-reduction-in-error) measure of association
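The definition of R can be checked numerically: fit a regression, then correlate Y with the fitted values. Synthetic data below; the invented coefficients have no substantive meaning.

```python
# R is the Pearson correlation between Y and the fitted values Yhat,
# and R^2 equals the proportion of variance explained.
import numpy as np

rng = np.random.default_rng(4)
n = 120
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.5, size=n)

A = np.column_stack([np.ones(n), X])      # design matrix with intercept
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
yhat = A @ coef

R = np.corrcoef(y, yhat)[0, 1]            # multiple correlation
R2 = 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(round(R, 3), round(R2, 3))          # R^2 is the square of R
```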
- Adjusted R^{2}: adjusting for the number of variables in the equation
  - According to the logic of multiple regression, each new variable adds "something" to the explanation.
  - After a point, the addition of each new variable adds less and less to the explanation, until the addition of a new variable is not "worth" its contribution.
  - The formula for the "adjusted R^{2}" allows for the number of variables involved in the equation.
  - In general, one should not add variables beyond the point at which the adjusted R^{2} begins to decrease.
    Adjusted R^{2} = 1 - (1 - R^{2})(n - 1)/(n - p - 1)
    where: p = number of independent variables in the equation, n = number of cases
Assumptions of multiple regression
- Normality and equality of variance for the distributions of Y for each value of the X variables (homoscedasticity, discussed earlier)
- Independence of observations on Y (i.e., not repeated measures on the same unit, as in the paired-samples design for the t-test)
- Linearity of the relationships between Y and the X variables
