Example from SPSS
Users' Guide, Chapter 12, page 198
First, a note about notation introduced on page
195
- The SPSS Users' Guide
describes regression analysis using a notation that differs
from the usual one.
- Most texts use Roman
letters in the linear regression equation:
- $Y_i = a + bX_i + e_i$
- The $e_i$ term expresses the positive and negative
residuals for values of $Y_i$ that deviate
above and below the regression line
- Note that the sum
of all positive and negative residuals equals 0 for a
least-squares line with an intercept, that is,
$\sum_i e_i = 0$
- However, the Users'
Guide [regrettably] employs a different
notation, which appears on page 195:
- $Y_i = \beta_0 + \beta_1 X_i + e_i$
- The Users'
Guide uses Greek letters because it views the
regression coefficients as estimates of the
population parameters.
- This practice
leads to confusion:
- It differs from
the practice of most (but not all) beginning
texts that discuss regression
- It gives no
good term for what SPSS later refers to as
standardized regression coefficients,
which it calls betas
- Unfortunately,
those betas are not the
$\beta$s above.
- You will have to live
with this practice in the Users' Guide
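To see the residual property in action, here is a minimal Python sketch (Python is our choice; the page itself works in SPSS). It fits $Y_i = a + bX_i + e_i$ by ordinary least squares using the usual closed-form formulas and checks that the residuals sum to zero. The data values are made up for illustration and are not the Users' Guide data; the function names `ols_fit` and `residuals` are hypothetical.

```python
# Sketch: ordinary least squares for Y_i = a + b*X_i + e_i.
# The data are hypothetical, chosen only to illustrate the property.

def ols_fit(x, y):
    """Return (a, b): the intercept and slope that minimize sum(e_i**2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b


def residuals(x, y, a, b):
    """e_i = Y_i - (a + b*X_i): positive above the line, negative below."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [10.0, 35.0, 50.0, 72.0, 90.0]   # hypothetical X values
y = [52.0, 58.0, 63.0, 68.0, 76.0]   # hypothetical Y values

a, b = ols_fit(x, y)
e = residuals(x, y, a, b)
print("a =", round(a, 3), " b =", round(b, 3))
print("residuals:", [round(ei, 3) for ei in e])
print("sum of residuals:", abs(round(sum(e), 9)))  # 0.0, up to floating-point error
```

The zero sum is not a coincidence of these numbers: because $a = \bar{Y} - b\bar{X}$, the line passes through the point of means, so the positive and negative residuals always cancel.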
Consider the equation on page 198. The book reports
this table from SPSS 10:
The text then says:
- The estimates of the
model coefficients $\beta_0$
(intercept) and $\beta_1$ (slope)
are, respectively, 47.17 and 0.307. So the estimated
model is:
- female life
expectancy = 47.17 + 0.307 × female
literacy
- But note the disjunction
between the text and the table:
- In the table, 47.170
(intercept or constant) lies under the B
heading, which itself lies under "Unstandardized
Coefficients"
- Under that is the
value .307, which is also under the B heading,
although it is the $\beta_1$
coefficient.
- In the next column,
headed "Standardized Coefficients," is the Beta
value of .819 -- which is not mentioned in the
model.
- The point is that the
SPSS output is badly labeled, and you need to understand
these points:
- Regardless of what
SPSS says in its headings:
- Interpret the
constant as the intercept, which you
learned as a.
- Interpret the
Unstandardized Coefficients under the B heading as
the b coefficients.
- Interpret the
Standardized Coefficients, Beta, as the b
coefficients for the same data after all the
independent and dependent variables have
been standardized, that is, transformed into
z-scores.
Consider this graph of the model fitted to the
data
- Here is the graph of the
data that the Users' Guide really should have
presented at this point.
- This model, which
explains female life expectancy as a function of women's
literacy, has this substantive interpretation:
- Intercept =
47.17
- Given a society in
which 0% of the women could read, the expected life
expectancy would be 47.17
- Slope = .31
- Starting from an
expected life span of 47.17 years in societies in
which 0% of the women could read, each 1-percentage-point
increase in female literacy tends to increase
life expectancy by .31 years.
- R-Square = .67
- 67% of the
variation in female life expectancy in the world's
nations can be explained by female
literacy.
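The interpretation above comes down to simple arithmetic. This Python sketch plugs the reported coefficients (47.17 and 0.307) into the fitted line and uses the fact that, with a single predictor, Beta equals r, so R-Square equals Beta squared: .819² ≈ .67. The function name is hypothetical.

```python
# Coefficients as reported on p. 198 of the Users' Guide (SPSS 10 output).
INTERCEPT = 47.17   # B for the constant: the intercept a
SLOPE = 0.307       # B for female literacy: the slope b
BETA = 0.819        # standardized coefficient (Beta)


def predicted_life_expectancy(female_literacy_pct):
    """Expected female life expectancy, in years, from the fitted line."""
    return INTERCEPT + SLOPE * female_literacy_pct


for literacy in (0, 50, 100):
    print(literacy, round(predicted_life_expectancy(literacy), 2))
# 0 47.17
# 50 62.52
# 100 77.87

# With a single predictor, Beta equals Pearson's r, so R-Square = Beta ** 2.
print(round(BETA ** 2, 2))  # 0.67
```

The gap between 0% and 100% literacy, about 30.7 years, is just the slope times 100 percentage points.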
An alternative explanation
of female life span -- wealth of a society, as measured
by GDP per capita
First, let's consider the
means and standard deviations of the variables that we want
to correlate:
- Here's the basic
regression output:
- Here's the associated
scatterplot:
- What's wrong with this
plot?
- The relationship is
not linear--but you could not tell this from the table
above.
- GDP per capita is a
highly skewed variable that needs to be transformed to
produce a more nearly normal distribution.
- Note that the slope
is reported as 0.00: it rounds to zero because $1 in GDP per
capita does not buy much life expectancy.
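The tiny slope is partly a matter of units. This Python sketch uses made-up GDP figures (not the data behind the output above) and fits the same line twice, with GDP measured in dollars and in thousands of dollars. The slope per dollar prints as 0.00, while the slope per $1,000 is 1,000 times larger. Rescaling changes neither the fit nor the curvature, so it does not cure the nonlinearity; that calls for a nonlinear transformation such as the logarithm.

```python
import statistics


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical figures: GDP per capita in dollars, female life span in years.
gdp_dollars = [500, 1200, 3000, 9000, 25000]
life_span = [50, 58, 66, 73, 79]

b_per_dollar = slope(gdp_dollars, life_span)
b_per_thousand = slope([g / 1000 for g in gdp_dollars], life_span)

print(f"{b_per_dollar:.2f}")    # 0.00 (about 0.00096 years per $1)
print(f"{b_per_thousand:.2f}")  # 0.96 (the same slope, per $1,000)
```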
Another try with a
logarithmic transformation -- wealth of a society, as
measured by the logarithm of GDP per capita
- Note the improvement in
the fit:
- The percent of
explained variation rises from 41% with
untransformed GDP per capita to 69% with the log of
GDP per capita.
- The slope now has a
value of 14.17, not 0.00 as in the previous
model.
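- The slope can be
interpreted as follows: with a base-10 logarithm, a one-unit
increase in log GDP per capita is a 10-fold increase in GDP per
capita, so for each 10-fold increase in GDP per capita, female
life span increases by 14.17 years.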
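The 10-fold reading can be made concrete. Assuming the transformation is the base-10 logarithm, the predicted change in life span depends only on the ratio of two GDP figures: $\Delta \hat{Y} = 14.17 \cdot \log_{10}(\text{ratio})$. Because the intercept of the log model is not given here, this Python sketch works only with differences; the function name is hypothetical.

```python
import math

LOG_SLOPE = 14.17  # reported slope on log GDP per capita (assumed base 10)


def change_in_life_span(gdp_ratio):
    """Predicted change in female life span (years) when GDP per capita
    is multiplied by gdp_ratio. On a log scale only the ratio matters."""
    return LOG_SLOPE * math.log10(gdp_ratio)


print(round(change_in_life_span(10), 2))   # 14.17: one 10-fold increase
print(round(change_in_life_span(100), 2))  # 28.34: two 10-fold increases
print(round(change_in_life_span(2), 2))    # 4.27: a doubling
```

The same extra $1,000 therefore counts for much more in a poor country than in a rich one, which is the diminishing-returns pattern the untransformed scatterplot could not capture with a straight line.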