Example from SPSS Users’ Guide, page 198

Example from SPSS Users' Guide, Chapter 12, page 198

First, a note about notation introduced on page 195

The SPSS Users' Guide describes regression analysis using a notation that differs from the usual one.

Most texts use Roman letters in the linear regression equation:

Y_i = a + bX_i + e_i

The e_i term expresses the positive and negative residuals for values of Y_i that deviate above and below the regression line.

Note that the sum of all positive and negative residuals equals 0 -- that is, Σe_i = 0
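If you want to see the zero-sum property for yourself, here is a minimal sketch in Python. The data are invented purely for illustration (they are not from the Users' Guide), and the helper name fit_line is hypothetical. It fits Y = a + bX by least squares and adds up the residuals.

```python
# Toy data, invented for illustration only (not from the Users' Guide).
x = [10, 25, 40, 55, 70, 85]
y = [50, 56, 58, 66, 69, 73]

def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

a, b = fit_line(x, y)

# e_i = observed Y_i minus the Y predicted by the line
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"sum of residuals = {residual_sum:.10f}")
```

Some residuals are positive and some are negative, but their sum is zero apart from floating-point rounding.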

However, the Users' Guide [regrettably] employs a different notation, which appears on page 195:

Y_i = β₀ + β₁X_i + ε_i

The Users' Guide uses Greek letters because it views the regression coefficients as estimates of the population parameters.

This practice leads to confusion:

It differs from the practice of most (but not all) beginning texts that discuss regression

It gives no good term for what SPSS later refers to as standardized regression coefficients, which it calls betas

Unfortunately, those betas are not the βs above.

You will have to live with this practice in the Users' Guide

Consider the equation on page 198 -- The book reports this table from SPSS 10:

The text then says:

The estimates of the model coefficients β₀ (intercept) and β₁ (slope) are, respectively, 47.17 and 0.307. So the estimated model is:

female life expectancy = 47.17 + 0.307 x female literacy

But note the disjunction between the text and the table:

In the table, 47.170 (intercept or constant) lies under the B heading--which itself lies under "Unstandardized Coefficients"

Under that is the value .307, which is also under the B heading, although it is the β₁ coefficient.

In the next column, headed "Standardized Coefficients," is the Beta value of .819 -- which is not mentioned in the model.

The point is that the SPSS output is badly labeled, and you need to understand these points:

Regardless of what SPSS says in its heading,

interpret the constant as the intercept, which you learned as a.

interpret the Unstandardized Coefficients under the B heading as the b coefficients

interpret the Standardized Coefficients, Beta, as the b coefficients for the same data after all the independent and dependent variables have been standardized -- transformed into z-scores.
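The last point can be checked with a short Python sketch. The data are invented for illustration (not the Users' Guide data), and the helper names slope and z_scores are hypothetical. It computes b on the raw data, then re-runs the regression on z-scores, and compares that result with b multiplied by the ratio of the standard deviations.

```python
from statistics import mean, stdev

# Toy data, invented for illustration only (not from the Users' Guide).
x = [10, 25, 40, 55, 70, 85]
y = [50, 56, 58, 66, 69, 73]

def slope(x, y):
    """Least-squares slope b for Y = a + bX."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def z_scores(values):
    """Standardize: subtract the mean, divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

b = slope(x, y)                                # unstandardized "B"
beta_from_z = slope(z_scores(x), z_scores(y))  # slope after standardizing
beta_from_b = b * stdev(x) / stdev(y)          # same value, from b directly

print(f"b (unstandardized)       = {b:.4f}")
print(f"Beta (slope on z-scores) = {beta_from_z:.4f}")
print(f"b * s_x / s_y            = {beta_from_b:.4f}")
```

Both routes give the same Beta. With only one independent variable, that Beta is also Pearson's r.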

Consider this graph of the model fitted to the data

Here is the graph of the data that the Users' Guide really should have presented at this point.

This model, which explains female life expectancy as a function of women's literacy, has this substantive interpretation:

Intercept = 47.17

Given a society in which 0% of the women could read, the predicted female life expectancy would be 47.17 years.

Slope = .31

Starting from an expected life span of 47.17 years in societies in which 0% of the women could read, each 1 percentage point increase in female literacy tends to increase life expectancy by .31 years.

R-Square = .67

67% of the variation in female life expectancy in the world's nations can be explained by female literacy.
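The three interpretations above can be reproduced from the reported coefficients with a few lines of Python. This is only a sketch: the function name predicted_life_expectancy is hypothetical, and the numbers are the ones reported in the table on page 198. The last step relies on the fact that, with a single independent variable, the standardized Beta equals Pearson's r, so R-Square is Beta squared.

```python
# Coefficients as reported in the SPSS table on page 198.
INTERCEPT = 47.17   # a: predicted life expectancy at 0% female literacy
SLOPE = 0.307       # b: years gained per percentage point of literacy
BETA = 0.819        # standardized coefficient

def predicted_life_expectancy(literacy_pct):
    """Predicted female life expectancy (years) for a literacy rate in percent."""
    return INTERCEPT + SLOPE * literacy_pct

for pct in (0, 50, 100):
    print(f"{pct:3d}% literate -> {predicted_life_expectancy(pct):.2f} years")

# One predictor only: Beta equals Pearson's r, so R-Square = Beta squared.
r_square = BETA ** 2
print(f"R-Square = {r_square:.2f}")
```

The model predicts about 62.5 years at 50% literacy and about 77.9 years at 100%, and .819 squared rounds to the reported R-Square of .67.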

An alternative explanation of female life span -- wealth of a society, as measured by GDP per capita

First, let's consider the means and standard deviations of the variables that we want to correlate:

Here's the basic regression output:

Here's the associated scatterplot:

What's wrong with this plot?

The relationship is not linear--but you could not tell this from the table above.

GDP per capita is a highly skewed variable that needs to be transformed to produce a more nearly normal distribution.

Note that the slope is reported as 0.00 -- it is not actually zero, but it rounds to 0.00 because $1 in GDP per capita does not buy much life expectancy.
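The point about the tiny slope is really a point about units, which the following Python sketch illustrates. The figures are invented (they are not the Users' Guide data) and the helper name slope is hypothetical. The same regression is run twice, once with GDP per capita in dollars and once in thousands of dollars.

```python
# Invented figures for illustration only (not the Users' Guide data).
gdp_dollars = [500, 1500, 4000, 9000, 16000, 25000]
life_span = [52, 60, 66, 71, 76, 79]

def slope(x, y):
    """Least-squares slope b for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Years of life span per additional $1 of GDP per capita
b_per_dollar = slope(gdp_dollars, life_span)

# Same data, with X re-expressed in thousands of dollars
gdp_thousands = [g / 1000 for g in gdp_dollars]
b_per_thousand = slope(gdp_thousands, life_span)

print(f"slope per $1:     {b_per_dollar:.2f}")
print(f"slope per $1,000: {b_per_thousand:.2f}")
```

The relationship is identical in both runs. Only the unit of X changes, so the slope is multiplied by 1,000 and no longer disappears when printed to two decimal places.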

Another try with a logarithmic transformation -- wealth of a society, as measured by the logarithm of GDP per capita

Note the improvement in the fit:

The percent of explained variation rises from 41% in the untransformed GDP per capita to 69% using the log of GDP per capita.

The slope now has a value of 14.17, not 0.00 as in the previous model.

The slope can be interpreted as follows: for each 10-fold increase in GDP per capita, female life span increases by 14.17 years.
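Here is a sketch in Python of the points above. The first part uses only the reported slope of 14.17 and assumes the logarithm is base 10, as the 10-fold reading implies. The helper name life_span_change is hypothetical. The second part uses invented figures (not the Users' Guide data) simply to show the general pattern of R-Square rising when a skewed predictor is logged. The helper r_square is also hypothetical.

```python
import math

# Part 1: reading the reported slope on log10(GDP per capita).
SLOPE_LOG10 = 14.17

def life_span_change(gdp_ratio):
    """Predicted change in female life span (years) when GDP per capita
    is multiplied by gdp_ratio, assuming a base-10 logarithm."""
    return SLOPE_LOG10 * math.log10(gdp_ratio)

for ratio in (2, 10, 100):
    print(f"GDP per capita x {ratio:3d} -> {life_span_change(ratio):+.2f} years")

# Part 2: invented figures for illustration only (not the Users' Guide data).
gdp = [500, 1500, 4000, 9000, 16000, 25000]
life_span = [52, 60, 66, 71, 76, 79]

def r_square(x, y):
    """Proportion of variation in y explained by a straight line in x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

r2_raw = r_square(gdp, life_span)
r2_log = r_square([math.log10(g) for g in gdp], life_span)

print(f"R-Square, raw GDP:   {r2_raw:.2f}")
print(f"R-Square, log10 GDP: {r2_log:.2f}")
```

A doubling of GDP per capita corresponds to about 4.27 years, a 10-fold increase to 14.17 years, and a 100-fold increase to twice that. In the invented data, the straight line fits the logged predictor better than the raw one, which is the same pattern the real data show.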