Example from SPSS Users' Guide, Chapter 12, page 198


First, a note about notation introduced on page 195

  • The SPSS Users' Guide describes regression analysis using a notation that differs from the usual one.
  • Most texts use Roman letters in the linear regression equation:
    • Yi = a + bXi + ei
      • The ei term expresses the positive and negative residuals for values of Yi that deviate above and below the regression line
      • Note that the sum of all positive and negative residuals equals 0 -- that is, Σei = 0
  • However, the Users' Guide [regrettably] employs a different notation, which appears on page 195:
    • Yi = ß0 + ß1Xi + εi
      • The Users' Guide uses Greek letters because it views the regression coefficients as estimates of the population parameters.
        • This practice leads to confusion:
          • It differs from the practice of most (but not all) beginning texts that discuss regression
          • It gives no good term for what SPSS later refers to as standardized regression coefficients, which it calls betas
          • Unfortunately, those betas are not the ßs above.
  • You will have to live with this practice in the Users' Guide
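
The claim that the residuals sum to zero can be checked numerically. The sketch below fits Yi = a + bXi + ei by least squares to a handful of made-up points (the data and the helper name fit_line are assumptions for illustration, not taken from the Users' Guide) and adds up the residuals.

```python
# Illustrative data only (hypothetical values, not from the Users' Guide).
x = [10.0, 25.0, 40.0, 60.0, 85.0]
y = [50.0, 55.0, 58.0, 67.0, 72.0]


def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b


a, b = fit_line(x, y)

# Residual ei = observed Yi minus the value predicted by the line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)

# Positive and negative residuals cancel, apart from floating-point noise.
print(abs(residual_sum) < 1e-9)
```

The cancellation holds for any least-squares line that includes an intercept, whichever notation is used for the coefficients.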


Consider the equation on page 198 -- The book reports this table from SPSS 10:

The text then says:

  • The estimates of the model coefficients ß0 (intercept) and ß1 (slope) are, respectively, 47.17 and 0.307. So the estimated model is:
    female life expectancy = 47.17 + 0.307 × female literacy
     
  • But note the disjunction between the text and the table:
    • In the table, 47.170 (intercept or constant) lies under the B heading--which itself lies under "Unstandardized Coefficients"
    • Under that is the value .307, which is also under the B heading, although it is the ß1 coefficient.
    • In the next column, headed "Standardized Coefficients," is the Beta value of .819 -- which is not mentioned in the model.
  • The point is that the SPSS output is badly labeled, and you need to understand these points:
    • Regardless of what SPSS says in its heading,
      • interpret the constant as the intercept, which you learned as a.
      • interpret the Unstandardized Coefficients under the B heading as the b coefficients
    • Interpret the Standardized Coefficients, Beta, as the b coefficients for the same data--
      • after all the independent and dependent variables have been standardized -- transformed into z-scores.
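
To make the link between B and Beta concrete, here is a minimal sketch using made-up data (the values and the helper names slope and z_scores are assumptions, not the Users' Guide data). It computes the slope on the raw variables, then again after converting both variables to z-scores, and shows that the second slope is the first one rescaled by the ratio of the standard deviations.

```python
from statistics import mean, stdev

# Hypothetical data for illustration only (not the Users' Guide data).
x = [10.0, 25.0, 40.0, 60.0, 85.0]
y = [50.0, 55.0, 58.0, 67.0, 72.0]


def slope(x, y):
    """Least-squares slope b for Y = a + bX."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


b = slope(x, y)                                # unstandardized coefficient (SPSS "B")
beta_from_z = slope(z_scores(x), z_scores(y))  # slope after standardizing both variables
beta_from_b = b * stdev(x) / stdev(y)          # the same value, obtained by rescaling b

print(round(b, 4), round(beta_from_z, 4), round(beta_from_b, 4))

# With a single predictor, Beta equals Pearson's r, so Beta squared is R-Square.
print(round(0.819 ** 2, 3))  # 0.671
```

The last line uses the Beta of .819 from the SPSS table: with one independent variable, its square is the R-Square of .67 reported for this model.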


Consider this graph of the model fitted to the data

  • Here is the graph of the data that the Users' Guide really should have presented at this point.
  • This model, which explains female life expectancy as a function of women's literacy, has this substantive interpretation:
    Intercept = 47.17
    • Given a society in which 0% of the women could read, the predicted female life expectancy would be 47.17 years
    Slope = .31
    • Starting from an expected life span of 47.17 years in societies in which 0% of the women could read, each 1 percentage point increase in female literacy tends to increase life expectancy by .31 years.
    R-Square = .67
    • 67% of the variation in female life expectancy in the world's nations can be explained by female literacy.
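
The interpretation above amounts to plugging literacy rates into the estimated equation. A small sketch, using the intercept and slope reported in the text (the function name is an assumption for illustration):

```python
INTERCEPT = 47.17  # predicted female life expectancy at 0% female literacy
SLOPE = 0.307      # years gained per 1 percentage point of female literacy


def predicted_female_life_expectancy(female_literacy_pct):
    """Apply the estimated model: 47.17 + 0.307 x female literacy (in percent)."""
    return INTERCEPT + SLOPE * female_literacy_pct


for pct in (0, 50, 100):
    print(pct, round(predicted_female_life_expectancy(pct), 2))
# 0 47.17
# 50 62.52
# 100 77.87
```

These are predictions from the fitted line, not observed values; individual nations fall above and below them by the size of their residuals.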


An alternative explanation of female life span -- wealth of a society, as measured by GDP per capita

First, let's consider the means and standard deviations of the variables that we want to correlate:

  • Here's the basic regression output:
  • Here's the associated scatterplot:
  • What's wrong with this plot?
    • The relationship is not linear--but you could not tell this from the table above.
    • GDP per capita is a highly skewed variable that needs to be transformed to produce a more nearly normal distribution.
    • Note that the slope is reported as 0.00 -- it rounds to zero because $1 in GDP per capita does not buy much life expectancy.
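
A slope that prints as 0.00 reflects the unit of measurement, not the absence of a relationship. The sketch below uses made-up figures (the data and the helper name slope_and_r2 are assumptions, not the values behind the SPSS output) to show that measuring GDP in thousands of dollars multiplies the slope by 1,000 while leaving R-Square unchanged.

```python
# Hypothetical values for illustration only (not the data behind the SPSS output).
gdp_dollars = [500.0, 1500.0, 4000.0, 12000.0, 30000.0]
life_span = [52.0, 60.0, 68.0, 74.0, 79.0]


def slope_and_r2(x, y):
    """Return the least-squares slope and R-Square for Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Slope per $1 of GDP per capita: too small to show at two decimals.
b_dollars, r2_dollars = slope_and_r2(gdp_dollars, life_span)

# Same data with GDP expressed in thousands of dollars.
gdp_thousands = [g / 1000 for g in gdp_dollars]
b_thousands, r2_thousands = slope_and_r2(gdp_thousands, life_span)

print(f"slope per $1:     {b_dollars:.2f}")
print(f"slope per $1,000: {b_thousands:.2f}")
print(f"R-Square, both:   {r2_dollars:.2f} {r2_thousands:.2f}")
```

Rescaling makes the coefficient readable, but it does nothing about the nonlinearity of the relationship.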


Another try with a logarithmic transformation -- wealth of a society, as measured by the logarithm of GDP per capita

  • Note the improvement in the fit:
    • The percent of explained variation rises from 41% using the untransformed GDP per capita to 69% using the log of GDP per capita.
    • The slope now has a value of 14.17, not 0.00 as in the previous model.
    • The slope can be interpreted as follows: for each 10-fold increase in GDP per capita (a one-unit increase in its base-10 logarithm), female life span increases by 14.17 years.
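
The same slope also covers changes in GDP other than a 10-fold one. The sketch below assumes the logarithm is base 10, as the 10-fold interpretation implies, and uses a hypothetical function name. Because the intercept of this model is not given here, it computes only differences in predicted life span.

```python
import math

LOG_SLOPE = 14.17  # years of female life span per one-unit increase in log10(GDP per capita)


def life_span_gain(gdp_ratio):
    """Predicted change in female life span when GDP per capita is multiplied by gdp_ratio."""
    return LOG_SLOPE * math.log10(gdp_ratio)


for ratio in (2, 10, 100):
    print(ratio, round(life_span_gain(ratio), 2))
# 2 4.27
# 10 14.17
# 100 28.34
```

Under this model, doubling GDP per capita is associated with about 4.3 more years of female life span, whether the starting point is a poor country or a rich one.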