Recoding a Nominal Variable for Regression

Recoding a Nominal Variable as Ordinal

[Fall 2001 class, please note: this student was using SPSS in syntax mode. Although you
would use the menu commands to recode the variable, the logic is the same.]

Q. I finished reading the chapter on crosstabs tonight, and I realized that in my research, my dependent variable, "Do you support/oppose death penalty?", is a dichotomy. Since it is neither interval nor ordinal, can I really not use multiple regression? Could I just convert it into a dummy variable and use it that way?

A. This week in lecture, I said that using a dichotomous variable as the dependent variable in regression raises certain problems. Although the regression coefficients are unbiased estimates of the population parameters, they are also inefficient and unstable: because the dependent variable takes only two values, the variance of the errors changes with the predicted value, so the usual standard errors are wrong. This means that one cannot test the coefficients adequately for significance. You could employ other techniques that we haven't covered (logistic regression, for example) to do this analysis, but those are beyond us for now.
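To see why the standard errors are the weak point, here is a small sketch in Python (used only for illustration; it is not SPSS syntax). If a 0/1 dependent variable is fitted with a straight line, each case's error term has variance p(1 - p), where p is that case's predicted probability, so the variance differs from case to case instead of staying constant as ordinary regression assumes. The function name and the probabilities below are hypothetical.

```python
def error_variance(p: float) -> float:
    """Variance of a 0/1 outcome whose probability of being 1 is p.

    When a straight line is fitted to a 0/1 dependent variable, p is the
    predicted value for a case, so this is also that case's error variance.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p * (1.0 - p)


# Hypothetical predicted probabilities for three different respondents.
for p in (0.1, 0.5, 0.9):
    print(f"p = {p:.1f}   error variance = {error_variance(p):.3f}")
```

The variance is largest at p = 0.5 and shrinks toward 0 as p approaches 0 or 1, which is exactly the unequal error variance that undermines the usual significance tests.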

Q. My problem is that I really feel like I have a great grasp on regression analysis, and I would be more interested in doing that. We haven't even gone over crosstabs in class, and it doesn't seem as interesting to me when we have multiple independent variables.

A. I'm glad that you feel this way, because understanding regression analysis is very important. Crosstabs is a procedure to use mainly when the regression alternative is unavailable.

Q. What would your suggestions be? Should I find a different dependent variable, one that is continuous? Can I stick with the death penalty and do regression? Or must I do crosstabs?

A. The first thing to do is to run Frequencies to examine the variable codes and see whether you can do anything with the variable. Here is the table from my run of vote96:

V961197   96PO: Does R favor/oppose the death penalty?
                                                         Valid      Cum
Value Label                 Value  Frequency  Percent  Percent  Percent

1. Favor                        1       1168     68.1     78.7     78.7
5. Oppose                       5        316     18.4     21.3    100.0
0. Inap, no post IW             0        180     10.5  Missing
8. DK                           8         40      2.3  Missing
9. NA                           9         10       .6  Missing
                                     -------  -------  -------
                            Total       1714    100.0    100.0
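The Percent and Valid Percent columns differ only in their denominators: Percent divides by all 1,714 cases, while Valid Percent and Cum Percent leave out the cases coded as missing (0, 8, 9). A minimal Python sketch of that arithmetic, using the counts copied from the table above (the function name `frequency_table` is hypothetical):

```python
def frequency_table(counts, missing):
    """Return (code, n, percent, valid percent, cumulative percent) per code.

    Percent uses every case; valid and cumulative percent skip the codes
    listed in `missing`, as SPSS Frequencies does. Codes are reported in the
    order they appear in `counts`, so list the valid codes first.
    """
    total = sum(counts.values())
    valid_total = sum(n for code, n in counts.items() if code not in missing)
    rows = []
    cum = 0.0
    for code, n in counts.items():
        pct = 100.0 * n / total
        if code in missing:
            rows.append((code, n, pct, None, None))
        else:
            valid = 100.0 * n / valid_total
            cum += valid
            rows.append((code, n, pct, valid, cum))
    return rows


# Counts for V961197 copied from the Frequencies output above.
v961197 = {1: 1168, 5: 316, 0: 180, 8: 40, 9: 10}
for code, n, pct, valid, cum in frequency_table(v961197, missing={0, 8, 9}):
    valid_s = "Missing" if valid is None else f"{valid:.1f}"
    cum_s = "" if cum is None else f"{cum:.1f}"
    print(f"{code:>2} {n:>6} {pct:>6.1f} {valid_s:>8} {cum_s:>6}")
```

With only 1,484 valid cases, the 1,168 "Favor" answers are 68.1% of all cases but 78.7% of the valid ones.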

You will want to create a new variable, let's call it KILLEM, from V961197. Consider the logic of these commands, applied to the file vote96:

* Replace all previous missing-value codes (0, 8, 9) with 99, a code that does not occur.
* Codes 0, 8, and 9 are then treated as valid, so COMPUTE copies them into KILLEM.
missing values v961197 (99).

* Duplicate V961197 as the new variable KILLEM.
compute killem = v961197.

* Recode KILLEM into a three-category ordinal scale: 5 (Oppose) becomes -1 and 8 (DK) becomes 0.
* Codes 1, 0, and 9 are left unchanged, so 0 (Inap) joins DK in the middle category.
recode killem (5=-1) (8=0).

* Establish only the No Answer code (9) as missing data.
missing values killem (9).

* Label the new variable KILLEM (not really necessary; cosmetic).
variable labels killem 'Favor execution (V961197) made ordinal'.

* Label the values of KILLEM (not really necessary).
value labels killem
    -1 'Oppose execution' 0 'Undecided' 1 'Favor execution'.

* Check the coding of KILLEM against V961197.
* The MISSING=INCLUDE subcommand forces CROSSTABS to show all data codes.
crosstabs /tables = v961197 by killem /missing=include /cells=count.

V961197  96PO: Does R favor/oppose the death pen  by  KILLEM
                       KILLEM                         
            Count  |
                   |Oppose e Undecide Favor ex
                   |xecution d        ecution             Row
                   |   -1.00|     .00|    1.00|    9.00| Total
V961197    --------+--------+--------+--------+--------+
                0  |        |   180  |        |        |   180
  0. Inap, no post |        |        |        |        |  10.5
                   +--------+--------+--------+--------+
                1  |        |        |  1168  |        |  1168
  1. Favor         |        |        |        |        |  68.1
                   +--------+--------+--------+--------+
                5  |   316  |        |        |        |   316
  5. Oppose        |        |        |        |        |  18.4
                   +--------+--------+--------+--------+
                8  |        |    40  |        |        |    40
  8. DK            |        |        |        |        |   2.3
                   +--------+--------+--------+--------+
                9  |        |        |        |    10  |    10
  9. NA            |        |        |        |        |    .6
                   +--------+--------+--------+--------+
            Column     316      220     1168       10     1714
             Total    18.4     12.8     68.1       .6    100.0
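For readers who want to see the same logic outside SPSS, here is a Python sketch of what COMPUTE, RECODE, and the CROSSTABS check do. It is an illustration under assumptions: it rebuilds one value per respondent from the V961197 counts shown above instead of reading the vote96 file, and its names (`RECODE`, `recode_killem`, and so on) are hypothetical.

```python
from collections import Counter

# RECODE rules from the SPSS syntax: 5 -> -1 and 8 -> 0; all other codes are kept.
RECODE = {5: -1, 8: 0}
# After recoding, only the No Answer code (9) is treated as missing.
KILLEM_MISSING = {9}


def recode_killem(code):
    """Mimic COMPUTE killem = v961197 followed by the RECODE rules."""
    return RECODE.get(code, code)


# Rebuild one V961197 value per respondent from the Frequencies counts.
counts = {1: 1168, 5: 316, 0: 180, 8: 40, 9: 10}
respondents = [code for code, n in counts.items() for _ in range(n)]

# Old code by new code, like CROSSTABS /MISSING=INCLUDE: every original code,
# missing or not, should land in exactly one KILLEM category.
crosstab = Counter((old, recode_killem(old)) for old in respondents)
for (old, new), n in sorted(crosstab.items()):
    print(f"V961197 = {old}  ->  KILLEM = {new:2d}   count = {n}")

# Distribution of KILLEM; valid percentages leave out the missing code 9.
killem = Counter(recode_killem(code) for code in respondents)
valid_n = sum(n for code, n in killem.items() if code not in KILLEM_MISSING)
for code in sorted(killem):
    if code in KILLEM_MISSING:
        share = "Missing"
    else:
        share = f"{100 * killem[code] / valid_n:.1f}"
    print(f"KILLEM = {code:2d}   n = {killem[code]:5d}   valid % = {share}")
```

The printed counts match the crosstab: 316 at -1, 220 at 0 (180 Inap plus 40 DK), 1168 at 1, and 10 at 9.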


Now you have a new variable, KILLEM, with this distribution:

KILLEM    Favor execution (V961197) made ordinal
                                                           Valid     Cum
Value Label                 Value  Frequency  Percent  Percent  Percent
   
Oppose execution            -1.00       316     18.4     18.5     18.5
Undecided                     .00       220     12.8     12.9     31.5
Favor execution              1.00      1168     68.1     68.5    100.0
                             9.00        10       .6   Missing
                                     -------  -------  -------
                            Total      1714    100.0    100.0

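Although this is not an ideal measure of attitudes toward the death penalty (its middle category, for example, lumps the 40 "Don't know" answers together with the 180 respondents who had no post-election interview), it will do for our class. It will enable you to use multiple regression.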
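Once KILLEM is coded -1, 0, 1, it can serve as the dependent variable in ordinary least-squares regression, on the assumption that the three categories are equally spaced. A minimal Python sketch with a single predictor, using made-up values rather than vote96 data; the predictor `ideology` and the function `ols_simple` are hypothetical, and with several predictors you would use SPSS's REGRESSION procedure instead.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical cases: x is an illustrative 1-7 ideology score, y is KILLEM (-1, 0, 1).
ideology = [1, 2, 3, 4, 5, 6, 7, 7]
killem = [-1, -1, 0, 1, 0, 1, 1, 1]
a, b = ols_simple(ideology, killem)
print(f"predicted KILLEM = {a:.3f} + {b:.3f} * ideology")
```

The slope is read as the expected change in KILLEM, on its -1 to 1 scale, for a one-point change in the predictor.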