Dealing with a Highly Skewed Variable: CIVILDOR in the POLITY Study

When you undertake a statistical analysis, be sure that you understand the variables you are analyzing. For example, the POLITY data set contains a variable, CIVILDOR, labeled as a "civil disorder index." If you consult the pamphlet, "Nations of the World Database" (by David Garson) in Government Publications, you will find this description of CIVILDOR: "Total number of incidents of civil disorder. (Taken from pages 61-62 of Charles L. Taylor, World Handbook of Political and Social Indicators.) Period covered is 1948-1977."

Using SPSS to compute the minimum and maximum values, you find that the minimum value for CIVILDOR is 2 and the maximum is 8470. A range that wide suggests that the variable has a large standard deviation. You can learn how the values are distributed by running Frequencies (suppressing the table); asking for Mean, Standard Deviation, Skewness, and Kurtosis; and producing a histogram.

CIVIL DISORDER INDEX
  N Valid               109
  N Missing               2
  Mean               692.46
  Std. Deviation    1396.89
  Skewness            3.839
  Kurtosis           16.338
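To make the skewness and kurtosis figures less of a black box, here is a minimal Python sketch of the bias-corrected sample formulas commonly used by statistical packages. That SPSS reports exactly these formulas is an assumption, and the two small data sets are hypothetical, chosen only to contrast a symmetric distribution with a right-skewed one.

```python
import math

def sample_skewness(values):
    """Bias-corrected (adjusted Fisher-Pearson) sample skewness."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation, dividing by n - 1.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    m3 = sum((x - mean) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * m3 / s ** 3

def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((x - mean) ** 2 for x in values) / (n - 1)
    m4 = sum((x - mean) ** 4 for x in values)
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 / s2 ** 2
    second = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - second

# Hypothetical data, not taken from the POLITY file.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [2, 3, 5, 8, 12, 40, 900]  # one extreme value in the right tail

print(sample_skewness(symmetric))      # zero: no skew
print(sample_skewness(right_skewed))   # large and positive: long right tail
```

A single extreme case is enough to push the skewness well above zero, which is the pattern the CIVILDOR statistics show.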

If values for CIVILDOR are total counts of incidents of civil disorder, populous countries would be expected to have more disorder simply because more people produce more incidents of normal human aggression. That this is largely true is shown by the plot of CIVILDOR against POPULA70.

Seeing that incidents of civil disorder are influenced by size of population, you might compute a new variable based on incidents of civil disorder per 1,000,000 people according to this formula:

compute civilcap=civildor*1000000/popula70.
You can then run the Frequencies procedure to examine the new distribution:
Civil disorder per 1,000,000
  N Valid               109
  N Missing               2
  Mean              61.3624
  Std. Deviation   139.5905
  Skewness             5.35
  Kurtosis           32.901
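The arithmetic behind the compute command can be sketched in Python as follows. The function name `per_million` and the figures passed to it are hypothetical; returning `None` for a missing input is an assumption meant to mimic how a missing value propagates through a computed variable.

```python
def per_million(incidents, population):
    """Return incidents per 1,000,000 people, or None if an input is missing."""
    if incidents is None or population is None or population <= 0:
        return None
    return incidents * 1_000_000 / population

# Hypothetical figures: the same count of incidents in two populations.
small_country = per_million(50, 10_000_000)
large_country = per_million(50, 100_000_000)

print(small_country)  # 5.0
print(large_country)  # 0.5
print(per_million(50, None))  # None
```

The same raw count yields a rate ten times higher in the smaller population, which is why the per capita measure reorders the nations.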

Dividing by population removed the influence of population size from the measure of civil disorder, but the resulting distribution was even more skewed to the right (skewness rose from 3.839 to 5.35). Another technique for dealing with an extremely positively skewed distribution is to compute its logarithm: the power to which another number, the base (in this case 10), must be raised to equal the original number.

In substantive terms, this means that incidents of disorder in one nation must be ten times the incidents in another nation to separate the nations by a full unit of measurement. This transformation can be justified by an argument similar to that in economics about the diminishing utility of a dollar at high income levels. Similarly, computing the logarithm of the civil disorder rate implies that differences in disorder between two nations do not "register" as a full unit unless one rate is at least ten times the other. This approach to measurement is frequently used for many forms of political and social behavior. During the Korean and Vietnam wars, for example, public opposition to U.S. involvement was linked more closely to the logarithm of battlefield casualties than to a simple count of casualties. (In the table below, only the integer characteristic is listed and not the decimal mantissa.)
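The "ten times" property follows directly from the definition of the base-10 logarithm. As a short worked statement (the symbols x and y are introduced here only for illustration):

```latex
\log_{10}(x) = y \iff 10^{y} = x,
\qquad
\log_{10}(10x) - \log_{10}(x) = \log_{10}(10) = 1 .
```

So two nations whose rates differ by a factor of ten differ by exactly one unit on the logged scale, whatever the size of the rates themselves.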

compute civillog=lg10(civilcap).
log of civilcap
  N Valid                   109
  N Missing                   2
  Mean                   1.3375
  Std. Deviation         0.6156
  Skewness                0.092
  Std. Error of Skewness  0.231
  Kurtosis                0.126
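One way to read the skewness figure is against its standard error. The sketch below uses the usual textbook formula for the standard error of sample skewness; that SPSS applies this exact formula is an assumption, although with N = 109 it reproduces the 0.231 reported above.

```python
import math

def skewness_standard_error(n):
    """Standard error of the bias-corrected sample skewness for n cases."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

se = skewness_standard_error(109)   # 109 valid cases, as in the output
ratio = 0.092 / se                  # reported skewness divided by its standard error

print(round(se, 3))  # 0.231
print(ratio)         # well under 2
```

A common rule of thumb treats a skewness within about two standard errors of zero as consistent with a symmetric distribution, and the logged variable falls comfortably inside that band.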

The distribution of the logarithm of CIVILCAP is very close to normal. The following command will list the individual nations and the relevant variables to help evaluate our reworking of the CIVILDOR variable:

LIST VARIABLES = COUNTRY CIVILDOR CIVILCAP CIVILLOG.

COUNTRY                  CIVILDOR  CIVILCAP  CIVILLOG
AFGHANISTAN                    38      3.05     0.484
ALGERIA                      4679    340.39     2.532
ANGOLA                        541     91.14      1.96
ARGENTINA                    1137     47.88      1.68
AUSTRALIA                     113      9.03     0.956
AUSTRIA                       110     14.81     1.171
BANGLADESH                     41       0.6     -0.22
BELGIUM                       229     23.72     1.375
BOLIVIA                       468    108.21     2.034
BRAZIL                        364       3.8      0.58
BULGARIA                       22      2.59     0.414
BURMA                        1357     50.26     1.701
CAMEROON                      142     20.94     1.321
CANADA                        260     12.19     1.086
CENTR.AFRICAN REPUBLIC      99.99         .         .
CHAD                           52     14.27     1.155
CHILE                         297      31.7     1.501
CHINA                        2662      3.17     0.501
COLOMBIA                      833     39.17     1.593
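As a spot check on the listing, the following Python sketch recomputes the base-10 logarithm for a few of the rows above and compares it with the listed CIVILLOG. The helper names and the tolerance are assumptions: CIVILCAP is printed to only two decimals, so a recomputed logarithm can differ slightly from the value SPSS derived from the unrounded rate.

```python
import math

# (country, CIVILCAP, CIVILLOG) copied from the listing; None marks a missing value.
rows = [
    ("AFGHANISTAN", 3.05, 0.484),
    ("ALGERIA", 340.39, 2.532),
    ("BANGLADESH", 0.6, -0.22),
    ("CENTR.AFRICAN REPUBLIC", None, None),
    ("CHINA", 3.17, 0.501),
]

def log10_or_missing(value):
    """Base-10 log, or None when the input is missing or not positive."""
    if value is None or value <= 0:
        return None
    return math.log10(value)

def mismatched_countries(rows, tolerance=0.005):
    """Return countries whose recomputed log disagrees with the listed one."""
    mismatches = []
    for country, cap, listed in rows:
        computed = log10_or_missing(cap)
        if computed is None and listed is None:
            continue  # missing in, missing out
        if computed is None or listed is None:
            mismatches.append(country)
        elif abs(computed - listed) > tolerance:
            mismatches.append(country)
    return mismatches

print(mismatched_countries(rows))  # []
```

Note that a rate below 1 per million, as for Bangladesh, produces a negative logarithm, and a missing rate stays missing after the transformation.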