When you undertake a statistical analysis, be sure that
you understand the variables you are analyzing. For
example, the POLITY data set contains a variable,
CIVILDOR, labeled as a "civil disorder index." If you
consult the Nations of the World Database (by David
Garson) in Government Publications, you will find this
description of CIVILDOR: "Total number of incidents of
civil disorder. (Taken from pages 61-62 of Charles L.
Taylor, World Handbook of Political and Social
Indicators.) Period covered is 1948-1977."
The "getinfo" printout reveals that the minimum value
for CIVILDOR is 2, and the maximum is 8470. That range
suggests that the variable has a large standard
deviation. You can learn how the values are distributed
by running the FREQUENCIES procedure with these
subcommands:
FREQUENCIES VARIABLES=CIVILDOR
 /FORMAT=NOTABLE /HISTOGRAM
 /STATISTICS=STDDEV KURTOSIS SKEWNESS.
That produces this result for 109 of the 111
cases in the file:
CIVILDOR CIVIL DISORDER INDEX
77 0 |***************************************
22 1000 |***********
2 2000 |*
2 3000 |*
2 4000 |*
2 5000 |*
0 6000 |
0 7000 |
2 8000 |*
+----+----+----+----+----+----+----+----+----+----+
0 20 40 60 80 100
Histogram frequency
Std dev 1396.894 Kurtosis 16.338 Skewness 3.839
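The three statistics requested with /STATISTICS can be reproduced outside SPSS. The Python sketch below assumes the usual bias-adjusted sample formulas for the standard deviation, skewness, and excess kurtosis, under which a normal distribution scores zero on both shape measures. The function names are hypothetical. The data are only the 18 CIVILDOR values shown in the listing at the end of this section, so the printed figures will not match the full-file results above.

```python
import math


def std_dev(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def skewness(values):
    """Bias-adjusted skewness; 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    s = std_dev(values)
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


def kurtosis(values):
    """Bias-adjusted excess kurtosis; 0 for a normal distribution (needs n > 3)."""
    n = len(values)
    mean = sum(values) / n
    s = std_dev(values)
    total = sum(((x - mean) / s) ** 4 for x in values)
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - correction


# CIVILDOR values for the nations in the LIST output, omitting
# CENTR.AFRICAN REPUBLIC (assumed: its 99.99 is a missing-value code).
civildor = [38, 4679, 541, 1137, 113, 110, 41, 229, 468, 364,
            22, 1357, 142, 260, 52, 297, 2662, 833]

print(f"Std dev {std_dev(civildor):.3f}  "
      f"Kurtosis {kurtosis(civildor):.3f}  "
      f"Skewness {skewness(civildor):.3f}")
```

A large positive skewness, as here, signals a long right tail: a few nations with very high counts.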
If values for CIVILDOR are total counts of
incidents of civil disorder, populous countries should
tend to have more disorder. The plot of CIVILDOR against
POPULA70 (population size in 1970) shows that this is
generally, though not strongly, true:
[Scatterplot of CIVIL DISORDER INDEX (vertical axis, 0 to 8000) against POPULATION SIZE IN 1970 (horizontal axis)]
Correlation=.35398 R-Squared=.12530 S.E. of Est=1312.54267 Sig.=.0002
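The R-Squared figure is the square of the correlation coefficient (.35398 squared is about .1253). So although the correlation is statistically significant, population accounts for only about an eighth of the variation in CIVILDOR. A minimal Python sketch of the Pearson correlation follows. The function name is hypothetical, and the last lines only square the correlation printed above.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# With a single predictor, R-squared equals the correlation squared.
r = 0.35398  # correlation of CIVILDOR with POPULA70, as printed above
print(f"R-squared = {r ** 2:.5f}")  # prints R-squared = 0.12530
```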
Because incidents of civil disorder rise with
population size, you might compute a new variable that
measures incidents of civil disorder per 1,000,000 people,
using this formula:
compute
civilcap=civildor*1000000/popula70.
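The same rate can be computed row by row outside SPSS. In this Python sketch, which illustrates the arithmetic rather than the author's procedure, a missing or non-positive population yields None. That roughly mirrors the system-missing value SPSS COMPUTE produces when it cannot evaluate an expression. The function name and the sample figures are hypothetical.

```python
def per_million(count, population):
    """Incidents per 1,000,000 people, or None when the rate is undefined."""
    if count is None or population is None or population <= 0:
        return None  # treated like a missing value
    return count * 1_000_000 / population


# Hypothetical figures: 500 incidents in a nation of 25 million people.
print(per_million(500, 25_000_000))  # 20.0
print(per_million(500, None))        # None
```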
You can then run the FREQUENCIES procedure again to
examine the new distribution:
CIVILCAP
77 0 |***************************************
24 100 |************
3 200 |**
1 300 |*
1 400 |*
1 500 |*
0 600 |
0 700 |
1 800 |*
0 900 |
0 1000 |
1 1083 |*
+----+----+----+----+----+----+----+----+----+----+
0 20 40 60 80 100
Histogram frequency
Std dev=139.591 Kurtosis=32.901 Skewness=5.350
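Each histogram in this section groups the cases into equal-width intervals and draws a bar of asterisks whose length is proportional to the count. The Python sketch below imitates that layout. It labels each interval by its lower bound and scales bars so the largest count fills a fixed width. Both choices are assumptions: FREQUENCIES may pick intervals, labels, and scaling differently. The function name and the sample values are hypothetical.

```python
import math


def text_histogram(values, width, bar_max=50):
    """Return lines of 'count lower-bound |****' for equal-width bins."""
    lo = math.floor(min(values) / width) * width
    n_bins = int((max(values) - lo) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - lo) // width)] += 1
    top = max(counts)
    lines = []
    for i, c in enumerate(counts):
        stars = "*" * round(c * bar_max / top)  # bar length scaled to top count
        lines.append(f"{c:4d} {lo + i * width:6g} |{stars}")
    return lines


# Hypothetical right-skewed data: most values small, a few very large.
sample = [2, 15, 40, 80, 120, 300, 950, 1200, 4100, 8300]
for line in text_histogram(sample, width=1000):
    print(line)
```

As in the CIVILDOR and CIVILCAP output, a strongly right-skewed variable produces one long bar at the bottom and a scatter of short bars far out in the tail.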
Dividing by population removed the influence of
population size on the civil disorder measure, but the
resulting distribution was even more skewed to the right
(skewness rose from 3.839 to 5.350, and kurtosis from
16.338 to 32.901). Another technique for dealing with an
extremely positively skewed distribution is to take the
logarithm of each value: the exponent to which another
number, the base (in this case 10), must be raised to
equal the original value.
In substantive terms, this means that the rate of
disorder in one nation must be ten times the rate in
another nation to separate the nations by a full unit of
measurement. This transformation can be justified by an
argument similar to the one economists make about the
diminishing marginal utility of a dollar at high income
levels. Taking the logarithm of CIVILCAP likewise implies
that a given absolute difference in disorder rates counts
for less as the rates grow. A difference registers as a
full unit only when one rate is ten times the other, and
smaller ratios produce smaller differences (a twofold
ratio, for instance, separates two nations by about .3 of
a unit). This approach to measurement is common in the
study of many forms of political and social behavior.
During the Korean and Vietnam wars, for example, public
opposition to U.S. involvement was linked more closely to
the logarithm of battlefield casualties than to a simple
count of casualties. (In the histogram below, only the
integer characteristic of each value is listed, not the
decimal mantissa.)
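The "ten times" reading follows from a basic property of logarithms: the difference between two base-10 logarithms equals the logarithm of the ratio of the original numbers, log10(a) - log10(b) = log10(a/b). The short Python check below illustrates this with the standard math.log10 function. The three rates are hypothetical.

```python
import math

# Hypothetical disorder rates (incidents per million) for three nations.
rate_a, rate_b, rate_c = 300.0, 30.0, 60.0

# Tenfold ratio: the logarithms differ by one full unit.
print(round(math.log10(rate_a) - math.log10(rate_b), 3))  # 1.0
# Twofold ratio: the logarithms differ by log10(2), about 0.301 of a unit.
print(round(math.log10(rate_c) - math.log10(rate_b), 3))  # 0.301
```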
compute
civillog=lg10(civilcap).
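A Python equivalent of this COMPUTE step is sketched below. It assumes that, as with the system-missing result SPSS gives when LG10 receives a value that is not positive, zero, negative, or already-missing rates come back as None. The function name is hypothetical. The checks use the CIVILCAP and CIVILLOG values printed for ALGERIA and AFGHANISTAN in the listing at the end of this section.

```python
import math


def log_rate(rate):
    """Base-10 logarithm of a rate, or None if the rate is missing or <= 0."""
    if rate is None or rate <= 0:
        return None  # the logarithm is undefined for zero or negative values
    return math.log10(rate)


# CIVILCAP values for ALGERIA and AFGHANISTAN, taken from the listing.
print(round(log_rate(340.39), 3))  # 2.532
print(round(log_rate(3.05), 3))    # 0.484
print(log_rate(None))              # None
```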
CIVILLOG
2 0 |*****
14 1 |***********************************
9 1 |***********************
13 1 |*********************************
20 1 |**************************************************
11 2 |****************************
20 2 |**************************************************
10 2 |*************************
4 2 |**********
2 3 |*****
1 3 |***
2 3 |*****
+----+----+----+----+----+----+----+----+----+----+
0 4 8 12 16 20
Histogram frequency
Std dev .616 Kurtosis .126 Skewness .092
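The distribution of the logarithm of CIVILCAP is
very close to normal: its skewness (.092) and kurtosis
(.126) are both near zero, the value expected for a normal
distribution. The following command lists the individual
nations and the relevant variables to help you evaluate
this reworking of the CIVILDOR variable: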
LIST VARIABLES = COUNTRY CIVILDOR CIVILCAP CIVILLOG.
COUNTRY                 CIVILDOR  CIVILCAP  CIVILLOG
AFGHANISTAN                38.00      3.05      .484
ALGERIA                  4679.00    340.39     2.532
ANGOLA                    541.00     91.14     1.960
ARGENTINA                1137.00     47.88     1.680
AUSTRALIA                 113.00      9.03      .956
AUSTRIA                   110.00     14.81     1.171
BANGLADESH                 41.00       .60     -.220
BELGIUM                   229.00     23.72     1.375
BOLIVIA                   468.00    108.21     2.034
BRAZIL                    364.00      3.80      .580
BULGARIA                   22.00      2.59      .414
BURMA                    1357.00     50.26     1.701
CAMEROON                  142.00     20.94     1.321
CANADA                    260.00     12.19     1.086
CENTR.AFRICAN REPUBLIC     99.99         .         .
CHAD                       52.00     14.27     1.155
CHILE                     297.00     31.70     1.501
CHINA                    2662.00      3.17      .501
COLOMBIA                  833.00     39.17     1.593
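One quick way to evaluate the reworked variables is to confirm that each CIVILLOG entry is the base-10 logarithm of the CIVILCAP entry beside it. The Python sketch below does that for the rows listed above and skips the missing case. Because CIVILCAP is printed to only two decimals, the check allows a tolerance of .005. That tolerance is an assumption chosen to absorb the rounding: for BANGLADESH, for example, .60 gives -.222 rather than the printed -.220.

```python
import math

# (country, CIVILCAP, CIVILLOG) as printed in the LIST output;
# None marks a missing value (shown as "." in the listing).
rows = [
    ("AFGHANISTAN", 3.05, 0.484), ("ALGERIA", 340.39, 2.532),
    ("ANGOLA", 91.14, 1.960), ("ARGENTINA", 47.88, 1.680),
    ("AUSTRALIA", 9.03, 0.956), ("AUSTRIA", 14.81, 1.171),
    ("BANGLADESH", 0.60, -0.220), ("BELGIUM", 23.72, 1.375),
    ("BOLIVIA", 108.21, 2.034), ("BRAZIL", 3.80, 0.580),
    ("BULGARIA", 2.59, 0.414), ("BURMA", 50.26, 1.701),
    ("CAMEROON", 20.94, 1.321), ("CANADA", 12.19, 1.086),
    ("CENTR.AFRICAN REPUBLIC", None, None), ("CHAD", 14.27, 1.155),
    ("CHILE", 31.70, 1.501), ("CHINA", 3.17, 0.501),
    ("COLOMBIA", 39.17, 1.593),
]

TOLERANCE = 0.005  # allows for CIVILCAP being rounded to two decimals

mismatches = []
for country, cap, log_value in rows:
    if cap is None or log_value is None:
        continue  # missing case: nothing to compare
    if abs(math.log10(cap) - log_value) > TOLERANCE:
        mismatches.append(country)

print("Rows that disagree:", mismatches or "none")  # Rows that disagree: none
```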