Readings in Statistical Analysis

Labovitz: Treating Ordinal Data as Interval
 THE ASSIGNMENT OF NUMBERS TO RANK ORDER CATEGORIES *

Sanford Labovitz
The American Sociological Review, 35 (1970), 515-524.

Edited by Kenneth Janda

Abstract

By both random and nonrandom assignments of numbers to rank orders (which are consistent with the monotonic nature of the categories), it is shown that ordinal variables can be treated as if they conform to interval scales. The scoring systems, of which 18 were randomly generated by a computer, resulted in negligible error when comparing any assigned scoring system with any selected "true" scoring system. Errors are determined by the Pearsonian correlation coefficient (r) and r². The advantages of treating ordinal variables as interval are demonstrated with regard to the relation between occupational prestige and suicide. These advantages include: (1) the use of more powerful, sensitive, better developed and interpretable statistics with known sampling error, (2) the retention of more knowledge about the characteristics of the data, and (3) greater versatility in statistical manipulation (e.g., partial and multiple correlation and regression, analysis of variance and covariance, and most pictorial presentations). The computer approach to this problem does not exhaust all possibilities for assigning numbers, which partially limits the generality of the findings.

EMPIRICAL evidence supports the treatment of ordinal variables as if they conform to interval scales (Labovitz, 1967).[1] Although some small error may accompany the treatment of ordinal variables as interval,[2] this is offset by the use of more powerful, more sensitive, better developed, and more clearly interpretable statistics with known sampling error. For example, well-defined measures of dispersion (variance) require interval or ratio based measures. Furthermore, many more manipulations (which may be necessary to the problem in question) are possible with interval measurement, e.g., partial correlation, multivariate correlation and regression, analysis of variance and covariance, and most pictorial presentations. The arguments presented below are general enough to apply to any ordinal scale, and perhaps with even greater confidence they apply to variables that fall between ordinal and interval, e.g., I.Q. scores and formal education (Somers, 1962: 800).

To determine the degree of error of results when treating ordinal variables as if they are interval, the relation between occupational prestige and suicide rates is analyzed. Prestige rankings obtained by NORC in its 1947 survey are related to suicides by occupation for males in the United States in 1950. The list of occupations, taken from Duncan's comparisons of occupational categories used in the survey, is matched to the detailed occupational classification in the U.S. Census of 1950 (Reiss et al., 1961). Because suicides are not reported for all of these occupations and sometimes the reported suicides are for two or three occupations grouped into one, 36 occupations were selected which contain the necessary data used in this study (see Table 1). Measurement of occupational prestige is based solely on the principle of ordinal ranking. In the survey, respondents were given occupations to rank by the method of paired comparisons; consequently, the resulting prestige scores indicate merely the rank of one occupation relative to the others (Reiss et al., 1961: 122-123).[3] [Editor's note: Labovitz is using a well-known ranking of occupation prestige as a criterion variable which he will correlate with other scales having different values.]

TABLE 1. Prestige, Income, Education, and Suicide Rates for 36 Occupations
United States, Males, circa 1950

Occupation | NORC Rating Scale (a) | Male Suicide Rate (b) | Median Income (c) | Median School Years Completed (d)
--- | --- | --- | --- | ---
Accountants and auditors | 82 | 23.8 | 3977 | 14.4
Architects | 90 | 37.5 | 5509 | 16+
Authors, editors and reporters | 76 | 37.0 | 4303 | 15.6
Chemists | 90 | 20.7 | 4091 | 16+
Clergymen | 87 | 10.6 | 2410 | 16+
College presidents, professors and instructors (n.e.c.) | 93 | 14.2 | 4366 | 16+
Dentists | 90 | 45.6 | 6448 | 16+
Engineers, civil | 88 | 31.9 | 4590 | 16+
Lawyers and judges | 89 | 24.3 | 6284 | 16+
Physicians and surgeons | 97 | 31.9 | 8302 | 16+
Social welfare, recreation and group workers | 59 | 16.0 | 3176 | 15.8
Teachers (n.e.c.) | 73 | 16.8 | 3465 | 16+
Managers, officials and proprietors (n.e.c.) -- self-employed -- manufacturing | 81 | 64.8 | 4700 | 12.2
Managers, officials and proprietors (n.e.c.) -- self-employed -- wholesale and retail trade | 45 | 47.3 | 33806 | 11.6
Bookkeepers | 39 | 21.9 | 2828 | 12.7
Mail-carriers | 34 | 16.5 | 3480 | 12.2
Insurance agents and brokers | 41 | 32.4 | 3771 | 12.7
Salesmen and sales clerks (n.e.c.), retail trade | 16 | 24.1 | 2543 | 12.1
Carpenters | 33 | 32.7 | 2450 | 8.7
Electricians | 53 | 30.8 | 3447 | 11.1
Locomotive engineers | 67 | 34.2 | 4648 | 8.8
Machinists and jobsetters, metal | 57 | 34.5 | 3303 | 9.6
Mechanics and repairmen, automobile | 26 | 24.4 | 2693 | 9.4
Plumbers and pipe fitters | 29 | 29.4 | 3353 | 9.3
Attendants, auto service and parking | 10 | 14.4 | 1898 | 10.3
Mine operatives and laborers (n.e.c.) | 15 | 41.7 | 2410 | 8.2
Motormen, street, subway, and elevated railway | 19 | 19.2 | 3424 | 9.2
Taxicab drivers and chauffeurs | 10 | 24.9 | 2213 | 8.9
Truck and tractor drivers, deliverymen and routemen | 13 | 17.9 | 2590 | 9.6
Operatives and kindred workers (n.e.c.), machinery, except electrical | 24 | 15.7 | 2915 | 9.6
Barbers, beauticians and manicurists | 20 | 36.0 | 2357 | 8.8
Waiters, bartenders and counter and fountain workers | 7 | 24.4 | 1942 | 9.8
Cooks, except private household | 16 | 42.2 | 2249 | 8.7
Guards and watchmen | 11 | 38.2 | 2551 | 8.5
Janitors, sextons and porters | 8 | 20.3 | 1866 | 8.2
Policemen, detectives, sheriffs, bailiffs, marshals and constables | 41 | 47.6 | 2866 | 10.6

a. Albert J. Reiss, Jr., et al., 1961: 122-123. The scale is based on a 1947 survey.

b. Males, aged 20-64. National Office of Vital Statistics, Vital Statistics -- Special Report, Vol. 53, No. 3.

c. 1949 Median Income. United States Census of Population, 1950. Occupational Characteristics.

d. 1950 Median School Years Completed, Ibid.

The rank correlation (rho) between occupational prestige and suicide is .07. The scatter diagram of the NORC prestige ratings and suicide rates suggests that the relation is roughly linear, although the plotted points are widely scattered. The Pearsonian correlation coefficient (r) on the same data is slightly larger (.11). The .04 discrepancy between the two measures is due to the magnitude of the differences between adjacent scores, which rho ignores but which influences the value of r.
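[Editor's illustration, not part of Labovitz's article: the comparison of rho and r can be sketched in a few lines of Python. The two lists are the NORC ratings and male suicide rates as transcribed in Table 1; the function names are hypothetical, and rho is computed here as Pearson's r on average ranks, which is one common way of handling ties. The printed values can be set beside the .07 and .11 reported above, though small differences could arise from transcription or rounding.]

```python
from math import sqrt

# NORC prestige ratings and male suicide rates, in Table 1 order.
norc = [82, 90, 76, 90, 87, 93, 90, 88, 89, 97, 59, 73, 81, 45, 39, 34, 41, 16,
        33, 53, 67, 57, 26, 29, 10, 15, 19, 10, 13, 24, 20, 7, 16, 11, 8, 41]
suicide = [23.8, 37.5, 37.0, 20.7, 10.6, 14.2, 45.6, 31.9, 24.3, 31.9, 16.0, 16.8,
           64.8, 47.3, 21.9, 16.5, 32.4, 24.1, 32.7, 30.8, 34.2, 34.5, 24.4, 29.4,
           14.4, 41.7, 19.2, 24.9, 17.9, 15.7, 36.0, 24.4, 42.2, 38.2, 20.3, 47.6]


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Rank correlation: Pearson's r applied to the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print("rho =", round(spearman(norc, suicide), 2))
print("r   =", round(pearson(norc, suicide), 2))
```

Because rho replaces every score by its rank, it discards the spacing between adjacent prestige scores, whereas r uses that spacing; this is the source of the discrepancy described above.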

ASSIGNMENT OF SCORING SYSTEMS TO ORDINAL CATEGORIES

Twenty [different] scoring systems are used on NORC's occupational prestige values. One scoring system is the actual prestige ratings resulting from the study (the NORC Prestige Rating Scale in Table 1). A second scoring system is the assignment of equidistant numbers (i.e., an equal distance between assigned numbers) to the occupational categories (Table 2). The remaining scoring systems in Table 2 were generated from a computer according to the following conditions: (1) the assigned numbers lie between the range of 1 and 10,000, (2) the assignment of numbers is consistent with the monotonic function of the ordinal rankings, (3) any ties in the ordinal rankings are assigned identical numbers, and (4) the selection of a number is made on the basis of a random generator in the computer program. To be consistent with the monotonic function, any subsequent randomly selected numbers must be higher than previous ones (except for ties). The resulting largely random scoring systems vary among themselves (sometimes to a large extent) on the actual values assigned to each rank, the range of values, and the size of the differences between adjacent values. Although all are necessarily consistent with the monotonicity of the ordinal rankings, they vary widely among themselves. In fact, some of the scoring systems show definite curvilinear patterns--logarithmic, exponential or higher order curves (two or more inflection points).
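[Editor's illustration, not part of Labovitz's article: the four conditions can be sketched as follows. The article does not give the program itself, so the procedure here (draw one distinct random integer per distinct rank, then sort the draws) is an assumption that merely satisfies the stated conditions; the function name and parameters are hypothetical.]

```python
import random


def random_monotonic_scores(ranks, low=1, high=10_000, rng=None):
    """Assign random integers in [low, high] that preserve the rank order.

    Tied ranks receive identical numbers; higher ranks receive strictly
    higher numbers. This is an assumed procedure, not Labovitz's program.
    """
    rng = rng or random.Random()
    distinct = sorted(set(ranks))
    # One distinct random draw per distinct rank, sorted to keep monotonicity.
    draws = sorted(rng.sample(range(low, high + 1), len(distinct)))
    score_of = dict(zip(distinct, draws))
    return [score_of[r] for r in ranks]


# Six ordered categories, two of them tied at rank 3.5.
ranks = [1, 2, 3.5, 3.5, 5, 6]
scores = random_monotonic_scores(ranks, rng=random.Random(0))
print(scores)
```

Repeating the call with different seeds yields scoring systems that share the rank order but differ in range and in the spacing between adjacent values, which is the kind of variation described above.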

TABLE 2. NORC PRESTIGE RATINGS, LINEAR SCORES, AND FIVE MONOTONIC RANDOM GENERATED SCORING SYSTEMS

Columns (3), (5), (9), (13), and (18) are monotonic random generated scoring systems.

Linear (1) | NORC (2) | (3) | (5) | (9) | (13) | (18)
--- | --- | --- | --- | --- | --- | ---
1 | 7 | 13 | 79 | 52 | 849 | 418
2 | 8 | 34 | 105 | 109 | 909 | 585
3.5 | 10 | 99 | 233 | 380 | 923 | 648
3.5 | 10 | 99 | 233 | 380 | 923 | 648
5 | 11 | 248 | 389 | 518 | 1152 | 820
6 | 13 | 407 | 580 | 557 | 1167 | 869
7 | 15 | 727 | 605 | 799 | 2300 | 1271
8.5 | 16 | 1824 | 771 | 2167 | 2343 | 1478
8.5 | 16 | 1824 | 771 | 2167 | 2343 | 1478
10 | 19 | 1897 | 1042 | 2790 | 2845 | 1647
11 | 20 | 2021 | 1287 | 2796 | 2876 | 1789
12 | 24 | 2470 | 1374 | 3209 | 3107 | 2112
13 | 26 | 2978 | 1713 | 3558 | 3159 | 2627
14 | 29 | 2995 | 2083 | 3598 | 3231 | 2628
15 | 33 | 3330 | 2595 | 3808 | 3409 | 2777
16 | 34 | 3412 | 2715 | 3945 | 3760 | 2921
17 | 39 | 3535 | 2751 | 4087 | 4238 | 3077
18.5 | 41 | 3952 | 2861 | 4094 | 4898 | 3156
18.5 | 41 | 3952 | 2861 | 4094 | 4898 | 3156
20 | 45 | 4082 | 3003 | 4745 | 5336 | 3209
21 | 53 | 4485 | 3266 | 4885 | 5903 | 3600
22 | 57 | 4865 | 4013 | 4892 | 6016 | 4304
23 | 59 | 5091 | 4267 | 5044 | 6106 | 4323
24 | 67 | 5146 | 4449 | 5300 | 6242 | 4762
25 | 73 | 5349 | 5318 | 5819 | 6270 | 5020
26 | 76 | 5775 | 6330 | 5876 | 6681 | 5528
27 | 81 | 5995 | 6547 | 5923 | 6787 | 5797
28 | 82 | 6304 | 6810 | 5932 | 6915 | 6027
29 | 87 | 6356 | 6974 | 5976 | 7118 | 6388
30 | 88 | 6644 | 7660 | 5995 | 7229 | 6471
31 | 89 | 6742 | 8145 | 6160 | 7652 | 6560
33 | 90 | 7657 | 9085 | 6231 | 7926 | 6911
33 | 90 | 7657 | 9085 | 6231 | 7926 | 6911
33 | 90 | 7657 | 9085 | 6231 | 7926 | 6911
35 | 93 | 7841 | 9108 | 6458 | 8283 | 6972
36 | 97 | 8164 | 9461 | 7094 | 8472 | 7588

See text for an explanation of the scoring systems. The five random scoring systems are indicative of the 18 used in the study.

Because this computer approach to assigning numbers to rank order data is partially based on a random selection of numbers, the generality of the findings is somewhat limited. It is possible that some systematic selection of numbers will not yield such consistent results as those reported herein.

The similarity among the scoring systems can be assessed by their matrix of intercorrelations (Table 3). By assuming, in turn, that each scoring system is the "true" one, the intercorrelations (Pearson product-moment coefficients) indicate the extent of "error" of using one of the other 19 scoring systems. For example, if (4) is the "true" system and (7) has been used in its place, then .97 (the correlation between the two scoring systems) indicates the degree to which the two systems vary together. On the other hand, r² (the values below the diagonal in Table 3) indicates "error" in terms of the amount of variance in the assigned scoring system accounted for by the variation in the "true" scoring system (Abelson and Tukey, 1959).[4] In this instance, between scoring systems (4) and (7), 94% of the variance in (7) is accounted for by the variation in (4).
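[Editor's illustration, not part of Labovitz's article: the layout of the intercorrelation matrix (r above the diagonal, r² below) can be sketched with three of the scoring systems listed in Table 2. The function names are hypothetical. The arithmetic behind the worked example is simply .97 × .97 = .9409, or about 94%.]

```python
from math import sqrt

# Three scoring systems from Table 2, listed from lowest to highest rank.
linear = [1, 2, 3.5, 3.5, 5, 6, 7, 8.5, 8.5, 10, 11, 12, 13, 14, 15, 16, 17,
          18.5, 18.5, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 33,
          33, 35, 36]
norc = [7, 8, 10, 10, 11, 13, 15, 16, 16, 19, 20, 24, 26, 29, 33, 34, 39, 41,
        41, 45, 53, 57, 59, 67, 73, 76, 81, 82, 87, 88, 89, 90, 90, 90, 93, 97]
system3 = [13, 34, 99, 99, 248, 407, 727, 1824, 1824, 1897, 2021, 2470, 2978,
           2995, 3330, 3412, 3535, 3952, 3952, 4082, 4485, 4865, 5091, 5146,
           5349, 5775, 5995, 6304, 6356, 6644, 6742, 7657, 7657, 7657, 7841,
           8164]


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intercorrelations(systems):
    """Square table with r above the diagonal and r squared below it."""
    k = len(systems)
    table = [[None] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            r = pearson(systems[i], systems[j])
            table[i][j] = r        # above the diagonal: r
            table[j][i] = r * r    # below the diagonal: r squared
    return table


table = intercorrelations([linear, norc, system3])
for row in table:
    print(["..." if v is None else round(v, 2) for v in row])
```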

TABLE 3. INTERCORRELATIONS (r) AMONG TWENTY SCORING SYSTEMS (a)

Scoring System | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) | (14) | (15) | (16) | (17) | (18) | (19) | (20)
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
(1) (b) | ... | .98 | 1.00 | .99 | .97 | .99 | .98 | .99 | .97 | .98 | .99 | .99 | .99 | .99 | 1.00 | .98 | .99 | .99 | .99 | 1.00
(2) (c) | .98 | ... | .97 | .96 | .97 | .98 | .95 | .97 | .96 | .94 | .97 | .97 | .96 | .98 | .93 | .97 | .94 | .99 | .98 | .96
(3) | 1.00 | .94 | ... | .99 | .97 | .99 | .98 | .98 | .98 | .98 | .99 | .99 | .99 | .99 | .99 | .98 | .99 | .99 | .99 | .99
(4) | .98 | .92 | .98 | ... | .98 | .99 | .97 | .99 | .95 | .99 | .98 | .98 | .98 | .99 | .99 | .99 | .99 | 1.00 | .98 | .99
(5) | .94 | .94 | .94 | .96 | ... | .99 | .93 | .99 | .91 | .99 | .96 | .95 | .96 | .96 | .97 | .99 | .98 | .99 | .94 | .96
(6) | .98 | .96 | .98 | .98 | .98 | ... | .97 | .99 | .94 | .99 | .97 | .98 | .98 | .98 | .99 | .99 | .99 | .99 | .97 | .98
(7) | .96 | .90 | .96 | .94 | .86 | .94 | ... | .96 | .98 | .95 | .96 | .98 | .98 | .98 | .98 | .94 | .96 | .96 | .99 | .98
(8) | .98 | .94 | .96 | .98 | .98 | .98 | .92 | ... | .93 | .99 | .97 | .97 | .98 | .98 | .99 | .99 | .98 | .99 | .96 | .98
(9) | .94 | .92 | .96 | .90 | .83 | .88 | .96 | .86 | ... | .93 | .97 | .98 | .97 | .97 | .96 | .92 | .96 | .95 | .99 | .98
(10) | .96 | .88 | .96 | .98 | .98 | .98 | .90 | .98 | .86 | ... | .97 | .97 | .97 | .98 | .98 | .99 | .99 | 1.00 | .96 | .98
(11) | .98 | .94 | .98 | .96 | .92 | .94 | .96 | .94 | .94 | .94 | ... | .99 | .98 | .98 | .99 | .98 | .99 | .98 | .98 | .99
(12) | .98 | .94 | .98 | .96 | .90 | .96 | .96 | .94 | .96 | .94 | .98 | ... | .99 | .99 | .98 | .97 | .98 | .98 | .99 | .99
(13) | .98 | .92 | .98 | .96 | .92 | .96 | .96 | .96 | .94 | .94 | .96 | .98 | ... | .99 | .99 | .97 | .98 | .98 | .99 | .99
(14) | .98 | .96 | .98 | .98 | .92 | .96 | .96 | .96 | .94 | .96 | .96 | .98 | .98 | ... | .99 | .97 | .98 | .99 | .98 | .99
(15) | 1.00 | .86 | .98 | .98 | .94 | .98 | .96 | .98 | .92 | .96 | .98 | .96 | .98 | .98 | ... | .98 | .99 | .99 | .98 | .99
(16) | .96 | .94 | .96 | .98 | .98 | .98 | .88 | .98 | .85 | .98 | .96 | .94 | .94 | .94 | .96 | ... | .99 | .99 | .96 | .97
(17) | .98 | .88 | .98 | .98 | .96 | .98 | .92 | .96 | .92 | .98 | .98 | .96 | .96 | .96 | .98 | .98 | ... | .99 | .98 | .99
(18) | .98 | .98 | .98 | 1.00 | .98 | .98 | .92 | .98 | .90 | 1.00 | .96 | .96 | .96 | .98 | .98 | .98 | .98 | ... | .97 | .98
(19) | .98 | .96 | .98 | .96 | .88 | .94 | .98 | .92 | .98 | .92 | .96 | .98 | .98 | .96 | .96 | .92 | .96 | .94 | ... | .99
(20) | 1.00 | .92 | .98 | .98 | .92 | .96 | .96 | .96 | .96 | .96 | .98 | .98 | .98 | .98 | .98 | .94 | .98 | .96 | .98 | ...

a. r above the diagonal; r² below.

b. Linear scoring system.

c. NORC prestige ratings.

The r and r² values in Table 3 are consistently and substantially high, indicating a high degree of interchangeability among the 20 scoring systems. Out of 190 correlation coefficients, all are above .90 (a few even reach unity), and 157 are .97 and above. Therefore, even without a rationale concerning the differences between ranks, by using a nearly random method of assigning scoring systems (consistent with the monotonic function), it is possible that under specific conditions the selected scoring system will deviate from the "true" system by a near zero or negligible amount. The r² values are slightly lower than the r values, but still exceedingly high. For example, only nine of the 190 are below .90, and none are below .83. (Since r² is the square of a decimal fraction, it is necessarily smaller than r.)

Note that if the equidistant (linear) scoring system is always selected (no matter what the "true" scoring system may be), the expected error is smaller than the larger errors cited above. Almost all the r's and r²'s for the linear system (1) are near unity, with the lowest r being .97 and the lowest r² being .94. The linear scoring system lies midway between the other scoring systems (in correlational terms), which by definition excludes the most extreme scoring systems in each direction. The correlations between the extremes are lowest, and, therefore, selecting the linear scoring system eliminates the lowest r's and the highest potential "errors" in selecting a scoring system different from the "true" one.

Possessing some knowledge about the amount of differences between ranks can reduce the small error even further, if the linear scoring system has been assigned to the ordinal categories. Perhaps the best strategy, if there is some knowledge of the differences between ranks, is to modify the linear scoring system accordingly. For example, in the relation X1 > X2 > X3, X2 is assumed to be closer to X3 than to X1. Consequently, the linear scoring system of 10, 20 and 30 (as values for X1, X2 and X3) can be modified to 10, 25 and 30 to account for this additional knowledge. It should be stressed that without prior knowledge or theory such score assignments are not likely to prove useful for analysis.
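[Editor's illustration, not part of Labovitz's article: the effect of the modification can be checked directly by correlating the linear scores 10, 20, 30 with the modified scores 10, 25, 30. With only three categories the two systems still correlate at about .96 (r² of about .92), so the adjustment changes the scoring modestly. The function name is hypothetical.]

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


linear = [10, 20, 30]     # equidistant scores
modified = [10, 25, 30]   # middle category moved toward the third

r = pearson(linear, modified)
print(round(r, 3), round(r * r, 3))
```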

Table 4 offers further evidence that ordinal data can be treated as if they are interval by assigning scoring systems to the ordered categories. In this instance, the predictive ability of each scoring system is assessed in terms of its relation to suicide rates. As indicated previously, the rho value between the NORC prestige scale and 1950 suicide rates for males in 36 occupations is .07; for the same data, r is .11. Table 4 reports the r and r² values between the 20 scoring systems and the suicide rate. (The last two columns in Table 4 are r values for 20 and 10 occupations respectively and will be discussed later in the paper.) The similarity in predicting an outside variable is extremely high. The r's vary between .09 and .15, and the r² values are either .01 or .02.[5] Given some degree of unreliability in occupational prestige and suicide data, and the rather crude measurement procedures, these results substantiate the point that different systems yield interchangeable variables. Each indicates a quite low positive (statistically nonsignificant) relation between occupational prestige and suicide. These results are consistent with a previous study (Labovitz, 1967), which also found the relations to be very similar; however, in the previous study, the relations are somewhat higher and statistically significant.
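[Editor's illustration, not part of Labovitz's article: the kind of comparison summarized in Table 4 can be sketched by rescoring the NORC ratings of Table 1 with random monotonic systems and correlating each with the suicide rates. The generator, the seed, and the function names are assumptions, so the printed range will not reproduce Labovitz's 18 systems; the sketch only shows the mechanics.]

```python
import random
from math import sqrt

# NORC prestige ratings and male suicide rates, in Table 1 order.
norc = [82, 90, 76, 90, 87, 93, 90, 88, 89, 97, 59, 73, 81, 45, 39, 34, 41, 16,
        33, 53, 67, 57, 26, 29, 10, 15, 19, 10, 13, 24, 20, 7, 16, 11, 8, 41]
suicide = [23.8, 37.5, 37.0, 20.7, 10.6, 14.2, 45.6, 31.9, 24.3, 31.9, 16.0, 16.8,
           64.8, 47.3, 21.9, 16.5, 32.4, 24.1, 32.7, 30.8, 34.2, 34.5, 24.4, 29.4,
           14.4, 41.7, 19.2, 24.9, 17.9, 15.7, 36.0, 24.4, 42.2, 38.2, 20.3, 47.6]


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rescore(values, rng, low=1, high=10_000):
    """Replace each distinct value by a random integer, preserving order and ties."""
    distinct = sorted(set(values))
    draws = sorted(rng.sample(range(low, high + 1), len(distinct)))
    mapping = dict(zip(distinct, draws))
    return [mapping[v] for v in values]


rng = random.Random(1970)  # arbitrary seed, chosen only for repeatability
systems = [norc] + [rescore(norc, rng) for _ in range(18)]
rs = [pearson(s, suicide) for s in systems]
print("lowest r:", round(min(rs), 2), "highest r:", round(max(rs), 2))
```

A narrow spread between the lowest and highest r would correspond to the interchangeability reported in the text; a wide spread under some seed would be an instance of the limitation noted earlier for random assignment.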

TABLE 4. CORRELATION COEFFICIENTS (r) BETWEEN SUICIDE RATES AND TWENTY SCORING SYSTEMS OF OCCUPATIONAL PRESTIGE

Scoring System | r (N=36) | r² | r (N=20) | r (N=10)
--- | --- | --- | --- | ---
(1) (linear) | 0.13 | 0.02 | 0.35 | 0.28
(2) (prestige rankings) | 0.11 | 0.01 | 0.35 | 0.25
(3) | 0.13 | 0.02 | 0.31 | 0.24
(4) | 0.11 | 0.01 | 0.32 | 0.30
(5) | 0.10 | 0.01 | 0.30 | 0.21
(6) | 0.11 | 0.01 | 0.35 | 0.18
(7) | 0.14 | 0.02 | 0.28 | 0.33
(8) | 0.12 | 0.01 | 0.38 | 0.34
(9) | 0.14 | 0.02 | 0.26 | 0.15
(10) | 0.09 | 0.01 | 0.29 | 0.24
(11) | 0.13 | 0.02 | 0.30 | 0.24
(12) | 0.11 | 0.01 | 0.28 | 0.22
(13) | 0.15 | 0.02 | 0.41 | 0.35
(14) | 0.14 | 0.02 | 0.37 | 0.25
(15) | 0.13 | 0.02 | 0.35 | 0.32
(16) | 0.09 | 0.01 | 0.33 | 0.18
(17) | 0.12 | 0.01 | 0.37 | 0.34
(18) | 0.11 | 0.01 | 0.30 | 0.33
(19) | 0.15 | 0.02 | 0.33 | 0.25
(20) | 0.14 | 0.02 | 0.38 | 0.41

Partially based on the data in Tables 1 and 2. Scoring systems 3 to 18 are randomly generated.
. . . [Text omitted]

CONCLUSIONS

The results of the tests based on assigning interval scores to ordinal categories suggest: (1) certain interval statistics can be used interchangeably with ordinal statistics and interpreted as ordinal, (2) certain interval statistics (e.g., variance) can be computed where no ordinal equivalent exists and can be interpreted with accuracy, (3) certain interval statistics can be given their interval interpretation with only negligible error if the variable is "nearly" interval, and (4) certain interval statistics can be given their interval interpretations with caution (even if the variable is "purely" ordinal), because the "true" scoring system and the assigned scoring system, especially the equidistant system, are almost always close as measured by r and r².

Consequently, treating ordinal variables as if they are interval has these advantages: (1) the use of more powerful, sensitive, better developed and interpretable statistics with known sampling error, (2) the retention of more knowledge about the characteristics of the data, and (3) greater versatility in statistical manipulation, e.g., partial and multiple correlation and regression, analysis of variance and covariance, and most pictorial presentations.

The study suggests two research strategies when analyzing ordinal variables. First, assign a linear scoring system according to the available evidence on the distances between ranks. Second, use all available rank order categories, rather than collapsing them into a smaller number, because the greater the number of ranks the greater the stability and confidence in the assigned scoring system (unless the dichotomization of ranks is suspected). The all-too-frequent strategy of dichotomizing or trichotomizing variables should be avoided if possible.[7]
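[Editor's illustration, not part of Labovitz's article: the cost of collapsing ranks can be sketched with a hypothetical variable of 36 equidistant ranks. Splitting it at the median into two categories leaves an r² of about .75 with the full set of ranks, so roughly a quarter of the variance is lost. The data and function names are assumptions made for illustration only.]

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collapse(n, k):
    """Assign n ordered ranks to k equal-sized groups scored 0..k-1."""
    return [(i * k) // n for i in range(n)]


n = 36
full = list(range(1, n + 1))   # hypothetical equidistant ranks 1..36

r2_by_groups = {}
for k in (2, 3, 6, 36):
    r = pearson(full, collapse(n, k))
    r2_by_groups[k] = r * r
    print(k, "categories: r squared =", round(r * r, 3))
```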

A final word of caution is necessary. The researcher should know and report the actual scales of his data, and any interval statistics selected should be interpreted with care. Further exploration and tests are necessary for added confidence in treating ordinal data as if they are interval. The more conservative procedure, of course, is to treat ordinal data as strictly ordinal, and thereby avoid the possibility of attributing a property to a given scale which it does not possess.


NOTES

* I am grateful to Robert Hagedorn, Harvey Marshall, Ross Purdy, and the referees of ASR for their helpful comments and critical reading of an earlier draft.

1. Labovitz demonstrates the utility of treating ordinal variables as interval for a hypothetical problem relating two types of therapy to four subjective responses: it made me worse (-); it had no effect (0); it helped a little (+); and it helped quite a bit (++). The four ordinal responses are assigned scores ranging from highly skewed (e.g., 0, 1, 2, 10) to equidistant systems (e.g., 0, 3, 6, 10). The monotonic scoring systems produce largely similar point-biserial coefficients, t-tests, and critical ratios. Furthermore, the divergent scoring systems are highly interrelated. The r's between the two types of therapy and the four subjective responses are somewhat higher (averaging about .20) than the correlation coefficients in this study (averaging about .12).

2. Small error may result because the difference between two adjacent ranks may not be the same as the difference between two other adjacent ranks.

3. Duncan's socioeconomic index, based upon the income and educational levels of each occupation, correlates highly with the NORC prestige scale.

4. Abelson and Tukey also use r² as the criterion for assessing the adequacy of numerical assignments and, in addition, present a "maximin" r² to assess the largest possible error in a scoring system. Briefly, an assigned scoring system X is correlated with a "true" system Y so that the minimum possible r² between X and Y achieves its maximum value. Their analysis, instead of leading to an average error rate (in which the "true" r² may be equally above or below the rate), results in a conservative lower limit estimate. This lower limit estimate is based on a sequence called "corners," which is consistent with the inequalities (i.e., it follows the monotonic or equality functions) and is based on a set of dichotomized values. For example, given the following relations Y1 ≤ Y2 ≤ Y3 ≤ Y4, a set of corners is (0, 0, 0, 1), (0, 0, 1, 1), and (0, 1, 1, 1). One of these corner sequences yields the maximin r². There are three problems with Abelson and Tukey's analysis: (1) an average error rate is more indicative of a representative error (i.e., the most likely error in assigning a scoring system) and, therefore, is more useful to the researcher, (2) the corner sequence is based on dichotomies, which is a highly unlikely occurrence and a waste of information, and (3) they analyze only "greater than" and "equal to" models in combination (≥), while the most frequent ordinal cases are "greater than" between most ranks (Y1 > Y2 > Y3). The "greater than" model leads into a dichotomous analysis only if there are two ranks (a trivial case).

5. It should be noted that the usual purpose of a transformation in correlation work is to raise the correlation. However, the stability of the correlations in this study is not inconsistent with this general principle.

6. This is Anderson's basic reason for selecting parametric over nonparametric statistics.

7. Another reason against the use of dichotomies or trichotomies is that often a large amount of information is lost by such drastic collapsing.

REFERENCES

Abelson, Robert P. and John W. Tukey. (1959) "Efficient conversion of non-metric information to metric information." Proceedings of the Social Statistics Section, American Statistical Association: 226--230.

Anderson, Norman H. (1961) "Scales and statistics: Parametric and non-parametric." Psychological Bulletin, 58 (July): 305--316.

Blalock, Hubert M., Jr. (1967) Letter. American Journal of Sociology 72 (May) :675--677.

__________________ (1968) "Theory building and causal inferences." Pp. 155--196 in Hubert M. Blalock, Jr. and Ann B. Blalock (eds.), Methodology in Social Research. New York: McGraw-Hill.

Boneau, C. A. (1960) "The effects of violations of assumptions underlying the 't' test." Psychological Bulletin, 57 (January): 49--64.

Breed, Warren (1963) "Occupational mobility and suicide among white males." American Sociological Review, 28 (April):179--189.

Dublin, L. I. (1963) Suicide: A Sociological and Statistical Study. New York: Ronald Press.

Hirsh, Joseph. (1959) "Suicide." Mental Hygiene, 43 (October): 516--526.

Labovitz, Sanford (1967) "Some observations on measurement and statistics." Social Forces, 46 (December): 151--160.

______________ (1968a) "Reply to Champion and Morris." Social Forces, 46 (June) :543--545.

______________ (1968b) "Variation in suicide rates." Pp. 57--74 in Jack P. Gibbs (ed.), Suicide. New York: Harper and Row.

Lindquist, E. F. (1953) Design and Analysis of Experiments. Pp. 78--90. New York: Houghton Mifflin.

Maris, Ronald (1967) "Suicide, status, and mobility in Chicago." Social Forces, 46 (December): 246--256.

Morris, Raymond N. (1968) "Commentary." Social Forces, 46 (June): 541--543.

Powell, Elwin M. (1958) "Occupation, status, and suicide: Toward a redefinition of anomie." American Sociological Review, 23 (April):131--140.

Reiss, Albert J., Jr., with O. D. Duncan, Paul K. Hatt, and Cecil C. North (1961) Occupations and Social Status. New York: Free Press.

Somers, Robert H.(1962) "A new asymmetric measure of association for ordinal variables." American Sociological Review, 27 (December):799--811.