Labovitz: Treating Ordinal Data as Interval 
Sanford Labovitz Edited by Kenneth Janda 

EMPIRICAL evidence supports the treatment of ordinal variables as if they conform to interval scales (Labovitz, 1967).[1] Although some small error may accompany the treatment of ordinal variables as interval,[2] this is offset by the use of more powerful, more sensitive, better developed, and more clearly interpretable statistics with known sampling error. For example, well-defined measures of dispersion (variance) require interval- or ratio-based measures. Furthermore, many more manipulations (which may be necessary to the problem in question) are possible with interval measurement, e.g., partial correlation, multivariate correlation and regression, analysis of variance and covariance, and most pictorial presentations.
The arguments presented below are general enough to apply to any ordinal scale, and perhaps with even greater confidence they apply to variables that fall between ordinal and interval, e.g., I.Q. scores and formal education (Somers, 1962: 800).
To determine the degree of error of results when treating ordinal variables as if they are interval, the relation between occupational prestige and suicide rates is analyzed. Prestige rankings obtained by NORC in its 1947 survey are related to suicides by occupation for males in the United States in 1950. The list of occupations, taken from Duncan's comparisons of occupational categories used in the survey, is matched to the detailed occupational classification in the U.S. Census of 1950 (Reiss et al., 1961). Because suicides are not reported for all of these occupations, and the reported suicides are sometimes for two or three occupations grouped into one, 36 occupations were selected that contain the data needed for this study (see Table 1). Measurement of occupational prestige is based solely on the principle of ordinal ranking.
In the survey, respondents were given occupations to rank by the method of paired comparisons; consequently, the resulting prestige scores indicate merely the rank of one occupation relative to the others (Reiss et al., 1961: 122-123).[3] [Editor's note: Labovitz is using a well-known ranking of occupational prestige as a criterion variable, which he will correlate with other scales having different values.]









[Table 1: the 36 occupations in the study. The numeric columns (NORC prestige rating (a), suicide rate (b), median income (c), and median school years completed (d)) did not survive extraction; only the occupation titles and footnotes remain.]
Accountants and auditors
Architects
Authors, editors and reporters
Chemists
Clergymen
College presidents, professors and instructors (n.e.c.)
Dentists
Engineers, civil
Lawyers and judges
Physicians and surgeons
Social welfare, recreation and group workers
Teachers (n.e.c.)
Managers, officials and proprietors (n.e.c.), self-employed, manufacturing
Managers, officials and proprietors (n.e.c.), self-employed, wholesale and retail trade
Bookkeepers
Mail carriers
Insurance agents and brokers
Salesmen and sales clerks (n.e.c.), retail trade
Carpenters
Electricians
Locomotive engineers
Machinists and job setters, metal
Mechanics and repairmen, automobile
Plumbers and pipe fitters
Attendants, auto service and parking
Mine operatives and laborers (n.e.c.)
Motormen, street, subway, and elevated railway
Taxicab drivers and chauffeurs
Truck and tractor drivers, deliverymen and routemen
Operatives and kindred workers (n.e.c.), machinery, except electrical
Barbers, beauticians and manicurists
Waiters, bartenders and counter and fountain workers
Cooks, except private household
Guards and watchmen
Janitors, sextons and porters
Policemen, detectives, sheriffs, bailiffs, marshals and constables

a. Albert J. Reiss, Jr., et al., 1961: 122-123. The scale is based on a 1947 survey.
b. Males, aged 20-64. National Office of Vital Statistics, Vital Statistics, Special Report, Vol. 53, No. 3.
c. 1949 median income. United States Census of Population, 1950. Occupational Characteristics.
d. 1950 median school years completed. Ibid.
The rank correlation (rho) between occupational prestige and suicide is .07. The scatter diagram of the NORC prestige ratings and suicide rates suggests that the relation is roughly linear, although the plotted points are widely scattered. The Pearsonian correlation coefficient (r) on the same data is slightly larger (.11). The .04 discrepancy between the two measures is due to the magnitude of the differences between adjacent scores, which are not considered in rho but do influence the value of r.
Twenty [different] scoring systems are applied to NORC's occupational prestige values. One scoring system is the actual prestige ratings resulting from the study (the NORC Prestige Rating Scale in Table 1). A second scoring system is the assignment of equidistant numbers (i.e., an equal distance between assigned numbers) to the occupational categories (Table 2). The remaining scoring systems in Table 2 were generated by computer according to the following conditions: (1) the assigned numbers lie in the range 1 to 10,000, (2) the assignment of numbers is consistent with the monotonic function of the ordinal rankings, (3) any ties in the ordinal rankings are assigned identical numbers, and (4) the selection of a number is made on the basis of a random generator in the computer program. To be consistent with the monotonic function, any subsequent randomly selected numbers must be higher than previous ones (except for ties).
The resulting largely random scoring systems vary among themselves (sometimes to a large extent) on the actual values assigned to each rank, the range of values, and the size of the differences between adjacent values. Although all are necessarily consistent with the monotonicity of the ordinal rankings, some of the scoring systems show definite curvilinear patterns: logarithmic, exponential, or higher-order curves (two or more inflection points).
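The four conditions for the computer-generated scoring systems can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in the study: the function name `monotonic_random_scores`, its `seed` parameter, and the example ranks are hypothetical, and instead of drawing each number in sequence and requiring it to exceed the previous one, the sketch draws one distinct number per distinct rank and sorts the draws, which gives an order-preserving assignment by a simpler route.

```python
import random


def monotonic_random_scores(ranks, low=1, high=10_000, seed=None):
    """Assign random integer scores that preserve the order of `ranks`.

    ranks: one ordinal rank per category; equal ranks are treated as ties.
    Returns a list of integers in [low, high]. A higher rank always gets a
    higher score, and tied ranks get identical scores.
    """
    rng = random.Random(seed)
    distinct = sorted(set(ranks))
    # One distinct random integer per distinct rank (conditions 1 and 4),
    # sorted so that scores increase with rank (condition 2).
    draws = sorted(rng.sample(range(low, high + 1), len(distinct)))
    score_of = dict(zip(distinct, draws))
    # Tied ranks look up the same score (condition 3).
    return [score_of[r] for r in ranks]


# Hypothetical ranks for six categories; the two 3s are tied.
example_ranks = [1, 2, 3, 3, 4, 5]
example_scores = monotonic_random_scores(example_ranks, seed=0)
print(example_scores)
```

Calling the function repeatedly with different seeds yields scoring systems that agree on order but differ in range and in the gaps between adjacent values, which is the kind of variation described above.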
TABLE 2. MONOTONIC RANDOM GENERATED SCORING SYSTEMS
[Table values did not survive extraction.]
See text for an explanation of the scoring systems. The five random scoring systems are indicative of the 18 used in the study.
Because this computer approach to assigning numbers to rank-order data is based partly on a random selection of numbers, the generality of the findings is somewhat limited. It is possible that some systematic selection of numbers will not yield results as consistent as those reported here.
The similarity among the scoring systems can be assessed by their matrix of intercorrelations (Table 3). By assuming, in turn, that each scoring system is the "true" one, the intercorrelations (Pearson product-moment coefficients) indicate the extent of "error" in using one of the other 19 scoring systems. For example, if (4) is the "true" system and (7) has been used in its place, then .97 (the correlation between the two scoring systems) indicates the degree to which the two systems vary together. On the other hand, r^{2} (the values below the diagonal in Table 3) indicates "error" in terms of the amount of variance in the assigned scoring system accounted for by the variation in the "true" scoring system (Abelson and Tukey, 1959).[4] In this instance, between scoring systems (4) and (7), 94% of the variance in (7) is accounted for by the variation in (4).
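The layout of the intercorrelation matrix (r above the diagonal, r^{2} below) can be reproduced with a short Python sketch. The three scoring systems below are invented for illustration and are not the study's data; the function names `pearson_r` and `intercorrelation_table` are likewise hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intercorrelation_table(systems):
    """Square table with r above the diagonal and r squared below it."""
    k = len(systems)
    table = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            r = pearson_r(systems[i], systems[j])
            table[i][j] = r        # above the diagonal: r
            table[j][i] = r * r    # below the diagonal: r squared
    return table


# Hypothetical scorings of five ordered categories (not the paper's data).
linear = [1, 2, 3, 4, 5]
convex = [1, 2, 4, 8, 16]
concave = [1, 50, 80, 95, 100]

table = intercorrelation_table([linear, convex, concave])
for row in table:
    print("  ".join(f"{value:.2f}" for value in row))
```

The same arithmetic links the two figures quoted for systems (4) and (7): .97 squared is .9409, which rounds to the 94% of variance cited in the text.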


[Table 3: intercorrelations (a) among scoring systems (1)(b) through (20). The matrix values did not survive extraction; only the row and column labels and footnotes remain.]

a. r above the diagonal; r^{2} below.
b. Linear scoring system.
c. NORC prestige ratings.
The r and r^{2} values in Table 3 are consistently and substantially high, indicating a high degree of interchangeability among the 20 scoring systems. Out of 190 correlation coefficients, all are above .90 (a few even reach unity), and 157 are .97 and above. Therefore, even without a rationale concerning the differences between ranks, by using a nearly random method of assigning scoring systems (consistent with the monotonic function), it is possible that under specific conditions the selected scoring system will deviate from the "true" system by a near-zero or negligible amount. The r^{2} values are slightly lower than the r values, but still exceedingly high. For example, only nine of the 190 are below .90, and none are below .83. (Since r^{2} is the square of a decimal fraction, it is necessarily smaller than r.)
Note that if the equidistant (linear) scoring system is always selected (no matter what the "true" scoring system may be), the expected error is smaller than the larger errors cited above. Almost all the r's and r^{2}'s for the linear system (1) are near unity, with the lowest r being .97 and the lowest r^{2} being .94. The linear scoring system lies midway between the other scoring systems (in correlational terms), which by definition excludes the most extreme scoring systems in each direction. The correlations between the extremes are lowest, and, therefore, selecting the linear scoring system eliminates the lowest r's and the highest potential "errors" in selecting a scoring system different from the "true" one.
Possessing some knowledge about the size of the differences between ranks can reduce the small error even further, if the linear scoring system has been assigned to the ordinal categories. Perhaps the best strategy, if there is some knowledge of the differences between ranks, is to modify the linear scoring system accordingly. For example, in the relation X1 > X2 > X3, X2 is assumed to be closer to X3 than to X1. Consequently, the linear scoring system of 10, 20 and 30 (as values for X1, X2 and X3) can be modified to 10, 25 and 30 to account for this additional knowledge. It should be stressed that without prior knowledge or theory such score assignments are not likely to prove useful for analysis.
Table 4 offers further evidence that ordinal data can be treated as if they are interval by assigning scoring systems to the ordered categories. In this instance, the predictive ability of each scoring system is assessed in terms of its relation to suicide rates. As indicated previously, the rho value between the NORC prestige scale and 1950 suicide rates for males in 36 occupations is .07; for the same data, r is .11. Table 4 reports the r and r^{2} values between the 20 scoring systems and the suicide rate. (The last two columns in Table 4 are r values for 20 and 10 occupations respectively and will be discussed later in the paper.) The similarity in predicting an outside variable is extremely high. The r's vary between .09 and .15, and the r^{2} values are either .01 or .02.[5] Given some degree of unreliability in occupational prestige and suicide data, and the rather crude measurement procedures, these results substantiate the point that different systems yield interchangeable variables. Each indicates a quite low positive (statistically nonsignificant) relation between occupational prestige and suicide. These results are consistent with a previous study (Labovitz, 1967), which also found the relations to be very similar; however, in the previous study, the relations are somewhat higher and statistically significant.
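The difference between rho and r when an ordinal variable is related to an outside variable can be illustrated with a small Python sketch. All data here are invented for illustration (the `outcome` values are not suicide rates from the study), and the helper names are hypothetical. The sketch shows that rho is unchanged by any order-preserving rescoring, whereas r responds to the size of the gaps between adjacent scores.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Rank correlation: Pearson r computed on the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: two order-preserving scorings of six categories
# and an invented outside variable.
linear = [10, 20, 30, 40, 50, 60]
modified = [10, 25, 30, 40, 58, 60]
outcome = [3.1, 2.4, 4.0, 3.5, 2.9, 4.4]

for name, scores in (("linear", linear), ("modified", modified)):
    print(name,
          "rho =", round(spearman_rho(scores, outcome), 3),
          "r =", round(pearson_r(scores, outcome), 3))
```

For the three-category example in the text, the r between the linear scores 10, 20, 30 and the modified scores 10, 25, 30 works out to about .96 (r^{2} about .92), so the modification changes the scoring system only slightly in correlational terms.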
TABLE 4. CORRELATIONS BETWEEN SUICIDE RATES AND TWENTY SCORING SYSTEMS OF OCCUPATIONAL PRESTIGE
[Rows: (1) linear; (2) prestige rankings; (3) through (20). Table values did not survive extraction.]
Partially based on the data in Tables 1 and 2. Scoring systems 3 to 20 are randomly generated.
CONCLUSIONS
The results of the tests based on assigning interval scores to ordinal categories suggest: (1) certain interval statistics can be used interchangeably with ordinal statistics and interpreted as ordinal, (2) certain interval statistics (e.g., variance) can be computed where no ordinal equivalent exists and can be interpreted with accuracy, (3) certain interval statistics can be given their interval interpretation with only negligible error if the variable is "nearly" interval, and (4) certain interval statistics can be given their interval interpretations with caution (even if the variable is "purely" ordinal), because the "true" scoring system and the assigned scoring system, especially the equidistant system, are almost always close as measured by r and r^{2}.
Consequently, treating ordinal variables as if they are interval has these advantages: (1) the use of more powerful, sensitive, better developed and interpretable statistics with known sampling error, (2) the retention of more knowledge about the characteristics of the data, and (3) greater versatility in statistical manipulation, e.g., partial and multiple correlation and regression, analysis of variance and covariance, and most pictorial presentations.
The study suggests two research strategies when analyzing ordinal variables. First, assign a linear scoring system according to the available evidence on the distances between ranks. Second, use all available rank-order categories, rather than collapsing them into a smaller number, because the greater the number of ranks, the greater the stability and confidence in the assigned scoring system (unless the dichotomization of ranks is suspected). The all-too-frequent strategy of dichotomizing or trichotomizing variables should be avoided if possible.[7]
A final word of caution is necessary. The researcher should know and report the actual scales of his data, and any interval statistics selected should be interpreted with care.
Further exploration and tests are necessary for added confidence in treating ordinal data as if they are interval. The more conservative procedure, of course, is to treat ordinal data as strictly ordinal, and thereby avoid the possibility of attributing a property to a given scale which it does not possess.
NOTES
* I am grateful to Robert Hagedorn, Harvey Marshall, Ross Purdy, and the referees of ASR for their helpful comments and critical reading of an earlier draft.
1. Labovitz demonstrates the utility of treating ordinal variables as interval for a hypothetical problem relating two types of therapy to four subjective responses: it made me worse (-); it had no effect (0); it helped a little (+); and it helped quite a bit (++). The four ordinal responses are assigned scores ranging from highly skewed (e.g., 0, 1, 2, 10) to equidistant systems (e.g., 0, 3, 6, 10). The monotonic scoring systems produce largely similar point-biserial coefficients, t-tests, and critical ratios. Furthermore, the divergent scoring systems are highly interrelated. The r's between the two types of therapy and the four subjective responses are somewhat higher (averaging about .20) than the correlation coefficients in this study (averaging about .12).
2. Small error may result because the difference between two adjacent ranks may not be the same as the difference between two other adjacent ranks.
3. Duncan's socioeconomic index, based upon the income and educational levels of each occupation, correlates highly with the NORC prestige scale.
4. Abelson and Tukey also use r^{2} as the criterion for assessing the adequacy of numerical assignments and, in addition, present a "maximin" r^{2} to assess the largest possible error in a scoring system. Briefly, an assigned scoring system X is correlated with a "true" system Y so that the minimum possible r^{2} between X and Y achieves its maximum value. Their analysis, instead of leading to an average error rate (in which the "true" r^{2} may be equally above or below the rate), results in a conservative lower-limit estimate. This lower-limit estimate is based on a sequence called "corners," which is consistent with the inequalities (i.e., it follows the monotonic or equality functions) and is based on a set of dichotomized values. For example, given the relations Y1 ≤ Y2 ≤ Y3 ≤ Y4, a set of corners is (0, 0, 0, 1), (0, 0, 1, 1), and (0, 1, 1, 1). One of these corner sequences yields the maximin r^{2}. There are three problems with Abelson and Tukey's analysis: (1) an average error rate is more indicative of a representative error (i.e., the most likely error in assigning a scoring system) and, therefore, is more useful to the researcher, (2) the corner sequence is based on dichotomies, which is a highly unlikely occurrence and a waste of information, and (3) they analyze only "greater than" and "equal to" models in combination (≤), while the most frequent ordinal cases are "greater than" between most ranks (Y1 > Y2 > Y3). The "greater than" model leads into a dichotomous analysis only if there are two ranks (a trivial case).
5. It should be noted that the usual purpose of a transformation in correlation work is to raise the correlation. However, the stability of the correlations in this study is not inconsistent with this general principle.
6. This is Anderson's basic reason for selecting parametric over nonparametric statistics.
7. Another reason against the use of dichotomies or trichotomies is that often a large amount of information is lost by such drastic collapsing.
REFERENCES
Abelson, Robert P. and John W. Tukey (1959) "Efficient conversion of nonmetric information to metric information." Proceedings of the Social Statistics Section, American Statistical Association: 226-230.
Anderson, Norman H. (1961) "Scales and statistics: Parametric and nonparametric." Psychological Bulletin, 58 (July): 305-316.
Blalock, Hubert M., Jr. (1967) Letter. American Journal of Sociology, 72 (May): 675-677.
__________ (1968) "Theory building and causal inferences." Pp. 155-196 in Hubert M. Blalock, Jr. and Ann B. Blalock (eds.), Methodology in Social Research. New York: McGraw-Hill.
Boneau, C. A. (1960) "The effects of violations of assumptions underlying the 't' test." Psychological Bulletin, 57 (January): 49-64.
Breed, Warren (1963) "Occupational mobility and suicide among white males." American Sociological Review, 28 (April): 179-189.
Dublin, L. I. (1963) Suicide: A Sociological and Statistical Study. New York: Ronald Press.
Hirsh, Joseph (1959) "Suicide." Mental Hygiene, 43 (October): 516-526.
Labovitz, Sanford (1967) "Some observations on measurement and statistics." Social Forces, 46 (December): 151-160.
__________ (1968a) "Reply to Champion and Morris." Social Forces, 46 (June): 543-545.
__________ (1968b) "Variation in suicide rates." Pp. 57-74 in Jack P. Gibbs (ed.), Suicide. New York: Harper and Row.
Lindquist, E. F. (1953) Design and Analysis of Experiments. Pp. 78-90. New York: Houghton Mifflin.
Maris, Ronald (1967) "Suicide, status, and mobility in Chicago." Social Forces, 46 (December): 246-256.
Morris, Raymond N. (1968) "Commentary." Social Forces, 46 (June): 541-543.
Powell, Elwin M. (1958) "Occupation, status, and suicide: Toward a redefinition of anomie." American Sociological Review, 23 (April): 131-140.
Reiss, Albert J., Jr., with O. D. Duncan, Paul K. Hatt, and Cecil C. North (1961) Occupations and Social Status. New York: Free Press.
Somers, Robert H. (1962) "A new asymmetric measure of association for ordinal variables." American Sociological Review, 27 (December): 799-811.