Correlation Artifacts


Correlational analysis often produces high correlations that are misleading given the intent of the analysis. One of the most common errors--made by more than half the class every year that I've taught statistics--is correlating two variables that are both functions of a third variable. This problem arises most often with population size. Because the observed correlation is artificially high (relative to one's understanding of the relationship), it is called an artifact--not a true reflection of any causal relationship between the two variables.

Consider this example, which correlated V1557 (Total federal income tax, 1988, in millions of dollars) with V1571 (Federal Funds & Grants, 1989, in millions of dollars), two variables in the STATES file. The correlation was above .95, but this high correlation essentially reflected only the population sizes of the states in the analysis. California tops both variables, New York is next, and so on down to the small states.
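
The same effect can be reproduced with made-up numbers. The sketch below does not use the STATES file: it simulates 50 hypothetical "states" whose per-person tax and per-person grant amounts are drawn independently of each other, then multiplies each by the state's population to get totals. All names (simulate, pearson_r, the ranges chosen) are assumptions for illustration only. If the reasoning above holds, the totals should correlate strongly while the per-capita figures should not.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(seed=1988, n_states=50):
    """Return (r for totals, r for per-capita values) on simulated data."""
    rng = random.Random(seed)

    # Hypothetical populations in millions: a few large states, many small ones.
    population = [rng.lognormvariate(1.0, 1.0) for _ in range(n_states)]

    # Per-person amounts are drawn INDEPENDENTLY, so by construction there
    # is no real relationship between the two per-capita variables.
    tax_per_capita = [rng.uniform(0.8, 1.2) for _ in range(n_states)]
    grants_per_capita = [rng.uniform(0.8, 1.2) for _ in range(n_states)]

    # State totals are per-capita amounts scaled up by population.
    tax_total = [p * t for p, t in zip(population, tax_per_capita)]
    grants_total = [p * g for p, g in zip(population, grants_per_capita)]

    r_totals = pearson_r(tax_total, grants_total)
    r_per_capita = pearson_r(tax_per_capita, grants_per_capita)
    return r_totals, r_per_capita


r_totals, r_per_capita = simulate()
print(f"r between state totals:      {r_totals:.2f}")
print(f"r between per-capita values: {r_per_capita:.2f}")
```

Dividing each total by population before correlating is one common way to check whether a high r is only a size artifact; whether per-capita figures are the right measure depends on the question being asked.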