Correlational analysis often produces high correlations
that are misleading given the purpose of the
analysis. One of the most common errors--made by more
than half the class every year that I've taught
statistics--is correlating two variables that are both
functions of a third variable. This problem arises most
often when that third variable is population size. Because the observed
correlation is artificially high (given one's
understanding of the relationship), it is called an
**artifact**--not a true reflection of any
causal relationship between the two variables.
Consider this example, which
correlated **V1557** (Total federal income tax,
1988 in millions of dollars) with
**V1571** (Federal Funds & Grants in 1989 in
million $), two variables in the STATES file. The
correlation was above .95, but this high correlation
essentially reflected only the population sizes of the
states in the analysis. California tops both variables,
New York is next, and so on down to the small
states.
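
To see how this kind of artifact can arise, here is a minimal sketch in Python. It does not use the STATES file: the populations and per-capita rates are made-up numbers, and the names (`population`, `tax_rate`, `grant_rate`, `pearson`) are chosen only for illustration. The two per-capita rates are drawn independently, so they are unrelated by construction; each total is then computed as population times its rate.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data for 50 "states"; none of this comes from the STATES file.
# A lognormal draw gives a few very large states and many small ones.
population = [random.lognormvariate(1.0, 1.0) for _ in range(50)]

# Per-capita rates are drawn independently of each other and of population.
tax_rate = [random.gauss(2.0, 0.2) for _ in population]
grant_rate = [random.gauss(0.5, 0.05) for _ in population]

# Each total is a function of population: total = population * rate.
tax_total = [p * r for p, r in zip(population, tax_rate)]
grant_total = [p * r for p, r in zip(population, grant_rate)]

r_totals = pearson(tax_total, grant_total)
r_per_capita = pearson(tax_rate, grant_rate)

print(f"r between totals:           {r_totals:.2f}")
print(f"r between per-capita rates: {r_per_capita:.2f}")
```

Under these assumptions the totals should correlate strongly while the per-capita rates should not, because population is the only thing the two totals have in common. Dividing each variable by population before correlating is one common way to guard against the artifact.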