Path: janda.org/c10 > Syllabus > Topics and Readings > Relations between Interval Variables > Significance of r
Significance of the Correlation Coefficient

Test for the significance of relationships between two CONTINUOUS variables

  • We introduced Pearson correlation as a measure of the STRENGTH of a relationship between two variables
  • But any relationship should be assessed for its SIGNIFICANCE as well as its strength.

A general discussion of significance tests for relationships between two continuous variables.

  • Factors in relationships between two variables
    • The strength of the relationship:
      • is indicated by the correlation coefficient: r
      • but is actually measured by the coefficient of determination: r2
    • The significance of the relationship
      • is expressed in probability levels: p (e.g., significant at p =.05)
      • This tells how unlikely a given correlation coefficient, r, will occur given no relationship in the population
        • NOTE! NOTE! NOTE! The smaller the p-level, the more significant the relationship
        • BUT! BUT! BUT! The larger the correlation, the stronger the relationship
  • Consider the classical model for testing significance
    • It assumes that you have a sample of cases from a population
    • The question is whether your observed statistic for the sample is likely to be observed given some assumption of the corresponding population parameter.
    • If your observed statistic does not exactly match the population parameter, perhaps the difference is due to sampling error
    • The fundamental question: is the difference between what you observe and what you expect given the assumption of the population large enough to be significant -- to reject the assumption?
    • The greater the difference -- the more the sample statistic deviates from the population parameter -- the more significant it is
    • That is, the lessl ikely (small probability values) that the population assumption is true.
  •  The classical model makes some assumptions about the population parameter:
    • Population parameters are expressed as Greek letters, while corresponding sample statistics are expressed in lower-case Roman letters:
      • r = correlation between two variables in the sample
      • (rho) = correlation between the same two variables in the population
    • A common assumption is that there is NO relationship between X and Y in the population: = 0.0
    • Under this common null hypothesis in correlational analysis: r = 0.0 


Testing for the significance of the correlation coefficient, r

  • When the test is against the null hypothesis: r xy = 0.0
    • What is the likelihood of drawing a sample with r xy ­ 0.0?
    • The sampling distribution of r is
      • approximately normal (but bounded at -1.0 and +1.0) when N is large
      • and distributes t when N is small.
    • The simplest formula for computing the appropriate t value to test significance of a correlation coefficient employs the t distribution:
    •  The degrees of freedom for entering the t-distribution is N - 2
  • Example: Suppose you obsserve that r= .50 between literacy rate and political stability in 10 nations
    • Is this relationship "strong"?
      • Coefficient of determination = r-squared = .25
      • Means that 25% of variance in political stability is "explained" by literacy rate
    • Is the relationship "significant"?
    • That remains to be determined using the formula above
      r = .50 and N=10

      set level of significance (assume .05)

      determine one-or two-tailed test (aim for one-tailed)

    • For 8 df and one-tailed test, critical value of t = 1.86
      • We observe only t = 1.63
      • It lies below the critical t of 1.86
      • So the null hypothesis of no relationship in the population (r = 0) cannot be rejected
         
  • Comments
    • Note that a relationship can be strong and yet not significant
    • Conversely, a relationship can be weak but significant
      • The key factor is the size of the sample.
      • For small samples, it is easy to produce a strong correlation by chance and one must pay attention to signficance to keep from jumping to conclusions: i.e.,
        • rejecting a true null hypothesis,
        • which meansmaking a Type I error.
    • For large samples, it is easy to achieve significance, and one must pay attention to the strength of the correlation to determine if the relationship explains very much.
  •  Alternative ways of testing significance of r against the null hypothesis
    • Look up the values in a table
    • Read them off the SPSS output:
      • check to see whether SPSS is making a one-tailed test
      • or a two-tailed test 


Testing the significance of r when r is NOT assumed to be 0

  • This is a more complex procedure, which is discussed briefly in the Kirk reading
    • The test requires first transforming the sample r to a new value, Z'.
      • This test is seldom used.
      • You will not be responsible for it.