
Programs that run canonical analysis


  • Canonical analysis is a standard procedure in Stata, SAS, and Statistica.
  • SPSS treats canonical analysis differently.
    • In early versions of SPSS, canonical analysis was a standard procedure.
    • Years ago, it was relegated to running as a "macro" using the "Syntax Command" structure.
    • Go to the SPSS HELP menu, search for "canonical" as a topic, and then read.
    • I have done that in the past, but I could not get SPSS to cooperate with me for this workshop.
  • I'll show only the results of the analysis, not how to execute the SPSS commands.
  • The typical computer output (this discussion comes from Statistica) reports: 

Eigenvalues. When extracting the canonical roots, STATISTICA computes the eigenvalues. These can be interpreted as the proportion of variance accounted for by the correlation between the respective canonical variates. Note that the proportion here is computed relative to the variance of the canonical variates, that is, of the weighted sum scores of the two sets of variables; the eigenvalues do not tell how much variability is explained in either set of variables. The program will compute as many eigenvalues as there are canonical roots, that is, as many as there are variables in the smaller of the two sets.

Canonical correlations. If the square roots of the eigenvalues are taken, then the resulting numbers can be interpreted as correlation coefficients. Because the correlations pertain to the canonical variates, they are called canonical correlations. Like the eigenvalues, the correlations between successively extracted canonical variates are smaller and smaller. Therefore, as an overall index of the canonical correlation between two sets of variables, it is customary to report the largest correlation, that is, the one for the first root. However, the other canonical variates can also be correlated in a meaningful and interpretable manner (see below).
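
As a rough sketch of the two paragraphs above, the eigenvalues can be computed from the correlation matrices of the two sets, and their square roots then give the canonical correlations. This is written in Python with NumPy and is not Statistica's own routine; the function name and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

def canonical_eigenvalues(X, Y):
    """Eigenvalues (squared canonical correlations), largest first."""
    p, q = X.shape[1], Y.shape[1]
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    Rxx, Rxy = R[:p, :p], R[:p, p:]
    Ryx, Ryy = R[p:, :p], R[p:, p:]
    # The eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx are the squared canonical correlations.
    M = np.linalg.solve(Rxx, Rxy) @ np.linalg.solve(Ryy, Ryx)
    eig = np.sort(np.linalg.eigvals(M).real)[::-1]
    # One root per variable in the smaller set; clip rounding noise to [0, 1].
    return np.clip(eig[:min(p, q)], 0.0, 1.0)

# Simulated data (illustrative only): 3 variables in one set, 2 in the other.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X[:, :2] @ np.array([[0.8, 0.1], [0.2, 0.5]]) + rng.normal(size=(200, 2))

eigenvalues = canonical_eigenvalues(X, Y)
canonical_correlations = np.sqrt(eigenvalues)
print("eigenvalues:", eigenvalues)
print("canonical correlations:", canonical_correlations)
```

With three variables in one set and two in the other, the sketch returns two roots, and the first canonical correlation is at least as large as any simple correlation between a variable in one set and a variable in the other.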

Canonical weights. After determining the number of significant canonical roots, the question arises as to how to interpret each (significant) root. Remember that each root actually represents two weighted sums, one for each set of variables. One way to interpret the "meaning" of each canonical root would be to look at the weights for each set. These weights are called the canonical weights.
In general, the larger the weight (i.e., the absolute value of the weight), the greater is the respective variable's unique positive or negative contribution to the sum. To facilitate comparisons between weights, the canonical weights are usually reported for the standardized variables, that is, for the z transformed variables with a mean of 0 and a standard deviation of 1.
If you are familiar with multiple regression, you may interpret the canonical weights in the same manner as you would interpret the beta weights in a multiple regression equation. In a sense, they represent the partial correlations of the variables with the respective canonical root. If you are familiar with factor analysis, you can interpret the canonical weights in the same manner as you would interpret the factor score coefficients. To summarize, the canonical weights allow the user to understand the "make-up" of each canonical root; that is, they let the user see how each variable in each set uniquely contributes to the respective weighted sum (canonical variate).

Canonical scores. Canonical weights can also be used to compute actual values of the canonical variates; that is, you can simply use the weights to compute the respective sums. Again, remember that the canonical weights are customarily reported for the standardized (z transformed) variables. Statistica's Canonical Analysis module will automatically compute the canonical scores for you, which then can be saved for further analyses with other modules.
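
The relation between standardized weights and canonical scores can be sketched as follows. This is one common way to obtain the weights (a singular value decomposition of the whitened cross-correlation matrix), written in Python with NumPy; it is an assumption for illustration, not a description of how Statistica computes them. The helper names and the simulated data are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def canonical_weights(X, Y):
    """Return canonical correlations, standardized weights, and z-scored data."""
    n = X.shape[0]
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    Rxx = Zx.T @ Zx / (n - 1)
    Ryy = Zy.T @ Zy / (n - 1)
    Rxy = Zx.T @ Zy / (n - 1)
    Kx, Ky = inv_sqrt(Rxx), inv_sqrt(Ryy)
    U, r, Vt = np.linalg.svd(Kx @ Rxy @ Ky, full_matrices=False)
    # Columns of A and B hold the weights for each root, largest root first.
    return r, Kx @ U, Ky @ Vt.T, Zx, Zy

# Simulated data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
Y = X[:, :2] @ np.array([[0.7, 0.2], [0.1, 0.6]]) + rng.normal(size=(300, 2))

r, A, B, Zx, Zy = canonical_weights(X, Y)

# Canonical scores are simply the weighted sums of the z-transformed variables.
scores_x = Zx @ A
scores_y = Zy @ B
print("first canonical correlation:", r[0])
print("correlation of first score pair:", np.corrcoef(scores_x[:, 0], scores_y[:, 0])[0, 1])
```

The two printed numbers agree: the correlation between the paired score columns is the canonical correlation for that root. The sign of each weight column is arbitrary, so only the pattern of relative magnitudes and signs within a column should be interpreted.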

Factor structure. Another way of interpreting the canonical roots is to look at the simple correlations between the canonical variates (or factors) and the variables in each set. These correlations are also called canonical factor loadings. The logic here is that variables that are highly correlated with a canonical variate have more in common with it. Therefore, you should weigh them more heavily when deriving a meaningful interpretation of the respective canonical variate. This method of interpreting canonical variates is identical to the manner in which factors are interpreted in factor analysis.

Factor structure versus canonical weights. Sometimes, the canonical weights for a variable are nearly zero, but the respective loading for the variable is very high. The opposite pattern of results may also occur. At first, such a finding may seem contradictory; however, remember that the canonical weights pertain to the unique contribution of each variable, while the canonical factor loadings represent simple overall correlations. For example, suppose you included in your satisfaction survey two items that measured basically the same thing, namely: (1) "Are you satisfied with your supervisors?" and (2) "Are you satisfied with your bosses?" Obviously, these items are very redundant. When the program computes the weights for the weighted sums (canonical variates) in each set so that they correlate maximally, it only "needs" to include one of the items to capture the essence of what they measure. Once a large weight is assigned to the first item, the contribution of the second item is redundant; consequently, it will receive a zero or negligibly small canonical weight. Nevertheless, if you then look at the simple correlations between the respective sum score and the two items (i.e., the factor loadings), those may be substantial for both. To reiterate, the canonical weights pertain to the unique contributions of the respective variables to a particular weighted sum or canonical variate; the canonical factor loadings pertain to the overall correlation of the respective variables with the canonical variate.
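
The supervisors/bosses example can be illustrated with a small simulation. Note the assumptions: the data are simulated, and the weights are set by hand to (1, 0) to mimic the outcome described above rather than estimated by a canonical analysis. The sketch only shows that a variable with a zero weight can still have a high loading.

```python
import numpy as np

# Simulated survey items (illustrative only): item 2 nearly duplicates item 1.
rng = np.random.default_rng(2)
n = 500
supervisors = rng.normal(size=n)
bosses = supervisors + 0.3 * rng.normal(size=n)
items = np.column_stack([supervisors, bosses])
Z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)

# Hypothetical weights: all of the weight on item 1, none on item 2.
weights = np.array([1.0, 0.0])
variate = Z @ weights

# Loadings are the simple correlations between each item and the weighted sum.
loadings = np.array([np.corrcoef(Z[:, j], variate)[0, 1] for j in range(Z.shape[1])])
print("weights: ", weights)
print("loadings:", loadings)
```

Although the second item contributes nothing to the weighted sum, its loading stays high because it correlates strongly with the first item.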

Variance extracted. As discussed earlier, the canonical correlation coefficient refers to the correlation between the weighted sums of the two sets of variables. It says nothing about how much variability (variance) each canonical root explains in the variables. However, you can infer the proportion of variance extracted from each set of variables by a particular root by looking at the canonical factor loadings. Remember that those loadings represent correlations between the canonical variates and the variables in the respective set. If you square those correlations, the resulting numbers reflect the proportion of variance accounted for in each variable. For each root, you can take the average of those proportions across variables to get an indication of how much variability is explained, on average, by the respective canonical variate in that set of variables. Put another way, you can compute in this manner the average proportion of variance extracted by each root.
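
The arithmetic just described is short enough to sketch directly. The loadings below are made-up numbers for one set of three variables and two roots, used only to show the calculation; they do not come from any analysis in this workshop.

```python
import numpy as np

# Hypothetical canonical factor loadings: rows = 3 variables, columns = 2 roots.
loadings = np.array([
    [0.9,  0.2],
    [0.8, -0.3],
    [0.1,  0.7],
])

# Squared loadings: proportion of variance accounted for in each variable.
variance_per_variable = loadings ** 2

# Averaging over the variables gives the variance extracted by each root.
variance_extracted = variance_per_variable.mean(axis=0)
print(variance_extracted)
```

For these illustrative loadings, the first root extracts about 49 percent of the variance in the set, on average, and the second about 21 percent.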