CROSSTABS

CROSSTABS in SPSS 10

Crosstabs is an SPSS procedure that cross-tabulates two variables, thus displaying their relationship in tabular form. In contrast to Frequencies, which summarizes information about one variable, Crosstabs generates information about bivariate relationships.

Crosstabs creates a table that contains a cell for every combination of categories in the two variables.

Inside each cell is the number of cases that fit that particular combination of responses.
SPSS can also report the row, column, and total percentages for each cell of the table.
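
SPSS does this counting internally. As a rough illustration outside SPSS, the following Python sketch shows what "a cell for every combination of categories" means. It is not SPSS syntax, and both the toy data and the `crosstab` helper are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: one (row_value, column_value) pair per case.
cases = [
    ("Democrat", "liberal"),
    ("Democrat", "liberal"),
    ("Democrat", "moderate"),
    ("Republican", "moderate"),
    ("Republican", "conservative"),
    ("Republican", "conservative"),
    ("Republican", "conservative"),
]

def crosstab(pairs):
    """Count cases for every (row, column) combination, including empty cells."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur, so empty cells appear too.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = crosstab(cases)
for row, cells in table.items():
    print(row, cells)
```

Each inner dictionary is one row of the table, and each value is the number of cases in that cell.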

Because Crosstabs creates a row for each value in one variable and a column for each value in the other, the procedure is not suitable for continuous variables that assume many values. Crosstabs is designed for discrete variables--usually those measured on nominal or ordinal scales.

How crosstabs differs from a scatterplot under the "Graphs" Menu

Like crosstabs, a scatterplot portrays the joint distribution of two variables.
Unlike crosstabs, a scatterplot is designed for continuous variables, whose cases are distributed across many distinct points in space.
To underscore the difference, consider this scatterplot that plots "party identification" (dependent) by "ideology" (independent) for several hundred cases in the vote00 file:

A scattergram is not useful for this analysis because:

neither variable is continuous, so the cases are not spread along both axes
both variables are discrete, so their values occupy specific points along each axis
each marker in the scattergram simply indicates the occurrence of at least one case at that point
there are hundreds of cases represented in the plot, so many cases lie behind each marker -- but we can't tell how many.

Here is the Crosstabs output for the same two variables:

Crosstabulation -- K1x. PARTY ID SUMMARY by R's placement on Liberal-Conservative scale
	R's placement on Liberal-Conservative scale							Total
K1x. PARTY ID SUMMARY	extremely liberal	liberal	slightly liberal	moderate	slightly conservative	conservative	extremely conservative
Strong Democrat	7	35	24	32	10	15	1	124
Weak Democrat	2	13	17	38	11	8		89
Ind Democrat	6	16	21	46	7	11		107
Independent		4	8	30	9	10	3	64
Ind Republican		4	7	32	30	15	6	94
Weak Republican	1	2	6	24	29	24	6	92
Strong Republican		1	2	12	17	57	10	99
Total	16	75	85	214	113	140	26	669
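
To make the contrast with the scatterplot concrete, here is a small Python sketch (not SPSS syntax) that takes the counts from the table above and compares the number of cases with the number of distinct points a scatterplot could show. It assumes the scatterplot covers the same cases as the table and treats blank cells as zero.

```python
# Cell counts copied from the table above; blank cells are entered as 0.
# Rows: Strong Democrat ... Strong Republican.
# Columns: extremely liberal ... extremely conservative.
counts = [
    [7, 35, 24, 32, 10, 15, 1],
    [2, 13, 17, 38, 11, 8, 0],
    [6, 16, 21, 46, 7, 11, 0],
    [0, 4, 8, 30, 9, 10, 3],
    [0, 4, 7, 32, 30, 15, 6],
    [1, 2, 6, 24, 29, 24, 6],
    [0, 1, 2, 12, 17, 57, 10],
]

total_cases = sum(sum(row) for row in counts)
# A scatterplot shows one marker per non-empty cell, however many cases it holds.
occupied_points = sum(1 for row in counts for n in row if n > 0)
possible_points = len(counts) * len(counts[0])

print(total_cases, occupied_points, possible_points)  # 669 44 49
```

Under these assumptions, 669 cases collapse into 44 visible points, which is why the cell counts in the table carry information the plot cannot.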

Note two things about this table:

The cell entries indicate the number of cases (respondents) with that combination of scores on each variable.
The low values (0=Democrat) occur at the top of the table in crosstabs
- They occur at the bottom of the vertical axis in a scattergram
Below is the general, abstract form of crosstabs output:

	Independent Variable
Dependent Variable	category 1	category 2	. . category k	Totals
Category 1	Table entries consist of frequencies, or percentages, or both. Intersections of rows and columns are called "cells."			row 1
Category 2				row 2
. . Category j				. . row j
Totals	N	N	N	Grand N
Percents	100.0	100.0	100.0	100.0

Crosstabs are usually presented with the independent variable across the top and the dependent along the side.
This follows the presentation in scattergram plots.
As explained on page 68 of the Users' Guide, SPSS can calculate percentages for cell entries in three different ways:
- calculated according to the number of cases in each column
- calculated according to the number of cases in each row
- calculated according to the TOTAL number of cases in the table
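
The three bases can be illustrated with one cell from the party identification by ideology table shown earlier. This is a Python sketch of the arithmetic only, not SPSS syntax; the `cell_percentages` helper is hypothetical, and blank cells are treated as zero.

```python
# Cell counts copied from the party identification by ideology table;
# blank cells are entered as 0.
counts = [
    [7, 35, 24, 32, 10, 15, 1],
    [2, 13, 17, 38, 11, 8, 0],
    [6, 16, 21, 46, 7, 11, 0],
    [0, 4, 8, 30, 9, 10, 3],
    [0, 4, 7, 32, 30, 15, 6],
    [1, 2, 6, 24, 29, 24, 6],
    [0, 1, 2, 12, 17, 57, 10],
]

def cell_percentages(table, i, j):
    """Return (column %, row %, total %) for the cell in row i, column j."""
    n = table[i][j]
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    grand_total = sum(sum(row) for row in table)
    return (100.0 * n / col_total, 100.0 * n / row_total, 100.0 * n / grand_total)

# Strong Democrat (row 0) by liberal (column 1): 35 cases.
col_pct, row_pct, total_pct = cell_percentages(counts, 0, 1)
print(f"column {col_pct:.1f}%  row {row_pct:.1f}%  total {total_pct:.1f}%")
# column 46.7%  row 28.2%  total 5.2%
```

The same 35 cases are 46.7% of the 75 liberals, 28.2% of the 124 Strong Democrats, and 5.2% of all 669 respondents, so the choice of base changes what the percentage says.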

Conventions and advice concerning crosstabs

By convention, the independent variable is arranged across the top of the table, unless the number of categories or the available space prohibits it.
Percentages are ALWAYS computed within the categories of the independent variable -- as shown in the sample table.
- Percentages are computed by rows only if the layout of the data calls for placing the independent variable in the rows.
- That may be needed if there are more categories in the independent variable than fit easily along the columns.
- Only unique analytical needs invite calculating percentages by totals--avoid doing this unless you know why.
SPSS offers the option of calculating percentages all three ways, but that produces a cluttered table.
- Avoid checking all three options for percentages.
Limitations of crosstabs print format
- crosstabs tables in SPSS can't handle more than seven categories in a column variable without "wrapping" over.
- There is no limitation on the number of categories for the dependent variable -- down the side.
Consequences of the limitation
- Tables with more than 10 categories for the independent variable will be "wrapped around" and printed as a "continuation" of the first table.
- Consider this example:
  - Suppose an AGE variable has values ranging from 17 to 99 and an INCOME variable has 20 coding categories.
  - If one specified CROSSTABS INCOME BY AGE, only the first 10 of AGE's values could fit across the top of the page, and a continuation table would be printed on another page.
  - However, the command CROSSTABS AGE BY INCOME would place AGE along the side, allowing it to print out in full on one table (if one really wanted age by exact years).
    The AGE variable could also be "recoded" into fewer categories by using the RECODE command in SPSS.
  - RECODE can be used either to change or to combine codes assigned to variables in an SPSS file.
  - For example, V8 is a 7-category measure of party identification, ranging from 0 to 6.
  - These scores can be "recoded" to a 3-point scale as follows:
    
    RECODE V8 (0=1) (2,4=3) (6=5)
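  - When placed before the CROSSTABS command, RECODE will change the variable into a trichotomy: 1=Democrats, 3=Independents, and 5=Republicans. Codes not named in the command (1, 3, and 5) keep their original values, so codes 0 and 1 end up as 1, codes 2 through 4 as 3, and codes 5 and 6 as 5.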
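
The effect of that RECODE specification can be traced in a short Python sketch. This is an illustration of the mapping logic only, not SPSS syntax; `RECODE_MAP` and `recode` are hypothetical names, and the sketch assumes that values not listed in the specification are left unchanged.

```python
# Mapping equivalent to RECODE V8 (0=1) (2,4=3) (6=5).
RECODE_MAP = {0: 1, 2: 3, 4: 3, 6: 5}

def recode(value, mapping=RECODE_MAP):
    """Return the recoded value; values not named in the mapping are unchanged."""
    return mapping.get(value, value)

original = [0, 1, 2, 3, 4, 5, 6]
recoded = [recode(v) for v in original]
print(recoded)  # [1, 1, 3, 3, 3, 5, 5]
```

The seven original codes collapse to three values, which is what lets the recoded variable fit comfortably across the top of a table.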