Bivariate Distributions: Strength, Form, Significance
Statistical methods to use when both variables are DISCRETE and NONORDERABLE
- Suitable SPSS procedure, under the Analyze Menu and then Descriptive Statistics, is crosstabs
- The "Cells" button in
crosstabs computes percentages for cell entries
based on three different totals:
- by Column
-- appropriate when the independent variable is in
the columns--the usual case
- by Row --
appropriate when the row variable is treated as the
independent variable
- by Total of
all the cases in the table--used only under special
circumstances of analysis
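A minimal sketch of the same three percentage bases in Python rather than SPSS, using pandas crosstab on a small made-up dataset (the variables "gender" and "vote" are hypothetical):

```python
# Reproducing crosstabs cell percentages with pandas on toy data.
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "F", "M", "F", "M"],
    "vote":   ["yes", "no", "yes", "yes", "no", "no", "yes", "yes"],
})

# Column percentages: each column sums to 100% (the usual case,
# with the independent variable in the columns).
print(pd.crosstab(df["vote"], df["gender"], normalize="columns") * 100)

# Row percentages: each row sums to 100%.
print(pd.crosstab(df["vote"], df["gender"], normalize="index") * 100)

# Total percentages: all cells sum to 100%.
print(pd.crosstab(df["vote"], df["gender"], normalize="all") * 100)
```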
- The "Statistics"
button in crosstabs leads to boxes for several
types of statistics:
- chi-square
for tests of independence between the variables in
the table
- familiar Pearson
bivariate correlations between the
variables
  - Measures of association for nominal data
    - contingency coefficient
    - Phi and Cramer's V
    - Lambda
    - Uncertainty coefficient (which we did not cover)
  - Measures of association for ordinal data
    - Gamma
    - Somers' d
    - Kendall's Tau-b
    - Kendall's Tau-c
  - A measure of association for interval data by nominal classification: eta
  - Other assorted statistics that we didn't cover.
- Strength of the relationship can be measured by these measures, none of which we really studied (a computational sketch follows this list).
  - Lambda, which offers the best PRE interpretation for predicting the dependent variable from knowledge of the independent
  - Contingency Coefficient, C, which is based on chi-square but has no operational interpretation, cannot reach 1.0, and cannot be compared across tables of different size
  - Cramer's V, another chi-square measure, which can be compared across tables of different size and ranges between 0 and 1.0 but has no PRE interpretation
  - Phi, yet another chi-square based measure that does range between 0 and 1.0 but is suitable only for 2x2 tables
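A minimal sketch of the chi-square-based strength measures in Python, computed from their standard formulas on a hypothetical 2x2 table (the counts are made up for illustration):

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 10],
                  [15, 25]])
n = table.sum()

chi2, p, dof, expected = chi2_contingency(table, correction=False)

phi = np.sqrt(chi2 / n)            # suitable only for 2x2 tables
C = np.sqrt(chi2 / (chi2 + n))     # contingency coefficient; cannot reach 1.0
k = min(table.shape)               # smaller of (rows, columns)
V = np.sqrt(chi2 / (n * (k - 1)))  # Cramer's V; comparable across table sizes

print(f"chi2 = {chi2:.3f}, phi = {phi:.3f}, C = {C:.3f}, V = {V:.3f}")
```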
- Form of the relationship between two nominal variables can be determined only by inspection of the cell entries to see where cases cluster and where they are absent.
- Significance of the relationship is best determined by
  - Chi-square, χ², which measures the difference between observed and expected cell frequencies. Significance is tested by entering the chi-square table for the appropriate degrees of freedom (sketched below).
  - The significance of lambda is calculated by SPSS, using a complex formula for lambda's standard error.
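A minimal sketch of the chi-square test of independence in Python rather than SPSS, on a hypothetical table of observed counts:

```python
# scipy compares observed to expected frequencies and looks up the
# p-value for the table's degrees of freedom, (rows - 1) * (columns - 1).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 15, 5],
                     [10, 25, 25]])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")
print("expected frequencies:\n", expected.round(2))
```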
Statistics for variables that are CONTINUOUS or DISCRETE ORDERABLE
- Suitable SPSS procedures
  - crosstabs (requesting the Pearson correlation coefficient)
  - correlate
  - scatterplot under the Graph Menu --> Interactive
- Strength can be measured by the Pearson product moment correlation, r, which can be calculated by several formulas (see the sketch after this list).
  - Actually, the strength of a correlation is expressed by r², called the coefficient of determination.
  - It can be interpreted as the proportion of variance in the dependent variable that is "explained" by the independent variable.
  - Of course, correlation does not mean causation, so explanation is assessed theoretically rather than proved.
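A minimal sketch in Python, with made-up x and y values, showing r and the coefficient of determination r²:

```python
from scipy.stats import pearsonr

x = [2, 4, 5, 7, 9, 11, 12, 15]
y = [1, 3, 4, 6, 8, 9, 12, 14]

r, p = pearsonr(x, y)
# r squared: proportion of variance in y "explained" by x.
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, p = {p:.4f}")
```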
- Form can be measured by the regression equation for the raw data (see the sketch after this list).
  - If the data are in standardized form, the b coefficient becomes a standardized beta coefficient and is equal to the correlation coefficient.
  - The intercept then becomes zero and drops out of the equation.
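A minimal sketch with made-up data showing that standardizing both variables (z-scores) makes the slope equal to r and drops the intercept to zero:

```python
import numpy as np
from scipy.stats import linregress, zscore

x = np.array([2, 4, 5, 7, 9, 11, 12, 15], dtype=float)
y = np.array([1, 3, 4, 6, 8, 9, 12, 14], dtype=float)

# Regression on the raw data: y = a + bx.
raw = linregress(x, y)
print(f"raw:          y = {raw.intercept:.3f} + {raw.slope:.3f}x, r = {raw.rvalue:.3f}")

# Regression on standardized data: beta equals r, intercept is zero.
std = linregress(zscore(x), zscore(y))
print(f"standardized: beta = {std.slope:.3f}, intercept = {std.intercept:.3f}")
```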
- Significance can be tested easily against the null hypothesis by calculating a t statistic (see the sketch after this list).
  - The test is much more complicated when a non-zero r is hypothesized, but we did not take up that situation.
  - In that case the observed and expected r's must be transformed into something called "Fisher's Z" and a different test used.
  - You do not need to know this procedure.
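A minimal sketch of the t-test for r against the null hypothesis of zero correlation, using the standard formula t = r√(n-2)/√(1-r²); the r and n values are made up for illustration:

```python
from math import sqrt
from scipy.stats import t as t_dist

r, n = 0.45, 30
t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
df = n - 2
p = 2 * t_dist.sf(abs(t_stat), df)  # two-tailed p-value
print(f"t = {t_stat:.3f}, df = {df}, p = {p:.4f}")
```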
Statistical methods used when the dependent variable is CONTINUOUS or ORDERED DISCRETE and the independent variable is NOMINAL
- Suitable SPSS procedures under the Analyze Menu, then Compare Means
  - one-sample t-test
  - means
  - One-Way ANOVA
- Significance can be determined by the
  - t-test, when the nominal variable is a dichotomy and the test becomes one for the difference between two means (see the sketch after this list).
    - The appropriate form of the t-test depends on
      - the independence of the two samples: for independent samples, use the Independent Samples T Test
      - if the cases are "matched" and then tested, use the Paired Samples T Test
      - the "equality" (or "homogeneity") of the variance for the dependent variable in each sample
        - Levene's test for equality of variances directs you to consult
          - either the line for equal variances, which uses a "pooled" variance estimate
          - or the line for unequal variances, which uses a "separate" estimate
          - usually, the two estimates will produce similar results regarding the null hypothesis.
  - F-test, which follows from the analysis of variance as a generalization of the difference-of-means test (t-test) to k groups (also sketched below).
    - In fact, F = t² for 1 df between groups.
    - The F-test is the ratio of the mean sum of squares calculated between groups to that calculated within groups.
    - Put another way, it is the ratio of the mean between SS to the mean within SS.
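A minimal sketch of these tests in Python rather than SPSS, on made-up group scores: Levene's test, the independent-samples t-test with pooled and separate variance estimates, and the one-way ANOVA F-test, including the F = t² check for two groups:

```python
from scipy.stats import levene, ttest_ind, f_oneway

g1 = [4, 5, 6, 6, 7, 8]
g2 = [6, 7, 7, 8, 9, 10]
g3 = [8, 9, 9, 10, 11, 12]

# Levene's test: a small p-value suggests unequal variances.
print("Levene:", levene(g1, g2))

# Equal variances assumed (pooled estimate) vs. not assumed (separate).
print("pooled:  ", ttest_ind(g1, g2, equal_var=True))
print("separate:", ttest_ind(g1, g2, equal_var=False))

# The F-test generalizes the difference-of-means test to k groups;
# for two groups, F equals t squared.
t, p_t = ttest_ind(g1, g2, equal_var=True)
F, p_f = f_oneway(g1, g2)
print(f"t^2 = {t**2:.3f}, F = {F:.3f}")

print("ANOVA over 3 groups:", f_oneway(g1, g2, g3))
```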
- Strength can be measured by eta-squared, which is a measure of the explained variation (between-group SS) divided by the total variation (total SS). Thus it is analogous to the product moment correlation and is indeed equal to it in the special case of a dichotomous variable being correlated with an interval variable (see the sketch below).
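A minimal sketch computing eta-squared directly from its definition, between-group SS over total SS, on made-up group data:

```python
import numpy as np

groups = [np.array([4, 5, 6, 6, 7, 8.0]),
          np.array([6, 7, 7, 8, 9, 10.0]),
          np.array([8, 9, 9, 10, 11, 12.0])]

all_y = np.concatenate(groups)
grand_mean = all_y.mean()

# Total SS: squared deviations of every case from the grand mean.
ss_total = ((all_y - grand_mean) ** 2).sum()
# Between-group SS: squared deviations of group means from the grand mean,
# weighted by group size.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

eta_squared = ss_between / ss_total
print(f"eta-squared = {eta_squared:.3f}")
```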
- Form can be determined only by graphing the means of the dependent variable for each category and seeing which go up and which go down.