
Statistical Inference
Sampling Distributions and Population Distributions
 
Probability distributions for CONTINUOUS variables
  • We will be using four major types of probability distributions:
    • The normal distribution, which you already encountered.
    • The t distribution, which you will learn next.
    • The F distribution, which is related to the t distribution.
    • The chi-square distribution.
  • Using the normal distribution as a probability distribution requires thinking in probability terms.
    • Sample statistics are used to estimate population parameters.
    • The theory assumes simple random samples.
  • Population parameters are descriptive characteristics of populations
    • The population mean is $\mu$ (mu).
    • The population variance is $\sigma^2$ (sigma squared), and its standard deviation is $\sigma$ (sigma).
  • Distinctions among terms and symbols for different "distributions":

                                                                         Mean         St. Dev.

    The POPULATION distribution (real but unknown)                       _________    _________
    The SAMPLE distribution (real and known)                             _________    _________
    The SAMPLING distribution (unreal--i.e., hypothetical--but known)    _________    _________
  • Illustrations with state vote for Clinton in 1996:

                                                                  Mean         St. Dev.

    Known population parameters for 51 states (including D.C.)    _________    _________
    Observed values for any sample (your sample mean and s.d.)    _________    _________
    Expected values of the sampling distribution                  _________    _________
  • The expected values are known by the CENTRAL LIMIT THEOREM: 
    If repeated samples of N observations are drawn from a population with mean $\mu$ and variance $\sigma^2$,
    then as N grows large, the sample means will become normally distributed with
    • mean $\mu$,
    • variance $\sigma^2/N$, and
    • standard deviation $\sigma/\sqrt{N}$.
  • The standard deviation of the sampling distribution of means
    • is known as the standard error of the mean,
    • and it is symbolized as $\sigma_{\bar{X}}$ -- that is, $\sigma_{\bar{X}} = \sigma/\sqrt{N}$.
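
    The central limit theorem can be checked by simulation. The sketch below is only an illustration under stated assumptions: the skewed population is invented (it is not the state vote data), the helper name simulate_sample_means is hypothetical, and samples are drawn with replacement so that the standard error formula applies without a finite-population correction.

```python
import random
import statistics


def simulate_sample_means(population, n, draws, seed=0):
    """Draw `draws` random samples of size n and return the mean of each one.

    Sampling is done with replacement (an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


# Hypothetical, deliberately skewed population (invented for illustration).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

mu = statistics.fmean(population)        # population mean
sigma = statistics.pstdev(population)    # population standard deviation

n = 50
means = simulate_sample_means(population, n, draws=5_000)

expected_se = sigma / n ** 0.5           # standard error predicted by the theorem
observed_mean = statistics.fmean(means)  # mean of the simulated sample means
observed_se = statistics.pstdev(means)   # s.d. of the simulated sample means

print(f"population mean {mu:.3f} vs. mean of sample means {observed_mean:.3f}")
print(f"sigma/sqrt(N) {expected_se:.3f} vs. s.d. of sample means {observed_se:.3f}")
```

    Even though the population is far from normal, the sample means should center on the population mean, and their standard deviation should come close to the standard error of the mean.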


Using the central limit theorem

  • to make point estimates and
  • to construct confidence intervals for population parameters from sample data. 


    POINT ESTIMATES of population parameters

  • A point estimate is the value of a sample statistic used to estimate a population parameter.
    • Any population parameter can be estimated from sample data -- mean, standard deviation, correlation coefficient, etc. 
    • We will be concerned here only with point estimates of the mean.
    • The logic of inference for point estimates of the mean extends to other parameters -- e.g., the correlation coefficient, slope, etc.
    • Point estimates are "best guesses" and carry a margin of sampling error.
  • Desirable properties of a point estimate of a population parameter:
    • A good estimate is unbiased:
      • the mean of the sampling distribution of the statistic is equal to the parameter.
        • the Central Limit Theorem tells us this is true for sample estimates of the population mean.
        • however, when N is used in the denominator to calculate the variance (and hence the standard deviation) of sample data, the result is a biased estimate of the population value: on average it is too small. Dividing by N-1 instead makes the sample variance an unbiased estimate of the population variance.
    • A good estimate is consistent: it approaches the parameter as the sample size increases.
    • A good estimate is efficient: its sampling distribution has a smaller standard deviation (standard error) than that of any rival statistic -- e.g., for a normally distributed population, the sample mean is a more efficient estimate of the population mean than is the median, and the median is more efficient than the mode.
  • Using the standard error of the sample statistic, the margin of sampling error in a point estimate can be expressed precisely in a confidence interval.
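
    Two of the properties above, unbiasedness and efficiency, can be illustrated by simulation. This is a minimal sketch, assuming a normal population with invented parameters (MU = 50, SIGMA = 10) and a small sample size. It compares the variance computed with N and with N-1 in the denominator, and the standard errors of the sample mean and the sample median.

```python
import random
import statistics

rng = random.Random(1)
MU, SIGMA = 50.0, 10.0    # hypothetical population parameters
N, DRAWS = 10, 20_000     # sample size and number of repeated samples

var_n, var_n1, means, medians = [], [], [], []
for _ in range(DRAWS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    var_n.append(statistics.pvariance(sample))   # denominator N
    var_n1.append(statistics.variance(sample))   # denominator N-1
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

avg_var_n = statistics.fmean(var_n)      # tends toward SIGMA**2 * (N-1)/N
avg_var_n1 = statistics.fmean(var_n1)    # tends toward SIGMA**2
se_mean = statistics.pstdev(means)       # standard error of the sample mean
se_median = statistics.pstdev(medians)   # standard error of the sample median

print(f"true variance {SIGMA ** 2:.1f}")
print(f"average variance with N:   {avg_var_n:.1f}")
print(f"average variance with N-1: {avg_var_n1:.1f}")
print(f"standard error of mean {se_mean:.2f}, of median {se_median:.2f}")
```

    Under these assumptions, the N version should average noticeably below the true variance, the N-1 version should average close to it, and the median should show the larger standard error.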


    CONFIDENCE INTERVALS for population parameters

  • A confidence interval is a range of values around a point estimate, constructed so that the interval has a stated probability of containing the population parameter between its lower and upper limits.
    • A confidence interval for the mean is computed by adding and subtracting a multiple of the standard error around the sample mean (about 1.96 standard errors for 95% confidence, using the normal distribution).
    • A confidence interval is always associated with a level of confidence that the estimate will be correct.
    Example of point estimates and confidence intervals in the boxed note that the New York Times prints along with its surveys:
     
    Point estimate in article: "40% in Survey Say Inflation is Major Issue"
    Confidence interval as expressed in the New York Times:

    "In theory, one can say with 95 percent certainty that the results based on the entire sample differ by no more than 3 percentage points in either direction from what would have been obtained by interviewing all adult Americans."

    Confidence interval as expressed in statistics: 

    We are 95% confident that the population parameter (estimated by the 40% figure) lies between 37% and 43%.
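
    The arithmetic behind an interval like this one can be sketched as follows. Several things here are assumptions rather than facts from the survey: the sample size of 1,024 is invented (the boxed note quoted above does not give one), the function name proportion_interval is hypothetical, and the standard error of a proportion is taken as the square root of p(1-p)/n, treating the proportion as the mean of a 0/1 variable.

```python
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Return (low, high) limits of a normal-approximation interval for a proportion."""
    # Critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of a proportion, viewed as the mean of a 0/1 variable.
    standard_error = (p_hat * (1 - p_hat) / n) ** 0.5
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Point estimate of 40% with a hypothetical sample of 1,024 respondents.
low, high = proportion_interval(0.40, 1024)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

    With this assumed sample size the margin comes to about 3 percentage points on either side of the estimate, matching the 37% to 43% range stated above.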

  • Factors in the confidence interval estimate of a population mean
    • Variability in the sampling distribution of the mean, based on
      • The amount of variation of values in the population
      • The size of the sample, N
      • Both factors are weighed in the standard error of the mean: $\sigma_{\bar{X}} = \sigma/\sqrt{N}$
    • Degree of confidence in making the estimate
      • Level of confidence, e.g., .95, .99, .999
      • The level of confidence is the complement of the alpha value, e.g., .05, .01, .001
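
    The way these factors combine can be sketched in a few lines. This is an illustration under assumptions, not a definitive procedure: the function name mean_confidence_interval and all the numbers are invented, and the population standard deviation is treated as known so that the normal distribution supplies the critical value.

```python
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (low, high) limits for the population mean, assuming sigma is known."""
    alpha = 1 - confidence                    # e.g., .05 for a .95 level of confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    standard_error = sigma / n ** 0.5         # population variation and sample size
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical values: sample mean 50, sigma 10, N = 100.
for confidence in (0.95, 0.99, 0.999):
    low, high = mean_confidence_interval(50.0, 10.0, 100, confidence)
    print(f"{confidence}: {low:.2f} to {high:.2f}")
```

    Raising the level of confidence widens the interval, while a larger sample or a less variable population narrows it; quadrupling N cuts the width in half.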