
 Statistical Inference
Sampling Distributions and Population Distributions

Probability distributions for CONTINUOUS variables
• We will be using four major types of probability distributions:
• The normal distribution, which you have already encountered.
• The t distribution, which you will learn next.
• The F distribution, which is related to the t distribution.
• The chi-square distribution.
• Using the normal distribution as a probability distribution requires thinking in probability terms.
• Sample statistics are used to estimate population parameters
• Theory is based on simple random samples
• Population parameters are descriptive characteristics of populations
• The population mean is $\mu$ (mu).
• The population variance is $\sigma^2$ (sigma squared), and its standard deviation is $\sigma$ (sigma).
• Distinctions among terms and symbols for different "distributions":
                                                                          Mean        St. Dev.
    The POPULATION distribution (real but unknown)                    _________   _________
    The SAMPLE distribution (real and known)                          _________   _________
    The SAMPLING distribution (unreal, i.e., hypothetical, but known) _________   _________
• Illustrations with the state vote for Clinton in 1996:
                                                               Mean        St. Dev.
    Known population parameters for 51 states                  _________   _________
    Observed values for any sample (your sample mean and s.d.) _________   _________
    Expected values of the sampling distribution               _________   _________
• The expected values are given by the CENTRAL LIMIT THEOREM:
If repeated samples of N observations are drawn from a population with mean $\mu$ and variance $\sigma^2$,
then as N grows large, the distribution of the sample means approaches a normal distribution with
• mean $\mu$,
• variance $\sigma^2/N$, and
• standard deviation $\sigma/\sqrt{N}$.
• The standard deviation of the sampling distribution of means
• is known as the standard error of the mean,
• and it is symbolized as $\sigma_{\bar{X}}$; that is, $\sigma_{\bar{X}} = \sigma/\sqrt{N}$.
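The three kinds of distribution in the table above, and the Central Limit Theorem's prediction for the sampling distribution, can be checked by simulation. The following is a minimal Python sketch under stated assumptions: the population is a hypothetical, deliberately skewed set of 10,000 values generated with `random.expovariate` (it is not the 1996 Clinton vote data), and the names `N` and `REPS` are illustrative choices, not part of the original notes.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical, right-skewed population of 10,000 values (mean about 50).
# It stands in for real data; it is NOT the 1996 Clinton vote by state.
population = [random.expovariate(1 / 50) for _ in range(10_000)]

mu = statistics.fmean(population)       # population mean, mu
sigma = statistics.pstdev(population)   # population s.d., sigma (divides by N)

N = 25          # observations per sample
REPS = 5_000    # number of repeated samples

# Sampling distribution: the mean of each of REPS simple random samples.
# random.sample draws without replacement; with 25 of 10,000 values the
# finite population correction is negligible.
sample_means = [statistics.fmean(random.sample(population, N)) for _ in range(REPS)]

# The single sample a researcher would actually observe.
one_sample = random.sample(population, N)

print(f"POPULATION:           mean={mu:6.2f}  sd={sigma:6.2f}")
print(f"SAMPLE:               mean={statistics.fmean(one_sample):6.2f}  "
      f"sd={statistics.stdev(one_sample):6.2f}")   # stdev divides by N-1
print(f"SAMPLING (CLT):       mean={mu:6.2f}  se={sigma / N ** 0.5:6.2f}")
print(f"SAMPLING (simulated): mean={statistics.fmean(sample_means):6.2f}  "
      f"se={statistics.pstdev(sample_means):6.2f}")
```

The simulated mean and standard deviation of `sample_means` should land close to $\mu$ and $\sigma/\sqrt{N}$ even though the population itself is far from normal, which is the point of the theorem.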

Using the central limit theorem

• to make point estimates and
• to construct confidence intervals for population parameters from sample data.

POINT ESTIMATES of population parameters

• A point estimate is the value of a sample statistic used to estimate a population parameter.
• Any population parameter can be estimated from sample data -- mean, standard deviation, correlation coefficient, etc.
• We will be concerned here only with point estimates of the mean
• The logic of inference for point estimates of the mean extends to other parameters, e.g., the correlation coefficient, slope, etc.
• Point estimates are "best guesses" and carry a margin of sampling error.
• Desirable properties of a point estimate of a population parameter:
• A good estimate is unbiased:
• the mean of the sampling distribution of the statistic is equal to the parameter.
• the Central Limit Theorem tells us this is true for sample estimates of the population mean.
• However, when N is used in the denominator to calculate the variance of sample data, the result is a biased (too small) estimate of the population variance; dividing by N-1 instead produces an unbiased estimate of $\sigma^2$ and a much less biased estimate of $\sigma$.
• A good estimate is consistent: it approaches the parameter as the sample size increases.
• A good estimate is efficient: its sampling distribution has a smaller standard deviation (standard error) than that of any rival statistic. For example, for a normal population the sample mean is a more efficient estimate of the population mean than the median is, and the median in turn is more efficient than the mode.
• Using the standard error of the sample statistic, these margins of error can be expressed precisely in a confidence interval.
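Unbiasedness and efficiency can both be illustrated numerically. The sketch below, a minimal Python example under stated assumptions, uses a hypothetical four-value population small enough that every possible sample of size 3 (drawn with replacement, so the draws are independent) can be listed; exact `Fraction` arithmetic then shows that the sample mean and the N-1 variance are unbiased, while dividing by N runs low by the factor (N-1)/N. The efficiency part is a simulation that assumes a standard normal population and samples of 25; the population values, sample sizes, and seed are illustrative choices only.

```python
import random
import statistics
from fractions import Fraction
from itertools import product

# Hypothetical four-value population, small enough to list every sample.
population = [2, 4, 6, 9]
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # population variance

n = 3  # sample size; draws are independent (with replacement)
samples = list(product(population, repeat=n))  # all 4**3 = 64 equally likely samples

def sample_var(sample, denominator):
    """Sum of squared deviations from the sample mean, divided by `denominator`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / denominator

# Averages over the whole sampling distribution (exact, thanks to Fraction).
mean_of_means = sum(Fraction(sum(s), n) for s in samples) / len(samples)
mean_var_n = sum(sample_var(s, n) for s in samples) / len(samples)
mean_var_n1 = sum(sample_var(s, n - 1) for s in samples) / len(samples)

print(mean_of_means == mu)                    # True: the sample mean is unbiased
print(mean_var_n1 == sigma2)                  # True: dividing by n-1 is unbiased
print(mean_var_n == sigma2 * (n - 1) / n)     # True: dividing by n runs low

# Efficiency (simulation, normal population): compare the spread of the
# sampling distributions of the mean and the median for samples of 25.
random.seed(7)
means, medians = [], []
for _ in range(4_000):
    draw = [random.gauss(0, 1) for _ in range(25)]
    means.append(statistics.fmean(draw))
    medians.append(statistics.median(draw))
print(statistics.stdev(medians) / statistics.stdev(means))  # expected to exceed 1
```

A ratio above 1 in the last line means the median's standard error is larger than the mean's, i.e., the mean is the more efficient estimate for this population.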

CONFIDENCE INTERVALS for population parameters

• A confidence interval is a range of values around a point estimate, bounded by upper and lower limits, together with a stated probability that intervals constructed in this way contain the population parameter.
• A confidence interval for a mean is computed by adding and subtracting a number of standard errors around the sample mean (about 1.96 standard errors for 95% confidence when the sampling distribution is normal).
• A confidence interval is always associated with a level of confidence that the estimate will be correct.
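The recipe above (sample mean plus or minus a multiple of the standard error) can be written as a short function. This is a minimal Python sketch that assumes a simple random sample large enough for the normal approximation; it estimates $\sigma$ with the sample standard deviation, and for small samples the t distribution would replace the z value. The function name `mean_confidence_interval` and the ten data values are hypothetical, for illustration only.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation CI for a population mean: xbar +/- z * s / sqrt(N).

    Assumes a simple random sample large enough for the CLT to apply;
    with small samples the t distribution should be used instead of z.
    """
    n = len(data)
    xbar = fmean(data)
    se = stdev(data) / math.sqrt(n)           # estimated standard error of the mean
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    return xbar - z * se, xbar + z * se

# Hypothetical data: ten observations, for illustration only.
data = [48, 52, 55, 47, 50, 53, 49, 51, 54, 46]
low, high = mean_confidence_interval(data, confidence=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")     # 95% CI: 48.62 to 52.38
```

Raising `confidence` to 0.99 widens the interval around the same sample mean, since a larger z is required.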
An example of a point estimate and a confidence interval appears in the boxed note that the New York Times prints along with its surveys:

Point estimate in article: "40% in Survey Say Inflation is Major Issue"
Confidence interval as expressed in the New York Times:
 "In theory, one can say with 95 percent certainty that the results based on the entire sample differ by no more than 3 percentage points in either direction from what would have been obtained by interviewing all adult Americans."
Confidence interval as expressed in statistics:
 We are 95% confident that the population parameter (estimated by the 40% figure) lies between 37% and 43%.
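The Times note does not report its sample size, but the arithmetic behind a "3 percentage points" margin can be sketched. The minimal Python example below rests on assumptions not stated in the note: simple random sampling, the normal approximation, and the usual standard error of a sample proportion, $\sqrt{p(1-p)/N}$. The helper names `margin_of_error` and `sample_size_for` are hypothetical, and the computed N is what would be needed, not the survey's actual N.

```python
import math
from statistics import NormalDist

# Assumptions (not stated in the Times note): simple random sampling, the
# normal approximation, and the standard error of a proportion, sqrt(p(1-p)/N).
z95 = NormalDist().inv_cdf(0.975)   # about 1.96

def margin_of_error(p, n, z=z95):
    """Half-width of the confidence interval for a proportion p from n cases."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size_for(margin, p=0.5, z=z95):
    """Smallest N whose margin of error is at most `margin` (p = 0.5 is the worst case)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

p_hat = 0.40                                  # the 40% point estimate
n_needed = sample_size_for(0.03, p=p_hat)     # hypothetical: the survey's N is not given
moe = margin_of_error(p_hat, n_needed)
print(n_needed, f"{p_hat - moe:.0%} to {p_hat + moe:.0%}")   # 1025 37% to 43%
```

Pollsters often plan for the worst case p = 0.5, for which `sample_size_for(0.03)` gives a somewhat larger N than the 40% estimate requires.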
• Factors in the confidence interval estimate of a population mean
• Variability in the sampling distribution of the mean, based on
• the amount of variation of values in the population, and
• the size of the sample;
• both factors are combined in the standard error of the mean, $\sigma_{\bar{X}} = \sigma/\sqrt{N}$.
• Degree of confidence in making the estimate
• the level of confidence, e.g., .95, .99, .999,
• which is the complement of the alpha value ($1 - \alpha$), e.g., $\alpha$ = .05, .01, .001.
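Both factors can be seen at once by tabulating the half-width of a normal-theory confidence interval, $z \cdot \sigma/\sqrt{N}$, across sample sizes and confidence levels. This is a minimal Python sketch that assumes the population standard deviation is known; the value `SIGMA = 10`, the sample sizes, and the function name `margin` are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def margin(sigma, n, confidence):
    """Half-width z * sigma / sqrt(N) of a normal-theory CI for a mean."""
    alpha = 1 - confidence                    # confidence is the complement of alpha
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)

SIGMA = 10  # hypothetical population standard deviation
print("conf.  N=25   N=100  N=400")
for confidence in (0.95, 0.99, 0.999):
    row = [f"{margin(SIGMA, n, confidence):6.2f}" for n in (25, 100, 400)]
    print(f"{confidence:.3f}", *row)
```

Reading across a row, quadrupling N halves the margin; reading down a column, a higher level of confidence (smaller alpha) widens it; and a more variable population (larger `SIGMA`) widens every entry proportionally.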