Lecture 14 - Normal Distribution and Sampling Distributions

Statistical Inference

Sampling Distributions and Population Distributions

Probability distributions for CONTINUOUS variables

We will be using four major types of probability distributions:

The normal distribution, which you already encountered.

The t distribution, which you will learn next.

The F distribution, which is related to the t distribution.

The chi-square distribution.

Using the normal distribution as a probability distribution requires thinking in probability terms.

Sample statistics are used to make predictions of population parameters

Theory is based on simple random samples

Population parameters are descriptive characteristics of populations

The population mean is μ (mu).

The population variance is σ² (sigma squared), and its standard deviation is σ (sigma).

Distinctions among terms and symbols for different "distributions":

Mean St. Dev.

The POPULATION distribution (real but unknown)
_________ _________

The SAMPLE distribution (real and known)
_________ _________

The SAMPLING distribution (unreal--i.e., hypothetical--but known)
_________ _________

Illustrations with state vote for Clinton in 1996:

Mean St. Dev.

Known population parameters for 51 states:
_________ _________

Observed values for any sample: (your sample mean and s.d.)
_________ _________

Expected values of the sampling distribution
_________ _________

The expected values are known by the CENTRAL LIMIT THEOREM:

If repeated samples of N observations are drawn from a population with mean μ and variance σ²,

then as N grows large, the sample means will become normally distributed with

mean μ,

variance σ²/N, and

standard deviation σ/√N.
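
The theorem can be checked by simulation. The sketch below is an illustration only and not part of the original lecture: it builds a hypothetical skewed population (exponential, chosen purely as an assumption), draws repeated samples of N = 50 with replacement, and compares the mean and standard deviation of the sample means with μ and σ/√N. The function name simulate_sample_means and every numeric setting are invented for the example.

```python
import random
import statistics


def simulate_sample_means(population, n, num_samples, seed=0):
    """Draw num_samples random samples of size n (with replacement)
    and return the list of their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=n))
        for _ in range(num_samples)
    ]


# Hypothetical population: deliberately skewed, so it is clearly not normal.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 50
means = simulate_sample_means(population, n, num_samples=4_000)

mean_of_means = statistics.fmean(means)  # should be close to mu
sd_of_means = statistics.pstdev(means)   # should be close to sigma / sqrt(n)
predicted_se = sigma / n ** 0.5

print(f"population mean      : {mu:.3f}")
print(f"mean of sample means : {mean_of_means:.3f}")
print(f"predicted sigma/sqrt(N): {predicted_se:.3f}")
print(f"observed s.d. of means : {sd_of_means:.3f}")
```

Increasing n in this sketch should shrink the spread of the sample means, while a histogram of means would look more nearly normal even though the population itself is skewed.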

The standard deviation of the sampling distribution of means

is known as the standard error of the mean,

and it is symbolized as σ_x̄ (sigma sub x-bar) -- that is, σ_x̄ = σ/√N.

Using the central limit theorem

to make point estimates and

to construct confidence intervals for population parameters from sample data.

POINT ESTIMATES of population parameters

A point estimate is the value of a sample statistic used to estimate a population parameter.

Any population parameter can be estimated from sample data -- mean, standard deviation, correlation coefficient, etc.

We will be concerned here only with point estimates of the mean

The logic of inference for point estimates of the mean extends to other parameters -- e.g., the correlation coefficient, slope, etc.

Point estimates are "best guesses" and carry a margin of sampling error.

Desirable properties of a point estimate of a population parameter:

A good estimate is unbiased:

the mean of the sampling distribution of the statistic is equal to the parameter.

the Central Limit Theorem tells us this is true for sample estimates of the population mean.

however, when N is used in the denominator to calculate the variance (and hence the standard deviation) for sample data, the result is a biased estimate of the population variance -- hence the correction of dividing by N-1 to produce an unbiased estimate of the variance.

A good estimate is consistent: it approaches the parameter as the sample size increases.

A good estimate is efficient: its sampling distribution has a smaller standard deviation (standard error) than that of any rival statistic -- e.g., the sample mean is a more efficient estimate of the population mean than is the median, and the median is more efficient than the mode.
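
The three properties can be illustrated with a small simulation. This is a sketch under stated assumptions, not material from the lecture: the population is assumed normal with hypothetical parameters μ = 50 and σ = 10, and the helper names (draw_sample, variance_dividing_by_n, spread_of_statistic) are invented for the example. It does not cover the mode.

```python
import random
import statistics

rng = random.Random(1)      # fixed seed so the run is repeatable
MU, SIGMA = 50.0, 10.0      # hypothetical population parameters (assumed)
TRIALS = 5_000              # number of repeated samples per experiment


def draw_sample(n):
    """One random sample of n observations from a normal population."""
    return [rng.gauss(MU, SIGMA) for _ in range(n)]


def variance_dividing_by_n(xs):
    """Sample variance with N (not N-1) in the denominator."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def spread_of_statistic(stat, n):
    """Standard deviation of a statistic across TRIALS samples of size n."""
    return statistics.pstdev([stat(draw_sample(n)) for _ in range(TRIALS)])


# Unbiasedness: average each variance estimate over many small samples.
small_samples = [draw_sample(5) for _ in range(TRIALS)]
avg_var_n = statistics.fmean(
    [variance_dividing_by_n(s) for s in small_samples])
avg_var_n_minus_1 = statistics.fmean(
    [statistics.variance(s) for s in small_samples])  # divides by N-1

# Consistency: the sample mean varies less around mu as N increases.
se_mean_n25 = spread_of_statistic(statistics.fmean, 25)
se_mean_n100 = spread_of_statistic(statistics.fmean, 100)

# Efficiency: at the same N, the median varies more than the mean.
se_median_n25 = spread_of_statistic(statistics.median, 25)

print(f"true variance           : {SIGMA ** 2:.1f}")
print(f"average, dividing by N  : {avg_var_n:.1f}")
print(f"average, dividing by N-1: {avg_var_n_minus_1:.1f}")
print(f"s.e. of mean, N=25 vs N=100: {se_mean_n25:.2f} vs {se_mean_n100:.2f}")
print(f"s.e. of median, N=25       : {se_median_n25:.2f}")
```

With samples this small (N = 5), the divide-by-N average should fall noticeably below the true variance, while the N-1 version should land near it.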

Using the standard error of the sample statistic, the margin of sampling error can be expressed precisely in a confidence interval.

CONFIDENCE INTERVALS for population parameters

A confidence interval is a range of values around a point estimate, with lower and upper limits chosen so that the interval contains the population parameter with a stated probability.

A confidence interval is computed by adding and subtracting standard error units around the mean.

A confidence interval is always associated with a level of confidence that the estimate will be correct.

Example of point estimates and confidence intervals in the boxed note that the New York Times prints along with its surveys:

Point estimate in article: "40% in Survey Say Inflation is Major Issue"

Confidence interval as expressed in the New York Times:

"In theory, one can say with 95 percent certainty that the results based on the entire sample differ by no more than 3 percentage points in either direction from what would have been obtained by interviewing all adult Americans."

Confidence interval as expressed in statistics:

We are 95% confident that the population parameter (estimated by the 40% figure) lies between 37% and 43%.
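
The arithmetic behind the survey example can be sketched as follows. The interval itself comes straight from the figures above (40% plus or minus 3 points). The second function is an assumption added for illustration: it uses the usual normal-approximation formula for a proportion, and the sample size of 1,000 is hypothetical, since the note quoted here does not state one.

```python
from statistics import NormalDist


def proportion_interval(p_hat, margin):
    """Confidence limits: point estimate minus and plus the margin."""
    return (p_hat - margin, p_hat + margin)


def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin for a sample proportion (assumed
    formula): z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * (p_hat * (1 - p_hat) / n) ** 0.5


# Figures from the example: 40% point estimate, 3-point margin.
low, high = proportion_interval(0.40, 0.03)
print(f"95% interval: {low:.2f} to {high:.2f}")

# Hypothetical sample size, chosen only to show where a margin
# of roughly 3 points could come from.
moe = margin_of_error(0.40, 1000)
print(f"margin with n = 1000: {moe:.3f}")
```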

Factors in the confidence interval estimate of a population mean

Variability in the sampling distribution of the mean, based on

The amount of variation of values in the population

The size of the sample

Both factors are weighed in the standard error of the mean: σ_x̄ = σ/√N

Degree of confidence in making the estimate

Level of confidence: e.g., .95, .99, .999

This is the complement of the alpha value (1 - alpha): e.g., alpha = .05, .01, .001
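
The factors listed above can be combined in a short sketch. It assumes the population standard deviation σ is known, so the normal (z) distribution applies; the function name mean_confidence_interval and the example values (mean 50, σ = 10, N = 100) are hypothetical.

```python
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval for a population mean, assuming sigma is known:
    sample_mean +/- z * sigma / sqrt(n)."""
    alpha = 1 - confidence                   # e.g. .05 for 95% confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    standard_error = sigma / n ** 0.5
    return (sample_mean - z * standard_error,
            sample_mean + z * standard_error)


# Higher confidence widens the interval (same sigma and N).
for level in (0.95, 0.99, 0.999):
    lo, hi = mean_confidence_interval(50.0, 10.0, 100, level)
    print(f"confidence {level}: {lo:.2f} to {hi:.2f}")

# A larger sample narrows it; more population variation widens it.
print(mean_confidence_interval(50.0, 10.0, 400))
print(mean_confidence_interval(50.0, 20.0, 100))
```

When σ is not known and must be estimated from the sample, the t distribution mentioned at the start of these notes would normally replace z.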