
 Probability
distributions for CONTINUOUS variables
 We will be
using four major types of probability
distributions:
 The
normal distribution, which you already
encountered.
 The
t distribution, which you will learn
next.
 The
F distribution, which is related to the
t distribution.
 The
chi-square distribution.
 Using the
normal distribution as a probability distribution
requires thinking in probability terms.
 Sample
statistics are used to make predictions of
population parameters
 Theory
is based on simple random samples
 Population
parameters are descriptive characteristics of
populations
 The population mean is $\mu$ (mu).
 The population variance is $\sigma^{2}$ (sigma squared), and its standard deviation is $\sigma$ (sigma).
 Distinctions among terms and symbols for different "distributions":

                                                                    Mean        St. Dev.
The POPULATION distribution (real but unknown)                      _________   _________
The SAMPLE distribution (real and known)                            _________   _________
The SAMPLING distribution (unreal, i.e., hypothetical, but known)   _________   _________

 Illustrations with state vote for Clinton in 1996:

                                                              Mean        St. Dev.
Known population parameters for 51 states:                    _________   _________
Observed values for any sample (your sample mean and s.d.):   _________   _________
Expected values of the sampling distribution:                 _________   _________

 The expected values are known by the CENTRAL LIMIT THEOREM:
 If repeated samples of N observations are drawn from a population with mean $\mu$ and variance $\sigma^{2}$,
 then as N grows large, the sample means will become normally distributed with
 mean $\mu$,
 variance $\sigma^{2}/N$, and
 standard deviation $\sigma/\sqrt{N}$.
 The standard deviation of the sampling distribution of means
 is known as the standard error of the mean,
 and it is symbolized as $\sigma_{\bar{x}}$,
 that is, $\sigma_{\bar{x}} = \sigma/\sqrt{N}$.
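The theorem can be checked by simulation. The sketch below is illustrative only: the skewed population, the sample size of 50, and the helper name `simulate_sample_means` are assumptions made for this example, not part of the lecture. It draws repeated simple random samples (with replacement) and compares the standard deviation of the sample means with the predicted standard error $\sigma/\sqrt{N}$.

```python
import random
import statistics


def simulate_sample_means(population, n, draws, seed=0):
    """Draw `draws` simple random samples of size n (with replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


# Hypothetical, deliberately skewed population (exponential values).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation

n = 50
means = simulate_sample_means(population, n, draws=5_000)

observed_mean = statistics.fmean(means)   # center of the sampling distribution
observed_se = statistics.pstdev(means)    # spread of the sampling distribution
predicted_se = sigma / n ** 0.5           # standard error from the theorem

print(f"population mean         = {mu:.3f}")
print(f"mean of sample means    = {observed_mean:.3f}")
print(f"predicted std. error    = {predicted_se:.3f}")
print(f"s.d. of sample means    = {observed_se:.3f}")
```

Although the population itself is far from normal, the sample means cluster around the population mean with a spread close to the predicted standard error.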
Using the central limit theorem
 to make
point estimates and
 to
construct confidence intervals for population
parameters from sample data.
POINT
ESTIMATES of population parameters
 A point
estimate is the value of a sample statistic used to
estimate a population parameter.
 Any population parameter can be estimated from sample data: mean, standard deviation, correlation coefficient, etc.
 We will
be concerned here only with point estimates of the
mean
 The logic of inference for point estimates of the mean extends to other parameters, e.g., the correlation coefficient, slope, etc.
 Point estimates are "best guesses" and carry a margin of sampling error.
 Desirable properties of a point estimate of a
population parameter:
 A good
estimate is unbiased:
 the
mean of the sampling distribution of the statistic
is equal to the parameter.
 the
Central Limit Theorem tells us this is true for
sample estimates of the population
mean.
 however, when N is used in the denominator to calculate the standard deviation for sample data, the result is a biased estimate of the population standard deviation; hence the correction of dividing by N-1 instead, which makes the variance estimate unbiased.
 A good
estimate is consistent: it approaches the
parameter as the sample size increases.
 A good estimate is efficient: its sampling distribution has a smaller standard deviation (standard error) than any rival statistic, e.g., the sample mean is a more efficient estimate of the population mean than is the median, and the median is more efficient than the mode.
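Two of these properties can be illustrated with a small simulation. This is a sketch under stated assumptions: the normal population with mean 50 and standard deviation 10, the sample size of 5, and the number of draws are all hypothetical choices for the example. It averages the variance computed with N and with N-1 in the denominator over many samples (bias), and compares the standard errors of the sample mean and the sample median (efficiency).

```python
import random
import statistics

rng = random.Random(1)
MU, SIGMA = 50.0, 10.0   # hypothetical population parameters
N = 5                    # small samples make the bias easy to see
DRAWS = 10_000

var_div_n = []    # sample variance with N in the denominator
var_div_n1 = []   # sample variance with N-1 in the denominator
means = []
medians = []

for _ in range(DRAWS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    var_div_n.append(statistics.pvariance(sample))   # divides by N
    var_div_n1.append(statistics.variance(sample))   # divides by N-1
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

avg_var_n = statistics.fmean(var_div_n)
avg_var_n1 = statistics.fmean(var_div_n1)
se_mean = statistics.pstdev(means)       # standard error of the mean
se_median = statistics.pstdev(medians)   # standard error of the median

print(f"true variance              = {SIGMA ** 2:.1f}")
print(f"average variance, / N      = {avg_var_n:.1f}")
print(f"average variance, / (N-1)  = {avg_var_n1:.1f}")
print(f"standard error of mean     = {se_mean:.2f}")
print(f"standard error of median   = {se_median:.2f}")
```

On average the N-denominator variance falls short of the true variance by roughly the factor (N-1)/N, while the N-1 version centers on it; the sample mean also shows the smaller standard error for this normal population.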
 Using the standard error of the sample statistic, these margins of error can be expressed precisely in a confidence interval.
CONFIDENCE
INTERVALS for population parameters
 A confidence interval is a range of values around a point estimate, stated together with the probability that the range between its lower and upper limits contains the population parameter.
 A
confidence interval is computed by adding and
subtracting standard error units around the
mean.
 A
confidence interval is always associated with a
level of confidence that the estimate will be
correct.
 Example
of point estimates and confidence intervals in the
boxed note that the New York Times prints along
with its surveys:

 Point
estimate in article: "40% in Survey Say Inflation is
Major Issue"
 Confidence
interval as expressed in the New York
Times:
"In
theory, one can say with 95 percent
certainty that the results based on the
entire sample differ by no more than 3
percentage points in either direction from
what would have been obtained by
interviewing all adult
Americans."

 Confidence
interval as expressed in statistics:
We are 95% confident that the population parameter (estimated by the 40% figure) lies between 37% and 43%.
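The arithmetic behind this example can be sketched in a few lines. The point estimate and the 3-point margin come from the newspaper note; the step that backs out a standard error assumes the margin equals a normal z value times the standard error, which is an assumption of this sketch rather than something the note states.

```python
from statistics import NormalDist

point_estimate = 40.0   # percent, from the article headline
margin = 3.0            # percentage points, from the boxed note
confidence = 0.95

# Confidence limits: point estimate plus and minus the margin of error.
lower = point_estimate - margin
upper = point_estimate + margin

# z value that leaves (1 - confidence) / 2 in each tail of the normal curve.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error implied by the margin, ASSUMING margin = z * standard error.
implied_se = margin / z

print(f"{confidence:.0%} confidence interval: {lower:.0f}% to {upper:.0f}%")
print(f"z = {z:.2f}, implied standard error = {implied_se:.2f} points")
```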

 Factors in
the confidence interval estimate of a population
mean
 Variability
in the sampling distribution of the mean, based
on
 The amount of variation of values in the population: $\sigma$
 the size of the sample: $N$
 both factors are weighed in the standard error of the mean: $\sigma_{\bar{x}} = \sigma/\sqrt{N}$
 Degree
of confidence in making the estimate
 level
of confidence e.g., .95, .99, .999
 Complement
of the alpha value: e.g., .05, .01,
.001
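How these factors combine can be shown with a short sketch. It assumes the population standard deviation is known and uses normal (z) critical values; the function name `mean_confidence_interval` and the numbers passed to it are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) limits for the population mean.

    Assumes sigma (the population standard deviation) is known, so the
    interval is sample_mean +/- z * sigma / sqrt(n).
    """
    alpha = 1 - confidence                       # e.g. .05 for a .95 level
    z = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value
    standard_error = sigma / n ** 0.5            # standard error of the mean
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical values: sample mean 50, sigma 10, N = 100.
for level in (0.95, 0.99, 0.999):
    low, high = mean_confidence_interval(50.0, 10.0, 100, level)
    print(f"confidence {level}: {low:.2f} to {high:.2f}")
```

Raising the level of confidence widens the interval, while a larger sample or a less variable population shrinks the standard error and narrows it.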
