Closing Comments: Sample Size, Research Papers

Consider this question:

  • "Does it make any practical difference how large a sample is with respect to the population from which it was drawn?"
    • Answer:
      • "No."

Consider this rejoinder:

  • "Do you mean to sit there, with that smug smile on your face, and tell me that to make equally accurate estimates for the population of Evanston, Illinois (population around 80,000), and for the United States (population around 280,000,000) you would require the same sample sizes?"
    • The smug answer:
      • "Yes." 

Consider the quandary of the apoplectic questioner:

  • "How can this be so?"
    • The appropriate reply:
      • "Take a course in statistics and find out."
  • You are about to find out.

Review of point estimates, confidence intervals, and sample sizes in predicting to population proportions

  • A point estimate is the prediction of the population proportion based on the observed proportion in a sample
    • If the sample shows that 33% of respondents favor abolishing the death penalty
    • The researcher estimates that .33 of the population favors abolishing the death penalty.
  • A confidence interval is a range of values around a point estimate that is expected, with a stated probability, to contain the population parameter between its lower and upper limits.
    • A confidence interval is computed by adding and subtracting standard error units around the point estimate.
    • A confidence interval is always associated with a level of confidence that the estimate will be correct.
    • Example:
      • A point estimate and confidence interval in an 11/15/85 article on the Geneva summit in the New York Times, based on a NYT/CBS telephone poll of 1,659 adults in the 48 continental states.
      • Question: "U.S. should try to reduce tensions with Russians"
      • Point estimate: 49% agreed, 41% disagreed
      • Statement on confidence interval:
        • "In theory, in 19 cases out of 20 the results based on such samples will differ by no more than 3 percentage points in either direction from what would have been obtained by interviewing all adult Americans. The error for smaller subgroups is larger. For example, the margin of sampling error for blacks is plus or minus 8 percentage points, and for college graduates it is plus or minus 5 percentage points."
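      • Computing a confidence interval for a population parameter from these data:
      • Sampling error for continuous data: $s_{\bar{x}} = s / \sqrt{n}$
      • Sampling error for proportions: $s_p = \sqrt{pq/n} = \sqrt{(.49)(.51)/1659} = .012$ (where q = 1 - p)
      • 2 times .012 = .024 (about 2.4 percentage points, within the 3 points the Times reports)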
      • Confidence interval:
        • 49% + or - 3% = 46% to 52%
        • Thus, we are 95% confident that the population parameter (estimated at 49%) lies between 46% and 52%.
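
The arithmetic in the poll example can be checked with a short sketch. The function name `proportion_margin` and the multiplier `z=2.0` (the "2 times" used above, a common rounding of 1.96) are illustrative choices, not part of the original notes, and the sketch assumes a simple random sample.

```python
import math

def proportion_margin(p, n, z=2.0):
    """Return (standard error, margin of error) for a sample proportion p
    observed in a simple random sample of size n."""
    se = math.sqrt(p * (1 - p) / n)   # sqrt(pq/n)
    return se, z * se                 # z = 2 approximates 95% confidence

# Figures from the NYT/CBS poll: 49% agreed, n = 1,659
se, margin = proportion_margin(0.49, 1659)
print(f"s.e. = {se:.3f}, margin = {margin:.3f}")
print(f"interval = {0.49 - margin:.3f} to {0.49 + margin:.3f}")
```

The notes round the standard error to .012 before doubling it, so the hand calculation gives .024; doubling the unrounded value gives a slightly larger figure. Either way the margin stays under the 3 percentage points quoted by the Times.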

What factors figure into the confidence interval estimate of a population mean?

  • Variability in the sampling distribution of the mean, which is based on
    • variation of values in the population: $\sigma$
    • size of the sample: $n$
    • both factors are combined in the standard error of the mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
  • Degree of confidence in making the estimate, which is decided on by the researcher.
    • Level of confidence: e.g., .95, .99, .999
    • Complement of the alpha value: e.g., .05, .01, .001
  • Note that the proportion that the sample is of the population is not a major factor in the estimate's accuracy.
    • This is counter-intuitive (i.e., it goes against what most people expect).
    • To explain this requires discussion of sampling with and without replacement.
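
A minimal sketch of how the two factors combine, assuming the usual standard error of the mean (standard deviation divided by the square root of the sample size). The function name `standard_error` and the standard deviation of 10 are hypothetical values chosen for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n).
    Note that the population size is not an argument at all."""
    return sd / math.sqrt(n)

# Quadrupling the sample size halves the standard error.
for n in (100, 400, 1600):
    print(n, standard_error(10.0, n))
```

Because the formula takes only the spread of values and the sample size, the same sample size yields the same standard error whether the population is a city or a country.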

There are two forms of simple random sampling:

  • Sampling with replacement
    • After each case is selected, it is returned to the population before the next case is drawn.
    • If replacement is not done with small populations (e.g., a deck of 52 cards), probability calculations can be materially affected.
  • Sampling without replacement
    • The population decreases by one each time a case is drawn.
    • If the population is very large (e.g., thousands of cases), probability calculations will not be materially affected.
    • This is because accuracy of inferences from samples to populations
      • is due primarily to the amount of information (i.e., the size of the sample)
      • and not the proportion of information (i.e., the percent the sample is of the population).
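
The card-deck point can be illustrated with an exact calculation. This is a sketch under assumed numbers: the probability of drawing two aces from a 52-card deck, compared with the same draw from a hypothetical population 1,000 times larger with the same share of "aces". The function names are illustrative.

```python
from fractions import Fraction

def prob_two_hits(pop_size, hits, replace):
    """Probability that two successive draws are both 'hits'."""
    first = Fraction(hits, pop_size)
    if replace:
        second = first                              # population is restored
    else:
        second = Fraction(hits - 1, pop_size - 1)   # one hit already removed
    return first * second

def relative_gap(pop_size, hits):
    """Fraction by which skipping replacement lowers the probability."""
    with_r = prob_two_hits(pop_size, hits, True)
    without_r = prob_two_hits(pop_size, hits, False)
    return float((with_r - without_r) / with_r)

print(relative_gap(52, 4))         # small population: deck of cards
print(relative_gap(52000, 4000))   # large population, same share of hits
```

For the deck the two methods differ by more than a fifth; for the large population the difference is a small fraction of one percent.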

Correction for sampling without replacement (see Leslie Kish, Survey Sampling (New York: John Wiley, 1965), pp. 43-45).
  • Recall that the accuracy of a sample depends on the size of its standard error, $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
    • If you can reduce the standard error, you can increase the accuracy of the estimate
    • You can't manipulate the population variance, so a researcher cannot reduce the standard error by that route.
    • But the researcher can increase the sample size, which does reduce the standard error.
    • What about the proportion that the sample is of the population? Need the researcher consider that?
  • In truth, the s.e. of the mean can be lowered by taking into account that respondents are drawn by sampling without replacement
    • The researcher is entitled to multiply the standard error by a fractional correction factor
      • The correction factor is always less than 1.
      • Therefore, the s.e. is reduced by being multiplied by a factor less than 1.
    • This correction factor is based on p (p = the proportion the sample is of the population)
        -- where $\sqrt{1 - p}$ is the correction factor
    • Assume the sample were 20% of the population
      • Then the correction factor, $\sqrt{1 - p}$, would equal the square root of 1 - .2 = sq rt of .8 = .894
      • So you would be entitled to reduce the s.e. by multiplying it by .894
      • Unfortunately, this is not much of a reduction.
    • Assume the sample were 10% of the population
      • Then the correction factor, $\sqrt{1 - p}$, would equal the square root of 1 - .1 = sq rt of .9 = .95
      • So you would be entitled to reduce the s.e. by multiplying it by .95
      • Unfortunately, this is not much of a reduction.
  • So this correction factor has little effect unless a sample is 20% or more of the population, which is rare.
    • Evanston has a population of 80,000.
    • A 20% sample needs 16,000 respondents for the correction factor to "kick in" at .894.
      • This huge sample only slightly reduces the s.e. and thus only slightly increases accuracy in the estimate.
    • A 10% sample needs 8,000 respondents for the correction factor to "kick in" at .95.
      • This very large sample hardly affects the s.e. at all.
    • Essentially, neither sample reduces the s.e. in any substantial way.
    • Hence, the proportion that the sample is of the population has little practical effect on its accuracy.
  • Most samples do not approach proportions that would produce a meaningful correction factor.
  • Hence, it is usually ignored in computing the s.e. of the mean.
  • Ergo: It is unimportant what proportion the sample is of the population.
  • What is important is the raw size of the sample.
  • Period.
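
To tie this back to the opening question, here is a minimal sketch that applies the correction factor to the same sample size drawn from Evanston and from the United States. It uses the square root of (1 - p) form of the correction factor discussed above. The sample size of 1,659 is borrowed from the poll example, and the proportion of .5 is an assumption made purely for illustration; `corrected_se` is a hypothetical helper name.

```python
import math

def corrected_se(prop, n, pop_size):
    """Standard error of a proportion, multiplied by the correction
    factor sqrt(1 - n/pop_size) for sampling without replacement."""
    se = math.sqrt(prop * (1 - prop) / n)
    correction = math.sqrt(1 - n / pop_size)
    return se * correction

n = 1659                                         # assumed sample size
evanston = corrected_se(0.5, n, 80_000)          # population about 80,000
usa = corrected_se(0.5, n, 280_000_000)          # population about 280,000,000

print(f"Evanston: {evanston:.4f}")
print(f"USA:      {usa:.4f}")
print(f"20% sample correction factor: {math.sqrt(1 - 0.2):.3f}")
```

The two standard errors differ only in the fourth decimal place, which is why the same sample size serves for both populations.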