 Methods of Sampling: Random and Non-random
See also: how the Gallup Organization draws its sample and deals with sample size, and a post-September 11 poll of the public's international attitudes by the Pew Research Center.

Types of samples

HAPHAZARD or accidental samples
  Chosen at "whim" rather than "at random"
  Do not produce representative samples because of subtle bias in selection

PURPOSIVE samples
  Chosen with some controls in mind: e.g., four states east of the Mississippi and four west
  Designed to achieve "representativeness," but mathematical estimates of error do not apply

QUOTA samples
  A more systematic attempt to reflect a population distribution: sex, race, etc.
  May be "representative" on these factors but biased on others

CLUSTER or STRATIFIED samples
  Divide the population into groups (e.g., census areas) and sample the groups
  Then draw a simple random sample within groups

SYSTEMATIC RANDOM samples
  Choose every Nth entity from an ordered list
  Quality depends on the principles used in ordering the list

SIMPLE RANDOM samples
  Every case has an equal chance of being drawn, due to purely chance factors
  Your samples of 10 states were supposed to be drawn this way
  Statistically the most efficient form of sample, but not the most economically efficient

Computing Examples

An example of computing a confidence interval for a population parameter

A confidence interval is a RANGE OF VALUES around a point estimate, stated together with the probability that the range between its lower and upper limits contains the population parameter.
A confidence interval is computed by adding and subtracting standard error units around the mean.
A confidence interval is always associated with a LEVEL OF CONFIDENCE that the estimate will be correct.

A point estimate and confidence interval appeared in an 11/15/85 article on the Geneva summit in the NEW YORK TIMES, based on a NYT/CBS telephone poll of 1,659 adults in 48 continental states:

"The sample of telephone exchanges was selected by a computer from a complete list of exchanges in the country. The exchanges were chosen so as to insure that each region of the country was represented in proportion to its population. For each exchange, the telephone numbers were formed by random digits, thus permitting access to both listed and unlisted residential numbers."

QUESTION: "U.S. should try to reduce tensions with Russians"
POINT ESTIMATE: 49% agreed, 41% disagreed
Statement on CONFIDENCE INTERVAL: "In theory, in 19 cases out of 20 the results based on such samples will differ by no more than 3 percentage points in either direction from what would have been obtained by interviewing all adult Americans. The error for smaller subgroups is larger. For example, the margin of sampling error for blacks is plus or minus 8 percentage points, and for college graduates it is plus or minus 5 percentage points."

Sampling error for CONTINUOUS data: $s_{\bar{x}} = s / \sqrt{n}$
Sampling error for PROPORTIONS: $s_{\hat{p}} = \sqrt{\hat{p}(1 - \hat{p}) / n} = \sqrt{(.49)(.51) / 1659} = .012$
  2 times .012 = .024 (2.4 percentage points, or almost 3)
Thus: We are 95% confident that the population parameter (estimated by the 49% figure) lies between 46% and 52%.

Factors in the confidence interval estimate of a population mean

Variability in the sampling distribution of the mean, based on
  variation of values in the population: $s$
  size of the sample: $n$
  both factors are combined in the standard error of the mean: $s_{\bar{x}} = s / \sqrt{n}$
Degree of confidence in making the estimate
  Level of confidence: e.g., .95, .99, .999
  Complement of the alpha value: e.g., .05, .01, .001

Note that the PROPORTION that the sample is of the population is NOT a major factor in the accuracy of the estimate, which is counter-intuitive (i.e., it goes against what intuition suggests).

Two forms of simple random sampling

Sampling with replacement
  After each case is selected, it is returned to the population before the next draw.
  If replacement is not done with small populations (e.g., a deck of 52 cards), probability calculations can be materially affected.
Sampling without replacement
  The population decreases by one each time a case is drawn.
  If the population is very large (e.g., thousands of cases), probability calculations will not be materially affected.

The sample's share of the population matters so little because the accuracy of inferences from samples to populations is due primarily to the AMOUNT of information (i.e., the size of the sample) and not the PROPORTION of information (i.e., the percent the sample is of the population).

In truth, the s.e. of the mean can be lowered by multiplying the standard error by a correction factor based on p (p = the proportion the sample is of the population):
  corrected s.e. = s.e. × $\sqrt{1 - p}$, where $\sqrt{1 - p}$ is the correction factor
But this correction factor has little effect unless p > .20. Because most samples do not approach this figure, this correction factor is usually ignored in computing the s.e. of the mean.
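
The arithmetic in the computing example can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the standard error of a proportion is taken to be sqrt(p(1 - p)/n), the interval uses 2 standard errors as in the "2 times .012" step, and the correction factor is assumed to be the common approximation sqrt(1 - p), where p is the proportion the sample is of the population.

```python
import math


def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


def confidence_interval(estimate, se, z=2.0):
    """Interval of z standard errors on either side of the estimate.

    z = 2.0 mirrors the notes; 1.96 is the more exact value for 95%.
    """
    margin = z * se
    return estimate - margin, estimate + margin


def finite_population_correction(sample_fraction):
    """Assumed correction factor sqrt(1 - p), p = sample size / population size."""
    return math.sqrt(1.0 - sample_fraction)


# Figures from the NYT/CBS poll: 49% agreed, 1,659 respondents.
se = se_proportion(0.49, 1659)
low, high = confidence_interval(0.49, se)
print(f"s.e. = {se:.3f}, interval = {low:.3f} to {high:.3f}")

# How much the correction factor shrinks the s.e. for several sample fractions.
for fraction in (0.01, 0.05, 0.20, 0.50):
    factor = finite_population_correction(fraction)
    print(f"p = {fraction:.2f}: correction factor = {factor:.3f}")
```

Running the loop shows why the correction is usually ignored: for sample fractions of a few percent the factor stays very close to 1, and only around p = .20 does it cut the standard error by roughly a tenth.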