The Sampling Framework
Introduction
The objective of sample surveys is to make inference about a population from information contained in a sample selected from that population. We are usually interested in estimating some population parameter such as the population mean, proportion or total.
(1.2) Populations and Representative Samples
A sample is representative if it can be used to “reconstruct” what the population looks like and if we can provide an accurate assessment of how good that reconstruction is.
Key Terms
Some definitions are needed to make the notions of a “population” and a “representative sample” more precise.
- Target Population:
- Sampling frame:
- Observational Unit:
- Sample:
- Sampled population:
- Sampling unit:
In an ideal survey, the sampled population will be identical to the target population. 🦄
(1.3) Selection Bias
Selection bias occurs when the target population does not coincide with the sampled population because some population units are sampled at a different rate than intended by the investigator.
❓ If a survey designed to study household income has fewer poor households than would be obtained in a representative sample, would the survey estimates of the average or median household income be too high or too low?
- Convenience Sample:
- Purposive or Judgement Sample:
- Self-Selected Sample:
- Undercoverage Bias:
- Overcoverage:
- Nonresponse:
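The household income question above can be explored with a small simulation. This is a minimal sketch, not a real survey: the lognormal income distribution, the 30,000 cutoff for a "poor" household, and the 30% inclusion rate are all assumptions made up for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 right-skewed household incomes (assumed lognormal).
population = [random.lognormvariate(10.8, 0.7) for _ in range(10_000)]
POVERTY_LINE = 30_000  # illustrative cutoff, not an official threshold


def draw_sample(incomes, n, poor_inclusion_prob):
    """Draw n households at random (with replacement), keeping each household
    below POVERTY_LINE only with probability poor_inclusion_prob."""
    sample = []
    while len(sample) < n:
        income = random.choice(incomes)
        if income >= POVERTY_LINE or random.random() < poor_inclusion_prob:
            sample.append(income)
    return sample


# poor_inclusion_prob=1.0 keeps every draw, so poor households are not undersampled.
representative = draw_sample(population, 2_000, poor_inclusion_prob=1.0)
# poor_inclusion_prob=0.3 drops most poor households: selection bias.
biased = draw_sample(population, 2_000, poor_inclusion_prob=0.3)

for label, data in [("population", population),
                    ("representative sample", representative),
                    ("biased sample", biased)]:
    print(f"{label:>22}: mean = {statistics.mean(data):,.0f}, "
          f"median = {statistics.median(data):,.0f}")
```

Comparing the three printed rows shows which direction the estimates move when poor households are sampled at a lower rate than intended.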
What Good Are Samples with Selection Bias?
Can Bias ever be good?
By October 2019, over 1,600 cases of lung injuries and 34 deaths were found to be associated with vaping, but the actual cause of the injuries was not known. In 2019 a group of researchers (see text for citation) conducted a study that ultimately led to the recommendation that the public stop using these products until more research on the causal association could be conducted.
This impactful survey had the following types of selection bias:
- Researchers attempted to interview 83 patients who were reported to have lung injuries but only 53 responded (________).
- The sampling frame of 83 came only from physician-reported cases, so there may be bias if physicians reported only the more serious cases (________).
- THC is illegal in that state, so patients may have under-reported their use (________).
- Individuals who vaped and incurred a lung injury but didn’t seek care were excluded from the study (________).
Even when your sample likely contains bias, being up front about it in your reporting (this is what the limitations section is all about) is a very important aspect of open science.
(1.4) Measurement Error
- Measurement error:
- Measurement bias:
Often in ecological surveys, areas are divided into smaller plots or grids. A sample of plots/grids is selected, and the number of plants in each selected grid is counted.
Field researchers have to decide whether or not to count plants that sit directly on the border. If one researcher always counts plants on the border as being inside the grid and another never does, the first researcher's counts will be systematically higher than the second's.
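The border-counting problem can be sketched with a short simulation. Everything here is hypothetical: the plot size, the width of the "border" strip, the number of plants, and the function name `simulate_plot` are illustrative assumptions, not a field protocol.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

PLOT_SIZE = 1.0      # side length of a square plot (arbitrary units)
BORDER_WIDTH = 0.02  # plants within this distance of an edge count as "on the border" (assumed)


def simulate_plot(n_plants):
    """Scatter plants uniformly in and just around one plot.
    Returns (clearly_inside, on_border) counts."""
    inside = on_border = 0
    for _ in range(n_plants):
        # Coordinates extend slightly past the plot so plants can straddle an edge.
        x = random.uniform(-BORDER_WIDTH, PLOT_SIZE + BORDER_WIDTH)
        y = random.uniform(-BORDER_WIDTH, PLOT_SIZE + BORDER_WIDTH)
        dist_to_edge = min(x, y, PLOT_SIZE - x, PLOT_SIZE - y)
        if dist_to_edge > BORDER_WIDTH:
            inside += 1       # well inside the plot
        else:
            on_border += 1    # within BORDER_WIDTH of an edge, on either side
    return inside, on_border


plots = [simulate_plot(50) for _ in range(20)]

# Researcher A counts border plants as inside; researcher B never does.
count_a = sum(inside + border for inside, border in plots)
count_b = sum(inside for inside, _ in plots)

print(f"Researcher A total: {count_a}")
print(f"Researcher B total: {count_b}")
print(f"Difference from border rule alone: {count_a - count_b}")
```

Both researchers look at exactly the same plants, so the gap between the two totals comes entirely from the counting rule. A written protocol that fixes the rule in advance removes this source of disagreement.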
Surveying people
Obtaining accurate responses is challenging in all types of surveys, especially when dealing with humans.
- People may choose not to respond or lie if they’re embarrassed about the response (like income level)
Clear sampling protocols and thoughtful & validated survey design can minimize measurement error.
(1.5) Questionnaire Design
This is a short, short version of how to design a questionnaire. If you are considering writing one for a project or thesis, please consult one of the references at the end of the chapter in the textbook.
Be as precise as possible
“I want to learn something about persons experiencing homelessness”
is not good enough. Let’s revise this to:
“What percentage of persons using homeless shelters in Chicago between January and March 2021 are under 16 years old?”
Now you can write questions that will be able to actually answer this research question.
Guidance
(1.6) Sampling and Nonsampling errors
- Most surveys report a “margin of error”, often something like “3 percentage points”
- The margin of error describes …..
- Nonsampling errors cannot be attributed to ……
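To give a feel for where a figure like "3 percentage points" can come from, here is a minimal sketch of one common calculation. It assumes a simple random sample, a sample proportion, and an approximate 95% confidence level using the normal critical value 1.96, so the margin of error is roughly 1.96 * sqrt(p(1 - p) / n). These assumptions and the function names are illustrative and are not taken from the notes above.

```python
import math

Z_95 = 1.96  # normal critical value for an approximate 95% confidence level (assumed)


def margin_of_error(p_hat, n, z=Z_95):
    """Approximate margin of error for a sample proportion p_hat
    from a simple random sample of size n."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def sample_size_for(moe, p_hat=0.5, z=Z_95):
    """Smallest n whose approximate margin of error is at most moe.
    p_hat = 0.5 is the most conservative (largest-variance) choice."""
    return math.ceil(z**2 * p_hat * (1 - p_hat) / moe**2)


print(f"n = 1000, p = 0.5: margin of error = {margin_of_error(0.5, 1000):.3f}")
print(f"n needed for a 3-point margin: {sample_size_for(0.03)}")
```

Under these assumptions, a sample of roughly a thousand respondents gives a margin of error near 3 percentage points, which is why that figure appears so often in reported polls. The calculation captures sampling variability only.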
(1.7) Why use sampling?
“After all, my opinion has never been asked, so how can the survey results claim to represent me?” “Extrapolating what tens of millions are thinking from a tiny sample of opinions affronts human intelligence and negates true freedom of thought.”
Public distrust of surveys intensifies when high-stakes elections (1936, 2016) are predicted incorrectly.
The first recommendations to use a sample to understand a characteristic of a population were developed in Norway in 1895, but it wasn’t until 1920-1960 that statisticians were able to create mathematical proofs showing that a probability sample (a sample chosen using random selection methods) produces reliable results that can be used to make inferences about a population.
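The idea of a probability sample can be illustrated with the simplest random selection method, a simple random sample drawn without replacement. This is a sketch under made-up assumptions: the population values, the sample size of 100, and the 1,000 repetitions are hypothetical choices for illustration only.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical population of 5,000 values (for example, hours of sleep per night).
population = [random.gauss(7, 1.5) for _ in range(5_000)]
true_mean = statistics.mean(population)

# random.sample draws without replacement, so every subset of size n
# has the same chance of being selected (a simple random sample).
n = 100
estimates = [statistics.mean(random.sample(population, n)) for _ in range(1_000)]

print(f"population mean:            {true_mean:.3f}")
print(f"average of sample means:    {statistics.mean(estimates):.3f}")
print(f"spread (SD) of sample means: {statistics.stdev(estimates):.3f}")
```

Each individual sample uses only 2% of the population, yet the sample means cluster around the population mean, and their spread can be measured. That measurable reliability is what random selection provides.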
Advantages of a sample instead of a census
“Sampling is not mere substitution of a partial coverage for a total coverage. Sampling is the science and art of controlling and measuring the reliability of useful statistical information through the theory of probability” (Deming, 1950, p. 2)