The Sampling Framework

The objective of sample surveys is to make inference about a population from information contained in a sample selected from that population.
Published: January 22, 2025

Introduction

The objective of sample surveys is to make inference about a population from information contained in a sample selected from that population. We are usually interested in estimating some population parameter such as the population mean, proportion or total.

(1.2) Populations and Representative Samples

A sample is representative if it can be used to “reconstruct” what the population looks like and if we can provide an accurate assessment of how good that reconstruction is.

Key Terms

Some definitions are needed to make the notions of a “population” and a “representative sample” more precise.

Textbook Figure 1.1. Target population and sampled population in a telephone survey of registered voters.

  • Target Population:
  • Sampling frame:
  • Observational Unit:
  • Sample:
  • Sampled population:
  • Sampling unit:

In an ideal survey, the sampled population will be identical to the target population. 🦄

👥 Key Terms

In small groups go to our collaborative notes and put your names in the [names] portion for one of the three examples that you are assigned to work on. Then as a group read the details for your example from the textbook and identify as many key terms as you can.

Suggestion: Only one person from each group should be doing the writing but everyone should be logged in and viewing.


(1.3) Selection Bias

Selection bias occurs when the target population does not coincide with the sampled population because some population units are sampled at a different rate than intended by the investigator.

If a survey designed to study household income has fewer poor households than would be obtained in a representative sample, would the survey estimates of the average or median household income be too high or too low?
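One way to check your answer to the question above is with a small simulation. The sketch below is only an illustration: the income distribution, the 20% cutoff for "poor" households, and the selection weight of 0.3 are all made-up assumptions, not values from the textbook.

```python
import random
import statistics


def simulate_income_survey(seed=0, n_households=10_000, sample_size=2_000,
                           poor_selection_weight=0.3):
    """Compare a representative sample with one that under-samples poor households.

    Every number here is hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Hypothetical population of household incomes (right-skewed, in dollars).
    population = [rng.lognormvariate(10.8, 0.7) for _ in range(n_households)]

    # Call the bottom 20% of households "poor" for the purposes of this sketch.
    poverty_line = statistics.quantiles(population, n=5)[0]

    # Representative sample: every household has the same chance of selection.
    representative = rng.sample(population, sample_size)

    # Biased sample: poor households are selected at a lower rate than intended.
    # (rng.choices draws with replacement, which is fine for a rough illustration.)
    weights = [poor_selection_weight if income < poverty_line else 1.0
               for income in population]
    biased = rng.choices(population, weights=weights, k=sample_size)

    # Each entry is a (mean, median) pair.
    return {
        "population": (statistics.mean(population), statistics.median(population)),
        "representative": (statistics.mean(representative), statistics.median(representative)),
        "biased": (statistics.mean(biased), statistics.median(biased)),
    }


if __name__ == "__main__":
    for label, (mean, median) in simulate_income_survey().items():
        print(f"{label:>15}: mean = {mean:,.0f}, median = {median:,.0f}")
```

Compare the "biased" row with the "population" row, then try changing `poor_selection_weight` to see how the gap responds.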

👥 Some types of selection bias

In small groups, briefly summarize the type of selection bias that you are assigned to review. Write your answers in HackMD. Provide a unique example (that is, not the one in the text). Be prepared to share this out with the class.

  • Convenience Sample:
  • Purposive or Judgement Sample:
  • Self-Selected Sample:
  • Undercoverage Bias:
  • Overcoverage:
  • Nonresponse:

What Good Are Samples with Selection Bias?

Can bias ever be good?

Example: Early indicators of lung injury associated with vaping (2019)

By Oct ’19, over 1,600 cases of lung injuries and 34 deaths were found to be associated with vaping, but the actual cause of the injuries was not known. In 2019 a group of researchers (see text for citation) conducted a study that ultimately led to the recommendation that the public stop using these products until more research on the causal association could be conducted.

This impact survey had the following types of selection bias:

  • Researchers attempted to interview 83 patients who were reported to have lung injuries but only 53 responded (________).
  • The sampling frame of 83 only came from physician-reported cases, which means there may be bias if the physicians only reported the more serious cases (________).
  • THC is illegal in that state, so patients may have under-reported their use (________)
  • Individuals who vaped and incurred a lung injury but didn’t seek care were excluded from the study (________)

Even when your sample likely contains bias, being up front about it in your reporting (this is what the limitations section is all about) is a very important aspect of open science.


(1.4) Measurement Error

  • Measurement error:
  • Measurement bias:
Example: Ecology surveys

Often in ecological surveys, areas are divided into plots or grids of smaller size. A sample of plots/grids is selected and the number of plants in each selected grid is counted.

Field researchers have to make a decision about whether or not to count plants directly on the border. If one researcher always counts plants on the border as being inside the grid and another one does not, the first researcher's counts will be systematically higher than the second's.
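The border-counting rule can be sketched in a few lines of code. This is a toy illustration only: the plot counts are randomly generated, and the names `count_plants` and `compare_researchers` are made up for this example.

```python
import random


def count_plants(inside, on_border, include_border):
    """Return one researcher's count for a single plot."""
    return inside + (on_border if include_border else 0)


def compare_researchers(seed=1, n_plots=20):
    """Apply two different border rules to the same hypothetical plots."""
    rng = random.Random(seed)

    # Hypothetical plots: (plants clearly inside, plants sitting on the border).
    plots = [(rng.randint(5, 30), rng.randint(0, 4)) for _ in range(n_plots)]

    # Researcher A counts border plants as inside; researcher B does not.
    counts_a = [count_plants(inside, border, include_border=True)
                for inside, border in plots]
    counts_b = [count_plants(inside, border, include_border=False)
                for inside, border in plots]

    border_total = sum(border for _, border in plots)
    return counts_a, counts_b, border_total


if __name__ == "__main__":
    counts_a, counts_b, border_total = compare_researchers()
    print("Researcher A total:", sum(counts_a))
    print("Researcher B total:", sum(counts_b))
    print("Plants on a border:", border_total)
```

The two researchers looked at exactly the same plots, so the gap between their totals comes entirely from the counting rule. That is why a clear, shared protocol matters.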

Surveying people

Obtaining accurate responses is challenging in all types of surveys, especially when dealing with humans.

Sourced from Tenor.com

Bonus credz to anyone who can name this bat. Double credz if you can quote a line from his intro rap.
👥 Think-pair-share

Write down all the reasons you can think of for measurement error in self-report surveys on people. Write down everything that comes to mind, with no judgement on whether you think it’s a “good” response or not.

  • People may choose not to respond or lie if they’re embarrassed about the response (like income level)

Clear sampling protocols and thoughtful & validated survey design can minimize measurement error.


(1.5) Questionnaire Design

This is a short, short version of how to design a questionnaire. If you are considering writing one for a project or thesis, please consult one of the references at the end of the chapter in the textbook.

The most important step

Is deciding what you want to learn. What are the goals of your survey?

Be as precise as possible

“I want to learn something about persons experiencing homelessness”

Is not good enough. Let’s revise this to

“What percentage of persons using homeless shelters in Chicago between January and March 2021 are under 16 years old?”

Now you can write questions that will be able to actually answer this research question.

Guidance

👥 Think-pair-share

Use HackMD to summarize the guidance item assigned to you. Be prepared to share out verbally in class.


(1.6) Sampling and Nonsampling errors

  • Most surveys report a “margin of error”, often something like “3 percentage points”
  • The margin of error describes …..
  • Nonsampling errors cannot be attributed to ……
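To get a feel for where a figure like "3 percentage points" can come from, here is a minimal sketch of one common calculation. It assumes a simple random sample, a large population, an estimated proportion, and a 95% confidence level. A given poll may compute its margin of error differently, and the function name `margin_of_error` is just for illustration.

```python
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample from a large population (normal approximation).
    """
    # Critical value from the standard normal, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of a sample proportion, scaled by the critical value.
    return z * (p_hat * (1 - p_hat) / n) ** 0.5


if __name__ == "__main__":
    for n in (400, 1067, 1600):
        print(f"n = {n:>5}: margin of error = {margin_of_error(0.5, n):.3f}")
```

Under these assumptions, a sample of roughly 1,000 respondents gives a margin of error close to 3 percentage points. This calculation reflects sampling variability only. It says nothing about selection bias or measurement error.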
You try it

Go find a poll on a subject that interests you. Share the following information in the #458-class-chat discord channel, and include this info in your notes.

  • Name of poll:
  • URL:
  • Stated Population:
  • Statistic being estimated:
  • Margin of Error:

(1.7) Why use sampling?

“After all, my opinion has never been asked, so how can the survey results claim to represent me?” “Extrapolating what tens of millions are thinking from a tiny sample of opinions affronts human intelligence and negates true freedom of thought.”

Public distrust of surveys intensifies when high-stakes elections (1936, 2016) are predicted incorrectly.

Initial recommendations to use a sample to understand a characteristic about a population developed in Norway in 1895, but it wasn’t until 1920-1960 that statisticians were able to create mathematical proofs to support that a probability sample - a sample chosen using random selection methods - produces reliable results that can be used to make inference on a population.
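The claim about probability samples can be explored informally with a simulation. The sketch below is an illustration, not a proof. It builds a made-up population, draws many simple random samples, and compares the sample means with the population mean. All of the values (population size, sample size, mean 50, standard deviation 10) are arbitrary assumptions.

```python
import random
import statistics


def repeated_srs(seed=42, population_size=10_000, sample_size=100, n_samples=1_000):
    """Draw many simple random samples and summarize their sample means."""
    rng = random.Random(seed)

    # Hypothetical population of measurements.
    population = [rng.gauss(50, 10) for _ in range(population_size)]

    # Each simple random sample is drawn without replacement.
    sample_means = [statistics.mean(rng.sample(population, sample_size))
                    for _ in range(n_samples)]

    return (statistics.mean(population),     # true population mean
            statistics.mean(sample_means),   # average of the sample means
            statistics.stdev(sample_means))  # spread of the sample means


if __name__ == "__main__":
    pop_mean, avg_of_means, spread = repeated_srs()
    print(f"Population mean:          {pop_mean:.2f}")
    print(f"Average of sample means:  {avg_of_means:.2f}")
    print(f"Std. dev. of sample means: {spread:.2f}")
```

Each sample uses only 1% of the population, yet the sample means cluster around the population mean, and we can measure how tightly they cluster.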

Advantages of a sample instead of a census

👥 Think-pair-share

In your notes, write down all the reasons you can think of for why a sample may be more advantageous than a census. Don’t forget to consult the textbook when you run out of ideas. Add your ideas to HackMD, making sure to read others’ contributions and not duplicate ideas.

“Sampling is not mere substitution of a partial coverage for a total coverage. Sampling is the science and art of controlling and measuring the reliability of useful statistical information through the theory of probability” (Deming, 1950, p. 2)