library(survey)
Homework 03: Simple Random Sampling
This assignment will use statewide open source data on Covid-19 hospitalizations.
- Go to the CA Open data portal: https://data.ca.gov/
- At the top, click on “Datasets”
- On the left, click on “CSV”
- Download the COVID-19 Hospital data (data dictionary and data set)
Be sure to read the codebook to make sure you understand what data was collected, the time frame, and exactly what the variables mean. Your answers must be in the context of the data.
Generate two random samples without replacement: one between 10% and 30% of the data set size, and one between 50% and 70%. The exact numbers are your choice. Set a seed to ensure reproducibility.
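The sampling step can be sketched as follows. This is a minimal illustration, not the required solution: the data frame `hosp`, its column `patients`, the population size, the seed, and the 20% and 60% sampling fractions are all hypothetical stand-ins. Replace `hosp` with the file you downloaded (for example via `read.csv()`).

```r
# Hypothetical stand-in for the downloaded hospital data.
set.seed(2024)  # any fixed seed makes the draws reproducible
N <- 1000       # assumed population size (number of rows in the data set)
hosp <- data.frame(
  id       = seq_len(N),
  patients = rpois(N, lambda = 40)  # made-up numeric variable
)

# Sample sizes: 20% and 60% of the rows (any value in the allowed ranges works).
n_small <- round(0.20 * N)
n_large <- round(0.60 * N)

# sample() draws without replacement by default (replace = FALSE).
idx_small <- sample(N, size = n_small)
idx_large <- sample(N, size = n_large)

srs_small <- hosp[idx_small, ]
srs_large <- hosp[idx_large, ]
```

Because the seed is set before any random draw, rerunning the script reproduces the same two samples.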
Choose a numeric variable. For both samples, estimate the total, mean, and proportions of variables of your choice with 95% confidence intervals. Be sure to adjust for your sample weights and fpc. Use the functions from the survey package to do these calculations. Interpret each estimate and interval in context of the problem.
Using a summary table similar to the one from the notes, calculate and report the bias for each sample. What effect did the sample size have on your estimates or the variability of the estimates?
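One way to organize these calculations is sketched below. It assumes a hypothetical population `hosp` with a made-up numeric column `patients` and an indicator `high_load` (1 if `patients > 45`), neither of which comes from the real data set. The weight for each sampled row is N/n, and the fpc is supplied as the population size N. The helper `srs_summary()` is an illustrative name, and the table layout is only a guess at what the notes show.

```r
library(survey)

set.seed(2024)
N <- 1000                                          # assumed population size
hosp <- data.frame(patients = rpois(N, lambda = 40))  # hypothetical data
hosp$high_load <- as.numeric(hosp$patients > 45)   # 0/1 indicator for a proportion

# Draw an SRS of size n, estimate total/mean/proportion, and compare to the truth.
srs_summary <- function(n) {
  srs <- hosp[sample(N, n), ]
  srs$wt  <- N / n   # each sampled row represents N/n population rows
  srs$fpc <- N       # population size, used for the finite population correction
  des <- svydesign(ids = ~1, weights = ~wt, fpc = ~fpc, data = srs)

  ests <- list(
    total      = svytotal(~patients, des),
    mean       = svymean(~patients, des),
    proportion = svymean(~high_load, des)  # mean of a 0/1 variable is a proportion
  )
  out <- data.frame(
    n         = n,
    parameter = names(ests),
    truth     = c(sum(hosp$patients), mean(hosp$patients), mean(hosp$high_load)),
    estimate  = sapply(ests, function(e) unname(coef(e))),
    lower     = sapply(ests, function(e) confint(e)[1, 1]),  # 95% by default
    upper     = sapply(ests, function(e) confint(e)[1, 2])
  )
  out$bias <- out$estimate - out$truth  # bias of this sample's estimate
  rownames(out) <- NULL
  out
}

results <- rbind(srs_summary(200), srs_summary(600))
results
```

Comparing the rows for the two values of `n` shows how the interval widths change with sample size. The bias column is computable here only because the full data set plays the role of the population.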
2. Conduct your own survey.
Take a small SRS of something you are interested in. The data collection for this exercise should not take a great deal of effort, as you are surrounded by things waiting to be sampled. Some examples: proportion of web pages that result from an internet search about a topic, average weights of 1-pound bags of carrots at the supermarket, or the average cost of a used dining room table from an online classified advertisement site.
- Explain what it is you decide to measure. Be explicit about what parameter you are trying to estimate.
- Define the following terms for your sample: target population, sampled population, sampling frame, sampling unit, and observational unit.
- Describe how you created your sampling frame and show how you generated the randomly sampled numbers.
- Collect your data, do your measurements, and create a data frame with appropriate weights.
- Using svydesign and the appropriate svy* functions, calculate a point estimate and confidence interval for the parameter of interest.
- Report your results in a complete English sentence with the point estimate and CI.
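The workflow in the steps above might look like the following sketch. Everything specific here is hypothetical: a sampling frame of 60 numbered classified listings, a sample of 12, and placeholder prices generated with `rnorm()` purely so the code runs. Substitute the measurements you actually collect. Because the sample is small, the interval uses the t distribution via the `df` argument of `confint()`.

```r
library(survey)

set.seed(503)
N <- 60                   # hypothetical number of units in the sampling frame
n <- 12                   # hypothetical sample size
frame_ids <- seq_len(N)   # number the frame 1..N

# Randomly select n frame positions without replacement.
picked <- sort(sample(frame_ids, n))

# Placeholder measurements; replace with the values you record for each unit.
my_srs <- data.frame(
  listing = picked,
  price   = round(rnorm(n, mean = 150, sd = 40))
)
my_srs$wt  <- N / n  # sampling weight for an SRS
my_srs$fpc <- N      # population (frame) size for the fpc

my_des <- svydesign(ids = ~1, weights = ~wt, fpc = ~fpc, data = my_srs)

avg_price <- svymean(~price, my_des)
ci <- confint(avg_price, df = degf(my_des))  # t-based 95% interval

# Assemble a sentence-style report of the point estimate and interval.
report <- sprintf(
  "The estimated average price is $%.2f (95%% CI: $%.2f to $%.2f).",
  coef(avg_price), ci[1, 1], ci[1, 2]
)
report
```

The final sentence should still be rewritten in your own words so that it names the target population and the parameter being estimated.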