Worksheet 03: Exploring statewide hospitalizations due to Covid-19

Worksheet
Published

February 24, 2023

This assignment uses statewide open data on Covid-19 hospitalizations.

  1. Go to the CA Open Data Portal: https://data.ca.gov/
  2. At the top, click on “Datasets”
  3. On the left, click on “CSV”
  4. Download the COVID-19 Hospital data (data dictionary and data set)

Be sure to read the codebook so that you understand what data was collected, over what time frame, and exactly what each variable means. Your answers must be given in the context of the data.
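
One way to check the download against the codebook is to read the file into R and look at its structure. This is only a sketch: the file name below is a hypothetical placeholder, so replace it with whatever name the CSV has on your machine. The block does nothing if the file is not found.

```r
# Hypothetical file name (an assumption); use the name of your downloaded CSV.
path <- "covid19_hospital_data.csv"

if (file.exists(path)) {
  hosp <- read.csv(path)

  # Variable names and types, to compare against the data dictionary
  str(hosp)

  # Number of rows (the population size N for this worksheet) and columns
  print(dim(hosp))

  # Missing values per column, which matter when you choose variables to estimate
  print(colSums(is.na(hosp)))
}
```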

  1. Generate two random samples without replacement. One should be between 10% and 30% of the data set size, and the other between 50% and 70%. The exact sizes are your choice. Set a seed to ensure reproducibility.

  2. Choose a numeric variable. For both samples, estimate the total, mean, and proportions for variables of your choice, each with a 95% confidence interval. Be sure to account for your sampling weights and the finite population correction (fpc). Use the functions from the survey package to do these calculations. Interpret each estimate and interval in the context of the problem.

  3. Using a summary table similar to the one from the notes, calculate and report the error of estimation for each sample. What effect did the sample size have on your estimates or the variability of the estimates?

  4. Are your estimates generalizable to the entire state? Why or why not? Before you answer, it may help to take another look at the context in which the data was collected.
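
A minimal sketch of how questions 1 to 3 might be set up in R with the survey package. Several things here are assumptions, not part of the assignment: the population is a simulated stand-in so the code runs on its own, the column names `patients` and `any_icu` are hypothetical, the 20% and 60% sampling fractions and the seed are arbitrary choices within the allowed ranges, and "error of estimation" is taken to mean the absolute difference between a sample estimate and the value computed from the full data set. Check the notes in case they define it differently.

```r
library(survey)

set.seed(455)  # arbitrary seed; any fixed value makes the draws reproducible

# Simulated stand-in population (assumption). For the worksheet, replace this
# with the downloaded data, e.g. hosp <- read.csv("<your file>.csv").
N <- 2000
hosp <- data.frame(patients = rpois(N, lambda = 20))  # hypothetical numeric variable
hosp$any_icu <- rbinom(N, size = 1, prob = 0.6)       # hypothetical 0/1 indicator

# Simple random sample without replacement, with weight and fpc columns attached
draw_srs <- function(pop, frac) {
  n <- round(frac * nrow(pop))
  samp <- pop[sample(nrow(pop), size = n, replace = FALSE), , drop = FALSE]
  samp$wt  <- nrow(pop) / n  # each sampled row represents N/n population rows
  samp$fpc <- nrow(pop)      # population size, used for the fpc
  samp
}

small <- draw_srs(hosp, 0.20)
large <- draw_srs(hosp, 0.60)

# Estimates, standard errors, 95% intervals, and error against the full data
estimate <- function(samp, pop) {
  des <- svydesign(ids = ~1, weights = ~wt, fpc = ~fpc, data = samp)
  est <- list(total = svytotal(~patients, des),
              mean  = svymean(~patients, des),
              prop  = svymean(~any_icu, des))  # mean of a 0/1 column is a proportion
  truth <- c(total = sum(pop$patients),
             mean  = mean(pop$patients),
             prop  = mean(pop$any_icu))
  rows <- lapply(names(est), function(k) {
    ci <- confint(est[[k]])  # 95% interval by default
    data.frame(quantity = k,
               n        = nrow(samp),
               estimate = as.numeric(coef(est[[k]])),
               se       = as.numeric(SE(est[[k]])),
               lower    = ci[1, 1],
               upper    = ci[1, 2],
               error    = abs(as.numeric(coef(est[[k]])) - truth[[k]]))
  })
  do.call(rbind, rows)
}

summary_table <- rbind(estimate(small, hosp), estimate(large, hosp))
print(summary_table)
```

Comparing the `se` and `error` columns across the two values of `n` gives the raw material for question 3. For a proportion, `svyciprop()` from the same package is an alternative that offers other interval methods.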