library(tidyverse)
library(knitr)
library(sampling)
library(survey)
Worksheet 03: Exploring the partitioning of variance in cluster sampling
A student wants to estimate the average GPA in their dorm. Obtaining a listing of all students in the hall and conducting an SRS would take a lot of time. Instead, since each of the 100 suites in the hall has 4 students, the student randomly samples 5 suites and collects GPA data for every student in each sampled suite. These data are part of Examples 5.2 and 5.4. Let's explore them.
- What is contained in each row?
gpa.data <- readr::read_csv(here::here("data", "gpa.csv"))
head(gpa.data)
# A tibble: 6 × 3
suite gpa wt
<dbl> <dbl> <dbl>
1 1 3.08 20
2 1 2.6 20
3 1 3.44 20
4 1 3.04 20
5 2 2.36 20
6 2 3.04 20
- What is the explanatory variable? What is the response variable?
Recreate the ANOVA table in Example 5.4.
aov(gpa ~ suite, data=gpa.data) |> summary()
Df Sum Sq Mean Sq F value Pr(>F)
suite 1 0.008 0.00784 0.028 0.869
Residuals 18 5.023 0.27908
What went wrong? Explain how you detected this, how you fixed it, and rerun the ANOVA with the correct data.
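One thing worth checking in the output above is the single degree of freedom for `suite`, even though 5 suites were sampled. The sketch below shows how that symptom arises when a grouping variable is stored as a number rather than a factor. It uses a hypothetical data frame `toy` with made-up GPA values, not the `gpa.csv` data, so only the degrees of freedom are meaningful.

```r
# Hypothetical toy data: 5 suites of 4 students each.
# The GPA values are made up for illustration; they are NOT the gpa.csv data.
toy <- data.frame(
  suite = rep(1:5, each = 4),
  gpa   = c(3.1, 2.6, 3.4, 3.0,
            2.4, 3.0, 3.3, 2.9,
            2.0, 2.6, 3.5, 3.8,
            3.2, 2.7, 2.5, 3.6,
            2.3, 3.7, 2.8, 3.1)
)

# suite stored as a number: aov() fits a single slope, so suite gets 1 df
fit_numeric <- aov(gpa ~ suite, data = toy)

# suite converted to a factor: one-way ANOVA with 5 - 1 = 4 df between suites
fit_factor <- aov(gpa ~ factor(suite), data = toy)

summary(fit_numeric)
summary(fit_factor)

# Extract the Df column (model term first, residuals second) from each table
df_numeric <- summary(fit_numeric)[[1]][["Df"]]
df_factor  <- summary(fit_factor)[[1]][["Df"]]
```

With 5 suites of 4 students, the factor version should report 4 degrees of freedom between suites and 15 within, which is the layout a one-way ANOVA table for this design needs.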
From the ANOVA table, calculate the unbiased estimate of the population standard deviation. Interpret this number.
Calculate the ICC and $R^2$.
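As a reference for these calculations, the standard formulas for one-stage cluster sampling with equal-sized clusters can be sketched as follows. The notation is an assumption, not taken from the worksheet: $N = 100$ suites in the population, $M = 4$ students per suite, and `msb` and `msw` are the between-suite and within-suite mean squares from the corrected ANOVA table. The sketch also assumes that "R2" refers to the adjusted $R^2$, written $R_a^2$ here.

```latex
% Estimated population variance, pooling the two mean squares
\hat{S}^2 = \frac{(N-1)\,\mathrm{msb} + N(M-1)\,\mathrm{msw}}{NM-1}

% Intraclass correlation coefficient, with SSW estimated by N(M-1) msw
% and SSTO estimated by (NM-1) \hat{S}^2
\widehat{\mathrm{ICC}} = 1 - \frac{M}{M-1}\cdot
  \frac{N(M-1)\,\mathrm{msw}}{(NM-1)\,\hat{S}^2}

% Adjusted R^2
\hat{R}_a^2 = 1 - \frac{\mathrm{msw}}{\hat{S}^2}
```

The estimated standard deviation is the square root of $\hat{S}^2$. Both the ICC and $R_a^2$ can be negative when the within-suite mean square exceeds $\hat{S}^2$, meaning students within a suite are less alike than students picked at random.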
By how much does the variance increase when using cluster sampling instead of an SRS?
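One way to frame this comparison, assuming equal-sized clusters and the same total number of sampled students under both designs, is the ratio of the two variances. The symbols $N$, $M$, `msb`, and $\hat{S}^2$ are notation assumed here rather than defined in the worksheet: number of suites, students per suite, between-suite mean square, and estimated population variance.

```latex
% Population-level ratio of cluster-sample variance to SRS variance
\frac{V(\hat{t}_{\mathrm{cluster}})}{V(\hat{t}_{\mathrm{SRS}})}
  = \frac{\mathrm{MSB}}{S^2}
  = \frac{NM-1}{M(N-1)}\,\bigl[\,1 + (M-1)\,\mathrm{ICC}\,\bigr]

% Sample-based estimate of the same ratio
\frac{\mathrm{msb}}{\hat{S}^2}
```

A ratio above 1 indicates that cluster sampling is less precise than an SRS of the same size. A ratio below 1, which occurs when the ICC is negative, indicates that it is more precise.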