svyby(~acres92, by=~region, agstrat.dsgn, svytotal, keep.var = TRUE)
Formulas and Definitions
Parameters and Statistics
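Measure | Population | Sample |
---|---|---|
Total | $t = \sum_{i=1}^{N} y_i$ | $\hat{t} = \sum_{i \in S} w_i y_i$ |
Mean | $\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i$ | $\bar{y} = \frac{1}{n}\sum_{i \in S} y_i$ |
Variance | $S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2$ | $s^2 = \frac{1}{n-1}\sum_{i \in S} (y_i - \bar{y})^2$ |
Proportion | $p$ | $\hat{p}$ |

- $N$: total population size
- $n$: total sample size
- $y_i$: value of measurement on unit $i$
- For proportions, $y_i$ is a binary indicator of success, $y_i \in \{0, 1\}$. I.e. $p = \bar{y}_U$ and $\hat{p} = \bar{y}$.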
Expected Value and Variance
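- $E[X] = \sum_{x} x \, P(X = x)$
- $V[X] = \sum_{x} (x - E[X])^2 \, P(X = x)$
- Sums are over all possible values $x$ of $X$. $P(X = x)$ is the probability of $x$ occurring.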
Definition: Bias, Variance, Accuracy
Sample Weights
The sampling weights
- SRS:
- SRSWOR:
- Stratified:
- One stage cluster:
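As a rough illustration of how sampling weights are formed, the base R sketch below assumes the usual convention that a unit's weight is the reciprocal of its inclusion probability, $w_i = 1/\pi_i$. All population and sample sizes are made up for illustration.

```r
# Hypothetical sizes, for illustration only
N <- 50   # population size
n <- 4    # sample size

# SRS without replacement: every unit has inclusion probability n / N
pi_srs <- n / N
w_srs  <- 1 / pi_srs            # each sampled unit represents N / n units

# Stratified sampling: weights are formed within each stratum as N_h / n_h
N_h <- c(A = 20, B = 30)        # hypothetical stratum population sizes
n_h <- c(A = 4,  B = 3)         # hypothetical stratum sample sizes
w_h <- N_h / n_h

# Sanity check: the weights of all sampled units add up to the population size
sum(w_h * n_h)
```

The final line is a handy check on any set of weights: summed over the sampled units, they should recover the population size.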
Simple Random Sample
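Population Quantity | Estimator | Estimated variance of estimator |
---|---|---|
Mean: $\bar{y}_U$ | $\bar{y} = \frac{1}{n}\sum_{i \in S} y_i$ | $\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$ |
Total: $t$ | $\hat{t} = N\bar{y}$ | $\hat{V}(\hat{t}) = N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}$ |
Proportion: $p$ | $\hat{p} = \bar{y}$ | $\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}(1 - \hat{p})}{n - 1}$ |

- $i \in S$: Unit $i$ is an element in the sample.
- The standard error of the estimate is the square root of the estimated variance.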
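For a numeric feel for the SRS estimators, here is a minimal base R sketch. It assumes the standard formulas for SRS without replacement (estimated variance of the mean $(1 - n/N)\,s^2/n$, estimated total $N\bar{y}$); the data values and `N` are made up.

```r
# Hypothetical SRS of n = 5 units from a population of N = 100
y <- c(12, 15, 9, 20, 14)
N <- 100
n <- length(y)

fpc <- 1 - n / N                  # finite population correction

ybar    <- mean(y)                # estimated population mean
v_ybar  <- fpc * var(y) / n       # estimated variance of the mean
se_ybar <- sqrt(v_ybar)           # standard error of the mean

t_hat    <- N * ybar              # estimated population total
se_t_hat <- N * se_ybar           # standard error of the total

# Proportion of units with y > 12, treated as the mean of a 0/1 indicator
p_hat   <- mean(y > 12)
v_p_hat <- fpc * p_hat * (1 - p_hat) / (n - 1)
```

Note that `var()` already uses the $n - 1$ divisor, so it matches $s^2$ directly.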
Stratified Random Sample
See section-04 for notation.
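Population Quantity | Estimator | Estimated variance of estimator |
---|---|---|
Within strata total: $t_h$ | $\hat{t}_h = N_h \bar{y}_h$ | $\hat{V}(\hat{t}_h) = N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}$ |
Overall total: $t$ | $\hat{t}_{str} = \sum_{h=1}^{H} \hat{t}_h$ | $\hat{V}(\hat{t}_{str}) = \sum_{h=1}^{H} \hat{V}(\hat{t}_h)$ |
Within strata mean: $\bar{y}_{hU}$ | $\bar{y}_h = \frac{1}{n_h}\sum_{j \in S_h} y_{hj}$ | $\hat{V}(\bar{y}_h) = \left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}$ |
Overall mean: $\bar{y}_U$ | $\bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N}\bar{y}_h$ | $\hat{V}(\bar{y}_{str}) = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^2\left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}$ |
Within strata proportion: $p_h$ | $\hat{p}_h = \bar{y}_h$ | $\hat{V}(\hat{p}_h) = \left(1 - \frac{n_h}{N_h}\right)\frac{\hat{p}_h(1 - \hat{p}_h)}{n_h - 1}$ |
Overall proportion: $p$ | $\hat{p}_{str} = \sum_{h=1}^{H} \frac{N_h}{N}\hat{p}_h$ | $\hat{V}(\hat{p}_{str}) = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^2 \hat{V}(\hat{p}_h)$ |

$\hat{p}_h$ is a simplified version of $\bar{y}_h$ and $\hat{p}_{str}$ is a simplified version of $\bar{y}_{str}$.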
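The stratified estimators can be sketched in base R by working stratum by stratum and then combining. This assumes SRS without replacement within each stratum and the standard formulas $\hat{t}_h = N_h\bar{y}_h$ and $\hat{V}(\hat{t}_h) = N_h^2 (1 - n_h/N_h)\, s_h^2/n_h$; the stratum summaries below are hypothetical.

```r
# Hypothetical summaries for three strata
N_h    <- c(100, 200, 700)   # stratum population sizes
n_h    <- c(10, 20, 30)      # stratum sample sizes
ybar_h <- c(5, 8, 12)        # stratum sample means
s2_h   <- c(4, 9, 16)        # stratum sample variances
N      <- sum(N_h)

# Within-stratum totals and their estimated variances
t_h   <- N_h * ybar_h
v_t_h <- N_h^2 * (1 - n_h / N_h) * s2_h / n_h

# Overall total: strata are sampled independently, so variances add
t_str    <- sum(t_h)
se_t_str <- sqrt(sum(v_t_h))

# Overall mean: divide the total (and its SE) by the population size
ybar_str    <- t_str / N
se_ybar_str <- se_t_str / N
```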
Cluster Random Sample
⚠️ Note notation change!
- $y_{ij}$: measurement for the $j$th element in the $i$th psu
- $N$: the number of clusters (psus) in the population
- $n$: the number of psus in the sample
- $M_i$: the number of ssus in psu $i$ in the population
- $m_i$: the number of ssus in psu $i$ in the sample
- $M_0$: total number of ssus in the population
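Population Quantity | Estimator | Estimated variance of estimator |
---|---|---|
Total in psu $i$: $t_i$ | $t_i = \sum_{j=1}^{M_i} y_{ij}$ | - |
Variance of psu totals: $S_t^2$ | $s_t^2 = \frac{1}{n-1}\sum_{i \in S}\left(t_i - \frac{\hat{t}}{N}\right)^2$, AKA variance between psus | |
Overall total: $t$ | $\hat{t} = \frac{N}{n}\sum_{i \in S} t_i$ | $\hat{V}(\hat{t}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n}$ |
Mean in psu $i$: $\bar{y}_{iU}$ | $\bar{y}_i = \frac{t_i}{M_i}$ | |
Overall mean: $\bar{y}_U$ | $\hat{\bar{y}} = \frac{\hat{t}}{M_0}$ | $\hat{V}(\hat{\bar{y}}) = \frac{\hat{V}(\hat{t})}{M_0^2}$ |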
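A minimal base R sketch of the one-stage cluster estimator of a total follows. It assumes psus are drawn by SRS without replacement and every ssu in a sampled psu is measured, so the estimated total is $\hat{t} = (N/n)\sum_{i \in S} t_i$ with estimated variance $N^2 (1 - n/N)\, s_t^2/n$. The data are hypothetical.

```r
# Hypothetical one-stage cluster sample: n = 3 psus drawn from N = 10,
# with every ssu in each sampled psu measured
N <- 10
y_by_psu <- list(c(2, 4, 6), c(1, 3), c(5, 5, 5, 5))
n <- length(y_by_psu)

t_i  <- sapply(y_by_psu, sum)   # psu totals
s2_t <- var(t_i)                # variance between psu totals

t_hat    <- N / n * sum(t_i)    # estimated overall total
v_t_hat  <- N^2 * (1 - n / N) * s2_t / n
se_t_hat <- sqrt(v_t_hat)
```

Because $\hat{t}/N$ equals the mean of the sampled psu totals, `var(t_i)` is the same quantity as $s_t^2$.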
R commands
This is a quick reference list. See the R companion for the textbook, the package help files, vignettes or other tutorials listed at the bottom of this page for more information.
Analysis
The `survey` package supports the analysis of data collected using complex survey designs.
Specify survey design svydesign
- Function call: `svydesign(id = , weights = , fpc = , data = )`
- `id` = variable that identifies clusters
- `weights` = variable that holds the sampling weights
- `fpc` = finite population correction. Typically defined in the function call.
The argument details can be found on the specified pages in the R companion for the book, and in the respective sections of these notes.
- SRS: pg 21
- Stratified Random Sample: pg 34
- Cluster sampling: pg 57
Estimators
- mean: `svymean(~x, design)`
- total: `svytotal(~x, design)`
- proportion: Use `svytotal` and divide by `N`
- CI for the mean or total: Use `confint()` after calculating the point estimate
- CI for proportion: `svyciprop(~x, design)`. This also prints the point estimate.
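To show how these calls fit together, here is a small sketch for an SRS design. The data frame, the column names `wt` and `fpc`, and the population size of 100 are all hypothetical; `id = ~1` is the `survey` convention for a design with no clusters.

```r
library(survey)

# Hypothetical SRS of 5 units from a population of N = 100
srs <- data.frame(y = c(12, 15, 9, 20, 14))
srs$wt  <- 100 / nrow(srs)   # sampling weight N / n
srs$fpc <- 100               # population size, used for the fpc

# id = ~1 means no clustering: each row is its own sampling unit
dsgn <- svydesign(id = ~1, weights = ~wt, fpc = ~fpc, data = srs)

est_mean  <- svymean(~y, dsgn)    # point estimate and SE of the mean
est_total <- svytotal(~y, dsgn)   # point estimate and SE of the total

est_mean
confint(est_mean)                 # CI computed from the stored estimate
est_total
```

`confint()` uses a normal approximation by default; passing `df = degf(dsgn)` requests a t-based interval instead.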
Calculating stratum means and variances
- The first argument of `svyby` is the formula for the variable(s) for which statistics are desired.
- The second argument (`by=`) is the variable that defines the groups.
- Then list the `design` object
- and the name of the function that calculates the statistics.
- Set `keep.var=TRUE` to display the standard errors for the statistics.
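The argument order can be seen in a small self-contained sketch. The data frame, its column names, and the stratum sizes are hypothetical; the design is a stratified sample with three units drawn from each of two regions.

```r
library(survey)

# Hypothetical stratified sample: two regions with known population sizes
dat <- data.frame(
  region = c("N", "N", "N", "S", "S", "S"),
  acres  = c(10, 12, 14, 20, 25, 30),
  N_h    = c(30, 30, 30, 60, 60, 60)   # stratum population sizes
)
dat$wt <- dat$N_h / 3                   # N_h / n_h, with n_h = 3 per stratum

dsgn <- svydesign(id = ~1, strata = ~region, weights = ~wt,
                  fpc = ~N_h, data = dat)

# formula, grouping variable, design object, then the estimating function
by_region <- svyby(~acres, by = ~region, dsgn, svymean, keep.var = TRUE)
by_region
```

Swapping `svymean` for `svytotal` returns stratum totals with the same call structure.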
Sampling
The `sampling` package allows you to take random samples from a sampling frame using different sampling frameworks in a reproducible manner.
Set up your sampling frame in a spreadsheet. This example uses Google Sheets and the `googlesheets4` package.
Import your sampling frame into R.
library(googlesheets4)
frame <- read_sheet("https://docs.google.com/spreadsheets/d/17bg__F6Cq0zBnbPtMBsNCKNM-pyybVnhujvI2J66n_4")
- Use functions from the `sampling` package to draw random samples according to your design. See the links for more details on what the arguments mean.
Simple random sample
library(sampling)
set.seed(12345)
srs.idx <- srswor(4, length(frame$unit_id)) # select 4 units without replacement
getdata(frame, srs.idx)
ID_unit unit_id group
1 14 14 B
2 16 16 B
3 26 26 C
4 28 28 C
Stratified
library(dplyr)
frame <- frame %>% arrange(group) # sort first
strata.idx <- sampling::strata(data = frame,           # data set
                               stratanames = "group",  # variable name
                               size = c(2,3,2,1,2),    # stratum sample sizes
                               method = "srswor")      # method for selecting within strata
getdata(frame, strata.idx)
unit_id group ID_unit Prob Stratum
2 2 A 2 0.2500000 1
8 8 A 8 0.2500000 1
10 10 B 10 0.3333333 2
14 14 B 14 0.3333333 2
16 16 B 16 0.3333333 2
23 23 C 23 0.1428571 3
28 28 C 28 0.1428571 3
38 38 D 38 0.1428571 4
39 39 E 39 0.1666667 5
48 48 E 48 0.1666667 5
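The `Prob` column in the stratified output above is each unit's inclusion probability, so sampling weights can be recovered as its reciprocal. The sketch below retypes that output by hand; the fractions (for example 2/8 for stratum A) are inferred from the printed decimals and the stratum sample sizes, so treat them as an assumption.

```r
# Inclusion probabilities retyped from the stratified sample output
# (fractions inferred from the printed decimals)
smp <- data.frame(
  unit_id = c(2, 8, 10, 14, 16, 23, 28, 38, 39, 48),
  group   = c("A", "A", "B", "B", "B", "C", "C", "D", "E", "E"),
  Prob    = c(2/8, 2/8, 3/9, 3/9, 3/9, 2/14, 2/14, 1/7, 2/12, 2/12)
)

# Sampling weight = 1 / inclusion probability
smp$wt <- 1 / smp$Prob

# The weights should add up to the number of units in the frame
sum(smp$wt)
```

A weight column built this way is what `svydesign()` expects in its `weights=` argument.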
One stage cluster
onestage.idx <- sampling::cluster(data = frame,           # data set
                                  clustername = "group",  # variable name containing clusters
                                  size = 3,               # number of clusters
                                  method = "srswor",      # how to draw clusters
                                  description = TRUE)     # show descriptive output
Number of selected clusters: 3
Number of units in the population and number of selected units: 50 24
getdata(frame, onestage.idx)
unit_id group ID_unit Prob
1 3 A 3 0.6
2 2 A 2 0.6
3 7 A 7 0.6
4 4 A 4 0.6
5 1 A 1 0.6
6 6 A 6 0.6
7 8 A 8 0.6
8 5 A 5 0.6
9 12 B 12 0.6
10 9 B 9 0.6
11 11 B 11 0.6
12 16 B 16 0.6
13 13 B 13 0.6
14 10 B 10 0.6
15 15 B 15 0.6
16 17 B 17 0.6
17 14 B 14 0.6
18 38 D 38 0.6
19 32 D 32 0.6
20 33 D 33 0.6
21 34 D 34 0.6
22 35 D 35 0.6
23 36 D 36 0.6
24 37 D 37 0.6
Two stage cluster
mstage.idx <- sampling::mstage(data = frame,
                               stage = c("cluster", ""),             # sampling method for each stage, blank means SRS
                               varnames = list("group", "unit_id"),  # variable names for each stage
                               size = list(3, c(5,5,5)),             # 3 psus, 5 ssus from each psu
                               method = c("srswor", "srswor"))
getdata(frame, mstage.idx)
[[1]]
unit_id group ID_unit Prob_ 1 _stage
1 8 A 8 0.6
2 6 A 6 0.6
3 7 A 7 0.6
4 3 A 3 0.6
5 5 A 5 0.6
6 1 A 1 0.6
7 2 A 2 0.6
8 4 A 4 0.6
9 25 C 25 0.6
10 18 C 18 0.6
11 19 C 19 0.6
12 20 C 20 0.6
13 21 C 21 0.6
14 22 C 22 0.6
15 23 C 23 0.6
16 24 C 24 0.6
17 29 C 29 0.6
18 26 C 26 0.6
19 27 C 27 0.6
20 28 C 28 0.6
21 30 C 30 0.6
22 31 C 31 0.6
23 38 D 38 0.6
24 32 D 32 0.6
25 33 D 33 0.6
26 34 D 34 0.6
27 35 D 35 0.6
28 36 D 36 0.6
29 37 D 37 0.6
[[2]]
unit_id group ID_unit Prob_ 2 _stage Prob
1 6 A 6 0.6250000 0.3750000
2 7 A 7 0.6250000 0.3750000
3 3 A 3 0.6250000 0.3750000
4 5 A 5 0.6250000 0.3750000
5 1 A 1 0.6250000 0.3750000
6 18 C 18 0.3571429 0.2142857
7 20 C 20 0.3571429 0.2142857
8 29 C 29 0.3571429 0.2142857
9 27 C 27 0.3571429 0.2142857
10 31 C 31 0.3571429 0.2142857
11 38 D 38 0.7142857 0.4285714
12 32 D 32 0.7142857 0.4285714
13 33 D 33 0.7142857 0.4285714
14 34 D 34 0.7142857 0.4285714
15 37 D 37 0.7142857 0.4285714
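In the two-stage output, the final `Prob` column appears to be the product of the two stage probabilities. The base R sketch below checks this against the printed values; the psu sizes are counted from the rows of `[[1]]`, and five psus in the frame is inferred from the first-stage probability of 0.6.

```r
# Stage 1: 3 psus selected out of 5 (inferred from Prob_ 1 _stage = 0.6)
stage1 <- 3 / 5

# Stage 2: 5 ssus drawn from each selected psu
M_i <- c(A = 8, C = 14, D = 7)   # ssus per selected psu, counted in [[1]]
m_i <- 5
stage2 <- m_i / M_i

# Overall inclusion probability and the matching sampling weight
prob <- stage1 * stage2
wt   <- 1 / prob

round(stage2, 7)   # compare with the Prob_ 2 _stage column
round(prob, 7)     # compare with the Prob column
```

The weights `wt` differ by psu because the same number of ssus was drawn from psus of different sizes.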