svyby(~acres92, by=~region, agstrat.dsgn, svytotal, keep.var = TRUE)
Formulas and Definitions
Parameters and Statistics
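Measure | Population | Sample |
---|---|---|
Total | $t = \sum_{i=1}^{N} y_i$ | $\hat{t} = N\bar{y}$ |
Mean | $\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i$ | $\bar{y} = \frac{1}{n}\sum_{i \in S} y_i$ |
Variance | $S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2$ | $s^2 = \frac{1}{n-1}\sum_{i \in S} (y_i - \bar{y})^2$ |
Proportion | $p$ | $\hat{p}$ |

$N$ = total population size, $n$ = sample size, $y_i$ = value of the measurement on unit $i$. For proportions, $y_i$ is a binary indicator of success, $y_i \in \{0, 1\}$, so that $p = \bar{y}_U$.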
Expected Value and Variance
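- $E(X) = \sum_x x\,P(X = x)$
- $V(X) = E\left[(X - E(X))^2\right] = \sum_x (x - E(X))^2\,P(X = x)$
- Sums are over all possible values $x$ of $X$. $P(X = x)$ is the probability of $x$ occurring.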
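As a rough illustration of expected value, variance and bias of an estimator, the sketch below enumerates every possible simple random sample (without replacement) from a tiny made-up population and sums over the resulting sampling distribution. The population values in `y` and the sample size are assumptions chosen only for this example.

```r
# Hypothetical population of N = 8 values (illustration only)
y <- c(2, 3, 5, 5, 6, 8, 9, 10)
N <- length(y)
n <- 4

# All choose(N, n) possible samples; under SRSWOR each is equally likely
samples  <- combn(N, n)
p_sample <- 1 / ncol(samples)

# Estimated total t_hat = N * ybar for every possible sample
t_hat <- apply(samples, 2, function(idx) N * mean(y[idx]))

# Expected value, variance and bias: sums over all possible samples
E_t_hat <- sum(t_hat * p_sample)
V_t_hat <- sum((t_hat - E_t_hat)^2 * p_sample)
bias    <- E_t_hat - sum(y)

c(expected = E_t_hat, variance = V_t_hat, bias = bias)
```

Enumerating all samples is only feasible for toy populations, but it makes the definitions concrete: the bias comes out as zero here because the estimated total is unbiased under simple random sampling.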
Definition: Bias, Variance, Accuracy
Sample Weights
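The sampling weight $w_i$ is the number of population units represented by sampled unit $i$.
- SRS: $w_i = N/n$
- SRSWOR: $w_i = N/n$
- Stratified: $w_{hj} = N_h/n_h$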
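A small sketch of how such weights behave, assuming the usual definitions (population size over sample size, overall or within each stratum). The sizes below are assumptions for illustration; in both cases the weights of the sampled units add up to the population size.

```r
# Hypothetical SRSWOR: n = 4 units drawn from N = 50
N <- 50
n <- 4
w_srs <- rep(N / n, n)   # every sampled unit represents N/n population units
sum(w_srs)               # sums to N

# Hypothetical stratified design: stratum sizes and stratum sample sizes
N_h <- c(A = 7, B = 10, C = 8, D = 13, E = 12)
n_h <- c(A = 2, B = 3, C = 2, D = 1, E = 2)
w_h <- N_h / n_h         # weight shared by all sampled units in stratum h
w_h
sum(n_h * w_h)           # also sums to N
```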
Simple Random Sample
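Population Quantity | Estimator | Estimated variance of the estimator |
---|---|---|
Mean: $\bar{y}_U$ | $\bar{y} = \frac{1}{n}\sum_{i \in S} y_i$ | $\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$ |
Total: $t$ | $\hat{t} = N\bar{y}$ | $\hat{V}(\hat{t}) = N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}$ |
Proportion: $p$ | $\hat{p} = \bar{y}$ | $\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}(1 - \hat{p})}{n - 1}$ |

- $i \in S$: Unit $i$ is an element in the sample.
- The standard error of the estimate is the square root of the estimated variance.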
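The following base R sketch works through the simple random sample estimators by hand, assuming the standard without-replacement formulas with a finite population correction of $1 - n/N$. The data values and the population size are made up for illustration.

```r
# Hypothetical SRS of n = 5 measurements from a population of N = 100
y <- c(12, 15, 9, 20, 14)
N <- 100
n <- length(y)
fpc <- 1 - n / N                 # finite population correction

ybar    <- mean(y)               # estimated mean
t_hat   <- N * ybar              # estimated total
v_ybar  <- fpc * var(y) / n      # estimated variance of ybar
v_t_hat <- N^2 * v_ybar          # estimated variance of t_hat

# Proportion of units with y > 13, via a binary indicator
z       <- as.numeric(y > 13)
p_hat   <- mean(z)
v_p_hat <- fpc * p_hat * (1 - p_hat) / (n - 1)

# Standard errors are square roots of the estimated variances
c(mean = ybar, se_mean = sqrt(v_ybar),
  total = t_hat, se_total = sqrt(v_t_hat),
  prop = p_hat, se_prop = sqrt(v_p_hat))
```

The proportion variance is the mean variance applied to the 0/1 indicator, since `var(z)` equals `n / (n - 1) * p_hat * (1 - p_hat)`.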
Stratified Random Sample
See section-04 for notation.
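Population Quantity | Estimator | Estimated variance of the estimator |
---|---|---|
Within strata total: $t_h$ | $\hat{t}_h = N_h\bar{y}_h$ | $\hat{V}(\hat{t}_h) = N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}$ |
Overall total: $t$ | $\hat{t}_{str} = \sum_h \hat{t}_h$ | $\hat{V}(\hat{t}_{str}) = \sum_h N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}$ |
Within strata mean: $\bar{y}_{hU}$ | $\bar{y}_h = \frac{1}{n_h}\sum_j y_{hj}$ | $\hat{V}(\bar{y}_h) = \left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}$ |
Overall mean: $\bar{y}_U$ | $\bar{y}_{str} = \sum_h \frac{N_h}{N}\bar{y}_h$ | $\hat{V}(\bar{y}_{str}) = \sum_h \left(\frac{N_h}{N}\right)^2\left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}$ |
Within strata proportion: $p_h$ | $\hat{p}_h = \bar{y}_h$ | $\hat{V}(\hat{p}_h) = \left(1 - \frac{n_h}{N_h}\right)\frac{\hat{p}_h(1 - \hat{p}_h)}{n_h - 1}$ |
Overall proportion: $p$ | $\hat{p}_{str} = \sum_h \frac{N_h}{N}\hat{p}_h$ | $\hat{V}(\hat{p}_{str}) = \sum_h \left(\frac{N_h}{N}\right)^2\left(1 - \frac{n_h}{N_h}\right)\frac{\hat{p}_h(1 - \hat{p}_h)}{n_h - 1}$ |

$\sum_h$ is a simplified version of $\sum_{h=1}^{H}$ and $\sum_j$ is a simplified version of $\sum_{j \in S_h}$.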
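To make the stratified estimators concrete, here is a hand calculation in base R, assuming the standard formulas (stratum totals $N_h\bar{y}_h$ summed over strata, with a per-stratum finite population correction). The two strata, their population sizes and the sampled values are all hypothetical.

```r
# Hypothetical stratified sample: two strata with known population sizes
N_h  <- c(urban = 60, rural = 40)
samp <- data.frame(
  stratum = c("urban", "urban", "urban", "rural", "rural"),
  y       = c(10, 12, 14, 20, 24)
)

# Within-stratum sample sizes, means and variances (ordered like N_h)
n_h    <- tapply(samp$y, samp$stratum, length)[names(N_h)]
ybar_h <- tapply(samp$y, samp$stratum, mean)[names(N_h)]
s2_h   <- tapply(samp$y, samp$stratum, var)[names(N_h)]

# Within-stratum totals and their estimated variances
t_hat_h   <- N_h * ybar_h
v_t_hat_h <- N_h^2 * (1 - n_h / N_h) * s2_h / n_h

# Overall total and mean: sum over strata, then divide by N
N           <- sum(N_h)
t_hat_str   <- sum(t_hat_h)
v_t_hat_str <- sum(v_t_hat_h)
ybar_str    <- t_hat_str / N
v_ybar_str  <- v_t_hat_str / N^2

c(total = t_hat_str, se_total = sqrt(v_t_hat_str),
  mean = ybar_str, se_mean = sqrt(v_ybar_str))
```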
R commands
This is a quick reference list. See the R companion for the textbook, the package help files, vignettes or other tutorials listed at the bottom of this page for more information.
Analysis
The `survey` package supports the analysis of data collected using complex survey designs.
Specify survey design `svydesign`
- Function call: `svydesign(id = , weights = , fpc = , data = )`
- `id` = variable that identifies clusters
- `weights` = variable that holds the sampling weights
- `fpc` = finite population correction. Typically defined in the function call.
The argument details can be found on the specified pages in the R companion for the book, and in the respective sections of these notes.
- SRS: pg 21
- Stratified Random Sample: pg 34
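As a sketch of what these calls can look like, the example below builds an SRS design and a stratified design from the `api` example data that ships with the `survey` package (not the textbook data). The object names `srs.dsgn` and `strat.dsgn` are made up for this illustration; `id = ~1` indicates that there are no clusters.

```r
library(survey)
data(api)   # example data sets bundled with the survey package

# SRS: no clusters (id = ~1), weights in `pw`, population size in `fpc`
srs.dsgn <- svydesign(id = ~1, weights = ~pw, fpc = ~fpc, data = apisrs)

# Stratified random sample: add the stratum variable via `strata =`
strat.dsgn <- svydesign(id = ~1, strata = ~stype, weights = ~pw, fpc = ~fpc,
                        data = apistrat)

summary(strat.dsgn)
```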
Estimators
- mean: `svymean(~x, design)`
- total: `svytotal(~x, design)`
- proportion: Use `svytotal` and divide by $N$
- CI for the mean or total: Use `confint()` after calculating the point estimate
- CI for proportion: `svyciprop(~x, design)`. This also prints the point estimate $\hat{p}$.
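A hedged usage sketch of these estimator functions, again on the `api` example data bundled with `survey` rather than the textbook data. The indicator column `met` and the object names are assumptions introduced for this example.

```r
library(survey)
data(api)

# 0/1 indicator (hypothetical name `met`): school met its school-wide target
apisrs$met <- as.numeric(apisrs$sch.wide == "Yes")
srs.dsgn <- svydesign(id = ~1, weights = ~pw, fpc = ~fpc, data = apisrs)

m   <- svymean(~api00, srs.dsgn)     # estimated mean and SE
tot <- svytotal(~enroll, srs.dsgn)   # estimated total and SE
confint(m)                           # CI for the mean
confint(tot)                         # CI for the total

# Proportion: estimated total of the indicator divided by N
N     <- apisrs$fpc[1]
p_hat <- coef(svytotal(~met, srs.dsgn)) / N
p_hat

# CI for the proportion
svyciprop(~met, srs.dsgn)
```

In this design the weights sum to $N$, so dividing the estimated total by $N$ gives the same value as `svymean(~met, srs.dsgn)`.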
Calculating stratum means and variances
- The first argument of `svyby` is the formula for the variable(s) for which statistics are desired.
- `by=` is the variable that defines the groups.
- Then list the `design` object and the name of the function that calculates the statistics.
- Set `keep.var=TRUE` to display the standard errors for the statistics.
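A minimal sketch of a `svyby` call following the argument order described above, using the stratified `api` example data from the `survey` package. The names `strat.dsgn` and `by.stype` are made up for this illustration.

```r
library(survey)
data(api)

strat.dsgn <- svydesign(id = ~1, strata = ~stype, weights = ~pw, fpc = ~fpc,
                        data = apistrat)

# Mean of api00 within each school type, with standard errors
by.stype <- svyby(~api00, by = ~stype, strat.dsgn, svymean, keep.var = TRUE)
by.stype

# Estimated variances are the squared standard errors
SE(by.stype)^2
```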
Sampling
The `sampling` package allows you to take random samples from a sampling frame using different sampling frameworks in a reproducible manner.
- Set up your sampling frame in a spreadsheet. This example uses Google Sheets and the `googlesheets4` package.
- Import your sampling frame into R.
library(googlesheets4)
frame <- read_sheet("https://docs.google.com/spreadsheets/d/13t_2a1nymS-RfAdDN1lq_WrpLD2xjy2rNsol95rD9VA")
- Use functions from the `sampling` package to draw random samples according to your design. See the links for more details on what the arguments mean.
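library(sampling)
set.seed(12345) # make the draw reproducible
srs.idx <- srswor(4, length(frame$unit_id)) # SRSWOR of 4 units; returns a 0/1 selection indicator
getdata(frame, srs.idx) # extract the selected rows from the frame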
ID_unit group unit_id
1 14 B 7
2 16 B 9
3 26 D 1
4 28 D 3
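If the Google Sheet is not available, the same steps can be tried on a stand-in frame. The sketch below builds a hypothetical frame with the same apparent layout (50 units in groups A to E, with `unit_id` numbered within each group), draws the SRSWOR, and attaches the weight $N/n$. The frame construction and the `weight` column are assumptions for illustration, so the selected units need not match the output above.

```r
library(sampling)

# Stand-in sampling frame (assumed layout): 50 units in five groups
sizes <- c(7, 10, 8, 13, 12)
frame <- data.frame(unit_id = sequence(sizes),
                    group   = rep(LETTERS[1:5], times = sizes))

set.seed(12345)
srs.idx <- srswor(4, nrow(frame))   # 0/1 indicator vector of length N
srs     <- getdata(frame, srs.idx)  # rows of the selected units

# Each sampled unit represents N/n units of the frame
srs$weight <- nrow(frame) / nrow(srs)
srs
```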
library(dplyr)
frame <- frame %>% arrange(group) # sort first
strata.idx <- sampling::strata(data = frame, # data set
                               stratanames = "group", # variable name
                               size = c(2,3,2,1,2), # stratum sample sizes
                               method = "srswor") # method for selecting within strata
getdata(frame, strata.idx)
unit_id group ID_unit Prob Stratum
2 2 A 2 0.28571429 1
5 5 A 5 0.28571429 1
9 2 B 9 0.30000000 2
13 6 B 13 0.30000000 2
15 8 B 15 0.30000000 2
20 3 C 20 0.25000000 3
23 6 C 23 0.25000000 3
32 7 D 32 0.07692308 4
39 1 E 39 0.16666667 5
48 10 E 48 0.16666667 5
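The `Prob` column holds each unit's inclusion probability $n_h/N_h$, so its reciprocal is the sampling weight. A sketch on a hypothetical stand-in frame (same assumed layout as the frame used here: 50 units in groups A to E, already sorted by group); the `weight` column name is made up for this example.

```r
library(sampling)

# Stand-in sampling frame (assumed layout), already sorted by group
sizes <- c(7, 10, 8, 13, 12)
frame <- data.frame(unit_id = sequence(sizes),
                    group   = rep(LETTERS[1:5], times = sizes))

set.seed(12345)
strata.idx <- sampling::strata(data = frame,
                               stratanames = "group",
                               size = c(2, 3, 2, 1, 2),
                               method = "srswor")
strat.samp <- getdata(frame, strata.idx)

# Sampling weight = 1 / inclusion probability = N_h / n_h
strat.samp$weight <- 1 / strat.samp$Prob
sum(strat.samp$weight)   # adds up to the frame size
```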
One stage cluster
onestage.idx <- sampling::cluster(data=frame, # Data set
                                  clustername = "group", # variable name containing clusters
                                  size = 3, # number of clusters
                                  method = "srswor", # how to draw clusters
                                  description = TRUE) # show descriptive output
Number of selected clusters: 3
Number of units in the population and number of selected units: 50 30
getdata(frame, onestage.idx)
unit_id group ID_unit Prob
1 4 A 4 0.6
2 1 A 1 0.6
3 2 A 2 0.6
4 3 A 3 0.6
5 7 A 7 0.6
6 5 A 5 0.6
7 6 A 6 0.6
8 6 B 13 0.6
9 5 B 12 0.6
10 10 B 17 0.6
11 7 B 14 0.6
12 8 B 15 0.6
13 9 B 16 0.6
14 1 B 8 0.6
15 2 B 9 0.6
16 3 B 10 0.6
17 4 B 11 0.6
18 13 D 38 0.6
19 1 D 26 0.6
20 2 D 27 0.6
21 3 D 28 0.6
22 4 D 29 0.6
23 5 D 30 0.6
24 6 D 31 0.6
25 7 D 32 0.6
26 8 D 33 0.6
27 9 D 34 0.6
28 10 D 35 0.6
29 11 D 36 0.6
30 12 D 37 0.6
Two stage cluster
mstage.idx <- sampling::mstage(data=frame,
                               stage = c("cluster", ""), # sampling method for each stage, blank means SRS
                               varnames = list("group", "unit_id"), # variable names for each stage
                               size = list(3, c(5,5,5)), # 3 psus, 5 ssus from each psu
                               method = c("srswor", "srswor"))
getdata(frame, mstage.idx)
[[1]]
unit_id group ID_unit Prob_ 1 _stage
1 6 A 6 0.6
2 7 A 7 0.6
3 3 A 3 0.6
4 5 A 5 0.6
5 1 A 1 0.6
6 2 A 2 0.6
7 4 A 4 0.6
8 8 C 25 0.6
9 1 C 18 0.6
10 2 C 19 0.6
11 3 C 20 0.6
12 4 C 21 0.6
13 5 C 22 0.6
14 6 C 23 0.6
15 7 C 24 0.6
16 13 D 38 0.6
17 1 D 26 0.6
18 2 D 27 0.6
19 3 D 28 0.6
20 4 D 29 0.6
21 5 D 30 0.6
22 6 D 31 0.6
23 7 D 32 0.6
24 8 D 33 0.6
25 9 D 34 0.6
26 10 D 35 0.6
27 11 D 36 0.6
28 12 D 37 0.6
[[2]]
group unit_id ID_unit Prob_ 2 _stage Prob
1 A 7 7 0.7142857 0.4285714
2 A 3 3 0.7142857 0.4285714
3 A 5 5 0.7142857 0.4285714
4 A 1 1 0.7142857 0.4285714
5 A 2 2 0.7142857 0.4285714
6 C 1 18 0.6250000 0.3750000
7 C 2 19 0.6250000 0.3750000
8 C 3 20 0.6250000 0.3750000
9 C 5 22 0.6250000 0.3750000
10 C 7 24 0.6250000 0.3750000
11 D 3 28 0.3846154 0.2307692
12 D 7 32 0.3846154 0.2307692
13 D 8 33 0.3846154 0.2307692
14 D 11 36 0.3846154 0.2307692
15 D 12 37 0.3846154 0.2307692
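The final `Prob` column in the second-stage output is the product of the two stage-wise selection probabilities. The short base R check below reproduces those values from the design (3 of 5 clusters, then 5 units per selected cluster) and the cluster sizes visible in the first-stage output; the variable names are made up for this illustration.

```r
# Stage 1: 3 of the 5 clusters are selected by SRSWOR
p_stage1 <- 3 / 5

# Stage 2: 5 units from each selected cluster (cluster sizes read off the output)
M_i      <- c(A = 7, C = 8, D = 13)
p_stage2 <- 5 / M_i

# Overall inclusion probability and the corresponding sampling weight
prob   <- p_stage1 * p_stage2
weight <- 1 / prob

round(prob, 7)
weight
```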