sampling_design() is the entry point for creating survey sampling
specifications. It creates an empty design object that can be built
up using pipe-able verbs like stratify_by(), cluster_by(),
draw(), and add_stage().
Details
The sampling design paradigm separates the specification of a sampling plan from its execution. This allows designs to be:
Reused across different data frames
Partially executed (e.g., stage by stage)
Inspected and validated before execution
Documented and shared
The design specification is frame-independent: it describes how to sample, not what to sample from.
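As a minimal sketch of that separation, the design below is stored in a variable and then executed against two different frames. It assumes the package providing sampling_design() is attached and that the bfa_eas and zwe_eas frames from the Examples are available. The object names srs_design, bfa_sample, and zwe_sample are illustrative only.

```r
# Assumption: the package that provides sampling_design() is attached, and
# the example frames bfa_eas and zwe_eas (see Examples) are available.

# Specify the plan once; no frame is involved at this point.
srs_design <- sampling_design(title = "Simple random sample of 50 EAs") |>
  draw(n = 50)

# Execute the same specification against two different frames.
bfa_sample <- execute(srs_design, bfa_eas, seed = 1)
zwe_sample <- execute(srs_design, zwe_eas, seed = 1)

# Inspect the size of each resulting sample.
nrow(bfa_sample)
nrow(zwe_sample)
```

Because the stored design never refers to a frame, it can be inspected or shared before either call to execute() is made.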
Design Flow
A typical design workflow follows this pattern:
sampling_design() |>
  stratify_by(...) |>
  cluster_by(...) |>
  draw(...) |>
  execute(frame)
See also
stratify_by() for defining strata,
cluster_by() for defining clusters,
draw() for specifying selection parameters,
add_stage() for multi-stage designs,
execute() for running designs
Examples
# Simple random sample of 100 EAs
sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights: 149 [149, 149]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_10155 Boucle … Mouhoun Ouarko… Rural 1347 187 33.0
#> 2 EA_01016 Centre-… Zoundwe… Bere Rural 1166 193 23.2
#> 3 EA_04918 Centre-… Kourite… Goungu… Rural 949 141 6.89
#> 4 EA_01890 Hauts-B… Houet Bobo-D… Urban 1195 191 0.25
#> 5 EA_14703 Plateau… Oubrite… Ziniare Rural 1340 177 26.1
#> 6 EA_12688 Est Tapoa Tambaga Rural 994 139 6.46
#> 7 EA_11778 Plateau… Ganzour… Saolgo Rural 998 124 1.41
#> 8 EA_05700 Hauts-B… Houet Karang… Rural 1049 119 23.6
#> 9 EA_06057 Est Gnagna Koala Rural 1291 173 27.6
#> 10 EA_04482 Centre-… Boulgou Garango Rural 1553 262 1.44
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified sample with proportional allocation
sampling_design(title = "Burkina Faso EA Survey") |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 400) |>
  execute(bfa_eas, seed = 2)
#> # A tbl_sample: 400 × 17 | Burkina Faso EA Survey
#> # Weights: 37.25 [36.77, 38]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_06443 Boucle … Mouhoun Kona Rural 1083 171 50.1
#> 2 EA_08629 Boucle … Kossi Nouna Rural 1197 156 45.9
#> 3 EA_08693 Boucle … Kossi Nouna Rural 1643 214 25.2
#> 4 EA_12417 Boucle … Banwa Solenzo Rural 1316 157 15.5
#> 5 EA_12393 Boucle … Banwa Solenzo Rural 79 9 5.68
#> 6 EA_06860 Boucle … Banwa Kouka Rural 1748 201 18.1
#> 7 EA_07200 Boucle … Sourou Lanfie… Rural 1716 250 0.5
#> 8 EA_14006 Boucle … Nayala Yaba Rural 1115 150 11.8
#> 9 EA_13692 Boucle … Sourou Toeni Rural 1172 179 5.9
#> 10 EA_05775 Boucle … Sourou Kassoum Rural 1255 170 52.2
#> # ℹ 390 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Two-stage cluster sample of districts and EAs
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)
sampling_design(title = "Zimbabwe DHS") |>
  add_stage(label = "Districts") |>
  cluster_by(district) |>
  draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
  draw(n = 10) |>
  execute(zwe_frame, seed = 3)
#> # A tbl_sample: 200 × 16 | Zimbabwe DHS
#> # Weights: 117.98 [51.23, 156.43]
#> ea_id province district urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <int> <int> <dbl>
#> 1 EA_00413 Bulawayo Bulawayo Urban 1368 378 0.21
#> 2 EA_00108 Bulawayo Bulawayo Urban 1083 295 1.44
#> 3 EA_00261 Bulawayo Bulawayo Urban 1185 342 0.3
#> 4 EA_00165 Bulawayo Bulawayo Rural 614 136 18.3
#> 5 EA_00137 Bulawayo Bulawayo Urban 917 276 0.52
#> 6 EA_00393 Bulawayo Bulawayo Urban 1307 360 0.37
#> 7 EA_00256 Bulawayo Bulawayo Urban 1299 398 0.24
#> 8 EA_00376 Bulawayo Bulawayo Urban 1187 341 0.36
#> 9 EA_00374 Bulawayo Bulawayo Urban 1352 389 0.38
#> 10 EA_00274 Bulawayo Bulawayo Urban 1264 374 0.24
#> # ℹ 190 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>