sampling_design() is the entry point for creating survey sampling
specifications. It creates an empty design object that can be built
up using pipe-able verbs like stratify_by(), cluster_by(),
draw(), and add_stage().
Details
The sampling design paradigm separates the specification of a sampling plan from its execution. This allows designs to be:
Reused across different data frames
Partially executed (e.g., stage by stage)
Inspected and validated before execution
Documented and shared
The design specification is frame-independent: it describes how to sample, not what to sample from.
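A minimal sketch of what frame independence means in practice, using only the verbs and example datasets that appear on this page. It assumes the package providing sampling_design() is already attached and that bfa_eas and zwe_eas are available; the object names srs_100, sample_bfa, and sample_zwe are illustrative only.

```r
# Assumption: the package providing sampling_design(), draw() and execute()
# is attached, and the example frames bfa_eas and zwe_eas are available.

# Specify the design once; no frame is involved at this point.
srs_100 <- sampling_design() |>
  draw(n = 100)

# Execute the same specification against two different frames.
sample_bfa <- srs_100 |> execute(bfa_eas, seed = 1)
sample_zwe <- srs_100 |> execute(zwe_eas, seed = 1)

# Each call should return a sample of the requested size.
nrow(sample_bfa)
nrow(sample_zwe)
```

Storing the design in a variable before calling execute() is what allows the same specification to be inspected, shared, or rerun on another frame.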
Design Flow
A typical design workflow follows this pattern:
sampling_design() |>
  stratify_by(...) |>
  cluster_by(...) |>
  draw(...) |>
  execute(frame)
See also
stratify_by() for defining strata,
cluster_by() for defining clusters,
draw() for specifying selection parameters,
add_stage() for multi-stage designs,
execute() for running designs
Examples
# Simple random sample of 100 EAs
sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights: 149.34 [149.34, 149.34]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_10182 Boucle … Mouhoun Ouarko… Rural 1347 185 33.0
#> 2 EA_14571 Centre-… Nahouri Zecco Urban 2829 452 8.91
#> 3 EA_03356 Centre-… Kourite… Dialga… Rural 1010 150 19.0
#> 4 EA_01856 Hauts-B… Houet Bobo-D… Urban 1938 311 0.32
#> 5 EA_14703 Plateau… Oubrite… Ziniare Rural 1320 188 3
#> 6 EA_10602 Est Tapoa Partia… Rural 1393 191 17.2
#> 7 EA_08087 Plateau… Ganzour… Mogtedo Rural 2284 285 1.99
#> 8 EA_05693 Hauts-B… Houet Karang… Rural 1295 181 19.0
#> 9 EA_01975 Est Gnagna Bogande Rural 2018 276 22.2
#> 10 EA_04482 Centre-… Boulgou Garango Rural 1211 180 18.3
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified sample with proportional allocation
sampling_design(title = "Burkina Faso EA Survey") |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 400) |>
  execute(bfa_eas, seed = 2)
#> # A tbl_sample: 400 × 17 | Burkina Faso EA Survey
#> # Weights: 37.34 [36.18, 38]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_06470 Boucle … Mouhoun Kona Rural 1083 150 50.1
#> 2 EA_08656 Boucle … Kossi Nouna Rural 1197 177 45.9
#> 3 EA_08720 Boucle … Kossi Nouna Rural 1643 242 25.2
#> 4 EA_12444 Boucle … Banwa Solenzo Rural 1316 177 15.5
#> 5 EA_12420 Boucle … Banwa Solenzo Rural 79 11 5.68
#> 6 EA_06887 Boucle … Banwa Kouka Rural 1748 241 18.1
#> 7 EA_06014 Boucle … Sourou Kiemba… Rural 518 67 18.3
#> 8 EA_14033 Boucle … Nayala Yaba Rural 1115 147 11.8
#> 9 EA_07232 Boucle … Sourou Lankoue Rural 1031 136 18.5
#> 10 EA_04730 Boucle … Sourou Gomboro Rural 1597 225 43.8
#> # ℹ 390 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Two-stage cluster sample of districts and EAs
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)
sampling_design(title = "Zimbabwe DHS") |>
  add_stage(label = "Districts") |>
  cluster_by(district) |>
  draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
  draw(n = 10) |>
  execute(zwe_frame, seed = 3)
#> # A tbl_sample: 200 × 16 | Zimbabwe DHS
#> # Weights: 117.98 [51.23, 156.43]
#> ea_id province district urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <int> <int> <dbl>
#> 1 EA_00413 Bulawayo Bulawayo Urban 1368 378 0.21
#> 2 EA_00108 Bulawayo Bulawayo Urban 1083 295 1.44
#> 3 EA_00261 Bulawayo Bulawayo Urban 1185 342 0.3
#> 4 EA_00165 Bulawayo Bulawayo Rural 614 136 18.3
#> 5 EA_00137 Bulawayo Bulawayo Urban 917 276 0.52
#> 6 EA_00393 Bulawayo Bulawayo Urban 1307 360 0.37
#> 7 EA_00256 Bulawayo Bulawayo Urban 1299 398 0.24
#> 8 EA_00376 Bulawayo Bulawayo Urban 1187 341 0.36
#> 9 EA_00374 Bulawayo Bulawayo Urban 1352 389 0.38
#> 10 EA_00274 Bulawayo Bulawayo Urban 1264 374 0.24
#> # ℹ 190 more rows
#> # ℹ 9 more variables: district_hh <int>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_2 <dbl>, .fpc_2 <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
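The constant weight reported in the first example (149.34 for every row) appears consistent with the usual equal-probability relation, weight = N / n, where N is the frame size and n the sample size. The base R sketch below illustrates that relation on a made-up frame; it does not use this package and is not its implementation. The frame, its size, and all object names are hypothetical.

```r
# Hypothetical frame of N = 1000 units (not one of the package datasets).
frame <- data.frame(id = seq_len(1000))
n <- 100

# Simple random sample without replacement.
set.seed(1)
idx <- sample(nrow(frame), n)
smp <- frame[idx, , drop = FALSE]

# Under equal-probability sampling every unit carries the weight N / n.
smp$weight <- nrow(frame) / n

# The weights sum back to the frame size.
sum(smp$weight)
```

In stratified or multi-stage designs the weights generally differ between units, which matches the ranges shown in the weight summaries of the second and third examples.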