sampling_design() is the entry point for creating survey sampling
specifications. It creates an empty design object that can be built
up using pipe-able verbs like stratify_by(), cluster_by(),
draw(), and add_stage().
Details
The sampling design paradigm separates the specification of a sampling plan from its execution. This allows designs to be:
- Reused across different data frames
- Partially executed (e.g., stage by stage)
- Inspected and validated before execution
- Documented and shared
The design specification is frame-independent: it describes how to sample, not what to sample from.
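To make the idea of a frame-independent specification concrete, here is a minimal base-R sketch. It is not the package's implementation: a design is stored as an ordered list of deferred steps, and no data are touched until the design is run against a frame. The helpers make_design(), add_step(), and run_design() and the toy frames are hypothetical names invented for this illustration.

```r
# Hypothetical illustration of "specify now, execute later" in base R.
# make_design(), add_step() and run_design() are NOT part of the package.

# A design is an ordered list of functions, each taking and returning a data frame.
make_design <- function() list(steps = list())

# Append a step without running it; the design never sees any data here.
add_step <- function(design, step) {
  design$steps <- c(design$steps, list(step))
  design
}

# Apply the stored steps, in order, to whichever frame is supplied.
run_design <- function(design, frame) {
  Reduce(function(df, step) step(df), design$steps, frame)
}

# One specification: keep urban units, then draw 2 of them at random.
design <- make_design() |>
  add_step(function(df) df[df$urban, , drop = FALSE]) |>
  add_step(function(df) df[sample.int(nrow(df), 2), , drop = FALSE])

# The same specification reused on two different toy frames.
frame_a <- data.frame(id = 1:6, urban = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE))
frame_b <- data.frame(id = 101:104, urban = c(TRUE, TRUE, FALSE, TRUE))

set.seed(1)
sample_a <- run_design(design, frame_a)
sample_b <- run_design(design, frame_b)
```

Because the frame is only an argument to run_design(), the design object itself can be inspected (for example, length(design$steps)) before any data are involved.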
Design Flow
A typical design workflow follows this pattern:
sampling_design() |>
  stratify_by(...) |>
  cluster_by(...) |>
  draw(...) |>
  execute(frame)
See also
stratify_by() for defining strata,
cluster_by() for defining clusters,
draw() for specifying selection parameters,
add_stage() for multi-stage designs,
execute() for running designs
Examples
# Simple random sample of 100 EAs
sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights: 445.7 [445.7, 445.7]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 43475 Est Gnagna Piela Rural 971 114 8.95
#> 2 6592 Sud-Ouest Noumbiel Kpuere Rural 88 11 7.89
#> 3 11611 Boucle du … Nayala Yaba Rural 111 15 8.97
#> 4 45236 Centre-Est Boulgou Beguedo Urban 939 167 0.32
#> 5 3549 Est Gourma Fada-N… Rural 263 32 3.4
#> 6 39095 Hauts-Bass… Kenedou… Sindo Rural 37 4 6.09
#> 7 21818 Centre-Est Kourite… Goungu… Rural 21 4 1.07
#> 8 15528 Centre Kadiogo Ouagad… Urban 1207 182 0.14
#> 9 14284 Est Gourma Matiak… Rural 178 21 8.86
#> 10 3433 Est Gourma Fada-N… Rural 285 34 8.55
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
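Under simple random sampling without replacement every unit has inclusion probability n/N, so every design weight equals N/n. That is why the printed weight is a single constant; 445.7 × 100 suggests a frame of roughly 44,570 EAs (the printed weight is rounded). The base-R sketch below reproduces the arithmetic on a synthetic frame. It is an illustration rather than the package's code, and the frame size of 5,000 is arbitrary.

```r
# Base-R sketch of SRSWOR and its design weights (synthetic frame, not bfa_eas).
set.seed(1)
N <- 5000                                  # frame size (arbitrary, for illustration)
frame <- data.frame(ea_id = seq_len(N))
n <- 100                                   # sample size

idx <- sample.int(N, n)                    # n distinct row indices, equal probability
srs <- frame[idx, , drop = FALSE]
srs$.weight <- N / n                       # weight = 1 / (n / N), identical for every unit

# The weights sum to the frame size, so weighted totals estimate population totals.
sum(srs$.weight)                           # 5000
```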
# Stratified sample with proportional allocation
sampling_design(title = "Burkina Faso EA Survey") |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 400) |>
  execute(bfa_eas, seed = 2)
#> # A tbl_sample: 400 × 17 | Burkina Faso EA Survey
#> # Weights: 111.42 [107.47, 113.12]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 6219 Boucle du … Nayala Kougny Rural 1544 178 2.52
#> 2 34772 Boucle du … Sourou Tougan Rural 872 120 0.99
#> 3 25911 Boucle du … Mouhoun Dedoug… Rural 222 39 8.39
#> 4 31953 Boucle du … Sourou Kiemba… Rural 602 78 6.91
#> 5 31928 Boucle du … Sourou Kiemba… Rural 906 118 0.85
#> 6 36703 Boucle du … Bale Fara Rural 975 143 1.31
#> 7 8235 Boucle du … Mouhoun Ouarko… Rural 271 37 8.86
#> 8 43720 Boucle du … Mouhoun Safane Rural 101 14 8.14
#> 9 11746 Boucle du … Bale Yaho Rural 98 15 8.72
#> 10 44450 Boucle du … Mouhoun Tcheri… Rural 628 100 7.25
#> # ℹ 390 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
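With proportional allocation each stratum h receives n_h ≈ n · N_h / N units, so the within-stratum weight N_h / n_h is nearly the same everywhere. Because each n_h must be a whole number, rounding makes the weights differ slightly between strata, which is one plausible reason the printed weights range from 107.47 to 113.12 instead of being identical. The sketch below uses largest-remainder rounding; that rounding rule, the helper prop_alloc(), and the stratum sizes are assumptions for illustration, and the package may round differently.

```r
# Sketch of proportional allocation with largest-remainder rounding (an assumed rule).
prop_alloc <- function(N_h, n) {
  raw <- n * N_h / sum(N_h)                # exact (fractional) proportional shares
  n_h <- floor(raw)                        # round every share down first
  short <- n - sum(n_h)                    # units still to hand out
  if (short > 0) {
    # Give one extra unit to the strata with the largest fractional remainders.
    extra <- order(raw - n_h, decreasing = TRUE)[seq_len(short)]
    n_h[extra] <- n_h[extra] + 1
  }
  n_h
}

N_h <- c(A = 1000, B = 2000, C = 4000)     # hypothetical stratum sizes
n_h <- prop_alloc(N_h, n = 50)
n_h                                        # A = 7, B = 14, C = 29
round(N_h / n_h, 2)                        # weights: A = 142.86, B = 142.86, C = 137.93
```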
# Two-stage cluster sample of districts and EAs
# Attach each district's total household count to its EAs as the measure of size
zwe_frame <- zwe_eas |>
  dplyr::mutate(district_hh = sum(households), .by = district)
sampling_design(title = "Zimbabwe DHS") |>
  add_stage(label = "Districts") |>
  cluster_by(district) |>
  draw(n = 20, method = "pps_brewer", mos = district_hh) |>
  add_stage(label = "EAs") |>
  draw(n = 10) |>
  execute(zwe_frame, seed = 3)
#> # A tbl_sample: 200 × 21 | Zimbabwe DHS
#> # Weights: 486.81 [152.91, 1126.67]
#> ea_id province district ward_pcode urban_rural population households
#> * <int> <fct> <fct> <chr> <fct> <int> <int>
#> 1 48190 Bulawayo Bulawayo ZW102106 Rural 86 24
#> 2 23741 Bulawayo Bulawayo ZW102118 Urban 383 107
#> 3 1302 Bulawayo Bulawayo ZW102105 Urban 261 78
#> 4 48265 Bulawayo Bulawayo ZW102106 Urban 129 36
#> 5 1415 Bulawayo Bulawayo ZW102105 Urban 97 29
#> 6 1315 Bulawayo Bulawayo ZW102105 Urban 158 47
#> 7 23772 Bulawayo Bulawayo ZW102118 Urban 346 97
#> 8 2268 Bulawayo Bulawayo ZW102104 Urban 116 35
#> 9 47300 Bulawayo Bulawayo ZW102107 Urban 253 66
#> 10 22855 Bulawayo Bulawayo ZW102109 Urban 946 254
#> # ℹ 190 more rows
#> # ℹ 14 more variables: buildings <int>, women_15_49 <int>, men_15_49 <int>,
#> # children_under5 <int>, area_km2 <dbl>, district_hh <int>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> # .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
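In the two-stage design, districts are drawn with probability proportional to size (measure of size district_hh), and EAs are then drawn within each selected district. Assuming the second stage is simple random sampling, an EA's overall weight is 1 / (π₁ · π₂), where π₁ = n₁ · M_i / ΣM is the district's inclusion probability and π₂ = n₂ / N_i. A district whose π₁ would exceed 1 is usually taken with certainty and the remaining draws are spread over the other districts; the .certainty_1 column presumably flags such districts. The base-R sketch below computes these probabilities on toy numbers. It does not implement Brewer's selection algorithm, only the inclusion probabilities a πps method such as Brewer's targets, and the certainty rule and the helper pps_incl_prob() are assumptions rather than the package's confirmed behaviour.

```r
# Sketch: first-stage PPS inclusion probabilities with certainty units,
# then overall two-stage weights. Toy numbers, not zwe_eas.
pps_incl_prob <- function(mos, n) {
  certain <- rep(FALSE, length(mos))
  pik <- numeric(length(mos))
  repeat {
    n_left <- n - sum(certain)                       # draws not used by certainty units
    pik[!certain] <- n_left * mos[!certain] / sum(mos[!certain])
    new_cert <- !certain & pik >= 1                  # units whose probability would reach 1
    if (!any(new_cert)) break
    certain <- certain | new_cert                    # take them with certainty, then recompute
  }
  pik[certain] <- 1
  pik
}

district_hh <- c(5000, 1000, 1000, 1000, 1000, 1000)  # hypothetical measures of size
pi1 <- pps_incl_prob(district_hh, n = 3)
pi1                                  # 1.0 0.4 0.4 0.4 0.4 0.4 (district 1 is a certainty unit)

# Second stage: SRS of n2 EAs out of N_i EAs in each selected district.
n2 <- 10
N_i <- c(80, 50)                     # hypothetical EA counts for districts 1 and 2
pi2 <- n2 / N_i
w <- 1 / (pi1[1:2] * pi2)            # overall EA weights in districts 1 and 2
w                                    # 8.0 12.5
```

The first-stage probabilities sum to the number of districts drawn (3 here), which is a quick check that they are consistent with a fixed sample size.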