sampling_design() is the entry point for creating survey sampling
specifications. It creates an empty design object that can be built
up using pipe-able verbs like stratify_by(), cluster_by(),
draw(), and stage().
sampling_design(title = NULL)
Returns a sampling_design object that can be piped to other design functions.
The sampling design paradigm separates the specification of a sampling plan from its execution. This allows designs to be:
Reused across different data frames
Partially executed (e.g., stage by stage)
Inspected and validated before execution
Documented and shared
The design specification is frame-independent: it describes how to sample, not what to sample from.
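The reuse point above can be sketched by storing a design in a variable and executing it against two frames. This is a minimal sketch, not taken from the package documentation. It assumes the verbs are provided by the samplyr package, that the kenya_health data set shown in the examples is available, and that both subsets contain at least 10 facilities. The design title and the variable names are hypothetical.

```r
library(samplyr)  # assumption: the package that provides these verbs

# Specify the design once; no sampling frame is involved at this point.
design <- sampling_design(title = "Facility spot check") |>
  draw(n = 10)

# Build two different frames from the same data set
# (urban_rural is a column shown in the example output).
urban_frame <- subset(kenya_health, urban_rural == "Urban")
rural_frame <- subset(kenya_health, urban_rural == "Rural")

# Execute the same specification against each frame.
urban_sample <- execute(design, urban_frame, seed = 1)
rural_sample <- execute(design, rural_frame, seed = 1)

nrow(urban_sample)
nrow(rural_sample)
```

Under these assumptions each call should draw 10 facilities from its own frame, while the design object itself stays unchanged and can be inspected or shared separately.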
A typical design workflow follows this pattern:
sampling_design() |>
  stratify_by(...) |>
  cluster_by(...) |>
  draw(...) |>
  execute(frame)
stratify_by() for defining strata,
cluster_by() for defining clusters,
draw() for specifying selection parameters,
stage() for multi-stage designs,
execute() for running designs
# Simple random sample of 100 health facilities
sampling_design() |>
  draw(n = 100) |>
  execute(kenya_health, seed = 1)
#> == tbl_sample ==
#> Weights: 30.98 - 30.98 (mean: 30.98 )
#>
#> # A tibble: 100 × 14
#> facility_id region county urban_rural facility_type beds staff_count
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 KE_16_0054 Eastern Meru Rural Health Centre 9 13
#> 2 KE_12_0043 Eastern Embu Rural Health Centre 9 11
#> 3 KE_28_0066 Rift Valley Baringo Rural County Hospi… 96 57
#> 4 KE_15_0048 Eastern Makueni Rural Dispensary 2 3
#> 5 KE_19_0002 North Eastern Garissa Rural Sub-County H… 57 15
#> 6 KE_11_0015 Coast Lamu Urban Health Centre 18 13
#> 7 KE_30_0075 Rift Valley Kericho Urban Clinic 2 7
#> 8 KE_04_0056 Central Nyanda… Urban Clinic 3 6
#> 9 KE_18_0081 Nairobi Nairobi Urban Health Centre 14 11
#> 10 KE_10_0004 Coast Tana R… Rural Dispensary 2 4
#> # ℹ 90 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified sample with proportional allocation
sampling_design(title = "Kenya Health Facility Survey") |>
  stratify_by(facility_type, alloc = "proportional") |>
  draw(n = 400) |>
  execute(kenya_health, seed = 2)
#> == tbl_sample: Kenya Health Facility Survey ==
#> Weights: 7.65 - 8.25 (mean: 7.74 )
#>
#> # A tibble: 400 × 14
#> facility_type facility_id region county urban_rural beds staff_count
#> * <fct> <chr> <fct> <fct> <fct> <dbl> <dbl>
#> 1 Referral Hospital KE_28_0042 Rift Vall… Barin… Rural 324 113
#> 2 Referral Hospital KE_24_0041 Nyanza Kisumu Rural 240 101
#> 3 Referral Hospital KE_06_0017 Coast Kilifi Rural 143 161
#> 4 Referral Hospital KE_37_0042 Western Busia Rural 175 123
#> 5 County Hospital KE_14_0058 Eastern Macha… Urban 69 35
#> 6 County Hospital KE_04_0026 Central Nyand… Urban 137 47
#> 7 County Hospital KE_07_0014 Coast Kwale Urban 111 28
#> 8 County Hospital KE_37_0063 Western Busia Urban 212 41
#> 9 County Hospital KE_18_0141 Nairobi Nairo… Urban 58 25
#> 10 County Hospital KE_21_0027 North Eas… Wajir Rural 59 63
#> # ℹ 390 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Two-stage cluster sample of schools and students
sampling_design(title = "Tanzania Education Survey") |>
  stage(label = "Schools") |>
    cluster_by(school_id) |>
    draw(n = 50, method = "pps_brewer", mos = enrollment) |>
  stage(label = "Students") |>
    draw(n = 20) |>
  execute(tanzania_schools, seed = 3)
#> == tbl_sample: Tanzania Education Survey ==
#> Weights: 10.52 - 295.46 (mean: 47.87 )
#>
#> # A tibble: 50 × 17
#> school_id region district school_level ownership enrollment n_teachers
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 TZ_01_0043 Dar es Sala… Ilala Primary Governme… 201 5
#> 2 TZ_01_0044 Dar es Sala… Ilala Primary Private 309 8
#> 3 TZ_01_0049 Dar es Sala… Ilala Primary Governme… 1158 33
#> 4 TZ_01_0115 Dar es Sala… Ilala Primary Governme… 518 11
#> 5 TZ_01_0130 Dar es Sala… Ilala Primary Governme… 671 14
#> 6 TZ_02_0006 Dar es Sala… Kinondo… Primary Governme… 728 17
#> 7 TZ_02_0028 Dar es Sala… Kinondo… Primary Governme… 764 19
#> 8 TZ_02_0074 Dar es Sala… Kinondo… Primary Private 961 27
#> 9 TZ_02_0103 Dar es Sala… Kinondo… Primary Governme… 636 18
#> 10 TZ_03_0006 Dar es Sala… Temeke Primary Governme… 350 9
#> # ℹ 40 more rows
#> # ℹ 10 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> # .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>