sampling_design() is the entry point for creating survey sampling specifications. It creates an empty design object that is then built up with pipeable verbs such as stratify_by(), cluster_by(), draw(), and stage().

sampling_design(title = NULL)

Arguments

title

Optional character string giving a title for the design. The title is used for documentation and printing; for example, it appears in the header of the printed sample.

Value

A sampling_design object that can be piped to other design functions.

Details

The sampling design paradigm separates the specification of a sampling plan from its execution. This allows designs to be:

  • Reused across different data frames

  • Partially executed (e.g., stage by stage)

  • Inspected and validated before execution

  • Documented and shared

The design specification is frame-independent: it describes how to sample, not what to sample from.
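As a rough sketch of what frame independence allows, the snippet below builds one design and executes it against two different frames. It uses only sampling_design(), draw(), and execute() as they appear on this page. The split of kenya_health into urban and rural subsets is a hypothetical choice made for illustration, and the code assumes each subset holds at least 50 facilities.

```r
# Assumes the package that provides sampling_design(), draw() and execute()
# is attached, together with its example data set kenya_health.

# Build the specification once; no data frame is involved at this point.
srs_50 <- sampling_design(title = "SRS of 50 facilities") |>
  draw(n = 50)

# Hypothetical split of the frame, for illustration only.
urban <- subset(kenya_health, urban_rural == "Urban")
rural <- subset(kenya_health, urban_rural == "Rural")

# The same design object is executed against each frame.
urban_sample <- execute(srs_50, urban, seed = 10)
rural_sample <- execute(srs_50, rural, seed = 10)
```

Because the design holds no reference to a frame, it can be stored, inspected, or shared before any data are sampled.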

Design Flow

A typical design workflow follows this pattern:


sampling_design() |>
  stratify_by(...) |>
  cluster_by(...) |>
  draw(...) |>
  execute(frame)

See also

stratify_by() for defining strata, cluster_by() for defining clusters, draw() for specifying selection parameters, stage() for multi-stage designs, and execute() for running designs.

Examples

# Simple random sample of 100 health facilities
sampling_design() |>
  draw(n = 100) |>
  execute(kenya_health, seed = 1)
#> == tbl_sample ==
#> Weights: 30.98 - 30.98 (mean: 30.98 )
#> 
#> # A tibble: 100 × 14
#>    facility_id region        county  urban_rural facility_type  beds staff_count
#>  * <chr>       <fct>         <fct>   <fct>       <fct>         <dbl>       <dbl>
#>  1 KE_16_0054  Eastern       Meru    Rural       Health Centre     9          13
#>  2 KE_12_0043  Eastern       Embu    Rural       Health Centre     9          11
#>  3 KE_28_0066  Rift Valley   Baringo Rural       County Hospi…    96          57
#>  4 KE_15_0048  Eastern       Makueni Rural       Dispensary        2           3
#>  5 KE_19_0002  North Eastern Garissa Rural       Sub-County H…    57          15
#>  6 KE_11_0015  Coast         Lamu    Urban       Health Centre    18          13
#>  7 KE_30_0075  Rift Valley   Kericho Urban       Clinic            2           7
#>  8 KE_04_0056  Central       Nyanda… Urban       Clinic            3           6
#>  9 KE_18_0081  Nairobi       Nairobi Urban       Health Centre    14          11
#> 10 KE_10_0004  Coast         Tana R… Rural       Dispensary        2           4
#> # ℹ 90 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified sample with proportional allocation
sampling_design(title = "Kenya Health Facility Survey") |>
  stratify_by(facility_type, alloc = "proportional") |>
  draw(n = 400) |>
  execute(kenya_health, seed = 2)
#> == tbl_sample: Kenya Health Facility Survey ==
#> Weights: 7.65 - 8.25 (mean: 7.74 )
#> 
#> # A tibble: 400 × 14
#>    facility_type     facility_id region     county urban_rural  beds staff_count
#>  * <fct>             <chr>       <fct>      <fct>  <fct>       <dbl>       <dbl>
#>  1 Referral Hospital KE_28_0042  Rift Vall… Barin… Rural         324         113
#>  2 Referral Hospital KE_24_0041  Nyanza     Kisumu Rural         240         101
#>  3 Referral Hospital KE_06_0017  Coast      Kilifi Rural         143         161
#>  4 Referral Hospital KE_37_0042  Western    Busia  Rural         175         123
#>  5 County Hospital   KE_14_0058  Eastern    Macha… Urban          69          35
#>  6 County Hospital   KE_04_0026  Central    Nyand… Urban         137          47
#>  7 County Hospital   KE_07_0014  Coast      Kwale  Urban         111          28
#>  8 County Hospital   KE_37_0063  Western    Busia  Urban         212          41
#>  9 County Hospital   KE_18_0141  Nairobi    Nairo… Urban          58          25
#> 10 County Hospital   KE_21_0027  North Eas… Wajir  Rural          59          63
#> # ℹ 390 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Two-stage cluster sample of schools and students
sampling_design(title = "Tanzania Education Survey") |>
  stage(label = "Schools") |>
    cluster_by(school_id) |>
    draw(n = 50, method = "pps_brewer", mos = enrollment) |>
  stage(label = "Students") |>
    draw(n = 20) |>
  execute(tanzania_schools, seed = 3)
#> == tbl_sample: Tanzania Education Survey ==
#> Weights: 10.52 - 295.46 (mean: 47.87 )
#> 
#> # A tibble: 50 × 17
#>    school_id  region       district school_level ownership enrollment n_teachers
#>  * <chr>      <fct>        <fct>    <fct>        <fct>          <dbl>      <dbl>
#>  1 TZ_01_0043 Dar es Sala… Ilala    Primary      Governme…        201          5
#>  2 TZ_01_0044 Dar es Sala… Ilala    Primary      Private          309          8
#>  3 TZ_01_0049 Dar es Sala… Ilala    Primary      Governme…       1158         33
#>  4 TZ_01_0115 Dar es Sala… Ilala    Primary      Governme…        518         11
#>  5 TZ_01_0130 Dar es Sala… Ilala    Primary      Governme…        671         14
#>  6 TZ_02_0006 Dar es Sala… Kinondo… Primary      Governme…        728         17
#>  7 TZ_02_0028 Dar es Sala… Kinondo… Primary      Governme…        764         19
#>  8 TZ_02_0074 Dar es Sala… Kinondo… Primary      Private          961         27
#>  9 TZ_02_0103 Dar es Sala… Kinondo… Primary      Governme…        636         18
#> 10 TZ_03_0006 Dar es Sala… Temeke   Primary      Governme…        350          9
#> # ℹ 40 more rows
#> # ℹ 10 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_2 <dbl>, .fpc_2 <int>,
#> #   .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>