draw() specifies how units are selected: sample size, sampling fraction, selection method, and measure of size for PPS sampling. Every stage in a sampling design must end with draw().

draw(
  .data,
  n = NULL,
  frac = NULL,
  min_n = NULL,
  max_n = NULL,
  method = "srswor",
  mos = NULL,
  round = "up",
  control = NULL,
  certainty_size = NULL,
  certainty_prop = NULL
)

Arguments

.data

A sampling_design object (piped from sampling_design(), stratify_by(), or cluster_by()).

n

Sample size. Can be:

  • A scalar: applies per stratum (if no alloc) or as total (if alloc specified)

  • A named vector: stratum-specific sizes (for single stratification variable)

  • A data frame: stratum-specific sizes with stratification columns + n column

frac

Sampling fraction. Can be:

  • A scalar: same fraction for all strata

  • A named vector: stratum-specific fractions

  • A data frame: stratum-specific fractions with stratification columns + frac column

Only one of n or frac should be specified.

min_n

Minimum sample size per stratum. When an allocation method (e.g., Neyman, proportional) would assign fewer than min_n units to a stratum, that stratum receives min_n units instead. The excess is redistributed proportionally among strata that were above min_n. Commonly set to 2 (minimum for variance estimation) or higher for reliable subgroup estimates. Only applies when stratification with an allocation method is used. Default is NULL (no minimum).

max_n

Maximum sample size per stratum. When an allocation method would assign more than max_n units to a stratum, that stratum is capped at max_n units. The surplus is redistributed proportionally among strata that were below max_n. Useful for capping dominant strata or managing operational constraints. Only applies when stratification with an allocation method is used. Default is NULL (no maximum).
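
The bounding rule for min_n and max_n can be sketched in base R. This is an illustration under assumptions, not the package's code: bound_allocation() is a hypothetical helper, the stratum sizes are made up, the allocation is proportional to stratum size, and integer rounding is left out.

```r
# Hypothetical helper (not part of the package): proportional allocation
# with lower and upper bounds, returning real-valued stratum sizes.
bound_allocation <- function(n_total, size, min_n = 0, max_n = Inf) {
  fixed <- rep(NA_real_, length(size))  # NA marks strata that are still free
  repeat {
    free <- is.na(fixed)
    remaining <- n_total - sum(fixed[!free])
    alloc <- fixed
    # Share what is left among the free strata, proportional to their size
    alloc[free] <- remaining * size[free] / sum(size[free])
    low <- free & alloc < min_n
    high <- free & alloc > max_n
    if (!any(low | high)) break
    fixed[low] <- min_n    # raise to the floor
    fixed[high] <- max_n   # cap at the ceiling
  }
  names(alloc) <- names(size)
  alloc
}

stratum_size <- c(A = 50, B = 300, C = 650)  # made-up frame counts
bounded <- bound_allocation(100, stratum_size, min_n = 10, max_n = 50)
print(bounded)
```

With these inputs the unbounded proportional allocation is 5, 30 and 65. Stratum A is raised to 10, stratum C is capped at 50, and stratum B absorbs the difference (40), so the total stays at 100.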

method

Character string specifying the selection method. One of:

Equal probability methods:

  • "srswor" (default): Simple random sampling without replacement

  • "srswr": Simple random sampling with replacement

  • "systematic": Systematic (fixed interval) sampling

  • "bernoulli": Independent Bernoulli trials (random sample size)

PPS methods (require mos):

  • "pps_systematic": PPS systematic sampling

  • "pps_brewer": Generalized Brewer (Tillé) method

  • "pps_maxent": Maximum entropy / conditional Poisson

  • "pps_poisson": PPS Poisson sampling (random sample size)

  • "pps_multinomial": PPS multinomial (with replacement, any hit count)

  • "pps_chromy": Chromy's sequential PPS (minimum replacement)

mos

<data-masking> Measure of size variable for PPS methods. Required for all pps_* methods.

round

Rounding method when converting frac to sample sizes. One of:

  • "up" (default): Round up (ceiling). Matches SAS SURVEYSELECT default.

  • "down": Round down (floor).

  • "nearest": Round to nearest integer (standard rounding).

This parameter only affects designs using frac to specify the sampling rate. When n is specified directly, no rounding occurs.
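
The three rounding rules can be compared with base R arithmetic. This is a sketch rather than the package's internals; the frame size of 3098 is an assumption inferred from the weight of 30.98 shown for n = 100 in the examples.

```r
N_frame <- 3098                 # assumed frame size (see lead-in)
frac <- 0.10
target <- N_frame * frac        # 309.8, not an integer

n_up      <- ceiling(target)    # round = "up"
n_down    <- floor(target)      # round = "down"
n_nearest <- round(target)      # round = "nearest"
print(c(up = n_up, down = n_down, nearest = n_nearest))
```

Here "up" and "nearest" both give 310 and "down" gives 309. Note that base R's round() sends exact halves to the even neighbour; whether "nearest" treats ties the same way is not stated in this page.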

control

<data-masking> Variables for sorting the frame before selection. Control sorting provides implicit stratification, which is particularly effective with systematic and sequential sampling methods. Can be:

  • A single variable: control = region

  • Multiple variables: control = c(region, district)

  • With serp() for serpentine sorting: control = serp(region, district)

  • With dplyr::desc() for descending: control = c(region, desc(population))

  • Mixed: control = c(region, serp(district, commune), desc(size))

When stratification is also specified, control sorting is applied within each stratum. See the section "Control Sorting" below for details.

certainty_size

For PPS methods, units with MOS >= this value are selected with certainty (probability = 1). Can be:

  • A scalar: same threshold for all strata

  • A data frame: stratum-specific thresholds with stratification columns + certainty_size column

Certainty units are removed from the frame before probability sampling, and the remaining sample size is reduced accordingly. Mutually exclusive with certainty_prop. Equivalent to SAS SURVEYSELECT CERTSIZE= option.

certainty_prop

For PPS methods, units whose MOS proportion (MOS_i / sum(MOS)) >= this value are selected with certainty. Can be:

  • A scalar between 0 and 1 (exclusive): same threshold for all strata

  • A data frame: stratum-specific thresholds with stratification columns + certainty_prop column

Uses iterative selection: after removing certainty units, proportions are recomputed and the check is repeated until no new units qualify. Mutually exclusive with certainty_size. Equivalent to SAS SURVEYSELECT CERTSIZE=P= option.
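
The iterative check described above can be sketched in a few lines of base R. certainty_by_prop() is a hypothetical helper written for this illustration, and the measures of size are made up; the package's actual implementation may differ in detail.

```r
# Hypothetical helper: flag units whose share of the remaining MOS total
# is at least `prop`, repeating until no new unit qualifies.
certainty_by_prop <- function(mos, prop) {
  certain <- rep(FALSE, length(mos))
  repeat {
    share <- mos / sum(mos[!certain])   # shares among non-certainty units
    new <- !certain & share >= prop
    if (!any(new)) break
    certain[new] <- TRUE
  }
  certain
}

mos <- c(600, 250, 50, 50, 25, 25)      # made-up measures of size
flags <- certainty_by_prop(mos, prop = 0.5)
print(flags)
```

In the first pass only the unit with size 600 qualifies (600 / 1000 = 0.6). After it is removed, the unit with size 250 holds 250 / 400 = 0.625 of the remainder and qualifies in the second pass. No remaining unit reaches 0.5 in the third pass, so the loop stops with two certainty units.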

Value

A modified sampling_design object with selection parameters specified.

Details

Selection Methods

Equal Probability Methods

Method      Replacement  Sample Size  Notes
srswor      Without      Fixed        Standard SRS
srswr       With         Fixed        Allows duplicates
systematic  Without      Fixed        Periodic selection
bernoulli   Without      Random       Each unit selected independently
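
As an illustration of "periodic selection", here is a minimal base R sketch of systematic sampling with a random start and a fractional interval. It is one common way to do it, not necessarily the algorithm the package uses; the frame size of 1536 is an assumption chosen to match the weight of 15.36 shown for n = 100 in the examples.

```r
set.seed(42)
N <- 1536                      # assumed frame size (see lead-in)
n <- 100
k <- N / n                     # sampling interval, here 15.36
start <- runif(1, 0, k)        # random start inside the first interval
# Step through the frame in jumps of k and map each point to a row index
picked <- floor(start + (0:(n - 1)) * k) + 1
weight <- N / n                # every unit has inclusion probability n / N
head(picked)
```

Because the interval exceeds 1, the n indices are distinct, and sorting the frame beforehand (see control) spreads them across the sort order.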

PPS Methods

Method           Replacement  Sample Size  Notes
pps_systematic   Without      Fixed        Simple, some bias
pps_brewer       Without      Fixed        Fast, π_ij > 0
pps_maxent       Without      Fixed        Highest entropy, π_ij available
pps_poisson      Without      Random       PPS analog of Bernoulli
pps_multinomial  With         Fixed        Any hit count, Hansen-Hurwitz
pps_chromy       Min. repl.   Fixed        SAS default PPS_SEQ
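
To make the PPS idea concrete, the following base R sketch selects units by PPS systematic sampling on the cumulated measure of size. The sizes are made up, and this is a textbook version of the technique, not a copy of the package's implementation.

```r
set.seed(1)
mos <- c(120, 80, 300, 50, 150, 200, 100)   # made-up measures of size
n <- 3
k <- sum(mos) / n                 # interval on the cumulative-size scale
start <- runif(1, 0, k)           # random start inside the first interval
points <- start + (0:(n - 1)) * k
# Unit i owns the stretch [cum[i-1], cum[i]) of the cumulated sizes
picked <- findInterval(points, c(0, cumsum(mos)))
pi_i <- n * mos / sum(mos)        # first-order inclusion probabilities
print(round(pi_i, 2))
```

Each unit is hit with probability proportional to its size, and the inclusion probabilities sum to n. Every size here is below the interval k, so no unit can be hit twice; units at or above k are the ones certainty selection is meant to handle.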

Parameter Requirements

Method           n    frac   mos
srswor           ✓    or ✓
srswr            ✓    or ✓
systematic       ✓    or ✓
bernoulli             ✓
pps_systematic   ✓    or ✓   ✓
pps_brewer       ✓    or ✓   ✓
pps_maxent       ✓           ✓
pps_poisson           ✓      ✓
pps_multinomial  ✓    or ✓   ✓
pps_chromy       ✓    or ✓   ✓

Fixed vs Random Sample Size Methods

Methods with fixed sample size (srswor, srswr, systematic, pps_systematic, pps_brewer, pps_maxent, pps_multinomial, pps_chromy) accept either n or frac. When frac is provided, the resulting sample size is rounded according to the round parameter (default: "up", i.e. ceiling).

Methods with random sample size (bernoulli, pps_poisson) accept frac only. These methods run an independent selection trial for each unit, so the final sample size is a random variable, not a fixed count. Specifying n would be misleading because the method cannot guarantee exactly n selections.
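
The difference between fixed and random sample size is easy to see in a small base R simulation. This is a sketch using made-up values (a frame of 1000 units), independent of the package.

```r
set.seed(2024)
N <- 1000                       # made-up frame size
frac <- 0.05

# Bernoulli: one independent trial per unit, so the size varies by draw
bernoulli_sizes <- replicate(5, sum(runif(N) < frac))

# SRSWOR: exactly n units on every draw
srswor_sizes <- replicate(5, length(sample.int(N, size = 50)))

# Under Bernoulli sampling every unit has inclusion probability frac,
# so the design weight is 1 / frac regardless of the realized size
bernoulli_weight <- 1 / frac

print(bernoulli_sizes)
print(srswor_sizes)
```

The Bernoulli sizes scatter around the expected value N * frac = 50, while SRSWOR returns 50 each time. The constant weight of 20 matches the Bernoulli example at the end of this page.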

Custom Allocation with Data Frames

For stratum-specific sample sizes or rates, pass a data frame to n or frac. The data frame must contain:

  • All stratification variable columns (matching those in stratify_by())

  • An n column (for sizes) or frac column (for rates)

Certainty Selection

In PPS sampling, very large units can have theoretical inclusion probabilities exceeding 1. Certainty selection handles this by selecting such units with probability 1 before sampling the remainder. The output includes a .certainty_k column (where k is the stage number) indicating which units were certainty selections.
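
A short worked example shows why certainty selection is needed. The sizes below are made up, and the cut-off used here (a theoretical inclusion probability of at least 1) is only for illustration; in draw() the cut-off is set through certainty_size or certainty_prop.

```r
mos <- c(900, 50, 30, 20)          # made-up sizes with one dominant unit
n <- 2
pi_raw <- n * mos / sum(mos)       # 1.80 0.10 0.06 0.04
certain <- pi_raw >= 1             # the first unit cannot have pi > 1

# Take the dominant unit with certainty, then sample the rest with PPS
n_rest <- n - sum(certain)
pi_rest <- n_rest * mos[!certain] / sum(mos[!certain])
print(pi_rest)
```

After the dominant unit is set aside with weight 1, the remaining sample size is 1 and the remaining inclusion probabilities (0.5, 0.3, 0.2) are all valid.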

For stratum-specific thresholds, pass a data frame containing:

  • All stratification variable columns

  • A certainty_size or certainty_prop column

Control Sorting

Control sorting orders the sampling frame before selection, providing implicit stratification. This is particularly effective with systematic and sequential methods (systematic, pps_systematic, pps_chromy), where it ensures the sample spreads evenly across the sorted variables.

Serpentine vs Nested Sorting:

  • Nested (default): Standard ascending sort by each variable in order. Use control = c(var1, var2, var3).

  • Serpentine: Alternating direction that minimizes "jumps" between adjacent units. Use control = serp(var1, var2, var3).

Serpentine sorting makes nearby observations more similar by reversing direction at each hierarchy level. For geographic hierarchies, this means the last district of region 1 is adjacent to the last district of region 2.
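
For two sort variables, the serpentine idea can be sketched in base R as follows. The toy frame and the approach (flip the inner direction in every second region) are assumptions made for illustration; serp() may be implemented differently and supports more than two levels.

```r
# Toy frame: 3 regions, each with districts a, b, c
frame <- data.frame(
  region   = rep(c("R1", "R2", "R3"), each = 3),
  district = rep(c("a", "b", "c"), times = 3)
)

region_rank   <- match(frame$region, sort(unique(frame$region)))
district_rank <- match(frame$district, sort(unique(frame$district)))

# Nested sort: ascending within every region
nested_frame <- frame[order(region_rank, district_rank), ]

# Serpentine sort: ascending in odd regions, descending in even regions
key <- ifelse(region_rank %% 2 == 1, district_rank, -district_rank)
serp_frame <- frame[order(region_rank, key), ]

print(paste0(serp_frame$region, serp_frame$district))
```

The serpentine order is R1a, R1b, R1c, R2c, R2b, R2a, R3a, R3b, R3c, so the last district of R1 sits next to the last district of R2. The nested order would jump from R1c back to R2a.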

Combining with Explicit Stratification: When both stratify_by() and control are used, sorting is applied within each stratum. This allows explicit stratification for variance control combined with implicit stratification for sample spread.

See also

sampling_design() for creating designs, stratify_by() for stratification, cluster_by() for clustering, execute() for running designs, serp() for serpentine sorting

Examples

# Simple random sample of 100 facilities
sampling_design() |>
  draw(n = 100) |>
  execute(kenya_health, seed = 1)
#> == tbl_sample ==
#> Weights: 30.98 - 30.98 (mean: 30.98 )
#> 
#> # A tibble: 100 × 14
#>    facility_id region        county  urban_rural facility_type  beds staff_count
#>  * <chr>       <fct>         <fct>   <fct>       <fct>         <dbl>       <dbl>
#>  1 KE_16_0054  Eastern       Meru    Rural       Health Centre     9          13
#>  2 KE_12_0043  Eastern       Embu    Rural       Health Centre     9          11
#>  3 KE_28_0066  Rift Valley   Baringo Rural       County Hospi…    96          57
#>  4 KE_15_0048  Eastern       Makueni Rural       Dispensary        2           3
#>  5 KE_19_0002  North Eastern Garissa Rural       Sub-County H…    57          15
#>  6 KE_11_0015  Coast         Lamu    Urban       Health Centre    18          13
#>  7 KE_30_0075  Rift Valley   Kericho Urban       Clinic            2           7
#>  8 KE_04_0056  Central       Nyanda… Urban       Clinic            3           6
#>  9 KE_18_0081  Nairobi       Nairobi Urban       Health Centre    14          11
#> 10 KE_10_0004  Coast         Tana R… Rural       Dispensary        2           4
#> # ℹ 90 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Systematic sample of 10%
sampling_design() |>
  draw(frac = 0.10, method = "systematic") |>
  execute(kenya_health, seed = 123)
#> == tbl_sample ==
#> Weights: 9.99 - 9.99 (mean: 9.99 )
#> 
#> # A tibble: 310 × 14
#>    facility_id region  county    urban_rural facility_type  beds staff_count
#>  * <chr>       <fct>   <fct>     <fct>       <fct>         <dbl>       <dbl>
#>  1 KE_01_0003  Central Kiambu    Rural       Clinic            4           5
#>  2 KE_01_0013  Central Kiambu    Rural       Clinic            2           8
#>  3 KE_01_0023  Central Kiambu    Rural       Dispensary        2           4
#>  4 KE_01_0033  Central Kiambu    Rural       Clinic            3           4
#>  5 KE_01_0043  Central Kiambu    Urban       Dispensary        2           3
#>  6 KE_01_0053  Central Kiambu    Rural       Dispensary        1           6
#>  7 KE_01_0063  Central Kiambu    Rural       Health Centre    16          13
#>  8 KE_02_0003  Central Kirinyaga Urban       Dispensary        3           4
#>  9 KE_02_0013  Central Kirinyaga Urban       Dispensary        1           5
#> 10 KE_02_0023  Central Kirinyaga Rural       Dispensary        2           4
#> # ℹ 300 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# PPS sample of schools using enrollment
sampling_design() |>
  cluster_by(school_id) |>
  draw(n = 50, method = "pps_brewer", mos = enrollment) |>
  execute(tanzania_schools, seed = 42)
#> == tbl_sample ==
#> Weights: 15.14 - 135.75 (mean: 44.21 )
#> 
#> # A tibble: 50 × 15
#>    school_id  region       district school_level ownership enrollment n_teachers
#>  * <chr>      <fct>        <fct>    <fct>        <fct>          <dbl>      <dbl>
#>  1 TZ_04_0011 Arusha       Arusha … Primary      Governme…        722         16
#>  2 TZ_04_0022 Arusha       Arusha … Primary      Governme…        396          8
#>  3 TZ_04_0091 Arusha       Arusha … Primary      Governme…        484         11
#>  4 TZ_05_0095 Arusha       Arusha … Primary      Private          148          4
#>  5 TZ_06_0078 Arusha       Meru     Primary      Private          453         11
#>  6 TZ_07_0034 Arusha       Monduli  Primary      Governme…        446         10
#>  7 TZ_07_0041 Arusha       Monduli  Primary      Governme…        920         19
#>  8 TZ_01_0129 Dar es Sala… Ilala    Primary      Private          389          8
#>  9 TZ_02_0071 Dar es Sala… Kinondo… Primary      Governme…        381         11
#> 10 TZ_03_0027 Dar es Sala… Temeke   Primary      Governme…        317          7
#> # ℹ 40 more rows
#> # ℹ 8 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>

# Bernoulli sampling (random sample size, expected ~5%)
sampling_design() |>
  draw(frac = 0.05, method = "bernoulli") |>
  execute(nigeria_business, seed = 1234)
#> == tbl_sample ==
#> Weights: 20 - 20 (mean: 20 )
#> 
#> # A tibble: 563 × 12
#>    enterprise_id zone  state sector size_class employees annual_turnover .weight
#>  * <chr>         <fct> <fct> <fct>  <fct>          <dbl>           <dbl>   <dbl>
#>  1 NG_01_00039   Nort… Benue Servi… Micro              3         6228000      20
#>  2 NG_01_00113   Nort… Benue Hospi… Micro              3         8684000      20
#>  3 NG_01_00123   Nort… Benue Whole… Micro              3         5940000      20
#>  4 NG_01_00131   Nort… Benue Manuf… Small             13        41599000      20
#>  5 NG_07_00004   Nort… FCT … Whole… Small              6        35743000      20
#>  6 NG_07_00011   Nort… FCT … Servi… Small              7        11020000      20
#>  7 NG_07_00018   Nort… FCT … Const… Small             13        18812000      20
#>  8 NG_07_00020   Nort… FCT … Trans… Micro              4         6466000      20
#>  9 NG_07_00047   Nort… FCT … Servi… Large           1222      1548058000      20
#> 10 NG_07_00054   Nort… FCT … Servi… Medium            56       155155000      20
#> # ℹ 553 more rows
#> # ℹ 4 more variables: .sample_id <int>, .stage <int>, .weight_1 <dbl>,
#> #   .fpc_1 <int>

# Stratified with different sizes per stratum (data frame)
facility_sizes <- data.frame(
  facility_type = c("Clinic", "Dispensary", "Health Centre",
                    "Sub-County Hospital", "County Hospital",
                    "Referral Hospital", "Maternity Home"),
  n = c(30, 40, 35, 25, 20, 10, 15)
)
sampling_design() |>
  stratify_by(facility_type) |>
  draw(n = facility_sizes) |>
  execute(kenya_health, seed = 123)
#> == tbl_sample ==
#> Weights: 3.3 - 35.5 (mean: 17.7 )
#> 
#> # A tibble: 175 × 14
#>    facility_type     facility_id region     county urban_rural  beds staff_count
#>  * <fct>             <chr>       <fct>      <fct>  <fct>       <dbl>       <dbl>
#>  1 Referral Hospital KE_37_0042  Western    Busia  Rural         175         123
#>  2 Referral Hospital KE_24_0041  Nyanza     Kisumu Rural         240         101
#>  3 Referral Hospital KE_26_0065  Nyanza     Nyami… Rural         305         153
#>  4 Referral Hospital KE_23_0059  Nyanza     Kisii  Rural         254         258
#>  5 Referral Hospital KE_03_0047  Central    Muran… Urban         420         118
#>  6 Referral Hospital KE_17_0066  Eastern    Thara… Rural         201         128
#>  7 Referral Hospital KE_26_0055  Nyanza     Nyami… Rural         163         336
#>  8 Referral Hospital KE_28_0078  Rift Vall… Barin… Rural         372         146
#>  9 Referral Hospital KE_18_0051  Nairobi    Nairo… Urban         419         221
#> 10 Referral Hospital KE_04_0035  Central    Nyand… Rural         221         124
#> # ℹ 165 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified with different rates per stratum (named vector)
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Micro = 0.01, Small = 0.05, Medium = 0.20, Large = 0.50)) |>
  execute(nigeria_business, seed = 42)
#> == tbl_sample ==
#> Weights: 2 - 99.15 (mean: 14.72 )
#> 
#> # A tibble: 789 × 12
#>    size_class enterprise_id zone  state sector employees annual_turnover .weight
#>  * <fct>      <chr>         <fct> <fct> <fct>      <dbl>           <dbl>   <dbl>
#>  1 Micro      NG_27_00194   Sout… Baye… Manuf…         3         7576000    99.2
#>  2 Micro      NG_33_00528   Sout… Lagos Hospi…         2         4142000    99.2
#>  3 Micro      NG_26_00035   Sout… Akwa… Servi…         4         7047000    99.2
#>  4 Micro      NG_34_00596   Sout… Ogun  Retai…         4         9600000    99.2
#>  5 Micro      NG_15_00004   Nort… Kadu… Retai…         3         7316000    99.2
#>  6 Micro      NG_17_00008   Nort… Kats… Trans…         2         5477000    99.2
#>  7 Micro      NG_06_00045   Nort… Plat… Hospi…         4        10406000    99.2
#>  8 Micro      NG_24_00141   Sout… Enugu Trans…         2         3140000    99.2
#>  9 Micro      NG_34_00549   Sout… Ogun  Manuf…         3         5295000    99.2
#> 10 Micro      NG_35_00120   Sout… Ondo  Hospi…         1         2816000    99.2
#> # ℹ 779 more rows
#> # ℹ 4 more variables: .sample_id <int>, .stage <int>, .weight_1 <dbl>,
#> #   .fpc_1 <int>

# Neyman allocation with minimum 2 per stratum (for variance estimation)
sampling_design() |>
  stratify_by(region, alloc = "neyman", variance = niger_eas_variance) |>
  draw(n = 150, min_n = 2) |>
  execute(niger_eas, seed = 2026)
#> == tbl_sample ==
#> Weights: 7.29 - 13 (mean: 10.24 )
#> 
#> # A tibble: 150 × 11
#>    region ea_id       department strata hh_count pop_estimate .weight .sample_id
#>  * <fct>  <chr>       <fct>      <fct>     <dbl>        <dbl>   <dbl>      <int>
#>  1 Agadez Aga_03_0002 Bilma      Urban       279         1395    7.29          1
#>  2 Agadez Aga_03_0006 Bilma      Rural        98          490    7.29          2
#>  3 Agadez Aga_04_0001 Tchirozér… Rural        87          609    7.29          3
#>  4 Agadez Aga_04_0008 Tchirozér… Rural        76          456    7.29          4
#>  5 Agadez Aga_04_0010 Tchirozér… Rural        37          185    7.29          5
#>  6 Agadez Aga_02_0014 Arlit      Urban       142          710    7.29          6
#>  7 Agadez Aga_04_0007 Tchirozér… Urban       121          847    7.29          7
#>  8 Diffa  Dif_06_0003 Mainé-Sor… Rural        56          448   13             8
#>  9 Diffa  Dif_06_0015 Mainé-Sor… Rural        93          558   13             9
#> 10 Diffa  Dif_08_0005 Bosso      Rural        66          462   13            10
#> # ℹ 140 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Proportional allocation with min and max bounds
sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 200, min_n = 10, max_n = 50) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 5.1 - 8.03 (mean: 7.68 )
#> 
#> # A tibble: 200 × 11
#>    region ea_id       department strata hh_count pop_estimate .weight .sample_id
#>  * <fct>  <chr>       <fct>      <fct>     <dbl>        <dbl>   <dbl>      <int>
#>  1 Agadez Aga_04_0012 Tchirozér… Urban       431         2586     5.1          1
#>  2 Agadez Aga_03_0010 Bilma      Urban       259         1554     5.1          2
#>  3 Agadez Aga_01_0001 Agadez     Rural        59          413     5.1          3
#>  4 Agadez Aga_02_0012 Arlit      Rural       137          959     5.1          4
#>  5 Agadez Aga_01_0010 Agadez     Urban       192         1344     5.1          5
#>  6 Agadez Aga_03_0009 Bilma      Rural        41          287     5.1          6
#>  7 Agadez Aga_02_0005 Arlit      Urban       138          966     5.1          7
#>  8 Agadez Aga_02_0011 Arlit      Rural        70          350     5.1          8
#>  9 Agadez Aga_01_0007 Agadez     Rural       166         1162     5.1          9
#> 10 Agadez Aga_04_0009 Tchirozér… Rural       107          856     5.1         10
#> # ℹ 190 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Control sorting with serpentine ordering (implicit stratification)
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = serp(region, department)) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 15.36 - 15.36 (mean: 15.36 )
#> 
#> # A tibble: 100 × 11
#>    ea_id       region department strata hh_count pop_estimate .weight .sample_id
#>  * <chr>       <fct>  <fct>      <fct>     <dbl>        <dbl>   <dbl>      <int>
#>  1 Aga_02_0002 Agadez Arlit      Rural        63          315    15.4          1
#>  2 Aga_03_0003 Agadez Bilma      Urban       154          770    15.4          2
#>  3 Aga_04_0008 Agadez Tchirozér… Rural        76          456    15.4          3
#>  4 Dif_07_0010 Diffa  N'Guigmi   Rural        65          325    15.4          4
#>  5 Dif_06_0008 Diffa  Mainé-Sor… Rural        46          322    15.4          5
#>  6 Dif_05_0008 Diffa  Diffa      Rural        36          288    15.4          6
#>  7 Dif_08_0006 Diffa  Bosso      Urban       185         1480    15.4          7
#>  8 Dos_10_0006 Dosso  Boboye     Rural        39          312    15.4          8
#>  9 Dos_10_0021 Dosso  Boboye     Urban       162          810    15.4          9
#> 10 Dos_11_0002 Dosso  Dogondout… Rural        43          258    15.4         10
#> # ℹ 90 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Control sorting with nested (standard) ordering
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = c(region, department)) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 15.36 - 15.36 (mean: 15.36 )
#> 
#> # A tibble: 100 × 11
#>    ea_id       region department strata hh_count pop_estimate .weight .sample_id
#>  * <chr>       <fct>  <fct>      <fct>     <dbl>        <dbl>   <dbl>      <int>
#>  1 Aga_02_0002 Agadez Arlit      Rural        63          315    15.4          1
#>  2 Aga_03_0003 Agadez Bilma      Urban       154          770    15.4          2
#>  3 Aga_04_0008 Agadez Tchirozér… Rural        76          456    15.4          3
#>  4 Dif_08_0010 Diffa  Bosso      Rural        86          602    15.4          4
#>  5 Dif_05_0010 Diffa  Diffa      Rural        54          378    15.4          5
#>  6 Dif_06_0007 Diffa  Mainé-Sor… Rural        53          318    15.4          6
#>  7 Dif_07_0008 Diffa  N'Guigmi   Rural        83          581    15.4          7
#>  8 Dos_10_0006 Dosso  Boboye     Rural        39          312    15.4          8
#>  9 Dos_10_0021 Dosso  Boboye     Urban       162          810    15.4          9
#> 10 Dos_11_0002 Dosso  Dogondout… Rural        43          258    15.4         10
#> # ℹ 90 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Combined explicit stratification with control sorting within strata
sampling_design() |>
  stratify_by(strata) |>
  draw(n = 50, method = "systematic",
       control = serp(region, department)) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 6.62 - 24.1 (mean: 15.36 )
#> 
#> # A tibble: 100 × 11
#>    strata ea_id       region department hh_count pop_estimate .weight .sample_id
#>  * <fct>  <chr>       <fct>  <fct>         <dbl>        <dbl>   <dbl>      <int>
#>  1 Urban  Aga_02_0001 Agadez Arlit           128          640    6.62          1
#>  2 Urban  Aga_03_0002 Agadez Bilma           279         1395    6.62          2
#>  3 Urban  Aga_04_0005 Agadez Tchirozér…       75          525    6.62          3
#>  4 Urban  Dif_06_0010 Diffa  Mainé-Sor…      119          714    6.62          4
#>  5 Urban  Dos_10_0021 Dosso  Boboye          162          810    6.62          5
#>  6 Urban  Dos_12_0023 Dosso  Gaya            101          707    6.62          6
#>  7 Urban  Mar_20_0038 Maradi Tessaoua        204         1224    6.62          7
#>  8 Urban  Mar_14_0026 Maradi Maradi          130          780    6.62          8
#>  9 Urban  Mar_18_0022 Maradi Madarounfa      100          700    6.62          9
#> 10 Urban  Mar_17_0010 Maradi Guidan-Ro…      120          720    6.62         10
#> # ℹ 90 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# PPS with certainty selection (absolute threshold)
# Large EAs selected with certainty, rest sampled with PPS
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = hh_count,
       certainty_size = 500) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 1 - 9.71 (mean: 2.17 )
#> 
#> # A tibble: 716 × 12
#>    region ea_id       department strata hh_count pop_estimate .weight .sample_id
#>  * <fct>  <chr>       <fct>      <fct>     <dbl>        <dbl>   <dbl>      <int>
#>  1 Agadez Aga_01_0001 Agadez     Rural        59          413       1          1
#>  2 Agadez Aga_01_0002 Agadez     Urban       157          942       1          2
#>  3 Agadez Aga_01_0003 Agadez     Urban       124          868       1          3
#>  4 Agadez Aga_01_0004 Agadez     Rural       146         1022       1          4
#>  5 Agadez Aga_01_0005 Agadez     Urban       112          896       1          5
#>  6 Agadez Aga_01_0006 Agadez     Rural       182         1092       1          6
#>  7 Agadez Aga_01_0007 Agadez     Rural       166         1162       1          7
#>  8 Agadez Aga_01_0008 Agadez     Urban        54          432       1          8
#>  9 Agadez Aga_01_0009 Agadez     Rural        97          582       1          9
#> 10 Agadez Aga_01_0010 Agadez     Urban       192         1344       1         10
#> # ℹ 706 more rows
#> # ℹ 4 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>

# PPS with certainty selection (proportional threshold)
# EAs with >= 10% of stratum total selected with certainty
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_systematic", mos = hh_count,
       certainty_prop = 0.10) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 1 - 13.19 (mean: 2.14 )
#> 
#> # A tibble: 716 × 12
#>    region ea_id       department strata hh_count pop_estimate .weight .sample_id
#>  * <fct>  <chr>       <fct>      <fct>     <dbl>        <dbl>   <dbl>      <int>
#>  1 Agadez Aga_01_0001 Agadez     Rural        59          413       1          1
#>  2 Agadez Aga_01_0002 Agadez     Urban       157          942       1          2
#>  3 Agadez Aga_01_0003 Agadez     Urban       124          868       1          3
#>  4 Agadez Aga_01_0004 Agadez     Rural       146         1022       1          4
#>  5 Agadez Aga_01_0005 Agadez     Urban       112          896       1          5
#>  6 Agadez Aga_01_0006 Agadez     Rural       182         1092       1          6
#>  7 Agadez Aga_01_0007 Agadez     Rural       166         1162       1          7
#>  8 Agadez Aga_01_0008 Agadez     Urban        54          432       1          8
#>  9 Agadez Aga_01_0009 Agadez     Rural        97          582       1          9
#> 10 Agadez Aga_01_0010 Agadez     Urban       192         1344       1         10
#> # ℹ 706 more rows
#> # ℹ 4 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>

# Stratum-specific certainty thresholds (data frame)
cert_thresholds <- data.frame(
  region = c("Agadez", "Diffa", "Dosso", "Maradi",
             "Niamey", "Tahoua", "Tillaberi", "Zinder"),
  certainty_size = c(1000, 500, 600, 700, 300, 800, 650, 750)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = hh_count,
       certainty_size = cert_thresholds) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 1 - 9.71 (mean: 2.17 )
#> 
#> # A tibble: 716 × 12
#>    region ea_id       department strata hh_count pop_estimate .weight .sample_id
#>  * <fct>  <chr>       <fct>      <fct>     <dbl>        <dbl>   <dbl>      <int>
#>  1 Agadez Aga_01_0001 Agadez     Rural        59          413       1          1
#>  2 Agadez Aga_01_0002 Agadez     Urban       157          942       1          2
#>  3 Agadez Aga_01_0003 Agadez     Urban       124          868       1          3
#>  4 Agadez Aga_01_0004 Agadez     Rural       146         1022       1          4
#>  5 Agadez Aga_01_0005 Agadez     Urban       112          896       1          5
#>  6 Agadez Aga_01_0006 Agadez     Rural       182         1092       1          6
#>  7 Agadez Aga_01_0007 Agadez     Rural       166         1162       1          7
#>  8 Agadez Aga_01_0008 Agadez     Urban        54          432       1          8
#>  9 Agadez Aga_01_0009 Agadez     Rural        97          582       1          9
#> 10 Agadez Aga_01_0010 Agadez     Urban       192         1344       1         10
#> # ℹ 706 more rows
#> # ℹ 4 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> #   .certainty_1 <lgl>