draw() specifies how units are selected: sample size, sampling fraction, selection method, and measure of size for PPS sampling. Every stage in a sampling design must end with draw().

Usage

draw(
  .data,
  n = NULL,
  frac = NULL,
  min_n = NULL,
  max_n = NULL,
  method = "srswor",
  mos = NULL,
  prn = NULL,
  aux = NULL,
  round = "up",
  control = NULL,
  certainty_size = NULL,
  certainty_prop = NULL,
  certainty_overflow = "error",
  on_empty = "error"
)

Arguments

.data

A sampling_design object (piped from sampling_design(), stratify_by(), or cluster_by()).

n

Sample size. For random-size methods (bernoulli, pps_poisson), n is the expected sample size (converted internally to frac = n / N). Can be:

  • A scalar: applies per stratum (if no alloc) or as total (if alloc specified)

  • A named vector: stratum-specific sizes (for single stratification variable)

  • A data frame: stratum-specific sizes with stratification columns + n column

frac

Sampling fraction. Can be:

  • A scalar: same fraction for all strata

  • A named vector: stratum-specific fractions

  • A data frame: stratum-specific fractions with stratification columns + frac column

Only one of n or frac should be specified.

min_n

Minimum sample size per stratum. When an allocation method (e.g., Neyman, proportional) would assign fewer than min_n units to a stratum, that stratum receives min_n units instead. The excess is redistributed proportionally among strata that were above min_n. Commonly set to 2 (minimum for variance estimation) or higher for reliable subgroup estimates. Only applies when stratification with an allocation method is used. Default is NULL (no minimum).
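The exact redistribution rule is not spelled out above. As a rough base R illustration only, assuming proportional allocation, real-valued (unrounded) sizes, and a hypothetical helper named bounded_alloc(), the minimum could be enforced like this:

```r
# Hypothetical helper, not part of the package: proportional allocation
# with a per-stratum floor. Sizes are left unrounded for clarity.
bounded_alloc <- function(N_h, n, min_n) {
  fixed <- rep(FALSE, length(N_h))
  alloc <- n * N_h / sum(N_h)          # plain proportional allocation
  repeat {
    low <- !fixed & alloc < min_n      # strata still below the floor
    if (!any(low)) break
    fixed <- fixed | low
    alloc[fixed] <- min_n              # raise them to the floor
    remaining <- n - min_n * sum(fixed)
    # share what is left among the other strata, proportionally to N_h
    alloc[!fixed] <- remaining * N_h[!fixed] / sum(N_h[!fixed])
  }
  alloc
}

N_h <- c(5000, 3000, 1500, 500)        # made-up stratum sizes
alloc <- bounded_alloc(N_h, n = 100, min_n = 10)
alloc
```

In this made-up case the smallest stratum moves from 5 to 10 units and the other three shrink slightly, so the total stays at 100. The package may round and redistribute differently.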

max_n

Maximum sample size per stratum. When an allocation method would assign more than max_n units to a stratum, that stratum is capped at max_n units. The surplus is redistributed proportionally among strata that were below max_n. Useful for capping dominant strata or managing operational constraints. Only applies when stratification with an allocation method is used. Default is NULL (no maximum).

method

Character string specifying the selection method. One of:

Equal probability methods:

  • "srswor" (default): Simple random sampling without replacement

  • "srswr": Simple random sampling with replacement

  • "systematic": Systematic (fixed interval) sampling

  • "bernoulli": Independent Bernoulli trials (random sample size)

PPS methods (require mos):

  • "pps_systematic": PPS systematic sampling

  • "pps_brewer": Generalized Brewer (Tillé) method

  • "pps_cps": Conditional Poisson sampling (maximum entropy)

  • "pps_poisson": PPS Poisson sampling (random sample size)

  • "pps_sps": Sequential Poisson sampling (fixed size, supports prn)

  • "pps_pareto": Pareto sampling (fixed size, supports prn)

  • "pps_multinomial": PPS multinomial (with replacement, any hit count)

  • "pps_chromy": Chromy's sequential PPS (minimum replacement)

Balanced sampling:

  • "balanced": Balanced sampling via the cube method (Deville & Tillé 2004). Uses auxiliary variables (aux) to balance the sample so that Horvitz-Thompson estimates of auxiliary totals match population totals. Supports equal or unequal (mos) inclusion probabilities. When stratified, uses the stratified cube algorithm (Chauvet 2009). At most 2 stages may use "balanced".

mos

Measure of size variable, specified as a bare column name (unquoted). Required for all pps_* methods; optional for "balanced".

prn

Permanent random number variable for sample coordination, specified as a bare column name (unquoted). Must be a numeric column with values in the open interval (0, 1) and no missing values. Supported methods: "bernoulli", "pps_poisson", "pps_sps", "pps_pareto". When supplied, the sample is deterministic for a given set of PRN values, enabling coordination across survey waves.
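To see why permanent random numbers give coordinated samples, here is a base R sketch of the textbook rule for Bernoulli sampling with PRNs (select a unit when its PRN falls below the sampling fraction). The rule used internally by draw() is not stated here, so treat this as an assumption; the frame and column names are made up.

```r
# Made-up frame with one permanent random number per unit
set.seed(1)
frame <- data.frame(id = 1:1000, prn = runif(1000))

# Textbook PRN rule (assumed): keep units whose PRN is below the fraction
wave1 <- frame$id[frame$prn < 0.05]
wave2 <- frame$id[frame$prn < 0.10]

# Same PRNs give the same sample every time, and a larger fraction
# keeps every unit already selected at the smaller fraction
all(wave1 %in% wave2)
```

Because the PRNs stay fixed across waves, the overlap between samples is controlled by the design rather than left to chance.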

aux

Auxiliary balancing variables for method = "balanced", specified as bare column names: aux = c(income, pop_density). Columns must be numeric with no missing values. The cube algorithm ensures the Horvitz-Thompson estimator of these auxiliary totals equals (or nearly equals) the population totals, improving precision. When used with cluster_by(), auxiliary values are automatically aggregated (summed) to the cluster level before selection.

round

Rounding method when converting frac to sample sizes. One of:

  • "up" (default): Round up (ceiling). Matches SAS SURVEYSELECT default.

  • "down": Round down (floor).

  • "nearest": Round to nearest integer (standard rounding).

This parameter only affects designs using frac to specify the sampling rate. When n is specified directly, no rounding occurs.
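The three rounding options correspond to familiar base R functions. The helper below, frac_to_n(), is hypothetical and only illustrates the arithmetic; note that base R's round() rounds exact halves to the even integer, which may or may not match what "nearest" does in the package.

```r
# Hypothetical helper, not part of the package
frac_to_n <- function(N, frac, rounding = c("up", "down", "nearest")) {
  rounding <- match.arg(rounding)
  raw <- N * frac
  switch(rounding,
    up      = ceiling(raw),
    down    = floor(raw),
    nearest = round(raw)   # base R rounds exact halves to even
  )
}

# A frame of 1234 units sampled at 10% gives a raw size of 123.4
n_up      <- frac_to_n(1234, 0.10, "up")
n_down    <- frac_to_n(1234, 0.10, "down")
n_nearest <- frac_to_n(1234, 0.10, "nearest")
c(up = n_up, down = n_down, nearest = n_nearest)
```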

control

<data-masking> Variables for sorting the frame before selection. Control sorting provides implicit stratification, which is particularly effective with systematic and sequential sampling methods. Can be:

  • A single variable: control = region

  • Multiple variables: control = c(region, district)

  • With serp() for serpentine sorting: control = serp(region, district)

  • With dplyr::desc() for descending: control = c(region, desc(population))

  • Mixed: control = c(region, serp(district, commune), desc(size))

When stratification is also specified, control sorting is applied within each stratum. See the section "Control Sorting" below for details.

certainty_size

For PPS without-replacement methods, units with MOS >= this value are selected with certainty (probability = 1). Can be:

  • A scalar: same threshold for all strata

  • A data frame: stratum-specific thresholds with stratification columns + certainty_size column

Certainty units are removed from the frame before probability sampling, and the remaining sample size is reduced accordingly. Mutually exclusive with certainty_prop. Equivalent to SAS SURVEYSELECT CERTSIZE= option.

certainty_prop

For PPS without-replacement methods, units whose MOS proportion (MOS_i / sum(MOS)) >= this value are selected with certainty. Can be:

  • A scalar between 0 and 1 (exclusive): same threshold for all strata

  • A data frame: stratum-specific thresholds with stratification columns + certainty_prop column

Uses iterative selection: after removing certainty units, proportions are recomputed and the check is repeated until no new units qualify. Mutually exclusive with certainty_size. Equivalent to SAS SURVEYSELECT CERTSIZE=P= option.
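The iterative check can be sketched in base R as follows. This is an illustration of the idea for a single stratum, not the package's code; find_certainty() and the MOS values are made up.

```r
# Hypothetical helper: repeat the share check until no new unit qualifies
find_certainty <- function(mos, certainty_prop) {
  certain <- rep(FALSE, length(mos))
  repeat {
    # shares are recomputed against the MOS total of the remaining units
    share <- mos / sum(mos[!certain])
    new <- !certain & share >= certainty_prop
    if (!any(new)) break
    certain <- certain | new
  }
  which(certain)
}

mos <- c(600, 150, rep(25, 10))   # made-up measure of size values
certain_units <- find_certainty(mos, certainty_prop = 0.3)
certain_units
```

In the first pass only unit 1 qualifies (600 / 1000 = 0.6). Once it is removed, unit 2 holds 150 / 400 = 0.375 of the remaining total and qualifies in the second pass, which a single-pass check would miss.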

certainty_overflow

Controls behavior when certainty units exceed the target sample size n. One of:

  • "error" (default): Stop with an informative error.

  • "allow": Return all certainty units with stage weight 1, even if the resulting sample has more than n units.

Equivalent to SAS SURVEYSELECT allowing CERTSIZE= overflow.

on_empty

Behaviour when a random-size method (bernoulli, pps_poisson) selects zero units in a stratum or the whole frame. One of:

  • "error" (default): Stop with an informative error. Zero selections usually indicate a design problem (sampling fraction too small or stratum too small) that should be fixed rather than silently papered over.

  • "warn": Issue a warning and fall back to SRS of 1 unit.

  • "silent": Fall back to SRS of 1 unit without a message.

Weight note: when falling back ("warn" or "silent"), the fallback selects 1 unit via SRS, so the resulting weight is N (the stratum or frame size), not 1/frac. This reflects the actual selection mechanism, not the intended Bernoulli/Poisson design. Downstream variance estimation treats this unit as an SRS draw.

Value

A modified sampling_design object with selection parameters specified.

Details

Selection Methods

Equal Probability Methods

Method      Replacement  Sample Size  Notes
srswor      Without      Fixed        Standard SRS
srswr       With         Fixed        Allows duplicates
systematic  Without      Fixed        Periodic selection
bernoulli   Without      Random       Each unit selected independently

PPS Methods

Method           Replacement  Sample Size  Notes
pps_systematic   Without      Fixed        Simple, some bias
pps_brewer       Without      Fixed        Fast, joint prob > 0
pps_cps          Without      Fixed        Highest entropy, joint prob available
pps_poisson      Without      Random       PPS analog of Bernoulli
pps_sps          Without      Fixed        Sequential Poisson, supports prn
pps_pareto       Without      Fixed        Pareto sampling, supports prn
pps_multinomial  With         Fixed        Any hit count, Hansen-Hurwitz
pps_chromy       Min. repl.   Fixed        SAS default PPS_SEQ

Balanced Sampling

Method    Replacement  Sample Size  Notes
balanced  Without      Fixed        Deville & Tillé 2004, uses aux

Parameter Requirements

Method           n         frac    mos       aux
srswor           Yes       or Yes
srswr            Yes       or Yes
systematic       Yes       or Yes
bernoulli        Expected  or Yes
pps_systematic   Yes       or Yes  Yes
pps_brewer       Yes       or Yes  Yes
pps_cps          Yes               Yes
pps_poisson      Expected  or Yes  Yes
pps_sps          Yes       or Yes  Yes
pps_pareto       Yes       or Yes  Yes
pps_multinomial  Yes       or Yes  Yes
pps_chromy       Yes       or Yes  Yes
balanced         Yes       or Yes  Optional  Optional

Fixed vs Random Sample Size Methods

Methods with fixed sample size (srswor, srswr, systematic, pps_systematic, pps_brewer, pps_cps, pps_sps, pps_pareto, pps_multinomial, pps_chromy) accept either n or frac. When frac is provided, the sample size is computed based on the round parameter (default: ceiling).

Methods with random sample size (bernoulli, pps_poisson) accept either n or frac. When n is provided, it is converted to frac = n / N (where N is the stratum or frame size). The resulting sample size is still random: n specifies the expected sample size, not a fixed count.
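A base R sketch of the conversion, assuming a frame of N = 14934 units (a value inferred from the weights printed in the examples, not stated in this page) and simulating the Bernoulli trials directly with runif() rather than through draw():

```r
N <- 14934         # assumed frame size, inferred from the printed weights
n <- 500           # expected sample size passed as n
frac <- n / N      # per-unit selection probability
weight <- 1 / frac # design weight of every selected unit (about 29.87)

# Independent Bernoulli trials: the realized size varies around n
set.seed(42)
selected <- runif(N) < frac
realized_n <- sum(selected)
c(expected = n, realized = realized_n)
```

The weight depends only on frac, so it is the same whatever sample size is realized.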

For pps_poisson, the raw inclusion probabilities are computed as \(\pi_i = f \cdot x_i / \bar{x}\) where \(f\) is frac and \(x_i\) is the MOS value. Any \(\pi_i > 1\) is clipped to 1, so the expected sample size \(E[n] = \sum \min(\pi_i, 1)\) can be less than \(f \cdot N\) when large units dominate the MOS distribution. Use certainty_size or certainty_prop to handle these dominant units explicitly.
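The effect of clipping is easy to check by hand. The following base R sketch applies the formula above to a made-up MOS vector with one dominant unit:

```r
x <- c(10, 20, 30, 40, 900)    # made-up MOS values, one dominant unit
f <- 0.4                       # sampling fraction (frac)

pi_raw <- f * x / mean(x)      # raw inclusion probabilities
pi_clipped <- pmin(pi_raw, 1)  # probabilities above 1 are clipped to 1

nominal_n <- f * length(x)     # f * N = 2
expected_n <- sum(pi_clipped)  # 0.02 + 0.04 + 0.06 + 0.08 + 1 = 1.2
c(nominal = nominal_n, expected = expected_n)
```

The dominant unit has a raw probability of 1.8, so clipping it to 1 lowers the expected sample size from 2 to 1.2.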

When an allocation method is set in stratify_by() (equal, proportional, neyman, optimal, power), specify total sample size via n. Combining alloc with frac is not supported.

Custom Allocation with Data Frames

For stratum-specific sample sizes or rates, pass a data frame to n or frac. The data frame must contain:

  • All stratification variable columns (matching those in stratify_by())

  • An n column (for sizes) or frac column (for rates)

Certainty Selection

In PPS without-replacement sampling, very large units can have theoretical inclusion probabilities exceeding 1. Certainty selection handles this by selecting such units with probability 1 before sampling the remainder. The output includes a .certainty_k column (where k is the stage number) indicating which units were certainty selections.
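As a base R illustration with made-up MOS values and an absolute threshold, the sketch below shows raw probabilities above 1, the split into certainty and non-certainty units, and the target inclusion probabilities for the remainder. It assumes the standard PPS target of probabilities proportional to MOS and is not the package's code.

```r
mos <- c(950, 820, 400, 300, 250, 120, 90, 60)  # made-up MOS values
n <- 4
threshold <- 800                                # plays the role of certainty_size

# Without certainty selection, the two largest units exceed probability 1
pi_raw <- n * mos / sum(mos)

# Certainty units are set aside and the remaining sample size is reduced
is_certain <- mos >= threshold
n_remaining <- n - sum(is_certain)

# Target probabilities: 1 for certainty units, proportional to MOS otherwise
pi_target <- ifelse(is_certain, 1,
                    n_remaining * mos / sum(mos[!is_certain]))
round(pi_target, 3)
```

After the split, all probabilities lie in (0, 1] and still sum to the requested n.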

Certainty selection is only available for without-replacement (WOR) PPS methods (pps_systematic, pps_brewer, pps_cps, pps_poisson, pps_sps, pps_pareto). With-replacement methods (pps_multinomial) and probability minimum replacement (PMR) methods (pps_chromy) handle large units natively through their hit mechanism.

When certainty_overflow = "allow", if more units qualify for certainty selection than the requested n, all certainty units are returned with probability 1 (stage weight = 1). No probabilistic sampling is performed in this case. The resulting sample size will be the number of certainty units, which exceeds n. In multi-stage designs, the final .weight can still exceed 1 because it compounds all stage weights.

For stratum-specific thresholds, pass a data frame containing:

  • All stratification variable columns

  • A certainty_size or certainty_prop column

Control Sorting

Control sorting orders the sampling frame before selection, providing implicit stratification. This is particularly effective with systematic and sequential methods (systematic, pps_systematic, pps_chromy), where it ensures the sample spreads evenly across the sorted variables.

Serpentine vs Nested Sorting:

  • Nested (default): Standard ascending sort by each variable in order. Use control = c(var1, var2, var3).

  • Serpentine: Alternating direction that minimizes "jumps" between adjacent units. Use control = serp(var1, var2, var3).

Serpentine sorting makes nearby observations more similar by reversing direction at each hierarchy level. For geographic hierarchies, this means the last district of region 1 is adjacent to the last district of region 2.

Combining with Explicit Stratification: When both stratify_by() and control are used, sorting is applied within each stratum. This allows explicit stratification for variance control combined with implicit stratification for sample spread.
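The difference between the two orderings can be seen on a tiny made-up frame. The base R sketch below hand-codes a two-level serpentine sort for numeric keys; it does not call serp() and only mimics the behaviour described above.

```r
# Made-up frame: 2 regions with 3 districts each
frame <- expand.grid(district = 1:3, region = 1:2)

# Nested: ascending by region, then ascending by district
nested <- frame[order(frame$region, frame$district), ]

# Serpentine (hand-coded): flip the district direction in every second region
region_rank <- match(frame$region, sort(unique(frame$region)))
direction <- ifelse(region_rank %% 2 == 1, 1, -1)
serpentine <- frame[order(frame$region, direction * frame$district), ]

nested$district      # 1 2 3 1 2 3
serpentine$district  # 1 2 3 3 2 1
```

In the serpentine order, district 3 of region 1 is immediately followed by district 3 of region 2, whereas the nested order jumps back to district 1.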

References

srswor, srswr, systematic, bernoulli, pps_systematic, pps_multinomial: Cochran, W.G. (1977). Sampling Techniques, 3rd ed. Wiley.

pps_brewer: Brewer, K.R.W. (1975). A simple procedure for sampling PPS WOR. Australian Journal of Statistics, 17(3), 166-172.

pps_cps: Hájek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35(4), 1491-1523.

Chen, X.-H., Dempster, A.P. and Liu, J.S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.

pps_poisson: Tillé, Y. (2006). Sampling Algorithms. Springer.

pps_sps: Ohlsson, E. (1998). Sequential Poisson sampling. Journal of Official Statistics, 14(2), 149-162.

pps_pareto: Rosén, B. (1997). Asymptotic theory for order sampling. Journal of Statistical Planning and Inference, 62(2), 135-158.

pps_chromy: Chromy, J.R. (1979). Sequential sample selection methods. Proceedings of the Survey Research Methods Section, ASA, 401-406.

balanced: Deville, J.-C. and Tillé, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91(4), 893-912.

Chauvet, G. (2009). Stratified balanced sampling. Survey Methodology, 35(1), 115-119.

See also

sampling_design() for creating designs, stratify_by() for stratification, cluster_by() for clustering, execute() for running designs, serp() for serpentine sorting

Examples

# Simple random sample of 100 EAs
sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights:      149.34 [149.34, 149.34]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_10182 Boucle … Mouhoun  Ouarko… Rural             1347        185    33.0 
#>  2 EA_14571 Centre-… Nahouri  Zecco   Urban             2829        452     8.91
#>  3 EA_03356 Centre-… Kourite… Dialga… Rural             1010        150    19.0 
#>  4 EA_01856 Hauts-B… Houet    Bobo-D… Urban             1938        311     0.32
#>  5 EA_14703 Plateau… Oubrite… Ziniare Rural             1320        188     3   
#>  6 EA_10602 Est      Tapoa    Partia… Rural             1393        191    17.2 
#>  7 EA_08087 Plateau… Ganzour… Mogtedo Rural             2284        285     1.99
#>  8 EA_05693 Hauts-B… Houet    Karang… Rural             1295        181    19.0 
#>  9 EA_01975 Est      Gnagna   Bogande Rural             2018        276    22.2 
#> 10 EA_04482 Centre-… Boulgou  Garango Rural             1211        180    18.3 
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Systematic sample of 10%
sampling_design() |>
  draw(frac = 0.10, method = "systematic") |>
  execute(bfa_eas, seed = 123)
#> # A tbl_sample: 1494 × 17
#> # Weights:      10 [10, 10]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00247 Boucle … Bale     Bagassi Rural             1666        200    33.4 
#>  2 EA_00257 Boucle … Bale     Bagassi Rural              931        112    18.1 
#>  3 EA_00267 Boucle … Bale     Bagassi Rural             1319        159    17.1 
#>  4 EA_00277 Boucle … Bale     Bagassi Rural             1740        209    15.8 
#>  5 EA_00464 Boucle … Bale     Bana    Rural              915        120     7.95
#>  6 EA_02161 Boucle … Bale     Boromo  Rural              805        105     9.43
#>  7 EA_02171 Boucle … Bale     Boromo  Rural             1644        214    15.3 
#>  8 EA_02181 Boucle … Bale     Boromo  Rural              996        130     7.42
#>  9 EA_04219 Boucle … Bale     Fara    Rural             1491        218    22.2 
#> 10 EA_04229 Boucle … Bale     Fara    Rural             1187        174    17.6 
#> # ℹ 1,484 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# PPS sample of EAs using household count
sampling_design() |>
  cluster_by(ea_id) |>
  draw(n = 50, method = "pps_brewer", mos = households) |>
  execute(bfa_eas, seed = 42)
#> # A tbl_sample: 50 × 18
#> # Weights:      317.19 [91.16, 769.04]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_02179 Boucle … Bale     Boromo  Rural             1269        165    61.4 
#>  2 EA_10348 Boucle … Bale     Ouri    Rural              783        100    74.8 
#>  3 EA_03775 Boucle … Kossi    Dokui   Rural             1188        147    26.1 
#>  4 EA_13705 Boucle … Sourou   Toeni   Rural              753         78     7.82
#>  5 EA_03997 Cascades Leraba   Douna   Rural             1854        241     7   
#>  6 EA_06716 Centre   Kadiogo  Koubri  Rural             1571        309     2.61
#>  7 EA_08811 Centre   Kadiogo  Ouagad… Urban             2495        376     0.38
#>  8 EA_09492 Centre   Kadiogo  Ouagad… Urban             1922        289     0.39
#>  9 EA_09970 Centre   Kadiogo  Ouagad… Urban             2349        354     0.34
#> 10 EA_00984 Centre-… Boulgou  Beguedo Urban             3699        658     3.17
#> # ℹ 40 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>

# Bernoulli sampling with frac (random sample size, expected ~5%)
sampling_design() |>
  draw(frac = 0.05, method = "bernoulli") |>
  execute(ken_enterprises, seed = 12345)
#> # A tbl_sample: 309 × 14
#> # Weights:      20 [20, 20]
#>    enterprise_id county region sector      size_class employees revenue_millions
#>  * <chr>         <fct>  <fct>  <fct>       <fct>          <int>            <dbl>
#>  1 KEN_00010     Kiambu Kiambu Food & Bev… Medium            30             93.4
#>  2 KEN_00020     Kiambu Kiambu Food & Bev… Large            668            305. 
#>  3 KEN_00023     Kiambu Kiambu Other Manu… Small             16            133. 
#>  4 KEN_00051     Kiambu Kiambu Other Serv… Small              8              4.9
#>  5 KEN_00063     Kiambu Kiambu Other Serv… Small             10             10.9
#>  6 KEN_00065     Kiambu Kiambu Other Serv… Small             18             10.6
#>  7 KEN_00077     Kiambu Kiambu Other Serv… Small              8              2.7
#>  8 KEN_00103     Kiambu Kiambu Other Serv… Medium            40             28.5
#>  9 KEN_00117     Kiambu Kiambu Other Serv… Medium            60             47.9
#> 10 KEN_00146     Kiambu Kiambu Retail      Small             11             26  
#> # ℹ 299 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Bernoulli sampling with expected n (converted to frac = 500/N)
sampling_design() |>
  draw(n = 500, method = "bernoulli") |>
  execute(bfa_eas, seed = 42)
#> # A tbl_sample: 503 × 17
#> # Weights:      29.87 [29.87, 29.87]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00261 Boucle … Bale     Bagassi Rural             1128        136    24.0 
#>  2 EA_00267 Boucle … Bale     Bagassi Rural             1319        159    17.1 
#>  3 EA_00465 Boucle … Bale     Bana    Rural             1381        181     3.79
#>  4 EA_02157 Boucle … Bale     Boromo  Rural              129         17     8.12
#>  5 EA_02170 Boucle … Bale     Boromo  Rural              537         70    45.7 
#>  6 EA_12117 Boucle … Bale     Siby    Rural             1555        230    45.7 
#>  7 EA_00377 Boucle … Banwa    Balave  Rural             1310        206    27.1 
#>  8 EA_06898 Boucle … Banwa    Kouka   Rural             1169        161     0.99
#>  9 EA_11715 Boucle … Banwa    Sanaba  Rural             1844        235    14.0 
#> 10 EA_11723 Boucle … Banwa    Sanaba  Rural             2060        262     3.13
#> # ℹ 493 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified with different sizes per stratum (data frame)
region_sizes <- data.frame(
  region = levels(bfa_eas$region),
  n = c(20, 12, 25, 18, 22, 16, 14, 15, 20, 18, 12, 10, 8)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = region_sizes) |>
  execute(bfa_eas, seed = 123)
#> # A tbl_sample: 210 × 17
#> # Weights:      71.11 [43.93, 106]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_12443 Boucle … Banwa    Solenzo Rural             1395        187     8.03
#>  2 EA_12900 Boucle … Banwa    Tansila Rural             1164        140    39.1 
#>  3 EA_11082 Boucle … Bale     Pompoi  Rural             1198        162    33.3 
#>  4 EA_00767 Boucle … Kossi    Barani  Rural             1433        188    17.5 
#>  5 EA_12105 Boucle … Bale     Siby    Rural              912        135     8.55
#>  6 EA_03217 Boucle … Mouhoun  Dedoug… Rural              998        177     3.4 
#>  7 EA_04552 Boucle … Nayala   Gassan  Rural              649         72     0.49
#>  8 EA_04732 Boucle … Sourou   Gomboro Rural             1825        257    32.9 
#>  9 EA_14304 Boucle … Nayala   Ye      Rural             1323        190    10.9 
#> 10 EA_14319 Boucle … Nayala   Ye      Rural             1030        148    18.1 
#> # ℹ 200 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified with different rates per stratum (named vector)
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 758 × 14
#> # Weights:      9 [2, 49.85]
#>    enterprise_id county      region sector size_class employees revenue_millions
#>  * <chr>         <fct>       <fct>  <fct>  <fct>          <int>            <dbl>
#>  1 KEN_05526     Homa Bay    Rest … Retail Small              8             18.8
#>  2 KEN_05090     Nyamira     Rest … Other… Small              6              4.7
#>  3 KEN_02534     Nairobi     Nairo… Retail Small             14             38.4
#>  4 KEN_02455     Nairobi     Nairo… Retail Small             11              7.5
#>  5 KEN_02609     Nairobi     Nairo… Retail Small              5              8.4
#>  6 KEN_06669     Uasin Gishu Uasin… Other… Small             11             16.1
#>  7 KEN_01498     Nairobi     Nairo… Other… Small              8             13.4
#>  8 KEN_04590     Kisii       Rest … Other… Small             10             11.2
#>  9 KEN_02509     Nairobi     Nairo… Retail Small              8             19.3
#> 10 KEN_02684     Nairobi     Nairo… Retail Small             16             23.7
#> # ℹ 748 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Neyman allocation with minimum 2 per stratum (for variance estimation)
sampling_design() |>
  stratify_by(region, alloc = "neyman", variance = bfa_eas_variance) |>
  draw(n = 150, min_n = 2) |>
  execute(bfa_eas, seed = 2026)
#> # A tbl_sample: 150 × 17
#> # Weights:      99.56 [71.24, 160.88]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_08679 Boucle … Kossi    Nouna   Rural             1393        206    17.6 
#>  2 EA_10158 Boucle … Mouhoun  Ouarko… Rural             1123        154    23.4 
#>  3 EA_06908 Boucle … Banwa    Kouka   Rural             1229        169     9.3 
#>  4 EA_04542 Boucle … Nayala   Gassan  Rural             1556        172    42.6 
#>  5 EA_10401 Boucle … Bale     Pa      Rural             1150        165    82.3 
#>  6 EA_13746 Boucle … Nayala   Toma    Rural              955        138     0.98
#>  7 EA_12417 Boucle … Banwa    Solenzo Rural             2953        397    25.6 
#>  8 EA_06901 Boucle … Banwa    Kouka   Rural             1723        237    17.7 
#>  9 EA_02110 Boucle … Mouhoun  Bondok… Rural             1413        216    16.8 
#> 10 EA_11717 Boucle … Banwa    Sanaba  Rural              929        118     1.9 
#> # ℹ 140 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Proportional allocation with min and max bounds
sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 200, min_n = 10, max_n = 50) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 200 × 17
#> # Weights:      74.67 [61.5, 78.05]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_10182 Boucle … Mouhoun  Ouarko… Rural             1347        185    33.0 
#>  2 EA_03982 Boucle … Kossi    Doumba… Rural             1558        191    16.6 
#>  3 EA_10352 Boucle … Bale     Ouri    Rural              767         98    17.6 
#>  4 EA_03209 Boucle … Mouhoun  Dedoug… Rural              446         79    18.1 
#>  5 EA_12908 Boucle … Banwa    Tansila Rural             1076        129     9.45
#>  6 EA_11640 Boucle … Banwa    Sami    Rural              912        137    43.2 
#>  7 EA_06884 Boucle … Banwa    Kouka   Rural             1642        226    12.5 
#>  8 EA_13757 Boucle … Nayala   Toma    Rural              973        141     0.88
#>  9 EA_05785 Boucle … Sourou   Kassoum Rural             1315        195     1.5 
#> 10 EA_03598 Boucle … Kossi    Djibas… Rural              725         93     1.01
#> # ℹ 190 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Control sorting with serpentine ordering (implicit stratification)
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = serp(region, province)) |>
  execute(bfa_eas, seed = 2)
#> # A tbl_sample: 100 × 17
#> # Weights:      149.34 [149.34, 149.34]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00272 Boucle … Bale     Bagassi Rural              950        114     1.01
#>  2 EA_11080 Boucle … Bale     Pompoi  Rural             1686        228    37.8 
#>  3 EA_11730 Boucle … Banwa    Sanaba  Rural             1007        128    36.1 
#>  4 EA_12913 Boucle … Banwa    Tansila Rural             1277        153    22.7 
#>  5 EA_03626 Boucle … Kossi    Djibas… Rural             1108        142     3.31
#>  6 EA_08721 Boucle … Kossi    Nouna   Rural             1295        191    47.2 
#>  7 EA_03203 Boucle … Mouhoun  Dedoug… Rural             1810        321    28.9 
#>  8 EA_12937 Boucle … Mouhoun  Tcheri… Rural             1202        192     8.67
#>  9 EA_14028 Boucle … Nayala   Yaba    Rural              796        105    42.2 
#> 10 EA_06013 Boucle … Sourou   Kiemba… Rural              898        117    35.3 
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Control sorting with nested (standard) ordering
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = c(region, province)) |>
  execute(bfa_eas, seed = 3)
#> # A tbl_sample: 100 × 17
#> # Weights:      149.34 [149.34, 149.34]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00270 Boucle … Bale     Bagassi Rural             1080        130    37.0 
#>  2 EA_11078 Boucle … Bale     Pompoi  Rural             1021        138    35.6 
#>  3 EA_11727 Boucle … Banwa    Sanaba  Rural             1304        166    61.5 
#>  4 EA_12911 Boucle … Banwa    Tansila Rural             1003        120    38.1 
#>  5 EA_03624 Boucle … Kossi    Djibas… Rural             1615        206    11.0 
#>  6 EA_08718 Boucle … Kossi    Nouna   Rural              729        108     1.05
#>  7 EA_03201 Boucle … Mouhoun  Dedoug… Rural             1223        217    17.8 
#>  8 EA_12935 Boucle … Mouhoun  Tcheri… Rural             1170        187     9.7 
#>  9 EA_14025 Boucle … Nayala   Yaba    Rural              982        130     1.68
#> 10 EA_06011 Boucle … Sourou   Kiemba… Rural             1199        156    14.8 
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Combined explicit stratification with control sorting within strata
sampling_design() |>
  stratify_by(urban_rural) |>
  draw(n = 50, method = "systematic",
       control = serp(region, province)) |>
  execute(bfa_eas, seed = 25)
#> # A tbl_sample: 100 × 17
#> # Weights:      149.34 [53.12, 245.56]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_04239 Boucle … Bale     Fara    Rural              611         89    17.3 
#>  2 EA_12385 Boucle … Banwa    Solenzo Rural             1235        166    20.8 
#>  3 EA_03604 Boucle … Kossi    Djibas… Rural             1389        178     5.06
#>  4 EA_03127 Boucle … Mouhoun  Dedoug… Rural              736        131     8.98
#>  5 EA_12958 Boucle … Mouhoun  Tcheri… Rural             1197        191    63.8 
#>  6 EA_05793 Boucle … Sourou   Kassoum Rural              812        121    21.4 
#>  7 EA_08531 Cascades Leraba   Nianko… Rural             1100        216    14.8 
#>  8 EA_07751 Cascades Comoe    Mangod… Rural             1140        157    32.7 
#>  9 EA_12224 Cascades Comoe    Sidera… Rural             1196        191    46.3 
#> 10 EA_00072 Centre-… Kourite… Andemt… Rural             1945        311    19.6 
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# PPS with certainty selection (absolute threshold)
# Large EAs selected with certainty, rest sampled with PPS
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = households,
       certainty_size = 800) |>
  execute(bfa_eas, seed = 3)
#> # A tbl_sample: 1300 × 18
#> # Weights:      11.21 [1, 55.43]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00381 Boucle … Banwa    Balave  Rural             2280        359    48.0 
#>  2 EA_13762 Boucle … Nayala   Toma    Rural             1464        212     4.78
#>  3 EA_03582 Boucle … Kossi    Djibas… Rural             1760        225    16.4 
#>  4 EA_12916 Boucle … Banwa    Tansila Rural             1038        125    30.1 
#>  5 EA_03183 Boucle … Mouhoun  Dedoug… Rural             1103        196     9.11
#>  6 EA_03185 Boucle … Mouhoun  Dedoug… Rural             1710        303    15.1 
#>  7 EA_11128 Boucle … Bale     Poura   Urban             2631        423    27.6 
#>  8 EA_12452 Boucle … Banwa    Solenzo Rural             1099        148    18.5 
#>  9 EA_03152 Boucle … Mouhoun  Dedoug… Rural             1668        296    10.4 
#> 10 EA_03215 Boucle … Mouhoun  Dedoug… Rural             1320        234     0.88
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
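
# Sketch only: what an absolute certainty threshold implies for inclusion
# probabilities, in base R. This is not the package's implementation; the toy
# frame and the names `toy`, `is_cert`, `n_rest`, and `pi_k` are hypothetical,
# and the threshold is assumed to be inclusive (>=). Certainty units get
# probability 1 (weight 1); the remaining sample size is spread over the
# other units in proportion to the measure of size.
toy <- data.frame(id = 1:6, households = c(900, 850, 300, 200, 150, 100))
n_total <- 4
is_cert <- toy$households >= 800            # units taken with certainty
n_rest  <- n_total - sum(is_cert)           # sample size left for PPS
pi_k <- ifelse(
  is_cert,
  1,
  n_rest * toy$households / sum(toy$households[!is_cert])
)
data.frame(toy, certainty = is_cert, pi = pi_k, weight = 1 / pi_k)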

# PPS with certainty selection (proportional threshold)
# EAs holding >= 10% of their stratum's total households are selected with certainty
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_systematic", mos = households,
       certainty_prop = 0.10) |>
  execute(bfa_eas, seed = 321)
#> # A tbl_sample: 1300 × 18
#> # Weights:      11.49 [1.7, 79.19]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00261 Boucle … Bale     Bagassi Rural             1128        136    24.0 
#>  2 EA_00455 Boucle … Bale     Bana    Rural             1150        150    22.6 
#>  3 EA_02158 Boucle … Bale     Boromo  Rural             4122        537    23.5 
#>  4 EA_02172 Boucle … Bale     Boromo  Rural             2111        275     2.75
#>  5 EA_04212 Boucle … Bale     Fara    Rural              708        104    25.8 
#>  6 EA_04227 Boucle … Bale     Fara    Rural             1213        177    24.2 
#>  7 EA_04241 Boucle … Bale     Fara    Rural             1587        232    12.5 
#>  8 EA_04255 Boucle … Bale     Fara    Rural             1194        175     9.82
#>  9 EA_10357 Boucle … Bale     Ouri    Rural             1244        159     1.31
#> 10 EA_10374 Boucle … Bale     Ouri    Rural             1493        191     1.63
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
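
# Sketch only: a proportional certainty rule in base R, assuming shares are
# recomputed after each round of removals (a common convention; whether
# draw() iterates this way is not shown in the output above). `mos`, `cert`,
# `share`, and `new_cert` are hypothetical names. The toy threshold is 0.40
# rather than 0.10 because the toy frame has only six units.
mos  <- c(500, 150, 60, 50, 40, 30)
cert <- rep(FALSE, length(mos))
repeat {
  share    <- mos / sum(mos[!cert])         # share of the remaining total
  new_cert <- !cert & share >= 0.40         # newly qualifying units
  if (!any(new_cert)) break
  cert <- cert | new_cert
}
which(cert)
#> [1] 1 2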

# Stratum-specific certainty thresholds (data frame with one row per region:
# the stratification column plus a certainty_size column)
cert_thresholds <- data.frame(
  region = levels(bfa_eas$region),
  certainty_size = c(700, 450, 800, 850, 750, 800, 550,
                     450, 700, 950, 750, 600, 480)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = households,
       certainty_size = cert_thresholds) |>
  execute(bfa_eas, seed = 424)
#> # A tbl_sample: 1300 × 18
#> # Weights:      11.67 [1, 59.08]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_13813 Boucle … Sourou   Tougan  Rural             1335        183     9.87
#>  2 EA_03609 Boucle … Kossi    Djibas… Rural             1310        167    16.5 
#>  3 EA_00372 Boucle … Banwa    Balave  Rural              975        154    13.8 
#>  4 EA_12114 Boucle … Bale     Siby    Rural              471         70    20.9 
#>  5 EA_11738 Boucle … Banwa    Sanaba  Rural             1127        143    55.3 
#>  6 EA_02132 Boucle … Mouhoun  Bondok… Rural             1418        216    34.7 
#>  7 EA_06903 Boucle … Banwa    Kouka   Rural             1608        221     7.67
#>  8 EA_11566 Boucle … Mouhoun  Safane  Rural             1141        163    16.4 
#>  9 EA_03144 Boucle … Mouhoun  Dedoug… Rural             1821        323     9.35
#> 10 EA_11638 Boucle … Banwa    Sami    Rural             1095        165    40.6 
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>