draw() specifies how units are selected: sample size, sampling fraction, selection method, and measure of size for PPS sampling. Every stage in a sampling design must end with draw().

Usage

draw(
  .data,
  n = NULL,
  frac = NULL,
  min_n = NULL,
  max_n = NULL,
  method = "srswor",
  mos = NULL,
  prn = NULL,
  aux = NULL,
  round = "up",
  control = NULL,
  certainty_size = NULL,
  certainty_prop = NULL,
  certainty_overflow = "error",
  on_empty = "error"
)

Arguments

.data

A sampling_design object (piped from sampling_design(), stratify_by(), or cluster_by()).

n

Sample size. For random-size methods (bernoulli, pps_poisson), n is the expected sample size (converted internally to frac = n / N). Can be:

  • A scalar: applies per stratum (if no alloc) or as total (if alloc specified)

  • A named vector: stratum-specific sizes (for single stratification variable)

  • A data frame: stratum-specific sizes with stratification columns + n column

frac

Sampling fraction. Can be:

  • A scalar: same fraction for all strata

  • A named vector: stratum-specific fractions

  • A data frame: stratum-specific fractions with stratification columns + frac column

Only one of n or frac should be specified.

min_n

Minimum sample size per stratum. When an allocation method (e.g., Neyman, proportional) would assign fewer than min_n units to a stratum, that stratum receives min_n units instead. The excess is redistributed proportionally among strata that were above min_n. Commonly set to 2 (minimum for variance estimation) or higher for reliable subgroup estimates. Only applies when stratification with an allocation method is used. Default is NULL (no minimum).

max_n

Maximum sample size per stratum. When an allocation method would assign more than max_n units to a stratum, that stratum is capped at max_n units. The surplus is redistributed proportionally among strata that were below max_n. Useful for capping dominant strata or managing operational constraints. Only applies when stratification with an allocation method is used. Default is NULL (no maximum).
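
The interaction of min_n and max_n with an allocation can be pictured with the base R sketch below. It is only an illustration under stated assumptions: bounded_alloc() is a hypothetical helper, the stratum sizes are made up, allocation is proportional to stratum size, and the result is left real-valued (no integer rounding). The package's own redistribution rule may differ in detail.

```r
# Hypothetical helper: proportional allocation with lower and upper bounds.
# Strata that violate a bound are clamped to it, and the remaining sample
# is re-allocated proportionally among the strata that are still free.
bounded_alloc <- function(N_h, n, min_n = 0, max_n = Inf) {
  fixed <- rep(NA_real_, length(N_h))          # NA marks a free stratum
  repeat {
    free <- is.na(fixed)
    n_free <- n - sum(fixed, na.rm = TRUE)     # sample left for free strata
    alloc <- fixed
    alloc[free] <- n_free * N_h[free] / sum(N_h[free])
    low <- free & alloc < min_n
    high <- free & alloc > max_n
    if (!any(low | high)) return(alloc)
    fixed[low] <- min_n
    fixed[high] <- max_n
  }
}

N_h <- c(5000, 3000, 1500, 500)                # made-up stratum sizes
unbounded <- bounded_alloc(N_h, n = 100)       # 50, 30, 15, 5
bounded <- bounded_alloc(N_h, n = 100, min_n = 10, max_n = 45)
bounded                                        # 45, 30, 15, 10
```

In this made-up case the largest stratum is capped at 45 and the smallest is raised to 10, while the total stays at 100.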

method

Character string specifying the selection method. One of:

Equal probability methods:

  • "srswor" (default): Simple random sampling without replacement

  • "srswr": Simple random sampling with replacement

  • "systematic": Systematic (fixed interval) sampling

  • "bernoulli": Independent Bernoulli trials (random sample size)

PPS methods (require mos):

  • "pps_systematic": PPS systematic sampling

  • "pps_brewer": Generalized Brewer (Tillé) method

  • "pps_cps": Conditional Poisson sampling (maximum entropy)

  • "pps_poisson": PPS Poisson sampling (random sample size)

  • "pps_sps": Sequential Poisson sampling (fixed size, supports prn)

  • "pps_pareto": Pareto sampling (fixed size, supports prn)

  • "pps_multinomial": PPS multinomial (with replacement, any hit count)

  • "pps_chromy": Chromy's sequential PPS (minimum replacement)

Balanced sampling:

  • "balanced": Balanced sampling via the cube method (Deville & Tillé 2004). Uses auxiliary variables (aux) to balance the sample so that Horvitz-Thompson estimates of auxiliary totals match population totals. Supports equal or unequal (mos) inclusion probabilities. When stratified, uses the stratified cube algorithm (Chauvet 2009). At most 2 stages may use "balanced".

Custom PPS methods registered with sondage::register_method() are also accepted, using the "pps_<name>" convention (e.g., "pps_mymethod").

mos

Measure of size variable for PPS methods and optional for "balanced", specified as a bare column name (unquoted). Required for all pps_* methods.

prn

Permanent random number variable for sample coordination, specified as a bare column name (unquoted). Must be a numeric column with values in the open interval (0, 1) and no missing values. Supported methods: "bernoulli", "pps_poisson", "pps_sps", "pps_pareto". When supplied, the sample is deterministic for a given set of PRN values, enabling coordination across survey waves.
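
To see why permanent random numbers give coordinated samples, the base R sketch below applies the textbook PRN rule for Bernoulli sampling: a unit is selected when its PRN falls below the sampling fraction. The frame, the column names, the select_prn() helper, and the selection rule itself are assumptions made for illustration; the package's internal rule may differ.

```r
# Hypothetical frame: 1,000 units, each with a permanent random number
# strictly inside (0, 1).
set.seed(1)
frame <- data.frame(id = 1:1000, prn = runif(1000))

# Textbook PRN rule for Bernoulli sampling (an assumption, not necessarily
# the package's internal rule): keep a unit when its PRN is below frac.
select_prn <- function(frame, frac) frame$id[frame$prn < frac]

wave1 <- select_prn(frame, frac = 0.05)
wave2 <- select_prn(frame, frac = 0.08)

# The same PRNs and the same frac always give the same sample.
identical(wave1, select_prn(frame, frac = 0.05))

# Raising frac only adds units, so wave 2 contains all of wave 1.
all(wave1 %in% wave2)
```

Because selection depends only on the stored PRN and the fraction, no seed is involved once the PRNs are fixed.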

aux

Auxiliary balancing variables for method = "balanced", specified as bare column names: aux = c(income, pop_density). Columns must be numeric with no missing values. The cube algorithm ensures the Horvitz-Thompson estimator of these auxiliary totals equals (or nearly equals) the population totals, improving precision. When used with cluster_by(), auxiliary values are automatically aggregated (summed) to the cluster level before selection.

round

Rounding method when converting frac to sample sizes. One of:

  • "up" (default): Round up (ceiling). Matches SAS SURVEYSELECT default.

  • "down": Round down (floor).

  • "nearest": Round to nearest integer (standard rounding).

This parameter only affects designs using frac to specify the sampling rate. When n is specified directly, no rounding occurs. After rounding, a minimum of 1 is enforced per stratum or group.
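
The three rounding rules, followed by the minimum of 1, can be sketched in base R as follows. The frac_to_n() helper and the stratum sizes are hypothetical, and "nearest" is written here as round-half-up, which is an assumption about what "standard rounding" means.

```r
# Hypothetical stratum sizes and a common sampling fraction.
N <- c(A = 96, B = 204, C = 7)
frac <- 0.10

# Convert frac to a sample size under each rounding rule, then enforce
# the minimum of 1 per stratum.
frac_to_n <- function(N, frac, round = c("up", "down", "nearest")) {
  round <- match.arg(round)
  raw <- N * frac
  n <- switch(round,
    up      = ceiling(raw),
    down    = floor(raw),
    nearest = floor(raw + 0.5)  # half-up (assumed); base round() is half-even
  )
  pmax(n, 1)
}

frac_to_n(N, frac, "up")       # A = 10, B = 21, C = 1
frac_to_n(N, frac, "down")     # A = 9,  B = 20, C = 1 (floor gives 0, raised to 1)
frac_to_n(N, frac, "nearest")  # A = 10, B = 20, C = 1
```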

control

<data-masking> Variables for sorting the frame before selection. Control sorting provides implicit stratification, which is particularly effective with systematic and sequential sampling methods. Can be:

  • A single variable: control = region

  • Multiple variables: control = c(region, district)

  • With serp() for serpentine sorting: control = serp(region, district)

  • With dplyr::desc() for descending: control = c(region, desc(population))

  • Mixed: control = c(region, serp(district, commune), desc(size))

When stratification is also specified, control sorting is applied within each stratum. See the section "Control Sorting" below for details.

certainty_size

For PPS without-replacement methods, units with MOS >= this value are selected with certainty (probability = 1). Can be:

  • A scalar: same threshold for all strata

  • A data frame: stratum-specific thresholds with stratification columns + certainty_size column

Certainty units are removed from the frame before probability sampling, and the remaining sample size is reduced accordingly. Mutually exclusive with certainty_prop. Equivalent to SAS SURVEYSELECT CERTSIZE= option.

certainty_prop

For PPS without-replacement methods, units whose MOS proportion (MOS_i / sum(MOS)) >= this value are selected with certainty. Can be:

  • A scalar between 0 and 1 (exclusive): same threshold for all strata

  • A data frame: stratum-specific thresholds with stratification columns + certainty_prop column

Uses iterative selection: after removing certainty units, proportions are recomputed and the check is repeated until no new units qualify. Mutually exclusive with certainty_size. Equivalent to SAS SURVEYSELECT CERTSIZE=P= option.
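
The iterative check can be sketched in base R as below. The certainty_by_prop() helper and the MOS values are hypothetical; the sketch follows the description above (recompute proportions over the remaining units until no new unit qualifies) and is not the package's actual code.

```r
# Flag units whose share of the remaining MOS total reaches the threshold;
# repeat until a pass adds no new certainty units.
certainty_by_prop <- function(mos, prop) {
  certain <- rep(FALSE, length(mos))
  repeat {
    remaining_total <- sum(mos[!certain])
    new <- !certain & (mos / remaining_total >= prop)
    if (!any(new)) break
    certain <- certain | new
  }
  certain
}

mos <- c(500, 120, 60, 50, 40, 30)   # hypothetical measures of size
certainty_by_prop(mos, prop = 0.35)
# Pass 1: 500 / 800 = 0.625 qualifies; 120 / 800 = 0.15 does not.
# Pass 2: 120 / 300 = 0.40 now qualifies.
# Pass 3: 60 / 180 = 0.33 does not, so the loop stops.
```

The second unit only becomes a certainty unit after the first is removed, which is why a single pass would not be enough.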

certainty_overflow

Controls behavior when certainty units exceed the target sample size n. One of:

  • "error" (default): Stop with an informative error.

  • "allow": Return all certainty units with stage weight 1, even if the resulting sample has more than n units.

Equivalent to SAS SURVEYSELECT allowing CERTSIZE= overflow.

on_empty

Behaviour when a random-size method (bernoulli, pps_poisson) selects zero units in a stratum or the whole frame. One of:

  • "error" (default): Stop with an informative error. Zero selections usually indicate a design problem (sampling fraction too small or stratum too small) that should be fixed rather than silently papered over.

  • "warn": Issue a warning and fall back to SRS of 1 unit.

  • "silent": Fall back to SRS of 1 unit without a message.

Weight note: when falling back ("warn" or "silent"), the fallback selects 1 unit via SRS, so the resulting weight is N (the stratum or frame size), not 1/frac. This reflects the actual selection mechanism, not the intended Bernoulli/Poisson design. Downstream variance estimation treats this unit as an SRS draw.

Value

A modified sampling_design object with selection parameters specified.

Details

Selection Methods

Equal Probability Methods

| Method | Replacement | Sample Size | Notes |
|---|---|---|---|
| srswor | Without | Fixed | Standard SRS |
| srswr | With | Fixed | Allows duplicates |
| systematic | Without | Fixed | Periodic selection |
| bernoulli | Without | Random | Each unit selected independently |

PPS Methods

| Method | Replacement | Sample Size | Notes |
|---|---|---|---|
| pps_systematic | Without | Fixed | Simple, some bias |
| pps_brewer | Without | Fixed | Fast, joint prob > 0 |
| pps_cps | Without | Fixed | Highest entropy, joint prob available |
| pps_poisson | Without | Random | PPS analog of Bernoulli |
| pps_sps | Without | Fixed | Sequential Poisson, supports prn |
| pps_pareto | Without | Fixed | Pareto sampling, supports prn |
| pps_multinomial | With | Fixed | Any hit count, Hansen-Hurwitz |
| pps_chromy | Min. repl. | Fixed | SAS default PPS_SEQ |

Balanced Sampling

| Method | Replacement | Sample Size | Notes |
|---|---|---|---|
| balanced | Without | Fixed | Deville & Tillé 2004, uses aux |

Parameter Requirements

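| Method | n | frac | mos | aux |
|---|---|---|---|---|
| srswor | Yes | or Yes | | |
| srswr | Yes | or Yes | | |
| systematic | Yes | or Yes | | |
| bernoulli | Expected | or Yes | | |
| pps_systematic | Yes | or Yes | Yes | |
| pps_brewer | Yes | or Yes | Yes | |
| pps_cps | Yes | | Yes | |
| pps_poisson | Expected | or Yes | Yes | |
| pps_sps | Yes | or Yes | Yes | |
| pps_pareto | Yes | or Yes | Yes | |
| pps_multinomial | Yes | or Yes | Yes | |
| pps_chromy | Yes | or Yes | Yes | |
| balanced | Yes | or Yes | Optional | Optional |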

Fixed vs Random Sample Size Methods

Methods with fixed sample size (srswor, srswr, systematic, pps_systematic, pps_brewer, pps_cps, pps_sps, pps_pareto, pps_multinomial, pps_chromy) accept either n or frac. When frac is provided, the sample size is computed based on the round parameter (default: ceiling).

Methods with random sample size (bernoulli, pps_poisson) accept either n or frac. When n is provided, it is converted to frac = n / N (where N is the stratum or frame size). The resulting sample size is still random: n specifies the expected sample size, not a fixed count.
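
A small base R sketch of the conversion, assuming a frame of 44,570 units (the size implied by the weights shown in the Examples). It relies on the standard fact that the realized size of a Bernoulli sample follows a Binomial(N, frac) distribution; nothing here calls the package.

```r
N <- 44570          # frame size implied by the Examples (assumed here)
n <- 500            # requested expected sample size

frac <- n / N       # per-unit selection probability
weight <- 1 / frac  # design weight under Bernoulli sampling, 89.14

# Realized sample sizes over 1,000 hypothetical repetitions: each unit is
# kept independently with probability frac, so the size is Binomial(N, frac).
set.seed(42)
sizes <- rbinom(1000, size = N, prob = frac)

mean(sizes)    # close to 500
range(sizes)   # individual draws vary around 500
```

The weight stays at 1 / frac whatever the realized size, which matches the constant 89.14 weight in the Bernoulli example with n = 500.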

For pps_poisson, the raw inclusion probabilities are computed as \(\pi_i = f \cdot x_i / \bar{x}\) where \(f\) is frac and \(x_i\) is the MOS value. Any \(\pi_i > 1\) is clipped to 1, so the expected sample size \(E[n] = \sum \min(\pi_i, 1)\) can be less than \(f \cdot N\) when large units dominate the MOS distribution. Use certainty_size or certainty_prop to handle these dominant units explicitly.
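
The effect of clipping can be checked numerically with a few lines of base R. The MOS values and the fraction are made up for illustration.

```r
# Hypothetical MOS values with one dominant unit.
x <- c(1000, 40, 30, 20, 10)
f <- 0.4                        # sampling fraction (frac)

pi_raw <- f * x / mean(x)       # raw inclusion probabilities
pi_clipped <- pmin(pi_raw, 1)   # probabilities above 1 are clipped to 1

sum(pi_raw)       # f * N = 2
sum(pi_clipped)   # expected sample size after clipping, about 1.18
```

The first unit has a raw probability of about 1.82, so clipping it to 1 removes roughly 0.82 from the expected sample size.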

When an allocation method is set in stratify_by() (equal, proportional, neyman, optimal, power), specify total sample size via n. Combining alloc with frac is not supported.

Custom Allocation with Data Frames

For stratum-specific sample sizes or rates, pass a data frame to n or frac. The data frame must contain:

  • All stratification variable columns (matching those in stratify_by())

  • An n column (for sizes) or frac column (for rates)

Certainty Selection

In PPS without-replacement sampling, very large units can have theoretical inclusion probabilities exceeding 1. Certainty selection handles this by selecting such units with probability 1 before sampling the remainder. The output includes a .certainty_k column (where k is the stage number) indicating which units were certainty selections.
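
The bookkeeping behind an absolute threshold can be sketched in base R as follows. The MOS values, the threshold, and n are hypothetical, and the remainder is given plain proportional-to-size inclusion probabilities, which is a simplification of what any particular PPS method does.

```r
mos <- c(900, 850, 300, 200, 150, 100)   # hypothetical measures of size
n <- 4                                   # target sample size
threshold <- 800                         # plays the role of certainty_size

# Units at or above the threshold are taken with probability 1.
certain <- mos >= threshold

# The remaining sample size is reduced by the number of certainty units.
n_remaining <- n - sum(certain)

# Proportional-to-size inclusion probabilities for the non-certainty units.
pi_rest <- n_remaining * mos[!certain] / sum(mos[!certain])

certain       # TRUE TRUE FALSE FALSE FALSE FALSE
n_remaining   # 2
pi_rest       # all below 1, summing to n_remaining
```

Without the certainty step, the first unit would have a raw probability of 4 * 900 / 2500 = 1.44, which is not a valid inclusion probability.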

Certainty selection is only available for without-replacement (WOR) PPS methods (pps_systematic, pps_brewer, pps_cps, pps_poisson, pps_sps, pps_pareto). With-replacement methods (pps_multinomial) and probability-minimum-replacement (PMR) methods (pps_chromy) handle large units natively through their hit mechanism.

When certainty_overflow = "allow", if more units qualify for certainty selection than the requested n, all certainty units are returned with probability 1 (stage weight = 1). No probabilistic sampling is performed in this case. The resulting sample size will be the number of certainty units, which exceeds n. In multi-stage designs, the final .weight can still exceed 1 because it compounds all stage weights.

For stratum-specific thresholds, pass a data frame containing:

  • All stratification variable columns

  • A certainty_size or certainty_prop column

Control Sorting

Control sorting orders the sampling frame before selection, providing implicit stratification. This is particularly effective with systematic and sequential methods (systematic, pps_systematic, pps_chromy), where it ensures the sample spreads evenly across the sorted variables.

Serpentine vs Nested Sorting:

  • Nested (default): Standard ascending sort by each variable in order. Use control = c(var1, var2, var3).

  • Serpentine: Alternating direction that minimizes "jumps" between adjacent units. Use control = serp(var1, var2, var3).

Serpentine sorting makes nearby observations more similar by reversing direction at each hierarchy level. For geographic hierarchies, this means the last district of region 1 is adjacent to the last district of region 2.
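
A two-level serpentine order can be reproduced in base R to make the reversal visible. The serp_order() helper and the toy frame are hypothetical, the lower-level variable is assumed to be numeric, and serp() itself may handle more levels and other column types.

```r
# Hypothetical frame: 3 regions, each with districts 1 to 3.
frame <- expand.grid(district = 1:3, region = 1:3)[, c("region", "district")]

# Regions ascending; districts ascending within odd-ranked regions and
# descending within even-ranked ones.
serp_order <- function(region, district) {
  region_rank <- match(region, sort(unique(region)))
  flip <- ifelse(region_rank %% 2 == 0, -1, 1)
  order(region_rank, flip * district)
}

sorted <- frame[serp_order(frame$region, frame$district), ]
sorted$region     # 1 1 1 2 2 2 3 3 3
sorted$district   # 1 2 3 3 2 1 1 2 3
```

District 3 of region 1 is followed by district 3 of region 2, whereas a nested sort would jump back to district 1.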

Combining with Explicit Stratification: When both stratify_by() and control are used, sorting is applied within each stratum. This allows explicit stratification for variance control combined with implicit stratification for sample spread.

References

srswor, srswr, systematic, bernoulli, pps_systematic, pps_multinomial: Cochran, W.G. (1977). Sampling Techniques, 3rd ed. Wiley.

pps_brewer: Brewer, K.R.W. (1975). A simple procedure for sampling PPS WOR. Australian Journal of Statistics, 17(3), 166-172.

pps_cps: Hájek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35(4), 1491-1523.

Chen, X.-H., Dempster, A.P. and Liu, J.S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.

pps_poisson: Tillé, Y. (2006). Sampling Algorithms. Springer.

pps_sps: Ohlsson, E. (1998). Sequential Poisson sampling. Journal of Official Statistics, 14(2), 149-162.

pps_pareto: Rosén, B. (1997). Asymptotic theory for order sampling. Journal of Statistical Planning and Inference, 62(2), 135-158.

pps_chromy: Chromy, J.R. (1979). Sequential sample selection methods. Proceedings of the Survey Research Methods Section, ASA, 401-406.

balanced: Deville, J.-C. and Tillé, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91(4), 893-912.

Chauvet, G. (2009). Stratified balanced sampling. Survey Methodology, 35(1), 115-119.

See also

sampling_design() for creating designs, stratify_by() for stratification, cluster_by() for clustering, execute() for running designs, and serp() for serpentine sorting.

Examples

# Simple random sample of 100 EAs
sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights:      445.7 [445.7, 445.7]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 43475 Est         Gnagna   Piela   Rural              971        114     8.95
#>  2  6592 Sud-Ouest   Noumbiel Kpuere  Rural               88         11     7.89
#>  3 11611 Boucle du … Nayala   Yaba    Rural              111         15     8.97
#>  4 45236 Centre-Est  Boulgou  Beguedo Urban              939        167     0.32
#>  5  3549 Est         Gourma   Fada-N… Rural              263         32     3.4 
#>  6 39095 Hauts-Bass… Kenedou… Sindo   Rural               37          4     6.09
#>  7 21818 Centre-Est  Kourite… Goungu… Rural               21          4     1.07
#>  8 15528 Centre      Kadiogo  Ouagad… Urban             1207        182     0.14
#>  9 14284 Est         Gourma   Matiak… Rural              178         21     8.86
#> 10  3433 Est         Gourma   Fada-N… Rural              285         34     8.55
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Systematic sample of 10%
sampling_design() |>
  draw(frac = 0.10, method = "systematic") |>
  execute(bfa_eas, seed = 123)
#> # A tbl_sample: 4457 × 17
#> # Weights:      10 [10, 10]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 11761 Boucle du … Bale     Bagassi Rural               63          8     8.54
#>  2 11771 Boucle du … Bale     Bagassi Rural              234         28     5.91
#>  3 11781 Boucle du … Bale     Bagassi Rural               86         10     3.68
#>  4 11791 Boucle du … Bale     Bagassi Rural              970        117     1.2 
#>  5 11801 Boucle du … Bale     Bagassi Rural              102         12     9.29
#>  6 11811 Boucle du … Bale     Bagassi Rural              135         16     8.25
#>  7 11821 Boucle du … Bale     Bagassi Rural              208         25     8.13
#>  8 11831 Boucle du … Bale     Bagassi Rural               92         11     8.97
#>  9 11843 Boucle du … Bale     Bagassi Rural              132         16    10.7 
#> 10 11853 Boucle du … Bale     Bagassi Rural               63          8     6.11
#> # ℹ 4,447 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# PPS sample of EAs using household count
sampling_design() |>
  cluster_by(ea_id) |>
  draw(n = 50, method = "pps_brewer", mos = households) |>
  execute(bfa_eas, seed = 42)
#> # A tbl_sample: 50 × 18
#> # Weights:      1143.26 [117.16, 8569.22]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1  1204 Boucle du … Bale     Boromo  Rural              456         59     0.15
#>  2  8357 Boucle du … Bale     Ouri    Rural               54          7    12.2 
#>  3 41835 Boucle du … Kossi    Dokui   Rural              194         24    11.8 
#>  4 11438 Boucle du … Sourou   Toeni   Rural              157         16     0.2 
#>  5 12900 Cascades    Leraba   Douna   Rural              990        129     0.59
#>  6 13925 Centre      Kadiogo  Koubri  Rural             1372        270     2.1 
#>  7 15179 Centre      Kadiogo  Ouagad… Urban              981        148     0.15
#>  8 16818 Centre      Kadiogo  Ouagad… Urban             1019        153     0.11
#>  9 18091 Centre      Kadiogo  Ouagad… Urban             1152        173     0.15
#> 10   683 Centre-Est  Boulgou  Beguedo Urban             1112        198     1.55
#> # ℹ 40 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>

# Bernoulli sampling with frac (random sample size, expected ~5%)
sampling_design() |>
  draw(frac = 0.05, method = "bernoulli") |>
  execute(ken_enterprises, seed = 12345)
#> # A tbl_sample: 309 × 14
#> # Weights:      20 [20, 20]
#>    enterprise_id county region sector      size_class employees revenue_millions
#>  * <chr>         <fct>  <fct>  <fct>       <fct>          <int>            <dbl>
#>  1 KEN_00010     Kiambu Kiambu Food & Bev… Medium            30             93.4
#>  2 KEN_00020     Kiambu Kiambu Food & Bev… Large            668            305. 
#>  3 KEN_00023     Kiambu Kiambu Other Manu… Small             16            133. 
#>  4 KEN_00051     Kiambu Kiambu Other Serv… Small              8              4.9
#>  5 KEN_00063     Kiambu Kiambu Other Serv… Small             10             10.9
#>  6 KEN_00065     Kiambu Kiambu Other Serv… Small             18             10.6
#>  7 KEN_00077     Kiambu Kiambu Other Serv… Small              8              2.7
#>  8 KEN_00103     Kiambu Kiambu Other Serv… Medium            40             28.5
#>  9 KEN_00117     Kiambu Kiambu Other Serv… Medium            60             47.9
#> 10 KEN_00146     Kiambu Kiambu Retail      Small             11             26  
#> # ℹ 299 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Bernoulli sampling with expected n (converted to frac = 500/N)
sampling_design() |>
  draw(n = 500, method = "bernoulli") |>
  execute(bfa_eas, seed = 42)
#> # A tbl_sample: 526 × 17
#> # Weights:      89.14 [89.14, 89.14]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 11781 Boucle du … Bale     Bagassi Rural               86         10     3.68
#>  2 36665 Boucle du … Bale     Fara    Rural              518         76     8.89
#>  3 36750 Boucle du … Bale     Fara    Rural              107         16     4.14
#>  4  9032 Boucle du … Bale     Poura   Urban              432         69     4.38
#>  5  9047 Boucle du … Bale     Poura   Urban              181         29     4.7 
#>  6  9053 Boucle du … Bale     Poura   Urban              436         70     3.19
#>  7 39999 Boucle du … Banwa    Balave  Rural             1820        287     2.04
#>  8 34012 Boucle du … Banwa    Sami    Rural              135         20     5.63
#>  9  9680 Boucle du … Banwa    Sanaba  Rural               59          8     8.98
#> 10 23812 Boucle du … Banwa    Solenzo Rural              679         91     8.85
#> # ℹ 516 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified with different sizes per stratum (data frame)
region_sizes <- data.frame(
  region = levels(bfa_eas$region),
  n = c(20, 12, 25, 18, 22, 16, 14, 15, 20, 18, 12, 10, 8)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = region_sizes) |>
  execute(bfa_eas, seed = 123)
#> # A tbl_sample: 210 × 17
#> # Weights:      212.24 [115.14, 414.4]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 33181 Boucle du … Kossi    Nouna   Rural              133         20     0.07
#>  2 33229 Boucle du … Kossi    Nouna   Rural               48          7     4.53
#>  3 21506 Boucle du … Kossi    Doumba… Rural              485         59     4.82
#>  4 45264 Boucle du … Bale     Pa      Rural              825        119     0.66
#>  5 26201 Boucle du … Sourou   Di      Rural               95         12    21.1 
#>  6 26077 Boucle du … Mouhoun  Dedoug… Rural              176         31     8.96
#>  7 25510 Boucle du … Kossi    Bombor… Rural              246         32     0.44
#>  8 23774 Boucle du … Banwa    Solenzo Rural              158         21     2.9 
#>  9  8332 Boucle du … Mouhoun  Ouarko… Rural               55          8     5.68
#> 10 43713 Boucle du … Mouhoun  Safane  Rural               97         14     6.46
#> # ℹ 200 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified with different rates per stratum (named vector)
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 758 × 14
#> # Weights:      9 [2, 49.85]
#>    enterprise_id county      region sector size_class employees revenue_millions
#>  * <chr>         <fct>       <fct>  <fct>  <fct>          <int>            <dbl>
#>  1 KEN_05526     Homa Bay    Rest … Retail Small              8             18.8
#>  2 KEN_05090     Nyamira     Rest … Other… Small              6              4.7
#>  3 KEN_02534     Nairobi     Nairo… Retail Small             14             38.4
#>  4 KEN_02455     Nairobi     Nairo… Retail Small             11              7.5
#>  5 KEN_02609     Nairobi     Nairo… Retail Small              5              8.4
#>  6 KEN_06669     Uasin Gishu Uasin… Other… Small             11             16.1
#>  7 KEN_01498     Nairobi     Nairo… Other… Small              8             13.4
#>  8 KEN_04590     Kisii       Rest … Other… Small             10             11.2
#>  9 KEN_02509     Nairobi     Nairo… Retail Small              8             19.3
#> 10 KEN_02684     Nairobi     Nairo… Retail Small             16             23.7
#> # ℹ 748 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Neyman allocation with minimum 2 per stratum (for variance estimation)
sampling_design() |>
  stratify_by(region, alloc = "neyman", variance = bfa_eas_variance) |>
  draw(n = 150, min_n = 2) |>
  execute(bfa_eas, seed = 2026)
#> # A tbl_sample: 150 × 17
#> # Weights:      297.13 [112.69, 2419.5]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 34795 Boucle du … Sourou   Tougan  Rural               45          6    14.0 
#>  2 44496 Boucle du … Mouhoun  Tcheri… Rural              180         29    16.6 
#>  3  9624 Boucle du … Banwa    Sanaba  Rural              177         23     6.56
#>  4  7012 Boucle du … Kossi    Madouba Rural              387         48     6.18
#>  5 44420 Boucle du … Mouhoun  Tcheri… Rural               63         10    22.3 
#>  6   515 Boucle du … Kossi    Barani  Rural              651         85     1.11
#>  7  1144 Boucle du … Bale     Boromo  Rural               75         10     8.92
#>  8  8378 Boucle du … Bale     Ouri    Rural              693         89     9.26
#>  9  4909 Boucle du … Sourou   Kassoum Rural              662         98     0.65
#> 10 21094 Boucle du … Kossi    Bouras… Rural               25          3     5.1 
#> # ℹ 140 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Proportional allocation with min and max bounds
sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 200, min_n = 10, max_n = 50) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 200 × 17
#> # Weights:      222.85 [161.2, 238.52]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1  9648 Boucle du … Banwa    Sanaba  Rural              279         35     8.37
#>  2 11547 Boucle du … Sourou   Toeni   Rural               49          5    21.4 
#>  3 41824 Boucle du … Kossi    Dokui   Rural               75          9    15.8 
#>  4 11012 Boucle du … Banwa    Tansila Rural              592         71     0.95
#>  5 32308 Boucle du … Sourou   Lanfie… Rural               57          8     9.49
#>  6  7017 Boucle du … Kossi    Madouba Rural              599         74     0.74
#>  7 36700 Boucle du … Bale     Fara    Rural              402         59     6.36
#>  8 11611 Boucle du … Nayala   Yaba    Rural              111         15     8.97
#>  9  8342 Boucle du … Mouhoun  Ouarko… Rural               94         13     8.1 
#> 10 11626 Boucle du … Nayala   Yaba    Rural               56          7     8.31
#> # ℹ 190 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Control sorting with serpentine ordering (implicit stratification)
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = serp(region, province)) |>
  execute(bfa_eas, seed = 2)
#> # A tbl_sample: 100 × 17
#> # Weights:      445.7 [445.7, 445.7]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 11843 Boucle du … Bale     Bagassi Rural              132         16    10.7 
#>  2  8876 Boucle du … Bale     Pompoi  Rural               55          7     8.62
#>  3 34070 Boucle du … Banwa    Sami    Rural               24          4     8.12
#>  4 24054 Boucle du … Banwa    Solenzo Rural              123         17     8.82
#>  5 21070 Boucle du … Kossi    Bouras… Rural              115         13     9.87
#>  6 37405 Boucle du … Kossi    Kombori Rural              308         39     9.76
#>  7 21009 Boucle du … Mouhoun  Bondok… Rural              556         85     9.16
#>  8  5672 Boucle du … Mouhoun  Kona    Rural              222         31     8.98
#>  9 44440 Boucle du … Mouhoun  Tcheri… Rural               55          9     5.53
#> 10 11656 Boucle du … Nayala   Yaba    Rural             1292        170     1.35
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Control sorting with nested (standard) ordering
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = c(region, province)) |>
  execute(bfa_eas, seed = 3)
#> # A tbl_sample: 100 × 17
#> # Weights:      445.7 [445.7, 445.7]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 11833 Boucle du … Bale     Bagassi Rural               35          4     7.15
#>  2  8620 Boucle du … Bale     Pa      Rural               52          7     1.46
#>  3 34062 Boucle du … Banwa    Sami    Rural              286         43     7.68
#>  4 24046 Boucle du … Banwa    Solenzo Rural               24          3     6.37
#>  5 25526 Boucle du … Kossi    Bombor… Rural              152         20     0.06
#>  6 37398 Boucle du … Kossi    Kombori Rural              201         26     5.89
#>  7 21002 Boucle du … Mouhoun  Bondok… Rural              791        121     9.13
#>  8  5663 Boucle du … Mouhoun  Kona    Rural              180         25     7.94
#>  9 44432 Boucle du … Mouhoun  Tcheri… Rural              502         80    23.6 
#> 10 11649 Boucle du … Nayala   Yaba    Rural              526         69     7.66
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Combined explicit stratification with control sorting within strata
sampling_design() |>
  stratify_by(urban_rural) |>
  draw(n = 50, method = "systematic",
       control = serp(region, province)) |>
  execute(bfa_eas, seed = 25)
#> # A tbl_sample: 100 × 17
#> # Weights:      445.7 [137.66, 753.74]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 36744 Boucle du … Bale     Fara    Rural              565         83     9.07
#>  2  9728 Boucle du … Banwa    Sanaba  Rural               69          9     6.85
#>  3 25519 Boucle du … Kossi    Bombor… Rural              240         32     4.62
#>  4 10541 Boucle du … Kossi    Sono    Rural              738        111     0.6 
#>  5  8318 Boucle du … Mouhoun  Ouarko… Rural              160         22     7.76
#>  6 11674 Boucle du … Nayala   Yaba    Rural             1353        178     1.3 
#>  7 34832 Boucle du … Sourou   Tougan  Rural               33          5     8.81
#>  8 20641 Cascades    Comoe    Banfora Rural              996        122     0.24
#>  9 15093 Cascades    Comoe    Niango… Rural              126         17     8.87
#> 10 10357 Cascades    Comoe    Sidera… Rural               31          5     8.64
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# PPS with certainty selection (absolute threshold)
# Large EAs selected with certainty, rest sampled with PPS
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = households,
       certainty_size = 800) |>
  execute(bfa_eas, seed = 3)
#> # A tbl_sample: 1300 × 18
#> # Weights:      35.84 [1, 874.53]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 39999 Boucle du … Banwa    Balave  Rural             1820        287     2.04
#>  2 44814 Boucle du … Nayala   Toma    Rural              399         58     0.29
#>  3 12752 Boucle du … Kossi    Djibas… Rural              336         43     8.61
#>  4 11005 Boucle du … Banwa    Tansila Rural              664         80    10.2 
#>  5 26060 Boucle du … Mouhoun  Dedoug… Rural              988        175     2.47
#>  6 26079 Boucle du … Mouhoun  Dedoug… Rural             1114        198     1.9 
#>  7  9031 Boucle du … Bale     Poura   Urban              941        151     1.06
#>  8 23980 Boucle du … Banwa    Solenzo Rural              870        117     8.12
#>  9 25983 Boucle du … Mouhoun  Dedoug… Rural              877        156     1.18
#> 10 45289 Boucle du … Mouhoun  Dedoug… Rural              910        161     0.83
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
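
# A minimal base-R sketch of an absolute certainty threshold, on a
# hypothetical toy frame (not bfa_eas). Assumptions: a unit is taken with
# certainty when its measure of size is at least the threshold, and the
# remaining sample size is spread over the other units in proportion to
# size. draw() may implement the rule differently.
toy_abs <- data.frame(id = 1:6, mos = c(900, 120, 80, 850, 60, 140))
n_toy <- 4
threshold_toy <- 800
is_certain <- toy_abs$mos >= threshold_toy
n_rest <- n_toy - sum(is_certain)
# Inclusion probabilities: 1 for certainty units, proportional to size otherwise
pik <- rep(1, nrow(toy_abs))
pik[!is_certain] <- n_rest * toy_abs$mos[!is_certain] / sum(toy_abs$mos[!is_certain])
toy_abs$certain <- is_certain
toy_abs$weight <- 1 / pik
toy_abs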

# PPS with certainty selection (proportional threshold)
# EAs holding >= 10% of their stratum's total households are selected with certainty
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_systematic", mos = households,
       certainty_prop = 0.10) |>
  execute(bfa_eas, seed = 321)
#> # A tbl_sample: 1300 × 18
#> # Weights:      33.17 [1.99, 427.22]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 11817 Boucle du … Bale     Bagassi Rural              433         52     0.42
#>  2 29500 Boucle du … Bale     Bana    Rural              378         49     8.91
#>  3  1136 Boucle du … Bale     Boromo  Rural             3613        471     4.23
#>  4  1180 Boucle du … Bale     Boromo  Rural             2111        275     2.75
#>  5 36664 Boucle du … Bale     Fara    Rural              152         22     8.44
#>  6 36708 Boucle du … Bale     Fara    Rural              581         85     8.9 
#>  7 36740 Boucle du … Bale     Fara    Rural              997        146     0.77
#>  8 36778 Boucle du … Bale     Fara    Rural              489         72     0.69
#>  9  8397 Boucle du … Bale     Ouri    Rural             1244        159     1.31
#> 10  8450 Boucle du … Bale     Ouri    Rural             1020        130     1.13
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
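
# A minimal base-R sketch of the proportional rule, on a hypothetical toy
# frame (not bfa_eas). Assumption: a unit is flagged when its share of the
# stratum's total measure of size is at least the proportion given. draw()
# may apply the rule differently (for example iteratively), so this is an
# illustration only.
toy_prop <- data.frame(
  stratum = c("A", "A", "A", "B", "B", "B", "B"),
  mos = c(50, 30, 20, 400, 25, 50, 25)
)
prop_toy <- 0.4
# Share of the stratum total carried by each unit
toy_prop$share <- toy_prop$mos / ave(toy_prop$mos, toy_prop$stratum, FUN = sum)
toy_prop$certain <- toy_prop$share >= prop_toy
toy_prop[toy_prop$certain, ]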

# Stratum-specific certainty thresholds (one per region, supplied as a data frame)
cert_thresholds <- data.frame(
  region = levels(bfa_eas$region),
  certainty_size = c(700, 450, 800, 850, 750, 800, 550,
                     450, 700, 950, 750, 600, 480)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = households,
       certainty_size = cert_thresholds) |>
  execute(bfa_eas, seed = 424)
#> # A tbl_sample: 1300 × 18
#> # Weights:      35.77 [1, 640.13]
#>    ea_id region      province commune urban_rural population households area_km2
#>  * <int> <fct>       <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 34783 Boucle du … Sourou   Tougan  Rural              989        136     1.38
#>  2 12833 Boucle du … Kossi    Djibas… Rural              219         28     8.46
#>  3 39958 Boucle du … Banwa    Balave  Rural              684        108     0.88
#>  4  9915 Boucle du … Bale     Siby    Rural               75         11     8.54
#>  5  9738 Boucle du … Banwa    Sanaba  Rural              504         64     8.53
#>  6 21043 Boucle du … Mouhoun  Bondok… Rural              942        144     1.41
#>  7  6371 Boucle du … Banwa    Kouka   Rural             2329        321     1.42
#>  8 43729 Boucle du … Mouhoun  Safane  Rural             1047        150     1.06
#>  9 25968 Boucle du … Mouhoun  Dedoug… Rural              766        136     0.3 
#> 10 34034 Boucle du … Banwa    Sami    Rural              175         26     8.81
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
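
# A minimal base-R sketch of how per-stratum thresholds can be matched to
# units, on hypothetical toy data (not bfa_eas). Assumption: each unit is
# compared against the threshold of its own stratum using >=. The matching
# done inside draw() may differ.
toy_units <- data.frame(
  region = c("X", "X", "Y", "Y"),
  households = c(500, 300, 500, 300)
)
toy_thresholds <- data.frame(
  region = c("X", "Y"),
  certainty_size = c(400, 600)
)
# Attach each region's threshold to its units, then compare row by row
toy_joined <- merge(toy_units, toy_thresholds, by = "region")
toy_joined$certain <- toy_joined$households >= toy_joined$certainty_size
toy_joined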