Specify Selection Parameters

draw() specifies how units are selected: sample size, sampling fraction, selection method, and measure of size for PPS sampling. Every stage in a sampling design must end with draw().

Usage

draw(
  .data,
  n = NULL,
  frac = NULL,
  min_n = NULL,
  max_n = NULL,
  method = "srswor",
  mos = NULL,
  prn = NULL,
  aux = NULL,
  round = "up",
  control = NULL,
  certainty_size = NULL,
  certainty_prop = NULL,
  certainty_overflow = "error",
  on_empty = "error"
)

Arguments

.data

A sampling_design object (piped from sampling_design(), stratify_by(), or cluster_by()).

n

Sample size. For random-size methods (bernoulli, pps_poisson), n is the expected sample size (converted internally to frac = n / N). Can be:

A scalar: the size per stratum (no alloc) or the total size (alloc specified)
A named vector: stratum-specific sizes (single stratification variable only)
A data frame: stratum-specific sizes, with the stratification columns plus an n column

frac

Sampling fraction. Can be:

A scalar: same fraction for all strata
A named vector: stratum-specific fractions
A data frame: stratum-specific fractions, with the stratification columns plus a frac column

Only one of n or frac should be specified.

min_n

Minimum sample size per stratum. When an allocation method (e.g., Neyman, proportional) would assign fewer than min_n units to a stratum, that stratum receives min_n units instead, and the extra units are taken proportionally from the strata allocated more than min_n, so the total stays at n. Commonly set to 2 (the minimum for variance estimation) or higher for reliable subgroup estimates. Only applies when stratification with an allocation method is used. Default is NULL (no minimum).
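As an illustration, the min_n adjustment can be sketched in base R. This assumes the adjustment is applied to the continuous allocation before rounding, and that the units needed to raise small strata are taken from the other strata in proportion to their original allocations. apply_min_n() is a hypothetical helper, not part of the package, and the package's exact algorithm may differ.

```r
# Hypothetical helper illustrating a min_n floor on a stratum allocation.
# `alloc` is a continuous allocation (e.g. proportional or Neyman) summing to n.
apply_min_n <- function(alloc, min_n) {
  n <- sum(alloc)
  raised <- alloc < min_n            # strata that must be raised to the floor
  out <- alloc
  repeat {
    out[raised] <- min_n
    free <- !raised
    # Share what is left of n among the other strata,
    # in proportion to their original allocations
    out[free] <- (n - sum(out[raised])) * alloc[free] / sum(alloc[free])
    newly <- free & out < min_n      # scaling down may push more strata below min_n
    if (!any(newly)) break
    raised <- raised | newly
  }
  out
}

alloc <- c(80, 15, 4, 1)             # proportional allocation of n = 100
apply_min_n(alloc, min_n = 5)        # approx. 75.79 14.21 5.00 5.00 (total still 100)
```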

max_n

Maximum sample size per stratum. When an allocation method would assign more than max_n units to a stratum, that stratum is capped at max_n units. The surplus is redistributed proportionally among strata that were below max_n. Useful for capping dominant strata or managing operational constraints. Only applies when stratification with an allocation method is used. Default is NULL (no maximum).

method

Character string specifying the selection method. One of:

Equal probability methods:

"srswor" (default): Simple random sampling without replacement
"srswr": Simple random sampling with replacement
"systematic": Systematic (fixed interval) sampling
"bernoulli": Independent Bernoulli trials (random sample size)

PPS methods (require mos):

"pps_systematic": PPS systematic sampling
"pps_brewer": Generalized Brewer (Tillé) method
"pps_cps": Conditional Poisson sampling (maximum entropy)
"pps_poisson": PPS Poisson sampling (random sample size)
"pps_sps": Sequential Poisson sampling (fixed size, supports prn)
"pps_pareto": Pareto sampling (fixed size, supports prn)
"pps_multinomial": PPS multinomial (with replacement, any hit count)
"pps_chromy": Chromy's sequential PPS (minimum replacement)

Balanced sampling:

"balanced": Balanced sampling via the cube method (Deville & Tillé 2004). Uses the auxiliary variables (aux) to select a sample whose Horvitz-Thompson estimates of the auxiliary totals match the population totals. Supports equal or unequal (mos) inclusion probabilities. When stratified, uses the stratified cube algorithm (Chauvet 2009). At most 2 stages may use "balanced".

mos

Measure-of-size variable, specified as a bare (unquoted) column name. Required for all pps_* methods; optional for "balanced", where it gives unequal inclusion probabilities.

prn

Permanent random number variable for sample coordination, specified as a bare column name (unquoted). Must be a numeric column with values in the open interval (0, 1) and no missing values. Supported methods: "bernoulli", "pps_poisson", "pps_sps", "pps_pareto". When supplied, the sample is deterministic for a given set of PRN values, enabling coordination across survey waves.
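The coordination property can be illustrated with a base-R sketch of PRN-based selection. The rules below are the textbook forms (Bernoulli: keep unit i when its PRN is below frac; sequential Poisson: rank by PRN divided by the target inclusion probability; Pareto: Rosén's odds-ratio ranking) and are assumed, not confirmed, to match the package internals. The helper names are hypothetical.

```r
# Hypothetical helpers: PRN-based selection rules (textbook forms)
prn_bernoulli <- function(prn, frac) which(prn < frac)

prn_sps <- function(mos, n, prn) {
  lambda <- n * mos / sum(mos)            # target inclusion probabilities (assumed < 1)
  sort(order(prn / lambda)[seq_len(n)])   # the n smallest ranking values
}

prn_pareto <- function(mos, n, prn) {
  lambda <- n * mos / sum(mos)
  q <- (prn / (1 - prn)) / (lambda / (1 - lambda))  # Rosén's Pareto ranking
  sort(order(q)[seq_len(n)])
}

set.seed(2024)
prn <- runif(1000)             # permanent random numbers, stored with the frame
mos <- runif(1000, 1, 10)      # measure of size

# Same PRNs give the same sample, regardless of the RNG state at draw time
identical(prn_sps(mos, 50, prn), prn_sps(mos, 50, prn))        # TRUE

# Positive coordination: raising the Bernoulli rate only adds units
all(prn_bernoulli(prn, 0.05) %in% prn_bernoulli(prn, 0.10))    # TRUE
```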

aux

Auxiliary balancing variables for method = "balanced", specified as bare column names: aux = c(income, pop_density). Columns must be numeric with no missing values. The cube algorithm ensures the Horvitz-Thompson estimator of these auxiliary totals equals (or nearly equals) the population totals, improving precision. When used with cluster_by(), auxiliary values are automatically aggregated (summed) to the cluster level before selection.

round

Rounding method when converting frac to sample sizes. One of:

"up" (default): Round up (ceiling). Matches SAS SURVEYSELECT default.
"down": Round down (floor).
"nearest": Round to nearest integer (standard rounding).

This parameter only affects designs using frac to specify the sampling rate. When n is specified directly, no rounding occurs.
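The three rounding rules amount to ceiling(), floor(), and round() applied to frac * N. The sketch below uses a hypothetical helper, frac_to_n(). Note that base R's round() resolves exact .5 ties to the even integer; the package's "nearest" rule may treat ties differently.

```r
# Hypothetical helper: convert a sampling fraction into a sample size
frac_to_n <- function(N, frac, round = c("up", "down", "nearest")) {
  round <- match.arg(round)
  x <- frac * N
  switch(round,
         up      = ceiling(x),       # default
         down    = floor(x),
         nearest = base::round(x))
}

frac_to_n(1234, 0.05, "up")          # 62 (61.7 rounded up)
frac_to_n(1234, 0.05, "down")        # 61
frac_to_n(1234, 0.05, "nearest")     # 62
```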

control

<data-masking> Variables for sorting the frame before selection. Control sorting provides implicit stratification, which is particularly effective with systematic and sequential sampling methods. Can be:

A single variable: control = region
Multiple variables: control = c(region, district)
With serp() for serpentine sorting: control = serp(region, district)
With dplyr::desc() for descending: control = c(region, desc(population))
Mixed: control = c(region, serp(district, commune), desc(size))

When stratification is also specified, control sorting is applied within each stratum. See the section "Control Sorting" below for details.

certainty_size

For PPS without-replacement methods, units with MOS >= this value are selected with certainty (probability = 1). Can be:

A scalar: same threshold for all strata
A data frame: stratum-specific thresholds, with the stratification columns plus a certainty_size column

Certainty units are removed from the frame before probability sampling, and the remaining sample size is reduced accordingly. Mutually exclusive with certainty_prop. Equivalent to SAS SURVEYSELECT CERTSIZE= option.

certainty_prop

For PPS without-replacement methods, units whose MOS proportion (MOS_i / sum(MOS)) >= this value are selected with certainty. Can be:

A scalar between 0 and 1 (exclusive): same threshold for all strata
A data frame: stratum-specific thresholds, with the stratification columns plus a certainty_prop column

Uses iterative selection: after removing certainty units, proportions are recomputed and the check is repeated until no new units qualify. Mutually exclusive with certainty_size. Equivalent to SAS SURVEYSELECT CERTSIZE=P= option.
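The iterative rule can be sketched in base R for a single stratum. certainty_by_prop() is a hypothetical helper written for this illustration; it recomputes shares over the units not yet taken with certainty until no new unit qualifies.

```r
# Hypothetical helper: iterative certainty selection by MOS proportion
certainty_by_prop <- function(mos, prop) {
  certain <- rep(FALSE, length(mos))
  repeat {
    # Shares are recomputed over the units not yet taken with certainty
    share <- mos / sum(mos[!certain])
    new <- !certain & share >= prop
    if (!any(new)) break
    certain <- certain | new
  }
  certain
}

mos <- c(60, 20, 10, 10, 10, 10)
certainty_by_prop(mos, prop = 0.30)
# Pass 1: 60/120 = 0.50 -> unit 1 certain
# Pass 2: 20/60  = 0.33 -> unit 2 certain
# Pass 3: 10/40  = 0.25 -> stop; result TRUE TRUE FALSE FALSE FALSE FALSE
```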

certainty_overflow

Controls behavior when certainty units exceed the target sample size n. One of:

"error" (default): Stop with an informative error.
"allow": Return all certainty units with weight 1, even if the resulting sample has more than n units.

Setting "allow" is equivalent to permitting CERTSIZE= overflow in SAS SURVEYSELECT.
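A minimal sketch of certainty_size combined with certainty_overflow, for one stratum: units at or above the threshold are set aside, n is reduced by their count, and an overflow either stops or keeps only the certainty units. split_certainty() is hypothetical and is not the package's implementation.

```r
# Hypothetical helper: absolute certainty threshold plus overflow handling
split_certainty <- function(mos, n, certainty_size,
                            overflow = c("error", "allow")) {
  overflow <- match.arg(overflow)
  certain <- mos >= certainty_size
  n_cert <- sum(certain)
  if (n_cert > n && overflow == "error") {
    stop(n_cert, " units qualify for certainty selection but n = ", n)
  }
  # Units still to be drawn by PPS from the non-certainty part of the frame
  list(certain = certain, n_remaining = max(n - n_cert, 0L))
}

mos <- c(900, 850, 100, 200, 300, 150)
split_certainty(mos, n = 3, certainty_size = 800)$n_remaining   # 1
split_certainty(mos, n = 1, certainty_size = 800, overflow = "allow")
# both large units kept, n_remaining = 0; with overflow = "error" this stops
```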

on_empty

Behavior when a random-size method (bernoulli, pps_poisson) selects zero units in a stratum or in the whole frame. One of:

"error" (default): Stop with an informative error. Zero selections usually indicate a design problem (sampling fraction too small or stratum too small) that should be fixed rather than silently papered over.
"warn": Issue a warning and fall back to SRS of 1 unit.
"silent": Fall back to SRS of 1 unit without a message.

Weight note: when falling back ("warn" or "silent"), the fallback selects 1 unit via SRS, so the resulting weight is N (the stratum or frame size), not 1/frac. This reflects the actual selection mechanism, not the intended Bernoulli/Poisson design. Downstream variance estimation treats this unit as an SRS draw.
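The fallback and its weight can be sketched for a single Bernoulli stratum. bernoulli_draw() is a hypothetical helper that mimics the three on_empty modes as described above; it is not the package's code.

```r
# Hypothetical helper: Bernoulli draw with an on_empty fallback
bernoulli_draw <- function(N, frac, on_empty = c("error", "warn", "silent")) {
  on_empty <- match.arg(on_empty)
  hit <- which(runif(N) < frac)
  if (length(hit) > 0) {
    return(data.frame(unit = hit, weight = 1 / frac))
  }
  if (on_empty == "error") stop("Bernoulli draw selected zero units")
  if (on_empty == "warn") warning("Zero units selected; falling back to SRS of 1 unit")
  # Fallback: one unit by SRS, so its weight is N / 1 = N, not 1 / frac
  data.frame(unit = sample.int(N, 1), weight = N)
}

set.seed(1)
bernoulli_draw(N = 20, frac = 0.001, on_empty = "silent")
# usually no Bernoulli hits at this rate, so the single fallback unit has weight 20
```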

Value

A modified sampling_design object with selection parameters specified.

Details

Selection Methods

Equal Probability Methods

Method	Replacement	Sample Size	Notes
`srswor`	Without	Fixed	Standard SRS
`srswr`	With	Fixed	Allows duplicates
`systematic`	Without	Fixed	Periodic selection
`bernoulli`	Without	Random	Each unit selected independently

PPS Methods

Method	Replacement	Sample Size	Notes
`pps_systematic`	Without	Fixed	Simple; some joint probs are 0 (no unbiased variance estimator)
`pps_brewer`	Without	Fixed	Fast, joint prob > 0
`pps_cps`	Without	Fixed	Highest entropy, joint prob available
`pps_poisson`	Without	Random	PPS analog of Bernoulli
`pps_sps`	Without	Fixed	Sequential Poisson, supports `prn`
`pps_pareto`	Without	Fixed	Pareto sampling, supports `prn`
`pps_multinomial`	With	Fixed	Any hit count, Hansen-Hurwitz
`pps_chromy`	Min. repl.	Fixed	SAS default PPS_SEQ
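For reference, the classic PPS systematic algorithm lays the units end to end on the cumulated MOS scale and takes n equally spaced points from a random start. This base-R sketch (hypothetical helper pps_systematic_idx(), no certainty handling) gives unit i inclusion probability n * MOS_i / sum(MOS). A unit whose MOS exceeds the interval can be hit twice, which is what certainty selection prevents.

```r
# Hypothetical helper: classic PPS systematic selection
pps_systematic_idx <- function(mos, n) {
  cum    <- cumsum(mos)
  step   <- cum[length(cum)] / n           # sampling interval on the MOS scale
  start  <- runif(1, 0, step)              # random start in (0, step)
  points <- start + (0:(n - 1)) * step     # n equally spaced selection points
  # Unit i is hit when a point falls in [cum[i-1], cum[i])
  findInterval(points, cum) + 1
}

mos <- c(5, 3, 8, 2, 7, 4, 6, 1)
set.seed(42)
pps_systematic_idx(mos, n = 3)   # 3 distinct units: every MOS is below the interval of 12
```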

Balanced Sampling

Method	Replacement	Sample Size	Notes
`balanced`	Without	Fixed	Deville & Tillé 2004, uses `aux`

Parameter Requirements

Method	`n`	`frac`	`mos`	`aux`
`srswor`	Yes	or Yes	–	–
`srswr`	Yes	or Yes	–	–
`systematic`	Yes	or Yes	–	–
`bernoulli`	Expected	or Yes	–	–
`pps_systematic`	Yes	or Yes	Yes	–
`pps_brewer`	Yes	or Yes	Yes	–
`pps_cps`	Yes	–	Yes	–
`pps_poisson`	Expected	or Yes	Yes	–
`pps_sps`	Yes	or Yes	Yes	–
`pps_pareto`	Yes	or Yes	Yes	–
`pps_multinomial`	Yes	or Yes	Yes	–
`pps_chromy`	Yes	or Yes	Yes	–
`balanced`	Yes	or Yes	Optional	Optional

Fixed vs Random Sample Size Methods

Methods with fixed sample size (srswor, srswr, systematic, pps_systematic, pps_brewer, pps_cps, pps_sps, pps_pareto, pps_multinomial, pps_chromy) accept either n or frac. When frac is provided, the sample size is computed based on the round parameter (default: ceiling).

Methods with random sample size (bernoulli, pps_poisson) accept either n or frac. When n is provided, it is converted to frac = n / N (where N is the stratum or frame size). The resulting sample size is still random: n specifies the expected sample size, not a fixed count.
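The conversion and the resulting variability can be checked with a few lines of base R. The frame size N = 14,900 is an assumption inferred from the weight of 149 in the n = 100 SRS example below. The standard deviation of a Bernoulli sample size, sqrt(N f (1 - f)), is about 22 here, so the 504 units returned in the n = 500 Bernoulli example are well within ordinary variation.

```r
N    <- 14900                  # assumed frame size (inferred from the examples)
n    <- 500                    # requested expected sample size
frac <- n / N                  # about 0.0336
1 / frac                       # design weight: 29.8
sqrt(N * frac * (1 - frac))    # SD of the realised sample size: about 22

set.seed(42)
sum(runif(N) < frac)           # one realised sample size; varies around 500
```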

For pps_poisson, the raw inclusion probabilities are computed as \(\pi_i = f \cdot x_i / \bar{x}\) where \(f\) is frac and \(x_i\) is the MOS value. Any \(\pi_i > 1\) is clipped to 1, so the expected sample size \(E[n] = \sum \min(\pi_i, 1)\) can be less than \(f \cdot N\) when large units dominate the MOS distribution. Use certainty_size or certainty_prop to handle these dominant units explicitly.
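A small numeric illustration of the clipping effect, written in base R with made-up MOS values (one dominant unit). It follows the formula above and is a sketch, not the package's code.

```r
f <- 0.4                         # frac
x <- c(1, 1, 1, 1, 16)           # MOS; one dominant unit
N <- length(x)

pik_raw <- f * x / mean(x)       # 0.1 0.1 0.1 0.1 1.6
pik     <- pmin(pik_raw, 1)      # the dominant unit is clipped to 1
sum(pik_raw)                     # f * N = 2
sum(pik)                         # E[n] = 1.4, less than f * N

selected <- which(runif(N) < pik)  # independent trials, so the sample size is random
```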

When an allocation method is set in stratify_by() (equal, proportional, neyman, optimal, power), specify total sample size via n. Combining alloc with frac is not supported.

Custom Allocation with Data Frames

For stratum-specific sample sizes or rates, pass a data frame to n or frac. The data frame must contain:

All stratification variable columns (matching those in stratify_by())
An n column (for sizes) or frac column (for rates)

Certainty Selection

In PPS without-replacement sampling, very large units can have theoretical inclusion probabilities exceeding 1. Certainty selection handles this by selecting such units with probability 1 before sampling the remainder. The output includes a .certainty_k column (where k is the stage number) indicating which units were certainty selections.

Certainty selection is only available for WOR PPS methods (pps_systematic, pps_brewer, pps_cps, pps_poisson, pps_sps, pps_pareto). The with-replacement method (pps_multinomial) and the probability minimum replacement (PMR) method (pps_chromy) handle large units natively by allowing multiple hits.
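The hit mechanism of with-replacement PPS can be seen with base R's sample.int(): a unit holding a large share of the MOS simply collects several hits, and each hit carries the Hansen-Hurwitz weight 1 / (n p_i). This is an illustrative sketch with made-up MOS values, not the package's implementation.

```r
set.seed(7)
mos <- c(50, 10, 10, 10, 10, 5, 5)   # one unit holds half of the total MOS
n   <- 4
p   <- mos / sum(mos)                # single-draw selection probabilities

draws <- sample.int(length(mos), n, replace = TRUE, prob = p)
hits  <- tabulate(draws, nbins = length(mos))  # a unit can be hit more than once
hits
n * p            # expected hits; the large unit expects 2, no certainty step needed

# Hansen-Hurwitz weight per hit: 1 / (n * p_i)
hh_weight <- 1 / (n * p)
```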

When certainty_overflow = "allow", if more units qualify for certainty selection than the requested n, all certainty units are returned with probability 1 (weight = 1). No probabilistic sampling is performed in this case. The resulting sample size will be the number of certainty units, which exceeds n.

For stratum-specific thresholds, pass a data frame containing:

All stratification variable columns
A certainty_size or certainty_prop column

Control Sorting

Control sorting orders the sampling frame before selection, providing implicit stratification. This is particularly effective with systematic and sequential methods (systematic, pps_systematic, pps_chromy), where it ensures the sample spreads evenly across the sorted variables.
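Why the sort spreads the sample can be shown with a base-R sketch on a hypothetical frame: after sorting, systematic selection with interval k = N / n puts floor or ceiling of (block size / k) points in every contiguous block, so each region's sample count is within one unit of its proportional share. The frame and systematic_idx() are made up for illustration.

```r
# Hypothetical frame with region and province codes
set.seed(1)
frame <- data.frame(
  id       = 1:200,
  region   = sample(c("A", "B", "C"), 200, replace = TRUE),
  province = sample(c("p1", "p2", "p3", "p4"), 200, replace = TRUE)
)

# 1. Control sort (nested): region, then province
frame <- frame[order(frame$region, frame$province), ]

# 2. Systematic selection with interval k = N / n and a random start
systematic_idx <- function(N, n) {
  k <- N / n
  ceiling(runif(1, 0, k) + (0:(n - 1)) * k)
}
sample_rows <- frame[systematic_idx(nrow(frame), 20), ]

# Each region's sample share tracks its frame share
table(sample_rows$region) / 20
table(frame$region) / nrow(frame)
```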

Serpentine vs Nested Sorting:

Nested (default): Standard ascending sort by each variable in order. Use control = c(var1, var2, var3).
Serpentine: Alternating direction that minimizes "jumps" between adjacent units. Use control = serp(var1, var2, var3).

Serpentine sorting makes nearby observations more similar by reversing direction at each hierarchy level. For geographic hierarchies, this means the last district of region 1 is adjacent to the last district of region 2.
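A two-level serpentine ordering can be sketched in base R by flipping the inner sort key in every second outer group. serp_order() is hypothetical; it assumes this is how serp() treats two levels and does not show how the package handles three or more.

```r
# Hypothetical two-level serpentine ordering
serp_order <- function(outer, inner) {
  outer_rank <- match(outer, sort(unique(outer)))
  inner_rank <- match(inner, sort(unique(inner)))
  # Reverse the inner sort in every second outer group
  inner_key <- ifelse(outer_rank %% 2 == 1, inner_rank, -inner_rank)
  order(outer_rank, inner_key)
}

region   <- c("R1", "R1", "R1", "R2", "R2", "R2")
district <- c("d1", "d2", "d3", "d1", "d2", "d3")
o <- serp_order(region, district)
paste(region[o], district[o])
# "R1 d1" "R1 d2" "R1 d3" "R2 d3" "R2 d2" "R2 d1"
```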

Combining with Explicit Stratification: When both stratify_by() and control are used, sorting is applied within each stratum. This allows explicit stratification for variance control combined with implicit stratification for sample spread.

References

srswor, srswr, systematic, bernoulli, pps_systematic, pps_multinomial: Cochran, W.G. (1977). Sampling Techniques, 3rd ed. Wiley.

pps_brewer: Brewer, K.R.W. (1975). A simple procedure for sampling PPS WOR. Australian Journal of Statistics, 17(3), 166-172.

pps_cps: Hájek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35(4), 1491-1523.

Chen, X.-H., Dempster, A.P. and Liu, J.S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.

pps_poisson: Tillé, Y. (2006). Sampling Algorithms. Springer.

pps_sps: Ohlsson, E. (1998). Sequential Poisson sampling. Journal of Official Statistics, 14(2), 149-162.

pps_pareto: Rosén, B. (1997). Asymptotic theory for order sampling. Journal of Statistical Planning and Inference, 62(2), 135-158.

pps_chromy: Chromy, J.R. (1979). Sequential sample selection methods. Proceedings of the Survey Research Methods Section, ASA, 401-406.

balanced: Deville, J.-C. and Tillé, Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91(4), 893-912.

Chauvet, G. (2009). Stratified balanced sampling. Survey Methodology, 35(1), 115-119.

Examples

# Simple random sample of 100 EAs
sampling_design() |>
  draw(n = 100) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights:      149 [149, 149]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_10155 Boucle … Mouhoun  Ouarko… Rural             1347        187    33.0 
#>  2 EA_01016 Centre-… Zoundwe… Bere    Rural             1166        193    23.2 
#>  3 EA_04918 Centre-… Kourite… Goungu… Rural              949        141     6.89
#>  4 EA_01890 Hauts-B… Houet    Bobo-D… Urban             1195        191     0.25
#>  5 EA_14703 Plateau… Oubrite… Ziniare Rural             1340        177    26.1 
#>  6 EA_12688 Est      Tapoa    Tambaga Rural              994        139     6.46
#>  7 EA_11778 Plateau… Ganzour… Saolgo  Rural              998        124     1.41
#>  8 EA_05700 Hauts-B… Houet    Karang… Rural             1049        119    23.6 
#>  9 EA_06057 Est      Gnagna   Koala   Rural             1291        173    27.6 
#> 10 EA_04482 Centre-… Boulgou  Garango Rural             1553        262     1.44
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Systematic sample of 10%
sampling_design() |>
  draw(frac = 0.10, method = "systematic") |>
  execute(bfa_eas, seed = 123)
#> # A tbl_sample: 1490 × 17
#> # Weights:      10 [10, 10]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00247 Boucle … Bale     Bagassi Rural             1666        200    33.4 
#>  2 EA_00257 Boucle … Bale     Bagassi Rural              931        112    18.1 
#>  3 EA_00267 Boucle … Bale     Bagassi Rural             1319        159    17.1 
#>  4 EA_00277 Boucle … Bale     Bagassi Rural             1740        209    15.8 
#>  5 EA_00464 Boucle … Bale     Bana    Rural              915        120     7.95
#>  6 EA_02161 Boucle … Bale     Boromo  Rural              805        105     9.43
#>  7 EA_02171 Boucle … Bale     Boromo  Rural             1644        214    15.3 
#>  8 EA_02181 Boucle … Bale     Boromo  Rural              996        130     7.42
#>  9 EA_04192 Boucle … Bale     Fara    Rural             1491        238    22.2 
#> 10 EA_04202 Boucle … Bale     Fara    Rural             1187        190    17.6 
#> # ℹ 1,480 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# PPS sample of EAs using household count
sampling_design() |>
  cluster_by(ea_id) |>
  draw(n = 50, method = "pps_brewer", mos = households) |>
  execute(bfa_eas, seed = 42)
#> # A tbl_sample: 50 × 18
#> # Weights:      289.64 [88.85, 616.64]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_02180 Boucle … Bale     Boromo  Rural             4324        564     3.13
#>  2 EA_10322 Boucle … Bale     Ouri    Rural             1605        192    52.5 
#>  3 EA_06254 Boucle … Kossi    Kombori Rural             1549        194    41.2 
#>  4 EA_13829 Boucle … Sourou   Tougan  Rural             1309        181    55.4 
#>  5 EA_06311 Centre   Kadiogo  Komki-… Rural             1243        188    15.7 
#>  6 EA_08822 Centre   Kadiogo  Ouagad… Urban             1804        342     0.31
#>  7 EA_08853 Centre   Kadiogo  Ouagad… Urban             1452        275     0.26
#>  8 EA_09411 Centre   Kadiogo  Ouagad… Urban             2084        395     0.27
#>  9 EA_09799 Centre   Kadiogo  Ouagad… Urban             1889        358     1.23
#> 10 EA_11351 Centre   Kadiogo  Saaba   Urban             2486        445     6.55
#> # ℹ 40 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>

# Bernoulli sampling with frac (random sample size, expected ~5%)
sampling_design() |>
  draw(frac = 0.05, method = "bernoulli") |>
  execute(ken_enterprises, seed = 12345)
#> # A tbl_sample: 309 × 14
#> # Weights:      20 [20, 20]
#>    enterprise_id county region sector      size_class employees revenue_millions
#>  * <chr>         <fct>  <fct>  <fct>       <fct>          <int>            <dbl>
#>  1 KEN_00010     Kiambu Kiambu Food & Bev… Medium            30             93.4
#>  2 KEN_00020     Kiambu Kiambu Food & Bev… Large            668            305. 
#>  3 KEN_00023     Kiambu Kiambu Other Manu… Small             16            133. 
#>  4 KEN_00051     Kiambu Kiambu Other Serv… Small              8              4.9
#>  5 KEN_00063     Kiambu Kiambu Other Serv… Small             10             10.9
#>  6 KEN_00065     Kiambu Kiambu Other Serv… Small             18             10.6
#>  7 KEN_00077     Kiambu Kiambu Other Serv… Small              8              2.7
#>  8 KEN_00103     Kiambu Kiambu Other Serv… Medium            40             28.5
#>  9 KEN_00117     Kiambu Kiambu Other Serv… Medium            60             47.9
#> 10 KEN_00146     Kiambu Kiambu Retail      Small             11             26  
#> # ℹ 299 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Bernoulli sampling with expected n (converted to frac = 500/N)
sampling_design() |>
  draw(n = 500, method = "bernoulli") |>
  execute(bfa_eas, seed = 42)
#> # A tbl_sample: 504 × 17
#> # Weights:      29.8 [29.8, 29.8]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00261 Boucle … Bale     Bagassi Rural             1128        136    24.0 
#>  2 EA_00267 Boucle … Bale     Bagassi Rural             1319        159    17.1 
#>  3 EA_00465 Boucle … Bale     Bana    Rural             1381        181     3.79
#>  4 EA_02157 Boucle … Bale     Boromo  Rural              129         17     8.12
#>  5 EA_02170 Boucle … Bale     Boromo  Rural              537         70    45.7 
#>  6 EA_12090 Boucle … Bale     Siby    Rural             1555        213    45.7 
#>  7 EA_00377 Boucle … Banwa    Balave  Rural             1310        206    27.1 
#>  8 EA_06871 Boucle … Banwa    Kouka   Rural             1169        135     0.99
#>  9 EA_11688 Boucle … Banwa    Sanaba  Rural             1844        268    14.0 
#> 10 EA_11696 Boucle … Banwa    Sanaba  Rural             2060        299     3.13
#> # ℹ 494 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified with different sizes per stratum (data frame)
region_sizes <- data.frame(
  region = levels(bfa_eas$region),
  n = c(20, 12, 25, 18, 22, 16, 14, 15, 20, 18, 12, 10, 8)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = region_sizes) |>
  execute(bfa_eas, seed = 123)
#> # A tbl_sample: 210 × 17
#> # Weights:      70.95 [43.43, 106]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_12416 Boucle … Banwa    Solenzo Rural             1395        166     8.03
#>  2 EA_12873 Boucle … Banwa    Tansila Rural             1164        148    39.1 
#>  3 EA_11055 Boucle … Bale     Pompoi  Rural             1198        189    33.3 
#>  4 EA_00767 Boucle … Kossi    Barani  Rural             1433        188    17.5 
#>  5 EA_12078 Boucle … Bale     Siby    Rural              912        125     8.55
#>  6 EA_03217 Boucle … Mouhoun  Dedoug… Rural              998        177     3.4 
#>  7 EA_04525 Boucle … Nayala   Gassan  Rural              649         79     0.49
#>  8 EA_05964 Boucle … Sourou   Kiemba… Rural             1075        171    12.7 
#>  9 EA_14277 Boucle … Nayala   Ye      Rural             1323        233    10.9 
#> 10 EA_14292 Boucle … Nayala   Ye      Rural             1030        182    18.1 
#> # ℹ 200 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Stratified with different rates per stratum (named vector)
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
  execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 758 × 14
#> # Weights:      9 [2, 49.85]
#>    enterprise_id county      region sector size_class employees revenue_millions
#>  * <chr>         <fct>       <fct>  <fct>  <fct>          <int>            <dbl>
#>  1 KEN_05526     Homa Bay    Rest … Retail Small              8             18.8
#>  2 KEN_05090     Nyamira     Rest … Other… Small              6              4.7
#>  3 KEN_02534     Nairobi     Nairo… Retail Small             14             38.4
#>  4 KEN_02455     Nairobi     Nairo… Retail Small             11              7.5
#>  5 KEN_02609     Nairobi     Nairo… Retail Small              5              8.4
#>  6 KEN_06669     Uasin Gishu Uasin… Other… Small             11             16.1
#>  7 KEN_01498     Nairobi     Nairo… Other… Small              8             13.4
#>  8 KEN_04590     Kisii       Rest … Other… Small             10             11.2
#>  9 KEN_02509     Nairobi     Nairo… Retail Small              8             19.3
#> 10 KEN_02684     Nairobi     Nairo… Retail Small             16             23.7
#> # ℹ 748 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> #   .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Neyman allocation with minimum 2 per stratum (for variance estimation)
sampling_design() |>
  stratify_by(region, alloc = "neyman", variance = bfa_eas_variance) |>
  draw(n = 150, min_n = 2) |>
  execute(bfa_eas, seed = 2026)
#> # A tbl_sample: 150 × 17
#> # Weights:      99.33 [72.27, 155.6]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_08652 Boucle … Kossi    Nouna   Rural             1393        181    17.6 
#>  2 EA_10131 Boucle … Mouhoun  Ouarko… Rural             1123        156    23.4 
#>  3 EA_06881 Boucle … Banwa    Kouka   Rural             1229        142     9.3 
#>  4 EA_04515 Boucle … Nayala   Gassan  Rural             1556        190    42.6 
#>  5 EA_10374 Boucle … Bale     Pa      Rural             1150        162    82.3 
#>  6 EA_13719 Boucle … Nayala   Toma    Rural              955         99     0.98
#>  7 EA_12390 Boucle … Banwa    Solenzo Rural             2953        352    25.6 
#>  8 EA_06874 Boucle … Banwa    Kouka   Rural             1723        198    17.7 
#>  9 EA_02110 Boucle … Mouhoun  Bondok… Rural             1413        216    16.8 
#> 10 EA_11690 Boucle … Banwa    Sanaba  Rural              929        135     1.9 
#> # ℹ 140 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Proportional allocation with min and max bounds
sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 200, min_n = 10, max_n = 50) |>
  execute(bfa_eas, seed = 1)
#> # A tbl_sample: 200 × 17
#> # Weights:      74.5 [60.8, 78.05]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_10155 Boucle … Mouhoun  Ouarko… Rural             1347        187    33.0 
#>  2 EA_03955 Boucle … Kossi    Doumba… Rural             1558        211    16.6 
#>  3 EA_10325 Boucle … Bale     Ouri    Rural              767         92    17.6 
#>  4 EA_03209 Boucle … Mouhoun  Dedoug… Rural              446         79    18.1 
#>  5 EA_12881 Boucle … Banwa    Tansila Rural             1076        137     9.45
#>  6 EA_11613 Boucle … Banwa    Sami    Rural              912        118    43.2 
#>  7 EA_06857 Boucle … Banwa    Kouka   Rural             1642        189    12.5 
#>  8 EA_13730 Boucle … Nayala   Toma    Rural              973        101     0.88
#>  9 EA_05972 Boucle … Sourou   Kiemba… Rural             1359        216    36.6 
#> 10 EA_03571 Boucle … Kossi    Djibas… Rural              725         86     1.01
#> # ℹ 190 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Control sorting with serpentine ordering (implicit stratification)
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = serp(region, province)) |>
  execute(bfa_eas, seed = 2)
#> # A tbl_sample: 100 × 17
#> # Weights:      149 [149, 149]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00272 Boucle … Bale     Bagassi Rural              950        114     1.01
#>  2 EA_11053 Boucle … Bale     Pompoi  Rural             1686        266    37.8 
#>  3 EA_11702 Boucle … Banwa    Sanaba  Rural             1714        249    30.7 
#>  4 EA_12885 Boucle … Banwa    Tansila Rural             1050        134     9.63
#>  5 EA_03598 Boucle … Kossi    Djibas… Rural             1039        123     9.6 
#>  6 EA_08692 Boucle … Kossi    Nouna   Rural             1056        137     9.48
#>  7 EA_03201 Boucle … Mouhoun  Dedoug… Rural             1223        217    17.8 
#>  8 EA_12908 Boucle … Mouhoun  Tcheri… Rural             1170        140     9.7 
#>  9 EA_13998 Boucle … Nayala   Yaba    Rural              982        132     1.68
#> 10 EA_07196 Boucle … Sourou   Lanfie… Rural             1310        191    23.3 
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Control sorting with nested (standard) ordering
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = c(region, province)) |>
  execute(bfa_eas, seed = 3)
#> # A tbl_sample: 100 × 17
#> # Weights:      149 [149, 149]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00270 Boucle … Bale     Bagassi Rural             1080        130    37.0 
#>  2 EA_11051 Boucle … Bale     Pompoi  Rural             1021        161    35.6 
#>  3 EA_11700 Boucle … Banwa    Sanaba  Rural             1304        189    61.5 
#>  4 EA_12883 Boucle … Banwa    Tansila Rural             1305        166    32.8 
#>  5 EA_03596 Boucle … Kossi    Djibas… Rural             1273        151     1.05
#>  6 EA_08690 Boucle … Kossi    Nouna   Rural             1238        161    36.6 
#>  7 EA_03199 Boucle … Mouhoun  Dedoug… Rural              859        152     1.28
#>  8 EA_12906 Boucle … Mouhoun  Tcheri… Rural              234         28     8.31
#>  9 EA_13996 Boucle … Nayala   Yaba    Rural             1051        141    17.8 
#> 10 EA_07194 Boucle … Sourou   Lanfie… Rural             1942        283     0.58
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# Combined explicit stratification with control sorting within strata
sampling_design() |>
  stratify_by(urban_rural) |>
  draw(n = 50, method = "systematic",
       control = serp(region, province)) |>
  execute(bfa_eas, seed = 25)
#> # A tbl_sample: 100 × 17
#> # Weights:      149 [52.98, 245.02]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_04211 Boucle … Bale     Fara    Rural             1678        268     9.3 
#>  2 EA_12357 Boucle … Banwa    Solenzo Rural              908        108    17.5 
#>  3 EA_03575 Boucle … Kossi    Djibas… Rural              773         92     0.39
#>  4 EA_03126 Boucle … Mouhoun  Dedoug… Rural             1146        203    34.6 
#>  5 EA_12929 Boucle … Mouhoun  Tcheri… Rural             1198        144    10.2 
#>  6 EA_05978 Boucle … Sourou   Kiemba… Rural             1322        210    71.5 
#>  7 EA_08528 Cascades Leraba   Nianko… Rural             1001        137    18.7 
#>  8 EA_07747 Cascades Comoe    Mangod… Rural             1201        144    31.3 
#>  9 EA_12532 Cascades Comoe    Soubak… Rural             1078        152    39.6 
#> 10 EA_00907 Centre-… Kourite… Baskou… Rural             1506        245    26.3 
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>

# PPS with certainty selection (absolute threshold)
# Large EAs selected with certainty, rest sampled with PPS
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = households,
       certainty_size = 800) |>
  execute(bfa_eas, seed = 3)
#> # A tbl_sample: 1300 × 18
#> # Weights:      11.36 [1, 100.73]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00377 Boucle … Banwa    Balave  Rural             1310        206    27.1 
#>  2 EA_14000 Boucle … Nayala   Yaba    Rural             2711        365    45.8 
#>  3 EA_03566 Boucle … Kossi    Djibas… Rural             1891        224     8.26
#>  4 EA_12903 Boucle … Banwa    Tansila Rural             2314        295     9.21
#>  5 EA_03187 Boucle … Mouhoun  Dedoug… Rural             1685        299    73.7 
#>  6 EA_03189 Boucle … Mouhoun  Dedoug… Rural             1847        328    17.6 
#>  7 EA_11054 Boucle … Bale     Pompoi  Rural             1427        225     1.24
#>  8 EA_12435 Boucle … Banwa    Solenzo Rural             1567        187    26.1 
#>  9 EA_03155 Boucle … Mouhoun  Dedoug… Rural             1262        224    18.4 
#> 10 EA_03218 Boucle … Mouhoun  Dedoug… Rural             1160        206     1.34
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
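Within each stratum, the absolute-threshold example above can be read as a two-part rule: units whose measure of size reaches `certainty_size` are taken with probability 1, and the remaining sample slots are filled by PPS. The base-R sketch below illustrates that general technique under stated assumptions. It is not draw()'s internal code. The helper `certainty_pps_probs()` and the toy `mos` vector are hypothetical, and "at or above the threshold" is an assumed reading of `certainty_size`. Certainty units end up with weight 1, which is consistent with the minimum weight of 1 reported in the output header.

```r
# Hypothetical helper: a base-R sketch of absolute-threshold certainty
# selection within one stratum. Assumes mos >= certainty_size means
# "take with certainty"; this is not draw()'s actual implementation.
certainty_pps_probs <- function(mos, n, certainty_size) {
  certain <- mos >= certainty_size           # units taken with probability 1
  n_rem <- n - sum(certain)                  # slots left for the PPS draw
  if (n_rem < 0) stop("more certainty units than the stratum sample size")
  pik <- numeric(length(mos))
  pik[certain] <- 1
  # PPS inclusion probabilities for the non-certainty units; a full
  # implementation would re-check for any new pik >= 1 and iterate
  pik[!certain] <- n_rem * mos[!certain] / sum(mos[!certain])
  list(certain = certain, pik = pik, weight = 1 / pik)
}

# Toy stratum: 7 units, sample size 4, certainty threshold 800
mos <- c(900, 850, 400, 300, 200, 100, 50)
res <- certainty_pps_probs(mos, n = 4, certainty_size = 800)
res$certain       # TRUE TRUE FALSE FALSE FALSE FALSE FALSE
sum(res$pik)      # 4: two certainty units plus two expected PPS draws
```

Here the two certainty units use up two of the four slots, so the remaining five units share an expected two draws in proportion to their size.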

# PPS with certainty selection (proportional threshold)
# EAs holding >= 10% of their stratum's total households are selected
# with certainty
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_systematic", mos = households,
       certainty_prop = 0.10) |>
  execute(bfa_eas, seed = 321)
#> # A tbl_sample: 1300 × 18
#> # Weights:      11.51 [1.64, 69.41]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_00260 Boucle … Bale     Bagassi Rural             1815        218     3.88
#>  2 EA_00277 Boucle … Bale     Bagassi Rural             1740        209    15.8 
#>  3 EA_00469 Boucle … Bale     Bana    Rural             1608        210     1.51
#>  4 EA_02171 Boucle … Bale     Boromo  Rural             1644        214    15.3 
#>  5 EA_02182 Boucle … Bale     Boromo  Rural             1597        208     1.89
#>  6 EA_04197 Boucle … Bale     Fara    Rural             1040        166     7.17
#>  7 EA_04208 Boucle … Bale     Fara    Rural             2214        354    21.3 
#>  8 EA_04222 Boucle … Bale     Fara    Rural             2148        343    24.8 
#>  9 EA_10324 Boucle … Bale     Ouri    Rural             2006        240    10.8 
#> 10 EA_10340 Boucle … Bale     Ouri    Rural             2128        254    11.1 
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
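A proportional threshold compares each unit's share of its stratum total against `certainty_prop`. The base-R sketch below shows a one-pass version of that check. The helper `flag_certainty_prop()` and the toy vectors are hypothetical, and whether draw() re-checks shares after removing certainty units is not stated here. The sketch also shows why a 10% cut-off can flag nothing in a stratum with many similarly sized units. In the run above, the smallest weight is 1.64 rather than 1, which suggests that no EA reached the threshold.

```r
# Hypothetical helper: flag units whose share of the stratum's total
# measure of size is at least `prop` (one pass, no re-checking).
flag_certainty_prop <- function(mos, prop) {
  mos / sum(mos) >= prop
}

# One dominant unit and a few mid-sized ones: shares are
# 0.50, 0.12, 0.11, 0.09, 0.08, 0.06, 0.04
small_stratum <- c(500, 120, 110, 90, 80, 60, 40)
flag_certainty_prop(small_stratum, prop = 0.10)
# TRUE TRUE TRUE FALSE FALSE FALSE FALSE

# Twenty equal units: each holds 5% of the total, so none is flagged
even_stratum <- rep(200, 20)
any(flag_certainty_prop(even_stratum, prop = 0.10))
# FALSE
```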

# Stratum-specific certainty thresholds (data frame, one row per region)
cert_thresholds <- data.frame(
  region = levels(bfa_eas$region),
  certainty_size = c(700, 450, 800, 850, 750, 800, 550,
                     450, 700, 950, 750, 600, 480)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = households,
       certainty_size = cert_thresholds) |>
  execute(bfa_eas, seed = 424)
#> # A tbl_sample: 1300 × 18
#> # Weights:      11.43 [1, 57.66]
#>    ea_id    region   province commune urban_rural population households area_km2
#>  * <chr>    <fct>    <fct>    <fct>   <fct>            <dbl>      <int>    <dbl>
#>  1 EA_14289 Boucle … Nayala   Ye      Rural             4131        729    36.6 
#>  2 EA_13788 Boucle … Sourou   Tougan  Rural             2148        297     2.73
#>  3 EA_03594 Boucle … Kossi    Djibas… Rural              737         87     0.49
#>  4 EA_00369 Boucle … Banwa    Balave  Rural             2049        323    36.5 
#>  5 EA_12081 Boucle … Bale     Siby    Rural             2045        281     9.5 
#>  6 EA_12339 Boucle … Banwa    Solenzo Rural             1077        129    20.5 
#>  7 EA_02136 Boucle … Mouhoun  Bondok… Rural              750        114    33.7 
#>  8 EA_06881 Boucle … Banwa    Kouka   Rural             1229        142     9.3 
#>  9 EA_11533 Boucle … Mouhoun  Safane  Rural              684         97     0.89
#> 10 EA_03146 Boucle … Mouhoun  Dedoug… Rural             1266        225    15.5 
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> #   food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> #   .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
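With a data frame, each stratum is matched to its own threshold through the stratification column (`region` here). The base-R sketch below mimics that lookup on a hypothetical two-region toy frame using `merge()`. It assumes that a unit is a certainty unit when its measure of size is at least its stratum's `certainty_size`. It is not a reproduction of draw()'s internals.

```r
# Hypothetical toy frame with two strata and a per-stratum threshold
# table keyed on the stratification column, mirroring cert_thresholds.
frame <- data.frame(
  region     = c("A", "A", "A", "B", "B", "B"),
  households = c(900, 300, 200, 500, 460, 100)
)
thresholds <- data.frame(
  region         = c("A", "B"),
  certainty_size = c(800, 450)
)

# Attach each unit's stratum threshold, then flag certainty units
# (assumes "at or above the threshold" means certainty)
frame <- merge(frame, thresholds, by = "region")
frame$certain <- frame$households >= frame$certainty_size

# 900 (A), 500 and 460 (B) clear their own thresholds; 460 would
# not have cleared region A's threshold of 800
frame[frame$certain, c("region", "households")]
```

Note that `merge()` performs an inner join by default, so in this sketch any region missing from the threshold table would silently drop its units. Supplying one row per stratum, as `cert_thresholds` does with all 13 region levels, avoids that.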