draw() specifies how units are selected: sample size, sampling fraction,
selection method, and measure of size for PPS sampling. Every stage in a
sampling design must end with draw().
Usage
draw(
.data,
n = NULL,
frac = NULL,
min_n = NULL,
max_n = NULL,
method = "srswor",
mos = NULL,
prn = NULL,
aux = NULL,
round = "up",
control = NULL,
certainty_size = NULL,
certainty_prop = NULL,
certainty_overflow = "error",
on_empty = "error"
)
Arguments
- .data
A sampling_design object (piped from sampling_design(), stratify_by(), or cluster_by()).
- n
Sample size. For random-size methods (bernoulli, pps_poisson), n is the expected sample size (converted internally to frac = n / N). Can be:
A scalar: applies per stratum (if no alloc) or as the total (if alloc is specified)
A named vector: stratum-specific sizes (for a single stratification variable)
A data frame: stratum-specific sizes with stratification columns + an n column
- frac
Sampling fraction. Can be:
A scalar: same fraction for all strata
A named vector: stratum-specific fractions
A data frame: stratum-specific fractions with stratification columns + a frac column
Only one of n or frac should be specified.
- min_n
Minimum sample size per stratum. When an allocation method (e.g., Neyman, proportional) would assign fewer than min_n units to a stratum, that stratum receives min_n units instead. To keep the total at n, the remaining sample is reallocated proportionally among the strata that were above min_n. Commonly set to 2 (the minimum for variance estimation) or higher for reliable subgroup estimates. Only applies when stratification with an allocation method is used. Default is NULL (no minimum).
- max_n
Maximum sample size per stratum. When an allocation method would assign more than max_n units to a stratum, that stratum is capped at max_n units. The surplus is redistributed proportionally among the strata that were below max_n. Useful for capping dominant strata or managing operational constraints. Only applies when stratification with an allocation method is used. Default is NULL (no maximum).
- method
Character string specifying the selection method. One of:
Equal probability methods:
"srswor" (default): Simple random sampling without replacement
"srswr": Simple random sampling with replacement
"systematic": Systematic (fixed interval) sampling
"bernoulli": Independent Bernoulli trials (random sample size)
PPS methods (require mos):
"pps_systematic": PPS systematic sampling
"pps_brewer": Generalized Brewer (Tillé) method
"pps_cps": Conditional Poisson sampling (maximum entropy)
"pps_poisson": PPS Poisson sampling (random sample size)
"pps_sps": Sequential Poisson sampling (fixed size, supports prn)
"pps_pareto": Pareto sampling (fixed size, supports prn)
"pps_multinomial": PPS multinomial (with replacement, any hit count)
"pps_chromy": Chromy's sequential PPS (minimum replacement)
Balanced sampling:
"balanced": Balanced sampling via the cube method (Deville & Tillé 2004). Uses auxiliary variables (aux) to balance the sample so that Horvitz-Thompson estimates of auxiliary totals match population totals. Supports equal or unequal (mos) inclusion probabilities. When stratified, uses the stratified cube algorithm (Chauvet 2009). At most 2 stages may use "balanced".
Custom PPS methods registered with sondage::register_method() are also accepted, using the "pps_<name>" convention (e.g., "pps_mymethod").
- mos
Measure of size variable, specified as a bare column name (unquoted). Required for all pps_* methods; optional for "balanced".
- prn
Permanent random number variable for sample coordination, specified as a bare column name (unquoted). Must be a numeric column with values in the open interval (0, 1) and no missing values. Supported methods: "bernoulli", "pps_poisson", "pps_sps", "pps_pareto". When supplied, the sample is deterministic for a given set of PRN values, enabling coordination across survey waves.
- aux
Auxiliary balancing variables for method = "balanced", specified as bare column names: aux = c(income, pop_density). Columns must be numeric with no missing values. The cube algorithm ensures that the Horvitz-Thompson estimator of these auxiliary totals equals (or nearly equals) the population totals, improving precision. When used with cluster_by(), auxiliary values are automatically aggregated (summed) to the cluster level before selection.
- round
Rounding method when converting frac to sample sizes. One of:
"up" (default): Round up (ceiling). Matches the SAS SURVEYSELECT default.
"down": Round down (floor).
"nearest": Round to the nearest integer (standard rounding).
This parameter only affects designs that use frac to specify the sampling rate. When n is specified directly, no rounding occurs. After rounding, a minimum of 1 is enforced per stratum or group.
- control
<data-masking> Variables for sorting the frame before selection. Control sorting provides implicit stratification, which is particularly effective with systematic and sequential sampling methods. Can be:
A single variable: control = region
Multiple variables: control = c(region, district)
With serp() for serpentine sorting: control = serp(region, district)
With dplyr::desc() for descending order: control = c(region, desc(population))
Mixed: control = c(region, serp(district, commune), desc(size))
When stratification is also specified, control sorting is applied within each stratum. See the section "Control Sorting" below for details.
- certainty_size
For PPS without-replacement methods, units with MOS >= this value are selected with certainty (probability = 1). Can be:
A scalar: same threshold for all strata
A data frame: stratum-specific thresholds with stratification columns + a certainty_size column
Certainty units are removed from the frame before probability sampling, and the remaining sample size is reduced accordingly. Mutually exclusive with certainty_prop. Equivalent to the SAS SURVEYSELECT CERTSIZE= option.
- certainty_prop
For PPS without-replacement methods, units whose MOS proportion (MOS_i / sum(MOS)) >= this value are selected with certainty. Can be:
A scalar between 0 and 1 (exclusive): same threshold for all strata
A data frame: stratum-specific thresholds with stratification columns + a certainty_prop column
Uses iterative selection: after removing certainty units, proportions are recomputed and the check is repeated until no new units qualify. Mutually exclusive with certainty_size. Equivalent to the SAS SURVEYSELECT CERTSIZE=P= option.
- certainty_overflow
Controls behavior when certainty units exceed the target sample size n. One of:
"error" (default): Stop with an informative error.
"allow": Return all certainty units with stage weight 1, even if the resulting sample has more than n units.
Equivalent to allowing CERTSIZE= overflow in SAS SURVEYSELECT.
- on_empty
Behavior when a random-size method (bernoulli, pps_poisson) selects zero units in a stratum or in the whole frame. One of:
"error" (default): Stop with an informative error. Zero selections usually indicate a design problem (a sampling fraction or stratum that is too small) that should be fixed rather than silently papered over.
"warn": Issue a warning and fall back to SRS of 1 unit.
"silent": Fall back to SRS of 1 unit without a message.
Weight note: when falling back ("warn" or "silent"), the fallback selects 1 unit via SRS, so the resulting weight is N (the stratum or frame size), not 1/frac. This reflects the actual selection mechanism, not the intended Bernoulli/Poisson design. Downstream variance estimation treats this unit as an SRS draw.
Details
Selection Methods
Equal Probability Methods
| Method | Replacement | Sample Size | Notes |
|---|---|---|---|
| srswor | Without | Fixed | Standard SRS |
| srswr | With | Fixed | Allows duplicates |
| systematic | Without | Fixed | Periodic selection |
| bernoulli | Without | Random | Each unit selected independently |
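The fixed-interval idea behind "systematic" can be sketched in base R. This is a generic textbook version assumed for illustration, not taken from the package: with interval k = N / n and a random start in (0, k), the selected rows are ceiling(start + j k) for j = 0, ..., n - 1, read in frame order. systematic_draw() is a hypothetical helper.

```r
# Hypothetical sketch: systematic sampling of n units from a frame of N rows,
# taken in frame order, with a fractional interval k = N / n.
systematic_draw <- function(N, n, start = runif(1, 0, N / n)) {
  stopifnot(n >= 1, n <= N)
  k <- N / n                               # sampling interval (also each unit's weight)
  ceiling(start + (0:(n - 1)) * k)         # row numbers of the selected units
}

systematic_draw(N = 10, n = 3, start = 0.5)   # rows 1, 4, 8
```

Because k >= 1 whenever n <= N, consecutive selection points land in different rows, so the sample has exactly n distinct units. The order of the frame (for example after control sorting) is what determines how the sample spreads.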
PPS Methods
| Method | Replacement | Sample Size | Notes |
|---|---|---|---|
| pps_systematic | Without | Fixed | Simple; some joint probs are 0, so variance estimates are approximate |
| pps_brewer | Without | Fixed | Fast, joint prob > 0 |
| pps_cps | Without | Fixed | Highest entropy, joint prob available |
| pps_poisson | Without | Random | PPS analog of Bernoulli |
| pps_sps | Without | Fixed | Sequential Poisson, supports prn |
| pps_pareto | Without | Fixed | Pareto sampling, supports prn |
| pps_multinomial | With | Fixed | Any hit count, Hansen-Hurwitz |
| pps_chromy | Min. repl. | Fixed | SAS default PPS_SEQ |
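Sequential Poisson and Pareto sampling are both order sampling schemes. Each unit gets a ranking key computed from its PRN u_i and its target inclusion probability lambda_i = n x_i / sum(x), and the n units with the smallest keys are selected (Ohlsson 1998; Rosén 1997). The base R sketch below shows those textbook keys. It assumes every lambda_i < 1 (certainty units already removed), and order_sample() is a hypothetical helper, not the package's internals.

```r
# Hypothetical sketch of order sampling with permanent random numbers.
# mos: measure of size; prn: PRNs in (0, 1); n: fixed sample size.
order_sample <- function(mos, prn, n, type = c("sps", "pareto")) {
  type <- match.arg(type)
  lambda <- n * mos / sum(mos)                  # target inclusion probabilities
  stopifnot(all(lambda < 1), all(prn > 0 & prn < 1))
  key <- switch(type,
    sps    = prn / lambda,                                 # sequential Poisson key
    pareto = (prn * (1 - lambda)) / (lambda * (1 - prn))   # Pareto key
  )
  sort(order(key)[seq_len(n)])                  # n smallest keys, in frame order
}

mos <- c(10, 20, 30, 40)
prn <- c(0.5, 0.1, 0.9, 0.3)
order_sample(mos, prn, n = 2, type = "sps")     # units 2 and 4
order_sample(mos, prn, n = 2, type = "pareto")  # units 2 and 4
```

Since the keys depend only on mos and prn, rerunning with the same PRNs returns the same sample, which is the property that makes coordination across waves possible.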
Parameter Requirements
| Method | n | frac | mos | aux |
|---|---|---|---|---|
| srswor | Yes | Yes | – | – |
| srswr | Yes | Yes | – | – |
| systematic | Yes | Yes | – | – |
| bernoulli | Expected | Yes | – | – |
| pps_systematic | Yes | Yes | Yes | – |
| pps_brewer | Yes | Yes | Yes | – |
| pps_cps | Yes | – | Yes | – |
| pps_poisson | Expected | Yes | Yes | – |
| pps_sps | Yes | Yes | Yes | – |
| pps_pareto | Yes | Yes | Yes | – |
| pps_multinomial | Yes | Yes | Yes | – |
| pps_chromy | Yes | Yes | Yes | – |
| balanced | Yes | Yes | Optional | Optional |

n and frac are alternatives: supply one or the other, not both. "Expected" means n is treated as an expected sample size and converted to frac = n / N.
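The balancing condition behind method = "balanced" can be made concrete on a toy population. The sketch below does not implement the cube method. It brute-forces every sample of size n from a tiny frame with equal inclusion probabilities n / N and keeps the samples whose Horvitz-Thompson estimate of the auxiliary total equals the population total. balanced_samples() is a hypothetical helper, and enumeration is only feasible for very small frames.

```r
# Hypothetical illustration of the balancing equation (not the cube method).
balanced_samples <- function(x, n, tol = 1e-8) {
  N <- length(x)
  pik <- rep(n / N, N)                          # equal inclusion probabilities
  candidates <- combn(N, n, simplify = FALSE)   # all samples of size n
  keep <- vapply(candidates, function(s) {
    ht_total <- sum(x[s] / pik[s])              # Horvitz-Thompson estimate of sum(x)
    abs(ht_total - sum(x)) < tol
  }, logical(1))
  candidates[keep]
}

x <- c(1, 2, 3, 4)          # population total 10
balanced_samples(x, n = 2)  # {1, 4} and {2, 3}
```

In this example, choosing uniformly between the two balanced samples still gives every unit inclusion probability 1/2, so balance is achieved without distorting the design. The cube method reaches the same goal without enumerating samples.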
Fixed vs Random Sample Size Methods
Methods with fixed sample size (srswor, srswr, systematic, pps_systematic,
pps_brewer, pps_cps, pps_sps, pps_pareto, pps_multinomial, pps_chromy)
accept either n or frac. When frac
is provided, the sample size is computed based on the round parameter (default: ceiling).
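A minimal sketch of converting frac to a sample size under each round option, including the minimum of 1 described under the round argument. frac_to_n() is hypothetical. In particular, "nearest" is assumed here to round halves up, whereas base R's round() rounds halves to even, and this page does not say which convention the package uses. The small tolerance guards against floating-point products such as 0.07 * 100, which evaluates to 7.000000000000001 in double precision.

```r
# Hypothetical sketch of turning a sampling fraction into a sample size.
frac_to_n <- function(frac, N, round = c("up", "down", "nearest")) {
  round <- match.arg(round)
  x <- frac * N
  eps <- 1e-9                       # absorb floating-point noise around integers
  n <- switch(round,
    up      = ceiling(x - eps),
    down    = floor(x + eps),
    nearest = floor(x + 0.5)        # assumption: halves round up
  )
  max(n, 1)                         # at least one unit per stratum or group
}

frac_to_n(0.10, 45, "up")       # 5
frac_to_n(0.10, 45, "down")     # 4
frac_to_n(0.01, 7, "down")      # 1 (floor gives 0, minimum of 1 applies)
```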
Methods with random sample size (bernoulli, pps_poisson) accept either
n or frac. When n is provided, it is converted to frac = n / N (where
N is the stratum or frame size). The resulting sample size is still random:
n specifies the expected sample size, not a fixed count.
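The n-to-frac conversion and the on_empty fallback for Bernoulli selection can be sketched together. bernoulli_draw() is hypothetical: it takes the uniform numbers u as input (one common PRN convention is to select unit i when u_i < frac), so the result is reproducible for fixed u. When nothing is selected it mirrors the documented on_empty choices, falling back to one SRS unit with weight N.

```r
# Hypothetical sketch: Bernoulli selection with an on_empty-style fallback.
# `u` holds one uniform(0, 1) number per unit (for instance, PRNs).
bernoulli_draw <- function(u, frac, on_empty = c("error", "warn", "silent")) {
  on_empty <- match.arg(on_empty)
  N <- length(u)
  selected <- which(u < frac)              # independent trial per unit
  if (length(selected) > 0) {
    return(data.frame(unit = selected, weight = 1 / frac))
  }
  if (on_empty == "error") {
    stop("zero units selected; increase frac or check the stratum size")
  }
  if (on_empty == "warn") {
    warning("zero units selected; falling back to SRS of 1 unit")
  }
  # Fallback: one unit by SRS, so its weight is N rather than 1 / frac
  data.frame(unit = sample.int(N, 1), weight = N)
}

u <- c(0.01, 0.60, 0.03, 0.90)
frac <- 2 / length(u)                      # expected n = 2 becomes frac = n / N = 0.5
bernoulli_draw(u, frac)                    # units 1 and 3, weight 2
bernoulli_draw(u, 0.005, on_empty = "silent")   # one unit, weight 4
```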
For pps_poisson, the raw inclusion probabilities are computed as
\(\pi_i = f \cdot x_i / \bar{x}\) where
\(f\) is frac and \(x_i\) is the MOS value. Any \(\pi_i > 1\)
is clipped to 1, so the expected sample size
\(E[n] = \sum \min(\pi_i, 1)\) can be less
than \(f \cdot N\) when large units dominate the MOS
distribution. Use certainty_size or certainty_prop to handle these
dominant units explicitly.
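The clipping effect can be checked numerically with the formula above. pps_poisson_pik() is a hypothetical helper that evaluates pi_i = f x_i / x-bar and clips at 1. The toy frame has one dominant unit, so the expected size falls short of f N.

```r
# Sketch of the pps_poisson inclusion probabilities (hypothetical helper name).
pps_poisson_pik <- function(mos, frac) {
  pmin(frac * mos / mean(mos), 1)      # pi_i = f * x_i / x_bar, clipped at 1
}

mos <- c(rep(1, 9), 91)                # N = 10, mean(mos) = 10, one dominant unit
pik <- pps_poisson_pik(mos, frac = 0.2)
sum(pik)                               # 9 * 0.02 + 1 = 1.18, not 0.2 * 10 = 2
selected <- which(runif(length(mos)) < pik)   # one Poisson draw: independent trials
```

Here the raw probability of the large unit is 0.2 * 91 / 10 = 1.82, so 0.82 of the intended expected size is lost to clipping.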
When an allocation method is set in stratify_by() (equal,
proportional, neyman, optimal, power), specify total sample size via n.
Combining alloc with frac is not supported.
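To show how the min_n and max_n bounds can interact with an allocation, here is one common iterative scheme, assumed for illustration. Strata outside the bounds are pinned to the bound, and what is left of n is reallocated proportionally among the free strata, repeating until nothing changes. bounded_alloc() is hypothetical. Passing stratum standard deviations S_h gives Neyman-style weights N_h S_h. Integer rounding is omitted, and the package's exact algorithm may differ.

```r
# Hypothetical sketch: proportional (or Neyman, via S_h) allocation with
# min/max bounds. Allocations stay real-valued; integer rounding is omitted.
bounded_alloc <- function(N_h, n, min_n = 0, max_n = Inf, S_h = rep(1, length(N_h))) {
  H <- length(N_h)
  stopifnot(n >= min_n * H, n <= max_n * H)
  w <- N_h * S_h                         # N_h for proportional, N_h * S_h for Neyman
  alloc <- numeric(H)
  fixed <- rep(FALSE, H)
  repeat {
    free <- !fixed
    # share what is left of n among the strata not yet pinned to a bound
    alloc[free] <- (n - sum(alloc[fixed])) * w[free] / sum(w[free])
    low  <- free & alloc < min_n
    high <- free & alloc > max_n
    if (!any(low | high)) break
    alloc[low]  <- min_n                 # raise small strata to the floor
    alloc[high] <- max_n                 # cap large strata at the ceiling
    fixed <- fixed | low | high
  }
  # guard: fails if no free stratum was left to absorb the remainder
  stopifnot(isTRUE(all.equal(sum(alloc), n)))
  alloc
}

bounded_alloc(N_h = c(1000, 500, 100, 50), n = 100, min_n = 10, max_n = 50)
# 50 30 10 10: stratum 1 is capped, strata 3 and 4 are raised, stratum 2 absorbs the rest
```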
Custom Allocation with Data Frames
For stratum-specific sample sizes or rates, pass a data frame to n or frac.
The data frame must contain:
All stratification variable columns (matching those in stratify_by())
An n column (for sizes) or a frac column (for rates)
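Before passing such a table, it can help to check that it covers every stratum in the frame exactly once. check_stratum_table() is a hypothetical base R helper, not part of the package, and it does not reproduce whatever validation draw() performs.

```r
# Hypothetical check that a stratum-specific table covers each stratum exactly once.
check_stratum_table <- function(frame, tbl, strata, value = "n") {
  stopifnot(all(c(strata, value) %in% names(tbl)))
  # build one key per row; unname() keeps column names from being read as paste() args
  key <- function(df) do.call(paste, c(unname(as.list(df[strata])), sep = "|"))
  frame_keys <- unique(key(frame))
  tbl_keys <- key(tbl)
  missing <- setdiff(frame_keys, tbl_keys)
  dupes <- unique(tbl_keys[duplicated(tbl_keys)])
  if (length(missing) > 0) stop("no ", value, " supplied for strata: ", paste(missing, collapse = ", "))
  if (length(dupes) > 0) stop("strata listed more than once: ", paste(dupes, collapse = ", "))
  invisible(TRUE)
}

frame <- data.frame(region = factor(c("A", "A", "B", "C")))
sizes <- data.frame(region = c("A", "B", "C"), n = c(2, 1, 1))
check_stratum_table(frame, sizes, "region")   # passes silently
```

With the region_sizes table from the examples, the analogous call would be check_stratum_table(bfa_eas, region_sizes, "region").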
Certainty Selection
In PPS without-replacement sampling, very large units can have theoretical
inclusion probabilities exceeding 1. Certainty selection handles this by
selecting such units with probability 1 before sampling the remainder.
The output includes a .certainty_k column (where k is the stage number)
indicating which units were certainty selections.
Certainty selection is only available for WOR PPS methods (pps_systematic,
pps_brewer, pps_cps, pps_poisson, pps_sps, pps_pareto). With-replacement methods
(pps_multinomial) and probability minimum replacement (PMR) methods (pps_chromy) handle large units
natively through their hit mechanism.
When certainty_overflow = "allow", if more units qualify for certainty
selection than the requested n, all certainty units are returned with
probability 1 (stage weight = 1). No probabilistic sampling is performed in
this case. The resulting sample size will be the number of certainty
units, which exceeds n. In multi-stage designs, the final .weight can
still exceed 1 because it compounds all stage weights.
For stratum-specific thresholds, pass a data frame containing:
All stratification variable columns
A certainty_size or certainty_prop column
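The difference between the one-pass absolute threshold and the iterative proportional threshold, plus the overflow rule, can be sketched in base R. certainty_split() is hypothetical and handles scalar thresholds only. It returns the certainty units and the sample size left for the PPS draw on the remaining units.

```r
# Hypothetical sketch of certainty selection before PPS sampling.
certainty_split <- function(mos, n, certainty_size = NULL, certainty_prop = NULL,
                            certainty_overflow = c("error", "allow")) {
  certainty_overflow <- match.arg(certainty_overflow)
  stopifnot(is.null(certainty_size) || is.null(certainty_prop))  # mutually exclusive
  certain <- rep(FALSE, length(mos))
  if (!is.null(certainty_size)) {
    certain <- mos >= certainty_size                 # single pass on absolute size
  } else if (!is.null(certainty_prop)) {
    repeat {                                         # shares grow as units are removed
      share <- mos / sum(mos[!certain])
      new_units <- !certain & share >= certainty_prop
      if (!any(new_units)) break
      certain <- certain | new_units
    }
  }
  n_cert <- sum(certain)
  if (n_cert > n && certainty_overflow == "error") {
    stop(n_cert, " certainty units exceed the target sample size n = ", n)
  }
  # certainty units, and the sample size still needed from the other units
  list(certain = which(certain), n_remaining = max(n - n_cert, 0))
}

mos <- c(40, 20, rep(5, 8))
certainty_split(mos, n = 5, certainty_prop = 0.25)
# unit 1 qualifies first (40/100); unit 2 only on the second pass (20/60); n_remaining = 3
```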
Control Sorting
Control sorting orders the sampling frame before selection, providing implicit
stratification. This is particularly effective with systematic and sequential
methods (systematic, pps_systematic, pps_chromy), where it ensures the
sample spreads evenly across the sorted variables.
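To see why sort order matters for these methods, here is a textbook PPS systematic selection over cumulated MOS. It is a sketch, not the package's implementation. Selection points are spaced sum(mos) / n apart along the frame, so a sorted frame spreads the points across the sort variables in proportion to size. pps_systematic_draw() is hypothetical and assumes every n x_i / sum(x) < 1 (no certainty units).

```r
# Hypothetical sketch of PPS systematic selection in frame order.
pps_systematic_draw <- function(mos, n, start = runif(1, 0, sum(mos) / n)) {
  stopifnot(all(n * mos / sum(mos) < 1))
  k <- sum(mos) / n                       # interval on the cumulative MOS scale
  points <- start + (0:(n - 1)) * k       # selection points in (0, sum(mos))
  # unit i is hit when a point falls in (cumsum(mos)[i - 1], cumsum(mos)[i]]
  findInterval(points, c(0, cumsum(mos)), left.open = TRUE)
}

pps_systematic_draw(c(10, 20, 30, 40), n = 2, start = 5)   # units 1 and 3
```

In practice the frame would be sorted first, for example frame[order(frame$region, frame$province), ], and its MOS column passed in that order.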
Serpentine vs Nested Sorting:
Nested (default): Standard ascending sort by each variable in order. Use control = c(var1, var2, var3).
Serpentine: Alternating direction that minimizes "jumps" between adjacent units. Use control = serp(var1, var2, var3).
Serpentine sorting makes nearby observations more similar by reversing direction at each hierarchy level. For geographic hierarchies, this means the last district of region 1 is adjacent to the last district of region 2, because region 2's districts are listed in descending order.
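One common formulation of serpentine ordering can be sketched recursively in base R. The first variable is sorted ascending, and within each group the next variable alternates between ascending (odd groups) and descending (even groups). serp_order() is hypothetical and is not the package's serp(). It assumes no missing values in the control variables.

```r
# Hypothetical sketch of serpentine ordering for hierarchical control variables.
# Returns row indices of `df` in serpentine order over `vars` (no NAs assumed).
serp_order <- function(df, vars, idx = seq_len(nrow(df)), decreasing = FALSE) {
  if (length(vars) == 0) return(idx)
  v <- df[[vars[1]]][idx]
  levels_here <- sort(unique(v), decreasing = decreasing)
  out <- integer(0)
  for (j in seq_along(levels_here)) {
    rows <- idx[v == levels_here[j]]
    # next variable runs ascending in odd groups and descending in even groups
    out <- c(out, serp_order(df, vars[-1], rows, decreasing = (j %% 2 == 0)))
  }
  out
}

frame <- data.frame(region = c(1, 1, 2, 2), district = c(1, 2, 1, 2))
frame[serp_order(frame, c("region", "district")), ]  # (1,1) (1,2) (2,2) (2,1)
frame[order(frame$region, frame$district), ]         # nested: (1,1) (1,2) (2,1) (2,2)
```

The serpentine result keeps district 2 of region 1 next to district 2 of region 2, which is the adjacency described above.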
Combining with Explicit Stratification:
When both stratify_by() and control are used, sorting is applied within
each stratum. This allows explicit stratification for variance control
combined with implicit stratification for sample spread.
References
srswor, srswr, systematic, bernoulli, pps_systematic,
pps_multinomial:
Cochran, W.G. (1977). Sampling Techniques, 3rd ed. Wiley.
pps_brewer:
Brewer, K.R.W. (1975). A simple procedure for sampling PPS WOR.
Australian Journal of Statistics, 17(3), 166-172.
pps_cps:
Hájek, J. (1964). Asymptotic theory of rejective sampling with varying
probabilities from a finite population.
Annals of Mathematical Statistics, 35(4), 1491-1523.
Chen, X.-H., Dempster, A.P. and Liu, J.S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.
pps_poisson:
Tillé, Y. (2006). Sampling Algorithms. Springer.
pps_sps:
Ohlsson, E. (1998). Sequential Poisson sampling.
Journal of Official Statistics, 14(2), 149-162.
pps_pareto:
Rosén, B. (1997). Asymptotic theory for order sampling.
Journal of Statistical Planning and Inference, 62(2), 135-158.
pps_chromy:
Chromy, J.R. (1979). Sequential sample selection methods.
Proceedings of the Survey Research Methods Section, ASA, 401-406.
balanced:
Deville, J.-C. and Tillé, Y. (2004). Efficient balanced
sampling: the cube method. Biometrika, 91(4), 893-912.
Chauvet, G. (2009). Stratified balanced sampling. Survey Methodology, 35(1), 115-119.
See also
sampling_design() for creating designs,
stratify_by() for stratification,
cluster_by() for clustering,
execute() for running designs,
serp() for serpentine sorting
Examples
# Simple random sample of 100 EAs
sampling_design() |>
draw(n = 100) |>
execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights: 445.7 [445.7, 445.7]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 43475 Est Gnagna Piela Rural 971 114 8.95
#> 2 6592 Sud-Ouest Noumbiel Kpuere Rural 88 11 7.89
#> 3 11611 Boucle du … Nayala Yaba Rural 111 15 8.97
#> 4 45236 Centre-Est Boulgou Beguedo Urban 939 167 0.32
#> 5 3549 Est Gourma Fada-N… Rural 263 32 3.4
#> 6 39095 Hauts-Bass… Kenedou… Sindo Rural 37 4 6.09
#> 7 21818 Centre-Est Kourite… Goungu… Rural 21 4 1.07
#> 8 15528 Centre Kadiogo Ouagad… Urban 1207 182 0.14
#> 9 14284 Est Gourma Matiak… Rural 178 21 8.86
#> 10 3433 Est Gourma Fada-N… Rural 285 34 8.55
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Systematic sample of 10%
sampling_design() |>
draw(frac = 0.10, method = "systematic") |>
execute(bfa_eas, seed = 123)
#> # A tbl_sample: 4457 × 17
#> # Weights: 10 [10, 10]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 11761 Boucle du … Bale Bagassi Rural 63 8 8.54
#> 2 11771 Boucle du … Bale Bagassi Rural 234 28 5.91
#> 3 11781 Boucle du … Bale Bagassi Rural 86 10 3.68
#> 4 11791 Boucle du … Bale Bagassi Rural 970 117 1.2
#> 5 11801 Boucle du … Bale Bagassi Rural 102 12 9.29
#> 6 11811 Boucle du … Bale Bagassi Rural 135 16 8.25
#> 7 11821 Boucle du … Bale Bagassi Rural 208 25 8.13
#> 8 11831 Boucle du … Bale Bagassi Rural 92 11 8.97
#> 9 11843 Boucle du … Bale Bagassi Rural 132 16 10.7
#> 10 11853 Boucle du … Bale Bagassi Rural 63 8 6.11
#> # ℹ 4,447 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# PPS sample of EAs using household count
sampling_design() |>
cluster_by(ea_id) |>
draw(n = 50, method = "pps_brewer", mos = households) |>
execute(bfa_eas, seed = 42)
#> # A tbl_sample: 50 × 18
#> # Weights: 1143.26 [117.16, 8569.22]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 1204 Boucle du … Bale Boromo Rural 456 59 0.15
#> 2 8357 Boucle du … Bale Ouri Rural 54 7 12.2
#> 3 41835 Boucle du … Kossi Dokui Rural 194 24 11.8
#> 4 11438 Boucle du … Sourou Toeni Rural 157 16 0.2
#> 5 12900 Cascades Leraba Douna Rural 990 129 0.59
#> 6 13925 Centre Kadiogo Koubri Rural 1372 270 2.1
#> 7 15179 Centre Kadiogo Ouagad… Urban 981 148 0.15
#> 8 16818 Centre Kadiogo Ouagad… Urban 1019 153 0.11
#> 9 18091 Centre Kadiogo Ouagad… Urban 1152 173 0.15
#> 10 683 Centre-Est Boulgou Beguedo Urban 1112 198 1.55
#> # ℹ 40 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# Bernoulli sampling with frac (random sample size, expected ~5%)
sampling_design() |>
draw(frac = 0.05, method = "bernoulli") |>
execute(ken_enterprises, seed = 12345)
#> # A tbl_sample: 309 × 14
#> # Weights: 20 [20, 20]
#> enterprise_id county region sector size_class employees revenue_millions
#> * <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_00010 Kiambu Kiambu Food & Bev… Medium 30 93.4
#> 2 KEN_00020 Kiambu Kiambu Food & Bev… Large 668 305.
#> 3 KEN_00023 Kiambu Kiambu Other Manu… Small 16 133.
#> 4 KEN_00051 Kiambu Kiambu Other Serv… Small 8 4.9
#> 5 KEN_00063 Kiambu Kiambu Other Serv… Small 10 10.9
#> 6 KEN_00065 Kiambu Kiambu Other Serv… Small 18 10.6
#> 7 KEN_00077 Kiambu Kiambu Other Serv… Small 8 2.7
#> 8 KEN_00103 Kiambu Kiambu Other Serv… Medium 40 28.5
#> 9 KEN_00117 Kiambu Kiambu Other Serv… Medium 60 47.9
#> 10 KEN_00146 Kiambu Kiambu Retail Small 11 26
#> # ℹ 299 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Bernoulli sampling with expected n (converted to frac = 500/N)
sampling_design() |>
draw(n = 500, method = "bernoulli") |>
execute(bfa_eas, seed = 42)
#> # A tbl_sample: 526 × 17
#> # Weights: 89.14 [89.14, 89.14]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 11781 Boucle du … Bale Bagassi Rural 86 10 3.68
#> 2 36665 Boucle du … Bale Fara Rural 518 76 8.89
#> 3 36750 Boucle du … Bale Fara Rural 107 16 4.14
#> 4 9032 Boucle du … Bale Poura Urban 432 69 4.38
#> 5 9047 Boucle du … Bale Poura Urban 181 29 4.7
#> 6 9053 Boucle du … Bale Poura Urban 436 70 3.19
#> 7 39999 Boucle du … Banwa Balave Rural 1820 287 2.04
#> 8 34012 Boucle du … Banwa Sami Rural 135 20 5.63
#> 9 9680 Boucle du … Banwa Sanaba Rural 59 8 8.98
#> 10 23812 Boucle du … Banwa Solenzo Rural 679 91 8.85
#> # ℹ 516 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified with different sizes per stratum (data frame)
region_sizes <- data.frame(
region = levels(bfa_eas$region),
n = c(20, 12, 25, 18, 22, 16, 14, 15, 20, 18, 12, 10, 8)
)
sampling_design() |>
stratify_by(region) |>
draw(n = region_sizes) |>
execute(bfa_eas, seed = 123)
#> # A tbl_sample: 210 × 17
#> # Weights: 212.24 [115.14, 414.4]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 33181 Boucle du … Kossi Nouna Rural 133 20 0.07
#> 2 33229 Boucle du … Kossi Nouna Rural 48 7 4.53
#> 3 21506 Boucle du … Kossi Doumba… Rural 485 59 4.82
#> 4 45264 Boucle du … Bale Pa Rural 825 119 0.66
#> 5 26201 Boucle du … Sourou Di Rural 95 12 21.1
#> 6 26077 Boucle du … Mouhoun Dedoug… Rural 176 31 8.96
#> 7 25510 Boucle du … Kossi Bombor… Rural 246 32 0.44
#> 8 23774 Boucle du … Banwa Solenzo Rural 158 21 2.9
#> 9 8332 Boucle du … Mouhoun Ouarko… Rural 55 8 5.68
#> 10 43713 Boucle du … Mouhoun Safane Rural 97 14 6.46
#> # ℹ 200 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified with different rates per stratum (named vector)
sampling_design() |>
stratify_by(size_class) |>
draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 758 × 14
#> # Weights: 9 [2, 49.85]
#> enterprise_id county region sector size_class employees revenue_millions
#> * <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_05526 Homa Bay Rest … Retail Small 8 18.8
#> 2 KEN_05090 Nyamira Rest … Other… Small 6 4.7
#> 3 KEN_02534 Nairobi Nairo… Retail Small 14 38.4
#> 4 KEN_02455 Nairobi Nairo… Retail Small 11 7.5
#> 5 KEN_02609 Nairobi Nairo… Retail Small 5 8.4
#> 6 KEN_06669 Uasin Gishu Uasin… Other… Small 11 16.1
#> 7 KEN_01498 Nairobi Nairo… Other… Small 8 13.4
#> 8 KEN_04590 Kisii Rest … Other… Small 10 11.2
#> 9 KEN_02509 Nairobi Nairo… Retail Small 8 19.3
#> 10 KEN_02684 Nairobi Nairo… Retail Small 16 23.7
#> # ℹ 748 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Neyman allocation with minimum 2 per stratum (for variance estimation)
sampling_design() |>
stratify_by(region, alloc = "neyman", variance = bfa_eas_variance) |>
draw(n = 150, min_n = 2) |>
execute(bfa_eas, seed = 2026)
#> # A tbl_sample: 150 × 17
#> # Weights: 297.13 [112.69, 2419.5]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 34795 Boucle du … Sourou Tougan Rural 45 6 14.0
#> 2 44496 Boucle du … Mouhoun Tcheri… Rural 180 29 16.6
#> 3 9624 Boucle du … Banwa Sanaba Rural 177 23 6.56
#> 4 7012 Boucle du … Kossi Madouba Rural 387 48 6.18
#> 5 44420 Boucle du … Mouhoun Tcheri… Rural 63 10 22.3
#> 6 515 Boucle du … Kossi Barani Rural 651 85 1.11
#> 7 1144 Boucle du … Bale Boromo Rural 75 10 8.92
#> 8 8378 Boucle du … Bale Ouri Rural 693 89 9.26
#> 9 4909 Boucle du … Sourou Kassoum Rural 662 98 0.65
#> 10 21094 Boucle du … Kossi Bouras… Rural 25 3 5.1
#> # ℹ 140 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Proportional allocation with min and max bounds
sampling_design() |>
stratify_by(region, alloc = "proportional") |>
draw(n = 200, min_n = 10, max_n = 50) |>
execute(bfa_eas, seed = 1)
#> # A tbl_sample: 200 × 17
#> # Weights: 222.85 [161.2, 238.52]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 9648 Boucle du … Banwa Sanaba Rural 279 35 8.37
#> 2 11547 Boucle du … Sourou Toeni Rural 49 5 21.4
#> 3 41824 Boucle du … Kossi Dokui Rural 75 9 15.8
#> 4 11012 Boucle du … Banwa Tansila Rural 592 71 0.95
#> 5 32308 Boucle du … Sourou Lanfie… Rural 57 8 9.49
#> 6 7017 Boucle du … Kossi Madouba Rural 599 74 0.74
#> 7 36700 Boucle du … Bale Fara Rural 402 59 6.36
#> 8 11611 Boucle du … Nayala Yaba Rural 111 15 8.97
#> 9 8342 Boucle du … Mouhoun Ouarko… Rural 94 13 8.1
#> 10 11626 Boucle du … Nayala Yaba Rural 56 7 8.31
#> # ℹ 190 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Control sorting with serpentine ordering (implicit stratification)
sampling_design() |>
draw(n = 100, method = "systematic",
control = serp(region, province)) |>
execute(bfa_eas, seed = 2)
#> # A tbl_sample: 100 × 17
#> # Weights: 445.7 [445.7, 445.7]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 11843 Boucle du … Bale Bagassi Rural 132 16 10.7
#> 2 8876 Boucle du … Bale Pompoi Rural 55 7 8.62
#> 3 34070 Boucle du … Banwa Sami Rural 24 4 8.12
#> 4 24054 Boucle du … Banwa Solenzo Rural 123 17 8.82
#> 5 21070 Boucle du … Kossi Bouras… Rural 115 13 9.87
#> 6 37405 Boucle du … Kossi Kombori Rural 308 39 9.76
#> 7 21009 Boucle du … Mouhoun Bondok… Rural 556 85 9.16
#> 8 5672 Boucle du … Mouhoun Kona Rural 222 31 8.98
#> 9 44440 Boucle du … Mouhoun Tcheri… Rural 55 9 5.53
#> 10 11656 Boucle du … Nayala Yaba Rural 1292 170 1.35
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Control sorting with nested (standard) ordering
sampling_design() |>
draw(n = 100, method = "systematic",
control = c(region, province)) |>
execute(bfa_eas, seed = 3)
#> # A tbl_sample: 100 × 17
#> # Weights: 445.7 [445.7, 445.7]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 11833 Boucle du … Bale Bagassi Rural 35 4 7.15
#> 2 8620 Boucle du … Bale Pa Rural 52 7 1.46
#> 3 34062 Boucle du … Banwa Sami Rural 286 43 7.68
#> 4 24046 Boucle du … Banwa Solenzo Rural 24 3 6.37
#> 5 25526 Boucle du … Kossi Bombor… Rural 152 20 0.06
#> 6 37398 Boucle du … Kossi Kombori Rural 201 26 5.89
#> 7 21002 Boucle du … Mouhoun Bondok… Rural 791 121 9.13
#> 8 5663 Boucle du … Mouhoun Kona Rural 180 25 7.94
#> 9 44432 Boucle du … Mouhoun Tcheri… Rural 502 80 23.6
#> 10 11649 Boucle du … Nayala Yaba Rural 526 69 7.66
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Combined explicit stratification with control sorting within strata
sampling_design() |>
stratify_by(urban_rural) |>
draw(n = 50, method = "systematic",
control = serp(region, province)) |>
execute(bfa_eas, seed = 25)
#> # A tbl_sample: 100 × 17
#> # Weights: 445.7 [137.66, 753.74]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 36744 Boucle du … Bale Fara Rural 565 83 9.07
#> 2 9728 Boucle du … Banwa Sanaba Rural 69 9 6.85
#> 3 25519 Boucle du … Kossi Bombor… Rural 240 32 4.62
#> 4 10541 Boucle du … Kossi Sono Rural 738 111 0.6
#> 5 8318 Boucle du … Mouhoun Ouarko… Rural 160 22 7.76
#> 6 11674 Boucle du … Nayala Yaba Rural 1353 178 1.3
#> 7 34832 Boucle du … Sourou Tougan Rural 33 5 8.81
#> 8 20641 Cascades Comoe Banfora Rural 996 122 0.24
#> 9 15093 Cascades Comoe Niango… Rural 126 17 8.87
#> 10 10357 Cascades Comoe Sidera… Rural 31 5 8.64
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# PPS with certainty selection (absolute threshold)
# Large EAs selected with certainty, rest sampled with PPS
sampling_design() |>
stratify_by(region) |>
draw(n = 100, method = "pps_brewer", mos = households,
certainty_size = 800) |>
execute(bfa_eas, seed = 3)
#> # A tbl_sample: 1300 × 18
#> # Weights: 35.84 [1, 874.53]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 39999 Boucle du … Banwa Balave Rural 1820 287 2.04
#> 2 44814 Boucle du … Nayala Toma Rural 399 58 0.29
#> 3 12752 Boucle du … Kossi Djibas… Rural 336 43 8.61
#> 4 11005 Boucle du … Banwa Tansila Rural 664 80 10.2
#> 5 26060 Boucle du … Mouhoun Dedoug… Rural 988 175 2.47
#> 6 26079 Boucle du … Mouhoun Dedoug… Rural 1114 198 1.9
#> 7 9031 Boucle du … Bale Poura Urban 941 151 1.06
#> 8 23980 Boucle du … Banwa Solenzo Rural 870 117 8.12
#> 9 25983 Boucle du … Mouhoun Dedoug… Rural 877 156 1.18
#> 10 45289 Boucle du … Mouhoun Dedoug… Rural 910 161 0.83
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# PPS with certainty selection (proportional threshold)
# EAs with >= 10% of stratum total selected with certainty
sampling_design() |>
stratify_by(region) |>
draw(n = 100, method = "pps_systematic", mos = households,
certainty_prop = 0.10) |>
execute(bfa_eas, seed = 321)
#> # A tbl_sample: 1300 × 18
#> # Weights: 33.17 [1.99, 427.22]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 11817 Boucle du … Bale Bagassi Rural 433 52 0.42
#> 2 29500 Boucle du … Bale Bana Rural 378 49 8.91
#> 3 1136 Boucle du … Bale Boromo Rural 3613 471 4.23
#> 4 1180 Boucle du … Bale Boromo Rural 2111 275 2.75
#> 5 36664 Boucle du … Bale Fara Rural 152 22 8.44
#> 6 36708 Boucle du … Bale Fara Rural 581 85 8.9
#> 7 36740 Boucle du … Bale Fara Rural 997 146 0.77
#> 8 36778 Boucle du … Bale Fara Rural 489 72 0.69
#> 9 8397 Boucle du … Bale Ouri Rural 1244 159 1.31
#> 10 8450 Boucle du … Bale Ouri Rural 1020 130 1.13
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# Stratum-specific certainty thresholds (data frame)
cert_thresholds <- data.frame(
region = levels(bfa_eas$region),
certainty_size = c(700, 450, 800, 850, 750, 800, 550,
450, 700, 950, 750, 600, 480)
)
sampling_design() |>
stratify_by(region) |>
draw(n = 100, method = "pps_brewer", mos = households,
certainty_size = cert_thresholds) |>
execute(bfa_eas, seed = 424)
#> # A tbl_sample: 1300 × 18
#> # Weights: 35.77 [1, 640.13]
#> ea_id region province commune urban_rural population households area_km2
#> * <int> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 34783 Boucle du … Sourou Tougan Rural 989 136 1.38
#> 2 12833 Boucle du … Kossi Djibas… Rural 219 28 8.46
#> 3 39958 Boucle du … Banwa Balave Rural 684 108 0.88
#> 4 9915 Boucle du … Bale Siby Rural 75 11 8.54
#> 5 9738 Boucle du … Banwa Sanaba Rural 504 64 8.53
#> 6 21043 Boucle du … Mouhoun Bondok… Rural 942 144 1.41
#> 7 6371 Boucle du … Banwa Kouka Rural 2329 321 1.42
#> 8 43729 Boucle du … Mouhoun Safane Rural 1047 150 1.06
#> 9 25968 Boucle du … Mouhoun Dedoug… Rural 766 136 0.3
#> 10 34034 Boucle du … Banwa Sami Rural 175 26 8.81
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>