draw() specifies how units are selected: sample size, sampling fraction,
selection method, and measure of size for PPS sampling. Every stage in a
sampling design must end with draw().
Usage
draw(
.data,
n = NULL,
frac = NULL,
min_n = NULL,
max_n = NULL,
method = "srswor",
mos = NULL,
prn = NULL,
aux = NULL,
round = "up",
control = NULL,
certainty_size = NULL,
certainty_prop = NULL,
certainty_overflow = "error",
on_empty = "error"
)
Arguments
- .data
A sampling_design object (piped from sampling_design(), stratify_by(), or cluster_by()).
- n
Sample size. For random-size methods (bernoulli, pps_poisson), n is the expected sample size (converted internally to frac = n / N). Can be:
A scalar: applies per stratum (if no alloc) or as the total (if alloc is specified)
A named vector: stratum-specific sizes (for a single stratification variable)
A data frame: stratum-specific sizes with the stratification columns plus an n column
- frac
Sampling fraction. Can be:
A scalar: same fraction for all strata
A named vector: stratum-specific fractions
A data frame: stratum-specific fractions with the stratification columns plus a frac column
Only one of n or frac should be specified.
- min_n
Minimum sample size per stratum. When an allocation method (e.g., Neyman, proportional) would assign fewer than min_n units to a stratum, that stratum receives min_n units instead. The extra units are taken proportionally from the strata allocated more than min_n, so the total stays at n. Commonly set to 2 (the minimum for variance estimation) or higher for reliable subgroup estimates. Only applies when stratification with an allocation method is used. Default is NULL (no minimum).
- max_n
Maximum sample size per stratum. When an allocation method would assign more than max_n units to a stratum, that stratum is capped at max_n units. The surplus is redistributed proportionally among the strata allocated fewer than max_n. Useful for capping dominant strata or managing operational constraints. Only applies when stratification with an allocation method is used. Default is NULL (no maximum).
- method
Character string specifying the selection method. One of:
Equal probability methods:
"srswor" (default): Simple random sampling without replacement
"srswr": Simple random sampling with replacement
"systematic": Systematic (fixed interval) sampling
"bernoulli": Independent Bernoulli trials (random sample size)
PPS methods (require mos):
"pps_systematic": PPS systematic sampling
"pps_brewer": Generalized Brewer (Tillé) method
"pps_cps": Conditional Poisson sampling (maximum entropy)
"pps_poisson": PPS Poisson sampling (random sample size)
"pps_sps": Sequential Poisson sampling (fixed size, supports prn)
"pps_pareto": Pareto sampling (fixed size, supports prn)
"pps_multinomial": PPS multinomial (with replacement, any hit count)
"pps_chromy": Chromy's sequential PPS (minimum replacement)
Balanced sampling:
"balanced": Balanced sampling via the cube method (Deville & Tillé 2004). Uses auxiliary variables (aux) to balance the sample so that Horvitz-Thompson estimates of auxiliary totals match (or nearly match) population totals. Supports equal or unequal (mos) inclusion probabilities. When stratified, uses the stratified cube algorithm (Chauvet 2009). At most 2 stages may use "balanced".
- mos
Measure of size variable for PPS methods (optional for "balanced"), specified as a bare column name (unquoted). Required for all pps_* methods.
- prn
Permanent random number variable for sample coordination, specified as a bare column name (unquoted). Must be a numeric column with values in the open interval (0, 1) and no missing values. Supported methods:
"bernoulli", "pps_poisson", "pps_sps", "pps_pareto". When supplied, the sample is deterministic for a given set of PRN values, enabling coordination across survey waves.
- aux
Auxiliary balancing variables for method = "balanced", specified as bare column names: aux = c(income, pop_density). Columns must be numeric with no missing values. The cube algorithm ensures the Horvitz-Thompson estimator of these auxiliary totals equals (or nearly equals) the population totals, improving precision. When used with cluster_by(), auxiliary values are automatically aggregated (summed) to the cluster level before selection.
- round
Rounding method when converting frac to sample sizes. One of:
"up" (default): Round up (ceiling). Matches the SAS SURVEYSELECT default.
"down": Round down (floor).
"nearest": Round to the nearest integer (standard rounding).
This parameter only affects designs using frac to specify the sampling rate. When n is specified directly, no rounding occurs.
- control
<data-masking> Variables for sorting the frame before selection. Control sorting provides implicit stratification, which is particularly effective with systematic and sequential sampling methods. Can be:
A single variable: control = region
Multiple variables: control = c(region, district)
With serp() for serpentine sorting: control = serp(region, district)
With dplyr::desc() for descending: control = c(region, desc(population))
Mixed: control = c(region, serp(district, commune), desc(size))
When stratification is also specified, control sorting is applied within each stratum. See the section "Control Sorting" below for details.
- certainty_size
For PPS without-replacement methods, units with MOS >= this value are selected with certainty (probability = 1). Can be:
A scalar: same threshold for all strata
A data frame: stratum-specific thresholds with the stratification columns plus a certainty_size column
Certainty units are removed from the frame before probability sampling, and the remaining sample size is reduced accordingly. Mutually exclusive with certainty_prop. Equivalent to the SAS SURVEYSELECT CERTSIZE= option.
- certainty_prop
For PPS without-replacement methods, units whose MOS proportion (MOS_i / sum(MOS)) >= this value are selected with certainty. Can be:
A scalar between 0 and 1 (exclusive): same threshold for all strata
A data frame: stratum-specific thresholds with the stratification columns plus a certainty_prop column
Uses iterative selection: after removing certainty units, proportions are recomputed and the check is repeated until no new units qualify. Mutually exclusive with certainty_size. Equivalent to the SAS SURVEYSELECT CERTSIZE=P= option.
- certainty_overflow
Controls behavior when certainty units exceed the target sample size n. One of:
"error" (default): Stop with an informative error.
"allow": Return all certainty units with weight 1, even if the resulting sample has more than n units.
Equivalent to allowing CERTSIZE= overflow in SAS SURVEYSELECT.
- on_empty
Behavior when a random-size method (bernoulli, pps_poisson) selects zero units in a stratum or the whole frame. One of:
"error" (default): Stop with an informative error. Zero selections usually indicate a design problem (sampling fraction too small or stratum too small) that should be fixed rather than silently papered over.
"warn": Issue a warning and fall back to SRS of 1 unit.
"silent": Fall back to SRS of 1 unit without a message.
Weight note: when falling back ("warn" or "silent"), the fallback selects 1 unit via SRS, so the resulting weight is N (the stratum or frame size), not 1/frac. This reflects the actual selection mechanism, not the intended Bernoulli/Poisson design. Downstream variance estimation treats this unit as an SRS draw.
Details
Selection Methods
Equal Probability Methods
| Method | Replacement | Sample Size | Notes |
|---|---|---|---|
| srswor | Without | Fixed | Standard SRS |
| srswr | With | Fixed | Allows duplicates |
| systematic | Without | Fixed | Periodic selection |
| bernoulli | Without | Random | Each unit selected independently |
PPS Methods
| Method | Replacement | Sample Size | Notes |
|---|---|---|---|
| pps_systematic | Without | Fixed | Simple; some joint probs = 0, so variance estimates are approximate |
| pps_brewer | Without | Fixed | Fast, joint prob > 0 |
| pps_cps | Without | Fixed | Highest entropy, joint prob available |
| pps_poisson | Without | Random | PPS analog of Bernoulli |
| pps_sps | Without | Fixed | Sequential Poisson, supports prn |
| pps_pareto | Without | Fixed | Pareto sampling, supports prn |
| pps_multinomial | With | Fixed | Any hit count, Hansen-Hurwitz |
| pps_chromy | Min. repl. | Fixed | Equivalent to SAS PPS_SEQ |
Parameter Requirements
| Method | n | frac | mos | aux |
|---|---|---|---|---|
| srswor | Yes | Yes | – | – |
| srswr | Yes | Yes | – | – |
| systematic | Yes | Yes | – | – |
| bernoulli | Expected | Yes | – | – |
| pps_systematic | Yes | Yes | Yes | – |
| pps_brewer | Yes | Yes | Yes | – |
| pps_cps | Yes | – | Yes | – |
| pps_poisson | Expected | Yes | Yes | – |
| pps_sps | Yes | Yes | Yes | – |
| pps_pareto | Yes | Yes | Yes | – |
| pps_multinomial | Yes | Yes | Yes | – |
| pps_chromy | Yes | Yes | Yes | – |
| balanced | Yes | Yes | Optional | Optional |
n and frac are alternatives: supply one or the other.
Fixed vs Random Sample Size Methods
Methods with fixed sample size (srswor, srswr, systematic, pps_systematic,
pps_brewer, pps_cps, pps_sps, pps_pareto, pps_multinomial, pps_chromy)
accept either n or frac. When frac
is provided, the sample size is computed based on the round parameter (default: ceiling).
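The frac-to-n conversion can be sketched in base R. frac_to_n() is a hypothetical helper, not the package's internal code. Two details are assumptions: the floating-point clean-up before ceiling()/floor(), and "nearest" rounding halves upward (the text above only says "standard rounding").

```r
# Hypothetical helper (not part of the package): turn a sampling fraction
# into a fixed sample size for a stratum or frame of size N.
frac_to_n <- function(frac, N, round = c("up", "down", "nearest")) {
  round <- match.arg(round)
  # Assumption: strip floating-point noise first. 0.07 * 100 is
  # 7.000000000000001 in double precision, which ceiling() would turn into 8.
  raw <- base::round(frac * N, 8)
  switch(round,
    up = ceiling(raw),
    down = floor(raw),
    # Halves round up here; base::round() would give 2 for 2.5 (half to even)
    nearest = floor(raw + 0.5)
  )
}

frac_to_n(0.05, 101)             # 5.05 -> 6 (the default, "up")
frac_to_n(0.05, 101, "down")     # 5.05 -> 5
frac_to_n(0.50, 5, "nearest")    # 2.5  -> 3
```

The third call shows why "nearest" is spelled out: R's own round() follows IEC 60559 and would return 2.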
Methods with random sample size (bernoulli, pps_poisson) accept either
n or frac. When n is provided, it is converted to frac = n / N (where
N is the stratum or frame size). The resulting sample size is still random:
n specifies the expected sample size, not a fixed count.
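A short simulation makes the "expected, not fixed" point concrete. It mimics the mechanism in base R rather than calling draw(); N = 14900 is inferred from the example weights (149 for an SRS of 100), and the runif() comparison is a generic way to run independent Bernoulli trials, not necessarily how the package draws. The same arithmetic also shows how often a small stratum ends up empty, the case handled by on_empty; for pps_poisson the corresponding probability is the product of the terms 1 - π_i.

```r
set.seed(42)
N <- 14900                # frame size implied by the example weights (149 x 100)
n_expected <- 500
frac <- n_expected / N    # the conversion described above: frac = n / N

# A Bernoulli draw keeps each unit independently with probability frac,
# so repeated draws give different sample sizes centred on n_expected
sizes <- replicate(1000, sum(runif(N) < frac))
mean(sizes)               # close to 500
range(sizes)              # the realised size is random
1 / frac                  # design weight; 29.8 matches the example output

# Chance that a Bernoulli draw selects nobody (the case on_empty handles):
# every one of the N independent trials must fail
p_empty <- function(N, frac) (1 - frac)^N
p_empty(40, 0.01)         # about 0.67 for a 40-unit stratum at 1%
p_empty(40, 0.10)         # about 0.015 at 10%
```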
For pps_poisson, the raw inclusion probabilities are computed as
\(\pi_i = f \cdot x_i / \bar{x}\), where
\(f\) is frac, \(x_i\) is the MOS value of unit \(i\), and \(\bar{x}\) is the mean
MOS over the \(N\) units of the stratum or frame, so the raw probabilities sum
to \(f \cdot N\). Any \(\pi_i > 1\)
is clipped to 1, so the expected sample size
\(E[n] = \sum \min(\pi_i, 1)\) can be less
than \(f \cdot N\) when large units dominate the MOS
distribution. Use certainty_size or certainty_prop to handle these
dominant units explicitly.
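The clipping effect can be checked numerically. pps_poisson_pik() is a hypothetical helper that applies the formula above to a single stratum, assuming \(\bar{x}\) is the stratum mean MOS; the MOS values are made up for illustration.

```r
# Hypothetical helper (not part of the package): pps_poisson inclusion
# probabilities pi_i = f * x_i / mean(x), clipped at 1.
pps_poisson_pik <- function(x, frac) {
  raw <- frac * x / mean(x)   # sums to frac * length(x) before clipping
  pmin(raw, 1)
}

x <- c(10, 20, 30, 40, 400)   # one dominant unit (made-up MOS values)
pik <- pps_poisson_pik(x, frac = 0.4)
pik                           # 0.04 0.08 0.12 0.16 1.00
sum(pik)                      # expected size 1.4, short of 0.4 * 5 = 2

# One Poisson draw: independent trials with unit-specific probabilities
set.seed(1)
selected <- runif(length(x)) < pik
```

The dominant unit's raw probability is 1.6; clipping it to 1 removes 0.6 from the expected sample size, which is exactly the gap between 1.4 and 2.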
When an allocation method is set in stratify_by() (equal,
proportional, neyman, optimal, power), specify total sample size via n.
Combining alloc with frac is not supported.
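The min_n and max_n bounds adjust the allocation computed from this total n. Below is a minimal base-R sketch of one way such bounds can be resolved for proportional allocation. alloc_bounded() is hypothetical, returns fractional sizes (integer rounding is left out), and the package's exact redistribution rule may differ.

```r
# Hypothetical helper (not part of the package): proportional allocation of a
# total n across strata of sizes N_h, with optional per-stratum bounds.
alloc_bounded <- function(N_h, n, min_n = 0, max_n = Inf) {
  n_h <- numeric(length(N_h))
  free <- rep(TRUE, length(N_h))        # strata not yet pinned to a bound
  repeat {
    # Share what is left of n proportionally among the free strata
    remaining <- n - sum(n_h[!free])
    n_h[free] <- remaining * N_h[free] / sum(N_h[free])
    low <- free & n_h < min_n
    high <- free & n_h > max_n
    if (!any(low | high)) break
    # Pin violating strata to their bound, then re-allocate the rest
    n_h[low] <- min_n
    n_h[high] <- max_n
    free <- free & !(low | high)
  }
  n_h
}

# Proportional shares would give the two small strata fewer than 10 units
alloc_bounded(c(1000, 100, 50), n = 100, min_n = 10)
# Same strata with a cap of 50 on the dominant one
alloc_bounded(c(1000, 100, 50), n = 100, max_n = 50)
```

The first call pins the two small strata at 10 and gives the remaining 80 units to the large one; the second caps the large stratum at 50 and splits the other 50 in a 2:1 ratio.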
Custom Allocation with Data Frames
For stratum-specific sample sizes or rates, pass a data frame to n or frac.
The data frame must contain:
All stratification variable columns (matching those in stratify_by())
An n column (for sizes) or a frac column (for rates)
Certainty Selection
In PPS without-replacement sampling, very large units can have theoretical
inclusion probabilities exceeding 1. Certainty selection handles this by
selecting such units with probability 1 before sampling the remainder.
The output includes a .certainty_k column (where k is the stage number)
indicating which units were certainty selections.
Certainty selection is only available for WOR PPS methods (pps_systematic,
pps_brewer, pps_cps, pps_poisson, pps_sps, pps_pareto). With-replacement methods
(pps_multinomial) and PMR methods (pps_chromy) handle large units
natively through their hit mechanism.
When certainty_overflow = "allow", if more units qualify for certainty
selection than the requested n, all certainty units are returned with
probability 1 (weight = 1). No probabilistic sampling is performed in
this case. The resulting sample size will be the number of certainty
units, which exceeds n.
For stratum-specific thresholds, pass a data frame containing:
All stratification variable columns
A certainty_size or certainty_prop column
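The iterative certainty_prop rule and the follow-on reduction of n can be sketched for one stratum in base R. certainty_by_prop() is a hypothetical helper, not the package's implementation; the MOS values are made up to show why a second pass can matter.

```r
# Hypothetical helper (not part of the package): iterative certainty_prop rule.
# A unit is a certainty selection if its share of the MOS total of the units
# not yet taken with certainty is at least prop; repeat until nothing changes.
certainty_by_prop <- function(x, prop) {
  certain <- rep(FALSE, length(x))
  repeat {
    share <- x / sum(x[!certain])
    new <- !certain & share >= prop
    if (!any(new)) break
    certain <- certain | new
  }
  certain
}

x <- c(500, 200, 100, 100, 100)      # made-up MOS values for one stratum
x / sum(x) >= 0.35                   # a single pass flags only the first unit
cert <- certainty_by_prop(x, 0.35)   # second pass: 200 / 500 = 0.40 qualifies
cert

# certainty_size is a single comparison against an absolute threshold
x >= 300

# The probability sample then covers the rest with a reduced size
n <- 3
n_rest <- n - sum(cert)              # 1 unit left for PPS among the other 3
```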
Control Sorting
Control sorting orders the sampling frame before selection, providing implicit
stratification. This is particularly effective with systematic and sequential
methods (systematic, pps_systematic, pps_chromy), where it ensures the
sample spreads evenly across the sorted variables.
Serpentine vs Nested Sorting:
Nested (default): Standard ascending sort by each variable in order. Use control = c(var1, var2, var3).
Serpentine: Alternating direction that minimizes "jumps" between adjacent units. Use control = serp(var1, var2, var3).
Serpentine sorting makes nearby observations more similar by reversing direction at each hierarchy level. For geographic hierarchies, this means the last district of region 1 is adjacent to the last district of region 2.
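A two-level serpentine ordering can be reproduced in base R to see the alternating direction. serp_order2() is a hypothetical illustration, not the package's serp(), which accepts more than two variables; its tie and missing-value handling are not covered here.

```r
# Hypothetical helper (not the package's serp()): two-level serpentine order.
# Outer groups are sorted ascending; the inner variable runs ascending in the
# 1st, 3rd, ... outer group and descending in the 2nd, 4th, ...
serp_order2 <- function(df, outer, inner) {
  g <- as.integer(factor(df[[outer]]))   # position of each row's outer group
  r <- xtfrm(df[[inner]])                # numeric sort key for the inner variable
  key <- ifelse(g %% 2 == 1, r, -r)      # flip the inner direction in even groups
  df[order(g, key), , drop = FALSE]
}

frame <- data.frame(
  region   = c("B", "A", "B", "A", "A", "B"),
  district = c(1L, 3L, 3L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)
serp_order2(frame, "region", "district")
# region A: districts 1, 2, 3; region B: districts 3, 2, 1
```

District 3 of region A sits next to district 3 of region B, which is the adjacency described above.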
Combining with Explicit Stratification:
When both stratify_by() and control are used, sorting is applied within
each stratum. This allows explicit stratification for variance control
combined with implicit stratification for sample spread.
References
srswor, srswr, systematic, bernoulli, pps_systematic,
pps_multinomial:
Cochran, W.G. (1977). Sampling Techniques, 3rd ed. Wiley.
pps_brewer:
Brewer, K.R.W. (1975). A simple procedure for sampling PPS WOR.
Australian Journal of Statistics, 17(3), 166-172.
pps_cps:
Hájek, J. (1964). Asymptotic theory of rejective sampling with varying
probabilities from a finite population.
Annals of Mathematical Statistics, 35(4), 1491-1523.
Chen, X.-H., Dempster, A.P. and Liu, J.S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.
pps_poisson:
Tillé, Y. (2006). Sampling Algorithms. Springer.
pps_sps:
Ohlsson, E. (1998). Sequential Poisson sampling.
Journal of Official Statistics, 14(2), 149-162.
pps_pareto:
Rosén, B. (1997). Asymptotic theory for order sampling.
Journal of Statistical Planning and Inference, 62(2), 135-158.
pps_chromy:
Chromy, J.R. (1979). Sequential sample selection methods.
Proceedings of the Survey Research Methods Section, ASA, 401-406.
balanced:
Deville, J.-C. and Tillé, Y. (2004). Efficient balanced
sampling: the cube method. Biometrika, 91(4), 893-912.
Chauvet, G. (2009). Stratified balanced sampling. Survey Methodology, 35(1), 115-119.
See also
sampling_design() for creating designs,
stratify_by() for stratification,
cluster_by() for clustering,
execute() for running designs,
serp() for serpentine sorting
Examples
# Simple random sample of 100 EAs
sampling_design() |>
draw(n = 100) |>
execute(bfa_eas, seed = 1)
#> # A tbl_sample: 100 × 17
#> # Weights: 149 [149, 149]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_10155 Boucle … Mouhoun Ouarko… Rural 1347 187 33.0
#> 2 EA_01016 Centre-… Zoundwe… Bere Rural 1166 193 23.2
#> 3 EA_04918 Centre-… Kourite… Goungu… Rural 949 141 6.89
#> 4 EA_01890 Hauts-B… Houet Bobo-D… Urban 1195 191 0.25
#> 5 EA_14703 Plateau… Oubrite… Ziniare Rural 1340 177 26.1
#> 6 EA_12688 Est Tapoa Tambaga Rural 994 139 6.46
#> 7 EA_11778 Plateau… Ganzour… Saolgo Rural 998 124 1.41
#> 8 EA_05700 Hauts-B… Houet Karang… Rural 1049 119 23.6
#> 9 EA_06057 Est Gnagna Koala Rural 1291 173 27.6
#> 10 EA_04482 Centre-… Boulgou Garango Rural 1553 262 1.44
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Systematic sample of 10%
sampling_design() |>
draw(frac = 0.10, method = "systematic") |>
execute(bfa_eas, seed = 123)
#> # A tbl_sample: 1490 × 17
#> # Weights: 10 [10, 10]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_00247 Boucle … Bale Bagassi Rural 1666 200 33.4
#> 2 EA_00257 Boucle … Bale Bagassi Rural 931 112 18.1
#> 3 EA_00267 Boucle … Bale Bagassi Rural 1319 159 17.1
#> 4 EA_00277 Boucle … Bale Bagassi Rural 1740 209 15.8
#> 5 EA_00464 Boucle … Bale Bana Rural 915 120 7.95
#> 6 EA_02161 Boucle … Bale Boromo Rural 805 105 9.43
#> 7 EA_02171 Boucle … Bale Boromo Rural 1644 214 15.3
#> 8 EA_02181 Boucle … Bale Boromo Rural 996 130 7.42
#> 9 EA_04192 Boucle … Bale Fara Rural 1491 238 22.2
#> 10 EA_04202 Boucle … Bale Fara Rural 1187 190 17.6
#> # ℹ 1,480 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# PPS sample of EAs using household count
sampling_design() |>
cluster_by(ea_id) |>
draw(n = 50, method = "pps_brewer", mos = households) |>
execute(bfa_eas, seed = 42)
#> # A tbl_sample: 50 × 18
#> # Weights: 289.64 [88.85, 616.64]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_02180 Boucle … Bale Boromo Rural 4324 564 3.13
#> 2 EA_10322 Boucle … Bale Ouri Rural 1605 192 52.5
#> 3 EA_06254 Boucle … Kossi Kombori Rural 1549 194 41.2
#> 4 EA_13829 Boucle … Sourou Tougan Rural 1309 181 55.4
#> 5 EA_06311 Centre Kadiogo Komki-… Rural 1243 188 15.7
#> 6 EA_08822 Centre Kadiogo Ouagad… Urban 1804 342 0.31
#> 7 EA_08853 Centre Kadiogo Ouagad… Urban 1452 275 0.26
#> 8 EA_09411 Centre Kadiogo Ouagad… Urban 2084 395 0.27
#> 9 EA_09799 Centre Kadiogo Ouagad… Urban 1889 358 1.23
#> 10 EA_11351 Centre Kadiogo Saaba Urban 2486 445 6.55
#> # ℹ 40 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# Bernoulli sampling with frac (random sample size, expected ~5%)
sampling_design() |>
draw(frac = 0.05, method = "bernoulli") |>
execute(ken_enterprises, seed = 12345)
#> # A tbl_sample: 309 × 14
#> # Weights: 20 [20, 20]
#> enterprise_id county region sector size_class employees revenue_millions
#> * <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_00010 Kiambu Kiambu Food & Bev… Medium 30 93.4
#> 2 KEN_00020 Kiambu Kiambu Food & Bev… Large 668 305.
#> 3 KEN_00023 Kiambu Kiambu Other Manu… Small 16 133.
#> 4 KEN_00051 Kiambu Kiambu Other Serv… Small 8 4.9
#> 5 KEN_00063 Kiambu Kiambu Other Serv… Small 10 10.9
#> 6 KEN_00065 Kiambu Kiambu Other Serv… Small 18 10.6
#> 7 KEN_00077 Kiambu Kiambu Other Serv… Small 8 2.7
#> 8 KEN_00103 Kiambu Kiambu Other Serv… Medium 40 28.5
#> 9 KEN_00117 Kiambu Kiambu Other Serv… Medium 60 47.9
#> 10 KEN_00146 Kiambu Kiambu Retail Small 11 26
#> # ℹ 299 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Bernoulli sampling with expected n (converted to frac = 500/N)
sampling_design() |>
draw(n = 500, method = "bernoulli") |>
execute(bfa_eas, seed = 42)
#> # A tbl_sample: 504 × 17
#> # Weights: 29.8 [29.8, 29.8]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_00261 Boucle … Bale Bagassi Rural 1128 136 24.0
#> 2 EA_00267 Boucle … Bale Bagassi Rural 1319 159 17.1
#> 3 EA_00465 Boucle … Bale Bana Rural 1381 181 3.79
#> 4 EA_02157 Boucle … Bale Boromo Rural 129 17 8.12
#> 5 EA_02170 Boucle … Bale Boromo Rural 537 70 45.7
#> 6 EA_12090 Boucle … Bale Siby Rural 1555 213 45.7
#> 7 EA_00377 Boucle … Banwa Balave Rural 1310 206 27.1
#> 8 EA_06871 Boucle … Banwa Kouka Rural 1169 135 0.99
#> 9 EA_11688 Boucle … Banwa Sanaba Rural 1844 268 14.0
#> 10 EA_11696 Boucle … Banwa Sanaba Rural 2060 299 3.13
#> # ℹ 494 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified with different sizes per stratum (data frame)
region_sizes <- data.frame(
region = levels(bfa_eas$region),
n = c(20, 12, 25, 18, 22, 16, 14, 15, 20, 18, 12, 10, 8)
)
sampling_design() |>
stratify_by(region) |>
draw(n = region_sizes) |>
execute(bfa_eas, seed = 123)
#> # A tbl_sample: 210 × 17
#> # Weights: 70.95 [43.43, 106]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_12416 Boucle … Banwa Solenzo Rural 1395 166 8.03
#> 2 EA_12873 Boucle … Banwa Tansila Rural 1164 148 39.1
#> 3 EA_11055 Boucle … Bale Pompoi Rural 1198 189 33.3
#> 4 EA_00767 Boucle … Kossi Barani Rural 1433 188 17.5
#> 5 EA_12078 Boucle … Bale Siby Rural 912 125 8.55
#> 6 EA_03217 Boucle … Mouhoun Dedoug… Rural 998 177 3.4
#> 7 EA_04525 Boucle … Nayala Gassan Rural 649 79 0.49
#> 8 EA_05964 Boucle … Sourou Kiemba… Rural 1075 171 12.7
#> 9 EA_14277 Boucle … Nayala Ye Rural 1323 233 10.9
#> 10 EA_14292 Boucle … Nayala Ye Rural 1030 182 18.1
#> # ℹ 200 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified with different rates per stratum (named vector)
sampling_design() |>
stratify_by(size_class) |>
draw(frac = c(Small = 0.02, Medium = 0.10, Large = 0.50)) |>
execute(ken_enterprises, seed = 42)
#> # A tbl_sample: 758 × 14
#> # Weights: 9 [2, 49.85]
#> enterprise_id county region sector size_class employees revenue_millions
#> * <chr> <fct> <fct> <fct> <fct> <int> <dbl>
#> 1 KEN_05526 Homa Bay Rest … Retail Small 8 18.8
#> 2 KEN_05090 Nyamira Rest … Other… Small 6 4.7
#> 3 KEN_02534 Nairobi Nairo… Retail Small 14 38.4
#> 4 KEN_02455 Nairobi Nairo… Retail Small 11 7.5
#> 5 KEN_02609 Nairobi Nairo… Retail Small 5 8.4
#> 6 KEN_06669 Uasin Gishu Uasin… Other… Small 11 16.1
#> 7 KEN_01498 Nairobi Nairo… Other… Small 8 13.4
#> 8 KEN_04590 Kisii Rest … Other… Small 10 11.2
#> 9 KEN_02509 Nairobi Nairo… Retail Small 8 19.3
#> 10 KEN_02684 Nairobi Nairo… Retail Small 16 23.7
#> # ℹ 748 more rows
#> # ℹ 7 more variables: year_established <int>, exporter <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Neyman allocation with minimum 2 per stratum (for variance estimation)
sampling_design() |>
stratify_by(region, alloc = "neyman", variance = bfa_eas_variance) |>
draw(n = 150, min_n = 2) |>
execute(bfa_eas, seed = 2026)
#> # A tbl_sample: 150 × 17
#> # Weights: 99.33 [72.27, 155.6]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_08652 Boucle … Kossi Nouna Rural 1393 181 17.6
#> 2 EA_10131 Boucle … Mouhoun Ouarko… Rural 1123 156 23.4
#> 3 EA_06881 Boucle … Banwa Kouka Rural 1229 142 9.3
#> 4 EA_04515 Boucle … Nayala Gassan Rural 1556 190 42.6
#> 5 EA_10374 Boucle … Bale Pa Rural 1150 162 82.3
#> 6 EA_13719 Boucle … Nayala Toma Rural 955 99 0.98
#> 7 EA_12390 Boucle … Banwa Solenzo Rural 2953 352 25.6
#> 8 EA_06874 Boucle … Banwa Kouka Rural 1723 198 17.7
#> 9 EA_02110 Boucle … Mouhoun Bondok… Rural 1413 216 16.8
#> 10 EA_11690 Boucle … Banwa Sanaba Rural 929 135 1.9
#> # ℹ 140 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Proportional allocation with min and max bounds
sampling_design() |>
stratify_by(region, alloc = "proportional") |>
draw(n = 200, min_n = 10, max_n = 50) |>
execute(bfa_eas, seed = 1)
#> # A tbl_sample: 200 × 17
#> # Weights: 74.5 [60.8, 78.05]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_10155 Boucle … Mouhoun Ouarko… Rural 1347 187 33.0
#> 2 EA_03955 Boucle … Kossi Doumba… Rural 1558 211 16.6
#> 3 EA_10325 Boucle … Bale Ouri Rural 767 92 17.6
#> 4 EA_03209 Boucle … Mouhoun Dedoug… Rural 446 79 18.1
#> 5 EA_12881 Boucle … Banwa Tansila Rural 1076 137 9.45
#> 6 EA_11613 Boucle … Banwa Sami Rural 912 118 43.2
#> 7 EA_06857 Boucle … Banwa Kouka Rural 1642 189 12.5
#> 8 EA_13730 Boucle … Nayala Toma Rural 973 101 0.88
#> 9 EA_05972 Boucle … Sourou Kiemba… Rural 1359 216 36.6
#> 10 EA_03571 Boucle … Kossi Djibas… Rural 725 86 1.01
#> # ℹ 190 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Control sorting with serpentine ordering (implicit stratification)
sampling_design() |>
draw(n = 100, method = "systematic",
control = serp(region, province)) |>
execute(bfa_eas, seed = 2)
#> # A tbl_sample: 100 × 17
#> # Weights: 149 [149, 149]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_00272 Boucle … Bale Bagassi Rural 950 114 1.01
#> 2 EA_11053 Boucle … Bale Pompoi Rural 1686 266 37.8
#> 3 EA_11702 Boucle … Banwa Sanaba Rural 1714 249 30.7
#> 4 EA_12885 Boucle … Banwa Tansila Rural 1050 134 9.63
#> 5 EA_03598 Boucle … Kossi Djibas… Rural 1039 123 9.6
#> 6 EA_08692 Boucle … Kossi Nouna Rural 1056 137 9.48
#> 7 EA_03201 Boucle … Mouhoun Dedoug… Rural 1223 217 17.8
#> 8 EA_12908 Boucle … Mouhoun Tcheri… Rural 1170 140 9.7
#> 9 EA_13998 Boucle … Nayala Yaba Rural 982 132 1.68
#> 10 EA_07196 Boucle … Sourou Lanfie… Rural 1310 191 23.3
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Control sorting with nested (standard) ordering
sampling_design() |>
draw(n = 100, method = "systematic",
control = c(region, province)) |>
execute(bfa_eas, seed = 3)
#> # A tbl_sample: 100 × 17
#> # Weights: 149 [149, 149]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_00270 Boucle … Bale Bagassi Rural 1080 130 37.0
#> 2 EA_11051 Boucle … Bale Pompoi Rural 1021 161 35.6
#> 3 EA_11700 Boucle … Banwa Sanaba Rural 1304 189 61.5
#> 4 EA_12883 Boucle … Banwa Tansila Rural 1305 166 32.8
#> 5 EA_03596 Boucle … Kossi Djibas… Rural 1273 151 1.05
#> 6 EA_08690 Boucle … Kossi Nouna Rural 1238 161 36.6
#> 7 EA_03199 Boucle … Mouhoun Dedoug… Rural 859 152 1.28
#> 8 EA_12906 Boucle … Mouhoun Tcheri… Rural 234 28 8.31
#> 9 EA_13996 Boucle … Nayala Yaba Rural 1051 141 17.8
#> 10 EA_07194 Boucle … Sourou Lanfie… Rural 1942 283 0.58
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Combined explicit stratification with control sorting within strata
sampling_design() |>
stratify_by(urban_rural) |>
draw(n = 50, method = "systematic",
control = serp(region, province)) |>
execute(bfa_eas, seed = 25)
#> # A tbl_sample: 100 × 17
#> # Weights: 149 [52.98, 245.02]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_04211 Boucle … Bale Fara Rural 1678 268 9.3
#> 2 EA_12357 Boucle … Banwa Solenzo Rural 908 108 17.5
#> 3 EA_03575 Boucle … Kossi Djibas… Rural 773 92 0.39
#> 4 EA_03126 Boucle … Mouhoun Dedoug… Rural 1146 203 34.6
#> 5 EA_12929 Boucle … Mouhoun Tcheri… Rural 1198 144 10.2
#> 6 EA_05978 Boucle … Sourou Kiemba… Rural 1322 210 71.5
#> 7 EA_08528 Cascades Leraba Nianko… Rural 1001 137 18.7
#> 8 EA_07747 Cascades Comoe Mangod… Rural 1201 144 31.3
#> 9 EA_12532 Cascades Comoe Soubak… Rural 1078 152 39.6
#> 10 EA_00907 Centre-… Kourite… Baskou… Rural 1506 245 26.3
#> # ℹ 90 more rows
#> # ℹ 9 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# PPS with certainty selection (absolute threshold)
# Large EAs selected with certainty, rest sampled with PPS
sampling_design() |>
stratify_by(region) |>
draw(n = 100, method = "pps_brewer", mos = households,
certainty_size = 800) |>
execute(bfa_eas, seed = 3)
#> # A tbl_sample: 1300 × 18
#> # Weights: 11.36 [1, 100.73]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_00377 Boucle … Banwa Balave Rural 1310 206 27.1
#> 2 EA_14000 Boucle … Nayala Yaba Rural 2711 365 45.8
#> 3 EA_03566 Boucle … Kossi Djibas… Rural 1891 224 8.26
#> 4 EA_12903 Boucle … Banwa Tansila Rural 2314 295 9.21
#> 5 EA_03187 Boucle … Mouhoun Dedoug… Rural 1685 299 73.7
#> 6 EA_03189 Boucle … Mouhoun Dedoug… Rural 1847 328 17.6
#> 7 EA_11054 Boucle … Bale Pompoi Rural 1427 225 1.24
#> 8 EA_12435 Boucle … Banwa Solenzo Rural 1567 187 26.1
#> 9 EA_03155 Boucle … Mouhoun Dedoug… Rural 1262 224 18.4
#> 10 EA_03218 Boucle … Mouhoun Dedoug… Rural 1160 206 1.34
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# PPS with certainty selection (proportional threshold)
# EAs with >= 10% of stratum total selected with certainty
sampling_design() |>
stratify_by(region) |>
draw(n = 100, method = "pps_systematic", mos = households,
certainty_prop = 0.10) |>
execute(bfa_eas, seed = 321)
#> # A tbl_sample: 1300 × 18
#> # Weights: 11.51 [1.64, 69.41]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_00260 Boucle … Bale Bagassi Rural 1815 218 3.88
#> 2 EA_00277 Boucle … Bale Bagassi Rural 1740 209 15.8
#> 3 EA_00469 Boucle … Bale Bana Rural 1608 210 1.51
#> 4 EA_02171 Boucle … Bale Boromo Rural 1644 214 15.3
#> 5 EA_02182 Boucle … Bale Boromo Rural 1597 208 1.89
#> 6 EA_04197 Boucle … Bale Fara Rural 1040 166 7.17
#> 7 EA_04208 Boucle … Bale Fara Rural 2214 354 21.3
#> 8 EA_04222 Boucle … Bale Fara Rural 2148 343 24.8
#> 9 EA_10324 Boucle … Bale Ouri Rural 2006 240 10.8
#> 10 EA_10340 Boucle … Bale Ouri Rural 2128 254 11.1
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>
# Stratum-specific certainty thresholds (data frame)
cert_thresholds <- data.frame(
region = levels(bfa_eas$region),
certainty_size = c(700, 450, 800, 850, 750, 800, 550,
450, 700, 950, 750, 600, 480)
)
sampling_design() |>
stratify_by(region) |>
draw(n = 100, method = "pps_brewer", mos = households,
certainty_size = cert_thresholds) |>
execute(bfa_eas, seed = 424)
#> # A tbl_sample: 1300 × 18
#> # Weights: 11.43 [1, 57.66]
#> ea_id region province commune urban_rural population households area_km2
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <int> <dbl>
#> 1 EA_14289 Boucle … Nayala Ye Rural 4131 729 36.6
#> 2 EA_13788 Boucle … Sourou Tougan Rural 2148 297 2.73
#> 3 EA_03594 Boucle … Kossi Djibas… Rural 737 87 0.49
#> 4 EA_00369 Boucle … Banwa Balave Rural 2049 323 36.5
#> 5 EA_12081 Boucle … Bale Siby Rural 2045 281 9.5
#> 6 EA_12339 Boucle … Banwa Solenzo Rural 1077 129 20.5
#> 7 EA_02136 Boucle … Mouhoun Bondok… Rural 750 114 33.7
#> 8 EA_06881 Boucle … Banwa Kouka Rural 1229 142 9.3
#> 9 EA_11533 Boucle … Mouhoun Safane Rural 684 97 0.89
#> 10 EA_03146 Boucle … Mouhoun Dedoug… Rural 1266 225 15.5
#> # ℹ 1,290 more rows
#> # ℹ 10 more variables: accessible <lgl>, dist_road_km <dbl>,
#> # food_insecurity_pct <dbl>, cost <dbl>, .weight <dbl>, .sample_id <int>,
#> # .stage <int>, .weight_1 <dbl>, .fpc_1 <int>, .certainty_1 <lgl>