draw() specifies how units are selected: sample size, sampling fraction,
selection method, and measure of size for PPS sampling. Every stage in a
sampling design must end with draw().
draw(
  .data,
  n = NULL,
  frac = NULL,
  min_n = NULL,
  max_n = NULL,
  method = "srswor",
  mos = NULL,
  round = "up",
  control = NULL,
  certainty_size = NULL,
  certainty_prop = NULL
)
A sampling_design object (piped from sampling_design(),
stratify_by(), or cluster_by()).
Sample size. Can be:
A scalar: applies per stratum (if no alloc) or as total (if alloc specified)
A named vector: stratum-specific sizes (for single stratification variable)
A data frame: stratum-specific sizes with stratification columns + n column
Sampling fraction. Can be:
A scalar: same fraction for all strata
A named vector: stratum-specific fractions
A data frame: stratum-specific fractions with stratification columns + frac column
Only one of n or frac should be specified.
Minimum sample size per stratum. When an allocation method
(e.g., Neyman, proportional) would assign fewer than min_n units to a
stratum, that stratum receives min_n units instead. The excess is
redistributed proportionally among strata that were above min_n.
Commonly set to 2 (minimum for variance estimation) or higher for
reliable subgroup estimates. Only applies when stratification with an
allocation method is used. Default is NULL (no minimum).
Maximum sample size per stratum. When an allocation method
would assign more than max_n units to a stratum, that stratum is
capped at max_n units. The surplus is redistributed proportionally
among strata that were below max_n. Useful for capping dominant strata
or managing operational constraints. Only applies when stratification
with an allocation method is used. Default is NULL (no maximum).
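The bounding rule described for min_n and max_n can be sketched in base R. This is only an illustration under stated assumptions: bound_allocation() is a hypothetical helper, not a function from the package; it works on real-valued allocations (no integer rounding); and it redistributes in proportion to the original allocation, which may differ from what the package does internally.

```r
# Hypothetical helper, not part of the package: clamp a real-valued
# allocation to [min_n, max_n] and rescale the remaining strata so the
# overall total is preserved.
bound_allocation <- function(alloc, min_n = -Inf, max_n = Inf) {
  total <- sum(alloc)
  out <- alloc
  fixed <- rep(FALSE, length(alloc))
  repeat {
    # Strata that are still free and currently violate a bound
    hit <- !fixed & (out < min_n | out > max_n)
    if (!any(hit)) break
    out[hit] <- pmin(pmax(out[hit], min_n), max_n)
    fixed <- fixed | hit
    if (all(fixed)) break
    # Share what is left among the free strata, proportionally to their
    # original allocation
    free_total <- total - sum(out[fixed])
    out[!fixed] <- alloc[!fixed] / sum(alloc[!fixed]) * free_total
  }
  out
}

alloc <- c(A = 1, B = 9, C = 40, D = 100)   # made-up unconstrained allocation
bounded <- bound_allocation(alloc, min_n = 2, max_n = 80)
bounded
sum(bounded)   # still 150
```

In this made-up example stratum A is raised to 2 and stratum D is capped at 80; B and C absorb the difference in proportion to their original sizes.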
Character string specifying the selection method. One of:
Equal probability methods:
"srswor" (default): Simple random sampling without replacement
"srswr": Simple random sampling with replacement
"systematic": Systematic (fixed interval) sampling
"bernoulli": Independent Bernoulli trials (random sample size)
PPS methods (require mos):
"pps_systematic": PPS systematic sampling
"pps_brewer": Generalized Brewer (Tillé) method
"pps_maxent": Maximum entropy / conditional Poisson
"pps_poisson": PPS Poisson sampling (random sample size)
"pps_multinomial": PPS multinomial (with replacement, any hit count)
"pps_chromy": Chromy's sequential PPS (minimum replacement)
<data-masking> Measure of size
variable for PPS methods. Required for all pps_* methods.
Rounding method when converting frac to sample sizes.
One of:
"up" (default): Round up (ceiling). Matches SAS SURVEYSELECT default.
"down": Round down (floor).
"nearest": Round to nearest integer (standard rounding).
This parameter only affects designs using frac to specify the sampling
rate. When n is specified directly, no rounding occurs.
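The effect of the three rounding options can be illustrated with a small base R sketch. frac_to_n() and its round_method argument are hypothetical names used only for this illustration, and the frame size of 3098 units is an assumed value.

```r
# Hypothetical helper, not part of the package: convert a sampling
# fraction into an integer sample size for a frame of N units.
frac_to_n <- function(N, frac, round_method = c("up", "down", "nearest")) {
  round_method <- match.arg(round_method)
  raw <- N * frac
  switch(round_method,
    up = ceiling(raw),
    down = floor(raw),
    # Base R round() rounds exact halves to even, which may differ from
    # the package's "nearest" rule on values ending in .5
    nearest = round(raw)
  )
}

frac_to_n(3098, 0.10, "up")       # 310
frac_to_n(3098, 0.10, "down")     # 309
frac_to_n(3098, 0.10, "nearest")  # 310
```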
<data-masking> Variables for
sorting the frame before selection. Control sorting provides implicit
stratification, which is particularly effective with systematic and
sequential sampling methods. Can be:
A single variable: control = region
Multiple variables: control = c(region, district)
With serp() for serpentine sorting: control = serp(region, district)
With dplyr::desc() for descending: control = c(region, desc(population))
Mixed: control = c(region, serp(district, commune), desc(size))
When stratification is also specified, control sorting is applied within each stratum. See the section "Control Sorting" below for details.
For PPS methods, units with MOS >= this value are selected with certainty (probability = 1). Can be:
A scalar: same threshold for all strata
A data frame: stratum-specific thresholds with stratification columns + certainty_size column
Certainty units are removed from the frame before probability sampling,
and the remaining sample size is reduced accordingly.
Mutually exclusive with certainty_prop.
Equivalent to SAS SURVEYSELECT CERTSIZE= option.
For PPS methods, units whose MOS proportion (MOS_i / sum(MOS)) >= this value are selected with certainty. Can be:
A scalar between 0 and 1 (exclusive): same threshold for all strata
A data frame: stratum-specific thresholds with stratification columns + certainty_prop column
Uses iterative selection: after removing certainty units, proportions are
recomputed and the check is repeated until no new units qualify.
Mutually exclusive with certainty_size.
Equivalent to SAS SURVEYSELECT CERTSIZE=P= option.
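The iterative check used for certainty_prop can be sketched as follows. certainty_by_prop() is a hypothetical helper written for this illustration, not the package's implementation, and it assumes strictly positive measures of size within a single stratum.

```r
# Hypothetical helper, not part of the package.
# Returns TRUE for units that end up as certainty selections.
# Assumes all mos values are strictly positive.
certainty_by_prop <- function(mos, prop) {
  certain <- rep(FALSE, length(mos))
  repeat {
    remaining_total <- sum(mos[!certain])
    # Share of each unit in the MOS total of the units still in the frame
    newly <- !certain & (mos / remaining_total >= prop)
    if (!any(newly)) break
    certain <- certain | newly
  }
  certain
}

mos <- c(600, 150, 50, 50, 50, 50, 50)   # made-up measures of size
certainty_by_prop(mos, prop = 0.30)
# Pass 1: 600 / 1000 = 0.60 qualifies; 150 / 1000 = 0.15 does not.
# Pass 2: 150 / 400 = 0.375 now qualifies.
# Pass 3: 50 / 250 = 0.20, nothing new, so the loop stops.
```

The second unit qualifies only after the first has been removed, which is why a single pass over the original proportions would not be enough.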
A modified sampling_design object with selection parameters specified.
Equal probability methods:
| Method | Replacement | Sample Size | Notes |
| srswor | Without | Fixed | Standard SRS |
| srswr | With | Fixed | Allows duplicates |
| systematic | Without | Fixed | Periodic selection |
| bernoulli | Without | Random | Each unit selected independently |
PPS methods:
| Method | Replacement | Sample Size | Notes |
| pps_systematic | Without | Fixed | Simple, some bias |
| pps_brewer | Without | Fixed | Fast, π_ij > 0 |
| pps_maxent | Without | Fixed | Highest entropy, π_ij available |
| pps_poisson | Without | Random | PPS analog of Bernoulli |
| pps_multinomial | With | Fixed | Any hit count, Hansen-Hurwitz |
| pps_chromy | Min. repl. | Fixed | SAS default PPS_SEQ |
| Method | n | frac | mos |
| srswor | ✓ | or ✓ | — |
| srswr | ✓ | or ✓ | — |
| systematic | ✓ | or ✓ | — |
| bernoulli | — | ✓ | — |
| pps_systematic | ✓ | or ✓ | ✓ |
| pps_brewer | ✓ | or ✓ | ✓ |
| pps_maxent | ✓ | — | ✓ |
| pps_poisson | — | ✓ | ✓ |
| pps_multinomial | ✓ | or ✓ | ✓ |
| pps_chromy | ✓ | or ✓ | ✓ |
Methods with fixed sample size (srswor, srswr, systematic, pps_systematic,
pps_brewer, pps_maxent, pps_multinomial, pps_chromy) accept either n or frac. When frac
is provided, the sample size is computed according to the round parameter (default: "up", i.e. ceiling).
Methods with random sample size (bernoulli, pps_poisson) require frac only.
These methods perform an independent selection trial for each unit, so the final sample
size is a random variable, not a fixed count. Specifying n would be misleading because
the method cannot guarantee exactly n selections.
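Why the sample size is random for these methods can be seen in a minimal base R sketch of Bernoulli selection. The frame size N is an assumed value and the code does not use the package; it only mimics the independent trials described above.

```r
set.seed(1234)
N <- 10000      # assumed frame size
frac <- 0.05    # per-unit selection probability

# One independent trial per unit: keep the unit if its draw falls below frac
selected <- runif(N) < frac

sum(selected)        # realized sample size, varies around N * frac = 500
weight <- 1 / frac   # every selected unit gets weight 1 / frac (here 20)
```

Rerunning with a different seed changes sum(selected) but not the weight, since the weight depends only on the selection probability.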
For stratum-specific sample sizes or rates, pass a data frame to n or frac.
The data frame must contain:
All stratification variable columns (matching those in stratify_by())
An n column (for sizes) or frac column (for rates)
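As a rough illustration of how such a table lines up with the frame, the following base R sketch looks up each stratum's size and draws a simple random sample within it. The frame, the sizes table, and the lookup logic are all made up for this example and do not reflect the package's internals.

```r
set.seed(1)

# Made-up frame with one stratification variable
frame <- data.frame(
  id = 1:30,
  facility_type = rep(c("Clinic", "Dispensary", "Hospital"), times = c(15, 10, 5))
)

# Stratum-specific sizes: stratification column + n column
sizes <- data.frame(
  facility_type = c("Clinic", "Dispensary", "Hospital"),
  n = c(5, 3, 2)
)

picked <- do.call(rbind, lapply(split(frame, frame$facility_type), function(stratum) {
  # Look up this stratum's sample size by matching the stratification column
  n_h <- sizes$n[sizes$facility_type == stratum$facility_type[1]]
  out <- stratum[sample.int(nrow(stratum), n_h), ]
  out$weight <- nrow(stratum) / n_h   # design weight N_h / n_h
  out
}))

table(picked$facility_type)
```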
In PPS sampling, very large units can have theoretical inclusion probabilities
exceeding 1. Certainty selection handles this by selecting such units with
probability 1 before sampling the remainder. The output includes a .certainty_k
column (where k is the stage number) indicating which units were certainty selections.
For stratum-specific thresholds, pass a data frame containing:
All stratification variable columns
A certainty_size or certainty_prop column
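To see how an inclusion probability can exceed 1, recall that the target first-order inclusion probability under PPS without replacement is n * MOS_i / sum(MOS). The sketch below uses made-up sizes and a single certainty pass; more passes may be needed in general, and none of this is the package's own code.

```r
mos <- c(600, 150, 50, 50, 50, 50, 50)   # made-up measures of size
n <- 3

# Target inclusion probabilities before any certainty handling
pik <- n * mos / sum(mos)
pik            # first unit: 3 * 600 / 1000 = 1.8, which is not a valid probability

# Take units with pik > 1 with certainty, then spread the remaining
# sample size over the remaining units
certain <- pik > 1
pik_final <- ifelse(
  certain,
  1,
  (n - sum(certain)) * mos / sum(mos[!certain])
)
pik_final      # 1, 0.75, then 0.25 for each of the five small units
sum(pik_final) # equals n
```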
Control sorting orders the sampling frame before selection, providing implicit
stratification. This is particularly effective with systematic and sequential
methods (systematic, pps_systematic, pps_chromy), where it ensures the
sample spreads evenly across the sorted variables.
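The spreading effect can be illustrated with a base R sketch of systematic selection on a sorted frame. The frame, its size, and the fractional-interval variant used here are assumptions made for the example; the package may implement systematic selection differently.

```r
set.seed(42)
N <- 1536   # assumed frame size
n <- 100

# Made-up frame with a single control variable
frame <- data.frame(
  id = seq_len(N),
  region = sample(c("North", "South", "East", "West"), N, replace = TRUE)
)

# Control sorting: order the frame before selection
frame <- frame[order(frame$region), ]

k <- N / n                            # sampling interval (may be fractional)
start <- runif(1, min = 0, max = k)   # random start inside the first interval
pos <- floor(start + k * (0:(n - 1))) + 1
sample_rows <- frame[pos, ]

# Sample counts per region track the frame counts closely
table(sample_rows$region)
round(table(frame$region) * n / N, 1)
```

Because the selected positions are evenly spaced along the sorted frame, each region receives a share of the sample close to its share of the frame.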
Serpentine vs Nested Sorting:
Nested (default): Standard ascending sort by each variable in order.
Use control = c(var1, var2, var3).
Serpentine: Alternating direction that minimizes "jumps" between
adjacent units. Use control = serp(var1, var2, var3).
Serpentine sorting makes nearby observations more similar by reversing direction at each hierarchy level. For geographic hierarchies, this means the last district of region 1 is adjacent to the last district of region 2.
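The difference between the two orderings can be shown for two variables with a short base R sketch. serpentine_order() is a hypothetical helper for illustration only; it is not serp() and handles just one outer and one inner variable.

```r
# Hypothetical helper, not part of the package: row order for a
# two-level serpentine sort (inner direction flips in every other
# outer group).
serpentine_order <- function(outer, inner) {
  outer_rank <- match(outer, sort(unique(outer)))
  inner_key <- xtfrm(inner)
  # Ascending inside odd-numbered outer groups, descending inside even ones
  key <- ifelse(outer_rank %% 2 == 1, inner_key, -inner_key)
  order(outer_rank, key)
}

frame <- data.frame(
  region = rep(c("A", "B", "C"), each = 3),
  district = rep(1:3, times = 3)
)

nested <- frame[order(frame$region, frame$district), ]
serpentine <- frame[serpentine_order(frame$region, frame$district), ]

nested$district      # 1 2 3 1 2 3 1 2 3
serpentine$district  # 1 2 3 3 2 1 1 2 3
```

In the serpentine result, district 3 of region A sits next to district 3 of region B, so adjacent rows differ in only one level of the hierarchy.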
Combining with Explicit Stratification:
When both stratify_by() and control are used, sorting is applied within
each stratum. This allows explicit stratification for variance control
combined with implicit stratification for sample spread.
sampling_design() for creating designs,
stratify_by() for stratification,
cluster_by() for clustering,
execute() for running designs,
serp() for serpentine sorting
# Simple random sample of 100 facilities
sampling_design() |>
  draw(n = 100) |>
  execute(kenya_health, seed = 1)
#> == tbl_sample ==
#> Weights: 30.98 - 30.98 (mean: 30.98 )
#>
#> # A tibble: 100 × 14
#> facility_id region county urban_rural facility_type beds staff_count
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 KE_16_0054 Eastern Meru Rural Health Centre 9 13
#> 2 KE_12_0043 Eastern Embu Rural Health Centre 9 11
#> 3 KE_28_0066 Rift Valley Baringo Rural County Hospi… 96 57
#> 4 KE_15_0048 Eastern Makueni Rural Dispensary 2 3
#> 5 KE_19_0002 North Eastern Garissa Rural Sub-County H… 57 15
#> 6 KE_11_0015 Coast Lamu Urban Health Centre 18 13
#> 7 KE_30_0075 Rift Valley Kericho Urban Clinic 2 7
#> 8 KE_04_0056 Central Nyanda… Urban Clinic 3 6
#> 9 KE_18_0081 Nairobi Nairobi Urban Health Centre 14 11
#> 10 KE_10_0004 Coast Tana R… Rural Dispensary 2 4
#> # ℹ 90 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Systematic sample of 10%
sampling_design() |>
  draw(frac = 0.10, method = "systematic") |>
  execute(kenya_health, seed = 123)
#> == tbl_sample ==
#> Weights: 9.99 - 9.99 (mean: 9.99 )
#>
#> # A tibble: 310 × 14
#> facility_id region county urban_rural facility_type beds staff_count
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 KE_01_0003 Central Kiambu Rural Clinic 4 5
#> 2 KE_01_0013 Central Kiambu Rural Clinic 2 8
#> 3 KE_01_0023 Central Kiambu Rural Dispensary 2 4
#> 4 KE_01_0033 Central Kiambu Rural Clinic 3 4
#> 5 KE_01_0043 Central Kiambu Urban Dispensary 2 3
#> 6 KE_01_0053 Central Kiambu Rural Dispensary 1 6
#> 7 KE_01_0063 Central Kiambu Rural Health Centre 16 13
#> 8 KE_02_0003 Central Kirinyaga Urban Dispensary 3 4
#> 9 KE_02_0013 Central Kirinyaga Urban Dispensary 1 5
#> 10 KE_02_0023 Central Kirinyaga Rural Dispensary 2 4
#> # ℹ 300 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# PPS sample of schools using enrollment
sampling_design() |>
  cluster_by(school_id) |>
  draw(n = 50, method = "pps_brewer", mos = enrollment) |>
  execute(tanzania_schools, seed = 42)
#> == tbl_sample ==
#> Weights: 15.14 - 135.75 (mean: 44.21 )
#>
#> # A tibble: 50 × 15
#> school_id region district school_level ownership enrollment n_teachers
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl>
#> 1 TZ_04_0011 Arusha Arusha … Primary Governme… 722 16
#> 2 TZ_04_0022 Arusha Arusha … Primary Governme… 396 8
#> 3 TZ_04_0091 Arusha Arusha … Primary Governme… 484 11
#> 4 TZ_05_0095 Arusha Arusha … Primary Private 148 4
#> 5 TZ_06_0078 Arusha Meru Primary Private 453 11
#> 6 TZ_07_0034 Arusha Monduli Primary Governme… 446 10
#> 7 TZ_07_0041 Arusha Monduli Primary Governme… 920 19
#> 8 TZ_01_0129 Dar es Sala… Ilala Primary Private 389 8
#> 9 TZ_02_0071 Dar es Sala… Kinondo… Primary Governme… 381 11
#> 10 TZ_03_0027 Dar es Sala… Temeke Primary Governme… 317 7
#> # ℹ 40 more rows
#> # ℹ 8 more variables: has_electricity <lgl>, has_water <lgl>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
# Bernoulli sampling (random sample size, expected ~5%)
sampling_design() |>
  draw(frac = 0.05, method = "bernoulli") |>
  execute(nigeria_business, seed = 1234)
#> == tbl_sample ==
#> Weights: 20 - 20 (mean: 20 )
#>
#> # A tibble: 563 × 12
#> enterprise_id zone state sector size_class employees annual_turnover .weight
#> * <chr> <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 NG_01_00039 Nort… Benue Servi… Micro 3 6228000 20
#> 2 NG_01_00113 Nort… Benue Hospi… Micro 3 8684000 20
#> 3 NG_01_00123 Nort… Benue Whole… Micro 3 5940000 20
#> 4 NG_01_00131 Nort… Benue Manuf… Small 13 41599000 20
#> 5 NG_07_00004 Nort… FCT … Whole… Small 6 35743000 20
#> 6 NG_07_00011 Nort… FCT … Servi… Small 7 11020000 20
#> 7 NG_07_00018 Nort… FCT … Const… Small 13 18812000 20
#> 8 NG_07_00020 Nort… FCT … Trans… Micro 4 6466000 20
#> 9 NG_07_00047 Nort… FCT … Servi… Large 1222 1548058000 20
#> 10 NG_07_00054 Nort… FCT … Servi… Medium 56 155155000 20
#> # ℹ 553 more rows
#> # ℹ 4 more variables: .sample_id <int>, .stage <int>, .weight_1 <dbl>,
#> # .fpc_1 <int>
# Stratified with different sizes per stratum (data frame)
facility_sizes <- data.frame(
  facility_type = c("Clinic", "Dispensary", "Health Centre",
                    "Sub-County Hospital", "County Hospital",
                    "Referral Hospital", "Maternity Home"),
  n = c(30, 40, 35, 25, 20, 10, 15)
)
sampling_design() |>
  stratify_by(facility_type) |>
  draw(n = facility_sizes) |>
  execute(kenya_health, seed = 123)
#> == tbl_sample ==
#> Weights: 3.3 - 35.5 (mean: 17.7 )
#>
#> # A tibble: 175 × 14
#> facility_type facility_id region county urban_rural beds staff_count
#> * <fct> <chr> <fct> <fct> <fct> <dbl> <dbl>
#> 1 Referral Hospital KE_37_0042 Western Busia Rural 175 123
#> 2 Referral Hospital KE_24_0041 Nyanza Kisumu Rural 240 101
#> 3 Referral Hospital KE_26_0065 Nyanza Nyami… Rural 305 153
#> 4 Referral Hospital KE_23_0059 Nyanza Kisii Rural 254 258
#> 5 Referral Hospital KE_03_0047 Central Muran… Urban 420 118
#> 6 Referral Hospital KE_17_0066 Eastern Thara… Rural 201 128
#> 7 Referral Hospital KE_26_0055 Nyanza Nyami… Rural 163 336
#> 8 Referral Hospital KE_28_0078 Rift Vall… Barin… Rural 372 146
#> 9 Referral Hospital KE_18_0051 Nairobi Nairo… Urban 419 221
#> 10 Referral Hospital KE_04_0035 Central Nyand… Rural 221 124
#> # ℹ 165 more rows
#> # ℹ 7 more variables: outpatient_visits <dbl>, ownership <fct>, .weight <dbl>,
#> # .sample_id <int>, .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Stratified with different rates per stratum (named vector)
sampling_design() |>
  stratify_by(size_class) |>
  draw(frac = c(Micro = 0.01, Small = 0.05, Medium = 0.20, Large = 0.50)) |>
  execute(nigeria_business, seed = 42)
#> == tbl_sample ==
#> Weights: 2 - 99.15 (mean: 14.72 )
#>
#> # A tibble: 789 × 12
#> size_class enterprise_id zone state sector employees annual_turnover .weight
#> * <fct> <chr> <fct> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 Micro NG_27_00194 Sout… Baye… Manuf… 3 7576000 99.2
#> 2 Micro NG_33_00528 Sout… Lagos Hospi… 2 4142000 99.2
#> 3 Micro NG_26_00035 Sout… Akwa… Servi… 4 7047000 99.2
#> 4 Micro NG_34_00596 Sout… Ogun Retai… 4 9600000 99.2
#> 5 Micro NG_15_00004 Nort… Kadu… Retai… 3 7316000 99.2
#> 6 Micro NG_17_00008 Nort… Kats… Trans… 2 5477000 99.2
#> 7 Micro NG_06_00045 Nort… Plat… Hospi… 4 10406000 99.2
#> 8 Micro NG_24_00141 Sout… Enugu Trans… 2 3140000 99.2
#> 9 Micro NG_34_00549 Sout… Ogun Manuf… 3 5295000 99.2
#> 10 Micro NG_35_00120 Sout… Ondo Hospi… 1 2816000 99.2
#> # ℹ 779 more rows
#> # ℹ 4 more variables: .sample_id <int>, .stage <int>, .weight_1 <dbl>,
#> # .fpc_1 <int>
# Neyman allocation with minimum 2 per stratum (for variance estimation)
sampling_design() |>
  stratify_by(region, alloc = "neyman", variance = niger_eas_variance) |>
  draw(n = 150, min_n = 2) |>
  execute(niger_eas, seed = 2026)
#> == tbl_sample ==
#> Weights: 7.29 - 13 (mean: 10.24 )
#>
#> # A tibble: 150 × 11
#> region ea_id department strata hh_count pop_estimate .weight .sample_id
#> * <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Agadez Aga_03_0002 Bilma Urban 279 1395 7.29 1
#> 2 Agadez Aga_03_0006 Bilma Rural 98 490 7.29 2
#> 3 Agadez Aga_04_0001 Tchirozér… Rural 87 609 7.29 3
#> 4 Agadez Aga_04_0008 Tchirozér… Rural 76 456 7.29 4
#> 5 Agadez Aga_04_0010 Tchirozér… Rural 37 185 7.29 5
#> 6 Agadez Aga_02_0014 Arlit Urban 142 710 7.29 6
#> 7 Agadez Aga_04_0007 Tchirozér… Urban 121 847 7.29 7
#> 8 Diffa Dif_06_0003 Mainé-Sor… Rural 56 448 13 8
#> 9 Diffa Dif_06_0015 Mainé-Sor… Rural 93 558 13 9
#> 10 Diffa Dif_08_0005 Bosso Rural 66 462 13 10
#> # ℹ 140 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Proportional allocation with min and max bounds
sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 200, min_n = 10, max_n = 50) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 5.1 - 8.03 (mean: 7.68 )
#>
#> # A tibble: 200 × 11
#> region ea_id department strata hh_count pop_estimate .weight .sample_id
#> * <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Agadez Aga_04_0012 Tchirozér… Urban 431 2586 5.1 1
#> 2 Agadez Aga_03_0010 Bilma Urban 259 1554 5.1 2
#> 3 Agadez Aga_01_0001 Agadez Rural 59 413 5.1 3
#> 4 Agadez Aga_02_0012 Arlit Rural 137 959 5.1 4
#> 5 Agadez Aga_01_0010 Agadez Urban 192 1344 5.1 5
#> 6 Agadez Aga_03_0009 Bilma Rural 41 287 5.1 6
#> 7 Agadez Aga_02_0005 Arlit Urban 138 966 5.1 7
#> 8 Agadez Aga_02_0011 Arlit Rural 70 350 5.1 8
#> 9 Agadez Aga_01_0007 Agadez Rural 166 1162 5.1 9
#> 10 Agadez Aga_04_0009 Tchirozér… Rural 107 856 5.1 10
#> # ℹ 190 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Control sorting with serpentine ordering (implicit stratification)
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = serp(region, department)) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 15.36 - 15.36 (mean: 15.36 )
#>
#> # A tibble: 100 × 11
#> ea_id region department strata hh_count pop_estimate .weight .sample_id
#> * <chr> <fct> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Aga_02_0002 Agadez Arlit Rural 63 315 15.4 1
#> 2 Aga_03_0003 Agadez Bilma Urban 154 770 15.4 2
#> 3 Aga_04_0008 Agadez Tchirozér… Rural 76 456 15.4 3
#> 4 Dif_07_0010 Diffa N'Guigmi Rural 65 325 15.4 4
#> 5 Dif_06_0008 Diffa Mainé-Sor… Rural 46 322 15.4 5
#> 6 Dif_05_0008 Diffa Diffa Rural 36 288 15.4 6
#> 7 Dif_08_0006 Diffa Bosso Urban 185 1480 15.4 7
#> 8 Dos_10_0006 Dosso Boboye Rural 39 312 15.4 8
#> 9 Dos_10_0021 Dosso Boboye Urban 162 810 15.4 9
#> 10 Dos_11_0002 Dosso Dogondout… Rural 43 258 15.4 10
#> # ℹ 90 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Control sorting with nested (standard) ordering
sampling_design() |>
  draw(n = 100, method = "systematic",
       control = c(region, department)) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 15.36 - 15.36 (mean: 15.36 )
#>
#> # A tibble: 100 × 11
#> ea_id region department strata hh_count pop_estimate .weight .sample_id
#> * <chr> <fct> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Aga_02_0002 Agadez Arlit Rural 63 315 15.4 1
#> 2 Aga_03_0003 Agadez Bilma Urban 154 770 15.4 2
#> 3 Aga_04_0008 Agadez Tchirozér… Rural 76 456 15.4 3
#> 4 Dif_08_0010 Diffa Bosso Rural 86 602 15.4 4
#> 5 Dif_05_0010 Diffa Diffa Rural 54 378 15.4 5
#> 6 Dif_06_0007 Diffa Mainé-Sor… Rural 53 318 15.4 6
#> 7 Dif_07_0008 Diffa N'Guigmi Rural 83 581 15.4 7
#> 8 Dos_10_0006 Dosso Boboye Rural 39 312 15.4 8
#> 9 Dos_10_0021 Dosso Boboye Urban 162 810 15.4 9
#> 10 Dos_11_0002 Dosso Dogondout… Rural 43 258 15.4 10
#> # ℹ 90 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# Combined explicit stratification with control sorting within strata
sampling_design() |>
  stratify_by(strata) |>
  draw(n = 50, method = "systematic",
       control = serp(region, department)) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 6.62 - 24.1 (mean: 15.36 )
#>
#> # A tibble: 100 × 11
#> strata ea_id region department hh_count pop_estimate .weight .sample_id
#> * <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Urban Aga_02_0001 Agadez Arlit 128 640 6.62 1
#> 2 Urban Aga_03_0002 Agadez Bilma 279 1395 6.62 2
#> 3 Urban Aga_04_0005 Agadez Tchirozér… 75 525 6.62 3
#> 4 Urban Dif_06_0010 Diffa Mainé-Sor… 119 714 6.62 4
#> 5 Urban Dos_10_0021 Dosso Boboye 162 810 6.62 5
#> 6 Urban Dos_12_0023 Dosso Gaya 101 707 6.62 6
#> 7 Urban Mar_20_0038 Maradi Tessaoua 204 1224 6.62 7
#> 8 Urban Mar_14_0026 Maradi Maradi 130 780 6.62 8
#> 9 Urban Mar_18_0022 Maradi Madarounfa 100 700 6.62 9
#> 10 Urban Mar_17_0010 Maradi Guidan-Ro… 120 720 6.62 10
#> # ℹ 90 more rows
#> # ℹ 3 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>
# PPS with certainty selection (absolute threshold)
# Large EAs selected with certainty, rest sampled with PPS
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = hh_count,
       certainty_size = 500) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 1 - 9.71 (mean: 2.17 )
#>
#> # A tibble: 716 × 12
#> region ea_id department strata hh_count pop_estimate .weight .sample_id
#> * <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Agadez Aga_01_0001 Agadez Rural 59 413 1 1
#> 2 Agadez Aga_01_0002 Agadez Urban 157 942 1 2
#> 3 Agadez Aga_01_0003 Agadez Urban 124 868 1 3
#> 4 Agadez Aga_01_0004 Agadez Rural 146 1022 1 4
#> 5 Agadez Aga_01_0005 Agadez Urban 112 896 1 5
#> 6 Agadez Aga_01_0006 Agadez Rural 182 1092 1 6
#> 7 Agadez Aga_01_0007 Agadez Rural 166 1162 1 7
#> 8 Agadez Aga_01_0008 Agadez Urban 54 432 1 8
#> 9 Agadez Aga_01_0009 Agadez Rural 97 582 1 9
#> 10 Agadez Aga_01_0010 Agadez Urban 192 1344 1 10
#> # ℹ 706 more rows
#> # ℹ 4 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
# PPS with certainty selection (proportional threshold)
# EAs with >= 10% of stratum total selected with certainty
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_systematic", mos = hh_count,
       certainty_prop = 0.10) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 1 - 13.19 (mean: 2.14 )
#>
#> # A tibble: 716 × 12
#> region ea_id department strata hh_count pop_estimate .weight .sample_id
#> * <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Agadez Aga_01_0001 Agadez Rural 59 413 1 1
#> 2 Agadez Aga_01_0002 Agadez Urban 157 942 1 2
#> 3 Agadez Aga_01_0003 Agadez Urban 124 868 1 3
#> 4 Agadez Aga_01_0004 Agadez Rural 146 1022 1 4
#> 5 Agadez Aga_01_0005 Agadez Urban 112 896 1 5
#> 6 Agadez Aga_01_0006 Agadez Rural 182 1092 1 6
#> 7 Agadez Aga_01_0007 Agadez Rural 166 1162 1 7
#> 8 Agadez Aga_01_0008 Agadez Urban 54 432 1 8
#> 9 Agadez Aga_01_0009 Agadez Rural 97 582 1 9
#> 10 Agadez Aga_01_0010 Agadez Urban 192 1344 1 10
#> # ℹ 706 more rows
#> # ℹ 4 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>
# Stratum-specific certainty thresholds (data frame)
cert_thresholds <- data.frame(
  region = c("Agadez", "Diffa", "Dosso", "Maradi",
             "Niamey", "Tahoua", "Tillaberi", "Zinder"),
  certainty_size = c(1000, 500, 600, 700, 300, 800, 650, 750)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = 100, method = "pps_brewer", mos = hh_count,
       certainty_size = cert_thresholds) |>
  execute(niger_eas, seed = 42)
#> == tbl_sample ==
#> Weights: 1 - 9.71 (mean: 2.17 )
#>
#> # A tibble: 716 × 12
#> region ea_id department strata hh_count pop_estimate .weight .sample_id
#> * <fct> <chr> <fct> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Agadez Aga_01_0001 Agadez Rural 59 413 1 1
#> 2 Agadez Aga_01_0002 Agadez Urban 157 942 1 2
#> 3 Agadez Aga_01_0003 Agadez Urban 124 868 1 3
#> 4 Agadez Aga_01_0004 Agadez Rural 146 1022 1 4
#> 5 Agadez Aga_01_0005 Agadez Urban 112 896 1 5
#> 6 Agadez Aga_01_0006 Agadez Rural 182 1092 1 6
#> 7 Agadez Aga_01_0007 Agadez Rural 166 1162 1 7
#> 8 Agadez Aga_01_0008 Agadez Urban 54 432 1 8
#> 9 Agadez Aga_01_0009 Agadez Rural 97 582 1 9
#> 10 Agadez Aga_01_0010 Agadez Urban 192 1344 1 10
#> # ℹ 706 more rows
#> # ℹ 4 more variables: .stage <int>, .weight_1 <dbl>, .fpc_1 <int>,
#> # .certainty_1 <lgl>