draw() specifies how units are selected: sample size, sampling fraction,
selection method, and measure of size for PPS sampling. Every stage in a
sampling design must end with draw().
draw(
.data,
n = NULL,
frac = NULL,
min_n = NULL,
max_n = NULL,
method = "srswor",
mos = NULL
)
A sampling_design object (piped from sampling_design(),
stratify_by(), or cluster_by()).
Sample size. Can be:
A scalar: applies per stratum (if no alloc) or as total (if alloc specified)
A named vector: stratum-specific sizes (for single stratification variable)
A data frame: stratum-specific sizes with stratification columns + n column
Sampling fraction. Can be:
A scalar: same fraction for all strata
A named vector: stratum-specific fractions
A data frame: stratum-specific fractions with stratification columns + frac column
Only one of n or frac should be specified.
Minimum sample size per stratum. When an allocation method
(e.g., Neyman, proportional) would assign fewer than min_n units to a
stratum, that stratum receives min_n units instead. The excess is
redistributed proportionally among strata that were above min_n.
Commonly set to 2 (minimum for variance estimation) or higher for
reliable subgroup estimates. Only applies when stratification with an
allocation method is used. Default is NULL (no minimum).
Maximum sample size per stratum. When an allocation method
would assign more than max_n units to a stratum, that stratum is
capped at max_n units. The surplus is redistributed proportionally
among strata that were below max_n. Useful for capping dominant strata
or managing operational constraints. Only applies when stratification
with an allocation method is used. Default is NULL (no maximum).
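The bounding rule described for min_n and max_n can be sketched in base R. This only illustrates the arithmetic and is not the package's implementation: the helper name bound_allocation() is hypothetical, both bounds are handled in one loop, and rounding to whole units is left out.

```r
# Hypothetical helper: clamp a raw allocation to [min_n, max_n] and
# redistribute the difference proportionally among the unbounded strata.
bound_allocation <- function(alloc, min_n = NULL, max_n = NULL) {
  total <- sum(alloc)
  lo <- if (is.null(min_n)) 0 else min_n
  hi <- if (is.null(max_n)) Inf else max_n
  fixed <- rep(FALSE, length(alloc))  # strata already pinned to a bound
  out <- alloc
  repeat {
    too_low  <- !fixed & out < lo
    too_high <- !fixed & out > hi
    if (!any(too_low | too_high)) break
    out[too_low]  <- lo
    out[too_high] <- hi
    fixed <- fixed | too_low | too_high
    if (all(fixed)) break
    # Share what is left among the free strata, in proportion to
    # their original allocation.
    free <- !fixed
    remaining <- total - sum(out[fixed])
    out[free] <- remaining * alloc[free] / sum(alloc[free])
  }
  out
}

# Made-up raw allocation of 500 units across four strata
raw <- c(North = 1.2, South = 310.8, East = 120, West = 68)
bounded <- bound_allocation(raw, min_n = 2, max_n = 300)
print(bounded)
```

In this example North is raised to the minimum, South is capped at the maximum, and East and West absorb the difference so the total stays at 500. The sketch does not check whether the bounds are feasible for the requested total.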
Character string specifying the selection method. One of:
Equal probability methods:
"srswor" (default): Simple random sampling without replacement
"srswr": Simple random sampling with replacement
"systematic": Systematic (fixed interval) sampling
"bernoulli": Independent Bernoulli trials (random sample size)
PPS methods (require mos):
"pps_systematic": PPS systematic sampling
"pps_brewer": Generalized Brewer (Tillé) method
"pps_maxent": Maximum entropy / conditional Poisson
"pps_poisson": PPS Poisson sampling (random sample size)
"pps_multinomial": PPS multinomial (with replacement)
<data-masking> Measure of size
variable for PPS methods. Required for all pps_* methods.
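As a rough illustration of what the measure of size does, the textbook first-order inclusion probability for a fixed-size PPS design without replacement is n * mos / sum(mos). The sketch below computes it in base R on made-up data; the schools data frame and its columns are assumptions for the example, and this page does not describe how the package treats units whose value would exceed 1.

```r
# Made-up frame of five schools with an enrollment measure of size
schools <- data.frame(
  school_id  = 1:5,
  enrollment = c(120, 300, 80, 450, 50)
)
n <- 2  # number of schools to select

# Inclusion probability proportional to enrollment
pik <- n * schools$enrollment / sum(schools$enrollment)
names(pik) <- schools$school_id
print(pik)

# For a fixed-size design the probabilities sum to the sample size
print(sum(pik))
```

Larger schools get proportionally larger selection probabilities, which is why every pps_* method needs mos.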
A modified sampling_design object with selection parameters specified.
| Method | n | frac | mos |
|---|---|---|---|
| srswor | ✓ | or ✓ | — |
| srswr | ✓ | or ✓ | — |
| systematic | ✓ | or ✓ | — |
| bernoulli | — | ✓ | — |
| pps_systematic | ✓ | or ✓ | ✓ |
| pps_brewer | ✓ | or ✓ | ✓ |
| pps_maxent | ✓ | — | ✓ |
| pps_poisson | — | ✓ | ✓ |
| pps_multinomial | ✓ | or ✓ | ✓ |
Methods with fixed sample size (srswor, srswr, systematic, pps_systematic,
pps_brewer, pps_maxent, pps_multinomial) accept either n or frac. When frac
is provided, the sample size is computed as ceiling(N * frac).
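A short worked example of the ceiling(N * frac) rule, assuming N is the number of units in each stratum (or in the whole frame when there is no stratification). The stratum sizes are made up.

```r
# Made-up stratum population sizes
N <- c(North = 1234, South = 2055, East = 875, West = 991)
frac <- 0.10

# Sample size per stratum, rounded up to a whole unit
n <- ceiling(N * frac)
print(n)
```

Because the result is rounded up, any positive fraction yields at least one unit from a non-empty stratum, and the realized rate can be slightly above frac.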
Methods with random sample size (bernoulli, pps_poisson) require frac only.
These methods perform independent selection trials for each unit, so the final sample
size is a random variable, not a fixed count. Specifying n would be misleading since
the method cannot guarantee exactly n selections.
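The random sample size can be seen in a small base R simulation of independent Bernoulli trials. This is a generic sketch of the technique, not the package's code; N, frac, and the number of replications are arbitrary.

```r
set.seed(42)
N <- 1000     # frame size
frac <- 0.05  # selection probability for every unit

# One Bernoulli sample: each unit is selected independently
selected <- runif(N) < frac
print(sum(selected))

# Repeating the draw shows the size varies around N * frac
sizes <- replicate(200, sum(runif(N) < frac))
print(range(sizes))
print(mean(sizes))
```

The expected size is N * frac, but individual samples come out smaller or larger, which is why these methods take frac rather than n.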
For stratum-specific sample sizes or rates, pass a data frame to n or frac.
The data frame must contain:
All stratification variable columns (matching those in stratify_by())
An n column (for sizes) or frac column (for rates)
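A sketch of building such a data frame for two stratification variables, with a base R check that every stratum in the frame has a size. The frame, the urban column, and the stratify_by(region, urban) call in the final comment are assumptions for illustration.

```r
# Made-up frame with two stratification variables
frame <- data.frame(
  id     = 1:8,
  region = rep(c("North", "South"), each = 4),
  urban  = rep(c("urban", "rural"), times = 4)
)

# One row per stratum: all stratification columns plus an n column
sizes_df <- data.frame(
  region = c("North", "North", "South", "South"),
  urban  = c("urban", "rural", "urban", "rural"),
  n      = c(2, 1, 2, 1)
)

# Check that every stratum present in the frame is covered
strata <- unique(frame[, c("region", "urban")])
check <- merge(strata, sizes_df, by = c("region", "urban"), all.x = TRUE)
missing_strata <- check[is.na(check$n), ]
print(nrow(missing_strata))

# Assumed usage, not run here:
# sampling_design() |>
#   stratify_by(region, urban) |>
#   draw(n = sizes_df)
```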
sampling_design() for creating designs,
stratify_by() for stratification,
cluster_by() for clustering,
execute() for running designs
if (FALSE) { # \dontrun{
# Simple random sample
sampling_design() |>
  draw(n = 100) |>
  execute(frame, seed = 42)

# Systematic sample of 10%
sampling_design() |>
  draw(frac = 0.10, method = "systematic") |>
  execute(frame, seed = 42)

# PPS sample
sampling_design() |>
  cluster_by(school_id) |>
  draw(n = 50, method = "pps_brewer", mos = enrollment) |>
  execute(school_frame, seed = 42)

# Bernoulli sampling (random sample size)
sampling_design() |>
  draw(frac = 0.05, method = "bernoulli") |>
  execute(frame, seed = 42)

# Stratified with different sizes per stratum (data frame)
sizes_df <- data.frame(
  region = c("North", "South", "East", "West"),
  n = c(100, 200, 150, 100)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = sizes_df) |>
  execute(frame, seed = 42)

# Stratified with different rates per stratum (named vector)
sampling_design() |>
  stratify_by(region) |>
  draw(frac = c(North = 0.1, South = 0.2, East = 0.15, West = 0.1)) |>
  execute(frame, seed = 42)

# Neyman allocation with minimum 2 per stratum (for variance estimation)
sampling_design() |>
  stratify_by(region, alloc = "neyman", variance = var_df) |>
  draw(n = 500, min_n = 2) |>
  execute(frame, seed = 42)

# Proportional allocation with min and max bounds
sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 1000, min_n = 20, max_n = 300) |>
  execute(frame, seed = 42)
} # }