draw() specifies how units are selected: sample size, sampling fraction, selection method, and measure of size for PPS sampling. Every stage in a sampling design must end with draw().

draw(
  .data,
  n = NULL,
  frac = NULL,
  min_n = NULL,
  max_n = NULL,
  method = "srswor",
  mos = NULL
)

Arguments

.data

A sampling_design object (piped from sampling_design(), stratify_by(), or cluster_by()).

n

Sample size. Can be:

  • A scalar: the sample size per stratum when no alloc is set, or the total sample size when alloc is specified

  • A named vector: stratum-specific sizes (for single stratification variable)

  • A data frame: stratum-specific sizes with stratification columns + n column

frac

Sampling fraction. Can be:

  • A scalar: same fraction for all strata

  • A named vector: stratum-specific fractions

  • A data frame: stratum-specific fractions with stratification columns + frac column

Only one of n or frac should be specified.

min_n

Minimum sample size per stratum. When an allocation method (e.g., Neyman, proportional) would assign fewer than min_n units to a stratum, that stratum receives min_n units instead. The excess is redistributed proportionally among strata that were above min_n. Commonly set to 2 (minimum for variance estimation) or higher for reliable subgroup estimates. Only applies when stratification with an allocation method is used. Default is NULL (no minimum).

max_n

Maximum sample size per stratum. When an allocation method would assign more than max_n units to a stratum, that stratum is capped at max_n units. The surplus is redistributed proportionally among strata that were below max_n. Useful for capping dominant strata or managing operational constraints. Only applies when stratification with an allocation method is used. Default is NULL (no maximum).
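
The bounding described for min_n and max_n can be sketched in base R. This is an illustration under assumptions, not the package's implementation: bound_allocation() is a hypothetical helper, the stratum counts are made up, and the package's actual redistribution and rounding rules may differ. The sketch works on real-valued allocations and only preserves the total while at least one stratum remains unclamped.

```r
# Hypothetical helper (not part of the package): clamp an allocation to
# [min_n, max_n]. Once a stratum is clamped it stays clamped; the remaining
# total is shared among unclamped strata in proportion to their allocation.
bound_allocation <- function(alloc, min_n = 0, max_n = Inf) {
  total <- sum(alloc)
  fixed <- rep(FALSE, length(alloc))
  for (i in seq_along(alloc)) {
    low  <- !fixed & alloc < min_n
    high <- !fixed & alloc > max_n
    if (!any(low | high)) break
    alloc[low]  <- min_n
    alloc[high] <- max_n
    fixed <- fixed | low | high
    free <- !fixed
    remaining <- total - sum(alloc[fixed])
    alloc[free] <- alloc[free] / sum(alloc[free]) * remaining
  }
  alloc
}

# Made-up proportional allocation of n = 1000 across four strata
alloc <- c(North = 5, South = 500, East = 300, West = 195)
bounded <- bound_allocation(alloc, min_n = 20, max_n = 450)
```

Here North is raised to 20, South is capped at 450, and East and West share the remaining 530 units in proportion to their original allocations.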

method

Character string specifying the selection method. One of:

Equal probability methods:

  • "srswor" (default): Simple random sampling without replacement

  • "srswr": Simple random sampling with replacement

  • "systematic": Systematic (fixed interval) sampling

  • "bernoulli": Independent Bernoulli trials (random sample size)

PPS methods (require mos):

  • "pps_systematic": PPS systematic sampling

  • "pps_brewer": Generalized Brewer (Tillé) method

  • "pps_maxent": Maximum entropy / conditional Poisson

  • "pps_poisson": PPS Poisson sampling (random sample size)

  • "pps_multinomial": PPS multinomial (with replacement)

mos

<data-masking> Measure of size variable for PPS methods. Required for all pps_* methods.

Value

A modified sampling_design object with selection parameters specified.

Details

Selection Methods

Equal Probability Methods

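| Method     | Replacement | Sample Size | Notes                            |
|------------|-------------|-------------|----------------------------------|
| srswor     | Without     | Fixed       | Standard SRS                     |
| srswr      | With        | Fixed       | Allows duplicates                |
| systematic | Without     | Fixed       | Periodic selection               |
| bernoulli  | Without     | Random      | Each unit selected independently |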

PPS Methods

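| Method          | Replacement | Sample Size | Notes                           |
|-----------------|-------------|-------------|---------------------------------|
| pps_systematic  | Without     | Fixed       | Simple, some bias               |
| pps_brewer      | Without     | Fixed       | Fast, π_ij > 0                  |
| pps_maxent      | Without     | Fixed       | Highest entropy, π_ij available |
| pps_poisson     | Without     | Random      | PPS analog of Bernoulli         |
| pps_multinomial | With        | Fixed       | Allows duplicates               |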
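
To make the role of mos concrete, the following base R sketch shows one textbook form of PPS systematic selection. It is an illustration under assumptions, not the package's code: pps_systematic_sketch() is a hypothetical helper, the enrollment values are made up, and units whose target inclusion probability would exceed 1 (certainty units) are not handled.

```r
# Hypothetical helper, not part of the package: PPS systematic selection.
pps_systematic_sketch <- function(mos, n) {
  pik <- n * mos / sum(mos)        # target first-order inclusion probabilities
  stopifnot(all(pik <= 1))         # this sketch does not handle certainty units
  upper <- cumsum(pik)             # upper end of each unit's interval on [0, n]
  points <- runif(1) + 0:(n - 1)   # random start, then unit-spaced points
  idx <- findInterval(points, upper) + 1
  pmin(idx, length(mos))           # guard against floating-point overshoot
}

set.seed(42)
enrollment <- c(120, 300, 80, 350, 200, 250)  # made-up measure of size
selected <- pps_systematic_sketch(enrollment, n = 3)
```

Each unit occupies an interval whose length equals its inclusion probability, so larger units are more likely to contain one of the equally spaced selection points.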

Parameter Requirements

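| Method          | n | frac | mos |
|-----------------|---|------|-----|
| srswor          | ✓ | or ✓ |     |
| srswr           | ✓ | or ✓ |     |
| systematic      | ✓ | or ✓ |     |
| bernoulli       |   | ✓    |     |
| pps_systematic  | ✓ | or ✓ | ✓   |
| pps_brewer      | ✓ | or ✓ | ✓   |
| pps_maxent      | ✓ | or ✓ | ✓   |
| pps_poisson     |   | ✓    | ✓   |
| pps_multinomial | ✓ | or ✓ | ✓   |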

Fixed vs Random Sample Size Methods

Methods with fixed sample size (srswor, srswr, systematic, pps_systematic, pps_brewer, pps_maxent, pps_multinomial) accept either n or frac. When frac is provided, the sample size is computed as ceiling(N * frac).
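
As a quick illustration of the ceiling(N * frac) rule, the base R snippet below computes per-stratum sample sizes from a common fraction. The stratum population counts are made up for the example.

```r
# Made-up stratum population sizes
N <- c(North = 1234, South = 833, East = 417, West = 97)
frac <- 0.10

# Sample size per stratum, rounded up as described above
n_h <- ceiling(N * frac)
n_h
```

Because of the rounding up, the realized overall fraction sum(n_h) / sum(N) is slightly above frac.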

Methods with random sample size (bernoulli, pps_poisson) require frac only. These methods perform an independent selection trial for each unit, so the final sample size is a random variable rather than a fixed count. Specifying n would be misleading because the method cannot guarantee exactly n selections.
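
The effect of independent trials on the sample size can be seen with a small base R simulation. This is a sketch of Bernoulli selection in general, not the package's internals; the frame size and number of replications are arbitrary.

```r
set.seed(42)
N <- 1000      # made-up frame size
frac <- 0.05   # selection probability for every unit

# Repeat Bernoulli selection 200 times and record each realized sample size
sizes <- replicate(200, sum(runif(N) < frac))

range(sizes)   # realized sizes vary from draw to draw
mean(sizes)    # close to the expected size N * frac
```

The realized size fluctuates around N * frac, which is why these methods take a fraction rather than a count.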

Custom Allocation with Data Frames

For stratum-specific sample sizes or rates, pass a data frame to n or frac. The data frame must contain:

  • All stratification variable columns (matching those in stratify_by())

  • An n column (for sizes) or frac column (for rates)
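
A minimal sketch of building such a data frame for two stratification variables follows. The frame, the column names region and urban_rural, and the rates are hypothetical, and the call stratify_by(region, urban_rural) assumes that several stratification variables can be passed this way. The pipeline is wrapped in if (FALSE) because it needs the package and a real frame.

```r
# Hypothetical frame with two stratification variables
frame <- data.frame(
  region = rep(c("North", "South"), each = 4),
  urban_rural = rep(c("Urban", "Rural"), times = 4)
)

# One row per stratum, then attach a stratum-specific rate
rates_df <- unique(frame[c("region", "urban_rural")])
rates_df$frac <- ifelse(rates_df$urban_rural == "Urban", 0.10, 0.20)

# Every stratum should appear exactly once
stopifnot(!anyDuplicated(rates_df[c("region", "urban_rural")]))

if (FALSE) {  # requires the package; shown for orientation only
  sampling_design() |>
    stratify_by(region, urban_rural) |>
    draw(frac = rates_df) |>
    execute(frame, seed = 42)
}
```

Deriving the strata from the frame itself, rather than typing them by hand, helps avoid mismatched or missing stratum labels.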

See also

sampling_design() for creating designs, stratify_by() for stratification, cluster_by() for clustering, execute() for running designs

Examples

if (FALSE) { # \dontrun{
# Simple random sample
sampling_design() |>
  draw(n = 100) |>
  execute(frame, seed = 42)

# Systematic sample of 10%
sampling_design() |>
  draw(frac = 0.10, method = "systematic") |>
  execute(frame, seed = 42)

# PPS sample
sampling_design() |>
  cluster_by(school_id) |>
  draw(n = 50, method = "pps_brewer", mos = enrollment) |>
  execute(school_frame, seed = 42)

# Bernoulli sampling (random sample size)
sampling_design() |>
  draw(frac = 0.05, method = "bernoulli") |>
  execute(frame, seed = 42)

# Stratified with different sizes per stratum (data frame)
sizes_df <- data.frame(
  region = c("North", "South", "East", "West"),
  n = c(100, 200, 150, 100)
)
sampling_design() |>
  stratify_by(region) |>
  draw(n = sizes_df) |>
  execute(frame, seed = 42)

# Stratified with different rates per stratum (named vector)
sampling_design() |>
  stratify_by(region) |>
  draw(frac = c(North = 0.1, South = 0.2, East = 0.15, West = 0.1)) |>
  execute(frame, seed = 42)

# Neyman allocation with minimum 2 per stratum (for variance estimation)
sampling_design() |>
  stratify_by(region, alloc = "neyman", variance = var_df) |>
  draw(n = 500, min_n = 2) |>
  execute(frame, seed = 42)

# Proportional allocation with min and max bounds
sampling_design() |>
  stratify_by(region, alloc = "proportional") |>
  draw(n = 1000, min_n = 20, max_n = 300) |>
  execute(frame, seed = 42)
} # }