Distribute a total sample size across strata defined by a single stratification variable, under a fixed total \(n\), target CV, or budget. When the design uses multiple stratification variables (e.g. region and urbanicity), cross them into a single variable beforehand so that each row of frame represents one unique stratum.

Usage

n_alloc(frame, ...)

# Default S3 method
n_alloc(
  frame,
  domains = NULL,
  n = NULL,
  cv = NULL,
  budget = NULL,
  alloc = c("neyman", "optimal", "proportional", "power"),
  unit_cost = NULL,
  alpha = 0.05,
  deff = 1,
  resp_rate = 1,
  min_n = NULL,
  power_q = 0.5,
  plan = NULL,
  ...
)

# S3 method for class 'svyplan_prec'
n_alloc(frame, n = NULL, cv = NULL, budget = NULL, ...)

Arguments

frame

For the default method: a stratum-level data frame describing the population you want to sample. Each row represents one stratum, a subgroup of the population defined by a stratification variable such as region, age group, or urbanicity. The values in this frame typically come from a census, a population register, or a previous survey.

When a design stratifies by several variables at once (e.g. region \(\times\) urbanicity), cross them into a single variable before calling n_alloc (e.g. with interaction()) so that each row maps to exactly one population cell.

Required columns:

N

Number of units (e.g. households, individuals) in each stratum. These are population counts, not sample sizes. Must be positive and finite.

sd or var

A measure of how spread out the variable of interest is within each stratum. Provide exactly one:

  • sd: the stratum standard deviation (\(\sqrt{\text{variance}}\)), or

  • var: the stratum variance.

Values must be non-negative and finite. When all strata have equal variability (or variability is unknown), a constant column (e.g. sd = 1) makes the default Neyman rule reduce to proportional-to-size allocation.

Optional columns:

stratum

A label identifying each stratum (e.g. "Urban", "Rural"). If omitted, row numbers are used. Must be unique, or unique within each domain when domains is set.

mean or p

The stratum population mean or proportion of the variable of interest. Required when solving for cv, because the coefficient of variation is defined relative to the mean. Use mean for continuous variables and p (in \([0, 1]\)) for binary (yes/no) variables.

cost

Per-unit interviewing cost in each stratum (positive, finite). Set higher values for strata that are more expensive to reach. Defaults to 1 everywhere (equal cost).

max_weight

Maximum allowed sampling weight \(N_h / n_h\). Caps how under-represented a stratum can be. Use NA for strata without a cap.

take_all

Logical (or 0/1). If TRUE, every unit in the stratum is included, which makes it a census stratum. Useful for strata small enough to enumerate completely.

For svyplan_prec objects: a precision result from prec_alloc().

...

Additional arguments passed to methods.

domains

Character vector of column names in frame to treat as domain identifiers, or NULL (default) for no domains. All names must exist in frame. Domains define sub-populations that each contain one or more strata. When cv is the target, precision is enforced within every domain (see Details).

n

Total sample size. Specify exactly one of n, cv, or budget.

cv

Target coefficient of variation (relative standard error). For example, cv = 0.05 means the standard error of the estimated population mean or total should be at most 5 percent of the estimate. Requires mean or p in frame. When domains is set, this target is enforced within each domain. Specify exactly one of n, cv, or budget.

budget

Total field budget. Specify exactly one of n, cv, or budget.

alloc

Allocation rule: "neyman" (default), "optimal", "proportional", or "power".

unit_cost

Optional scalar or length-nrow(frame) vector of per-stratum unit costs, overriding frame$cost.

alpha

Significance level used for confidence-based quantities such as the margin of error. Default 0.05 (95% confidence).

deff

Design effect multiplier (> 0). Default 1.

resp_rate

Expected response rate, in (0, 1]. Default 1.

min_n

Optional minimum sample size per stratum.

power_q

Bankier power parameter in [0, 1], used only when alloc = "power". Default 0.5.

plan

Optional svyplan() object providing design defaults.

Value

A svyplan_n object with type = "alloc" and a stratum-level allocation table in $detail.

Details

Building the frame

The frame is a data frame where each row is one stratum of your target population. It summarizes what you know about each subgroup before sampling. A typical workflow:

  1. Identify strata from a census or register (e.g. provinces, urban/rural areas, age groups).

  2. Look up N: the population count per stratum.

  3. Estimate sd: the standard deviation of your key variable within each stratum (from a pilot survey, a previous census, or expert judgement). If unknown, set sd = 1 everywhere; the default Neyman rule then reduces to proportional allocation.

  4. Add mean or p if you want to solve for a target CV.

A minimal frame:

frame <- data.frame(
  stratum = c("Urban", "Rural"),
  N       = c(50000, 120000),
  sd      = c(12, 20)
)
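
Steps 2 to 4 can be scripted when the spread is estimated from unit-level pilot data. The following is a minimal base R sketch, not part of the package: the pilot data frame pilot, its income column, and the named vector census_N are hypothetical names with invented values.

```r
# Hypothetical unit-level pilot data (values invented for illustration)
pilot <- data.frame(
  stratum = c("Urban", "Urban", "Urban", "Rural", "Rural", "Rural"),
  income  = c(40, 52, 64, 30, 50, 70)
)

# Hypothetical census counts per stratum
census_N <- c(Urban = 50000, Rural = 120000)

# Within-stratum standard deviation and mean estimated from the pilot
sd_by   <- tapply(pilot$income, pilot$stratum, sd)
mean_by <- tapply(pilot$income, pilot$stratum, mean)

# Assemble the stratum-level frame, aligning every column by stratum name
strata <- names(census_N)
frame <- data.frame(
  stratum = strata,
  N       = as.numeric(census_N[strata]),
  sd      = as.numeric(sd_by[strata]),
  mean    = as.numeric(mean_by[strata])
)
frame
```

With these invented values the result has the same N and sd as the minimal frame above, plus a mean column that would allow a cv target.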

When a design stratifies by several variables (e.g. region \(\times\) urbanicity), cross them into one variable first:

frame$stratum <- interaction(frame$region, frame$urban, drop = TRUE)

This ensures that each row maps to exactly one population cell and that the allocation formulas apply to the correct per-stratum N and sd pairs.
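
As a quick check that the crossing produced one row per cell, the sketch below builds a small frame with hypothetical region and urban columns (invented values) and verifies that the crossed labels are unique.

```r
# Hypothetical frame stratified by region and urbanicity (values invented)
frame <- data.frame(
  region = c("North", "North", "South", "South"),
  urban  = c("Urban", "Rural", "Urban", "Rural"),
  N      = c(2000, 3000, 1800, 3200),
  sd     = c(12, 18, 10, 16)
)

# Cross the two variables into a single stratum label
frame$stratum <- interaction(frame$region, frame$urban, drop = TRUE)

# Each row should now be exactly one cell: stop if any label repeats
stopifnot(!anyDuplicated(frame$stratum))

as.character(frame$stratum)
#> [1] "North.Urban" "North.Rural" "South.Urban" "South.Rural"
```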

Domains vs. strata

Domains are set via the domains argument. Each domain column partitions the strata into sub-populations, so every domain contains one or more strata. When cv is specified, the algorithm finds the minimum total \(n\) such that the worst-case domain CV meets the target, i.e. every domain achieves the required precision.

In n or budget mode, domains affect reporting only: per-domain precision metrics appear in $domains but the allocation itself treats all strata globally.
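
The worst-case search in cv mode can be illustrated with a short base R sketch. It is not the package's implementation: it assumes the textbook variance of a stratified mean with finite population correction, power allocation without rounding or constraints, and deff = resp_rate = 1. The helper names domain_cv and worst_cv are hypothetical.

```r
# Stratum-level frame with one domain column (same values as frame_domains
# in the Examples section)
fd <- data.frame(
  province = c("North", "North", "South", "South"),
  N        = c(2000, 3000, 1800, 3200),
  sd       = c(12, 18, 10, 16),
  mean     = c(55, 48, 58, 50)
)

# CV of the stratified mean within one domain (assumed textbook formula,
# including the finite population correction)
domain_cv <- function(N, sd, mean, n_h) {
  W <- N / sum(N)
  v <- sum(W^2 * sd^2 / n_h * (1 - n_h / N))
  sqrt(v) / sum(W * mean)
}

# Largest domain CV for a given total n under power allocation
worst_cv <- function(n, q = 0.3) {
  w    <- fd$sd * fd$N^q            # n_h proportional to S_h * N_h^q
  n_h  <- n * w / sum(w)            # continuous allocation, no rounding
  rows <- split(seq_len(nrow(fd)), fd$province)
  max(vapply(
    rows,
    function(i) domain_cv(fd$N[i], fd$sd[i], fd$mean[i], n_h[i]),
    numeric(1)
  ))
}

# Smallest continuous n whose worst domain CV equals the 4 percent target
n_min <- uniroot(function(n) worst_cv(n) - 0.04, c(10, 5000), tol = 1e-8)$root
ceiling(n_min)
```

Under these assumptions the root should land near 110.7, which rounds up to the n = 111 printed for the domain example in the Examples section.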

Allocation methods

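Allocation is controlled by the alloc parameter (same methods as strata_bound()). With \(N_h\) the stratum population size, \(S_h\) the stratum standard deviation, and \(c_h\) the per-unit cost in stratum \(h\):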

  • proportional: \(n_h \propto N_h\)

  • neyman: \(n_h \propto N_h S_h\)

  • optimal: \(n_h \propto N_h S_h / \sqrt{c_h}\)

  • power: Bankier (1988), \(n_h \propto S_h N_h^{q}\) with \(q\) = power_q
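
To make the four proportionality rules concrete, the base R sketch below scales each set of weights to a fixed total of n = 600, using the stratum values from the Examples section. It ignores rounding and constraints and is only an illustration of the formulas, not the package's code; the variable names are chosen here for illustration.

```r
# Stratum inputs (same values as the frame in the Examples section)
N    <- c(A = 4000, B = 3000, C = 3000)   # population sizes N_h
S    <- c(A = 10,   B = 15,   C = 8)      # standard deviations S_h
cost <- c(A = 1,    B = 1.5,  C = 1)      # unit costs c_h
n    <- 600                               # fixed total sample size
q    <- 0.5                               # plays the role of power_q

# Unnormalised weights for each allocation rule
weights <- list(
  proportional = N,
  neyman       = N * S,
  optimal      = N * S / sqrt(cost),
  power        = S * N^q
)

# Scale each set of weights so the stratum sizes sum to n (no rounding)
alloc <- sapply(weights, function(w) n * w / sum(w))
round(alloc, 1)
```

Each column of alloc sums to 600. Comparing columns shows how a larger S_h pulls sample toward stratum B under neyman, and how its higher cost pushes some of it back under optimal.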

Stratum allocations are rounded to integers using the ORIC method (Cont and Heidari, 2015). Constraints (min_n, max_weight, take_all) are enforced via recursive Neyman allocation (RNA, Wesolowski et al., 2021).
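
Before any constrained allocation runs, the three constraints can be read as per-stratum bounds on \(n_h\). The sketch below derives those bounds from the column definitions given under Arguments (a max_weight cap means \(n_h \ge N_h / \text{max\_weight}_h\)); it does not implement ORIC or RNA, and the variable names are illustrative.

```r
# Constraint inputs (same values as frame_constraints in the Examples section)
N          <- c(A = 4000, B = 3000, C = 3000)
max_weight <- c(A = 25,   B = 20,   C = NA)    # NA means no cap
take_all   <- c(A = FALSE, B = FALSE, C = TRUE)
min_n      <- 40

# N_h / n_h <= max_weight  implies  n_h >= N_h / max_weight
lower_weight <- ceiling(N / max_weight)
lower_weight[is.na(lower_weight)] <- 0          # no cap, no bound

# Combine with the global per-stratum minimum
lower <- pmax(min_n, lower_weight)

# Census strata are fixed at their full population size
lower[take_all] <- N[take_all]

# No stratum can receive more units than it contains
upper <- N

rbind(lower, upper)
```

For these values the lower bounds are 160, 150, and 3000, so any feasible allocation needs a total of at least 3310 units.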

When budget is specified, the algorithm finds the maximum affordable allocation under unit costs.
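
In the unconstrained case, budget mode amounts to scaling the allocation weights until the total cost equals the budget. The sketch below assumes the total cost is simply \(\sum_h c_h n_h\) with no fixed overhead, and ignores rounding and constraints; it is an illustration rather than the package's algorithm.

```r
# Stratum inputs (same values as the frame in the Examples section)
N      <- c(A = 4000, B = 3000, C = 3000)
S      <- c(A = 10,   B = 15,   C = 8)
cost   <- c(A = 1,    B = 1.5,  C = 1)
budget <- 900                         # hypothetical field budget

# Optimal-allocation weights: n_h proportional to N_h * S_h / sqrt(c_h)
w <- N * S / sqrt(cost)

# Scale so that sum(cost * n_h) equals the budget
n_h <- budget * w / sum(cost * w)

sum(cost * n_h)   # total cost, equal to budget up to floating point error
round(n_h, 1)
```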

References

Valliant, R., Dever, J. A., & Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer. Chapter 5.

Bankier, M. D. (1988). Power allocations: determining sample sizes for subnational areas. The American Statistician, 42(3), 174–177.

Examples

frame <- data.frame(
  stratum = c("A", "B", "C"),
  N    = c(4000, 3000, 3000),
  sd   = c(10, 15, 8),
  mean = c(50, 60, 55),
  cost = c(1, 1.5, 1)
)

n_alloc(frame, n = 600)
#> Stratum allocation (neyman, 3 strata)
#> n = 600, cv = 0.0079, se = 0.4305
n_alloc(frame, cv = 0.03)
#> Stratum allocation (neyman, 3 strata)
#> n = 45, cv = 0.0300, se = 1.6350

frame_constraints <- transform(
  frame,
  max_weight = c(25, 20, NA),
  take_all = c(FALSE, FALSE, TRUE)
)

n_alloc(frame_constraints, budget = 3500, alloc = "optimal", min_n = 40)
#> Stratum allocation (optimal, 3 strata)
#> n = 3404, cv = 0.0076, se = 0.4125
#> (min_n = 40)

frame_domains <- data.frame(
  province = c("North", "North", "South", "South"),
  stratum = c("Urban", "Rural", "Urban", "Rural"),
  N    = c(2000, 3000, 1800, 3200),
  sd   = c(12, 18, 10, 16),
  mean = c(55, 48, 58, 50)
)

n_alloc(frame_domains, domains = "province",
        cv = 0.04, alloc = "power", power_q = 0.3)
#> Stratum allocation (power, 4 strata)
#> n = 111, cv = 0.0272, se = 1.4076
#> Domains: 2
#> ---
#>  province .domain .n       .se      .moe     .cv    .cost
#>  North    North   59.23404 2.032000 3.982647 0.0400 59   
#>  South    South   51.50815 1.948447 3.818886 0.0368 52