Distribute a total sample size across strata defined by a single
stratification variable, under a fixed total \(n\), target CV, or budget.
When the design uses multiple stratification variables (e.g. region and
urbanicity), cross them into a single variable beforehand so that each row
of frame represents one unique stratum.
Usage
n_alloc(frame, ...)
# Default S3 method
n_alloc(
  frame,
  domains = NULL,
  n = NULL,
  cv = NULL,
  budget = NULL,
  alloc = c("neyman", "optimal", "proportional", "power"),
  unit_cost = NULL,
  alpha = 0.05,
  deff = 1,
  resp_rate = 1,
  min_n = NULL,
  power_q = 0.5,
  plan = NULL,
  ...
)
# S3 method for class 'svyplan_prec'
n_alloc(frame, n = NULL, cv = NULL, budget = NULL, ...)
Arguments
- frame
For the default method: a stratum-level data frame describing the population you want to sample. Each row represents one stratum, a subgroup of the population defined by a stratification variable such as region, age group, or urbanicity. The values in this frame typically come from a census, a population register, or a previous survey.
When a design stratifies by several variables at once (e.g. region \(\times\) urbanicity), cross them into a single variable before calling n_alloc (e.g. with interaction()) so that each row maps to exactly one population cell.
Required columns:
N: Number of units (e.g. households, individuals) in each stratum. These are population counts, not sample sizes. Must be positive and finite.
sd or var: A measure of how spread out the variable of interest is within each stratum. Provide exactly one: sd, the stratum standard deviation (\(\sqrt{\text{variance}}\)), or var, the stratum variance. Whichever is supplied must be non-negative and finite. When all strata have equal variability (or variability is unknown), a constant column (e.g. sd = 1) yields proportional-to-size allocation under the default alloc = "neyman".
Optional columns:
stratum: A label identifying each stratum (e.g. "Urban", "Rural"). If omitted, row numbers are used. Must be unique, or unique within each domain when domains is set.
mean or p: The stratum population mean or proportion of the variable of interest. Required when solving for cv, because the coefficient of variation is defined relative to the mean. Use mean for continuous variables and p (in \([0, 1]\)) for binary (yes/no) variables.
cost: Per-unit interviewing cost in each stratum (positive, finite). Set higher values for strata that are more expensive to reach. Defaults to 1 everywhere (equal cost).
max_weight: Maximum allowed sampling weight \(N_h / n_h\). Caps how under-represented a stratum can be. Use NA for strata without a cap.
take_all: Logical (or 0/1). If TRUE, every unit in the stratum is included, making it a census stratum. Useful for strata small enough to enumerate completely.
For svyplan_prec objects: a precision result from prec_alloc().
- ...
Additional arguments passed to methods.
- domains
Character vector of column names in frame to treat as domain identifiers, or NULL (default) for no domains. All names must exist in frame. Domains define sub-populations that each contain one or more strata. When cv is the target, precision is enforced within every domain (see Details).
- n
Total sample size. Specify exactly one of n, cv, or budget.
- cv
Target coefficient of variation (relative standard error). For example, cv = 0.05 means the standard error of the estimated population mean or total should be at most 5 percent of the estimate. Requires mean or p in frame. When domain columns are present, this target is enforced in each domain. Specify exactly one of n, cv, or budget.
- budget
Total field budget. Specify exactly one of n, cv, or budget.
- alloc
Allocation rule: "neyman" (default), "optimal", "proportional", or "power".
- unit_cost
Optional scalar or length-nrow(frame) vector of per-stratum unit costs, overriding frame$cost.
- alpha
Significance level, default 0.05.
- deff
Design effect multiplier (> 0).
- resp_rate
Expected response rate, in (0, 1]. Default 1.
- min_n
Optional minimum sample size per stratum.
- power_q
Bankier power parameter from 0 to 1, used when alloc = "power".
- plan
Optional svyplan() object providing design defaults.
Details
Building the frame
The frame is a data frame where each row is one stratum of
your target population. It summarizes what you know about each
subgroup before sampling. A typical workflow:
- Identify strata from a census or register (e.g. provinces, urban/rural areas, age groups).
- Look up N: the population count per stratum.
- Estimate sd: the standard deviation of your key variable within each stratum (from a pilot survey, a previous census, or expert judgement). If unknown, set sd = 1 everywhere for proportional allocation.
- Add mean or p if you want to solve for a target CV.
A minimal frame:
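For instance, a two-stratum frame holding only the required columns might look like the sketch below. The counts and standard deviations are illustrative placeholders, not figures from a real census or register.

```r
# Illustrative values only: two strata, required columns N and sd
frame <- data.frame(
  N  = c(6000, 4000),  # population counts per stratum (not sample sizes)
  sd = c(12, 20)       # within-stratum standard deviation of the key variable
)
```

Because no stratum column is supplied, row numbers would serve as stratum labels, as described under Arguments.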
When a design stratifies by several variables (e.g. region \(\times\) urbanicity), cross them into one variable first:
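One way to do the crossing in base R is with interaction(). The sketch assumes a hypothetical stratum-level table named cells with columns region and urbanicity; the N and sd values are borrowed from the frame_domains example in the Examples section.

```r
# Hypothetical stratum-level table: one row per region x urbanicity cell
cells <- data.frame(
  region     = c("North", "North", "South", "South"),
  urbanicity = c("Urban", "Rural", "Urban", "Rural"),
  N          = c(2000, 3000, 1800, 3200),
  sd         = c(12, 18, 10, 16)
)

# Cross the two variables into a single stratum label per row
cells$stratum <- as.character(
  interaction(cells$region, cells$urbanicity, sep = "_")
)

# Each crossed label must identify exactly one row
stopifnot(!anyDuplicated(cells$stratum))

frame <- cells[, c("stratum", "N", "sd")]
```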
This ensures that each row maps to exactly one population cell and that
the allocation formulas apply to the correct per-stratum N and sd pairs.
Domains vs. strata
Domains are specified via the domains parameter. Domain columns
partition the strata into sub-populations, each containing one or
more strata. When cv is specified, the algorithm finds the
minimum total \(n\) such that the worst-case domain CV meets the
target, i.e. every domain achieves the required precision.
In n or budget mode, domains affect reporting only: per-domain
precision metrics appear in $domains but the allocation itself treats
all strata globally.
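The worst-case domain CV can be checked by hand. The sketch below is not the package's internal code: it assumes simple random sampling within strata with a finite population correction, and a power allocation applied across all strata. Under those assumptions it reproduces the .se and .cv columns printed for the frame_domains example in the Examples section.

```r
frame_domains <- data.frame(
  province = c("North", "North", "South", "South"),
  stratum  = c("Urban", "Rural", "Urban", "Rural"),
  N        = c(2000, 3000, 1800, 3200),
  sd       = c(12, 18, 10, 16),
  mean     = c(55, 48, 58, 50)
)

# Unrounded total n: the sum of the .n column in the Examples output
n_total <- 59.23404 + 51.50815

# Power allocation with power_q = 0.3, before rounding
a <- with(frame_domains, sd * N^0.3)
frame_domains$n_h <- n_total * a / sum(a)

# Per-domain SE and CV of the stratified mean (assumed SRS within strata)
by_domain <- lapply(split(frame_domains, frame_domains$province), function(d) {
  W  <- d$N / sum(d$N)  # stratum weights within the domain
  se <- sqrt(sum(W^2 * d$sd^2 / d$n_h * (1 - d$n_h / d$N)))
  c(n = sum(d$n_h), se = se, cv = se / sum(W * d$mean))
})
res <- do.call(rbind, by_domain)

# The binding domain is the one with the largest CV
max(res[, "cv"])
```

Here North sits exactly at the 0.04 target while South is below it, which is what "worst-case domain CV meets the target" means in practice.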
Allocation methods
Allocation is controlled by the alloc parameter (same methods as
strata_bound()):
proportional: \(n_h \propto N_h\)
neyman: \(n_h \propto N_h S_h\)
optimal: \(n_h \propto N_h S_h / \sqrt{c_h}\)
power: Bankier (1988), \(n_h \propto S_h N_h^{q}\), where \(q\) is power_q
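To compare the four rules, the proportionality relations above can be evaluated directly. This is a minimal base-R sketch of the unrounded, unconstrained shares only; it does not call n_alloc() and skips the rounding and constraint handling the function applies.

```r
frame <- data.frame(
  stratum = c("A", "B", "C"),
  N       = c(4000, 3000, 3000),
  sd      = c(10, 15, 8),
  cost    = c(1, 1.5, 1)
)
power_q <- 0.5

# Unnormalised allocation weights for each rule
w <- with(frame, list(
  proportional = N,
  neyman       = N * sd,
  optimal      = N * sd / sqrt(cost),
  power        = sd * N^power_q
))

# Normalise each rule so that its shares sum to 1
shares <- sapply(w, function(x) x / sum(x))
rownames(shares) <- frame$stratum

# Unrounded stratum sample sizes for a total of n = 600
round(600 * shares, 1)
```

With a constant sd column the neyman weights reduce to N, which is why sd = 1 gives proportional allocation.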
Stratum allocations are rounded to integers using the ORIC method
(Cont and Heidari, 2015). Constraints (min_n, max_weight, take_all)
are enforced via recursive Neyman allocation (RNA, Wesolowski et al., 2021).
When budget is specified, the algorithm finds the maximum affordable
allocation under unit costs.
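The link between total \(n\), precision, and budget can be written out for the unconstrained Neyman case. The notation (\(W_h = N_h / N\), \(\bar{Y} = \sum_h W_h \bar{Y}_h\), allocation shares \(a_h\)) is an assumption made here, as is simple random sampling within strata; rounding and constraints are ignored. The numbers use the three-stratum frame from the Examples section and agree with the se and n printed there.

```latex
\begin{align*}
% Variance of the stratified mean with finite population correction
\operatorname{Var}(\bar{y}_{st})
  &= \sum_h W_h^2 \frac{S_h^2}{n_h}\left(1 - \frac{n_h}{N_h}\right) \\
% Substituting the Neyman allocation n_h = n W_h S_h / \sum_k W_k S_k
  &= \frac{\left(\sum_h W_h S_h\right)^2}{n} - \frac{\sum_h W_h S_h^2}{N} \\[6pt]
% Example frame: \sum_h W_h S_h = 10.9, \sum_h W_h S_h^2 = 126.7,
% N = 10000, \bar{Y} = 54.5
n = 600:\quad
  \mathrm{se} &= \sqrt{\frac{118.81}{600} - \frac{126.7}{10000}} \approx 0.4305 \\
\mathrm{cv} = 0.03:\quad
  n &= \frac{118.81}{(0.03 \times 54.5)^2 + 0.01267} \approx 44.2
  \;\Rightarrow\; n = 45 \\[6pt]
% Budget mode (assumed form): largest n affordable under shares a_h and costs c_h
n &= \frac{B}{\sum_h a_h c_h}
\end{align*}
```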
References
Valliant, R., Dever, J. A., & Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer. Chapter 5.
Bankier, M. D. (1988). Power allocations: determining sample sizes for subnational areas. The American Statistician, 42(3), 174–177.
Examples
frame <- data.frame(
  stratum = c("A", "B", "C"),
  N = c(4000, 3000, 3000),
  sd = c(10, 15, 8),
  mean = c(50, 60, 55),
  cost = c(1, 1.5, 1)
)
n_alloc(frame, n = 600)
#> Stratum allocation (neyman, 3 strata)
#> n = 600, cv = 0.0079, se = 0.4305
n_alloc(frame, cv = 0.03)
#> Stratum allocation (neyman, 3 strata)
#> n = 45, cv = 0.0300, se = 1.6350
frame_constraints <- transform(
  frame,
  max_weight = c(25, 20, NA),
  take_all = c(FALSE, FALSE, TRUE)
)
n_alloc(frame_constraints, budget = 3500, alloc = "optimal", min_n = 40)
#> Stratum allocation (optimal, 3 strata)
#> n = 3404, cv = 0.0076, se = 0.4125
#> (min_n = 40)
frame_domains <- data.frame(
  province = c("North", "North", "South", "South"),
  stratum = c("Urban", "Rural", "Urban", "Rural"),
  N = c(2000, 3000, 1800, 3200),
  sd = c(12, 18, 10, 16),
  mean = c(55, 48, 58, 50)
)
n_alloc(frame_domains, domains = "province",
        cv = 0.04, alloc = "power", power_q = 0.3)
#> Stratum allocation (power, 4 strata)
#> n = 111, cv = 0.0272, se = 1.4076
#> Domains: 2
#> ---
#>  province .domain       .n      .se     .moe    .cv .cost
#>     North   North 59.23404 2.032000 3.982647 0.0400    59
#>     South   South 51.50815 1.948447 3.818886 0.0368    52