Distribute a total sample size across strata defined by a single
stratification variable, under a fixed total \(n\), target CV, or budget.
When the design uses multiple stratification variables (e.g. region and
urbanicity), cross them into a single variable beforehand so that each row
of frame represents one unique stratum.
Usage
n_alloc(frame, ...)
# Default S3 method
n_alloc(
frame,
n = NULL,
cv = NULL,
budget = NULL,
alloc = c("neyman", "optimal", "proportional", "power"),
unit_cost = NULL,
alpha = 0.05,
deff = 1,
resp_rate = 1,
min_n = NULL,
power_q = 0.5,
plan = NULL,
...
)
# S3 method for class 'svyplan_prec'
n_alloc(frame, n = NULL, cv = NULL, budget = NULL, ...)
Arguments
- frame
For the default method: data frame with one row per stratum. Each row describes a population stratum defined by a single stratification variable (or a cross of several variables collapsed into one). When multiple classification variables exist, build the crossed strata before calling n_alloc() (e.g. with interaction()).
Required columns:
N_h: Population size per stratum (positive finite integer or numeric).
S_h or var: Stratum standard deviation or variance (non-negative finite). Provide exactly one.
Optional columns:
stratum: Stratum label. If omitted, row numbers are used. Must be unique (or unique within each domain when domain columns are present).
mean_h or p_h: Stratum mean or proportion. Required when solving for cv. When p_h is used, it must lie in \([0, 1]\).
cost_h: Per-unit cost in each stratum (positive finite). Defaults to 1 everywhere.
max_weight: Maximum sampling weight \(N_h / n_h\). Use NA for unconstrained strata.
take_all: Logical (or 0/1). If TRUE, the entire stratum is included (census stratum).
Domain columns: any column not listed above is treated as a domain identifier. Domains define sub-populations that each contain one or more strata. When cv is the target, precision is enforced within every domain (see Details). For example, a province column would make each province a separate domain whose strata are the rows that share the same province value.
For svyplan_prec objects: a precision result from prec_alloc().
- ...
Additional arguments passed to methods.
- n
Total sample size. Specify exactly one of n, cv, or budget.
- cv
Target coefficient of variation (requires mean_h or p_h in frame). When domain columns are present, this target is enforced in each domain. Specify exactly one of n, cv, or budget.
- budget
Total field budget. Specify exactly one of n, cv, or budget.
- alloc
Allocation rule: "neyman" (default), "optimal", "proportional", or "power".
- unit_cost
Optional scalar or length-nrow(frame) vector of per-stratum unit costs, overriding frame$cost_h.
- alpha
Significance level, default 0.05.
- deff
Design effect multiplier (> 0).
- resp_rate
Expected response rate, in (0, 1]. Default 1.
- min_n
Optional minimum sample size per stratum.
- power_q
Bankier power parameter from 0 to 1, used when alloc = "power".
- plan
Optional svyplan() object providing design defaults.
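The max_weight and take_all columns and the min_n argument all act as bounds on \(n_h\). Since max_weight caps \(N_h / n_h\), it is equivalent to the lower bound \(n_h \ge N_h / \text{max\_weight}\). The sketch below shows one plausible way to collect these into per-stratum bounds; stratum_bounds() is a hypothetical helper, not part of svyplan, and the rule that a lower bound above \(N_h\) becomes a census is an assumption.

```r
# Hypothetical helper (not a svyplan function): collect min_n, max_weight
# and take_all into per-stratum bounds lower_h <= n_h <= upper_h.
stratum_bounds <- function(N_h, min_n = NULL, max_weight = NA, take_all = FALSE) {
  max_weight <- rep_len(max_weight, length(N_h))
  take_all <- as.logical(rep_len(take_all, length(N_h)))
  # N_h / n_h <= max_weight  is the same as  n_h >= N_h / max_weight
  lower <- ifelse(is.na(max_weight), 0, ceiling(N_h / max_weight))
  if (!is.null(min_n)) lower <- pmax(lower, min_n)
  upper <- N_h                      # cannot sample more units than exist
  lower <- pmin(lower, upper)       # assumption: a bound above N_h means a census
  lower[take_all] <- N_h[take_all]  # census strata: n_h fixed at N_h
  data.frame(lower = lower, upper = upper)
}

b <- stratum_bounds(N_h = c(4000, 3000, 3000), min_n = 40,
                    max_weight = c(25, 20, NA),
                    take_all = c(FALSE, FALSE, TRUE))
b
# lower: 160 150 3000; upper: 4000 3000 3000
```

With the constraints from the Examples section, max_weight dominates min_n = 40 in the first two strata, and the third stratum is a census.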
Details
Frame structure
Each row of frame represents one stratum of a single stratification
variable. When a design stratifies by several variables (e.g. region
\(\times\) urbanicity), cross them into one variable first, for example with interaction().
This ensures that each row maps to exactly one population cell and that the allocation formulas apply to the correct \(N_h\), \(S_h\) pairs.
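A minimal base-R sketch of the crossing step, using a hypothetical table with region and urban columns. The original classification columns are dropped afterwards because, per the frame description, any column not in the recognised list is treated as a domain identifier; keep one only if it should define domains.

```r
# Hypothetical stratum-level table with two classification variables.
cells <- data.frame(
  region = c("North", "North", "South", "South"),
  urban  = c("Urban", "Rural", "Urban", "Rural"),
  N_h    = c(2000, 3000, 1800, 3200),
  S_h    = c(12, 18, 10, 16)
)

# Collapse the two variables into one stratum label per row.
cells$stratum <- as.character(interaction(cells$region, cells$urban, sep = "_"))

# Drop the source columns so they are not picked up as domain columns.
cells$region <- NULL
cells$urban  <- NULL

cells$stratum
# "North_Urban" "North_Rural" "South_Urban" "South_Rural"
```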
Domains vs. strata
Domain columns partition strata into sub-populations. Each domain groups
one or more strata. When cv is specified, the algorithm finds the
minimum total \(n\) such that the worst-case domain CV meets the
target, i.e. every domain achieves the required precision.
In n or budget mode, domains affect reporting only: per-domain
precision metrics appear in $domains but the allocation itself treats
all strata globally.
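The worst-case rule in cv mode can be illustrated with a simplified search. This is a sketch under stated assumptions, not svyplan's implementation: worst_domain_cv() and min_n_domain_cv() are hypothetical helpers that use Neyman shares, continuous \(n_h\), the finite population correction, deff = 1 and full response, and they ignore rounding and constraints. Because every domain's CV falls as the total \(n\) grows, a bisection on \(n\) finds the smallest total that meets the target in all domains.

```r
# Hypothetical helpers (not svyplan functions).
worst_domain_cv <- function(n, N_h, S_h, mean_h, domain) {
  share <- N_h * S_h / sum(N_h * S_h)             # Neyman shares (assumption)
  n_h <- pmin(n * share, N_h)                     # never exceed the stratum size
  v_h <- N_h^2 * S_h^2 * (1 / n_h - 1 / N_h)      # var. of stratum total, with fpc
  se_d <- sqrt(tapply(v_h, domain, sum))          # SE of each domain total
  total_d <- tapply(N_h * mean_h, domain, sum)
  max(se_d / total_d)                             # CV of domain total = CV of its mean
}

min_n_domain_cv <- function(target_cv, N_h, S_h, mean_h, domain) {
  share <- N_h * S_h / sum(N_h * S_h)
  lo <- 1                                         # assumed to miss the target
  hi <- ceiling(max(N_h / share))                 # all strata censused: CV = 0
  while (hi - lo > 1) {                           # bisection on the total n
    mid <- (lo + hi) %/% 2
    if (worst_domain_cv(mid, N_h, S_h, mean_h, domain) <= target_cv) {
      hi <- mid
    } else {
      lo <- mid
    }
  }
  hi
}

fd <- data.frame(
  province = c("North", "North", "South", "South"),
  N_h      = c(2000, 3000, 1800, 3200),
  S_h      = c(12, 18, 10, 16),
  mean_h   = c(55, 48, 58, 50)
)
n_star <- with(fd, min_n_domain_cv(0.04, N_h, S_h, mean_h, province))
n_star
# 110 (North is the binding domain under these assumptions)
```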
Allocation methods
Allocation is controlled by the alloc parameter (same methods as
strata_bound()):
proportional: \(n_h \propto N_h\)
neyman: \(n_h \propto N_h S_h\)
optimal: \(n_h \propto N_h S_h / \sqrt{c_h}\)
power: Bankier (1988), \(n_h \propto S_h N_h^{power\_q}\)
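The four rules differ only in the weight each stratum receives before normalising. A minimal sketch of the continuous (unrounded) shares, written directly from the formulas above; alloc_shares() is a hypothetical helper, not a svyplan function.

```r
# Hypothetical helper: continuous allocation shares for the four rules above.
alloc_shares <- function(N_h, S_h, c_h = rep(1, length(N_h)),
                         method = c("proportional", "neyman", "optimal", "power"),
                         power_q = 0.5) {
  method <- match.arg(method)
  w <- switch(method,
    proportional = N_h,
    neyman       = N_h * S_h,
    optimal      = N_h * S_h / sqrt(c_h),
    power        = S_h * N_h^power_q
  )
  w / sum(w)                       # shares sum to 1
}

N_h <- c(4000, 3000, 3000)
S_h <- c(10, 15, 8)
c_h <- c(1, 1.5, 1)
round(600 * alloc_shares(N_h, S_h, method = "neyman"), 1)
# 220.2 247.7 132.1
```

From the power formula, power_q = 1 reproduces the Neyman shares, while power_q = 0 makes \(n_h\) proportional to \(S_h\) alone, ignoring stratum size.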
Stratum allocations are rounded to integers using the ORIC method
(Cont and Heidari, 2015). Constraints (min_n, max_weight, take_all)
are enforced via recursive Neyman allocation (RNA, Wesolowski et al., 2021).
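The idea behind recursive Neyman allocation can be shown in its classic upper-bound form: compute the Neyman allocation, fix every stratum whose share reaches its bound at that bound, re-allocate the remaining sample over the other strata, and repeat. The sketch below handles upper bounds \(n_h \le M_h\) only; rna_upper() is a hypothetical helper, and the lower bounds that min_n and max_weight impose need an extension that is not shown here.

```r
# Recursive Neyman allocation with upper bounds n_h <= M_h only.
# rna_upper() is a hypothetical helper, not a svyplan function.
rna_upper <- function(n, N_h, S_h, M_h = N_h) {
  stopifnot(n <= sum(M_h))
  d <- N_h * S_h
  fixed <- rep(FALSE, length(d))          # strata already set to their bound
  repeat {
    n_free <- n - sum(M_h[fixed])         # sample left for the free strata
    x <- ifelse(fixed, M_h, n_free * d / sum(d[!fixed]))
    over <- !fixed & x >= M_h             # free strata that hit their bound
    if (!any(over)) return(x)
    fixed <- fixed | over                 # fix them and re-solve the rest
  }
}

# Hypothetical cap of 100 units in the third stratum of the example frame.
x <- rna_upper(600, N_h = c(4000, 3000, 3000), S_h = c(10, 15, 8),
               M_h = c(4000, 3000, 100))
round(x, 1)
# 235.3 264.7 100.0
```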
When budget is specified, the algorithm finds the largest allocation
whose total cost \(\sum_h c_h n_h\) stays within the budget, given the unit costs.
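Without bounds or rounding, the cost-optimal allocation has a closed form: set \(n_h = k N_h S_h / \sqrt{c_h}\) and choose \(k\) so that \(\sum_h c_h n_h\) equals the budget, which gives \(k = B / \sum_h N_h S_h \sqrt{c_h}\). The sketch below is only that unconstrained case; budget_alloc() is a hypothetical helper, and the constrained example in the Examples section (take_all, max_weight, min_n) will give a different result.

```r
# Unconstrained cost-optimal allocation that spends the whole budget.
# budget_alloc() is a hypothetical helper, not a svyplan function.
budget_alloc <- function(budget, N_h, S_h, c_h) {
  k <- budget / sum(N_h * S_h * sqrt(c_h))   # scale factor so cost == budget
  k * N_h * S_h / sqrt(c_h)                  # n_h proportional to N_h S_h / sqrt(c_h)
}

c_h <- c(1, 1.5, 1)
n_h <- budget_alloc(3500, N_h = c(4000, 3000, 3000), S_h = c(10, 15, 8), c_h = c_h)
round(n_h, 1)
# 1175.3 1079.6 705.2
sum(c_h * n_h)
# 3500 (up to floating-point error)
```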
References
Valliant, R., Dever, J. A., & Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer. Chapter 5.
Bankier, M. D. (1988). Power allocations: determining sample sizes for subnational areas. The American Statistician, 42(3), 174–177.
Examples
frame <- data.frame(
stratum = c("A", "B", "C"),
N_h = c(4000, 3000, 3000),
S_h = c(10, 15, 8),
mean_h = c(50, 60, 55),
cost_h = c(1, 1.5, 1)
)
n_alloc(frame, n = 600)
#> Stratum allocation (neyman, 3 strata)
#> n = 600, cv = 0.0079, se = 0.4305
n_alloc(frame, cv = 0.03)
#> Stratum allocation (neyman, 3 strata)
#> n = 45, cv = 0.0300, se = 1.6350
frame_constraints <- transform(
frame,
max_weight = c(25, 20, NA),
take_all = c(FALSE, FALSE, TRUE)
)
n_alloc(frame_constraints, budget = 3500, alloc = "optimal", min_n = 40)
#> Stratum allocation (optimal, 3 strata)
#> n = 3404, cv = 0.0076, se = 0.4125
#> (min_n = 40)
frame_domains <- data.frame(
province = c("North", "North", "South", "South"),
stratum = c("Urban", "Rural", "Urban", "Rural"),
N_h = c(2000, 3000, 1800, 3200),
S_h = c(12, 18, 10, 16),
mean_h = c(55, 48, 58, 50)
)
n_alloc(frame_domains, cv = 0.04, alloc = "power", power_q = 0.3)
#> Stratum allocation (power, 4 strata)
#> n = 111, cv = 0.0272, se = 1.4076
#> Domains: 2
#> ---
#> province .domain .n .se .moe .cv .cost
#> North North 59.23404 2.032000 3.982647 0.0400 59
#> South South 51.50815 1.948447 3.818886 0.0368 52
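Several printed values above can be reproduced by hand with the textbook variance of a stratified mean, including the finite population correction. This is a cross-check, not a description of svyplan internals. It assumes that the rounded allocation for n = 600 is (220, 248, 132), the nearest integers to the Neyman shares, which happen to sum to 600. It also uses the classical closed-form Neyman sample size for a target variance, and \(z_{1-\alpha/2}\) times the printed .se values for the margin of error.

```r
N_h    <- c(4000, 3000, 3000)
S_h    <- c(10, 15, 8)
mean_h <- c(50, 60, 55)
W_h    <- N_h / sum(N_h)
ybar   <- sum(W_h * mean_h)                        # 54.5

# n = 600: assumed rounded Neyman allocation.
n_h <- c(220, 248, 132)
se  <- sqrt(sum(W_h^2 * S_h^2 * (1 / n_h - 1 / N_h)))
round(c(se = se, cv = se / ybar), 4)               # se 0.4305, cv 0.0079

# cv = 0.03: smallest total n under Neyman allocation with fpc.
V <- (0.03 * ybar)^2                               # target variance of the mean
ceiling(sum(W_h * S_h)^2 / (V + sum(W_h * S_h^2) / sum(N_h)))   # 45

# .moe column of the domain example: z_{1 - alpha/2} * .se, alpha = 0.05.
qnorm(1 - 0.05 / 2) * c(2.032000, 1.948447)        # 3.982647 3.818886
```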