Compute the sample size that satisfies precision requirements for multiple survey indicators simultaneously. Supports simple (single-stage) and multistage cluster designs, with optional domain-level planning.
Usage
n_multi(targets, ...)
# Default S3 method
n_multi(
targets,
stage_cost = NULL,
budget = NULL,
n_psu = NULL,
psu_size = NULL,
ssu_size = NULL,
joint = FALSE,
min_n = NULL,
fixed_cost = 0,
prop_method = "wald",
plan = NULL,
...
)
# S3 method for class 'svyplan_prec'
n_multi(targets, stage_cost = NULL, ...)
Arguments
- targets
For the default method: a data frame with one row per indicator (see Details). For svyplan_prec objects: a precision result from prec_multi().
- ...
Additional arguments passed to methods.
- stage_cost
Numeric vector of per-stage costs. NULL (default) for simple mode; length 2 or 3 for multistage mode.
- budget
Total budget (multistage only). Provide either cv values in the targets data frame or a budget here, not both.
- n_psu
Fixed stage-1 sample size (number of PSUs, multistage only). For 2-stage designs, at most one of n_psu or psu_size may be fixed. For 3-stage designs, up to two of n_psu, psu_size, and ssu_size may be fixed.
- psu_size
Fixed cluster size (stage-2 sample size per PSU, multistage only). NULL (default) means optimize.
- ssu_size
Fixed SSU take size (stage-3 sample size per SSU). NULL (default) means optimize. Only valid for 3-stage designs.
- joint
Logical. If TRUE, optimally split a single budget across domains to minimize the worst-case CV ratio. Only applies to multistage budget mode with multiple domains; ignored otherwise.
- min_n
Numeric scalar or NULL (default). Minimum total sample size per domain. Only active when domains are present; silently ignored otherwise. In simple mode, per-domain sample sizes are floored to min_n. In joint multistage mode, domains that would receive fewer than min_n observations are penalized during optimization, with an upfront feasibility check. In non-joint multistage mode, a warning is issued for any domain below the floor.
- fixed_cost
Fixed overhead cost (C0). Default 0. Only applies to multistage mode. See n_cluster() for details.
- prop_method
Proportion CI method for simple mode, one of "wald" (default), "wilson", or "logodds". This is passed to n_prop() for proportion rows and ignored for mean rows and multistage mode. An optional prop_method column in targets overrides this default on a per-row basis.
- plan
Optional svyplan() object providing design defaults (stage_cost, fixed_cost).
Value
A svyplan_n object (simple mode) or a svyplan_cluster object (multistage mode).
Without domains, the object contains:
- n
Sample size (simple mode) or a named per-stage allocation vector (multistage mode, e.g. c(n_psu = 80, psu_size = 12)).
- detail
Per-indicator results (sample sizes or achieved CVs).
- binding
Name or index of the binding (most demanding) indicator.
- targets
The input targets data frame.
With domains, the object additionally contains:
- n
Maximum per-stage sample size across domains. In simple mode, a single number; in multistage mode, a named vector (e.g. c(n_psu = 120, psu_size = 15)) giving the conservative allocation that satisfies all domains.
- domains
Data frame with one row per domain, including the domain variable columns, per-stage allocations (n_psu, psu_size, ...), and summary columns (.total_n, .cv, .cost, .binding for multistage; .n, .binding for simple mode). Use this for stratum-specific allocations.
- total_n
Total sample size summed across all domains (multistage only).
- cost
Total cost summed across all domains (multistage only).
Details
The targets data frame supports the following columns:
- name
Indicator label (optional).
- p
Expected proportion, in (0, 1). Each row needs one of p or var.
- var
Population variance. Each row needs one of p or var.
- mu
Population mean magnitude (positive). Required when var is specified with cv.
- moe
Margin of error (simple mode).
- cv
Target coefficient of variation (either mode).
- alpha
Significance level (default 0.05).
- deff
Design effect multiplier (simple mode only, default 1).
- N
Population size (simple mode only, default Inf).
- prop_method
Optional proportion method for simple mode, one of "wald", "wilson", or "logodds". Only used for rows with p; ignored for mean rows and multistage mode.
- delta_psu, delta_ssu
Homogeneity measures (multistage).
- rel_var
Unit relvariance. If omitted, derived from p or var/mu.
- k_psu, k_ssu
Ratio parameters (multistage, default 1).
- resp_rate
Expected response rate, in (0, 1]. Default 1 (no adjustment). Inflates the required sample size to account for non-response.
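When rel_var is omitted it must be derived from the other columns. A minimal sketch, assuming the textbook unit relvariances (1 - p)/p for a proportion and var/mu^2 for a mean; rel_var_sketch() is a hypothetical helper, not part of svyplan.

```r
# Hypothetical helper (not svyplan's code): unit relvariance per targets row,
# assuming the textbook definitions
#   proportion rows: rel_var = (1 - p) / p
#   mean rows:       rel_var = var / mu^2
rel_var_sketch <- function(p = NA_real_, var = NA_real_, mu = NA_real_) {
  ifelse(!is.na(p), (1 - p) / p, var / mu^2)
}

rel_var_sketch(p = 0.30)            # 2.333333
rel_var_sketch(var = 100, mu = 20)  # 0.25
# Vectorized over a mixed proportion/mean table
rel_var_sketch(p = c(0.30, NA), var = c(NA, 100), mu = c(NA, 20))
```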
Any column not in the recognized set is treated as a domain variable.
When domain columns are present, optimization runs independently per
domain combination (default), or jointly when joint = TRUE.
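The domain rule can be sketched in base R as follows, assuming the recognized set is exactly the columns listed above; recognized_cols and domain_cols_sketch() are hypothetical names, not svyplan internals. Each group produced by split() corresponds to one domain combination.

```r
# Hypothetical sketch: columns outside the recognized set are domain variables.
recognized_cols <- c("name", "p", "var", "mu", "moe", "cv", "alpha", "deff",
                     "N", "prop_method", "delta_psu", "delta_ssu", "rel_var",
                     "k_psu", "k_ssu", "resp_rate")
domain_cols_sketch <- function(targets) setdiff(names(targets), recognized_cols)

targets_dom <- data.frame(
  name = rep(c("stunting", "anemia"), each = 2),
  p = c(0.30, 0.25, 0.10, 0.15),
  moe = c(0.05, 0.05, 0.03, 0.03),
  region = rep(c("North", "South"), 2)
)
dom <- domain_cols_sketch(targets_dom)   # "region"

# One group per observed domain combination (crossed if several domain columns)
groups <- split(targets_dom, targets_dom[dom], drop = TRUE)
names(groups)                            # "North" "South"
```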
Simple mode (stage_cost = NULL): computes sample size per indicator
by delegating proportion rows to n_prop() and mean rows to n_mean(),
then takes the maximum per domain. Use prop_method or a
targets$prop_method column to choose "wald", "wilson", or
"logodds" for proportion rows.
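For the Wald case, the per-indicator sizes and the "take the maximum" step can be sketched in base R. This is a minimal illustration, assuming a common alpha, N = Inf and resp_rate = 1 (so no finite population or non-response adjustment); n_simple_sketch() and n_by_domain_sketch() are hypothetical names, not svyplan functions.

```r
# Hypothetical sketch of simple-mode sizing with Wald intervals only.
# Assumes a common alpha, N = Inf and resp_rate = 1.
n_simple_sketch <- function(targets, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  rows <- nrow(targets)
  p <- if (is.null(targets$p)) rep(NA_real_, rows) else targets$p
  v <- if (is.null(targets$var)) rep(NA_real_, rows) else targets$var
  deff <- if (is.null(targets$deff)) rep(1, rows) else targets$deff
  # Unit variance: p(1 - p) for proportion rows, var for mean rows
  unit_var <- ifelse(is.na(p), v, p * (1 - p))
  ceiling(z^2 * unit_var * deff / targets$moe^2)
}

# Binding requirement per domain: the largest per-indicator n
n_by_domain_sketch <- function(targets, domain) {
  tapply(n_simple_sketch(targets), targets[[domain]], max)
}

targets <- data.frame(
  name = c("stunting", "vaccination", "anemia"),
  p = c(0.30, 0.70, 0.10),
  moe = c(0.05, 0.05, 0.03)
)
n_ind <- n_simple_sketch(targets)          # 323 323 385
max(n_ind)                                 # 385, anemia binds

targets_dom <- data.frame(
  name = rep(c("stunting", "anemia"), each = 2),
  p = c(0.30, 0.25, 0.10, 0.15),
  moe = c(0.05, 0.05, 0.03, 0.03),
  region = rep(c("North", "South"), 2)
)
n_dom <- n_by_domain_sketch(targets_dom, "region")  # North 385, South 545
```

These values match the Wald examples under Examples; the Wilson and log-odds methods use different formulas and are not covered by this sketch.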
Multistage mode (stage_cost provided): uses an analytical reduction.
For each candidate sub-stage allocation (e.g. psu_size), the required
stage-1 size is the maximum of the per-indicator requirements. Numerical
optimization then minimizes the total cost (CV mode) or the worst-case
CV ratio (budget mode).
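For a 2-stage design in CV mode, the reduction can be sketched with stats::optimize() over psu_size. This assumes the standard two-stage variance model CV^2 = rel_var * k * (1 + delta * (psu_size - 1)) / (n_psu * psu_size) (Valliant, Dever and Kreuter 2018) and the cost model fixed_cost + c1 * n_psu + c2 * n_psu * psu_size; the function name is hypothetical and the code is not svyplan's implementation.

```r
# Hypothetical sketch of the analytical reduction (2-stage, CV mode).
two_stage_cv_sketch <- function(rel_var, cv, delta, stage_cost,
                                k = 1, fixed_cost = 0, interval = c(1, 1000)) {
  c1 <- stage_cost[1]
  c2 <- stage_cost[2]
  # For a candidate psu_size, each indicator implies a required n_psu;
  # the design must meet all targets, so take the maximum.
  n_psu_needed <- function(psu_size) {
    max(rel_var * k * (1 + delta * (psu_size - 1)) / (psu_size * cv^2))
  }
  total_cost <- function(psu_size) {
    n_psu <- n_psu_needed(psu_size)
    fixed_cost + c1 * n_psu + c2 * n_psu * psu_size
  }
  # One-dimensional search over the (unrounded) sub-stage size
  opt <- optimize(total_cost, interval = interval)
  psu_size <- opt$minimum
  n_psu <- n_psu_needed(psu_size)
  cv_achieved <- sqrt(rel_var * k * (1 + delta * (psu_size - 1)) /
                        (n_psu * psu_size))
  list(n_psu = n_psu, psu_size = psu_size, total_n = n_psu * psu_size,
       cost = opt$objective, cv_achieved = cv_achieved,
       binding = which.max(cv_achieved / cv))
}

# Inputs mirroring the "Two-stage CV mode" example (rel_var = (1 - p) / p)
res <- two_stage_cv_sketch(
  rel_var = c((1 - 0.30) / 0.30, (1 - 0.10) / 0.10),
  cv = c(0.10, 0.15), delta = c(0.02, 0.05), stage_cost = c(500, 50)
)
```

With these inputs the sketch gives psu_size of about 13.78, an unrounded total n of about 655.68, a cost of about 56568 and anemia as the binding indicator, in line with the printed two-stage example. Because each per-indicator cost curve is convex in psu_size, their maximum is convex too, so a one-dimensional search is sufficient here.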
Boundary and near-boundary homogeneity values are not supported by the
analytical multistage optimum. When delta is near 0, most variability is
within clusters, so the optimum collapses toward many interviews in very
few PSUs. When delta is near 1, most variability is between clusters, so
the optimum collapses toward very few interviews in many PSUs. To stay
aligned with n_cluster(), multistage n_multi() rejects values
numerically too close to 0 or 1.
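The collapse follows from the textbook optimum take per PSU for a 2-stage design with cost c1 per PSU and c2 per interview, sqrt((c1 / c2) * (1 - delta) / delta) (Cochran 1977; Valliant, Dever and Kreuter 2018). A minimal sketch, assuming a single indicator, k_psu = 1 and no fixed sizes; opt_psu_size() is a hypothetical name.

```r
# Classic optimum psu_size for a 2-stage design (single indicator, k = 1):
#   psu_size_opt = sqrt((c1 / c2) * (1 - delta) / delta)
opt_psu_size <- function(c1, c2, delta) sqrt((c1 / c2) * (1 - delta) / delta)

deltas <- c(1e-6, 0.05, 0.5, 1 - 1e-6)
round(opt_psu_size(500, 50, deltas), 3)
# 3162.276 13.784 3.162 0.003
```

Near delta = 0 the optimal take explodes (few PSUs, many interviews each); near delta = 1 it falls below one interview per PSU. With stage_cost = c(500, 50) and delta_psu = 0.05 the value 13.78 is consistent with psu_size = 14 in the two-stage example, where anemia binds.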
Joint budget allocation (joint = TRUE): when domains and a budget
are specified, the default (joint = FALSE) gives each domain the full
budget independently. With joint = TRUE, a single budget is split
optimally across domains using L-BFGS-B optimization of budget fractions,
minimizing the worst-case CV ratio across all domains.
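Under simplifying assumptions the joint split has a closed form, which shows what the numerical search is aiming for. If fixed_cost = 0, k_psu = 1, sizes are unrounded and min_n is unused, then for domain d the squared worst-case CV ratio equals K_d / B_d, where B_d is the domain budget and K_d does not depend on B_d. Equalizing the ratios minimizes the worst case, so B_d is proportional to K_d. The sketch below assumes the same variance and cost models as standard two-stage planning; joint_split_sketch() is hypothetical and is not svyplan's L-BFGS-B procedure.

```r
# Hypothetical closed-form joint budget split (2-stage, fixed_cost = 0, k = 1).
joint_split_sketch <- function(domains, stage_cost, budget,
                               interval = c(1, 1000)) {
  c1 <- stage_cost[1]
  c2 <- stage_cost[2]
  # (CV / target)^2 * budget_d for the worst indicator, as a function of psu_size
  worst_g <- function(d, b) {
    max(d$rel_var * (1 + d$delta * (b - 1)) * (c1 + c2 * b) / (b * d$cv^2))
  }
  fits <- lapply(domains, function(d) optimize(function(b) worst_g(d, b), interval))
  K <- vapply(fits, function(f) f$objective, numeric(1))
  psu_size <- vapply(fits, function(f) f$minimum, numeric(1))
  # Equal CV ratios across domains require budgets proportional to K
  cost <- budget * K / sum(K)
  data.frame(domain = names(domains), n_psu = cost / (c1 + c2 * psu_size),
             psu_size = psu_size, cost = cost, cv_ratio = sqrt(K / cost))
}

# Inputs mirroring the joint-allocation example (rel_var = (1 - p) / p)
doms <- list(
  Urban = list(rel_var = c(0.7 / 0.3, 0.9 / 0.1), cv = c(0.10, 0.15),
               delta = c(0.02, 0.05)),
  Rural = list(rel_var = c(0.75 / 0.25, 0.85 / 0.15), cv = c(0.10, 0.15),
               delta = c(0.03, 0.04))
)
split_res <- joint_split_sketch(doms, stage_cost = c(500, 50), budget = 100000)
```

For these inputs the sketch assigns roughly 61,620 to Urban and 38,380 to Rural with a common CV ratio of about 0.958, close to the .cost and .cv columns printed in the joint example.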
In multistage mode, sampling fractions are assumed negligible at each stage (equivalent to sampling with replacement), so no finite population correction is applied. This is standard for multistage planning when cluster populations are large relative to the sample.
References
Cochran, W. G. (1977). Sampling Techniques (3rd ed.). Wiley.
Valliant, R., Dever, J. A., and Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer.
See also
n_prop(), n_mean() for single-indicator sizing;
n_cluster() for single-indicator multistage allocation;
prec_multi() for the inverse.
Examples
# Simple mode: three indicators, take the max
targets <- data.frame(
name = c("stunting", "vaccination", "anemia"),
p = c(0.30, 0.70, 0.10),
moe = c(0.05, 0.05, 0.03)
)
n_multi(targets)
#> Multi-indicator sample size
#> n = 385 (binding: anemia)
#> ---
#> name .n .binding
#> stunting 323
#> vaccination 323
#> anemia 385 *
# MICS/DHS-style: specify precision as a relative margin of error (RME).
# RME = moe / p, so convert with moe = RME * p before calling n_multi().
rme <- 0.12
targets_rme <- data.frame(
name = c("stunting", "vaccination", "anemia"),
p = c(0.30, 0.70, 0.10),
deff = c(2.0, 1.5, 2.5)
)
targets_rme$moe <- rme * targets_rme$p
n_multi(targets_rme)
#> Multi-indicator sample size
#> n = 6003 (binding: anemia)
#> ---
#> name .n .binding
#> stunting 1245
#> vaccination 172
#> anemia 6003 *
# Rare proportion: use Wilson globally in simple mode
n_multi(targets[3, , drop = FALSE], prop_method = "wilson")
#> Multi-indicator sample size
#> n = 388 (binding: anemia)
#> ---
#> name .n .binding
#> anemia 388 *
# Per-row proportion methods in a mixed target table
targets_mixed <- data.frame(
name = c("rare_prop", "mean_ind"),
p = c(0.05, NA),
var = c(NA, 100),
moe = c(0.02, 2),
prop_method = c("wilson", NA)
)
n_multi(targets_mixed)
#> Multi-indicator sample size
#> n = 469 (binding: rare_prop)
#> ---
#> name .n .binding
#> rare_prop 469 *
#> mean_ind 97
# Simple mode with domains
targets_dom <- data.frame(
name = rep(c("stunting", "anemia"), each = 2),
p = c(0.30, 0.25, 0.10, 0.15),
moe = c(0.05, 0.05, 0.03, 0.03),
region = rep(c("North", "South"), 2)
)
n_multi(targets_dom)
#> Treating column(s) ‘region’ as domain variable(s)
#> Multi-indicator sample size (2 domains)
#> n = 545 (binding: anemia)
#> ---
#> region .n .binding
#> North 385 anemia
#> South 545 anemia
# Two-stage CV mode
targets_cl <- data.frame(
name = c("stunting", "anemia"),
p = c(0.30, 0.10),
cv = c(0.10, 0.15),
delta_psu = c(0.02, 0.05)
)
n_multi(targets_cl, stage_cost = c(500, 50))
#> Multi-indicator optimal allocation (2-stage)
#> n_psu = 48 | psu_size = 14 -> total n = 672 (unrounded: 655.6811)
#> cv = 0.1500, cost = 56568 (binding: anemia)
#> ---
#> name .cv_target .cv_achieved .binding
#> stunting 0.10 0.0668
#> anemia 0.15 0.1500 *
# Joint budget allocation across domains
targets_jnt <- data.frame(
name = rep(c("stunting", "anemia"), each = 2),
p = c(0.30, 0.25, 0.10, 0.15),
cv = c(0.10, 0.10, 0.15, 0.15),
delta_psu = c(0.02, 0.03, 0.05, 0.04),
region = rep(c("Urban", "Rural"), 2)
)
n_multi(targets_jnt, stage_cost = c(500, 50), budget = 100000, joint = TRUE)
#> Treating column(s) ‘region’ as domain variable(s)
#> Multi-indicator optimal allocation (2-stage, 2 domains, joint)
#> ---
#> Total n = 1232 (unrounded: 1209)
#> region n_psu psu_size .total_n .cv .cost .binding
#> Rural 28 18 504 0.0958 38379 stunting
#> Urban 52 14 728 0.1437 61621 anemia