Compute optimal per-stage sample sizes for a multistage cluster design, minimizing cost for a given precision or minimizing variance for a given budget.
Usage
n_cluster(stage_cost, ...)
# Default S3 method
n_cluster(
  stage_cost = NULL,
  delta = NULL,
  rel_var = 1,
  k = 1,
  cv = NULL,
  budget = NULL,
  n_psu = NULL,
  psu_size = NULL,
  ssu_size = NULL,
  resp_rate = 1,
  fixed_cost = 0,
  plan = NULL,
  ...
)
# S3 method for class 'svyplan_prec'
n_cluster(stage_cost, cv = NULL, budget = NULL, ...)
Arguments
- stage_cost
For the default method: numeric vector of per-stage costs. Length determines the number of stages (2 or 3). Named vectors are accepted with stage names cost_psu, cost_ssu, cost_tsu (cost_tsu aliases cost_ssu in 2-stage). For svyplan_prec objects: a precision result from prec_cluster().
- ...
Additional arguments passed to methods.
- delta
Numeric vector of homogeneity measures (length = stages - 1), or a svyplan_varcomp object.
- rel_var
Unit relvariance (default 1).
- k
Ratio parameter(s). Scalar for 2-stage, length-2 vector for 3-stage (default 1).
- cv
Target coefficient of variation. Specify exactly one of cv or budget.
- budget
Total budget. Specify exactly one of cv or budget.
- n_psu
Fixed number of PSUs (stage-1 sample size). NULL (default) means optimize. For 2-stage, at most one of n_psu or psu_size may be specified. For 3-stage, up to two of n_psu, psu_size, ssu_size may be fixed.
- psu_size
Fixed cluster size (stage-2 sample size per PSU). NULL (default) means optimize. This is the typical MICS/DHS parameterization where the number of households per cluster is fixed.
- ssu_size
Fixed SSU take size (stage-3 sample size per SSU). NULL (default) means optimize. Only valid for 3-stage designs.
- resp_rate
Expected response rate, in (0, 1]. Default 1 (no adjustment). The stage-1 sample size is inflated by 1 / resp_rate.
- fixed_cost
Fixed overhead cost (C0). Default 0. The total cost model becomes C = C0 + c1*n_psu + c2*n_psu*psu_size [+ c3*n_psu*psu_size*ssu_size]. In budget mode, only budget - fixed_cost is available for variable costs; in CV mode, fixed_cost is added to the variable cost.
- plan
Optional svyplan() object providing design defaults (including stage_cost, delta, rel_var, k, resp_rate, fixed_cost).
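The cost model described under fixed_cost can be checked by hand. The following is a minimal base R sketch, not package code: total_cost() is a hypothetical helper introduced only for illustration, and the inputs are borrowed from the Examples section.

```r
# Hypothetical helper (not a svyplan function): evaluates the documented
# cost model C = C0 + c1*n_psu + c2*n_psu*psu_size [+ c3*n_psu*psu_size*ssu_size]
total_cost <- function(n_psu, psu_size, stage_cost,
                       ssu_size = NULL, fixed_cost = 0) {
  cost <- fixed_cost +
    stage_cost[1] * n_psu +
    stage_cost[2] * n_psu * psu_size
  if (!is.null(ssu_size)) {
    # Third-stage term only applies to 3-stage designs
    cost <- cost + stage_cost[3] * n_psu * psu_size * ssu_size
  }
  unname(cost)
}

# 2-stage: 500*40 + 50*40*40 = 20000 + 80000
cost_2stage <- total_cost(40, 40, stage_cost = c(500, 50))

# 3-stage: 500*50 + 100*50*19 + 50*50*19*8 = 25000 + 95000 + 380000
cost_3stage <- total_cost(50, 19, stage_cost = c(500, 100, 50), ssu_size = 8)

# Budget mode with overhead: only budget - fixed_cost funds the variable costs
variable_budget <- 100000 - 5000
```

Both totals agree with the budgets used in the corresponding fixed-stage examples (100000 and 500000).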
Value
A svyplan_cluster object with components:
- n
Named numeric vector of continuous per-stage sample sizes (e.g. c(n_psu = 84.1, psu_size = 13.8)). Use ceiling() for operational (integer) values.
- stages
Number of stages (2 or 3).
- total_n
Continuous total sample size (prod(n)). Use as.integer() for the operational total (product of ceiled stages), or as.double() for this continuous value.
- cv
Achieved coefficient of variation (based on continuous optimum).
- cost
Total cost.
- params
List of input parameters.
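The difference between the continuous and operational totals can be illustrated with a plain named vector. This sketch does not construct a svyplan_cluster object; it only mimics the n component using the values quoted above.

```r
# Plain named vector standing in for the `n` component (illustrative values)
n <- c(n_psu = 84.1, psu_size = 13.8)

# Operational (integer) stage sizes: round each stage up
n_oper <- ceiling(n)

# Continuous total: product of the unrounded stage sizes
total_continuous <- prod(n)

# Operational total: product of the ceiled stage sizes (85 * 14)
total_operational <- prod(n_oper)
```

Because every stage is rounded up before multiplying, the operational total is never smaller than the continuous one.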
Details
Stage count is determined by length(stage_cost). The computation dispatches on two dimensions:
- 2-stage vs. 3-stage (length of stage_cost)
- budget vs. CV mode (whichever of budget or cv is non-NULL)
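To make the two modes concrete, here is a base R sketch of a 2-stage allocation computed by hand. It assumes the textbook optimum psu_size = sqrt((c1 / c2) * (1 - delta) / delta) and a relvariance of (1 + delta * (psu_size - 1)) / (n_psu * psu_size) with rel_var = 1 and k = 1. These formulas are an assumption about what the function computes, although they reproduce the figures shown in the Examples section.

```r
c1 <- 500      # cost per PSU
c2 <- 50       # cost per sampled unit
delta <- 0.05  # measure of homogeneity

# Assumed textbook optimum for the number of units per PSU
m_opt <- sqrt(c1 / c2 * (1 - delta) / delta)

# Relvariance contributed by one PSU of size m_opt (rel_var = 1, k = 1)
unit_relvar <- (1 + delta * (m_opt - 1)) / m_opt

# Budget mode: spend the whole budget, then report the achieved CV
budget <- 100000
n_budget <- budget / (c1 + c2 * m_opt)
cv_budget <- sqrt(unit_relvar / n_budget)

# CV mode: take just enough PSUs to hit the target, then report the cost
cv_target <- 0.05
n_cv <- unit_relvar / cv_target^2
cost_cv <- n_cv * (c1 + c2 * m_opt)
```

Under these assumptions n_budget is about 84.1 with a CV of about 0.0376, and n_cv is about 47.6 at a cost of about 56568.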
One or more stage sizes can be fixed, leaving the remaining stage(s) to be optimized or derived from the constraint. For 2-stage designs, at most one stage may be fixed. For 3-stage designs, up to two stages may be fixed; the remaining free stage is derived from the budget or CV constraint.
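When one stage of a 2-stage design is fixed in budget mode, the free stage follows directly from the cost constraint. The sketch below solves budget = c1*n_psu + c2*n_psu*psu_size for the free quantity in base R. It illustrates the constraint only and is not the package's implementation.

```r
stage_cost <- c(500, 50)
budget <- 100000

# Fixed psu_size = 20: solve the cost constraint for n_psu
psu_size_fixed <- 20
n_psu_free <- budget / (stage_cost[1] + stage_cost[2] * psu_size_fixed)

# Fixed n_psu = 40: solve the cost constraint for psu_size
n_psu_fixed <- 40
psu_size_free <- (budget / n_psu_fixed - stage_cost[1]) / stage_cost[2]
```

Here n_psu_free is 100000 / 1500, about 66.7 (67 after rounding up), and psu_size_free is exactly 40.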
If delta is a svyplan_varcomp object, delta, rel_var, and k
are extracted automatically.
Boundary and near-boundary homogeneity values are not supported by the
analytical optimum used here. When delta is near 0, most variability is
within PSUs, so the closed-form optimum collapses toward taking many units
in very few PSUs. When delta is near 1, most variability is between PSUs,
so the optimum collapses toward taking very few units in many PSUs. In both
cases the analytical allocation becomes degenerate, so n_cluster()
rejects values numerically too close to 0 or 1.
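The degeneracy can be seen numerically. The sketch below evaluates the textbook 2-stage optimum psu_size = sqrt((c1 / c2) * (1 - delta) / delta) over a range of delta values. That n_cluster() uses exactly this closed form is an assumption, and m_opt() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: assumed closed-form optimum for units per PSU
m_opt <- function(delta, c1 = 500, c2 = 50) {
  sqrt(c1 / c2 * (1 - delta) / delta)
}

deltas <- c(0.001, 0.05, 0.5, 0.999)
sizes <- m_opt(deltas)

# Approximately 99.95, 13.78, 3.16, 0.10
round(sizes, 2)
```

As delta approaches 0 the optimal cluster size grows without bound, and as delta approaches 1 it drops below one unit per PSU, which is not a usable design.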
These functions assume sampling fractions are negligible at each stage (equivalent to sampling with replacement). No finite population correction is applied. This is standard for multistage planning when cluster populations are large relative to the sample.
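Under the with-replacement assumption, the 2-stage relvariance is commonly written as rel_var / (n_psu * psu_size) * k * (1 + delta * (psu_size - 1)), with no finite population correction term. The base R sketch below uses that form. How k enters is an assumption following Valliant et al. (2018), and cv_two_stage() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: CV of a 2-stage design, no finite population correction
cv_two_stage <- function(n_psu, psu_size, delta, rel_var = 1, k = 1) {
  deff <- 1 + delta * (psu_size - 1)  # design effect from clustering
  sqrt(rel_var / (n_psu * psu_size) * k * deff)
}

# 40 PSUs of 40 units each, delta = 0.05
cv_fixed_n_psu <- cv_two_stage(40, 40, delta = 0.05)

# 20 units per PSU with n_psu implied by a budget of 100000
cv_fixed_psu_size <- cv_two_stage(100000 / 1500, 20, delta = 0.05)
```

With the defaults rel_var = 1 and k = 1, these evaluate to roughly 0.0429 and 0.0382, in line with the fixed-stage examples.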
References
Valliant, R., Dever, J. A., and Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer. Ch. 9.
See also
prec_cluster() for the inverse, varcomp() for estimating
variance components.
Examples
# 2-stage, budget mode
n_cluster(stage_cost = c(500, 50), delta = 0.05, budget = 100000)
#> Optimal 2-stage allocation
#> n_psu = 85 | psu_size = 14 -> total n = 1190 (unrounded: 1159.1)
#> cv = 0.0376, cost = 100000
# 2-stage, CV mode
n_cluster(stage_cost = c(500, 50), delta = 0.05, cv = 0.05)
#> Optimal 2-stage allocation
#> n_psu = 48 | psu_size = 14 -> total n = 672 (unrounded: 655.681)
#> cv = 0.0500, cost = 56568
# 2-stage, fixed n_psu
n_cluster(stage_cost = c(500, 50), delta = 0.05, budget = 100000, n_psu = 40)
#> Optimal 2-stage allocation
#> n_psu = 40 | psu_size = 40 -> total n = 1600 (unrounded: 1600)
#> cv = 0.0429, cost = 100000
# 2-stage, fixed psu_size (MICS/DHS style: 20 households per cluster)
n_cluster(stage_cost = c(500, 50), delta = 0.05, budget = 100000, psu_size = 20)
#> Optimal 2-stage allocation
#> n_psu = 67 | psu_size = 20 -> total n = 1340 (unrounded: 1333.333)
#> cv = 0.0382, cost = 100000
# 3-stage
n_cluster(stage_cost = c(500, 100, 50), delta = c(0.01, 0.05), cv = 0.05)
#> Optimal 3-stage allocation
#> n_psu = 21 | psu_size = 5 | ssu_size = 7 -> total n = 735 (unrounded: 626.5766)
#> cv = 0.0500, cost = 51658
# 3-stage, fixed n_psu + ssu_size (solve for psu_size)
n_cluster(
  stage_cost = c(500, 100, 50), delta = c(0.01, 0.05),
  budget = 500000, n_psu = 50, ssu_size = 8
)
#> Optimal 3-stage allocation
#> n_psu = 50 | psu_size = 19 | ssu_size = 8 -> total n = 7600 (unrounded: 7600)
#> cv = 0.0194, cost = 500000
# With fixed overhead cost
n_cluster(stage_cost = c(500, 50), delta = 0.05, budget = 100000, fixed_cost = 5000)
#> Optimal 2-stage allocation
#> n_psu = 80 | psu_size = 14 -> total n = 1120 (unrounded: 1101.145)
#> cv = 0.0386, cost = 100000 (fixed: 5000)