Compute optimal per-stage sample sizes for a multistage cluster design, minimizing cost for a given precision or minimizing variance for a given budget.

Usage

n_cluster(stage_cost, ...)

# Default S3 method
n_cluster(
  stage_cost = NULL,
  delta = NULL,
  rel_var = 1,
  k = 1,
  cv = NULL,
  budget = NULL,
  n_psu = NULL,
  psu_size = NULL,
  ssu_size = NULL,
  resp_rate = 1,
  fixed_cost = 0,
  plan = NULL,
  ...
)

# S3 method for class 'svyplan_prec'
n_cluster(stage_cost, cv = NULL, budget = NULL, ...)

Arguments

stage_cost

For the default method: a numeric vector of per-stage costs, whose length sets the number of stages (2 or 3). Named vectors are accepted, with stage names cost_psu, cost_ssu, and cost_tsu (in a 2-stage design, cost_tsu is treated as an alias for cost_ssu). For the svyplan_prec method: a precision result from prec_cluster().

...

Additional arguments passed to methods.

delta

Numeric vector of homogeneity measures (length = stages - 1), or a svyplan_varcomp object.

rel_var

Unit relvariance (default 1).

k

Ratio parameter(s). Scalar for 2-stage, length-2 vector for 3-stage (default 1).

cv

Target coefficient of variation. Specify exactly one of cv or budget.

budget

Total budget. Specify exactly one of cv or budget.

n_psu

Fixed number of PSUs (stage-1 sample size). NULL (default) means optimize. For 2-stage, at most one of n_psu or psu_size may be specified. For 3-stage, up to two of n_psu, psu_size, ssu_size may be fixed.

psu_size

Fixed cluster size (stage-2 sample size per PSU). NULL (default) means optimize. This is the typical MICS/DHS parameterization where the number of households per cluster is fixed.

ssu_size

Fixed SSU take size (stage-3 sample size per SSU). NULL (default) means optimize. Only valid for 3-stage designs.

resp_rate

Expected response rate, in (0, 1]. Default 1 (no adjustment). The stage-1 sample size is inflated by 1 / resp_rate.
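The inflation described above is simple arithmetic. The base-R sketch below only illustrates that arithmetic with made-up values; it does not show how n_cluster() reconciles the inflated size with a budget or CV constraint, which this page does not specify.

```r
# Illustrative values only (assumptions, not package output)
n_psu_needed <- 84.1   # continuous number of PSUs before any adjustment
resp_rate    <- 0.9    # expected response rate, must lie in (0, 1]

# Inflate the stage-1 sample size by 1 / resp_rate
n_psu_inflated <- n_psu_needed / resp_rate
round(n_psu_inflated, 2)   # roughly 93.44 PSUs
```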

fixed_cost

Fixed overhead cost (C0). Default 0. The total cost model becomes C = C0 + c1*n_psu + c2*n_psu*psu_size [+ c3*n_psu*psu_size*ssu_size]. In budget mode, only budget - fixed_cost is available for variable costs; in CV mode, fixed_cost is added to the variable cost.
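As a rough illustration, the cost model above can be evaluated directly in base R. The helper total_cost() is hypothetical (it is not part of the package) and simply mirrors the formula C = C0 + c1*n_psu + c2*n_psu*psu_size [+ c3*n_psu*psu_size*ssu_size].

```r
# Hypothetical helper, not a svyplan function: evaluates the documented cost model.
# stage_cost: per-stage unit costs (c1, c2[, c3])
# n:          per-stage sample sizes (n_psu, psu_size[, ssu_size])
total_cost <- function(stage_cost, n, fixed_cost = 0) {
  stopifnot(length(stage_cost) == length(n))
  # cumprod(n) is the number of units selected at each stage:
  # n_psu, n_psu * psu_size, n_psu * psu_size * ssu_size
  fixed_cost + sum(stage_cost * cumprod(n))
}

cost_2stage <- total_cost(c(500, 50), c(40, 40))           # equals 100000
cost_3stage <- total_cost(c(500, 100, 50), c(50, 19, 8))   # equals 500000
```

Both values agree with the costs reported for the corresponding fixed-stage designs in the Examples section.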

plan

Optional svyplan() object providing design defaults (including stage_cost, delta, rel_var, k, resp_rate, fixed_cost).

Value

A svyplan_cluster object with components:

n

Named numeric vector of continuous per-stage sample sizes (e.g. c(n_psu = 84.1, psu_size = 13.8)). Use ceiling() for operational (integer) values.

stages

Number of stages (2 or 3).

total_n

Continuous total sample size (prod(n)). Use as.integer() for the operational total (the product of the rounded-up stage sizes), or as.double() for this continuous value.
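The difference between the continuous and the operational total can be sketched in base R. The vector below reuses the rounded illustration values from the n entry above, so its product differs slightly from what an unrounded optimum would give.

```r
# Illustration values copied from the description of `n` (rounded to one decimal)
n <- c(n_psu = 84.1, psu_size = 13.8)

continuous_total  <- prod(n)            # product of the continuous stage sizes
operational_total <- prod(ceiling(n))   # 85 * 14 = 1190 after rounding each stage up
```

Rounding each stage up before multiplying always yields a total at least as large as the continuous one.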

cv

Achieved coefficient of variation (based on continuous optimum).

cost

Total cost.

params

List of input parameters.

Details

The number of stages is determined by length(stage_cost). The calculation varies along two dimensions:

  • 2-stage vs 3-stage (length of stage_cost)

  • budget vs cv mode (whichever of budget or cv is non-NULL)

One or more stage sizes can be fixed, leaving the remaining stage(s) to be optimized or derived from the constraint. For 2-stage designs, at most one stage may be fixed. For 3-stage designs, up to two stages may be fixed; the remaining free stage is derived from the budget or CV constraint.
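When all but one stage is fixed in budget mode, the free stage follows from solving the cost model for the remaining unknown. The base-R sketch below assumes the cost model documented under fixed_cost and uses the inputs of the fixed-stage calls in the Examples section; it is not the package's internal code.

```r
budget     <- 100000
fixed_cost <- 0
c2s        <- c(500, 50)   # 2-stage unit costs (c1, c2)

# 2-stage, n_psu fixed: solve C = C0 + c1*n + c2*n*m for psu_size (m)
n_psu_fixed <- 40
psu_size_free <- (budget - fixed_cost - c2s[1] * n_psu_fixed) /
  (c2s[2] * n_psu_fixed)                                   # 40

# 2-stage, psu_size fixed: solve the same equation for n_psu (n)
psu_size_fixed <- 20
n_psu_free <- (budget - fixed_cost) /
  (c2s[1] + c2s[2] * psu_size_fixed)                       # about 66.67

# 3-stage, n_psu and ssu_size fixed: solve for psu_size
c3s      <- c(500, 100, 50)   # 3-stage unit costs (c1, c2, c3)
budget3  <- 500000
n3       <- 50
ssu_size <- 8
psu_size3 <- (budget3 - fixed_cost - c3s[1] * n3) /
  (n3 * (c3s[2] + c3s[3] * ssu_size))                      # 19
```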

If delta is a svyplan_varcomp object, delta, rel_var, and k are extracted automatically.

Boundary and near-boundary homogeneity values are not supported by the analytical optimum used here. When delta is near 0, most variability is within PSUs, so the closed-form optimum collapses toward taking many units in very few PSUs. When delta is near 1, most variability is between PSUs, so the optimum collapses toward taking very few units in many PSUs. In both cases the analytical allocation becomes degenerate, so n_cluster() rejects values numerically too close to 0 or 1.
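The degeneracy can be seen numerically. The sketch below assumes the textbook 2-stage optimum with k = 1, psu_size = sqrt((c1 / c2) * (1 - delta) / delta); this formula is an assumption here, although it reproduces the psu_size of about 13.8 shown in the examples for delta = 0.05. The function opt_psu_size() is hypothetical and not part of the package.

```r
# Hypothetical helper: textbook optimal cluster size for a 2-stage design, k = 1
opt_psu_size <- function(delta, c1 = 500, c2 = 50) {
  sqrt((c1 / c2) * (1 - delta) / delta)
}

delta <- c(0.001, 0.05, 0.5, 0.999)
m_opt <- opt_psu_size(delta)
round(m_opt, 2)
# delta = 0.001 gives about 100 units per PSU (few, very large clusters);
# delta = 0.999 gives about 0.1 units per PSU, which is not a usable cluster size.
```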

These functions assume sampling fractions are negligible at each stage (equivalent to sampling with replacement). No finite population correction is applied. This is standard for multistage planning when cluster populations are large relative to the sample.

References

Valliant, R., Dever, J. A., and Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer. Ch. 9.

See also

prec_cluster() for the inverse, varcomp() for estimating variance components.

Examples

# 2-stage, budget mode
n_cluster(stage_cost = c(500, 50), delta = 0.05, budget = 100000)
#> Optimal 2-stage allocation
#> n_psu = 85 | psu_size = 14 -> total n = 1190 (unrounded: 1159.1)
#> cv = 0.0376, cost = 100000

# 2-stage, CV mode
n_cluster(stage_cost = c(500, 50), delta = 0.05, cv = 0.05)
#> Optimal 2-stage allocation
#> n_psu = 48 | psu_size = 14 -> total n = 672 (unrounded: 655.681)
#> cv = 0.0500, cost = 56568

# 2-stage, fixed n_psu
n_cluster(stage_cost = c(500, 50), delta = 0.05, budget = 100000, n_psu = 40)
#> Optimal 2-stage allocation
#> n_psu = 40 | psu_size = 40 -> total n = 1600 (unrounded: 1600)
#> cv = 0.0429, cost = 100000

# 2-stage, fixed psu_size (MICS/DHS style: 20 households per cluster)
n_cluster(stage_cost = c(500, 50), delta = 0.05, budget = 100000, psu_size = 20)
#> Optimal 2-stage allocation
#> n_psu = 67 | psu_size = 20 -> total n = 1340 (unrounded: 1333.333)
#> cv = 0.0382, cost = 100000

# 3-stage
n_cluster(stage_cost = c(500, 100, 50), delta = c(0.01, 0.05), cv = 0.05)
#> Optimal 3-stage allocation
#> n_psu = 21 | psu_size = 5 | ssu_size = 7 -> total n = 735 (unrounded: 626.5766)
#> cv = 0.0500, cost = 51658

# 3-stage, fixed n_psu + ssu_size (solve for psu_size)
n_cluster(
  stage_cost = c(500, 100, 50), delta = c(0.01, 0.05),
  budget = 500000, n_psu = 50, ssu_size = 8
)
#> Optimal 3-stage allocation
#> n_psu = 50 | psu_size = 19 | ssu_size = 8 -> total n = 7600 (unrounded: 7600)
#> cv = 0.0194, cost = 500000

# With fixed overhead cost
n_cluster(stage_cost = c(500, 50), delta = 0.05, budget = 100000, fixed_cost = 5000)
#> Optimal 2-stage allocation
#> n_psu = 80 | psu_size = 14 -> total n = 1120 (unrounded: 1101.145)
#> cv = 0.0386, cost = 100000 (fixed: 5000)
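
# Cross-check of the first two examples in base R

The sketch below recomputes the 2-stage budget-mode and CV-mode results by hand. It assumes the textbook closed-form optimum with k = 1 and rel_var = 1 and the relvariance approximation rel_var / (n * m) * (1 + delta * (m - 1)); these formulas are assumptions rather than the package source, but they reproduce the printed values above.

```r
c1 <- 500; c2 <- 50      # per-stage unit costs
delta   <- 0.05          # measure of homogeneity
rel_var <- 1             # unit relvariance (default)

# Optimal cluster size (same in both modes under these assumptions)
m <- sqrt((c1 / c2) * (1 - delta) / delta)             # about 13.78

# Budget mode: spend the whole budget at the optimal cluster size
budget  <- 100000
n_b     <- budget / (c1 + c2 * m)                      # about 84.1 PSUs
total_b <- n_b * m                                     # about 1159.1
cv_b    <- sqrt(rel_var / total_b * (1 + delta * (m - 1)))   # about 0.0376

# CV mode: smallest total sample that reaches the target CV
cv_target <- 0.05
total_c   <- rel_var * (1 + delta * (m - 1)) / cv_target^2   # about 655.68
n_c       <- total_c / m                               # about 47.6 PSUs
cost_c    <- n_c * (c1 + c2 * m)                       # about 56568
```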