Compute optimal per-stage sample sizes for a multistage cluster design, minimizing cost for a given precision or minimizing variance for a given budget.

Usage

n_cluster(cost, ...)

# Default S3 method
n_cluster(
  cost,
  delta,
  rel_var = 1,
  k = 1,
  cv = NULL,
  budget = NULL,
  n_psu = NULL,
  resp_rate = 1,
  fixed_cost = 0,
  ...
)

# S3 method for class 'svyplan_prec'
n_cluster(cost, cv = NULL, budget = NULL, ...)

Arguments

cost

For the default method: numeric vector of per-stage costs. Length determines the number of stages (2 or 3). For svyplan_prec objects: a precision result from prec_cluster().

...

Additional arguments passed to methods.

delta

Numeric vector of homogeneity measures (length = stages - 1), or a svyplan_varcomp object.

rel_var

Unit relvariance (default 1).

k

Ratio parameter(s). Scalar for 2-stage, length-2 vector for 3-stage (default 1).

cv

Target coefficient of variation. Specify exactly one of cv or budget.

budget

Total budget. Specify exactly one of cv or budget.

n_psu

Fixed number of PSUs (stage-1 sample size). NULL (default) means optimize all stages.

resp_rate

Expected response rate, in (0, 1]. Default 1 (no adjustment). The stage-1 sample size is inflated by 1 / resp_rate.

fixed_cost

Fixed overhead cost (C0). Default 0. The total cost model becomes C = C0 + c1*n_psu + c2*n_psu*psu_size [+ c3*n_psu*psu_size*ssu_size]. In budget mode, only budget - fixed_cost is available for variable costs; in CV mode, fixed_cost is added to the variable cost.
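The cost model described for fixed_cost can be written out in a few lines of base R. This is only an illustrative sketch: total_cost() is a hypothetical helper, not a function exported by the package, and it assumes n holds the per-stage sizes in the order n_psu, psu_size, ssu_size.

```r
# Hypothetical helper (not part of the package): evaluates
# C = C0 + c1*n_psu + c2*n_psu*psu_size [+ c3*n_psu*psu_size*ssu_size]
total_cost <- function(cost, n, fixed_cost = 0) {
  stopifnot(length(cost) == length(n))
  # cumprod(n) yields n_psu, n_psu*psu_size, n_psu*psu_size*ssu_size
  fixed_cost + sum(cost * cumprod(n))
}

# 2-stage: 500*40 + 50*(40*40)
total_cost(c(500, 50), c(40, 40))
#> [1] 1e+05

# Same design with a fixed overhead of 5000
total_cost(c(500, 50), c(40, 40), fixed_cost = 5000)
#> [1] 105000

# 3-stage: 500*21 + 100*(21*5) + 50*(21*5*7)
total_cost(c(500, 100, 50), c(21, 5, 7))
#> [1] 57750
```

In budget mode the same relation is used in reverse: the variable part of the cost must fit within budget - fixed_cost.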

Value

A svyplan_cluster object with components:

n

Named numeric vector of continuous per-stage sample sizes (e.g. c(n_psu = 84.1, psu_size = 13.8)). Use ceiling() for operational (integer) values.

stages

Number of stages (2 or 3).

total_n

Continuous total sample size (prod(n)). Use as.integer() for the operational total (the product of the ceiling-rounded stage sizes), or as.double() for this continuous value.

cv

Achieved coefficient of variation (based on continuous optimum).

cost

Total cost.

params

List of input parameters.
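The difference between the continuous and the operational sizes can be seen with plain base R. The vector below is typed in by hand from the illustrative values shown for n; it is not extracted from a real svyplan_cluster object.

```r
# Continuous per-stage sizes, as in the illustration for the n component
n <- c(n_psu = 84.1, psu_size = 13.8)

# Operational (integer) stage sizes: round each stage up
n_oper <- ceiling(n)
n_oper
#>    n_psu psu_size
#>       85       14

# Operational total: product of the rounded-up stage sizes
prod(n_oper)
#> [1] 1190

# Continuous total: product of the unrounded stage sizes. The inputs here
# are already rounded to one decimal, so this only approximates the
# unrounded total that the package would report.
prod(n)
```

Because every stage is rounded up, the operational total is never smaller than the continuous one.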

Details

The number of stages is determined by length(cost). The calculation then branches on two dimensions:

  • 2-stage vs 3-stage (vector length)

  • budget vs CV mode (whichever of budget or cv is non-NULL)
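For the 2-stage case, the two modes can be sketched in base R using the textbook optimum from the reference cited below (Valliant, Dever, and Kreuter, Ch. 9). This is an assumption about the calculation, not the package's source: two_stage_opt() is a hypothetical helper, and the placement of rel_var and k in the relvariance is taken from the textbook form. With the default rel_var = 1 and k = 1 it agrees with the 2-stage outputs in the Examples section.

```r
# Hypothetical helper (not part of the package): textbook 2-stage optimum.
# Assumed relvariance model: rel_var * k * (1 + delta * (psu_size - 1)) / (n_psu * psu_size)
two_stage_opt <- function(cost, delta, rel_var = 1, k = 1,
                          cv = NULL, budget = NULL, fixed_cost = 0) {
  stopifnot(length(cost) == 2, xor(is.null(cv), is.null(budget)))
  # Optimal number of elements per PSU; identical in both modes
  psu_size <- sqrt(cost[1] / cost[2] * (1 - delta) / delta)
  infl <- 1 + delta * (psu_size - 1)  # clustering inflation factor
  if (!is.null(budget)) {
    # Budget mode: spend the variable budget on as many PSUs as it buys
    n_psu <- (budget - fixed_cost) / (cost[1] + cost[2] * psu_size)
  } else {
    # CV mode: smallest number of PSUs that reaches the target CV
    n_psu <- rel_var * k * infl / (psu_size * cv^2)
  }
  list(
    n    = c(n_psu = n_psu, psu_size = psu_size),
    cv   = sqrt(rel_var * k * infl / (n_psu * psu_size)),
    cost = fixed_cost + n_psu * (cost[1] + cost[2] * psu_size)
  )
}

b <- two_stage_opt(cost = c(500, 50), delta = 0.05, budget = 100000)
round(b$n, 1)      # n_psu about 84.1, psu_size about 13.8
round(b$cv, 4)     # about 0.0376

v <- two_stage_opt(cost = c(500, 50), delta = 0.05, cv = 0.05)
round(prod(v$n), 1)  # about 655.7
round(v$cost)        # about 56568
```

In this sketch the within-PSU size depends only on the cost ratio and delta, so switching between budget and CV mode changes only the number of PSUs.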

When n_psu is specified, stage 1 is fixed and only the later stages (stage 2 and, for 3-stage designs, stage 3) are optimized.
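With n_psu fixed in a 2-stage budget problem, there is only one free quantity left. A minimal sketch, assuming the whole variable budget is spent and using the same textbook relvariance with rel_var = 1 and k = 1 (an assumption, not the package's code), is consistent with the fixed n_psu example in the Examples section:

```r
# Inputs taken from the "2-stage, fixed n_psu" example
cost   <- c(500, 50)
delta  <- 0.05
budget <- 100000
n_psu  <- 40

# Budget left after paying for the PSUs, divided by the per-element
# cost across all PSUs, gives the number of elements per PSU
psu_size <- (budget - cost[1] * n_psu) / (cost[2] * n_psu)
psu_size
#> [1] 40

# Assumed relvariance model with rel_var = 1, k = 1
cv <- sqrt((1 + delta * (psu_size - 1)) / (n_psu * psu_size))
round(cv, 4)
#> [1] 0.0429
```

The CV here (0.0429) is larger than in the unconstrained budget example (0.0376) because 40 PSUs is far from the continuous optimum of about 84.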

If delta is a svyplan_varcomp object, delta, rel_var, and k are extracted automatically.

These functions assume sampling fractions are negligible at each stage (equivalent to sampling with replacement). No finite population correction is applied. This is standard for multistage planning when cluster populations are large relative to the sample.

References

Valliant, R., Dever, J. A., and Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer. Ch. 9.

See also

prec_cluster() for the inverse calculation, varcomp() for estimating variance components.

Examples

# 2-stage, budget mode
n_cluster(cost = c(500, 50), delta = 0.05, budget = 100000)
#> Optimal 2-stage allocation
#> n_psu = 85 | psu_size = 14 -> total n = 1190 (unrounded: 1159.1)
#> cv = 0.0376, cost = 100000

# 2-stage, CV mode
n_cluster(cost = c(500, 50), delta = 0.05, cv = 0.05)
#> Optimal 2-stage allocation
#> n_psu = 48 | psu_size = 14 -> total n = 672 (unrounded: 655.681)
#> cv = 0.0500, cost = 56568

# 2-stage, fixed n_psu
n_cluster(cost = c(500, 50), delta = 0.05, budget = 100000, n_psu = 40)
#> Optimal 2-stage allocation
#> n_psu = 40 | psu_size = 40 -> total n = 1600 (unrounded: 1600)
#> cv = 0.0429, cost = 100000

# 3-stage
n_cluster(cost = c(500, 100, 50), delta = c(0.01, 0.05), cv = 0.05)
#> Optimal 3-stage allocation
#> n_psu = 21 | psu_size = 5 | ssu_size = 7 -> total n = 735 (unrounded: 626.5766)
#> cv = 0.0500, cost = 51658

# With fixed overhead cost
n_cluster(cost = c(500, 50), delta = 0.05, budget = 100000, fixed_cost = 5000)
#> Optimal 2-stage allocation
#> n_psu = 80 | psu_size = 14 -> total n = 1120 (unrounded: 1101.145)
#> cv = 0.0386, cost = 100000 (fixed: 5000)