Compute the sample size that satisfies precision requirements for multiple survey indicators simultaneously. Supports simple (single-stage) and multistage cluster designs, with optional domain-level planning.

Usage

n_multi(targets, ...)

# Default S3 method
n_multi(
  targets,
  domains = NULL,
  stage_cost = NULL,
  budget = NULL,
  n_psu = NULL,
  psu_size = NULL,
  ssu_size = NULL,
  joint = FALSE,
  min_n = NULL,
  fixed_cost = 0,
  prop_method = "wald",
  plan = NULL,
  ...
)

# S3 method for class 'svyplan_prec'
n_multi(targets, stage_cost = NULL, ...)

Arguments

targets

For the default method: a data frame where each row is one survey indicator you want to measure, for example a prevalence (proportion) or a population mean. Surveys typically track several indicators simultaneously, and the sample must be large enough for the most demanding one; n_multi finds that size.

See the Details section for the full column reference. At minimum, each row needs:

  • What to measure: p for a proportion (e.g. 0.30 for 30 percent) or var for a population variance. Each row must use exactly one.

  • How precise: moe (margin of error) or cv (coefficient of variation). Each row must specify exactly one. Both are accepted in simple and multistage modes; in multistage mode, moe values are converted to cv internally (see Details).

For svyplan_prec objects: a precision result from prec_multi().

...

Additional arguments passed to methods.

domains

Character vector of column names in targets to treat as domain variables, or NULL (default) for no domains. All names must exist in targets. When specified, optimization runs independently per domain combination (default), or jointly when joint = TRUE.

stage_cost

Numeric vector of per-stage costs. NULL (default) for simple mode; length 2 or 3 for multistage mode.

budget

Total budget (multistage only). Provide either cv values in the targets data frame or a budget here, not both.

n_psu

Fixed stage-1 sample size (multistage only). For 2-stage, at most one of n_psu or psu_size may be fixed. For 3-stage, up to two of n_psu, psu_size, ssu_size may be fixed.

psu_size

Fixed cluster size (stage-2 sample size per PSU, multistage only). NULL (default) means optimize.

ssu_size

Fixed SSU take size (stage-3 sample size per SSU). NULL (default) means optimize. Only valid for 3-stage designs.

joint

Logical. If TRUE, optimally split a single budget across domains to minimize the worst-case CV ratio. Only applies to multistage budget mode with multiple domains; ignored otherwise.

min_n

Numeric scalar or NULL (default). Minimum total sample size per domain. Only active when domains are present; silently ignored otherwise. In simple mode, per-domain sample sizes are floored to min_n. In joint multistage mode, domains that would receive fewer than min_n observations are penalized during optimization, with an upfront feasibility check. In non-joint multistage mode, a warning is issued for any domain below the floor.

fixed_cost

Fixed overhead cost (C0). Default 0. Only applies to multistage mode. See n_cluster() for details.

prop_method

Proportion CI method for simple mode, one of "wald" (default), "wilson", or "logodds". This is passed to n_prop() for proportion rows and ignored for mean rows and multistage mode. An optional prop_method column in targets overrides this default on a per-row basis.

plan

Optional svyplan() object providing design defaults (stage_cost, fixed_cost).

Value

A svyplan_n object (simple mode) or svyplan_cluster object (multistage mode).

Without domains, the object contains:

n

Sample size (simple) or named per-stage allocation vector (multistage, e.g. c(n_psu = 80, psu_size = 12)).

detail

Per-indicator results (sample sizes or achieved CVs).

binding

Name or index of the binding (most demanding) indicator.

targets

The input targets data frame.

With domains, the object additionally contains:

n

Maximum per-stage sample size across domains. In simple mode, a single number; in multistage mode, a named vector (e.g. c(n_psu = 120, psu_size = 15)) giving the conservative allocation that satisfies all domains.

domains

Data frame with one row per domain, including domain variable columns, per-stage allocations (n_psu, psu_size, ...), and summary columns (.total_n, .cv, .cost, .binding for multistage; .n, .binding for simple mode). Use this for stratum-specific allocations.

total_n

Total sample size summed across all domains (multistage only).

cost

Total cost summed across all domains (multistage only).

Details

Building the targets data frame

Each row of targets represents one survey indicator. The two key decisions per row are:

  1. Type of indicator: is it a proportion (binary variable like "stunted yes/no") or a mean (continuous variable like "household expenditure")? This determines whether you fill the p or var column.

  2. Precision target: do you want an absolute margin of error (moe, e.g. +/- 5 percentage points) or a relative coefficient of variation (cv, e.g. 10 percent relative error)?

A minimal example for three health indicators:

targets <- data.frame(
  name = c("stunting", "vaccination", "expenditure"),
  p    = c(0.30, 0.70, NA),
  var  = c(NA, NA, 2500),
  moe  = c(0.05, 0.05, 10)
)

Rows with p are treated as proportions; rows with var (and p = NA) as means. You cannot have both p and var non-NA in the same row.
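The row rules above can be checked before calling n_multi(). The following is a minimal base-R sketch; check_targets() is a hypothetical helper written for illustration and is not part of the package, whose own validation may differ.

```r
# Hypothetical helper (not a package function): classify each row of a
# targets data frame and flag rows that break the "exactly one" rules.
check_targets <- function(targets) {
  # Treat a missing column as all-NA so optional columns can be omitted
  col_or_na <- function(nm) {
    if (nm %in% names(targets)) targets[[nm]] else rep(NA_real_, nrow(targets))
  }
  has_p   <- !is.na(col_or_na("p"))
  has_var <- !is.na(col_or_na("var"))
  has_moe <- !is.na(col_or_na("moe"))
  has_cv  <- !is.na(col_or_na("cv"))
  data.frame(
    row           = seq_len(nrow(targets)),
    type          = ifelse(has_p & !has_var, "proportion",
                    ifelse(has_var & !has_p, "mean", "invalid")),
    one_indicator = xor(has_p, has_var),   # exactly one of p / var
    one_precision = xor(has_moe, has_cv),  # exactly one of moe / cv
    stringsAsFactors = FALSE
  )
}

targets <- data.frame(
  name = c("stunting", "vaccination", "expenditure"),
  p    = c(0.30, 0.70, NA),
  var  = c(NA, NA, 2500),
  moe  = c(0.05, 0.05, 10)
)
chk <- check_targets(targets)
chk
```

Rows where one_indicator or one_precision is FALSE would need to be fixed before sizing.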

Column reference

name

Indicator label (optional). If omitted, row numbers are used in output.

p

Expected proportion, in (0, 1). Use this for binary indicators such as prevalences or coverage rates. The value is your best prior guess (e.g. from a previous survey or literature). One of p or var per row.

var

Population variance of a continuous indicator. Use this for means (e.g. income, expenditure, weight). One of p or var per row.

mu

Population mean (positive). Required when var is used together with cv (because CV = SE / mean) or with moe in multistage mode (for the moe-to-cv conversion).

moe

Margin of error, the half-width of the confidence interval you want. For proportions, this is on the probability scale (e.g. 0.05 for +/- 5 percentage points). For means, it is in the same units as the variable (e.g. 10 dollars). In multistage mode, converted to cv internally (see Details).

cv

Target coefficient of variation (relative standard error). For example, 0.10 means the SE should be at most 10 percent of the estimate. Works in both simple and multistage mode.

alpha

Significance level for the confidence interval (default 0.05, giving a 95 percent CI).

deff

Design effect multiplier (simple mode only, default 1). Set to a value greater than 1 to inflate the sample size for complex designs (e.g. 1.5 for a cluster design).

N

Population size (simple mode only, default Inf). A finite value applies a finite population correction, reducing the required sample size.

prop_method

Proportion CI method for simple mode: "wald" (default), "wilson", or "logodds". "wilson" is recommended for extreme proportions (below 0.1 or above 0.9). Only used for rows with p; ignored for mean rows and multistage mode.

delta_psu, delta_ssu

Measure of homogeneity (intra-class correlation) within clusters, between 0 and 1. Needed for multistage mode. delta_psu is required for 2-stage designs; both are required for 3-stage designs.

rel_var

Unit relvariance. If omitted, derived automatically from p (as (1 - p) / p) or from var / mu^2.

k_psu, k_ssu

Ratio parameters for cost-variance modelling (multistage, default 1).

resp_rate

Expected response rate, in (0, 1]. Default 1 (no adjustment). A value of 0.90 inflates the sample size by 1 / 0.90 to compensate for 10 percent non-response.
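To make the deff and resp_rate columns concrete, here is a small base-R sketch of how the two adjustments scale a Wald sample size for a proportion. inflate_n() is a hypothetical helper, and the order of operations (inflate first, round up last) is an assumption; it happens to reproduce the deff results shown in the Examples section, but it is not the package's code.

```r
# Hypothetical helper: Wald sample size for a proportion, inflated by a
# design effect and an expected response rate (assumed order: inflate, then round up).
inflate_n <- function(p, moe, deff = 1, resp_rate = 1, alpha = 0.05) {
  z  <- qnorm(1 - alpha / 2)
  n0 <- z^2 * p * (1 - p) / moe^2   # size under simple random sampling
  ceiling(n0 * deff / resp_rate)
}

rme <- 0.12                          # relative margin of error, moe = rme * p
p   <- c(stunting = 0.30, vaccination = 0.70, anemia = 0.10)

# Design effect only
n_full <- inflate_n(p, moe = rme * p, deff = c(2.0, 1.5, 2.5))
n_full
#> stunting vaccination anemia
#>     1245         172   6003

# Same targets with 10 percent expected non-response
n_resp <- inflate_n(p, moe = rme * p, deff = c(2.0, 1.5, 2.5), resp_rate = 0.90)
n_resp
```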

Domain columns are specified via the domains parameter. When domains are present, optimization runs independently per domain combination (default), or jointly when joint = TRUE.

Simple mode (stage_cost = NULL): computes sample size per indicator by delegating proportion rows to n_prop() and mean rows to n_mean(), then takes the maximum per domain. Use prop_method or a targets$prop_method column to choose "wald", "wilson", or "logodds" for proportion rows.
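The "size each indicator, then take the maximum per domain" logic can be sketched in base R. This assumes the default Wald formulas, n = z^2 p (1 - p) / moe^2 for proportions and n = z^2 var / moe^2 for means; wald_n_prop() and wald_n_mean() are illustrative stand-ins, not the n_prop() and n_mean() functions themselves.

```r
# Illustrative Wald sizes (stand-ins for the package's n_prop() / n_mean())
wald_n_prop <- function(p, moe, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling(z^2 * p * (1 - p) / moe^2)
}
wald_n_mean <- function(var, moe, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling(z^2 * var / moe^2)
}

targets_dom <- data.frame(
  name   = rep(c("stunting", "anemia"), each = 2),
  p      = c(0.30, 0.25, 0.10, 0.15),
  moe    = c(0.05, 0.05, 0.03, 0.03),
  region = rep(c("North", "South"), 2)
)

# One size per indicator row, then the maximum within each domain
targets_dom$.n <- wald_n_prop(targets_dom$p, targets_dom$moe)
per_domain <- tapply(targets_dom$.n, targets_dom$region, max)
per_domain
#> North South
#>   385   545

overall <- max(per_domain)             # conservative size across domains

n_mean_example <- wald_n_mean(var = 100, moe = 2)   # a mean row: 97
```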

Multistage mode (stage_cost provided): uses analytical reduction. For each candidate sub-stage allocation, the required stage-1 size is the maximum across all indicators. The total cost is then minimized (CV mode) or the worst-case CV ratio is minimized (budget mode) using numerical optimization.
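The reduction described above can be illustrated for a two-stage design in CV mode. The sketch assumes the variance model relvar * (1 + delta * (m - 1)) / (n_psu * m) with k_psu = 1 and no fixed cost; this is an assumption about the model, chosen because it reproduces the two-stage CV example in the Examples section, and the function names are hypothetical.

```r
targets_cl <- data.frame(
  name      = c("stunting", "anemia"),
  p         = c(0.30, 0.10),
  cv        = c(0.10, 0.15),
  delta_psu = c(0.02, 0.05)
)
stage_cost <- c(500, 50)
rel_var    <- (1 - targets_cl$p) / targets_cl$p   # unit relvariance of a proportion

# For a candidate cluster size m, the required number of PSUs is the
# maximum requirement across indicators (the binding indicator).
n_psu_needed <- function(m) {
  max(rel_var * (1 + targets_cl$delta_psu * (m - 1)) / (targets_cl$cv^2 * m))
}

# Variable cost of the design implied by cluster size m
total_cost <- function(m) {
  n1 <- n_psu_needed(m)
  stage_cost[1] * n1 + stage_cost[2] * n1 * m
}

# One-dimensional numerical search over the cluster size
opt      <- optimize(total_cost, interval = c(1, 200))
psu_size <- opt$minimum
n_psu    <- n_psu_needed(psu_size)
round(c(n_psu = n_psu, psu_size = psu_size, cost = opt$objective), 2)
```

Rounding n_psu up and psu_size to the nearest integer gives the 48 x 14 allocation shown in the two-stage CV example.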

Boundary and near-boundary homogeneity values are not supported by the analytical multistage optimum. When delta is near 0, most variability is within clusters, so the optimum collapses toward many interviews in very few PSUs. When delta is near 1, most variability is between clusters, so the optimum collapses toward very few interviews in many PSUs. To stay aligned with n_cluster(), multistage n_multi() rejects values numerically too close to 0 or 1.
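The collapse at the boundaries is visible in the classical single-indicator two-stage optimum, psu_size = sqrt((c1 / c2) * (1 - delta) / delta) (see Cochran, 1977). The sketch below evaluates that textbook formula for a range of delta values; classic_psu_size() is a hypothetical name and the costs are taken from the Examples section.

```r
# Textbook two-stage optimum cluster size for one indicator
# (hypothetical helper; c1 = cost per PSU, c2 = cost per interview)
classic_psu_size <- function(delta, stage_cost = c(500, 50)) {
  sqrt(stage_cost[1] / stage_cost[2] * (1 - delta) / delta)
}

delta <- c(1e-6, 0.01, 0.05, 0.50, 0.99, 1 - 1e-6)
boundary_demo <- data.frame(delta = delta, psu_size = classic_psu_size(delta))
boundary_demo
```

As delta approaches 0 the optimum cluster size grows without bound, and as delta approaches 1 it falls below one interview per PSU, so neither extreme yields a usable design.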

Joint budget allocation (joint = TRUE): when domains and a budget are specified, the default (joint = FALSE) gives each domain the full budget independently. With joint = TRUE, a single budget is split optimally across domains using L-BFGS-B optimization of budget fractions, minimizing the worst-case CV ratio across all domains.
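For intuition, the joint split can be sketched for two domains with nested one-dimensional searches. This is not the package's L-BFGS-B routine: it swaps in optimize() because a single budget fraction is enough for two domains, and it assumes the same variance model as above (relvar * (1 + delta * (m - 1)) / (n_psu * m), k_psu = 1, no fixed cost). All function names are hypothetical.

```r
targets_jnt <- data.frame(
  name      = rep(c("stunting", "anemia"), each = 2),
  p         = c(0.30, 0.25, 0.10, 0.15),
  cv        = c(0.10, 0.10, 0.15, 0.15),
  delta_psu = c(0.02, 0.03, 0.05, 0.04),
  region    = rep(c("Urban", "Rural"), 2)
)
stage_cost <- c(500, 50)
budget     <- 100000

# Smallest worst-case CV ratio (achieved / target) that domain d can reach
# when it is given budget b, searching over the cluster size m.
domain_ratio <- function(d, b) {
  rel_var <- (1 - d$p) / d$p
  worst <- function(m) {
    n1 <- b / (stage_cost[1] + stage_cost[2] * m)   # PSUs affordable at size m
    cv <- sqrt(rel_var * (1 + d$delta_psu * (m - 1)) / (n1 * m))
    max(cv / d$cv)
  }
  optimize(worst, interval = c(1, 200))$objective
}

doms <- split(targets_jnt, targets_jnt$region)      # list order: Rural, Urban

# f is the share of the budget given to the first domain
worst_overall <- function(f) {
  max(domain_ratio(doms[[1]], f * budget),
      domain_ratio(doms[[2]], (1 - f) * budget))
}
best   <- optimize(worst_overall, interval = c(0.01, 0.99))
shares <- setNames(c(best$minimum, 1 - best$minimum) * budget, names(doms))
round(shares)          # budget per domain
best$objective         # common worst-case CV ratio at the optimum
```

Under these assumptions the split lands close to the per-domain costs shown in the joint example in the Examples section, with both domains reaching roughly the same CV ratio.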

MOE in multistage mode

Multistage optimization uses CV internally. When moe values are provided, they are converted to cv before optimization:

  • Proportions: cv = moe / (z * p)

  • Means: cv = moe / (z * mu) (requires mu)

where z is the normal quantile for the row's alpha. This is an exact transformation, not an approximation.
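The conversion is easy to reproduce by hand. The sketch below applies the two formulas above in base R; moe_to_cv() is a hypothetical helper, and the mean example uses made-up values for moe and mu.

```r
# Hypothetical helper: convert a margin of error to a CV target.
# `center` is p for a proportion row and mu for a mean row.
moe_to_cv <- function(moe, center, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  moe / (z * center)
}

# Proportions from the two-stage MOE example (stunting, anemia)
cv_prop <- moe_to_cv(moe = c(0.05, 0.03), center = c(0.30, 0.10))
round(cv_prop, 8)
#> [1] 0.08503558 0.15306404

# A mean row needs mu; the values here are illustrative only
cv_mean <- moe_to_cv(moe = 10, center = 120)
```

The proportion results match the .cv_target column printed for the two-stage MOE example.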

In multistage mode, n_multi() assumes that sampling fractions are negligible at each stage (equivalent to sampling with replacement), so no finite population correction is applied. This is standard for multistage planning when cluster populations are large relative to the sample.

References

Cochran, W. G. (1977). Sampling Techniques (3rd ed.). Wiley.

Valliant, R., Dever, J. A., and Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer.

See also

n_prop(), n_mean() for single-indicator sizing; n_cluster() for single-indicator multistage allocation; prec_multi() for the inverse.

Examples

# Simple mode: three indicators, take the max
targets <- data.frame(
  name = c("stunting", "vaccination", "anemia"),
  p    = c(0.30, 0.70, 0.10),
  moe  = c(0.05, 0.05, 0.03)
)
n_multi(targets)
#> Multi-indicator sample size
#> n = 385 (binding: anemia)
#> ---
#>  name        .n  .binding
#>  stunting    323         
#>  vaccination 323         
#>  anemia      385 *       

# MICS/DHS-style: specify precision as a relative margin of error (RME).
# RME = moe / p, so convert with moe = RME * p before calling n_multi().
rme <- 0.12
targets_rme <- data.frame(
  name = c("stunting", "vaccination", "anemia"),
  p    = c(0.30, 0.70, 0.10),
  deff = c(2.0, 1.5, 2.5)
)
targets_rme$moe <- rme * targets_rme$p
n_multi(targets_rme)
#> Multi-indicator sample size
#> n = 6003 (binding: anemia)
#> ---
#>  name        .n   .binding
#>  stunting    1245         
#>  vaccination  172         
#>  anemia      6003 *       

# Rare proportion: use Wilson globally in simple mode
n_multi(targets[3, , drop = FALSE], prop_method = "wilson")
#> Multi-indicator sample size
#> n = 388 (binding: anemia)
#> ---
#>  name   .n  .binding
#>  anemia 388 *       

# Per-row proportion methods in a mixed target table
targets_mixed <- data.frame(
  name        = c("rare_prop", "mean_ind"),
  p           = c(0.05, NA),
  var         = c(NA, 100),
  moe         = c(0.02, 2),
  prop_method = c("wilson", NA)
)
n_multi(targets_mixed)
#> Multi-indicator sample size
#> n = 469 (binding: rare_prop)
#> ---
#>  name      .n  .binding
#>  rare_prop 469 *       
#>  mean_ind   97         

# Simple mode with domains
targets_dom <- data.frame(
  name   = rep(c("stunting", "anemia"), each = 2),
  p      = c(0.30, 0.25, 0.10, 0.15),
  moe    = c(0.05, 0.05, 0.03, 0.03),
  region = rep(c("North", "South"), 2)
)
n_multi(targets_dom, domains = "region")
#> Multi-indicator sample size (2 domains)
#> n = 545 (binding: anemia)
#> ---
#>  region .n  .binding
#>  North  385 anemia  
#>  South  545 anemia  

# Two-stage CV mode
targets_cl <- data.frame(
  name      = c("stunting", "anemia"),
  p         = c(0.30, 0.10),
  cv        = c(0.10, 0.15),
  delta_psu = c(0.02, 0.05)
)
n_multi(targets_cl, stage_cost = c(500, 50))
#> Multi-indicator optimal allocation (2-stage)
#> n_psu = 48 | psu_size = 14 -> total n = 672 (unrounded: 655.6811)
#> cv = 0.1500, cost = 56568 (binding: anemia)
#> ---
#>  name     .cv_target .cv_achieved .binding
#>  stunting 0.10       0.0668               
#>  anemia   0.15       0.1500       *       

# Two-stage with MOE (converted to CV internally)
targets_moe <- data.frame(
  name      = c("stunting", "anemia"),
  p         = c(0.30, 0.10),
  moe       = c(0.05, 0.03),
  delta_psu = c(0.02, 0.05)
)
n_multi(targets_moe, stage_cost = c(500, 50))
#> Multi-indicator optimal allocation (2-stage)
#> n_psu = 46 | psu_size = 14 -> total n = 644 (unrounded: 629.693)
#> cv = 0.1531, cost = 54326 (binding: anemia)
#> ---
#>  name     .cv_target .cv_achieved .binding
#>  stunting 0.08503558 0.0682               
#>  anemia   0.15306404 0.1531       *       

# Joint budget allocation across domains
targets_jnt <- data.frame(
  name      = rep(c("stunting", "anemia"), each = 2),
  p         = c(0.30, 0.25, 0.10, 0.15),
  cv        = c(0.10, 0.10, 0.15, 0.15),
  delta_psu = c(0.02, 0.03, 0.05, 0.04),
  region    = rep(c("Urban", "Rural"), 2)
)
n_multi(targets_jnt, domains = "region",
        stage_cost = c(500, 50), budget = 100000, joint = TRUE)
#> Multi-indicator optimal allocation (2-stage, 2 domains, joint)
#> ---
#> Total n = 1232 (unrounded: 1209)
#>  region n_psu psu_size .total_n .cv    .cost .binding
#>  Rural  28    18       504      0.0958 38379 stunting
#>  Urban  52    14       728      0.1437 61621 anemia