Multi-Indicator Sample Size

Compute the sample size that satisfies precision requirements for multiple survey indicators simultaneously under a simple sampling design. Optional domain columns support separate requirements by subpopulation.

Usage

n_multi(targets, ...)

# Default S3 method
n_multi(
  targets,
  ...,
  domains = NULL,
  min_n = NULL,
  prop_method = c("wald", "wilson", "logodds"),
  plan = NULL
)

# S3 method for class 'svyplan_prec'
n_multi(targets, ...)

Arguments

targets

For the default method: a data frame in which each row is one survey indicator you want to measure, for example a prevalence (proportion) or a population mean. Surveys typically track several indicators at once, and the sample must be large enough for the most demanding one; n_multi() finds that size.

See the Details section for the full column reference. At minimum, each row needs:

What to measure: p for a proportion (e.g. 0.30 for 30 percent) or var for a mean (the population variance). Each row must use exactly one.
How precise: moe (margin of error) or cv (coefficient of variation). Each row must specify exactly one.

For svyplan_prec objects: a precision result from prec_multi().

...

Additional arguments passed to methods. Unused arguments are rejected.

domains

Character vector of column names in targets to treat as domain variables, or NULL (default) for no domains. All names must exist in targets. When specified, sizing runs independently for each domain combination.

min_n

Numeric scalar or NULL (default). Minimum total sample size per domain, applied only when domains are present: any per-domain sample size below min_n is raised to min_n.

prop_method

Proportion CI method, one of "wald" (default), "wilson", or "logodds". This is passed to n_prop() for proportion rows and ignored for mean rows. An optional prop_method column in targets overrides this default on a per-row basis.

plan

Optional svyplan() object providing design defaults.

Value

A svyplan_n object. The output class is the same with or without domains.

Without domains, the object contains:

n: The sample size required by the binding indicator.
detail: Per-indicator sample-size results.
binding: Name or index of the binding (most demanding) indicator.
targets: The input targets data frame.

With domains, the object additionally contains:

n: The largest sample size required across domains.
domains: Data frame with one row per domain, including domain variables, .n, and .binding.

Details

Building the targets data frame

Each row of targets represents one survey indicator. The two key decisions per row are:

Type of indicator: is it a proportion (binary variable like "stunted yes/no") or a mean (continuous variable like "household expenditure")? This determines whether you fill the p or var column.
Precision target: do you want an absolute margin of error (moe, e.g. +/- 5 percentage points) or a relative precision expressed as a coefficient of variation (cv, e.g. 0.10 for a relative standard error of 10 percent)?
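The two precision targets are linked through the normal quantile: for an estimate theta, moe = z * SE and cv = SE / theta, so cv = moe / (z * theta). The base-R sketch below applies that relation; it reproduces the .cv_target values printed in the Examples section, but the helper name moe_to_cv() is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of svyplan): convert an absolute margin of
# error into the equivalent coefficient of variation for an estimate theta.
# Since moe = z * SE and cv = SE / theta, it follows that cv = moe / (z * theta).
moe_to_cv <- function(moe, theta, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)  # two-sided normal quantile (about 1.96 for alpha = 0.05)
  moe / (z * theta)
}

# Stunting (p = 0.30, moe = 0.05) and anemia (p = 0.10, moe = 0.03)
moe_to_cv(moe = c(0.05, 0.03), theta = c(0.30, 0.10))
# about 0.0850 and 0.1531
```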

A minimal example for three health indicators:

targets <- data.frame(
  name = c("stunting", "vaccination", "expenditure"),
  p    = c(0.30, 0.70, NA),
  var  = c(NA, NA, 2500),
  moe  = c(0.05, 0.05, 10)
)

Rows with p are treated as proportions, whereas rows with var (and p = NA) are treated as means. You cannot have both p and var non-NA in the same row.
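The row-typing rule can be sketched as a small validation function. This is an illustration of the stated rules (exactly one of p or var, exactly one of moe or cv), not svyplan's internal code; classify_rows() is a hypothetical name.

```r
# Hypothetical validation sketch (not svyplan's internals): classify each
# targets row as a proportion or a mean, enforcing "exactly one of p / var"
# and "exactly one of moe / cv" per row.
classify_rows <- function(targets) {
  # TRUE where the column exists and the row holds a non-NA value
  has_col <- function(col) {
    if (col %in% names(targets)) !is.na(targets[[col]]) else rep(FALSE, nrow(targets))
  }
  has_p   <- has_col("p");   has_var <- has_col("var")
  has_moe <- has_col("moe"); has_cv  <- has_col("cv")
  if (any(has_p == has_var)) stop("each row needs exactly one of p or var")
  if (any(has_moe == has_cv)) stop("each row needs exactly one of moe or cv")
  ifelse(has_p, "proportion", "mean")
}

targets <- data.frame(
  name = c("stunting", "vaccination", "expenditure"),
  p    = c(0.30, 0.70, NA),
  var  = c(NA, NA, 2500),
  moe  = c(0.05, 0.05, 10)
)
classify_rows(targets)
# returns c("proportion", "proportion", "mean")
```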

Column reference

name: Indicator label (optional). If omitted, row numbers are used in output.
p: Expected proportion, in (0, 1). Use this for binary indicators such as prevalences or coverage rates. The value is your best prior guess (e.g. from a previous survey or literature). One of p or var per row.
var: Population variance of a continuous indicator. Use this for means (e.g. income, expenditure, weight). One of p or var per row.
mu: Population mean (positive). It is required when var is used with cv because CV = SE / mean.
moe: Margin of error, the half-width of the confidence interval you want. For proportions, this is on the probability scale (e.g. 0.05 for +/- 5 percentage points). For means, it is in the same units as the variable (e.g. 10 dollars).
cv: Target coefficient of variation (relative standard error). For example, 0.10 means the SE should be at most 10 percent of the estimate.
alpha: Significance level for the confidence interval (default 0.05, giving a 95 percent CI).
deff: Design effect multiplier (default 1). Set > 1 to inflate the sample size for complex designs (e.g. 1.5 for a cluster design).
N: Population size (default Inf). A finite value applies a finite population correction, reducing the required sample size.
prop_method: Proportion CI method: "wald" (default), "wilson", or "logodds". "wilson" is recommended for rare proportions (below 0.1 or above 0.9). It is used only for rows with p.
rel_var: Unit relvariance. If omitted, derived automatically from p (as (1 - p) / p) or from var / mu^2.
resp_rate: Expected response rate, in (0, 1]. Default 1 (no adjustment). A value of 0.90 inflates the sample size by 1 / 0.90 to compensate for 10 percent non-response.
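For the default "wald" method, the per-row computation can be sketched with the textbook formulas (Cochran 1977): n0 = z^2 * p * (1 - p) / moe^2 for a proportion, n0 = z^2 * var / moe^2 for a mean, and n0 = rel_var / cv^2 for a cv target, followed by the deff and resp_rate inflations described above. This is a sketch under assumptions, not the code of n_prop() or n_mean(): the finite population correction form n0 / (1 + n0 / N) and the order in which FPC, deff, and resp_rate are applied are assumptions, and n_row_sketch() is a hypothetical name. With N = Inf it matches the Wald sizes shown in the Examples.

```r
# Sketch of per-row sizing (Wald / normal approximation).
# Assumptions: FPC of the form n0 / (1 + n0 / N); deff and resp_rate applied
# after the FPC. svyplan's internals may differ.
n_row_sketch <- function(p = NA, var = NA, mu = NA, moe = NA, cv = NA,
                         alpha = 0.05, deff = 1, N = Inf, resp_rate = 1) {
  z <- qnorm(1 - alpha / 2)
  if (!is.na(moe)) {
    # Absolute precision: moe = z * sqrt(S^2 / n)
    s2 <- if (!is.na(p)) p * (1 - p) else var
    n0 <- z^2 * s2 / moe^2
  } else {
    # Relative precision: cv^2 = rel_var / n, with rel_var = S^2 / theta^2
    rel_var <- if (!is.na(p)) (1 - p) / p else var / mu^2
    n0 <- rel_var / cv^2
  }
  n_fpc <- n0 / (1 + n0 / N)         # no effect when N = Inf
  ceiling(n_fpc * deff / resp_rate)  # design effect, then non-response
}

n_row_sketch(p = 0.30, moe = 0.05)               # 323 (stunting row)
n_row_sketch(var = 100, moe = 2)                 # 97 (mean_ind row)
n_row_sketch(p = 0.10, moe = 0.012, deff = 2.5)  # 6003 (anemia, RME example)
```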

Domain columns are specified via the domains parameter. When domains are present, sizing runs independently for each domain combination.

n_multi() computes sample size per indicator by delegating proportion rows to n_prop() and mean rows to n_mean(), then takes the maximum per domain. Use prop_method or a targets$prop_method column to choose "wald", "wilson", or "logodds" for proportion rows.
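The "maximum per domain" rule, together with the min_n floor, can be sketched in base R for Wald proportion rows. The helper wald_n() is hypothetical and stands in for n_prop(); splitting on all domain columns at once mirrors "one sizing per domain combination". The numbers agree with the domain example in the Examples section (North 385, South 545, anemia binding in both).

```r
# Hypothetical helper, not svyplan's n_prop(): Wald sample size for a proportion.
wald_n <- function(p, moe, alpha = 0.05) {
  ceiling(qnorm(1 - alpha / 2)^2 * p * (1 - p) / moe^2)
}

targets_dom <- data.frame(
  name   = rep(c("stunting", "anemia"), each = 2),
  p      = c(0.30, 0.25, 0.10, 0.15),
  moe    = c(0.05, 0.05, 0.03, 0.03),
  region = rep(c("North", "South"), 2)
)
targets_dom$.n <- wald_n(targets_dom$p, targets_dom$moe)

# One group per combination of the domain columns (here only "region").
domains <- "region"
by_dom  <- split(targets_dom, targets_dom[domains], drop = TRUE)

# Largest requirement and binding indicator within each domain.
n_dom   <- sapply(by_dom, function(d) max(d$.n))
binding <- sapply(by_dom, function(d) d$name[which.max(d$.n)])

# Optional floor mirroring min_n (the value 400 is chosen for illustration).
min_n <- 400
n_dom_floored <- pmax(n_dom, min_n)

n_dom          # North 385, South 545
binding        # "anemia" in both domains
n_dom_floored  # North 400, South 545
```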

References

Cochran, W. G. (1977). Sampling Techniques (3rd ed.). Wiley.

Valliant, R., Dever, J. A., and Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer.

Examples

# Simple mode: three indicators, take the max
targets <- data.frame(
  name = c("stunting", "vaccination", "anemia"),
  p    = c(0.30, 0.70, 0.10),
  moe  = c(0.05, 0.05, 0.03)
)
n_multi(targets)
#> Multi-indicator sample size
#> n = 385 (binding: anemia)
#> ---
#>  name        .n  .cv_target .cv_achieved .binding
#>  stunting    323 0.08503558 0.07793639           
#>  vaccination 323 0.03644382 0.03340131           
#>  anemia      385 0.15306404 0.15306404   *       

# MICS/DHS-style: specify precision as a relative margin of error (RME).
# RME = moe / p, so convert with moe = RME * p before calling n_multi().
rme <- 0.12
targets_rme <- data.frame(
  name = c("stunting", "vaccination", "anemia"),
  p    = c(0.30, 0.70, 0.10),
  deff = c(2.0, 1.5, 2.5)
)
targets_rme$moe <- rme * targets_rme$p
n_multi(targets_rme)
#> Multi-indicator sample size
#> n = 6003 (binding: anemia)
#> ---
#>  name        .n   .cv_target .cv_achieved .binding
#>  stunting    1245 0.06122561 0.02788337           
#>  vaccination  172 0.06122561 0.01034902           
#>  anemia      6003 0.06122561 0.06122561   *       
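Because these targets set moe = RME * p, the Wald formula simplifies to n = deff * z^2 * (1 - p) / (p * RME^2), so the required size grows like 1 / p and the rarest indicator tends to bind. A base-R check of that closed form, assuming the Wald method and no finite population correction:

```r
# Closed form for RME targets: substituting moe = rme * p into
# n = deff * z^2 * p * (1 - p) / moe^2 gives n = deff * z^2 * (1 - p) / (p * rme^2).
z    <- qnorm(0.975)
rme  <- 0.12
p    <- c(stunting = 0.30, vaccination = 0.70, anemia = 0.10)
deff <- c(2.0, 1.5, 2.5)
n_rme <- ceiling(deff * z^2 * (1 - p) / (p * rme^2))
n_rme
# stunting 1245, vaccination 172, anemia 6003
```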

# Rare proportion: use Wilson globally in simple mode
n_multi(targets[3, , drop = FALSE], prop_method = "wilson")
#> Multi-indicator sample size
#> n = 388 (binding: anemia)
#> ---
#>  name   .n  .cv_target .cv_achieved .binding
#>  anemia 388 0.153064   0.153064     *       

# Per-row proportion methods in a mixed target table
targets_mixed <- data.frame(
  name = c("rare_prop", "mean_ind"),
  p = c(0.05, NA),
  var = c(NA, 100),
  moe = c(0.02, 2),
  prop_method = c("wilson", NA)
)
n_multi(targets_mixed)
#> Multi-indicator sample size
#> n = 469 (binding: rare_prop)
#> ---
#>  name      .n  .cv_target .cv_achieved .binding
#>  rare_prop 469 0.2040854  0.2040854    *       
#>  mean_ind   97        NA         NA            

# Simple mode with domains
targets_dom <- data.frame(
  name   = rep(c("stunting", "anemia"), each = 2),
  p      = c(0.30, 0.25, 0.10, 0.15),
  moe    = c(0.05, 0.05, 0.03, 0.03),
  region = rep(c("North", "South"), 2)
)
n_multi(targets_dom, domains = "region")
#> Multi-indicator sample size (2 domains)
#> n = 545 (binding: anemia)
#> ---
#>  region .n  .binding
#>  North  385 anemia  
#>  South  545 anemia  

# Two-stage CV mode
targets_cl <- data.frame(
  name   = c("stunting", "anemia"),
  p      = c(0.30, 0.10),
  cv     = c(0.10, 0.15),
  delta_psu = c(0.02, 0.05)
)
n_multi_cluster(targets_cl, stage_cost = c(500, 50))
#> Multi-indicator optimal allocation (2-stage)
#> field design: n_psu = 52 | psu_size = 12 -> total n = 624
#> worst cv = 0.1495, cost = 57200 (binding: anemia)
#> continuous optimum: n_psu = 47.56811 | psu_size = 13.78404 (cv = 0.1500, cost = 56568)
#> ---
#>  name     .n       .cv_target .cv_achieved .binding
#>  stunting 292.9922 0.10       0.0668               
#>  anemia   655.6809 0.15       0.1500       *       
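The continuous optimum printed above is consistent with the textbook two-stage result (Cochran 1977; Valliant et al. 2018). Assuming stage_cost = c(500, 50) means c1 per PSU and c2 per sampled unit (the printed costs, e.g. 52 * 500 + 624 * 50 = 57200, fit that reading), total cost is c1 * n_psu + c2 * n_psu * m, the design effect is 1 + delta * (m - 1), and the optimal PSU size is m = sqrt((c1 / c2) * (1 - delta) / delta). The sketch recomputes it for the binding indicator (anemia); that n_multi_cluster() uses exactly this formula is an assumption, and the rounding to the field design is not reproduced.

```r
# Textbook two-stage optimum for one indicator (assumed, not taken from
# n_multi_cluster()). Cost model: C = c1 * n_psu + c2 * n_psu * m.
c1 <- 500; c2 <- 50            # per-PSU and per-unit costs
delta   <- 0.05                # intraclass correlation (delta_psu) for anemia
cv      <- 0.15                # cv target for anemia
rel_var <- (1 - 0.10) / 0.10   # unit relvariance for p = 0.10

m       <- sqrt((c1 / c2) * (1 - delta) / delta)   # optimal units per PSU
n_total <- rel_var * (1 + delta * (m - 1)) / cv^2  # total n meeting the cv target
n_psu   <- n_total / m
cost    <- c1 * n_psu + c2 * n_total
round(c(m = m, n_psu = n_psu, n_total = n_total, cost = cost), 2)
# m about 13.78, n_psu about 47.57, n_total about 655.68, cost about 56568
```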

# Two-stage with MOE (converted to CV internally)
targets_moe <- data.frame(
  name   = c("stunting", "anemia"),
  p      = c(0.30, 0.10),
  moe    = c(0.05, 0.03),
  delta_psu = c(0.02, 0.05)
)
n_multi_cluster(targets_moe, stage_cost = c(500, 50))
#> Multi-indicator optimal allocation (2-stage)
#> field design: n_psu = 50 | psu_size = 12 -> total n = 600
#> worst cv = 0.1525, cost = 55000 (binding: anemia)
#> continuous optimum: n_psu = 45.68273 | psu_size = 13.78404 (cv = 0.1531, cost = 54326)
#> ---
#>  name     .n       .cv_target .cv_achieved .binding
#>  stunting 405.1863 0.08503558 0.0682               
#>  anemia   629.6928 0.15306404 0.1531       *       

# Joint budget allocation across domains
targets_jnt <- data.frame(
  name   = rep(c("stunting", "anemia"), each = 2),
  p      = c(0.30, 0.25, 0.10, 0.15),
  cv     = c(0.10, 0.10, 0.15, 0.15),
  delta_psu = c(0.02, 0.03, 0.05, 0.04),
  region = rep(c("Urban", "Rural"), 2)
)
n_multi_cluster(
  targets_jnt,
  stage_cost = c(500, 50),
  domains = "region",
  budget = 100000,
  joint = TRUE
)
#> Multi-indicator optimal allocation (2-stage, 2 domains, joint)
#> ---
#> Total n = 1232 (unrounded: 1209)
#>  region n_psu psu_size .total_n .cv    .cost .binding
#>  Urban  52    14       728      0.1437 61621 anemia  
#>  Rural  28    18       504      0.0958 38379 stunting