
Estimate between- and within-stage variance components using a nested ANOVA decomposition. Supports simple random sampling (SRS) and probability-proportional-to-size (PPS) first-stage designs.

Usage

varcomp(x, ...)

# S3 method for class 'formula'
varcomp(x, ..., data = NULL, prob = NULL)

# Default S3 method
varcomp(x, ..., stage_id = NULL, prob = NULL)

# S3 method for class 'survey.design'
varcomp(x, ..., prob = NULL)

Arguments

x

A formula, numeric vector, or survey design object (see Details).

...

Additional arguments passed to methods.

data

A data frame (required for formula interface).

prob

First-stage selection probabilities. A one-sided formula (e.g., ~pp) when using the formula interface, or a numeric vector. NULL (default) assumes SRS.

stage_id

A list of stage-ID vectors (required for vector interface). Length determines the number of stage boundaries (stages - 1).

Value

A svyplan_varcomp object with components:

varb

Between-PSU variance (scalar).

varw

Within-PSU variance. Scalar for 2-stage, length-2 vector (varw_psu, varw_ssu) for 3-stage.

delta

Measure of homogeneity. Length 1 for 2-stage, length 2 (delta_psu, delta_ssu) for 3-stage.

k

Ratio parameter(s), same length as delta (k_psu, k_ssu for 3-stage).

rel_var

Unit relvariance (scalar).

stages

Number of stages (2 or 3).

Details

The interface is determined by the class of x:

  • Formula: varcomp(income ~ district, data = frame). The LHS is the analysis variable, RHS terms are stage IDs (outermost first).

  • Numeric vector: varcomp(y, stage_id = list(cluster_ids)).

  • survey.design: varcomp(design, ~y). Cluster structure is extracted from the design object. Requires the survey package.

When prob is NULL, an SRS first stage is assumed. When prob is provided, PPS variance estimation is used.

The returned delta is the measure of homogeneity \(\delta = V_b / (V_b + V_w)\), following Valliant, Dever, and Kreuter (2018, Ch. 9). Unlike the traditional ANOVA intraclass correlation coefficient, which can be negative, delta is constrained to \([0, 1]\). It should therefore not be compared directly with ANOVA-based ICCs or with mixed-model ICCs (e.g., from lme4).
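The contrast between a bounded ratio and the ANOVA ICC can be illustrated with base R alone. The sketch below is not how varcomp() computes its components; it assumes a balanced design, uses the textbook method-of-moments estimator, and truncates the between component at zero purely for illustration. All object names (msb, msw, icc_anova, delta_like) are hypothetical.

```r
# Simulated 2-stage frame: 20 clusters of 10 units each (illustrative data).
set.seed(42)
y  <- rnorm(200, mean = 50000, sd = 10000)
cl <- factor(rep(1:20, each = 10))
m  <- 10  # units per cluster (balanced design assumed)

# One-way ANOVA mean squares: row 1 = between clusters, row 2 = residual.
tab <- summary(aov(y ~ cl))[[1]]
msb <- tab[["Mean Sq"]][1]
msw <- tab[["Mean Sq"]][2]

# Method-of-moments between component; negative whenever MSB < MSW,
# in which case the ANOVA ICC is negative too.
sigma2_b  <- (msb - msw) / m
icc_anova <- sigma2_b / (sigma2_b + msw)

# A ratio of the form V_b / (V_b + V_w) with non-negative inputs lies in [0, 1].
# Truncation at zero is an assumption of this sketch, not the package's method.
v_b <- max(sigma2_b, 0)
v_w <- msw
delta_like <- v_b / (v_b + v_w)

c(icc_anova = icc_anova, delta_like = delta_like)
```

With balanced clusters the ANOVA ICC is bounded below by -1/(m - 1), whereas the truncated ratio never leaves \([0, 1]\).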

Clusters containing a single observation have undefined within-cluster variance. In this case, the within-cluster variance is imputed as the mean variance of the remaining clusters.
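The imputation rule described above can be sketched in base R as follows. This is an illustration of the idea on toy data, not the package's internal code; the vectors y and cl and the object v_within are hypothetical.

```r
# Toy data: clusters "a" and "b" have several units, "c" is a singleton.
y  <- c(10, 12, 14, 20, 22, 30)
cl <- c("a", "a", "a", "b", "b", "c")

# Per-cluster sample variance; var() of a single value is NA.
v_within  <- tapply(y, cl, var)
singleton <- is.na(v_within)

# Replace the undefined variances with the mean of the defined ones.
v_within[singleton] <- mean(v_within[!singleton])
v_within
#> a b c 
#> 4 2 3 
```

Here cluster "c" receives 3, the mean of the variances of "a" (4) and "b" (2).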

Methods (by class)

  • varcomp(formula): Method for formula interface.

  • varcomp(default): Default method for numeric vectors.

  • varcomp(survey.design): Method for survey design objects. Pass a one-sided formula (e.g., ~y) to specify the outcome variable. Cluster structure is extracted from the design.

References

Valliant, R., Dever, J. A., and Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer. Ch. 9.

Hansen, M. H., Hurwitz, W. N., and Madow, W. G. (1953). Sample Survey Methods and Theory (Vol. I). Wiley.

See also

n_cluster(), which accepts a svyplan_varcomp object as its delta argument.

Examples

# 2-stage SRS using formula
set.seed(42)
frame <- data.frame(
  income = rnorm(200, 50000, 10000),
  district = rep(1:20, each = 10)
)
vc <- varcomp(income ~ district, data = frame)
vc
#> Variance components (2-stage)
#> varb = 0.0040, varw = 0.0383
#> delta = 0.0944
#> k = 1.0998
#> Unit relvariance = 0.0384

# Feed into n_cluster
n_cluster(cost = c(500, 50), delta = vc, budget = 100000)
#> Optimal 2-stage allocation
#> n_psu = 102 | psu_size = 10 -> total n = 1020 (unrounded: 989.5168)
#> cv = 0.0088, cost = 100000
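As a quick sanity check, the printed delta can be approximately recovered from the printed components using the formula given in Details. The values below are copied from the rounded output above, so the match is only approximate; delta_check is a hypothetical name.

```r
# Components as displayed (rounded to four decimals) in the printed output.
varb <- 0.0040
varw <- 0.0383

# delta = V_b / (V_b + V_w), as stated in Details.
delta_check <- varb / (varb + varw)

# Agrees with the printed delta = 0.0944 up to rounding of the inputs.
abs(delta_check - 0.0944) < 1e-3
#> [1] TRUE
```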