Estimate between- and within-stage variance components using nested ANOVA decomposition. Supports SRS and PPS first-stage designs.
Usage
varcomp(x, ...)
# S3 method for class 'formula'
varcomp(x, ..., data = NULL, prob = NULL)
# Default S3 method
varcomp(x, ..., stage_id = NULL, prob = NULL)
# S3 method for class 'survey.design'
varcomp(x, ..., prob = NULL)
Arguments
- x
A formula, numeric vector, or survey design object (see Details).
- ...
Additional arguments passed to methods.
- data
A data frame (required for formula interface).
- prob
First-stage selection probabilities. A one-sided formula (e.g., ~pp) when using the formula interface, or a numeric vector. NULL (default) assumes SRS.
- stage_id
A list of stage-ID vectors (required for vector interface). Its length determines the number of stage boundaries (stages - 1).
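As a rough illustration of how the length of stage_id relates to the number of stages, the base R sketch below builds the list for a 2-stage and a 3-stage layout. The data frame and its column names are hypothetical, and the ordering of the list (PSU IDs first, then SSU IDs) is an assumption carried over from the "outermost first" rule stated for the formula interface.

```r
# Hypothetical sampling frame; column names are made up for illustration.
frame <- data.frame(
  district  = rep(1:4, each = 6),    # PSU identifier
  household = rep(1:12, each = 2),   # SSU identifier, nested in district
  income    = seq(100, 330, by = 10)
)

# 2-stage: one ID vector (PSU), i.e. one stage boundary
stage_id_2 <- list(frame$district)

# 3-stage: PSU IDs first, then SSU IDs (assumed ordering), i.e. two boundaries
stage_id_3 <- list(frame$district, frame$household)

# Number of stages implied by each list: length + 1
n_stages <- c(length(stage_id_2), length(stage_id_3)) + 1
n_stages
```

Either list could then be supplied as the stage_id argument of the vector interface together with frame$income.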
Value
A svyplan_varcomp object with components:
- varb
Between-PSU variance (scalar).
- varw
Within-PSU variance. Scalar for 2-stage, length-2 vector (varw_psu, varw_ssu) for 3-stage.
- delta
Measure of homogeneity. Length 1 for 2-stage, length 2 (delta_psu, delta_ssu) for 3-stage.
- k
Ratio parameter(s), same length as delta (k_psu, k_ssu for 3-stage).
- rel_var
Unit relvariance (scalar).
- stages
Number of stages (2 or 3).
Details
The interface is determined by the class of x:
- Formula: varcomp(income ~ district, data = frame). The LHS is the analysis variable; the RHS terms are stage IDs (outermost first).
- Numeric vector: varcomp(y, stage_id = list(cluster_ids)).
- survey.design: varcomp(design, ~y). Cluster structure is extracted from the design object. Requires the survey package.
When prob is NULL, an SRS first stage is assumed. When prob is provided,
PPS variance estimation is used.
The returned delta is the measure of homogeneity
\(\delta = V_b / (V_b + V_w)\), where \(V_b\) is varb and \(V_w\) is varw,
following Valliant, Dever, and Kreuter (2018, Ch. 9). Unlike the traditional
ANOVA intraclass correlation coefficient, which can be negative, delta is
constrained to \([0, 1]\). It should not be compared directly to mixed-model
ICCs (e.g. from lme4).
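The difference between the two quantities can be seen in a small base R sketch. This is a naive balanced-cluster decomposition written for illustration only; it is not the estimator varcomp() uses internally, and the simulated data and variable names are assumptions.

```r
# Illustrative only: a simple between/within split for balanced clusters,
# not svyplan's internal estimator.
set.seed(1)
m <- 10                                  # observations per cluster
y <- rnorm(200, mean = 50, sd = 10)      # no true clustering effect
cluster <- rep(1:20, each = m)

ybar <- mean(y)
cluster_means <- tapply(y, cluster, mean)
cluster_vars  <- tapply(y, cluster, var)

# Between- and within-cluster components, scaled by the squared mean
varb <- var(cluster_means) / ybar^2
varw <- mean(cluster_vars) / ybar^2

# Measure of homogeneity: a ratio of non-negative terms, so it lies in [0, 1]
delta <- varb / (varb + varw)

# Classical ANOVA ICC for comparison: bounded below by -1 / (m - 1)
msb <- m * var(cluster_means)            # between-cluster mean square
msw <- mean(cluster_vars)                # within-cluster mean square
icc_anova <- (msb - msw) / (msb + (m - 1) * msw)

c(delta = delta, icc_anova = icc_anova)
```

Because delta is built from two non-negative components, it cannot drop below zero, whereas the ANOVA estimator subtracts the within mean square and so can.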
Clusters containing a single observation have undefined within-cluster variance. In this case, the within-cluster variance is imputed as the mean variance of the remaining clusters.
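The imputation rule can be sketched in base R as follows. The toy data are made up, and the use of an unweighted mean over the non-singleton clusters is an assumption about what "mean variance" refers to; the package may weight clusters differently.

```r
# Toy data: cluster 3 contains a single observation.
y       <- c(10, 12, 11, 20, 22, 35, 30, 31, 33)
cluster <- c( 1,  1,  1,  2,  2,  3,  4,  4,  4)

# var() of a single value is NA, which flags the singleton clusters
within_var <- tapply(y, cluster, var)
singleton  <- is.na(within_var)

# Replace each undefined variance with the (unweighted, assumed) mean
# of the variances from the remaining clusters
within_var[singleton] <- mean(within_var[!singleton])
within_var
```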
Methods (by class)
- varcomp(formula): Method for formula interface.
- varcomp(default): Default method for numeric vectors.
- varcomp(survey.design): Method for survey design objects. Pass a one-sided formula (e.g., ~y) to specify the outcome variable. Cluster structure is extracted from the design.
References
Valliant, R., Dever, J. A., and Kreuter, F. (2018). Practical Tools for Designing and Weighting Survey Samples (2nd ed.). Springer. Ch. 9.
Hansen, M. H., Hurwitz, W. N., and Madow, W. G. (1953). Sample Survey Methods and Theory (Vol. I). Wiley.
See also
n_cluster(), which accepts a svyplan_varcomp object as its delta argument.
Examples
# 2-stage SRS using formula
set.seed(42)
frame <- data.frame(
  income = rnorm(200, 50000, 10000),
  district = rep(1:20, each = 10)
)
vc <- varcomp(income ~ district, data = frame)
vc
#> Variance components (2-stage)
#> varb = 0.0040, varw = 0.0383
#> delta = 0.0944
#> k = 1.0998
#> Unit relvariance = 0.0384
# Feed into n_cluster
n_cluster(cost = c(500, 50), delta = vc, budget = 100000)
#> Optimal 2-stage allocation
#> n_psu = 102 | psu_size = 10 -> total n = 1020 (unrounded: 989.5168)
#> cv = 0.0088, cost = 100000