Re-exported from svyplan. Compute the design effect (DEFF) or effective sample size from sampling weights. Five methods are available, covering two use cases: diagnostics after data collection and planning before it (see Details).
Usage
design_effect(x = NULL, ...)
effective_n(x = NULL, ...)
# S3 method for class 'tbl_sample'
design_effect(x, ..., y = NULL, x_cal = NULL, method = "kish")
# S3 method for class 'tbl_sample'
effective_n(x, ..., y = NULL, x_cal = NULL, method = "kish")
Arguments
- x
A numeric weight vector, a tbl_sample, or NULL (for the "cluster" planning method).
- ...
Passed to the svyplan method. For method = "cluster", pass delta (measure of homogeneity, scalar or svyplan_varcomp) and psu_size (mean cluster size). See svyplan::design_effect().
- y
<data-masking> Outcome variable (column name). Required for Henry, Spencer, and CR methods.
- x_cal
<data-masking> Calibration covariate (column name). Required for the Henry method.
- method
Design effect method. For diagnostic use (with weights): one of "kish" (default), "henry", "spencer", or "cr". For planning (no weights): "cluster".
Value
For "kish", "henry", "spencer", and "cluster": a
numeric scalar. For "cr": a list with $strata (data frame of
per-stratum DEFF values) and $overall (numeric scalar).
Details
After data collection (diagnostic): assess how much precision was lost due to the complex design.
- "kish" (default): weights only. Quick, outcome-independent summary.
- "henry": weights + outcome + calibration covariate. Accounts for calibration weighting.
- "spencer": weights + outcome + selection probabilities. Accounts for correlation between weights and the outcome.
- "cr": weights + outcome + strata/cluster IDs. Full Chen-Rust decomposition for multistage stratified designs.
Before data collection (planning): estimate an expected DEFF to inflate a simple-random-sample size calculation.
- "cluster": uses homogeneity (delta) and mean cluster size (psu_size) to compute DEFF = 1 + (psu_size - 1) * delta. Pass the result to svyplan::n_prop(), svyplan::n_mean(), or other sizing functions.
The tbl_sample methods extract what they can from the sample
metadata. The user only needs to supply column names for variables
that are not part of the sampling metadata. The following are taken from the sample automatically:
- Weights from .weight
- Selection probabilities from .weight_1 (for Spencer)
- Stratification and clustering variables from the stored design (for CR)
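For the weights-only case, the calculation can be sketched in base R from a plain weight vector. This assumes the "kish" method follows the standard Kish formula, DEFF = n * sum(w^2) / sum(w)^2, which agrees with the example output further down. The helper names `kish_deff()` and `kish_neff()` are hypothetical and are not part of svyplan or this package.

```r
# Hypothetical helpers illustrating the standard Kish formula;
# these are NOT functions exported by svyplan or this package.
kish_deff <- function(w) {
  stopifnot(is.numeric(w), length(w) > 0, all(w > 0))
  # n * sum(w^2) / (sum(w))^2, equivalently 1 + CV^2 of the weights
  length(w) * sum(w^2) / sum(w)^2
}

kish_neff <- function(w) {
  # Effective sample size: n / DEFF = (sum(w))^2 / sum(w^2)
  sum(w)^2 / sum(w^2)
}

# Weights implied by drawing 10 of 100 units in one stratum (weight 10)
# and 40 of 100 units in another (weight 2.5)
w <- c(rep(10, 10), rep(2.5, 40))

kish_deff(w)
#> [1] 1.5625
kish_neff(w)
#> [1] 32
```

Because the formula depends only on the spread of the weights, equal weights give a DEFF of exactly 1, and the value grows as the weights become more unequal.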
Examples
# Kish design effect (default)
set.seed(1)
frame <- data.frame(
  id = 1:200,
  stratum = rep(c("A", "B"), each = 100),
  income = c(rnorm(100, 50, 10), rnorm(100, 80, 15)),
  x_cal = runif(200, 0.5, 2)
)
samp <- sampling_design() |>
  stratify_by(stratum) |>
  draw(n = c(A = 10, B = 40)) |>
  execute(frame, seed = 1)
design_effect(samp)
#> [1] 1.5625
effective_n(samp)
#> [1] 32
# Henry (calibration covariate)
design_effect(samp, y = income, x_cal = x_cal, method = "henry")
#> [1] 1.463474
# Spencer (selection probabilities extracted automatically)
design_effect(samp, y = income, method = "spencer")
#> [1] 3.57891
# Chen-Rust (strata and clusters extracted from design)
design_effect(samp, y = income, method = "cr")
#> $strata
#>   stratum n_h cv2_w deff_w    deff_s
#> 1       A  10     0      1 0.4166121
#> 2       B  40     0      1 0.1602749
#>
#> $overall
#> [1] 0.576887
#>
# Cluster planning (no sample needed)
design_effect(delta = 0.05, psu_size = 25, method = "cluster")
#> [1] 2.2
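The planning value above can be checked by hand with the formula from Details, DEFF = 1 + (psu_size - 1) * delta. The sketch below uses base R only. `n_srs` is a hypothetical simple-random-sample size chosen for illustration, not a value produced by any function on this page.

```r
# Cluster-planning design effect: DEFF = 1 + (psu_size - 1) * delta
delta    <- 0.05  # measure of homogeneity
psu_size <- 25    # mean cluster size
deff <- 1 + (psu_size - 1) * delta
deff
#> [1] 2.2

# Hypothetical SRS sample size, for illustration only
n_srs <- 400

# Inflate by DEFF to get the size needed under the clustered design
n_clustered <- n_srs * deff
n_clustered
#> [1] 880
```

In practice the inflated size would be rounded up to a whole number of units, and the sizing functions mentioned in Details may accept the DEFF directly.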