Design Effect and Effective Sample Size

Compute the design effect (DEFF) or effective sample size from sampling weights. Five methods are available, covering two use cases: diagnostics after data collection and planning before it. These are the svyplan generics re-exported by samplyr; samplyr adds tbl_sample methods rather than defining competing generics.

Usage

design_effect(x = NULL, ...)

effective_n(x = NULL, ...)

# S3 method for class 'tbl_sample'
design_effect(x, ..., y = NULL, x_cal = NULL, method = "kish")

# S3 method for class 'tbl_sample'
effective_n(x, ..., y = NULL, x_cal = NULL, method = "kish")

Arguments

x: A numeric weight vector, a tbl_sample, or NULL (for the "cluster" planning method).
...: Passed to the svyplan method. For method = "cluster", pass delta (measure of homogeneity, scalar or svyplan_varcomp) and psu_size (mean cluster size). See svyplan::design_effect().
y: <data-masking> Outcome variable (column name). Required for Henry, Spencer, and CR methods.
x_cal: <data-masking> Calibration covariate (column name). Required for the Henry method.
method: Design effect method. For diagnostic use (with weights): one of "kish" (default), "henry", "spencer", or "cr". For planning (no weights): "cluster".

Value

design_effect() returns a numeric svyplan_design_effect object. Use as.double() for the overall value and as.data.frame() for the Chen-Rust decomposition. effective_n() returns a numeric scalar.

Details

After data collection (diagnostic): assess how much precision was lost due to the complex design.

"kish" (default): weights only. Quick, outcome-independent summary.
"henry": weights + outcome + calibration covariate. Accounts for calibration weighting.
"spencer": weights + outcome + selection probabilities. Accounts for correlation between weights and the outcome.
"cr": weights + outcome + strata/cluster IDs. Full Chen-Rust decomposition for multistage stratified designs.
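
For orientation, the Kish measure depends only on the weights and is commonly written as n * sum(w^2) / sum(w)^2. The base-R sketch below does not call samplyr or svyplan; it applies that standard formula to the weights implied by a stratified draw of 10 out of 100 units in one stratum and 40 out of 100 in another. `kish_deff` is a hypothetical helper name, and the packaged method may differ in details such as input checking.

```r
# Hypothetical helper, not part of samplyr or svyplan:
# Kish's weights-only design effect, n * sum(w^2) / sum(w)^2.
kish_deff <- function(w) {
  stopifnot(is.numeric(w), length(w) > 0, all(w > 0))
  length(w) * sum(w^2) / sum(w)^2
}

# Weights implied by the assumed stratified draw: N_h / n_h per stratum,
# i.e. 10 units with weight 10 and 40 units with weight 2.5.
w <- c(rep(100 / 10, 10), rep(100 / 40, 40))

deff  <- kish_deff(w)        # Kish design effect
n_eff <- length(w) / deff    # effective sample size = n / DEFF
```

With these weights the sketch gives a design effect of 1.5625 and an effective sample size of 32, which matches the Kish output shown in the Examples.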

Before data collection (planning): estimate an expected DEFF to inflate a simple-random-sample size calculation.

"cluster": uses homogeneity (delta) and mean cluster size (psu_size) to compute DEFF = 1 + (psu_size - 1) * delta. Pass the result to svyplan::n_prop(), svyplan::n_mean(), or other sizing functions.
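
The arithmetic behind the planning case can be sketched in base R. This illustration does not use the svyplan sizing functions: `cluster_deff` is a hypothetical helper, and the simple-random-sample size comes from the textbook formula for a proportion with assumed inputs (p = 0.5, margin of error 0.05, 95% confidence) chosen only for the example.

```r
# Hypothetical helper, not part of samplyr or svyplan:
# DEFF = 1 + (psu_size - 1) * delta
cluster_deff <- function(delta, psu_size) {
  stopifnot(is.numeric(delta), is.numeric(psu_size), psu_size >= 1)
  1 + (psu_size - 1) * delta
}

deff <- cluster_deff(delta = 0.05, psu_size = 25)

# Textbook SRS sample size for a proportion: z^2 * p * (1 - p) / e^2.
# The values of p and moe are assumptions for illustration.
p     <- 0.5
moe   <- 0.05
z     <- qnorm(0.975)
n_srs <- z^2 * p * (1 - p) / moe^2

# Inflate the SRS size by the expected design effect and round up.
n_cluster <- ceiling(n_srs * deff)
```

Here the design effect is 2.2, so an SRS requirement of roughly 384 respondents grows to 846 under the clustered design.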

The tbl_sample methods extract what they can from the sample metadata, so you only need to supply column names for variables that are not part of it. The following are taken from the metadata:

Weights from .weight
Selection probabilities from 1 / .weight (for Spencer). Spencer (2000) derives his DEFF at the unit level, so the required probability is the overall inclusion probability \(\pi_i = 1/w_i\), i.e. the product of all per-stage inclusion probabilities for multi-stage designs. This matches the convention used by PracTools::deffS().
Stratification and clustering variables from the stored design (for CR)
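
As a sketch of the Spencer convention described above, the base-R snippet below builds the overall inclusion probability for a hypothetical two-stage design and shows that it is the inverse of the final weight. The per-stage probabilities (`pi_stage1`, `pi_stage2`) are made-up numbers for illustration, not samplyr output.

```r
# Made-up per-stage inclusion probabilities for four sampled units.
pi_stage1 <- c(0.10, 0.10, 0.25, 0.25)  # probability the unit's PSU is drawn
pi_stage2 <- c(0.50, 0.50, 0.40, 0.40)  # probability the unit is drawn within its PSU

# Overall inclusion probability: product of the per-stage probabilities.
pi_overall <- pi_stage1 * pi_stage2

# The final design weight is the inverse of the overall probability ...
w <- 1 / pi_overall

# ... so the unit-level probability can be recovered as 1 / weight.
pi_from_w <- 1 / w
```

Because the weight already carries every stage, recovering the probability as 1 / weight gives the unit-level quantity directly, without needing the stage-specific probabilities.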

Examples

# Kish design effect (default)
set.seed(1207)
frame <- data.frame(
  id = 1:200,
  stratum = rep(c("A", "B"), each = 100),
  income = c(rnorm(100, 50, 10), rnorm(100, 80, 15)),
  x_cal = runif(200, 0.5, 2)
)
samp <- sampling_design() |>
  stratify_by(stratum) |>
  draw(n = c(A = 10, B = 40)) |>
  execute(frame, seed = 1213)

design_effect(samp)
#> Design effect (Kish)
#> overall = 1.5625
effective_n(samp)
#> [1] 32

# Henry (calibration covariate)
design_effect(samp, y = income, x_cal = x_cal, method = "henry")
#> Design effect (Henry)
#> overall = 1.3808

# Spencer (selection probabilities extracted automatically)
design_effect(samp, y = income, method = "spencer")
#> Design effect (Spencer)
#> overall = 2.3287

# Chen-Rust (strata and clusters extracted from design)
design_effect(samp, y = income, method = "cr")
#> Design effect (Chen-Rust)
#> overall = 0.3181

# Cluster planning (no sample needed)
design_effect(delta = 0.05, psu_size = 25, method = "cluster")
#> Design effect (Cluster)
#> overall = 2.2000
