
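Re-exported from svyplan. Compute the design effect (DEFF) or the effective sample size. Five methods cover two use cases: four diagnostic methods that work from sampling weights after data collection, and one planning method that needs no weights (see Details).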

Usage

design_effect(x = NULL, ...)

effective_n(x = NULL, ...)

# S3 method for class 'tbl_sample'
design_effect(x, ..., y = NULL, x_cal = NULL, method = "kish")

# S3 method for class 'tbl_sample'
effective_n(x, ..., y = NULL, x_cal = NULL, method = "kish")

Arguments

x

A numeric weight vector, a tbl_sample, or NULL (for the "cluster" planning method).

...

Passed to the svyplan method. For method = "cluster", pass delta (measure of homogeneity, scalar or svyplan_varcomp) and psu_size (mean cluster size). See svyplan::design_effect().

y

<data-masking> Outcome variable (column name). Required for Henry, Spencer, and CR methods.

x_cal

<data-masking> Calibration covariate (column name). Required for the Henry method.

method

Design effect method. For diagnostic use (with weights): one of "kish" (default), "henry", "spencer", or "cr". For planning (no weights): "cluster".

Value

For "kish", "henry", "spencer", and "cluster": a numeric scalar. For "cr": a list with $strata (data frame of per-stratum DEFF values) and $overall (numeric scalar).

Details

After data collection (diagnostic): assess how much precision was lost due to the complex design.

  • "kish" (default): weights only. Quick, outcome-independent summary.

  • "henry": weights + outcome + calibration covariate. Accounts for calibration weighting.

  • "spencer": weights + outcome + selection probabilities. Accounts for correlation between weights and the outcome.

  • "cr": weights + outcome + strata/cluster IDs. Full Chen-Rust decomposition for multistage stratified designs.
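As a rough cross-check of the "kish" method, the standard Kish formula can be written in a few lines of base R. This is a minimal sketch, not the svyplan implementation: kish_deff() and kish_neff() are hypothetical helper names, and the weights assume an equal-probability draw of 10 from 100 units in one stratum and 40 from 100 in another, as in the Examples.

```r
# Kish's approximate design effect from weights alone:
#   deff = n * sum(w^2) / sum(w)^2 = 1 + cv(w)^2  (cv computed with divisor n)
# Hypothetical helpers for illustration; not part of svyplan.
kish_deff <- function(w) {
  stopifnot(is.numeric(w), length(w) > 0, all(w > 0))
  length(w) * sum(w^2) / sum(w)^2
}

# Effective sample size: the nominal n divided by the Kish design effect
kish_neff <- function(w) {
  sum(w)^2 / sum(w^2)
}

# Assumed weights: 10 units with weight 100/10 and 40 units with weight 100/40
w <- c(rep(100 / 10, 10), rep(100 / 40, 40))

kish_deff(w)  # 1.5625
kish_neff(w)  # 32
```

Equal weights give a design effect of exactly 1, so any value above 1 reflects weight variation only; the outcome variable plays no role in this method.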

Before data collection (planning): estimate an expected DEFF to inflate a simple-random-sample size calculation.

  • "cluster": uses homogeneity (delta) and mean cluster size (psu_size) to compute DEFF = 1 + (psu_size - 1) * delta. Pass the result to svyplan::n_prop(), svyplan::n_mean(), or other sizing functions.
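The cluster formula above is simple enough to work through by hand. The sketch below, in base R, shows how an expected DEFF might inflate a simple-random-sample size. cluster_deff() is a hypothetical helper rather than a svyplan function, and the starting size of 400 is an assumed value chosen for illustration.

```r
# Planning design effect for cluster sampling:
#   DEFF = 1 + (psu_size - 1) * delta
# cluster_deff() is a hypothetical helper, not a svyplan function.
cluster_deff <- function(delta, psu_size) {
  stopifnot(psu_size >= 1)
  1 + (psu_size - 1) * delta
}

deff <- cluster_deff(delta = 0.05, psu_size = 25)
deff  # 2.2, up to floating-point rounding

# Inflate an assumed simple-random-sample size of 400 by the design effect.
# round() guards against floating-point noise before taking the ceiling.
n_srs <- 400
n_complex <- ceiling(round(n_srs * deff, 6))
n_complex  # 880

# Number of PSUs needed at 25 units per PSU
n_psu <- ceiling(n_complex / 25)
n_psu  # 36
```

With delta = 0, cluster members are no more alike than random units and the DEFF collapses to 1. Larger clusters or stronger homogeneity raise the required sample size proportionally.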

The tbl_sample methods extract what they can from the sample metadata, so you only need to supply column names for variables that are not stored there. The following are taken from the metadata automatically:

  • Weights from .weight

  • Selection probabilities from .weight_1 (for Spencer)

  • Stratification and clustering variables from the stored design (for CR)

Examples

# Build a frame of 200 units in two strata of 100 units each
set.seed(1)
frame <- data.frame(
  id = 1:200,
  stratum = rep(c("A", "B"), each = 100),
  income = c(rnorm(100, 50, 10), rnorm(100, 80, 15)),
  x_cal = runif(200, 0.5, 2)
)
# Draw a disproportionate stratified sample: 10 units from A, 40 from B
samp <- sampling_design() |>
  stratify_by(stratum) |>
  draw(n = c(A = 10, B = 40)) |>
  execute(frame, seed = 1)

# Kish design effect (default) and the matching effective sample size
design_effect(samp)
#> [1] 1.5625
effective_n(samp)
#> [1] 32

# Henry (calibration covariate)
design_effect(samp, y = income, x_cal = x_cal, method = "henry")
#> [1] 1.463474

# Spencer (selection probabilities extracted automatically)
design_effect(samp, y = income, method = "spencer")
#> [1] 3.57891

# Chen-Rust (strata and clusters extracted from design)
design_effect(samp, y = income, method = "cr")
#> $strata
#>   stratum n_h cv2_w deff_w    deff_s
#> 1       A  10     0      1 0.4166121
#> 2       B  40     0      1 0.1602749
#> 
#> $overall
#> [1] 0.576887
#> 

# Cluster planning (no sample needed)
design_effect(delta = 0.05, psu_size = 25, method = "cluster")
#> [1] 2.2