Sample Size for a Proportion

Compute the required sample size for estimating a population proportion with a specified margin of error or coefficient of variation.

Usage

n_prop(p, ...)

# Default S3 method
n_prop(
  p,
  ...,
  moe = NULL,
  cv = NULL,
  alpha = 0.05,
  N = Inf,
  deff = 1,
  resp_rate = 1,
  method = c("wald", "wilson", "logodds"),
  plan = NULL
)

# S3 method for class 'svyplan_prec'
n_prop(p, ..., moe = NULL, cv = NULL)

Arguments

p: For the default method: expected proportion, in (0, 1). For svyplan_prec objects: a precision result from prec_prop().
...: Additional arguments passed to methods. Unused arguments are rejected.
moe: Desired margin of error, the half-width of the confidence interval on the proportion scale. For example, with the default alpha = 0.05, moe = 0.05 means the 95 percent CI should extend no more than 5 percentage points on either side of the estimate. Specify exactly one of moe or cv.
cv: Target coefficient of variation (relative standard error). For example, cv = 0.10 means the standard error should be at most 10 percent of the estimate. Use cv when you want precision to scale with the estimate (common in economic surveys). Use moe when you want a fixed absolute precision (common in health/DHS surveys). Specify exactly one of moe or cv.
alpha: Significance level; the confidence level is 1 - alpha. Default 0.05 (95 percent confidence).
N: Population size. Inf (default) means no finite population correction.
deff: Design effect multiplier (> 0). Accounts for the loss of precision from a complex design (clustering, unequal weights) compared to simple random sampling. A DEFF of 1.5 means 50 percent more interviews are needed for the same precision. Estimate from a previous survey, use design_effect() to compute it, or apply a rule of thumb (1.5–2.0 for typical cluster designs). Values < 1 are valid for efficient designs (e.g., stratified sampling with Neyman allocation).
resp_rate: Expected response rate, in (0, 1]. Default 1 (no adjustment). The required sample size is inflated by 1 / resp_rate. Estimate from response rates observed in similar surveys in the same population.
method: Confidence interval method: one of "wald" (default), "wilson", or "logodds". See Details.
plan: Optional svyplan() object providing design defaults.

Value

A svyplan_n object.

Details

Three confidence interval methods are available:

Wald ("wald"): Standard normal approximation (Cochran, 1977, Ch. 3). Supports both moe and cv modes, with optional finite population correction.
Wilson ("wilson"): Wilson (1927) score interval. Supports moe mode only, without FPC.
Log-odds ("logodds"): Log-odds (logit) transform interval. Supports moe mode only, with optional FPC.
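
The Wald starting points for the two precision modes can be reproduced in base R. This is a minimal sketch using the usual textbook formulas and ignoring FPC, design effect, and non-response. wald_n0() is a hypothetical helper written for illustration, not a function of the package, and the package internals may differ.

```r
# Hypothetical helper (not part of the package): unrounded SRS sample size
# under the Wald approximation, for exactly one of moe or cv.
wald_n0 <- function(p, moe = NULL, cv = NULL, alpha = 0.05) {
  stopifnot(p > 0, p < 1, xor(is.null(moe), is.null(cv)))
  if (!is.null(moe)) {
    # moe = z * sqrt(p * (1 - p) / n), solved for n
    z <- qnorm(1 - alpha / 2)
    z^2 * p * (1 - p) / moe^2
  } else {
    # cv = sqrt(p * (1 - p) / n) / p, solved for n (alpha plays no role)
    (1 - p) / (p * cv^2)
  }
}

n_moe <- ceiling(wald_n0(p = 0.3, moe = 0.05))  # 323
n_cv  <- ceiling(wald_n0(p = 0.3, cv = 0.10))   # 234
```

The moe result agrees with the first entry under Examples. In cv mode the target is a relative standard error rather than an interval half-width, so the quantile does not enter the formula.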

For proportions near 0 or 1 (below 0.1 or above 0.9), the Wald interval has poor coverage. The recommended choice in those cases is method = "wilson".
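
To see how the Wilson requirement differs from Wald for a small proportion, the sketch below searches for the smallest n whose Wilson score half-width, evaluated at the expected p, does not exceed moe. The helpers wilson_halfwidth() and wilson_n() are hypothetical, and solving by direct search is an assumption made for illustration; the package may use a closed form or a different convention.

```r
# Hypothetical helpers (not part of the package).
# Half-width of the Wilson score interval for proportion p and sample size n.
wilson_halfwidth <- function(n, p, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
}

# Smallest integer n with Wilson half-width <= moe (half-width decreases in n).
wilson_n <- function(p, moe, alpha = 0.05) {
  n <- 1
  while (wilson_halfwidth(n, p, alpha) > moe) n <- n + 1
  n
}

n_wilson <- wilson_n(p = 0.1, moe = 0.03)                       # 388
n_wald   <- ceiling(qnorm(0.975)^2 * 0.1 * 0.9 / 0.03^2)        # 385
```

Under these assumptions the search agrees with the Wilson entry under Examples, and the Wilson requirement is slightly larger than the Wald one for p = 0.1.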

For the Wilson and log-odds methods, the design effect is applied as a multiplicative factor to the final SRS sample size, which is an approximation.
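
The inflation for design effect and non-response can be sketched in base R for the Wald case. inflate_n() is a hypothetical helper, and the rounding order (round the net and gross sizes up separately from the unrounded value) is an assumption chosen because it reproduces the figures printed under Examples.

```r
# Hypothetical helper (not part of the package): apply deff, then divide by
# the expected response rate. Both results are rounded up separately.
inflate_n <- function(n0, deff = 1, resp_rate = 1) {
  stopifnot(deff > 0, resp_rate > 0, resp_rate <= 1)
  net <- n0 * deff                       # completed interviews needed
  c(n = ceiling(net / resp_rate),        # units to sample
    net = ceiling(net))
}

z  <- qnorm(0.975)
n0 <- z^2 * 0.3 * 0.7 / 0.05^2           # Wald SRS size, about 322.7
res <- inflate_n(n0, deff = 1.5, resp_rate = 0.8)
res
# n = 606, net = 485
```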

Finite population correction

Setting N to a finite value reduces the required sample size when the sampling fraction (n/N) is non-negligible. As a rule of thumb, the FPC has little effect when n/N is below 5 percent. The Wald FPC uses the Cochran (1977, Ch. 3) form, which includes an N/(N-1) factor to account for the finite-population variance of a binary (Bernoulli) variable. This differs from n_mean(), where no N/(N-1) adjustment is needed because the variance is already defined with an N-1 divisor.
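
The effect of the correction can be sketched in base R. The adjustment below follows from setting the variance to p(1-p)/n * (N-n)/(N-1) and solving for n, which gives n = n0 / (1 + (n0 - 1) / N). fpc_adjust() is a hypothetical helper, and treating this as the exact algebra used by the package is an assumption.

```r
# Hypothetical helper (not part of the package): Cochran-style finite
# population correction of an unrounded SRS sample size n0.
fpc_adjust <- function(n0, N) {
  if (is.infinite(N)) return(n0)
  n0 / (1 + (n0 - 1) / N)
}

z <- qnorm(0.975)

# moe mode: p = 0.5, moe = 0.05, N = 5000
n0_moe <- z^2 * 0.5 * 0.5 / 0.05^2                  # about 384.1
n_moe  <- ceiling(fpc_adjust(n0_moe, N = 5000))     # 357

# cv mode: p = 0.5, cv = 0.10, N = 10000
n0_cv <- (1 - 0.5) / (0.5 * 0.10^2)                 # about 100
n_cv  <- ceiling(fpc_adjust(n0_cv, N = 10000))      # 100
```

In the second case the sampling fraction is about 1 percent, so the correction lowers the unrounded size only from 100 to about 99.02 and the rounded result is unchanged.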

All methods use the normal (z) quantile. This is standard for survey sampling where the sample size is large enough for the CLT to apply.
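
As an illustration of how alpha enters through the normal quantile, the sketch below evaluates the two-sided z value and the resulting Wald sample size for p = 0.3 and moe = 0.05. wald_n_by_alpha() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of the package): Wald sample size as a
# function of alpha, using the two-sided normal quantile.
wald_n_by_alpha <- function(alpha, p = 0.3, moe = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling(z^2 * p * (1 - p) / moe^2)
}

alphas <- c(0.10, 0.05, 0.01)
n_by_alpha <- vapply(alphas, wald_n_by_alpha, numeric(1))
n_by_alpha
# 228 323 558
```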

When called on a svyplan_prec object, parameters are extracted from the stored result. Any argument of the default method (e.g. method, deff, N) can be overridden through ...; unknown argument names are an error. Passing a different method evaluates the stored precision target under that method's formula, so the round trip will not be exact: the stored precision was computed under the original method.

References

Cochran, W. G. (1977). Sampling Techniques (3rd ed.). Wiley.

Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209–212.

Examples

# Wald, absolute margin of error
n_prop(p = 0.3, moe = 0.05)
#> Sample size for proportion (wald)
#> n = 323 (p = 0.30, moe = 0.050)

# Wald, target CV with finite population
n_prop(p = 0.5, cv = 0.10, N = 10000)
#> Sample size for proportion (wald)
#> n = 100 (p = 0.50, cv = 0.100)

# Wilson score interval
n_prop(p = 0.1, moe = 0.03, method = "wilson")
#> Sample size for proportion (wilson)
#> n = 388 (p = 0.10, moe = 0.030)

# With design effect and response rate
n_prop(p = 0.3, moe = 0.05, deff = 1.5, resp_rate = 0.8)
#> Sample size for proportion (wald)
#> n = 606 (net: 485) (p = 0.30, moe = 0.050, deff = 1.50, resp_rate = 0.80)

# MICS/DHS-style relative margin of error (RME)
# RME = moe / p, so moe = RME * p
p <- 0.2
n_prop(p = p, moe = 0.12 * p, deff = 1.5, resp_rate = 0.9)
#> Sample size for proportion (wald)
#> n = 1779 (net: 1601) (p = 0.20, moe = 0.024, deff = 1.50, resp_rate = 0.90)