Selects a sample using Chromy's (1979) sequential method with probability proportional to size. This is the method requested by METHOD=PPS_SEQ in SAS PROC SURVEYSELECT.

up_chromy(x, n)

Arguments

x

A numeric vector of size measures (e.g., population, revenue, area). Values must be non-negative and have a positive sum.

n

The sample size (number of selections).

Value

An integer vector of selected indices (1 to length(x)). May contain repeated values when expected hits exceed 1 (minimum replacement).

Details

Chromy's method is a strictly sequential algorithm that processes units in order and makes an immediate selection decision for each. It achieves spatial balancing similar to systematic sampling.
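The single-pass idea can be sketched in base R. This is an illustration only, not necessarily how up_chromy() is implemented: chromy_sketch() is a hypothetical helper, it assumes the usual formulation in which the running total of hits through unit k is kept at the integer part of the cumulative expected hits or one above it, and it omits any randomization of the starting unit.

```r
# Hypothetical helper, not part of the package: a Chromy-style sequential
# pass over the frame in its given order, with no random starting unit.
chromy_sketch <- function(x, n, tol = 1e-9) {
  stopifnot(is.numeric(x), all(x >= 0), sum(x) > 0, n >= 1)
  N <- length(x)
  cum <- cumsum(n * x / sum(x))   # cumulative expected hits through unit k
  cum[N] <- n                     # remove rounding error in the final total
  int_part <- floor(cum + tol)    # integer part I_k
  frac <- cum - int_part          # fractional part F_k
  frac[frac < tol] <- 0           # treat near-integers as exact integers

  hits <- numeric(N)
  total_prev <- 0                 # hits allocated through unit k - 1
  int_prev <- 0
  frac_prev <- 0
  for (k in seq_len(N)) {
    over <- total_prev > int_prev # TRUE if the running total is I_{k-1} + 1
    # Probability that the running total through unit k equals I_k + 1
    if (frac[k] == 0) {
      p_up <- 0
    } else if (!over) {
      p_up <- if (frac[k] > frac_prev) {
        (frac[k] - frac_prev) / (1 - frac_prev)
      } else {
        0
      }
    } else {
      p_up <- if (frac[k] > frac_prev) 1 else frac[k] / frac_prev
    }
    total <- int_part[k] + as.numeric(runif(1) < p_up)
    hits[k] <- total - total_prev # immediate decision for unit k
    total_prev <- total
    int_prev <- int_part[k]
    frac_prev <- frac[k]
  }
  rep(seq_len(N), times = hits)   # selected indices, repeated by hit count
}

set.seed(1)
chromy_sketch(c(40, 80, 50, 60, 70), n = 10)
```

Each unit is visited once and its hit count is fixed before the next unit is examined, which is what makes the pass O(N).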

Properties:

  • Fixed sample size n

  • Exact expected hits: \(E[hits_k] = n \times x_k / \sum x\)

  • Spatial balance (sample spread throughout frame)

  • O(N) time complexity (single pass)

  • All joint inclusion probabilities > 0

The method uses "minimum replacement": if the expected number of hits for unit k is 2.3, the unit is selected exactly 2 or 3 times (never 0, 1, or 4+). This differs from multinomial sampling, where any count from 0 to n is possible.
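The floor/ceiling property can be checked on any vector of selected indices. The helper below, is_min_replacement(), is a hypothetical name introduced for illustration and uses only base R; the sample passed to it is the n = 10 draw shown in the Examples.

```r
# Hypothetical helper: TRUE if every unit's hit count is the floor or the
# ceiling of its expected hits n * x / sum(x).
is_min_replacement <- function(sample_idx, x, tol = 1e-9) {
  n <- length(sample_idx)
  expected <- n * x / sum(x)
  counts <- tabulate(sample_idx, nbins = length(x))
  # tol guards against expected values that are integers up to rounding
  all(counts >= floor(expected + tol) & counts <= ceiling(expected - tol))
}

x <- c(40, 80, 50, 60, 70)

# Hit counts 1, 3, 2, 2, 2 against expected 1.33, 2.67, 1.67, 2.00, 2.33
is_min_replacement(c(1, 2, 2, 2, 3, 3, 4, 4, 5, 5), x)

# A sample that a multinomial draw could produce but minimum replacement cannot
is_min_replacement(rep(1, 10), x)
```

The first call returns TRUE and the second FALSE, since ten hits on unit 1 is far above the ceiling of 1.33.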

When no expected hit count exceeds 1 (i.e., n * max(x) / sum(x) <= 1), each unit is selected at most once and the method behaves as without-replacement sampling.
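The condition is easy to test before drawing a sample. The function name is_wor_case() below is hypothetical and only wraps the inequality stated above.

```r
# Hypothetical helper: TRUE when the largest expected hit count is at most 1,
# so no unit can be selected more than once.
is_wor_case <- function(x, n) {
  n * max(x) / sum(x) <= 1
}

x <- c(40, 80, 50, 60, 70)
is_wor_case(x, 3)   # largest expected hits: 3 * 80 / 300 = 0.8
is_wor_case(x, 10)  # largest expected hits: 10 * 80 / 300, about 2.67
```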

References

Chromy, J.R. (1979). Sequential sample selection methods. Proceedings of the Survey Research Methods Section, ASA, 401-406.

Chauvet, G. (2019). Properties of Chromy's sampling procedure. arXiv:1912.10896.

See also

up_multinomial() for PPS with replacement (any hit count), up_systematic() for systematic PPS, up_brewer() for Brewer's method (WOR only), inclusion_prob() for computing inclusion probabilities

Examples

# Size measures
x <- c(40, 80, 50, 60, 70)

# WOR case: n * max(x) / sum(x) = 3 * 80 / 300 = 0.8 <= 1
set.seed(42)
up_chromy(x, n = 3)  # No repeats
#> [1] 1 2 4

# Minimum replacement: larger n causes repeats
up_chromy(x, n = 10)  # Some units appear multiple times
#>  [1] 1 2 2 2 3 3 4 4 5 5

# Verify expected hits
n <- 10
expected <- n * x / sum(x)
print(expected)  # 1.33, 2.67, 1.67, 2.00, 2.33
#> [1] 1.333333 2.666667 1.666667 2.000000 2.333333

# Tabulate the hits from a single draw
hits <- table(factor(up_chromy(x, n), levels = 1:5))
print(hits)  # Each count is the floor or ceiling of its expected value
#> 
#> 1 2 3 4 5 
#> 1 3 1 2 3