
Draws a sample with unequal inclusion probabilities, without replacement.

Usage

unequal_prob_wor(
  pik,
  method = c("cps", "brewer", "systematic", "poisson", "sps", "pareto"),
  nrep = 1L,
  prn = NULL,
  ...
)

Arguments

pik

A numeric vector of inclusion probabilities. For fixed-size methods, sum(pik) must be close to an integer.

method

The sampling method:

"cps"

Conditional Poisson Sampling (maximum entropy; Chen et al., 1994). Fixed size, exact joint probabilities with all \(\pi_{ij} > 0\). O(N^2).

"brewer"

Brewer's (1975) draw-by-draw method. Fixed size, approximate joint probabilities (high-entropy approximation; see joint_inclusion_prob()). O(Nn).

"systematic"

Systematic PPS. Fixed size, exact joint probabilities, but some may be zero (pairs that never co-occur), which makes the Sen-Yates-Grundy (SYG) variance estimator inapplicable; see sampling_cov(). O(N).

"poisson"

Poisson sampling. Random sample size (expected \(n = \sum \pi_k\)). Units selected independently, so \(\pi_{ij} = \pi_i \pi_j\). Supports PRN. O(N).

"sps"

Sequential Poisson Sampling (Ohlsson, 1998). Order sampling with key \(\xi_k = u_k / \pi_k\); the \(n\) units with the smallest keys are selected. Fixed size, high entropy, approximate joint probabilities. Supports PRN. The realized first-order inclusion probabilities only approximate the supplied pik; see inclusion_prob(). O(N log N).

"pareto"

Pareto sampling (Rosén, 1997). Order sampling with odds-ratio key \(\xi_k = [u_k/(1-u_k)] / [\pi_k/(1-\pi_k)]\); the \(n\) units with the smallest keys are selected. Same properties as "sps". O(N log N).
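The two order-sampling keys described for "sps" and "pareto" can be sketched in base R. This is an illustration only, not the package's implementation: the helper name order_sample_sketch is hypothetical, it assumes every pik lies strictly between 0 and 1, and sorting the returned indices is a readability choice made here.

```r
# Hypothetical helper, not part of the package: illustrates order sampling.
# Assumes 0 < pik < 1 and 0 < u < 1 for every unit.
order_sample_sketch <- function(pik, u, key = c("sps", "pareto")) {
  key <- match.arg(key)
  n <- round(sum(pik))  # fixed sample size implied by pik
  xi <- switch(key,
    sps    = u / pik,                             # sequential Poisson key
    pareto = (u / (1 - u)) / (pik / (1 - pik))    # odds-ratio key
  )
  # Keep the n units with the smallest keys; sort indices for readability
  sort(order(xi)[seq_len(n)])
}

pik <- c(0.2, 0.4, 0.6, 0.8)
u   <- c(0.9, 0.1, 0.5, 0.3)   # stand-in for random or permanent random numbers

order_sample_sketch(pik, u, "sps")
#> [1] 2 4
order_sample_sketch(pik, u, "pareto")
#> [1] 2 4
```

Both keys rank units by how small the random number is relative to the inclusion probability, which is why reusing the same u across surveys tends to select overlapping units.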

nrep

Number of replicate samples (default 1). When nrep > 1, $sample holds a matrix (fixed-size) or list (random-size) of all replicates. The design object and all generics remain usable.

prn

Optional vector of permanent random numbers (length N, values in the open interval (0, 1)) for sample coordination. Supported by methods "sps", "pareto", and "poisson". When NULL, random numbers are generated internally. Cannot be combined with nrep > 1, because an identical PRN vector would produce identical replicates; for coordinated repeated sampling, loop over different PRN vectors instead.

...

Additional arguments passed to methods (e.g., eps for boundary tolerance).
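To see why permanent random numbers coordinate samples, here is a minimal base-R sketch using Poisson sampling. The helper poisson_sketch is hypothetical, and the selection rule u < pik is the textbook rule assumed here; the package may handle boundaries differently.

```r
# Hypothetical helper, not part of the package: Poisson sampling driven by
# supplied random numbers. Unit k is selected when u[k] < pik[k].
poisson_sketch <- function(pik, u) which(u < pik)

set.seed(1)
N   <- 10
prn <- runif(N)          # permanent random numbers, one per unit

pik_1 <- rep(0.3, N)     # inclusion probabilities on occasion 1
pik_2 <- rep(0.4, N)     # larger probabilities on occasion 2

s1 <- poisson_sketch(pik_1, prn)
s2 <- poisson_sketch(pik_2, prn)

# With the same PRN and larger pik, every unit from occasion 1 is retained
all(s1 %in% s2)
#> [1] TRUE
```

Drawing fresh random numbers on each occasion would instead give samples that overlap only by chance.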

Value

An object of class c("unequal_prob", "wor", "sondage_sample"). When nrep = 1, $sample is an integer vector of selected unit indices. When nrep > 1, $sample is a matrix (n x nrep) for fixed-size methods, or a list of integer vectors of varying lengths for random-size methods ("poisson").
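As a sketch of how an n x nrep matrix of indices might be summarized, the base-R snippet below tabulates how often each unit appears across replicates. The matrix draws is a stand-in built with sample(); it is not output from unequal_prob_wor() and does not reproduce any of its designs.

```r
# Stand-in for a fixed-size $sample matrix (n rows, nrep columns).
# Built with base R's sample() purely for illustration.
set.seed(42)
N <- 4; n <- 2; nrep <- 1000
draws <- replicate(nrep, sort(sample(N, n)))
dim(draws)
#> [1]    2 1000

# Share of replicates in which each unit appears
freq <- tabulate(draws, nbins = N) / nrep
freq
```

For a random-size design, where the replicates come as a list of integer vectors, the same tabulation could be applied to unlist() of that list.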

References

Chen, X. H., Dempster, A. P., & Liu, J. S. (1994). Weighted finite population sampling to maximize entropy. Biometrika, 81(3), 457-469.

Brewer, K.R.W. (1975). A simple procedure for sampling pi-ps wor. Australian Journal of Statistics, 17(3), 166-172.

Ohlsson, E. (1998). Sequential Poisson sampling. Journal of Official Statistics, 14(2), 149-162.

Rosén, B. (1997). On sampling with probability proportional to size. Journal of Statistical Planning and Inference, 62(2), 159-191.

Tillé, Y. (2006). Sampling Algorithms. Springer.

See also

unequal_prob_wr() for with-replacement designs, equal_prob_wor() for equal probability designs, inclusion_prob() to compute inclusion probabilities from size measures.

Examples

pik <- c(0.2, 0.4, 0.6, 0.8)

# Conditional Poisson Sampling
set.seed(123)
s <- unequal_prob_wor(pik, method = "cps")
s$sample
#> [1] 3 4

# Brewer's method
s <- unequal_prob_wor(pik, method = "brewer")
s$sample
#> [1] 3 4

# Sequential Poisson Sampling with PRN coordination
prn <- runif(4)
s <- unequal_prob_wor(pik, method = "sps", prn = prn)
s$sample
#> [1] 2 3

# Pareto sampling
s <- unequal_prob_wor(pik, method = "pareto", prn = prn)
s$sample
#> [1] 2 3

# \donttest{
# Batch mode for simulations
sim <- unequal_prob_wor(pik, method = "cps", nrep = 1000)
dim(sim$sample)  # 2 x 1000
#> [1]    2 1000
# }