Hajek Approximation for Joint Inclusion Probabilities

Computes the joint inclusion probability matrix using the Hajek (1964) approximation, which is based on conditional Poisson (rejective) sampling theory:

$$\pi_{ij} \approx \pi_i \pi_j \left[1 - \frac{(1-\pi_i)(1-\pi_j)}{D}\right], \qquad i \ne j,$$

where $D = \sum_k \pi_k (1 - \pi_k)$.
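As a rough illustration, the formula can be evaluated directly in base R with outer(). The helper name hajek_sketch is hypothetical, and the sketch omits both the boundary handling controlled by eps and the clamping listed under Properties, so it should not be read as the package's implementation.

```r
# Hypothetical helper: direct evaluation of the Hajek formula.
# Assumes 0 < pik < 1 for every unit (no eps handling, no clamping).
hajek_sketch <- function(pik) {
  D <- sum(pik * (1 - pik))                   # D = sum_k pi_k (1 - pi_k)
  q <- 1 - pik
  pikl <- outer(pik, pik) * (1 - outer(q, q) / D)
  diag(pikl) <- pik                           # diagonal is set directly to pi_i
  pikl
}

pik <- (2:9) / 11   # illustrative probabilities summing to n = 4
round(hajek_sketch(pik)[1, 2], 4)
#> [1] 0.0317
```

Here $D = 200/121$, so the first off-diagonal entry is $(6/121)(1 - 72/200) \approx 0.0317$.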

Usage

hajek_jip(pik, sample_idx = NULL, eps = 1e-06, ...)

Arguments

pik: Numeric vector of inclusion probabilities ($0 \le \pi_k \le 1$).
sample_idx: Unique integer vector of 1-based indices for the sampled units, or NULL (default) for the full population matrix. When non-NULL, returns the submatrix for those units only, without allocating the full N x N matrix.
eps: Boundary tolerance (default 1e-6). Units with $\pi_k \ge 1 - \varepsilon$ are treated as certainty selections ($\pi_k = 1$); units with $\pi_k \le \varepsilon$ are treated as never selected ($\pi_k = 0$).
...: Additional arguments (ignored). Present so that the function matches the joint_fn signature required by register_method().
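One way the sample_idx shortcut could work is sketched below: $D$ is still summed over the whole population, but the outer products are formed only for the sampled units, so only a length(sample_idx) x length(sample_idx) block is allocated. The function name hajek_sub_sketch is hypothetical, and the sketch ignores eps and clamping; the package may implement this differently.

```r
# Hypothetical sketch: Hajek submatrix for the sampled units only.
# Assumes 0 < pik < 1 for every unit (no eps handling, no clamping).
hajek_sub_sketch <- function(pik, sample_idx) {
  D <- sum(pik * (1 - pik))        # denominator uses the full population
  ps <- pik[sample_idx]            # probabilities of the sampled units
  qs <- 1 - ps
  sub <- outer(ps, ps) * (1 - outer(qs, qs) / D)
  diag(sub) <- ps
  sub
}

pik <- (2:9) / 11                  # illustrative probabilities
idx <- c(2L, 5L, 7L, 8L)           # 1-based indices of sampled units
sub <- hajek_sub_sketch(pik, idx)
dim(sub)
#> [1] 4 4
```

The result equals the corresponding rows and columns of the full matrix, at a memory cost that depends on the sample size rather than on N.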

Value

A symmetric matrix of joint inclusion probabilities: N x N when sample_idx is NULL, or length(sample_idx) x length(sample_idx) otherwise. Diagonal entries are $\pi_i$.

Details

The Hajek approximation is simpler and computationally lighter than the high-entropy approximation (he_jip()), but generally slightly less accurate. It is derived from the asymptotic theory of rejective (conditional Poisson) sampling, where the design is obtained by conditioning independent Poisson trials on the total sample size.
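The conditioning step can be illustrated with a brute-force sketch: draw independent Bernoulli trials and keep the first outcome with exactly n selected units. The function rejective_draw and its max_tries argument are hypothetical and are not part of the package. Note that p plays the role of working probabilities here; the inclusion probabilities of the resulting conditional design are only approximately equal to p.

```r
# Hypothetical sketch of rejective (conditional Poisson) sampling:
# repeat independent Bernoulli trials until exactly n units are selected.
rejective_draw <- function(p, n, max_tries = 1e5) {
  for (attempt in seq_len(max_tries)) {
    s <- which(runif(length(p)) < p)   # one Poisson-sampling outcome
    if (length(s) == n) return(s)      # accept only samples of size n
  }
  stop("no sample of size n found within max_tries")
}

set.seed(1)
p <- (2:9) / 11                        # illustrative working probabilities
s <- rejective_draw(p, n = 4)
length(s)
#> [1] 4
```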

The formula applies to any high-entropy design, but is most accurate when the design is close to rejective sampling. For CPS and other high-entropy designs such as Sampford sampling, he_jip() tends to give more accurate results. In practice, both approximations agree closely for moderate to large populations with well-spread inclusion probabilities.

Like he_jip(), this function matches the joint_fn signature required by register_method():

register_method("my_method", sample_fn = my_fn, joint_fn = hajek_jip)

Properties

Symmetric: $\pi_{ij} = \pi_{ji}$
Diagonal: $\pi_{ii} = \pi_i$ (set directly)
Bounded: $0 \le \pi_{ij} \le \min(\pi_i, \pi_j)$ (clamped)
Marginal defect: $|\sum_{j \ne i} \pi_{ij} - (n-1)\pi_i| = O(1/N)$ for well-spread $\pi_k$, where $n = \sum_k \pi_k$ is the sample size
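For the unclamped formula with all $0 < \pi_k < 1$, summing over $j \ne i$ gives $\sum_{j \ne i} \pi_{ij} = (n-1)\pi_i + \pi_i^2 (1-\pi_i)^2 / D$ with $n = \sum_k \pi_k$, so the marginal defect equals $\pi_i^2 (1-\pi_i)^2 / D$. The sketch below checks this and the other properties numerically. It evaluates the formula directly rather than calling hajek_jip(), so it says nothing about the package's boundary handling.

```r
pik <- (2:9) / 11                      # illustrative probabilities, n = 4
n <- sum(pik)
D <- sum(pik * (1 - pik))
q <- 1 - pik
pikl <- outer(pik, pik) * (1 - outer(q, q) / D)   # unclamped formula
diag(pikl) <- pik                                 # diagonal set directly

# Off-diagonal row sums minus (n - 1) * pi_i
defect <- rowSums(pikl) - pik - (n - 1) * pik
closed_form <- pik^2 * (1 - pik)^2 / D

isSymmetric(pikl)                                  # symmetry
all(pikl >= 0 & pikl <= outer(pik, pik, pmin) + 1e-12)  # bounds
all.equal(defect, closed_form)                     # marginal defect
```

With only N = 8 units the defect is not negligible; it shrinks as the population grows and the $\pi_k$ stay well spread.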

References

Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35(4), 1491–1523.

Examples

pik <- inclusion_prob(c(2, 3, 4, 5, 6, 7, 8, 9), n = 4)

# Full N x N matrix
pikl <- hajek_jip(pik)
round(pikl, 4)
#>        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
#> [1,] 0.1818 0.0317 0.0453 0.0603 0.0769 0.0949 0.1144 0.1354
#> [2,] 0.0317 0.2727 0.0714 0.0942 0.1190 0.1458 0.1745 0.2053
#> [3,] 0.0453 0.0714 0.3636 0.1306 0.1636 0.1990 0.2367 0.2767
#> [4,] 0.0603 0.0942 0.1306 0.4545 0.2107 0.2545 0.3008 0.3496
#> [5,] 0.0769 0.1190 0.1636 0.2107 0.5455 0.3124 0.3669 0.4240
#> [6,] 0.0949 0.1458 0.1990 0.2545 0.3124 0.6364 0.4350 0.4998
#> [7,] 0.1144 0.1745 0.2367 0.3008 0.3669 0.4350 0.7273 0.5772
#> [8,] 0.1354 0.2053 0.2767 0.3496 0.4240 0.4998 0.5772 0.8182

# Compare with high-entropy approximation
he <- he_jip(pik)
max(abs(pikl - he))
#> [1] 0.02273801