Computes the joint inclusion probability matrix using the Hajek (1964) approximation based on conditional Poisson (rejective) sampling theory: $$\pi_{ij} \approx \pi_i \pi_j \left[1 - \frac{(1-\pi_i)(1-\pi_j)}{D}\right]$$ where \(D = \sum_k \pi_k (1 - \pi_k)\).
Arguments
- pik
Numeric vector of inclusion probabilities (\(0 \le \pi_k \le 1\)).
- sample_idx
Integer vector of 1-based indices for the sampled units, or
NULL (default) for the full population matrix. When non-NULL, returns the submatrix for those units only, without allocating the full N x N matrix.
- eps
Boundary tolerance (default 1e-6). Units with \(\pi_k \ge 1 - \varepsilon\) are treated as certainty selections; units with \(\pi_k \le \varepsilon\) are treated as zero.
- ...
Additional arguments (ignored). Present so that the function matches the
joint_fn signature required by register_method().
Value
A symmetric matrix of joint inclusion probabilities:
N x N when sample_idx is NULL, or
length(sample_idx) x length(sample_idx) otherwise.
Diagonal entries are \(\pi_i\).
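The formula and the shape of the return value can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the name hajek_sketch is hypothetical, and the eps boundary handling for certainty and zero-probability units is deliberately left out. The input pik uses the values k/11 for k = 2, ..., 9, which match the diagonal shown in the Examples. Note that D is computed from the whole population even when only a submatrix is requested.

```r
# Hypothetical base-R sketch of the Hajek formula; not the package's code.
# The eps boundary handling described under Arguments is omitted here.
hajek_sketch <- function(pik, sample_idx = NULL) {
  # D sums over every unit in the population, even for a submatrix request
  D <- sum(pik * (1 - pik))
  idx <- if (is.null(sample_idx)) seq_along(pik) else sample_idx
  p <- pik[idx]
  # pi_i * pi_j * [1 - (1 - pi_i)(1 - pi_j) / D] via two outer products
  jip <- outer(p, p) * (1 - outer(1 - p, 1 - p) / D)
  # Clamp each entry to [0, min(pi_i, pi_j)]
  jip <- pmin(pmax(jip, 0), outer(p, p, pmin))
  # Diagonal entries are the first-order probabilities
  diag(jip) <- p
  jip
}

pik <- c(2, 3, 4, 5, 6, 7, 8, 9) / 11   # sums to n = 4
full <- hajek_sketch(pik)
sub <- hajek_sketch(pik, sample_idx = c(2L, 5L, 8L))

round(full[1, 2], 4)
#> [1] 0.0317
all.equal(sub, full[c(2, 5, 8), c(2, 5, 8)])
#> [1] TRUE
```

Only length(sample_idx) probabilities enter the outer products, so the submatrix path never builds an N x N object.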
Details
The Hajek approximation is simpler and computationally lighter
than the high-entropy approximation (he_jip()), but generally
slightly less accurate. It is derived from the asymptotic
theory of rejective (conditional Poisson) sampling, where the
design is obtained by conditioning independent Poisson trials
on the total sample size.
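To make the conditioning step concrete, the sketch below enumerates a tiny rejective design exactly: it lists every sample of size n, weights each by its Poisson probability, and renormalises. The working probabilities p, the population size, and all variable names are illustrative assumptions, not values used by the package. Because the theory is asymptotic, the agreement with the Hajek formula for N = 6 is only rough.

```r
# Illustrative working probabilities for the independent Poisson trials
# (assumed values, chosen only for this sketch)
p <- c(0.2, 0.3, 0.4, 0.6, 0.7, 0.8)
N <- length(p)
n <- 3

# Every sample of fixed size n, one per column
S <- combn(N, n)

# Poisson probability of each sample; renormalising conditions on size n
w <- apply(S, 2, function(s) prod(p[s]) * prod(1 - p[-s]))
w <- w / sum(w)

# N x M indicator matrix: ind[k, m] is 1 when unit k belongs to sample m
ind <- apply(S, 2, function(s) as.numeric(seq_len(N) %in% s))

# Exact joint inclusion probabilities of the conditional design
exact <- ind %*% (w * t(ind))
pik <- diag(exact)   # exact first-order inclusion probabilities

# Hajek approximation built from the exact first-order probabilities
D <- sum(pik * (1 - pik))
hajek <- outer(pik, pik) * (1 - outer(1 - pik, 1 - pik) / D)
diag(hajek) <- pik

# Largest absolute gap between approximation and exact values
max(abs(hajek - exact))
```

The exact matrix satisfies the fixed-size identity sum over j != i of pi_ij = (n - 1) pi_i, which the approximation reproduces only up to a small defect.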
The formula can be applied to any high-entropy design, but is most
accurate when the design is close to rejective sampling. For
maximum-entropy and near-maximum-entropy designs (CPS, Sampford), he_jip() tends to
give tighter results. In practice, both approximations agree
closely for moderate to large populations with well-spread
inclusion probabilities.
Like he_jip(), this function matches the joint_fn
signature required by register_method().
Properties
Symmetric: \(\pi_{ij} = \pi_{ji}\)
Diagonal: \(\pi_{ii} = \pi_i\) (set directly)
Bounded: \(0 \le \pi_{ij} \le \min(\pi_i, \pi_j)\) (clamped)
Marginal defect: \(|\sum_{j \ne i} \pi_{ij} - (n-1)\pi_i| = O(1/N)\) for well-spread \(\pi_k\)
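The properties above can be checked numerically with a few lines of base R. The sketch recomputes the formula directly rather than calling the package, and uses the same probabilities k/11 as the Examples. Summing the formula over \(j \ne i\) with \(\sum_k \pi_k = n\) gives a marginal defect of \(\pi_i^2 (1 - \pi_i)^2 / D\) when no clamping is active, which is what the last check confirms. For this input no entry needs clamping.

```r
# Direct evaluation of the Hajek formula (not the package's implementation)
pik <- c(2, 3, 4, 5, 6, 7, 8, 9) / 11
n <- sum(pik)                       # fixed sample size, here 4
D <- sum(pik * (1 - pik))
jip <- outer(pik, pik) * (1 - outer(1 - pik, 1 - pik) / D)
diag(jip) <- pik                    # diagonal set directly

# Symmetric
isSymmetric(jip)
#> [1] TRUE

# Bounded by 0 and min(pi_i, pi_j)
all(jip >= 0 & jip <= outer(pik, pik, pmin))
#> [1] TRUE

# Marginal defect: |sum_{j != i} pi_ij - (n - 1) pi_i|
defect <- abs(rowSums(jip) - diag(jip) - (n - 1) * pik)
all.equal(defect, pik^2 * (1 - pik)^2 / D)
#> [1] TRUE
round(defect[1], 4)
#> [1] 0.0134
```

Since D grows with the population when the probabilities stay away from 0 and 1, the closed form shrinks at the rate stated in the marginal-defect property.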
References
Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35(4), 1491–1523.
See also
he_jip() for the high-entropy approximation,
joint_inclusion_prob() for design-based dispatch,
register_method() for custom method registration.
Examples
pik <- inclusion_prob(c(2, 3, 4, 5, 6, 7, 8, 9), n = 4)
# Full N x N matrix
pikl <- hajek_jip(pik)
round(pikl, 4)
#>        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
#> [1,] 0.1818 0.0317 0.0453 0.0603 0.0769 0.0949 0.1144 0.1354
#> [2,] 0.0317 0.2727 0.0714 0.0942 0.1190 0.1458 0.1745 0.2053
#> [3,] 0.0453 0.0714 0.3636 0.1306 0.1636 0.1990 0.2367 0.2767
#> [4,] 0.0603 0.0942 0.1306 0.4545 0.2107 0.2545 0.3008 0.3496
#> [5,] 0.0769 0.1190 0.1636 0.2107 0.5455 0.3124 0.3669 0.4240
#> [6,] 0.0949 0.1458 0.1990 0.2545 0.3124 0.6364 0.4350 0.4998
#> [7,] 0.1144 0.1745 0.2367 0.3008 0.3669 0.4350 0.7273 0.5772
#> [8,] 0.1354 0.2053 0.2767 0.3496 0.4240 0.4998 0.5772 0.8182
# Compare with high-entropy approximation
he <- he_jip(pik)
max(abs(pikl - he))
#> [1] 0.02273801