Computes the joint inclusion probability matrix using the Hajek (1964) approximation based on conditional Poisson (rejective) sampling theory: $$\pi_{ij} \approx \pi_i \pi_j \left[1 - \frac{(1-\pi_i)(1-\pi_j)}{D}\right]$$ where \(D = \sum_k \pi_k (1 - \pi_k)\).

Usage

hajek_jip(pik, sample_idx = NULL, eps = 1e-06, ...)

Arguments

pik

Numeric vector of inclusion probabilities (\(0 \le \pi_k \le 1\)).

sample_idx

Integer vector of 1-based indices for the sampled units, or NULL (default) for the full population matrix. When non-NULL, returns the submatrix for those units only, without allocating the full N x N matrix.

eps

Boundary tolerance (default 1e-6). Units with \(\pi_k \ge 1 - \varepsilon\) are treated as certainty selections (\(\pi_k = 1\)); units with \(\pi_k \le \varepsilon\) are treated as having zero inclusion probability.

...

Additional arguments (ignored). Present so that the function matches the joint_fn signature required by register_method().

Value

A symmetric matrix of joint inclusion probabilities: N x N when sample_idx is NULL, or length(sample_idx) x length(sample_idx) otherwise. Diagonal entries are \(\pi_i\).
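
The submatrix case can be sketched in base R as follows. This is an illustration only, not the package's implementation: the helper name hajek_sub_sketch is hypothetical, and it omits the clamping and eps handling that hajek_jip() performs. The point it shows is that \(D\) is summed over the whole population, while the outer products are formed only over the sampled units, so no N x N matrix is needed.

```r
# Hypothetical helper (not part of the package): Hajek approximation
# restricted to the sampled units.
hajek_sub_sketch <- function(pik, sample_idx) {
  D  <- sum(pik * (1 - pik))   # D still uses every unit in the population
  ps <- pik[sample_idx]        # inclusion probabilities of sampled units only
  sub <- outer(ps, ps) * (1 - outer(1 - ps, 1 - ps) / D)
  diag(sub) <- ps              # diagonal entries are pi_i
  sub
}

# Assumed inputs: the same probabilities as in the Examples (x / 11)
pik <- c(2, 3, 4, 5, 6, 7, 8, 9) / 11
sub <- hajek_sub_sketch(pik, c(2L, 5L, 7L))
dim(sub)
#> [1] 3 3
```

The result should agree with the corresponding rows and columns of the full matrix.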

Details

The Hajek approximation is simpler and computationally lighter than the high-entropy approximation (he_jip()), but generally slightly less accurate. It is derived from the asymptotic theory of rejective (conditional Poisson) sampling, where the design is obtained by conditioning independent Poisson trials on the total sample size.
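
To make the conditioning step concrete, here is a minimal brute-force sketch of rejective sampling in base R. The helper rejective_sketch and its max_tries argument are hypothetical and not part of the package. Note that the working probabilities p passed to the Poisson trials are generally not equal to the inclusion probabilities of the resulting fixed-size design.

```r
# Hypothetical helper (not part of the package): draw independent Poisson
# trials and keep the first outcome whose size equals n.
rejective_sketch <- function(p, n, max_tries = 10000L) {
  for (i in seq_len(max_tries)) {
    take <- runif(length(p)) < p             # independent Bernoulli(p_k) trials
    if (sum(take) == n) return(which(take))  # accept only samples of size n
  }
  stop("no sample of size n found within max_tries attempts")
}

set.seed(1)
p <- c(2, 3, 4, 5, 6, 7, 8, 9) / 11   # working probabilities (assumed values)
s <- rejective_sketch(p, n = 4)
s                                      # indices of the selected units
```

Plain rejection is only practical when samples of exactly size n are reasonably likely; it is used here because it mirrors the definition directly.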

The formula can be applied to any high-entropy design, but is most accurate when the design is close to rejective sampling. For conditional Poisson sampling (CPS, the maximum-entropy design) and the closely related Sampford design, he_jip() tends to give a closer approximation. In practice, both approximations agree closely for moderate to large populations with well-spread inclusion probabilities.
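
The formula itself is short enough to write out in base R. The sketch below is an illustration under stated assumptions, not the package source: hajek_sketch is a hypothetical name, boundary handling via eps is left out, and the input is assumed to equal the probabilities shown on the diagonal in the Examples (x / 11).

```r
# Hypothetical helper (not the package's hajek_jip()): direct evaluation of
# pi_ij ~ pi_i * pi_j * (1 - (1 - pi_i) * (1 - pi_j) / D).
hajek_sketch <- function(pik) {
  D   <- sum(pik * (1 - pik))                  # D = sum_k pi_k (1 - pi_k)
  jip <- outer(pik, pik) * (1 - outer(1 - pik, 1 - pik) / D)
  jip <- pmax(jip, 0)                          # clamp below at 0
  jip <- pmin(jip, outer(pik, pik, pmin))      # clamp above at min(pi_i, pi_j)
  diag(jip) <- pik                             # pi_ii = pi_i
  jip
}

pik <- c(2, 3, 4, 5, 6, 7, 8, 9) / 11
round(hajek_sketch(pik)[1, 2], 4)
#> [1] 0.0317
```

By hand, \(D = 200/121\) for these probabilities, so \(\pi_{12} \approx (6/121)(1 - 72/200) \approx 0.0317\), which matches the first off-diagonal entry printed in the Examples.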

Like he_jip(), this function matches the joint_fn signature required by register_method():

register_method("my_method", sample_fn = my_fn, joint_fn = hajek_jip)

Properties

  • Symmetric: \(\pi_{ij} = \pi_{ji}\)

  • Diagonal: \(\pi_{ii} = \pi_i\) (set directly)

  • Bounded: \(0 \le \pi_{ij} \le \min(\pi_i, \pi_j)\) (clamped)

  • Marginal defect: \(|\sum_{j \ne i} \pi_{ij} - (n-1)\pi_i| = O(1/N)\) for well-spread \(\pi_k\)
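
These properties can be checked numerically on a small case. The sketch below evaluates the formula directly in base R rather than calling hajek_jip(), and assumes the probabilities sum to the sample size n and that no clamping is active. Under those assumptions, summing the formula over \(j \ne i\) gives a defect of exactly \(\pi_i^2 (1 - \pi_i)^2 / D\), which the last line verifies.

```r
# Assumed inputs: probabilities x / 11, which sum to n = 4
pik <- c(2, 3, 4, 5, 6, 7, 8, 9) / 11
n   <- sum(pik)
D   <- sum(pik * (1 - pik))

# Direct evaluation of the Hajek formula (no clamping needed for this input)
jip <- outer(pik, pik) * (1 - outer(1 - pik, 1 - pik) / D)
diag(jip) <- pik

isSymmetric(jip)                                      # symmetry
all(jip >= 0 & jip <= outer(pik, pik, pmin) + 1e-12)  # bounds

# Marginal defect: | sum_{j != i} pi_ij - (n - 1) pi_i |
defect <- abs(rowSums(jip) - pik - (n - 1) * pik)
all.equal(defect, pik^2 * (1 - pik)^2 / D)            # closed form, no clamping
```

Because the defect is divided by \(D\), it shrinks as the population grows with inclusion probabilities kept away from 0 and 1.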

References

Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probabilities from a finite population. Annals of Mathematical Statistics, 35(4), 1491–1523.

See also

he_jip() for the high-entropy approximation, joint_inclusion_prob() for design-based dispatch, register_method() for custom method registration.

Examples

pik <- inclusion_prob(c(2, 3, 4, 5, 6, 7, 8, 9), n = 4)

# Full N x N matrix
pikl <- hajek_jip(pik)
round(pikl, 4)
#>        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
#> [1,] 0.1818 0.0317 0.0453 0.0603 0.0769 0.0949 0.1144 0.1354
#> [2,] 0.0317 0.2727 0.0714 0.0942 0.1190 0.1458 0.1745 0.2053
#> [3,] 0.0453 0.0714 0.3636 0.1306 0.1636 0.1990 0.2367 0.2767
#> [4,] 0.0603 0.0942 0.1306 0.4545 0.2107 0.2545 0.3008 0.3496
#> [5,] 0.0769 0.1190 0.1636 0.2107 0.5455 0.3124 0.3669 0.4240
#> [6,] 0.0949 0.1458 0.1990 0.2545 0.3124 0.6364 0.4350 0.4998
#> [7,] 0.1144 0.1745 0.2367 0.3008 0.3669 0.4350 0.7273 0.5772
#> [8,] 0.1354 0.2053 0.2767 0.3496 0.4240 0.4998 0.5772 0.8182

# Compare with high-entropy approximation
he <- he_jip(pik)
max(abs(pikl - he))
#> [1] 0.02273801