High-Entropy Approximation for Joint Inclusion Probabilities

Computes the joint inclusion probability matrix using the high-entropy approximation of Brewer & Donadio (2003, eq. 18): $$\pi_{ij} \approx \pi_i \pi_j \frac{c_i + c_j}{2}$$ where $c_k = \frac{n-1}{n - \frac{2n-1}{n-1}\pi_k + \frac{\sum_\ell \pi_\ell^2}{n-1}}$ and $n = \sum_k \pi_k$.

Usage

he_jip(pik, sample_idx = NULL, eps = 1e-06, ...)

Arguments

pik: Numeric vector of inclusion probabilities ($0 \le \pi_k \le 1$).
sample_idx: Unique integer vector of 1-based indices for the sampled units, or NULL (default) for the full population matrix. When non-NULL, returns the submatrix for those units only, without allocating the full N x N matrix.
eps: Boundary tolerance (default 1e-6). Units with $\pi_k \ge 1 - \varepsilon$ are treated as certainty selections; units with $\pi_k \le \varepsilon$ are treated as zero.
...: Additional arguments (ignored). Present so that the function matches the joint_fn signature required by register_method().

Value

A symmetric matrix of joint inclusion probabilities: N x N when sample_idx is NULL, or length(sample_idx) x length(sample_idx) otherwise. Diagonal entries are $\pi_i$.

Details

The high-entropy (HE) approximation is the recommended default for designs that are close to maximum entropy, which includes most common unequal-probability without-replacement designs: Brewer, Sampford, Tillé, SPS, Pareto, cube, and CPS itself.

The approximation guarantees symmetry, $0 \le \pi_{ij} \le \min(\pi_i, \pi_j)$, and correct diagonal ($\pi_{ii} = \pi_i$), but does not exactly satisfy the marginal identity $\sum_{j \neq i} \pi_{ij} = (n-1)\pi_i$. The defect is typically small but grows with skewed $\pi_k$ and small $n$.

Internally, this calls the same C implementation used by joint_inclusion_prob() for Brewer, SPS, Pareto, and cube methods. Exported so that custom methods registered via register_method() can use it directly as their joint_fn:

register_method("my_method", sample_fn = my_fn, joint_fn = he_jip)

For a lighter alternative based on conditional Poisson theory, see hajek_jip().

References

Brewer, K.R.W. and Donadio, M.E. (2003). The high entropy variance of the Horvitz-Thompson estimator. Survey Methodology, 29(2), 189–196.

Examples

pik <- inclusion_prob(c(2, 3, 4, 5, 6, 7, 8, 9), n = 4)

# Full N x N matrix
pikl <- he_jip(pik)
round(pikl, 4)
#>        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
#> [1,] 0.1818 0.0350 0.0480 0.0618 0.0765 0.0925 0.1098 0.1289
#> [2,] 0.0350 0.2727 0.0737 0.0948 0.1174 0.1417 0.1682 0.1972
#> [3,] 0.0480 0.0737 0.3636 0.1296 0.1604 0.1935 0.2294 0.2688
#> [4,] 0.0618 0.0948 0.1296 0.4545 0.2059 0.2482 0.2939 0.3440
#> [5,] 0.0765 0.1174 0.1604 0.2059 0.5455 0.3062 0.3624 0.4237
#> [6,] 0.0925 0.1417 0.1935 0.2482 0.3062 0.6364 0.4355 0.5087
#> [7,] 0.1098 0.1682 0.2294 0.2939 0.3624 0.4355 0.7273 0.5999
#> [8,] 0.1289 0.1972 0.2688 0.3440 0.4237 0.5087 0.5999 0.8182

# Submatrix for specific units
he_jip(pik, sample_idx = c(1, 3, 5))
#>            [,1]       [,2]       [,3]
#> [1,] 0.18181818 0.04796609 0.07652019
#> [2,] 0.04796609 0.36363636 0.16040262
#> [3,] 0.07652019 0.16040262 0.54545455

# Use as joint_fn in register_method()
register_method("my_method", sample_fn = function(pik, n, prn, ...) {
  sample.int(length(pik), n, prob = pik)
}, joint_fn = he_jip)
unregister_method("my_method")