R/joint.R
joint_inclusion_prob.RdReconstructs the joint inclusion probabilities (π_kl) for PPS
without replacement stages by replaying the design specification
against the original sampling frame. This allows exact variance
estimation via survey::ppsmat() instead of the default Brewer
approximation used by as_survey_design().
joint_inclusion_prob(x, frame, stage = NULL)A tbl_sample object produced by execute().
The data frame originally passed to execute(). Must
contain the same columns used during sampling (strata variables,
cluster variables, measure of size).
An integer vector of stage numbers to compute, or
NULL (default) to compute all PPS without replacement stages.
Non-PPS stages produce NULL entries in the returned list.
A named list of length equal to the number of executed stages. Each element is either:
A square matrix of joint inclusion probabilities for PPS WOR stages, with dimensions matching the number of sampled units at that stage (or sampled clusters for clustered stages). Rows and columns are ordered to match the sample.
NULL for non-PPS stages (SRS, systematic, WR methods, PMR
methods) or stages not requested via the stage argument.
For each PPS without replacement stage, the function:
Reconstructs the full-population inclusion probabilities (π_i) from the frame using the stage's method and measure of size
Dispatches to the appropriate sondage joint probability function
Extracts the submatrix corresponding to sampled units
For stratified stages, the target sample size per stratum (n_h) is
reconstructed by replaying the same allocation logic used during
execute() (proportional, Neyman, optimal, etc.) against the
frame. This ensures π_i values match what was computed at sampling
time, regardless of allocation method.
For stratified or conditional (within-cluster) stages, joint probabilities are computed independently within each group and assembled into a block-diagonal matrix.
| samplyr method | sondage function |
pps_brewer | up_brewer_jip(pik) |
pps_systematic | up_systematic_jip(pik) |
pps_maxent | up_maxent_jip(pik) |
pps_poisson | up_poisson_jip(pik) |
pps_chromy is excluded from joint probability computation.
Chromy's method is a Probability Minimum Replacement (PMR)
method, not a standard WOR method. Units can receive multiple
hits, so the relevant pairwise quantities are expected sample
size products E(n_i · n_j), not inclusion probabilities π_ij.
While sondage provides up_chromy_pairexp() to estimate E(n_i · n_j)
via Monte Carlo, the result cannot be passed to survey::ppsmat()
because the survey package assumes the diagonal contains π_i, but
for PMR it contains E(n_i²) ≠ E(n_i) when units receive multiple
hits. Chromy (2009) recommends the Hansen-Hurwitz approximation
for variance estimation, which is the default used by
as_survey_design().
Requires the original sampling frame. The frame must be unchanged
from what was passed to execute().
Units in the frame must be uniquely identifiable within each stratum/cluster group by their column values.
For designs with certainty selections (π_i = 1), the joint matrix is decomposed: certainty units are separated from the stochastic part, the joint probabilities for non-certainty units are computed from the reduced π vector, and the full matrix is reassembled with π_ij = 1 for certainty pairs and π_ij = π_j for certainty × non-certainty pairs.
as_survey_design() for the default export using Brewer's
approximation, survey::ppsmat() for wrapping joint matrices
if (FALSE) { # \dontrun{
sample <- sampling_design() |>
stage() |>
stratify_by(region) |>
cluster_by(ea_id) |>
draw(n = 5, method = "pps_brewer", mos = hh_count) |>
stage() |>
draw(n = 12) |>
execute(niger_eas, seed = 2025)
# Compute joint probabilities for stage 1
jip <- joint_inclusion_prob(sample, niger_eas, stage = 1)
# Use with survey package for exact variance
svy <- as_survey_design(sample, pps = survey::ppsmat(jip[[1]]))
# Compute all PPS stages at once
jip_all <- joint_inclusion_prob(sample, niger_eas)
} # }