Fits three base count regression models (Poisson, negative binomial, and
Tweedie), runs a DHARMa zero-inflation test on each, fits the corresponding
zero-inflated counterpart for any model where zero-inflation is detected,
then selects the best overall model by the metric given in decide.
Usage
countGLM(
  formula,
  data,
  ziformula = NULL,
  decide = "BIC",
  maxit = NULL,
  families = c("poisson", "negbin", "tweedie"),
  ...
)
Arguments
- formula
A model formula for the count component (e.g. y ~ x1 + x2). The response
must be non-negative.
- data
A data frame containing the variables in formula (and ziformula if
provided).
- ziformula
A one-sided formula for the zero-inflation component passed to
zeroinflPoissonGLM(), zeroinflNegbinGLM(), and zeroinflTweedieGLM() when
they are needed. When NULL (default), the same right-hand side as formula
is used.
- decide
Character string specifying the model-selection criterion. One of "BIC"
(default), "AIC", "LogLik" (log-likelihood, higher is better), or
"McFadden" (McFadden pseudo-R², higher is better). Matching is
case-insensitive.
- maxit
Optional integer; maximum optimizer/IWLS iterations. When non-NULL,
forwarded as the maxit argument to each underlying fitter (poissonGLM(),
negbinGLM(), tweedieGLM(), and the ZI counterparts), which translate it
into the appropriate backend control object. A single value is applied
across every model family.
- families
Character vector naming which base families to fit. Must be a subset of
c("poisson", "negbin", "tweedie"). Defaults to all three. Each family's
zero-inflated counterpart is fitted only when the zero-inflation test flags
its base model. Use this to skip slow Tweedie / glmmTMB fits or to restrict
the comparison to a specific subset. The quasi-Poisson fit (produced when
the Poisson fit shows a constant-overdispersion signature) requires
"poisson" to be included.
- ...
Additional arguments passed to each individual model fitter.
Value
An object of class "countGLM", a list with:
- call
The matched call.
- fits
A named list of successfully fitted model objects. Each base model
requested in families (poisson, negbin, tweedie) is always attempted.
zeroinfl_poisson, zeroinfl_negbin, and zeroinfl_tweedie are fitted only
when the DHARMa zero-inflation test flags their base model (p < 0.05). Any
model that failed to converge is omitted. Base model fits include
diagnostics$zi_test populated from the DHARMa test.
- aic_table
A named numeric vector of AICs, sorted ascending.
- bic_table
A named numeric vector of BICs, sorted ascending.
- metric_table
A named numeric vector of the selection metric values (the criterion named
by decide), sorted best-first.
- decide
The normalised (lower-case) name of the selection criterion actually used.
- best_model
Character name of the model selected by decide.
- recommendation
A plain-language character string explaining the selection, including the
criterion value, dispersion context, and zero-inflation test results.
- vif
A data frame of generalized variance inflation factors (GVIF; Fox & Monette
1992) for the main-effect predictors in formula (interaction and polynomial
terms are excluded). Columns are GVIF, Df, and GVIF^(1/(2*Df)) (the
degrees-of-freedom adjusted scalar comparable to a conventional VIF). For
single-df terms (continuous predictors and two-level factors), GVIF equals
the usual VIF. NULL when fewer than two main-effect terms are present. A
warning is issued when any term's GVIF^(1/(2*Df)) exceeds sqrt(5) (the GVIF
analogue of the VIF > 5 rule of thumb).
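The GVIF columns described for vif can be illustrated with a small base-R sketch. The helper name gvif_table() is hypothetical and not part of this package; it assumes a formula containing main effects only and applies the Fox & Monette determinant ratio to the correlation matrix of the design columns, which may differ in detail from what countGLM() computes internally.

```r
# Hypothetical helper (not part of the documented API): GVIF per term from
# the correlation matrix of the design-matrix columns.
# Assumption: `formula` contains main-effect terms only.
gvif_table <- function(formula, data) {
  mf <- model.frame(formula, data)
  tt <- attr(mf, "terms")
  X <- model.matrix(tt, mf)
  labs <- attr(tt, "term.labels")
  if (length(labs) < 2) return(NULL)   # mirrors the documented NULL case

  assign <- attr(X, "assign")
  keep <- assign > 0                   # drop the intercept column
  R <- cor(X[, keep, drop = FALSE])
  assign <- assign[keep]
  det_all <- det(R)

  rows <- lapply(seq_along(labs), function(j) {
    idx <- which(assign == j)          # columns belonging to term j
    gvif <- det(R[idx, idx, drop = FALSE]) *
      det(R[-idx, -idx, drop = FALSE]) / det_all
    data.frame(GVIF = gvif, Df = length(idx),
               adj = gvif^(1 / (2 * length(idx))))
  })
  out <- do.call(rbind, rows)
  names(out)[3] <- "GVIF^(1/(2*Df))"
  rownames(out) <- labs
  out
}

# Illustrative data: x2 is built to correlate with x1; g is a 3-level factor.
set.seed(42)
dat <- data.frame(
  y  = rpois(60, 2),
  x1 = rnorm(60),
  g  = factor(rep(c("a", "b", "c"), 20))
)
dat$x2 <- dat$x1 + rnorm(60)

tab <- gvif_table(y ~ x1 + x2 + g, dat)
# Terms that would trip the sqrt(5) rule of thumb quoted above.
flagged <- rownames(tab)[tab[["GVIF^(1/(2*Df))"]] > sqrt(5)]
```

With only two continuous predictors the GVIF reduces to the familiar 1 / (1 - r^2), which is a quick way to sanity-check the helper.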
Details
Workflow: countGLM() fits Poisson, negative binomial, and Tweedie
base models. It then runs a DHARMa simulation test for zero-inflation on
each successful base model. For every family, the zero-inflated counterpart
is fitted only when zero-inflation is detected (p < 0.05) on its base
model. A quasiPoissonGLM() fit is additionally produced when the Poisson
fit shows a dispersion ratio > 1.2 combined with a roughly flat squared
Pearson residual cloud that sits above 1 (the quasi-Poisson signature).
Because quasi-Poisson has no proper likelihood, it is excluded from the
decide comparison and reported alongside it with NA AIC/BIC.
All surviving likelihood-based models are compared by decide.
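The two diagnostics that drive this workflow can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the dispersion ratio is taken to be the conventional Pearson statistic divided by the residual degrees of freedom, and the zero-inflation check is a hand-rolled simulation in the spirit of the DHARMa test rather than a call to DHARMa itself. The "flat residual cloud" part of the quasi-Poisson signature is not reproduced here.

```r
# Data from the Examples section of this page.
dat <- data.frame(
  y  = c(0L, 1L, 2L, 3L, 5L, 0L, 2L, 4L, 1L, 3L),
  x1 = c(1.2, -0.4, 0.8, -1.1, 2.0, 0.3, -0.9, 1.5, -0.2, 0.7)
)

fit <- glm(y ~ x1, family = poisson(), data = dat)

# Pearson dispersion ratio: sum of squared Pearson residuals over residual df.
# Assumption: conventional definition; the package may compute it differently.
dispersion_ratio <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
overdispersed <- dispersion_ratio > 1.2   # threshold quoted in the text

# Simulation check for excess zeros: compare the observed number of zeros
# with the number of zeros in data sets simulated from the fitted model.
set.seed(1)
n_sim <- 1000
mu <- fitted(fit)
sim_zeros <- replicate(n_sim, sum(rpois(length(mu), mu) == 0))
obs_zeros <- sum(dat$y == 0)

# Two-sided Monte Carlo p-value, capped at 1.
p_lower <- mean(sim_zeros <= obs_zeros)
p_upper <- mean(sim_zeros >= obs_zeros)
p_value <- min(1, 2 * min(p_lower, p_upper))

# In the workflow above, this flag is what would trigger the ZI counterpart.
zero_inflated <- p_value < 0.05
```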
Model selection: The model with the best value of decide is chosen.
For "AIC" and "BIC" the model with the lowest value wins; for
"LogLik" and "McFadden" the model with the highest value wins.
AIC and BIC are always computed and displayed regardless of decide.
When decide = "McFadden", intercept-only null models are fitted for each
family to compute the pseudo-R².
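The direction-aware selection rule can be sketched as follows. The helpers metric_value() and select_best() are hypothetical names, and the two candidates are plain Poisson GLMs standing in for the real family fits; the sketch only shows how lower-is-better and higher-is-better criteria are ranked, and how a McFadden pseudo-R² follows from an intercept-only null fit.

```r
dat <- data.frame(
  y  = c(0L, 1L, 2L, 3L, 5L, 0L, 2L, 4L, 1L, 3L),
  x1 = c(1.2, -0.4, 0.8, -1.1, 2.0, 0.3, -0.9, 1.5, -0.2, 0.7)
)

# Stand-in candidates (assumption: any named list of glm fits would do).
fits <- list(
  with_x1        = glm(y ~ x1, family = poisson(), data = dat),
  intercept_only = glm(y ~ 1,  family = poisson(), data = dat)
)

# Hypothetical helper: value of one criterion for one glm fit.
metric_value <- function(fit, decide) {
  switch(
    tolower(decide),
    aic = AIC(fit),
    bic = BIC(fit),
    loglik = as.numeric(logLik(fit)),
    mcfadden = {
      # Intercept-only null model in the same family as the fit.
      null_fit <- glm(fit$y ~ 1, family = family(fit))
      1 - as.numeric(logLik(fit)) / as.numeric(logLik(null_fit))
    },
    stop("Unknown criterion: ", decide)
  )
}

# Hypothetical helper: rank candidates best-first and pick the winner.
select_best <- function(fits, decide = "BIC") {
  decide <- tolower(decide)            # case-insensitive matching
  values <- vapply(fits, metric_value, numeric(1), decide = decide)
  higher_is_better <- decide %in% c("loglik", "mcfadden")
  sorted <- sort(values, decreasing = higher_is_better)
  list(decide = decide, metric_table = sorted, best_model = names(sorted)[1])
}

by_bic      <- select_best(fits)               # lowest BIC wins
by_mcfadden <- select_best(fits, "McFadden")   # highest pseudo-R2 wins
```

The returned list mirrors the decide, metric_table, and best_model elements described under Value, but only loosely; the real object carries more fields.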
Examples
df <- data.frame(
  y = c(0L, 1L, 2L, 3L, 5L, 0L, 2L, 4L, 1L, 3L),
  x1 = c(1.2, -0.4, 0.8, -1.1, 2.0, 0.3, -0.9, 1.5, -0.2, 0.7)
)
result <- suppressWarnings(countGLM(y ~ x1, data = df)) # default: BIC
result <- suppressWarnings(countGLM(y ~ x1, data = df, decide = "AIC"))
result <- suppressWarnings(countGLM(y ~ x1, data = df, decide = "McFadden"))
print(result)
#>
#> Call:
#> countGLM(formula = y ~ x1, data = df, decide = "McFadden")
#>
#> Model comparison (sorted by McFadden R2 (descending)):
#>    model   AIC   BIC McFadden R2
#>  poisson 39.02 39.63      0.0459
#>   negbin 41.02 41.93      0.0407
#>
#> Selected model: poisson
#>
#> Recommendation:
#> Poisson was selected by McFadden R² (McFadden R² = 0.0459). The
#> Poisson dispersion ratio is 1.17, consistent with equidispersion. No
#> significant zero-inflation detected.
#>
#> Selected-model warnings:
#> Count component: 8 events (y > 0) for 1 predictor(s) (8.0 per
#> predictor). At least 10 events per predictor is recommended.
#>
summary(result)
#> Summary of selected model (poisson):
#>
#>
#> Call:
#> poissonGLM(formula = formula, data = data, assessZeroInflation = FALSE,
#>     maxit = maxit)
#>
#> Model family: poissonGLM
#>
#> Coefficients (on response scale):
#>         term exp.coef lower.95 upper.95 p.value stars
#>  (Intercept)   1.7989   1.0683   3.0290  0.0272     *
#>           x1   1.3396   0.8572   2.0936  0.1993
#>
#> Dispersion ratio: 1.1658
#> AIC: 39.02