dispersion.Rd
The function dispersion() extracts the dispersion parameter from a
multinomial logit model or computes a dispersion parameter estimate based
on a given method. This dispersion parameter can be attached to a model
using update(). It can also be given as an argument to summary().
dispersion(object,method, ...)
# S3 method for class 'mclogit'
dispersion(object,method=NULL,groups=NULL, ...)
object
an object that inherits class "mclogit". When passed to dispersion(), it should be the result of a call of mclogit() or mblogit(), without random effects.
method
a character string, either "Afroz", "Fletcher", "Pearson", or "Deviance", that specifies the estimator of the dispersion; or NULL, in which case the default estimator, "Afroz", is used. The estimators are discussed in Afroz et al. (2020).
groups
an optional formula that specifies groups of observations relevant for the estimation of overdispersion. Predicted probabilities should be constant within groups; otherwise a warning is generated, since the overdispersion estimate may be imprecise.
...
other arguments, ignored or passed to other methods.
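For orientation, the two simplest estimators named in the description of method, "Pearson" and "Deviance", are commonly written as below for grouped multinomial data. This is a sketch in generic textbook notation, not taken from the package source, and the implementation in the package (in particular its handling of weights) may differ. The symbols are assumptions introduced here: y_ij is the count in category j of group i, m_i the group total, \hat\pi_ij the fitted probability, n the number of groups, J the number of response categories, and p the number of estimated coefficients. Terms with y_ij = 0 contribute zero to D.

```latex
\begin{aligned}
X^2 &= \sum_{i=1}^{n}\sum_{j=1}^{J}
       \frac{\left(y_{ij} - m_i\hat\pi_{ij}\right)^2}{m_i\hat\pi_{ij}},
&
D &= 2\sum_{i=1}^{n}\sum_{j=1}^{J}
       y_{ij}\,\log\frac{y_{ij}}{m_i\hat\pi_{ij}},
\\[4pt]
\hat\phi_{\mathrm{Pearson}} &= \frac{X^2}{n(J-1)-p},
&
\hat\phi_{\mathrm{Deviance}} &= \frac{D}{n(J-1)-p}.
\end{aligned}
```

The denominator n(J-1) - p appears consistent with the degrees of freedom reported for the 'housing' data fitted with from.table=TRUE: 24 covariate patterns, 3 response categories, and 14 coefficients give 24 * 2 - 14 = 34.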
Afroz, Farzana, Matt Parry, and David Fletcher. (2020). "Estimating Overdispersion in Sparse Multinomial Data." Biometrics 76(3): 834-842. doi:10.1111/biom.13194.
library(mclogit) # For 'mblogit' and 'dispersion'
library(MASS) # For 'housing' data
# Note that with a factor response and frequency-weighted data,
# overdispersion will be overestimated:
house.mblogit <- mblogit(Sat ~ Infl + Type + Cont, weights = Freq,
                         data = housing)
#>
#> Iteration 1 - deviance = 3493.764 - criterion = 0.9614469
#> Iteration 2 - deviance = 3470.111 - criterion = 0.00681597
#> Iteration 3 - deviance = 3470.084 - criterion = 7.82437e-06
#> Iteration 4 - deviance = 3470.084 - criterion = 7.469596e-11
#> converged
dispersion(house.mblogit,method="Afroz")
#> [1] 2.062653
dispersion(house.mblogit,method="Deviance")
#> [1] 2.175601
summary(house.mblogit)
#>
#> Call:
#> mblogit(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)
#>
#> Equation for Medium vs Low:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.4192 0.1729 -2.424 0.015342 *
#> InflMedium 0.4464 0.1416 3.153 0.001613 **
#> InflHigh 0.6649 0.1863 3.568 0.000359 ***
#> TypeApartment -0.4357 0.1725 -2.525 0.011562 *
#> TypeAtrium 0.1314 0.2231 0.589 0.555980
#> TypeTerrace -0.6666 0.2063 -3.232 0.001230 **
#> ContHigh 0.3609 0.1324 2.726 0.006420 **
#>
#> Equation for High vs Low:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.1387 0.1592 -0.871 0.383570
#> InflMedium 0.7349 0.1369 5.366 8.03e-08 ***
#> InflHigh 1.6126 0.1671 9.649 < 2e-16 ***
#> TypeApartment -0.7356 0.1553 -4.738 2.16e-06 ***
#> TypeAtrium -0.4080 0.2115 -1.929 0.053730 .
#> TypeTerrace -1.4123 0.2001 -7.056 1.71e-12 ***
#> ContHigh 0.4818 0.1241 3.881 0.000104 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Approximate residual Deviance: 3470
#> Number of Fisher scoring iterations: 4
#> Number of observations: 1681
#>
phi.Afroz <- dispersion(house.mblogit,method="Afroz")
summary(house.mblogit, dispersion=phi.Afroz)
#>
#> Call:
#> mblogit(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)
#>
#> Equation for Medium vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.4192 0.1729 -2.424 0.016717 *
#> InflMedium 0.4464 0.1416 3.153 0.002004 **
#> InflHigh 0.6649 0.1863 3.568 0.000504 ***
#> TypeApartment -0.4357 0.1725 -2.525 0.012763 *
#> TypeAtrium 0.1314 0.2231 0.589 0.557002
#> TypeTerrace -0.6666 0.2063 -3.232 0.001559 **
#> ContHigh 0.3609 0.1324 2.726 0.007305 **
#>
#> Equation for High vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.1387 0.1592 -0.871 0.385176
#> InflMedium 0.7349 0.1369 5.366 3.57e-07 ***
#> InflHigh 1.6126 0.1671 9.649 < 2e-16 ***
#> TypeApartment -0.7356 0.1553 -4.738 5.58e-06 ***
#> TypeAtrium -0.4080 0.2115 -1.929 0.055910 .
#> TypeTerrace -1.4123 0.2001 -7.056 9.14e-11 ***
#> ContHigh 0.4818 0.1241 3.881 0.000164 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Dispersion: 2.062653 on 130 degrees of freedom
#> Approximate residual Deviance: 3470
#> Number of Fisher scoring iterations: 4
#> Number of observations: 1681
#>
summary(update(house.mblogit, dispersion="Afroz"))
#>
#> Call:
#> mblogit(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)
#>
#> Equation for Medium vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.4192 0.2484 -1.688 0.0938 .
#> InflMedium 0.4464 0.2033 2.196 0.0299 *
#> InflHigh 0.6649 0.2676 2.485 0.0142 *
#> TypeApartment -0.4357 0.2478 -1.758 0.0811 .
#> TypeAtrium 0.1314 0.3204 0.410 0.6825
#> TypeTerrace -0.6666 0.2962 -2.250 0.0261 *
#> ContHigh 0.3609 0.1901 1.898 0.0599 .
#>
#> Equation for High vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.1387 0.2287 -0.607 0.545109
#> InflMedium 0.7349 0.1967 3.737 0.000278 ***
#> InflHigh 1.6126 0.2400 6.718 5.20e-10 ***
#> TypeApartment -0.7356 0.2230 -3.299 0.001253 **
#> TypeAtrium -0.4080 0.3038 -1.343 0.181568
#> TypeTerrace -1.4123 0.2875 -4.913 2.64e-06 ***
#> ContHigh 0.4818 0.1783 2.703 0.007800 **
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Dispersion: 2.062653 on 130 degrees of freedom
#> Approximate residual Deviance: 3470
#> Number of Fisher scoring iterations: 4
#> Number of observations: 1681
#>
# To estimate overdispersion accurately for data like the above
# (which usually comes from applying 'as.data.frame' to a
# contingency table), the model has to be fitted with the
# optional argument 'from.table=TRUE':
house.mblogit.corrected <- mblogit(Sat ~ Infl + Type + Cont, weights = Freq,
                                   data = housing, from.table=TRUE,
                                   dispersion="Afroz")
#>
#> Iteration 1 - deviance = 38.84842 - criterion = 0.992521
#> Iteration 2 - deviance = 38.66222 - criterion = 0.004803721
#> Iteration 3 - deviance = 38.6622 - criterion = 3.782555e-07
#> Iteration 4 - deviance = 38.6622 - criterion = 3.666163e-15
#> converged
# Now the estimated dispersion parameter is no longer larger than 20,
# but just a bit over 1.0.
summary(house.mblogit.corrected)
#>
#> Call:
#> mblogit(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq,
#> dispersion = "Afroz", from.table = TRUE)
#>
#> Equation for Medium vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.41923 0.02661 -15.757 < 2e-16 ***
#> InflMedium 0.44640 0.02178 20.497 < 2e-16 ***
#> InflHigh 0.66494 0.02867 23.195 < 2e-16 ***
#> TypeApartment -0.43569 0.02654 -16.414 < 2e-16 ***
#> TypeAtrium 0.13137 0.03432 3.827 0.00053 ***
#> TypeTerrace -0.66657 0.03173 -21.007 < 2e-16 ***
#> ContHigh 0.36085 0.02037 17.716 < 2e-16 ***
#>
#> Equation for High vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.13874 0.02450 -5.664 2.36e-06 ***
#> InflMedium 0.73486 0.02107 34.882 < 2e-16 ***
#> InflHigh 1.61263 0.02571 62.718 < 2e-16 ***
#> TypeApartment -0.73563 0.02389 -30.795 < 2e-16 ***
#> TypeAtrium -0.40798 0.03254 -12.539 2.64e-14 ***
#> TypeTerrace -1.41233 0.03079 -45.866 < 2e-16 ***
#> ContHigh 0.48183 0.01910 25.229 < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Dispersion: 0.0236687 on 34 degrees of freedom
#> Approximate residual Deviance: 38.66
#> Number of Fisher scoring iterations: 4
#> Number of observations: 1681
#>
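To make the idea of a dispersion estimate more concrete, the following is a minimal base-R sketch of a Pearson-type estimate (Pearson chi-square divided by residual degrees of freedom) for grouped multinomial counts. The helper pearson_dispersion() and its arguments are hypothetical names introduced for illustration; it is not part of 'mclogit' and is not claimed to reproduce what dispersion() computes internally.

```r
# Hypothetical helper (not part of 'mclogit'): Pearson chi-square divided by
# residual degrees of freedom for grouped multinomial counts.
#   counts : matrix of observed counts, one row per group, one column per category
#   probs  : matrix of fitted probabilities with the same shape (rows sum to 1)
#   n_coef : number of estimated coefficients in the model
pearson_dispersion <- function(counts, probs, n_coef) {
  counts <- as.matrix(counts)
  probs  <- as.matrix(probs)
  stopifnot(all(dim(counts) == dim(probs)))
  totals   <- rowSums(counts)      # group totals
  expected <- totals * probs       # row i is scaled by totals[i]
  X2 <- sum((counts - expected)^2 / expected)
  df <- nrow(counts) * (ncol(counts) - 1) - n_coef
  X2 / df
}

# Toy data (made up for illustration): 3 groups, 3 categories.
counts <- rbind(c(10, 20, 30),
                c(20, 20, 20),
                c(30, 20, 10))

# Intercept-only model: every group gets the pooled category proportions,
# which uses ncol(counts) - 1 = 2 coefficients.
pooled <- colSums(counts) / sum(counts)
probs  <- matrix(pooled, nrow = nrow(counts), ncol = ncol(counts), byrow = TRUE)

phi <- pearson_dispersion(counts, probs, n_coef = 2)
print(phi)
```

In this toy case every expected count is 20, so the Pearson chi-square is 400 / 20 = 20 on 3 * 2 - 2 = 4 degrees of freedom. A value well above 1 is the kind of result that would usually be read as a sign of overdispersion.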