Overdispersion in Multinomial Logit Models
The function dispersion() extracts the dispersion parameter from a multinomial logit model or computes a dispersion parameter estimate based on a given method. This dispersion parameter can be attached to a model using update(). It can also be given as an argument to summary().
Usage
dispersion(object,method, ...)
# S3 method for class 'mclogit'
dispersion(object,method=NULL,groups=NULL, ...)
Arguments
- object
an object that inherits class "mclogit". When passed to dispersion(), it should be the result of a call of mclogit() or mblogit(), without random effects.
- method
a character string, either "Afroz", "Fletcher", "Pearson", or "Deviance", that specifies the estimator of the dispersion; or NULL, in which case the default estimator, "Afroz", is used. The estimators are discussed in Afroz et al. (2020).
- groups
an optional formula that specifies groups of observations relevant for the estimation of overdispersion. Predicted probabilities should be constant within groups; otherwise a warning is generated, since the overdispersion estimate may be imprecise.
- ...
other arguments, ignored or passed to other methods.
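To give a feel for what the "Pearson" and "Deviance" options estimate, the following base-R sketch computes the textbook versions of both estimators by hand: the Pearson statistic (or the residual deviance) divided by the residual degrees of freedom. The counts are made up for illustration, the model is an assumed intercept-only multinomial model, and all variable names are hypothetical. It is not the package's implementation, which may differ in details such as the handling of groups, and the "Afroz" and "Fletcher" estimators are not reproduced here.

```r
# Hypothetical grouped data: 4 groups (rows) x 3 response categories (columns)
counts <- matrix(c(20, 30, 50,
                   25, 25, 50,
                   10, 40, 50,
                   30, 30, 40),
                 nrow = 4, byrow = TRUE)

m <- rowSums(counts)                      # number of observations per group

# Assumed model: intercept-only multinomial logit, so the fitted
# probabilities are the pooled category proportions
p_hat <- colSums(counts) / sum(counts)
expected <- outer(m, p_hat)               # fitted counts m_i * p_j

# Residual degrees of freedom: groups * (categories - 1) minus the
# number of estimated parameters (categories - 1)
df_resid <- nrow(counts) * (ncol(counts) - 1) - (ncol(counts) - 1)

X2 <- sum((counts - expected)^2 / expected)       # Pearson statistic
G2 <- 2 * sum(counts * log(counts / expected))    # residual deviance

phi_pearson  <- X2 / df_resid             # classical Pearson-type estimate
phi_deviance <- G2 / df_resid             # classical deviance-type estimate

print(c(Pearson = phi_pearson, Deviance = phi_deviance))
```

Values clearly above 1 indicate more variation between groups than the multinomial model accounts for. Because every group here shares the same fitted probabilities, the Pearson statistic coincides with the usual chi-squared statistic for the table.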
References
Afroz, Farzana, Matt Parry, and David Fletcher. (2020). "Estimating Overdispersion in Sparse Multinomial Data." Biometrics 76(3): 834-842. doi:10.1111/biom.13194.
Examples
library(mclogit) # For 'mblogit' and 'dispersion'
library(MASS) # For 'housing' data
# Note that with a factor response and frequency-weighted data,
# overdispersion will be overestimated:
house.mblogit <- mblogit(Sat ~ Infl + Type + Cont, weights = Freq,
                         data = housing)
#>
#> Iteration 1 - deviance = 3493.764 - criterion = 0.9614469
#> Iteration 2 - deviance = 3470.111 - criterion = 0.00681597
#> Iteration 3 - deviance = 3470.084 - criterion = 7.82437e-06
#> Iteration 4 - deviance = 3470.084 - criterion = 7.469596e-11
#> converged
dispersion(house.mblogit,method="Afroz")
#> [1] 2.062653
dispersion(house.mblogit,method="Deviance")
#> [1] 2.175601
summary(house.mblogit)
#>
#> Call:
#> mblogit(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)
#>
#> Equation for Medium vs Low:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.4192 0.1729 -2.424 0.015342 *
#> InflMedium 0.4464 0.1416 3.153 0.001613 **
#> InflHigh 0.6649 0.1863 3.568 0.000359 ***
#> TypeApartment -0.4357 0.1725 -2.525 0.011562 *
#> TypeAtrium 0.1314 0.2231 0.589 0.555980
#> TypeTerrace -0.6666 0.2063 -3.232 0.001230 **
#> ContHigh 0.3609 0.1324 2.726 0.006420 **
#>
#> Equation for High vs Low:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.1387 0.1592 -0.871 0.383570
#> InflMedium 0.7349 0.1369 5.366 8.03e-08 ***
#> InflHigh 1.6126 0.1671 9.649 < 2e-16 ***
#> TypeApartment -0.7356 0.1553 -4.738 2.16e-06 ***
#> TypeAtrium -0.4080 0.2115 -1.929 0.053730 .
#> TypeTerrace -1.4123 0.2001 -7.056 1.71e-12 ***
#> ContHigh 0.4818 0.1241 3.881 0.000104 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Approximate residual Deviance: 3470
#> Number of Fisher scoring iterations: 4
#> Number of observations: 1681
#>
phi.Afroz <- dispersion(house.mblogit,method="Afroz")
summary(house.mblogit, dispersion=phi.Afroz)
#>
#> Call:
#> mblogit(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)
#>
#> Equation for Medium vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.4192 0.1729 -2.424 0.016717 *
#> InflMedium 0.4464 0.1416 3.153 0.002004 **
#> InflHigh 0.6649 0.1863 3.568 0.000504 ***
#> TypeApartment -0.4357 0.1725 -2.525 0.012763 *
#> TypeAtrium 0.1314 0.2231 0.589 0.557002
#> TypeTerrace -0.6666 0.2063 -3.232 0.001559 **
#> ContHigh 0.3609 0.1324 2.726 0.007305 **
#>
#> Equation for High vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.1387 0.1592 -0.871 0.385176
#> InflMedium 0.7349 0.1369 5.366 3.57e-07 ***
#> InflHigh 1.6126 0.1671 9.649 < 2e-16 ***
#> TypeApartment -0.7356 0.1553 -4.738 5.58e-06 ***
#> TypeAtrium -0.4080 0.2115 -1.929 0.055910 .
#> TypeTerrace -1.4123 0.2001 -7.056 9.14e-11 ***
#> ContHigh 0.4818 0.1241 3.881 0.000164 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Dispersion: 2.062653 on 130 degrees of freedom
#> Approximate residual Deviance: 3470
#> Number of Fisher scoring iterations: 4
#> Number of observations: 1681
#>
summary(update(house.mblogit, dispersion="Afroz"))
#>
#> Call:
#> mblogit(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)
#>
#> Equation for Medium vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.4192 0.2484 -1.688 0.0938 .
#> InflMedium 0.4464 0.2033 2.196 0.0299 *
#> InflHigh 0.6649 0.2676 2.485 0.0142 *
#> TypeApartment -0.4357 0.2478 -1.758 0.0811 .
#> TypeAtrium 0.1314 0.3204 0.410 0.6825
#> TypeTerrace -0.6666 0.2962 -2.250 0.0261 *
#> ContHigh 0.3609 0.1901 1.898 0.0599 .
#>
#> Equation for High vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.1387 0.2287 -0.607 0.545109
#> InflMedium 0.7349 0.1967 3.737 0.000278 ***
#> InflHigh 1.6126 0.2400 6.718 5.20e-10 ***
#> TypeApartment -0.7356 0.2230 -3.299 0.001253 **
#> TypeAtrium -0.4080 0.3038 -1.343 0.181568
#> TypeTerrace -1.4123 0.2875 -4.913 2.64e-06 ***
#> ContHigh 0.4818 0.1783 2.703 0.007800 **
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Dispersion: 2.062653 on 130 degrees of freedom
#> Approximate residual Deviance: 3470
#> Number of Fisher scoring iterations: 4
#> Number of observations: 1681
#>
# In order to estimate overdispersion accurately with data like
# the above (which usually comes from applying 'as.data.frame'
# to a contingency table), the model has to be fitted with the
# optional argument 'from.table=TRUE':
house.mblogit.corrected <- mblogit(Sat ~ Infl + Type + Cont, weights = Freq,
                                   data = housing, from.table=TRUE,
                                   dispersion="Afroz")
#>
#> Iteration 1 - deviance = 38.84842 - criterion = 0.992521
#> Iteration 2 - deviance = 38.66222 - criterion = 0.004803721
#> Iteration 3 - deviance = 38.6622 - criterion = 3.782555e-07
#> Iteration 4 - deviance = 38.6622 - criterion = 3.666163e-15
#> converged
# Now the estimated dispersion parameter is much smaller than the
# value of about 2.06 obtained above (see the 'Dispersion' line below).
summary(house.mblogit.corrected)
#>
#> Call:
#> mblogit(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq,
#> dispersion = "Afroz", from.table = TRUE)
#>
#> Equation for Medium vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.41923 0.02661 -15.757 < 2e-16 ***
#> InflMedium 0.44640 0.02178 20.497 < 2e-16 ***
#> InflHigh 0.66494 0.02867 23.195 < 2e-16 ***
#> TypeApartment -0.43569 0.02654 -16.414 < 2e-16 ***
#> TypeAtrium 0.13137 0.03432 3.827 0.00053 ***
#> TypeTerrace -0.66657 0.03173 -21.007 < 2e-16 ***
#> ContHigh 0.36085 0.02037 17.716 < 2e-16 ***
#>
#> Equation for High vs Low:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.13874 0.02450 -5.664 2.36e-06 ***
#> InflMedium 0.73486 0.02107 34.882 < 2e-16 ***
#> InflHigh 1.61263 0.02571 62.718 < 2e-16 ***
#> TypeApartment -0.73563 0.02389 -30.795 < 2e-16 ***
#> TypeAtrium -0.40798 0.03254 -12.539 2.64e-14 ***
#> TypeTerrace -1.41233 0.03079 -45.866 < 2e-16 ***
#> ContHigh 0.48183 0.01910 25.229 < 2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Dispersion: 0.0236687 on 34 degrees of freedom
#> Approximate residual Deviance: 38.66
#> Number of Fisher scoring iterations: 4
#> Number of observations: 1681
#>