
Multinomial baseline-category logit models generalise logistic regression: they can model not only binary (dichotomous) responses but also polychotomous responses, that is, responses with more than two categories. They can also model responses in the form of counts that have a pre-determined sum. These models are described in Agresti (2002). They can be estimated with the function multinom() in the R package “nnet” (Venables and Ripley 2002). In the package “mclogit”, the function that estimates these models is called mblogit(). It uses the infrastructure for estimating conditional logit models, exploiting the fact that baseline-category logit models can be re-expressed as conditional logit models.

Baseline-category logit models are constructed as follows. Suppose a categorical dependent variable or response with categories $j=1,\ldots,q$ is observed for individuals $i=1,\ldots,n$. Let $\pi_{ij}$ denote the probability that the value of the dependent variable for individual $i$ is equal to $j$. The baseline-category logit model then takes the form:

$$
\pi_{ij} =
\begin{cases}
\dfrac{\exp(\alpha_{j0}+\alpha_{j1}x_{1i}+\cdots+\alpha_{jr}x_{ri})}
      {1+\sum_{k>1}\exp(\alpha_{k0}+\alpha_{k1}x_{1i}+\cdots+\alpha_{kr}x_{ri})}
  & \text{for } j>1\\[20pt]
\dfrac{1}
      {1+\sum_{k>1}\exp(\alpha_{k0}+\alpha_{k1}x_{1i}+\cdots+\alpha_{kr}x_{ri})}
  & \text{for } j=1
\end{cases}
$$

where the first category ($j=1$) is the baseline category.
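The formula above can be checked numerically. The following is a minimal sketch in plain Python, independent of “mclogit” and “nnet”; the function name `baseline_logit_probs` and the coefficient values are hypothetical and chosen only for illustration. It evaluates the probabilities for one individual with $q=3$ categories and $r=2$ independent variables.

```python
import math

def baseline_logit_probs(alpha, x):
    """Return [pi_1, ..., pi_q] for a single individual.

    alpha: q-1 rows [alpha_j0, alpha_j1, ..., alpha_jr], one row for each
           non-baseline category j = 2, ..., q (hypothetical layout).
    x:     covariate values [x_1, ..., x_r] of that individual.
    """
    for row in alpha:
        if len(row) != len(x) + 1:
            raise ValueError("each row needs an intercept plus one slope per covariate")
    # Linear predictor alpha_j0 + alpha_j1*x_1 + ... + alpha_jr*x_r for j > 1.
    eta = [row[0] + sum(a * v for a, v in zip(row[1:], x)) for row in alpha]
    # Shared denominator: 1 + sum over the non-baseline categories.
    denom = 1.0 + sum(math.exp(e) for e in eta)
    # The baseline category comes first, followed by categories 2, ..., q.
    return [1.0 / denom] + [math.exp(e) / denom for e in eta]

# Illustrative (made-up) coefficients for q = 3 categories and r = 2 covariates.
alpha = [[0.5, 1.0, -0.5],
         [-1.0, 0.25, 2.0]]
x = [0.2, -0.1]

probs = baseline_logit_probs(alpha, x)
print(probs, sum(probs))
```

Because every category shares the same denominator, the probabilities sum to one for any choice of coefficients, and setting all coefficients to zero gives each category the probability $1/q$.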

Equivalently, the model can be expressed in terms of log-odds relative to the baseline category:

$$
\ln\frac{\pi_{ij}}{\pi_{i1}} = \alpha_{j0}+\alpha_{j1}x_{1i}+\cdots+\alpha_{jr}x_{ri}.
$$

Here the relevant parameters of the model are the coefficients $\alpha_{jk}$, which describe how the values of the independent variables (numbered $k=1,\ldots,r$) affect the chances of the response taking the value $j$ relative to taking the value $1$. Note that there is one coefficient for each combination of an independent variable and a response category other than the baseline category, in addition to one intercept $\alpha_{j0}$ for each non-baseline category.
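The interpretation of the coefficients can be illustrated with a small numerical sketch, again in plain Python and with hypothetical names and made-up coefficient values. It increases the first independent variable by one unit and checks that the log-odds of each non-baseline category versus the baseline category shift by exactly $\alpha_{j1}$. It also counts the slope coefficients, $(q-1)\,r$.

```python
import math

def probs(alpha, x):
    # Baseline-category logit probabilities [pi_1, ..., pi_q] for one individual;
    # each row of alpha is [alpha_j0, alpha_j1, ..., alpha_jr] for some j > 1.
    eta = [row[0] + sum(a * v for a, v in zip(row[1:], x)) for row in alpha]
    denom = 1.0 + sum(math.exp(e) for e in eta)
    return [1.0 / denom] + [math.exp(e) / denom for e in eta]

def log_odds_vs_baseline(p):
    # ln(pi_j / pi_1) for j = 2, ..., q
    return [math.log(pj / p[0]) for pj in p[1:]]

# Made-up coefficients: q = 3 categories, r = 2 independent variables.
alpha = [[0.5, 1.0, -0.5],
         [-1.0, 0.25, 2.0]]

x_before = [0.2, -0.1]
x_after = [1.2, -0.1]  # x_1 increased by one unit, x_2 unchanged

lo_before = log_odds_vs_baseline(probs(alpha, x_before))
lo_after = log_odds_vs_baseline(probs(alpha, x_after))

# Change in log-odds per category; should match alpha_j1 for j = 2, 3.
shifts = [after - before for before, after in zip(lo_before, lo_after)]
print(shifts)

# One slope per independent variable and non-baseline category: (q - 1) * r.
n_slopes = sum(len(row) - 1 for row in alpha)
n_intercepts = len(alpha)
print(n_slopes, n_intercepts)
```

Exponentiating a coefficient gives the corresponding odds ratio: a one-unit increase in $x_{1i}$ multiplies the odds of category $j$ versus category $1$ by $\exp(\alpha_{j1})$.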

References

Agresti, Alan. 2002. Categorical Data Analysis. 2nd ed. New York: Wiley.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. 4th ed. New York: Springer. https://www.stats.ox.ac.uk/pub/MASS4/.