The package “mclogit” fits conditional logit models using a maximum likelihood estimator. It does this by maximizing the log-likelihood function using an iterative weighted least-squares (IWLS) algorithm, which follows the algorithm used by the glm.fit() function from the “stats” package of R (Nelder and Wedderburn 1972; McCullagh and Nelder 1989; R Core Team 2023).

If πij\pi_{ij} is the probability that individual ii chooses alternative jj from his/her choice set 𝒮i\mathcal{S}_i, where

πij=exp(ηij)k𝒮iexp(ηik) \pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_k{\in\mathcal{S}_i}\exp(\eta_{ik})}

and if yijy_{ij} is the dummy variable with equals 1 if individual ii chooses alternative jj and equals 0 otherwise, the log-likelihood function (given that the choices are identically independent distributed given πij\pi_{ij}) can be written as

=i,jyijlnπij=i,jyijηijiln(jexp(ηij)) \ell=\sum_{i,j}y_{ij}\ln\pi_{ij} =\sum_{i,j}y_{ij}\eta_{ij}-\sum_i\ln\left(\sum_j\exp(\eta_{ij})\right)

If the data are aggregated in the terms of counts such that nijn_{ij} is the number of individuals with the same choice set and the same choice probabilities πij\pi_{ij} that have chosen alternative jj, the log-likelihood is (given that the choices are identically independent distributed given πij\pi_{ij})

=i,jnijlnπij=i,jnijηijini+ln(jexp(ηij)) \ell=\sum_{i,j}n_{ij}\ln\pi_{ij} =\sum_{i,j}n_{ij}\eta_{ij}-\sum_in_{i+}\ln\left(\sum_j\exp(\eta_{ij})\right)

where ni+=j𝒮inijn_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}.

If

ηij=α1x1ij++αrxrij=𝐱ij𝛂 \eta_{ij} = \alpha_1x_{1ij}+\cdots+\alpha_rx_{rij}=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}

then the gradient of the log-likelihood with respect to the coefficient vector 𝛂\boldsymbol{\alpha} is

𝛂=i,jηij𝛂ηij=i,j𝐱ij(nijni+πij)=i,j𝐱ijni+(yijπij)=𝐗𝐍(𝐲𝛑) \frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \sum_{i,j} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial\ell}{\partial\eta_{ij}} = \sum_{i,j} \boldsymbol{x}_{ij} (n_{ij}-n_{i+}\pi_{ij}) = \sum_{i,j} \boldsymbol{x}_{ij} n_{i+} (y_{ij}-\pi_{ij}) = \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})

and the Hessian is

2𝛂𝛂=i,jηij𝛂ηij𝛂2ηij2=i,j,k𝐱ijni+(δjkπijπik)𝐱ij=𝐗𝐖𝐗 \frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} = \sum_{i,j} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}'} \frac{\partial\ell^2}{\partial\eta_{ij}^2} = - \sum_{i,j,k} \boldsymbol{x}_{ij} n_{i+} (\delta_{jk}-\pi_{ij}\pi_{ik}) \boldsymbol{x}_{ij}' = - \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}

Here yij=nij/ni+y_{ij}=n_{ij}/n_{i+}, while 𝐍\boldsymbol{N} is a diagonal matrix with diagonal elements ni+n_{i+}.

Newton-Raphson iterations then take the form

𝛂(s+1)=𝛂(s)(2𝛂𝛂)1𝛂=𝛂(s)+(𝐗𝐖𝐗)1𝐗𝐍(𝐲𝛑) \boldsymbol{\alpha}^{(s+1)} = \boldsymbol{\alpha}^{(s)} - \left( \frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} \right)^{-1} \frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \boldsymbol{\alpha}^{(s)} + \left( \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \right)^{-1} \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})

where 𝛑\boldsymbol{\pi} and 𝐖\boldsymbol{W} are evaluated at 𝛂=𝛂(s)\boldsymbol{\alpha}=\boldsymbol{\alpha}^{(s)}.

Multiplying by 𝐗𝐖𝐗\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} gives

𝐗𝐖𝐗𝛂(s+1)=𝐗𝐖𝐗𝛂(s)+𝐗𝐍(𝐲𝛑)=𝐗𝐖(𝐗𝛂(s)+𝐖𝐍(𝐲𝛑))=𝐗𝐖𝐲* \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha}^{(s+1)} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha}^{(s)} + \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) = \boldsymbol{X}'\boldsymbol{W} \left(\boldsymbol{X}\boldsymbol{\alpha}^{(s)}+\boldsymbol{W}^-\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})\right) = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*

where 𝐖\boldsymbol{W}^- is a generalized inverse of 𝐖\boldsymbol{W} and 𝐲*\boldsymbol{y}^* is a “working response vector” with elements

yij*=𝐱ij𝛂(s)+yijπijπij y_{ij}^*=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}^{(s)}+\frac{y_{ij}-\pi_{ij}}{\pi_{ij}}

The IWLS algorithm thus involves the following steps:

  1. Create some suitable starting values for 𝛑\boldsymbol{\pi}, 𝐖\boldsymbol{W}, and 𝐲*\boldsymbol{y}^*

  2. Construct the “working dependent variable” 𝐲*\boldsymbol{y}^*

  3. Solve the equation

    𝐗𝐖𝐗𝛂=𝐗𝐖𝐲* \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*

    for 𝛂\boldsymbol{\alpha}.

  4. Compute updated 𝛈\boldsymbol{\eta}, 𝛑\boldsymbol{\pi}, 𝐖\boldsymbol{W}, and 𝐲*\boldsymbol{y}^*.

  5. Compute the updated value for the log-likelihood or the deviance

    d=2i,jnijlnyijπij d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}}

  6. If the decrease of the deviance (or the increase of the log-likelihood) is smaller than a given tolerance criterian (typically Δd107\Delta d \leq 10^{-7}) stop the algorighm and declare it as converged. Otherwise go back to step 2 with the updated value of 𝛂\boldsymbol{\alpha}.

The starting values for the algorithm used by the mclogit package are constructe as follows:

  1. Set

    ηij(0)=ln(nij+12)1qik𝒮iln(nij+12) \eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12) - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln (n_{ij}+\tfrac12)

    (where qiq_i is the size of the choice set 𝒮i\mathcal{S}_i)

  2. Compute the starting values of the choice probabilities πij(0)\pi_{ij}^{(0)} according to the equation at the beginning of the page

  3. Compute intial values of the working dependent variable according to

    yij*(0)=ηij(0)+yijπij(0)πij(0) y_{ij}^{*(0)} = \eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}}

References

McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. Monographs on Statistics & Applied Probability. Boca Raton et al.: Chapman & Hall/CRC.
Nelder, J. A., and R. W. M. Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society. Series A (General) 135 (3): 370–84. https://doi.org/10.2307/2344614.
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.