The IWLS algorithm used to fit conditional logit models
Source: vignettes/fitting-mclogit.Rmd
The package “mclogit” fits conditional logit models using a maximum
likelihood estimator. It does this by maximizing the log-likelihood
function using an iterative weighted least-squares (IWLS)
algorithm, which follows the approach of the
glm.fit() function from the “stats” package of R
(Nelder and Wedderburn 1972; McCullagh and Nelder
1989; R Core Team 2023).
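For a concrete sense of what the fitting function does before going into the algorithmic details, here is a minimal usage sketch; it assumes the Transport example data shipped with the package, where resp holds the choice counts and suburb indexes the choice sets.

```r
## Minimal usage sketch: fit a conditional logit model with mclogit().
## Assumes the Transport example data shipped with the package; "resp"
## holds the choice counts and "suburb" indexes the choice sets.
library(mclogit)
data(Transport)
fit <- mclogit(cbind(resp, suburb) ~ distance + cost, data = Transport)
summary(fit)
```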
If \pi_{ij} is the probability that individual i chooses alternative j from his/her choice set \mathcal{S}_i, where
\pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}
and if y_{ij} is the dummy variable that equals 1 if individual i chooses alternative j and equals 0 otherwise, the log-likelihood function (given that the choices are independently and identically distributed given \pi_{ij}) can be written as
\ell=\sum_{i,j}y_{ij}\ln\pi_{ij} =\sum_{i,j}y_{ij}\eta_{ij}-\sum_i\ln\left(\sum_j\exp(\eta_{ij})\right)
If the data are aggregated in terms of counts, such that n_{ij} is the number of individuals with the same choice set and the same choice probabilities \pi_{ij} that have chosen alternative j, the log-likelihood is (again assuming that the choices are independently and identically distributed given \pi_{ij})
\ell=\sum_{i,j}n_{ij}\ln\pi_{ij} =\sum_{i,j}n_{ij}\eta_{ij}-\sum_in_{i+}\ln\left(\sum_j\exp(\eta_{ij})\right)
where n_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}.
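To illustrate these two formulas, the following R sketch computes the choice probabilities and the log-likelihood of the aggregated counts from a linear predictor. The names eta, n, and set (the linear predictor, the counts, and the choice-set index) are illustrative, not the package's internals.

```r
## Illustrative helper (not the package's internal code): compute the
## choice probabilities pi_ij and the log-likelihood of aggregated
## counts n_ij from a linear predictor eta and a choice-set index `set`.
cl.probs.loglik <- function(eta, n, set) {
  # per-set log(sum_k exp(eta_ik)), stabilized via the log-sum-exp trick
  lse <- ave(eta, set, FUN = function(x) max(x) + log(sum(exp(x - max(x)))))
  pi  <- exp(eta - lse)                          # pi_ij
  list(pi = pi, loglik = sum(n * (eta - lse)))   # = sum_ij n_ij log(pi_ij)
}
```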
If
\eta_{ij} = \alpha_1x_{1ij}+\cdots+\alpha_rx_{rij}=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}
then the gradient of the log-likelihood with respect to the coefficient vector \boldsymbol{\alpha} is
\frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \sum_{i,j} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial\ell}{\partial\eta_{ij}} = \sum_{i,j} \boldsymbol{x}_{ij} (n_{ij}-n_{i+}\pi_{ij}) = \sum_{i,j} \boldsymbol{x}_{ij} n_{i+} (y_{ij}-\pi_{ij}) = \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
and the Hessian is
\frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} = \sum_{i,j,k} \frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}} \frac{\partial^2\ell}{\partial\eta_{ij}\partial\eta_{ik}} \frac{\partial\eta_{ik}}{\partial\boldsymbol{\alpha}'} = - \sum_{i,j,k} \boldsymbol{x}_{ij} n_{i+} \pi_{ij}\left(\delta_{jk}-\pi_{ik}\right) \boldsymbol{x}_{ik}' = - \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
Here y_{ij}=n_{ij}/n_{i+}, \boldsymbol{N} is a diagonal matrix with diagonal elements n_{i+}, and \boldsymbol{W} is the block-diagonal matrix composed of the per-choice-set blocks \boldsymbol{W}_i=n_{i+}\left(\operatorname{diag}(\boldsymbol{\pi}_i)-\boldsymbol{\pi}_i\boldsymbol{\pi}_i'\right).
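In code, the score vector \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) and the matrix \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} can be accumulated choice set by choice set, since \boldsymbol{W} is block-diagonal. The following sketch uses the same illustrative names (X, n, pi, set) as above.

```r
## Sketch of the score X'N(y - pi) and the information matrix X'WX,
## accumulated choice set by choice set; X, n, pi, `set` as above.
cl.score.info <- function(X, n, pi, set) {
  n.plus <- ave(n, set, FUN = sum)               # n_{i+}, repeated within sets
  y      <- n / n.plus
  score  <- drop(crossprod(X, n.plus * (y - pi)))
  info   <- matrix(0, ncol(X), ncol(X))
  for (i in split(seq_along(pi), set)) {
    # W_i = n_{i+} (diag(pi_i) - pi_i pi_i'), one block per choice set
    Wi   <- n.plus[i][1] * (diag(pi[i], nrow = length(i)) - tcrossprod(pi[i]))
    Xi   <- X[i, , drop = FALSE]
    info <- info + crossprod(Xi, Wi %*% Xi)
  }
  list(score = score, info = info)               # Hessian = -info
}
```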
Newton-Raphson iterations then take the form
\boldsymbol{\alpha}^{(s+1)} = \boldsymbol{\alpha}^{(s)} - \left( \frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\partial\boldsymbol{\alpha}'} \right)^{-1} \frac{\partial\ell}{\partial\boldsymbol{\alpha}} = \boldsymbol{\alpha}^{(s)} + \left( \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \right)^{-1} \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
where \boldsymbol{\pi} and \boldsymbol{W} are evaluated at \boldsymbol{\alpha}=\boldsymbol{\alpha}^{(s)}.
Multiplying by \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} gives
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha}^{(s+1)} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha}^{(s)} + \boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi}) = \boldsymbol{X}'\boldsymbol{W} \left(\boldsymbol{X}\boldsymbol{\alpha}^{(s)}+\boldsymbol{W}^-\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})\right) = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*
where \boldsymbol{W}^- is a generalized inverse of \boldsymbol{W} and \boldsymbol{y}^* is a “working response vector” with elements
y_{ij}^*=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}^{(s)}+\frac{y_{ij}-\pi_{ij}}{\pi_{ij}}
The IWLS algorithm thus involves the following steps (an R sketch of the full loop follows the list):
1. Create some suitable starting values for \boldsymbol{\pi}, \boldsymbol{W}, and \boldsymbol{y}^*.
2. Construct the “working dependent variable” \boldsymbol{y}^*.
3. Solve the equation
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \boldsymbol{\alpha} = \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*
for \boldsymbol{\alpha}.
4. Compute updated \boldsymbol{\eta}, \boldsymbol{\pi}, \boldsymbol{W}, and \boldsymbol{y}^*.
5. Compute the updated value for the log-likelihood or the deviance
d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}}
If the decrease of the deviance (or the increase of the log-likelihood) is smaller than a given tolerance criterion (typically \Delta d \leq 10^{-7}), stop the algorithm and declare it converged. Otherwise go back to step 2 with the updated value of \boldsymbol{\alpha}.
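Put together, the steps above can be expressed as a compact loop. The sketch below builds on the cl.probs.loglik helper from earlier and uses the same illustrative names (X, n, set); it is a bare-bones illustration, not the package's actual implementation, and omits the safeguards (step halving, rank checks) a production fitter would need.

```r
## Bare-bones IWLS loop for the steps above; an illustration, not the
## package's implementation. Uses cl.probs.loglik() defined earlier.
cl.iwls <- function(X, n, set, eps = 1e-7, maxit = 25) {
  n.plus <- ave(n, set, FUN = sum)
  y      <- n / n.plus
  # Step 1: starting values (constructed as described in the next section)
  eta <- log(n + 0.5) - ave(log(n + 0.5), set, FUN = mean)
  pi  <- cl.probs.loglik(eta, n, set)$pi
  dev.old <- Inf
  for (s in seq_len(maxit)) {
    # Step 2: working dependent variable
    y.star <- eta + (y - pi) / pi
    # Step 3: accumulate X'WX and X'W y* block by block, then solve
    XWX <- matrix(0, ncol(X), ncol(X))
    XWy <- numeric(ncol(X))
    for (i in split(seq_along(pi), set)) {
      Wi  <- n.plus[i][1] * (diag(pi[i], nrow = length(i)) - tcrossprod(pi[i]))
      Xi  <- X[i, , drop = FALSE]
      XWX <- XWX + crossprod(Xi, Wi %*% Xi)
      XWy <- XWy + drop(crossprod(Xi, Wi %*% y.star[i]))
    }
    alpha <- drop(solve(XWX, XWy))
    # Step 4: updated eta and pi (y* is recomputed at the top of the loop)
    eta <- drop(X %*% alpha)
    pi  <- cl.probs.loglik(eta, n, set)$pi
    # Step 5: deviance; terms with n_ij = 0 contribute nothing
    dev <- 2 * sum(ifelse(n > 0, n * log(y / pi), 0))
    if (abs(dev.old - dev) <= eps) break    # converged
    dev.old <- dev
  }
  list(coefficients = alpha, deviance = dev, iterations = s)
}
```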
The starting values for the algorithm used by the mclogit package are constructed as follows (an R sketch follows the list):
- Set
\eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12) - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln (n_{ik}+\tfrac12)
(where q_i is the size of the choice set \mathcal{S}_i).
- Compute the starting values of the choice probabilities \pi_{ij}^{(0)} according to the equation at the beginning of this page.
- Compute initial values of the working dependent variable according to
y_{ij}^{*(0)} = \eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}}
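In R, these starting values can be computed along the following lines, reusing the cl.probs.loglik helper and the illustrative names n, set, and y = n/n_{i+} from the sketches above.

```r
## Starting values as described above; n, set, y as in the earlier sketches.
l    <- log(n + 0.5)
eta0 <- l - ave(l, set, FUN = mean)          # eta_ij^(0), centered within sets
pi0  <- cl.probs.loglik(eta0, n, set)$pi     # pi_ij^(0) via the first equation
y.star0 <- eta0 + (y - pi0) / pi0            # initial working dependent variable
```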