Dissecting lme4's lmer function. Part 1.
This blog posts marks the start of my Google Summer of Code project with the Ruby Science Foundation, where I will develop mixed linear models software for Ruby. As a preparation for my GSoC project, I will dedicate a couple of blog posts to a meticulous analysis of lme4
code (so that I can steal all the ideas from it!).
The R
package lme4
is capable of fitting linear, generalized and nonlinear mixed effects models. Here, I am interested in linear mixed models exclusively. A linear mixed model fit is performed in lme4
with an lmer
function call. For example:
library(lme4)
data(sleepstudy)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
Thus, the first function to dissect is lmer
.
The function definition of lmer
is located in the file lmer.R
.
The code is confusing. So, let’s go through it step-by-step.
General overview
The general steps taken in lmer
are:
-
Set some parameters in the local environment. In particular, deal with the function argument
control
, which usually is a list inheriting from classmerControl
(butcontrol
can be defined in other ways too, and the code deals with all possibilities in a multitude of lines). It includes general parameters such as the optimizer to be used, model- and data-checking specifications, and all parameters to be passed to the optimizer. -
Parse data and formula. At this step the fixed effects matrix \(X\), the random effects matrix \(Z\), the parameter vector \(\theta\), the covariance factor \(\Lambda\subscript{\theta}\), etc. are created from the input data. Moreover, the
REML
flag is set, formula is rewritten in the preferred form, and warnings from checks for the random effects (e.g. about the number of levels, dimension of \(Z\), rank of \(Z\)) are generated. This all is performed by the functionlFormula
, which is located inmodular.R
. -
Create the deviance function to be minimized. Call the function
mkLmerDevfun
, which returns a functiondevfun
to calculate the profiled deviance. ImplicitlymkLmerDevfun
also returns an environment required to evaluatedevfun
. The function definition ofmkLmerDevfun
can be found inmodular.R
. -
Optimize the deviance function w.r.t. \(\theta\). Optimize the deviance
devfun
with respect to the covariance parameters \(\theta\). This is performed by the functionoptimizeLmer
(located inmodular.R
), which returns the result of the optimization. -
Check convergence criteria. Convergence check according to the convergence check options specified in
control
, performed by the functioncheckConv
, which is implemented in the filecheckConv.R
. -
Set up a useful output object. Package the results into a
merMod
object to return. This is performed by the functionmkMerMod
, located in the fileutilities.R
.
Step 2-6 perform function calls to other sophisticated functions. We will dissect each of those separately in what follows.
Processing the formula - “lFormula”
Defined as:
lFormula <- function(formula, data=NULL, REML = TRUE,
subset, weights, na.action, offset, contrasts = NULL,
control=lmerControl(), ...)
It proceeds with the following steps:
-
Check the input arguments. Call
checkArgs
on the given input arguments, in order to produces some warnings (about “family” and “method” being deprecated, and extra unused arguments) if necessary. Check that the input forcheck.formula.LHS
is a valid value via the functioncheckCtrlLevels
. The functioncheckArgs
is defined inutilities.R
, andcheckCtrlLevels
is defined inmodular.R
. -
Check and adjust the given formula and data frame. Check the provided formula and data with
checkFormulaData
fromutilities.R
(e.g. check missing data, check if formula has a left hand side, etc.). From the right hand side of a formula for a mixed-effects model, expand terms with||
into separate, independent random effect terms written with|
(expandDoubleVerts
is defined inutilities.R
,RHSForm
is defined inmodular.R
). Generate a data frame with only the variables needed to use the given formula. -
Create the random effects model matrix \(Z\), its covariance factor \(\Lambda\subscript{\theta}\), as well as the parametrization \(\theta\), etc. This is performed by the function
mkReTrms
, which is defined inutilities.R
. Additionally, check that all grouping structures have at least 2 levels, that for each random effect \(Z\) has more columns than rows, that the rank of \(Z\) is not greater than \(n\), etc. with the functionscheckNlevels
,checkZdims
,checkZrank
(all defined inmodular.R
). -
Create the fixed effects design matrix \(X\). Only basic
R
functions likemodel.matrix
used here. -
Return a data frame with variables required to use the formula. That is, fixed effects matrix \(X\);
reTrms
containing the random effects matrix \(Z\), covariance factor \(\Lambda\subscript{\theta}\), etc.;REML
flag; the formula; warnings from checks for the random effects (number of levels, dimension of \(Z\), rank of \(Z\), etc.).