Predict Directive
Underlying Principles
Our approach to prediction is a generalisation of that of Lane and Nelder (1982) who
consider fixed effects models. They form fitted values for all
combinations of the explanatory variables in the model, then take
marginal means across the explanatory variables not relevant to the
current prediction. Our case is more general in that random effects
can also be fitted in our (mixed) models. A full description can be found in
Gilmour et al. (2004)
and Welham et al. (2004).
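As a small illustration of the Lane and Nelder approach, the following Python
sketch (with hypothetical data, column names and an additive fixed-effects
model) forms fitted values for every combination of two factors and then takes
marginal means over the factor not of interest:

    import pandas as pd
    import statsmodels.formula.api as smf

    # hypothetical data: a response classified by variety and site
    df = pd.DataFrame({
        "variety": ["A", "A", "A", "B", "B", "B"],
        "site":    ["s1", "s2", "s3", "s1", "s2", "s3"],
        "y":       [4.1, 5.0, 5.2, 3.6, 4.4, 4.0],
    })

    # additive fixed-effects model, as considered by Lane and Nelder (1982)
    fit = smf.ols("y ~ C(variety) + C(site)", data=df).fit()

    # fitted values for all variety x site combinations
    grid = pd.MultiIndex.from_product(
        [df["variety"].unique(), df["site"].unique()],
        names=["variety", "site"]).to_frame(index=False)
    grid["fitted"] = fit.predict(grid)

    # marginal means over site give one prediction per variety
    print(grid.groupby("variety")["fitted"].mean())

Here site is averaged over with equal weight; the rest of this section
generalises this idea to models that also contain random effects.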
Random factor terms may contribute to predictions in several
ways. They may be evaluated at values specified by the user,
they may be averaged over, or they may be omitted from the fitted
values used to form the prediction. Averaging over the set of random
effects gives a prediction specific to the random effects observed. We
call this a 'conditional' prediction. Omitting the term from
the model produces a prediction at the population average (zero), that
is, substituting the assumed population mean for an unknown random
effect. We call this a 'marginal' prediction. Note that in any
prediction, some terms may be evaluated conditionally and others
marginally, depending on the aim of the prediction.
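In mixed-model notation this distinction can be written compactly. As a
sketch, let \hat{\beta} denote the estimated fixed effects, \tilde{u} the
predicted random effects, and X_p, Z_p the (hypothetical) design matrices
linking a prediction to those effects:

    conditional:  \hat{\pi} = X_p \hat{\beta} + Z_p \tilde{u}
    marginal:     \hat{\pi} = X_p \hat{\beta}    (random term set to its population mean of zero)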
For fixed factors there is no pre-defined population average, so there
is no natural interpretation for a prediction derived by omitting a
fixed term from the fitted values. Averages must therefore be taken
over all the levels present to give a sample-specific average, or the
prediction must be made at specified levels.
For covariate terms (fixed or random) the associated effect represents
the coefficient of a linear trend in the data with respect to the
covariate values. These terms should be evaluated at a given value of
the covariate, or averaged over several given values. Omission of a
covariate from the predictive model is equivalent to predicting at a
zero covariate value, which is often inappropriate.
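As a sketch, for a single covariate x with (hypothetical) estimated intercept
\hat{\beta}_0 and slope \hat{\beta}_1, the prediction at a chosen value x is

    \hat{\pi}(x) = \hat{\beta}_0 + \hat{\beta}_1 x

so omitting the covariate term returns \hat{\beta}_0 = \hat{\pi}(0), a
prediction at x = 0; predicting at, say, the observed mean \bar{x} is usually
more sensible.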
Interaction terms constructed from factors generate an effect for each
combination of the factor levels, and behave like single factor terms
in prediction. Interactions constructed from covariates fit a linear
trend for the product of the covariate values and behave like a single
covariate term. An interaction of a factor and a covariate fits a
linear trend for the covariate for each level of the factor. For both
fixed and random terms, a value for the covariate must be given, but
the factor may be evaluated at a given level, averaged over, or
(for random terms) omitted.
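For example (a sketch with hypothetical symbols), a model containing a factor
with effects \hat{\alpha}_i, a covariate x with slope \hat{\beta}, and their
interaction with effects \hat{\gamma}_i gives a separate trend for each factor
level i:

    \hat{\pi}(i, x) = \hat{\mu} + \hat{\alpha}_i + (\hat{\beta} + \hat{\gamma}_i) x

A value of x must therefore always be supplied, while the factor contribution
may be evaluated at a chosen level i, averaged over the levels, or, for a
random factor, set to zero.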
Before considering some examples in detail, it is useful to consider
the conceptual steps involved in the prediction process. Given the
explanatory variables used to define the linear (mixed) model, the four main
steps are
a:
Choose the explanatory variable(s) and their respective
value(s) for which predictions are required; the variables
involved will be referred to as the classify set and together
define the multiway table to be predicted.
b:
Determine which variables should be averaged over to form
predictions. The values to be averaged over must also be defined for
each variable; the variables involved will be referred to as the
averaging set. The combination of the classify set with these
averaging variables defines a multiway hyper-table. Note that
variables evaluated at only one value, for example, a covariate at its
mean value, can be formally introduced as part of the classify set or the averaging set.
c:
Determine which terms from the linear mixed model are to be
used in forming predictions for each cell in the multiway hyper-table
in order to give appropriate conditional or marginal prediction.
d:
Choose the weights to be used when averaging cells in the
hyper-table to produce the multiway table to be reported; a sketch of
this weighted averaging is given after the list.
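A minimal sketch of steps (a) to (d), assuming the predictions for each cell
of the hyper-table have already been formed (the values, names and weights
below are hypothetical), with variety as the classify set, site as the
averaging set, and unequal weights supplied at step (d):

    import pandas as pd

    # (a), (b): hyper-table classified by variety (classify set) and
    # site (averaging set); (c): 'pred' holds the chosen predicted values
    hyper = pd.DataFrame({
        "variety": ["A", "A", "A", "B", "B", "B"],
        "site":    ["s1", "s2", "s3", "s1", "s2", "s3"],
        "pred":    [4.3, 5.1, 5.0, 3.7, 4.5, 4.2],
    })

    # (d): weights for the averaging set, e.g. population proportions
    weights = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
    hyper["w"] = hyper["site"].map(weights)

    # collapse the hyper-table to the reported table classified by variety
    report = (hyper.assign(wp=hyper["w"] * hyper["pred"])
                   .groupby("variety")["wp"].sum())
    print(report)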
Note that after steps (a) and (b) there may be some
explanatory variables in the fitted model that do not classify the
hyper-table. These variables occur in terms that are
ignored when forming the predicted values. It was concluded above that
fixed terms cannot sensibly be ignored in forming predictions, so
variables should be omitted from the hyper-table only when they appear
exclusively in random terms. Whether terms derived from these variables
should be used when forming predictions depends on the application and
the aim of prediction.
The main difference in this prediction process compared to that
described by Lane and Nelder (1982) is the choice of whether to
include or exclude model terms when forming predictions. In linear
models, since all terms are fixed, terms not in the classify set must
be in the averaging set.