Balanced longitudinal data - Oranges

Introduction

We now illustrate the use of random coefficients and cubic smoothing splines for the analysis of balanced longitudinal data. The implementation of cubic smoothing splines in ASReml was originally based on the mixed model formulation presented by Verbyla et al. (1999). More recently the technology has been enhanced so that the user can specify knot points; in the original approach the knot points were taken to be the ordered set of unique values of the explanatory variable. The specification of knot points is particularly useful if the number of unique values in the explanatory variable is large, or if units are measured at different times.

The data we use was originally reported by Draper and Smith (1998, ex24N, p559) and has recently been reanalysed by Pinheiro and Bates (2000, p338). The data are displayed in Figure 1 and are the trunk circumferences (in millimetres) of each of 5 trees taken at 7 times. All trees were measured at the same time so that the data are balanced. The aim of the study is unclear, though, both previous analyses involved modelling the overall `growth' curve, accounting for the obvious variation in both level and shape between trees. Pinheiro and Bates (2000) used a nonlinear mixed effects modelling approach, in which they modelled the growth curves by a three parameter logistic function of age, given by

y = φ1/(1 + exp( [-(x-φ2)/φ3 ]) Eqn (1)

where y is the trunk circumference, x is the tree age in days since December 31 1968, φ1 is the asymptotic height, φ2 is the inflection point or the time at which the tree reaches 0.5φ1, φ3 is the time elapsed between trees reaching half and about 3/4 of φ1.
Figure 1. Trellis plot of trunk circumference for each tree

The datafile consists of 5 columns viz, Tree, a factor with 5 levels, age, tree age in days since 31st December 1968, circ the trunk circumference and season. The last column season was added after noting that tree age spans several years and if converted to day of year, measurements were taken in either Spring (April/May) or Autumn (September/October).

First we demonstrate the fitting of a cubic spline in ASReml by restricting the dataset to tree 1 only. The model includes the intercept and linear regression of trunk circumference on age and an additional random term spl(age,7) which instructs ASReml to include a random term with a special design matrix with 7-2=5 columns which relate to the vector, δ whose elements δi, i=2, ... ,6 are the second differentials of the cubic spline at the knot points. The second differentials of a natural cubic spline are zero at the first and last knot points (Green and Silverman, 1994). The ASReml job is
 this is the orange data, for tree 1
  seq     # record number is not used
  Tree 5
  age       # 118  484  664 1004 1231 1372 1582
  circ
  season !L Spring Autumn
 orange.asd !skip 1 !filter 2 !select 1
 !SPLINE spl(age,7) 118  484  664 1004 1231 1372 1582
 !PVAL age 150 200:1500
 circ ~ mu age !r spl(age,7)
 predict age
Note that the data for tree 1 has been selected by use of the !filter and !select qualifiers. Also note the use of !PVAL so that the spline curve is properly predicted at the additional nominated points. These additional data points are required for ASReml to form the design matrix to properly interpolate the cubic smoothing spline between knot points in the prediction process. Since the spline knot points are specifically nominated in the !SPLINE line, these extra points have no effect on the analysis run time. The !SPLINE line does not modify the analysis in this example since it simply nominates the 7 ages in the data file. The same analysis would result if the !SPLINE line was omitted and spl(age,7) in the model was replaced with spl(age). An extract of the output file is
 QUALIFIERS: predict x !PLOT
 Forming        7 equations:   2 dense.
 Initial updates will be shrunk by factor    0.316

  Notice: Expanded ... 0 200:1550  with STEP of   50.00
   1 LogL=-20.9043     S2=  48.470          5 df   0.1000
   2 LogL=-20.9013     S2=  49.152          5 df   0.9102E-01
   3 LogL=-20.8998     S2=  49.892          5 df   0.8221E-01
   4 LogL=-20.8996     S2=  50.273          5 df   0.7802E-01
 Final parameter values                        0.7892E-01

          - - - Results from analysis of circ - - -
 Akaike Information Criterion       45.80 (assuming 2 parameters).
 Bayesian Information Criterion     45.02

          Approximate stratum variance decomposition
 Stratum     Degrees-Freedom   Variance      Component Coefficients
 spl(x,7)               1.49    98.4896        12.2     1.0
 Residual Variance      3.51    50.2726         0.0     1.0

 Model\_Term                             Gamma         Sigma   Sigma/SE   % C
 spl(x,7)                IDV\_V    5  0.789210E-01   3.96756       0.40   0 P
 Residual                SCA\_V    7   1.00000       50.2726       1.32   0 P

                                   Wald F statistics
     Source of Variation           NumDF     DenDF    F-inc            P-inc
   7 mu                                1       3.5  1380.50            <.001
   3 x                                 1       3.5   217.24            <.001
 Notice: The DenDF values are calculated ignoring fixed/boundary/singular
             variance parameters using algebraic derivatives.

                     Solution       Standard Error    T-value     T-prev
   3 x
                    1   0.814772E-01   0.552797E-02     14.74
   7 mu
                    1    24.4378        5.75909          4.24
   6 spl(x,7)                              5 effects fitted
 Finished: 29 Jan 2014 12:45:06.574   LogL Converged
The REML estimate of the smoothing constant indicates that there is some nonlinearity. The fitted cubic smoothing spline is presented in Figure 2. The fitted values were obtained from the .pvs file. The four points below the line were the spring measurements.
Figure 2. Fitted cubic smoothing spline for tree 1

We now consider the analysis of the full dataset. Following Verbyla et al. (1999) we consider the analysis of variance decomposition (see Table 1) which models the overall and individual curves. Table 1. Orange data: AOV decomposition
stratum decomposition type df or ne constant 1 F 1
age
age F 1
spl(age,7) R 5
fac(age) R 7
tree
tree RC 5
age.tree
x.tree RC 5
spl(age,7).tree R 25
error R

An overall spline is fitted as well as tree deviation splines. We note however, that the intercept and slope for the tree deviation splines are assumed to be random effects. This is consistent with Verbyla et al. (1999). In this sense the tree deviation splines play a role in modelling the conditional curves for each tree and variance modelling. The intercept and slope for each tree are included as random coefficients (denoted by RC in Table 1). Thus, if U5 cross 2 is the matrix of intercepts (column 1) and slopes (column 2) for each tree, then we assume that

Var(vec(U)) = ΣI5

where Σ is a 2 by 2 symmetric positive definite matrix. Non smooth variation can be modelled at the overall mean (across trees) level and this is achieved in ASReml by inclusion of fac(age) as a random term.

An extract of the ASReml input file is
 circ ~ mu age !r !{ Tree Tree.age us(2 5 0.00001 0.00001).Tree !},
 spl(age,7) .1  spl(age,7).Tree 2.3 fac(age) 13.9
 predict age Tree !IGNORE fac(age)
We stress the importance of model building in these settings, where we generally commence with relatively simple variance models and update to more complex variance models if appropriate. Table 2 presents the sequence of fitted models we have used. Note that the REML log-likelihoods for models 1 and 2 are comparable and likewise for models 3 to 6. The REML log-likelihoods are not comparable between these groups due to the inclusion of the fixed season term in the second set of models. We begin by modelling the variance matrix for the intercept and slope for each tree, Σ, as a diagonal matrix as there is no point including a covariance component between the intercept and slope if the variance component(s) for one (or both) is zero. Model 1 also does not include a non-smooth component at the overall level (that is, fac(age)). Abbreviated output is shown below.

Table 2. Sequence of models fitted to the Orange data
Model 1 2 3 4 5 6
Term
tree y y y y y y
age.tree y y y y y y
(covariance) n n n n n y
spl(age,7) y y y y n y
tree.spl(age,7) y y y n y y
fac(age) n y y n n n
season n n y y y y
REML LogL -97.78 -94.07 -87.95 -91.22 -90.18 -87.43

   12 LogL=-97.7788     S2=  6.3550         33 df

  Source                Model  terms     Gamma     Component    Comp/SE   % C
  Tree                      5      5   4.79025       30.4420       1.24   0 P
  Tree.age                  5      5  0.939436E-04  0.597011E-03   1.41   0 P
  spl(age,7)                5      5   100.513       638.759       1.55   0 P
  spl(age,7).Tree          25     25   1.11728       7.10033       1.44   0 P
  Variance                 35     33   1.00000       6.35500       1.74   0 P

                                    Wald F statistics
      Source of Variation           NumDF     DenDF    F-inc             Prob
    7 mu                                1       4.0    47.04            0.002
    3 age                               1       4.0    95.00            <.001

Figure 3. Plot of fitted cubic smoothing spline for model 1

A quick look suggests this is fine until we look at the predicted curves in Figure 3. The fit is unacceptable because the spline has picked up too much curvature, and suggests that there may be systematic non-smooth variation at the overall level. This can be formally examined by including the fac(age) term as a random effect. This increased the log-likelihood 3.71 (P < 0.05) with the spl(age,7) smoothing constants heading to the boundary. There is a possible explanation in the season factor. When this is added (Model 3) it has an F ratio of 107.5 (P < 0.01) while the fac(age) term goes to the boundry. Notice that the inclusion of the fixed term season in models 3 to 6 means that comparisons with models 1 and 2 on the basis of the log-likelihood are not valid. The spring measurements are lower than the autumn measurements so growth is slower in winter. Models 4 and 5 successively examined each term, indicating that both smoothing constants are significant (P < 0.05). Lastly we add the covariance parameter between the intercept and slope for each tree in model 6. This ensures that the covariance model will be translation invariant. A portion of the output file for model 6 is
  8 LogL=-87.4291     S2=  5.6412         32 df

          - - - Results from analysis of circ - - -
 Akaike Information Criterion      186.86 (assuming 6 parameters).
 Bayesian Information Criterion    195.65

 Model\_Term                             Gamma         Sigma   Sigma/SE   % C
 spl(x,7)                IDV\_V    5   2.17101       12.2471       1.09   0 P
 spl(x,7).Tree           IDV\_V   25   1.38314       7.80259       1.48   0 P
 Residual                SCA\_V   35   1.00000       5.64122       1.72   0 P
 us(2).Tree                      10 effects
 2                       US\_V  1  1   5.61716       31.6877       1.26   0 P
 2                       US\_C  2  1 -0.124098E-01 -0.700063E-01  -0.85   0 P
 2                       US\_V  2  2  0.108290E-03  0.610886E-03   1.41   0 P
 Covariance/Variance/Correlation Matrix US Tree
   31.69     -0.5032
 -0.7001E-01  0.6109E-03

                                   Wald F statistics
     Source of Variation           NumDF     DenDF    F-inc            P-inc
   7 mu                                1       2.4   169.87            0.006
   3 x                                 1       2.4    92.78            0.011
   5 Season                            1       8.8   108.49            <.001

Figure 4. Trellis plot of trunk circumference for each tree at sample dates (adjusted for season effects), with fitted profiles across time and confidence intervals Figure 4 presents the predicted growth over time for individual trees and a marginal prediction for trees with approximate confidence intervals (with twice standard error of prediction). Within this figure, the data is adjusted to remove the estimated seasonal effect. The conclusions from this analysis are quite different from those obtained by the nonlinear mixed effects analysis. The individual curves for each tree are not convincingly modelled by a logistic function. Figure 5 presents a plot of the residuals from the nonlinear model fitted on p340 of Pinheiro and Bates (2000). The distinct pattern in the residuals, which is the same for all trees is taken up in our analysis by the season term.
Figure 5. Plot of the residuals from the nonlinear model of Pinheiro and Bates
  • Back

    Return to index