Multivariate Analysis

Introduction

Multivariate analysis is used here in the narrow sense of a multivariate mixed model. There are many other multivariate analysis techniques which are not covered by ASReml. Multivariate analysis is used when we are interested in estimating the correlations between distinct traits (for example, fleece weight and fibre diameter in sheep) and for repeated measures of a single trait.

Repeated measures (rats)

There are two basic forms of analysis for repeated measures data: random regression type models and multivariate models. The multivariate models described here apply when there are a limited number of repeated measures and they are taken on each subject at the same times, so that the data have a multivariate structure.

Wolfinger (1996) summarises a range of variance structures that can be fitted to repeated measures data and demonstrates the models using five weights taken weekly on 27 rats subjected to 3 treatments.

Multiple traits: Wether trial data

Three key traits for the Australian wool industry are the weight of wool grown per year, the cleanness and the diameter of that wool. Much of the wool is produced from wethers and most major producers have traditionally used a particular strain or 'bloodline'. The file wether.as specifies a bivariate analysis.

Model specification

The syntax for specifying a multivariate linear model in ASReml is
Y-variates ~ fixed [ !r random ] [ !f sparse_fixed ]
where

Y-variates is a list of traits,

fixed, random and sparse_fixed are as in the univariate case but involve the special term Trait and interactions with Trait

The design matrix for Trait has a level (column) for each trait.

Trait by itself fits the mean for each variate.

In an interaction,
Trait.Fac fits the factor Fac for each variate and
Trait.Cov fits the covariate Cov for each variate.

ASReml internally rearranges the data so that n data records containing t traits each become n sets of t analysis records indexed by the internal factor Trait, i.e. nt analysis records ordered Trait within data record. If the data are already in this long form, use the !ASMV t qualifier to indicate that a multivariate analysis is required.
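The rearrangement described above can be pictured with a small sketch. It is written in Python purely for illustration and is not ASReml's own code: the function name expand_records and the data values are hypothetical, and only the trait names come from the example in this section.

```python
# Illustrative sketch only: ASReml performs this expansion internally.
# The trait names match this section's example; the values are made up.
TRAITS = ["GFW", "FDIAM"]


def expand_records(records, traits):
    """Turn n data records with t traits into n*t analysis records,
    ordered Trait within data record."""
    long_rows = []
    for unit, record in enumerate(records, start=1):
        for trait_level, trait in enumerate(traits, start=1):
            long_rows.append({
                "unit": unit,            # index of the original data record
                "Trait": trait_level,    # level of the internal factor Trait
                "y": record.get(trait),  # None marks a missing observation
            })
    return long_rows


wide = [
    {"GFW": 7.1, "FDIAM": 21.3},
    {"GFW": 6.8, "FDIAM": None},   # second trait missing for this record
    {"GFW": 8.0, "FDIAM": 23.0},
]

long_rows = expand_records(wide, TRAITS)
for row in long_rows:
    print(row)
```

With n = 3 records and t = 2 traits the sketch yields nt = 6 analysis records, with the Trait level cycling 1, 2 within each unit. A missing trait value still occupies its own analysis record.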

Variance structures

A more sophisticated error structure is required for multivariate analysis. Consider a multivariate analysis with t traits and n units in which the data are ordered traits within units. A typical variance structure is to assume units are independent and traits are correlated. This is described as the direct product of an IDENTITY matrix and an unstructured ( US ) variance matrix.
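As a rough illustration of the direct product, the following Python sketch builds the residual variance matrix for n = 3 units and t = 2 traits. The helper names are hypothetical and the choice of n = 3 is arbitrary. The 2 x 2 trait matrix reuses the residual estimates reported in the example output of this section.

```python
# Illustrative sketch only: builds I_n (x) Sigma for a small n.

def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kronecker(a, b):
    """Direct (Kronecker) product of two matrices given as nested lists."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


# Unstructured (US) variance matrix across the two traits, taken from
# the residual estimates in this section's example output.
sigma = [
    [0.198351, 0.128890],
    [0.128890, 0.440601],
]

n_units = 3  # arbitrary small number of units, chosen for display
residual_variance = kronecker(identity(n_units), sigma)

for row in residual_variance:
    print(" ".join(f"{value:8.4f}" for value in row))
```

The result is block diagonal: each unit contributes one copy of the trait matrix, and every entry linking different units is zero. This is what "units independent, traits correlated" means when the data are ordered traits within units.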

We discuss the syntax with reference to the following bivariate example

 Orange Wether Trial 1984-8
  SheepID !I
  TRIAL
  BloodLine !I
  TEAM *
  YEAR *
  GFW YLD FDIAM
 wether.dat !skip 1

 GFW FDIAM ~ Trait Trait.YEAR,        # Fixed model
          !r us(Trait).TEAM us(Trait).SheepID # Random model
 residual units.us(Trait)
 predict YEAR Trait

R-structure

For a standard multivariate analysis

the error (R) structure for the residual must be specified as two-dimensional with
independent records and
an unstructured variance matrix across traits;

records may have observations missing in different patterns and these are handled internally during analysis,

the R structure must be ordered traits within units, that is, the R structure definition line for units must be specified before the line for Trait,

variance parameters are variances not variance ratios.

!ASUV and !ASMV

These special qualifiers relating to multivariate analysis allow for the situation when

!ASUV: the data is in a multivariate layout but some residual variance structure other than IDENTITY cross US is required.

!ASMV t: the data (file) is already in an expanded form (n sets of t records) and the multivariate residual variance structure IDENTITY cross US is required.

To use an error structure other than US for the residual stratum you must (also) specify !ASUV on the datafile line and include mv in the model if there are missing values.

To perform a multivariate analysis (including the automatic handling of missing values) when the data have already been expanded, use !ASMV t on the datafile line; t is the number of traits that ASReml should expect. The data file must have t records for each multivariate record, although some may be coded missing.

G-structure

For a standard multivariate analysis, a US structure is also used for the between trait variance matrix of the random terms (as in the example). However, other structured models may be used, and may be necessary when there are more traits, because it is not unusual for US matrices to have no positive definite solution. When initial values are not supplied, ASReml again substitutes a proportion of the observed variance-covariance matrix of the data.

Example

Below is the output returned in the .asr file for this analysis, except that the !GO qualifiers were omitted.

 ASReml 4.0 [01 Jan 2013] Orange Wether Trial  1984-88
   Build kj [17 Jan 2014]   64 bit  Windows x64
 24 Jan 2014 15:06:49.049     32 Mbyte  wether
 Licensed to: Cargo Vale    31-dec-2014
 *****************************************************************
 * Contact support@asreml.co.uk for licensing and support        *
 *                                                               *
 *********************************************************** ARG *
 Folder: C:\Users\Public\ASReml\Docs\Manex4
 TAG  !I
 BloodLine !I
 QUALIFIERS: !SKIP 1
 Reading wether.dat  FREE FORMAT skipping     1 lines

 Bivariate analysis of GFW and FDIAM
 Summary of 1485 records retained of 1485 read

  Model term          Size #miss #zero   MinNon0    Mean      MaxNon0  StndDevn
   1 TAG               521     0     0      1   261.0956        521
   2 TRIAL                     0     0  3.000      3.000      3.000      0.000
   3 BloodLine          27     0     0      1    13.4323         27
   4 TEAM               35     0     0      1    18.0067         35
   5 YEAR                3     0     0      1     2.0391          3
   6 GFW            Variate    0     0  4.100      7.478      11.20      1.050
   7 YLD                       0     0  60.30      75.11      88.60      4.379
   8 FDIAM          Variate    0     0  15.90      22.29      30.60      2.190
   9 Trait                       2
  10 Trait.YEAR                  6  9 Trait     :   2   5 YEAR           :    3
  11 us(Trait)                   2
  12 us(Trait).TEAM             70 11 us(Trait) :   2   4 TEAM           :   35
  13 us(Trait).TAG            1042 11 us(Trait) :   2   1 TAG            :  521
 us(Trait) in units.us(Trait) has size 2, parameters:   9  11
  units.us(Trait)                  [  8: 11] initialized.
 us(Trait) in us(Trait).TEAM has size 2, parameters:  12  14
  us(Trait).TEAM                   [12:14] initialized.
 us(Trait) in us(Trait).TAG has size 2, parameters:  15  17
  us(Trait).TAG                    [15:17] initialized.
 Forming     1120 equations:   8 dense.
 Initial updates will be shrunk by factor    0.300
 Notice:      2 singularities detected in design matrix.
   1 LogL=-2118.63     S2= 1.00000       2964 df
   2 LogL=-1667.02     S2= 1.00000       2964 df
   3 LogL=-1185.28     S2= 1.00000       2964 df
   4 LogL=-834.927     S2= 1.00000       2964 df
   5 LogL=-738.734     S2= 1.00000       2964 df
   6 LogL=-724.170     S2= 1.00000       2964 df
   7 LogL=-723.466     S2= 1.00000       2964 df
   8 LogL=-723.462     S2= 1.00000       2964 df
   9 LogL=-723.462     S2= 1.00000       2964 df

          - - - Results from analysis of GFW FDIAM - - -
 Akaike Information Criterion     1464.92 (assuming 9 parameters).
 Bayesian Information Criterion   1518.87

 Model_Term                             Sigma         Sigma   Sigma/SE   % C
 units.us(Trait)               2970 effects
 Trait                   US_V  1  1  0.198351      0.198351      21.94   0 P
 Trait                   US_C  2  1  0.128890      0.128890      12.40   0 P
 Trait                   US_V  2  2  0.440601      0.440601      21.93   0 P
 us(Trait).TEAM                  70 effects
 Trait                   US_V  1  1  0.374493      0.374493       3.89   0 P
 Trait                   US_C  2  1  0.388740      0.388740       2.60   0 P
 Trait                   US_V  2  2   1.36533       1.36533       3.74   0 P
 us(Trait).TAG                 1042 effects
 Trait                   US_V  1  1  0.257159      0.257159      12.09   0 P
 Trait                   US_C  2  1  0.219557      0.219557       5.55   0 P
 Trait                   US_V  2  2   1.92082       1.92082      14.35   0 P
 Covariance/Variance/Correlation Matrix US Residual
  0.1984      0.4360
  0.1289      0.4406
 Covariance/Variance/Correlation Matrix US us(Trait).TEAM
  0.3745      0.5436
  0.3887       1.365
 Covariance/Variance/Correlation Matrix US us(Trait).TAG
  0.2572      0.3124
  0.2196       1.921

                                   Wald F statistics
     Source of Variation           NumDF              F-inc
   9 Trait                             2            5936.27
  10 Trait.YEAR                        4            1096.41

                     Solution       Standard Error    T-value     T-prev
  10 Trait.YEAR
                    2  -0.102262       0.290190E-01     -3.52
                    3    1.06636       0.290831E-01     36.67     42.07
                    5    1.17407       0.433905E-01     27.06
                    6    2.53439       0.434880E-01     58.28     32.85
   9 Trait
                    1    7.13717       0.107933         66.13
                    2    21.0569       0.209095        100.71     78.16
  12 us(Trait).TEAM                       70 effects fitted
  13 us(Trait).TAG                      1042 effects fitted
 SLOPES FOR LOG(ABS(RES)) on LOG(PV) for Section   1
   1.00   1.54
          10  possible outliers: see .res file
 Finished: 24 Jan 2014 15:06:49.490   LogL Converged
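
Some of the derived quantities in this output can be checked by hand. The Python sketch below is an illustration rather than ASReml's own calculation. It recomputes the correlations printed above the diagonal of each Covariance/Variance/Correlation matrix, and the information criteria from the final LogL. The variable names are hypothetical. The BIC formula is an assumption: it is the penalty that reproduces the reported value when the 2964 residual degrees of freedom are used as the sample size.

```python
import math


def correlation(variance_1, covariance_21, variance_2):
    """Correlation implied by two variances and their covariance."""
    return covariance_21 / math.sqrt(variance_1 * variance_2)


# (US_V 1 1, US_C 2 1, US_V 2 2) for each term, copied from the output.
components = {
    "Residual":       (0.198351, 0.128890, 0.440601),  # reported 0.4360
    "us(Trait).TEAM": (0.374493, 0.388740, 1.36533),   # reported 0.5436
    "us(Trait).TAG":  (0.257159, 0.219557, 1.92082),   # reported 0.3124
}
correlations = {
    term: correlation(*values) for term, values in components.items()
}

# Values copied from the output: final LogL, parameter count, and df.
log_likelihood = -723.462
n_parameters = 9
residual_df = 2964

aic = -2.0 * log_likelihood + 2.0 * n_parameters
# Assumed form: penalty of log(residual df) per variance parameter.
bic = -2.0 * log_likelihood + n_parameters * math.log(residual_df)

for term, value in correlations.items():
    print(f"{term:16s} correlation {value:.4f}")
print(f"AIC {aic:.2f}")
print(f"BIC {bic:.2f}")
```

The correlations agree with the upper off-diagonal entries of the three matrices, and the criteria agree with the reported 1464.92 and 1518.87, to the precision printed in the output.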
