Multi-Environment Trials - Lupins

Introduction

In this section, we explore some models that can be fitted to multi-environment trials, in particular showing the application of factor analytic models.

A multi-environment trial is a series of experiments with treatments in common. Our examples are drawn from field crops where the aim is to identify genotypes which consistently yield well over a region in several seasons.

The traditional approach involved analysing the experiments separately, saving the means, and then performing some analysis of the means. This approach tends to ignore differences in experiment accuracy and to require each genotype be tested in each experiment (limiting the experiments and/or genotypes that can be considered together). The methods described in this section build on spatial analysis, incorporate trial accuracy and allow for unbalance in genotype representation. For further reading, see Smith et al (2001, 2005).

Multi-environment trials have three basic forms. Early generation trials have few sites and few replicates but many lines to compare. Current latestage trials have more 3-4 replicates, 10-15 sites and 30-50 entries of elite lines grown at most sites. National analyses draw together results from multiple latestage trials using results from several years and regions, many sites and many lines. We will consider these three cases.

An early generation trial

Early stage multi-environment trials typically have many genotypes but limited seed. Consequently, within site replication of test lines is low (1 or 2) so that lines can be tested at more locations. Traditionally, grid plot designs have been used where standard/reference/check lines are highly replicated using a systematic grid but partial replicate designs are strongly advocated (Cullis et al. 2006).

This example involves 6 check lines and 330 test lines grown at three locations. The first 2 locations were laid out in a 12 by 34 arrangement, 12 replicates of each check line and 1 of each test line. The extra 6 plots were sown to a fillin variety which was also harvested. The third location was laid out in a 15 by 28 arrangement and had 15 replicates of each check line. The data file MET.DAT is sorted row within column within site. The distributed data file has extra fields we will ignore. The code for an initial combined analysis fitting a common random test effect follows. This code incorporates the results from several preliminary runs involving separate spatial analyses of each site ignoring test. These runs suggested a random row term was required for site 3, col terms for all sites, AR1 was unnecessary for the col dimension of the R structure for site 3, and provided the inital values inserted in the code.

The !SECTION site qualifier allows ASReml to check that the factor site does indeed correspond to the 3 sections in the R structure. Notice in the 6 R structure lines that the field dimensions are explicitly given in the first field, and the row/col in the second field enables ASReml to sort/check the plots in field order.
 Early Generation MultiEnvironment Trial
  seq
  col 15     # Actually 12 12 and 15 for the sites respectively
  row 34     # Actually 34 34 and 28 for the sites respectively
  chks 7     # 0 is fillin, 1-6 are check lines, 7 is test line
  test 336   # 0 is fillin or check, 7-336 is test line
  geno 337   # 1-6 are checks, 7-336 are test lines, 337 is fillin
  yld 1  !*.01
  site 3
 met.dat   !SECTION site
 yld ~ site chk.site,
      !r at(site,3).row .02 at(site).col .90 .40 .036 test
 predict chk site
 predict test  site  chks 7
at(site).ar1(row).ar1(column)
The LogL from this run is -314.262 and the parameter estimates follow.
          - - - Results from analysis of yld - - -

 Model_Term                             Sigma         Sigma   Sigma/SE   % C
 site_1.col              IDV_V   15  0.622603      0.622603       1.45   0 P
 site_2.col              IDV_V   15  0.159017      0.159017       1.40   0 P
 site_3.col              IDV_V   15  0.483282E-01  0.483282E-01   1.85   0 P
 site_3.row              IDV_V   34  0.235024E-01  0.235024E-01   2.77   0 P
 test                    IDV_V  336  0.103071      0.103071       7.02   0 P
 sat(site,1).ar1v(col).ar1(row)           408 effects
 col                      AR_R    1  0.195921      0.195921       3.65   0 P
 col                      AR_V    1   2.77133       2.77133       8.82   0 P
 row                      AR_R    1  0.650396      0.650396      16.79   0 P
 sat(site,2).ar1v(col).ar1(row)           408 effects
 col                      AR_R    1  0.286842      0.286842       5.47   0 P
 col                      AR_V    1  0.992600      0.992600       9.20   0 P
 row                      AR_R    1  0.574437      0.574437      13.65   0 P
 sat(site,3).idv(col).ar1(row)  420 effects
 col                      ID_V    1  0.120458      0.120458       6.43   0 P
 row                      AR_R    1  0.639344      0.639344      10.11   0 P

                                   Wald F statistics
     Source of Variation           NumDF             F-incr
   8 site                              3            1230.64
   9 chk.site                         20              11.36
Practitioners have taken two views on whether check or reference lines should be fitted as fixed effects, or just treated as random effects in the set of genotypes. It may effect the variance of the test/genotype effects. Comparison of test lines with check lines is easier if all are in the same random factor but this analysis takes the former approach. Notice in passing that chk.site has 20 rather than 18 degrees of freedom because of the fillin variety at the first 2 sites.

Our aim in this analysis is to get good predictions of test line effects. To do this we can compare several models for the genetic variances and covariances of test lines across sites. We are interested in the common effect of genotype across sites, but also to know to what extent individual sites diverge from the common genotype rankings. Our naive first model simply fits a common genetic effect, primarily to show why this model is inadequate. It implies the genetic variance is the same at all sites and the genetic correlation between sites is 1. Both assumptions are unlikely given that the residual variances range from 0.12 to 2.77.

Adding site.test to the random model allows for a common covariance less than the common variance. It increases the LogL from -314.262 to -310.879, highly significant and gives components

 test                    IDV_V  336  0.865261E-01  0.865261E-01   5.45   0 P
 site.test               IDV_V 1008  0.438068E-01  0.438068E-01   2.60   0 P
This represents a genetic correlation of 0.66=0.08677 /( 0.08677+0.04418) and an average genetic variance of 0.1308= 0.08677+0.04418, up from 0.1031. However, the residual site variances are quite different (20 fold difference) so it is likely that the genetic variances also differ between sites.

Our next model therefore fits a common correlation but heterogeneous variances. We drop test from the list of model terms and put the CORUH structure on the site component of site.test. The ASReml code is
 yld ~ site chk.site ,
     !r at(site,3).row .02 at(site).col .90 .40 .036 coruh(site).test
at(site).ar1(row).ar1(column)
The LogL increases 22.4 (P<0.01) to -288.484 (P<0.01) and the components are
 coruh(site).test              1008 effects
 site                    COR_R    1  0.418390      0.418390       6.35   0 U
 site                    COR_V    1  0.990953      0.990953       7.95   0 U
 site                    COR_V    2  0.152282      0.152282       2.73   0 U
 site                    COR_V    3  0.122242      0.122242       7.15   0 U
So the common correlation under this model is 0.42 (down from 0.66) and the site variances are 0.99, 0.15 and 0.12 respectively; site 1 being particularly high. This CORUH model would typically be the first model fitted. In this case there is genetic variance at each site but sometimes that will not be the case.

Is the assumption of common correlation justified? Finally we fit an unstructured model ( us(site).test). This is equivalent to an XFA1 model ( xfa1(site).test) for 3 sites but with more sites, the XFA1 model might be more parsimonious.

To get convergence, it is helpful to start from the results from the previous model. If the last model run was the CORUH model, ASReml will start from there if you specify !CONTINUE.

The resulting genetic variance matrix is:
.991 .158 .132
.158 .074 .078
.132 .078 .121
What next? Predicted values for the lines at each site are given in the .pvs file by the predict statement
predict site test check 7 # Check 7 is the mean effect for test
and averaging sites, weighting by the genetic variances (from the CORUH model) is given by
predict test chk 7 !ave site {1.00 2.56 2.85}/ 6.42
where the weights were calculated as 1/sqrt(c(0.991289,0.152347,0.122776)).

Five site MET example

This second example is also an early generation trial but with more sites (5). It demonstrates a typical set of five models fitted also to current latestage trials. There were 330 genotypes replicated twice at each site except 7 plots were sown to a fill in variety.
 !RE !WORK 100 !NO !CONTINUE !ARG 1 2 3 4  // !DOPART $1
 Title: met.a307.
  Plot  *  Block  * Entry  * Column  * Row  *
  Genotype  !A Site  !A 5 Year  !I Environment  !A
  yield   !*0.001
 met307.csv  !SKIP 1

 !PART 1
 tab Block Col Row Gen ~ Site !stats       # tabulates structural factors
 yield  ~ mu Site at(Site).lin(Row) at(Site).lin(Col) mv ,    #  fixed model
      !r  at(Site).Row at(Site).Col at(Site).Block coruh(Site).Geno #  random model

 !PART 2 3 4 5      # Fits  Revisedd base model
 yield  ~ mu Site at(Site,1,2,4,5).lin(Row)  at(Site,5).lin(Col) mv ,
      !r  at(Site,01,02,03,05).Row .003 .003 .0001  .0008 ,
      at(Site).Col .06 .007 .002 .02 .001 ,
      at(Site,2).Block .05 ,
 !PART 2 //   coruh(Site).Geno        # Starting values inserted from PART 1
 !PART 3 //      us(Site).Geno
 !PART 4 //    xfa1(Site).Geno
 !PART 5 //    xfa2(Site).Geno
 !PART 0 //residual at(Site).ar1(Row).ar1(Col)

Part 1 ended after 12 iterations with a LogL of 2563.66. Parts 2, 3 and 4 fit a slightly simplified model. Part 2 (starting from Part 1 estimates) converged in 7 iterations to LogL 2586.7. These LogLs are not comparable because some small fixed effects were dropped from part 2. Both these fitted the genetic variance matrix assuming equal correlation among sites but different variances CORUH.

Factor Analytic models provide a parsimonious approach generalising the covariance structure. Replacing CORUH with XFA1 in part 3 increased the LogL to 2611.71, a substantial gain. The XFA2 model in part 4 increased the LogL to 2621.57 after 10 iterations (not quite converged). Finally, fitting US in part 5 resulted in a LogL of 2622.06. This has only one more parameter than the XFA2 model which has 4 more free parameters than the XFA1 model. Consequently the XFA2 model is the best fit.

In this case the unstructured variance matrix can be fitted but in general, especially with more than five sites, with high genetic correlations and with fewer genotypes, a positive definite unstructured variance matrix can not be fitted. The final genetic variance matrix was
 Covariance/Variance/Correlation Matrix UnStructured Site.Geno
  0.05947  0.6092   0.7241   0.5229   0.7736
  0.04755  0.1025   0.6013   0.5910   0.6717
  0.02119  0.02310  0.01440  0.4829   0.7818
  0.04943  0.07334  0.02247  0.1503   0.4438
  0.04657  0.05308  0.02316  0.04247  0.0609
Recall, the average correlation from the CORUH model was 0.626. It is convenient at this point to explore the XFA model. It is akin to Principal Components analysis. The XFA1 model assumes a single latent genotype variable explains the covariance while XFA2 assumes two latent genotype variables. The XFA2 model forms the across site variance matrix as ΓΓ'+Ψ with estimates from this example of
0.0172   0 000
0 0.043  00 0
Ψ= 0 00.004  0 0
0 000.000   0
0 000 0.008  
and
0.186 0.088
0.237   0.055
Γ= 0.089   0.049
0.347   -.171
0.188   0.133

The elements in the diagonal matrix Ψ are known as specific variances and represent the variation in genotype effects that is site specific (not associated with the factors). The elements of Γ are the loadings for the two factors at the five sites. They represent the regression of the genotype effects for each site on the latent factors and plotting them maps the sites according to genetic similarity highlighting that site 4 is most divergent. The XFA2 variance matrix as presented in the .asr file (slightly reformated) is
 Covariance/Variance/Correlation Matrix XFA xfa(Site,2).Geno
  0.0594  0.6252  0.7120  0.5247  0.7762  0.7615  0.3627
  0.0488  0.1024  0.6175  0.5883  0.6564  0.7391  0.1720
  0.0208  0.0237  0.0144  0.4853  0.7850  0.7403  0.4089
  0.0494  0.0727  0.0225  0.1490  0.4485  0.8983 -0.4394
  0.0467  0.0518  0.0233  0.0427  0.0609  0.7628  0.5387
  0.1857  0.2365  0.0889  0.3468  0.1883   1.000   0.000
  0.0884  0.0550  0.0491 -0.1696  0.1330   0.000   1.000
The first 5 by 5 block is the variance matrix, directly comparable to the US matrix displayed above, with derived correlations in the upper right triangle. The first 5 columns of the last 2 rows are the loadings, Γ', being the covariances of the genotypes effects at the 5 sites with the latent factors. Similarly, the first 5 rows of the last 2 columns are the correlations of the genotypes effects at the 5 sites with the latent factors. The final 2 by 2 identity matrix relates to the two factors. The .sln file also contains genotype effects for the two factors, that is the genotype scores, as well as genotype effects for the 5 sites. For example
  xfa(Site,2).Geno     BLA3071.VV5866            0.1006      0.9392E-01
  xfa(Site,2).Geno     MTA3071.VV5866            0.1237      0.1680
  xfa(Site,2).Geno     PNA3071.VV5866            0.7299E-01  0.5347E-01
  xfa(Site,2).Geno     RSA3071.VV5866           -0.3641      0.1638
  xfa(Site,2).Geno     WTA3071.VV5866            0.2089      0.8414E-01
  xfa(Site,2).Geno     Factor-1.VV5866          -0.2370      0.3487
  xfa(Site,2).Geno     Factor-2.VV5866            1.661      0.5551
The site BLUPS are calculated as Γs+d where s is the score
| -.237 |
| 1.661 |
and d is a lack of fit residual (with variance given by Ψ).

Thus
| 0.1006 | | 0.186 0.088| | -.237 | | -.0015|
| 0.1237 | | 0.237 0.055| | 1.661 | | .0885 |
| 0.0730 | = | 0.089 0.049| + | .0127 |
| -0.3641 | | 0.347 -.171| | .0022 |
| 0.2089 | | 0.188 0.133| | .0325 |
The factors will be close to orthogonal if the user has not applied explicit constraints to the loadings, or they can be rotated as shown in the next example to be orthogonal. Plotting the genotype scores on the factor axes gives a two dimensional representation of them. Plotting the loadings gives information on the similarity of sites with respect to genotype ranking. Plotting both together provides a biplot. The average genotype ranking is given by predicting genotype effects (BLUPs) at the average loadings (0.213, 0.031). Stability of genotype performance is assessed by evaluating them at the extreme sites. Sites 1, 3 and 5 are similar (average 0.154, 0.090) while site 2 is close to the average and 4 is quite different. The basic predict statements are therefore
 predict Genotype !AVE site 5*0 0.154 0.090 !ONLY xfa(Site,2).Geno # Average 1 3 5
 predict Genotype !AVE site 5*0 0.213 0.031 !ONLY xfa(Site,2).Geno # Average all
 predict Genotype !AVE site 5*0 0.237 0.055 !ONLY xfa(Site,2).Geno # Site 2
 predict Genotype !AVE site 5*0 0.347 -.171 !ONLY xfa(Site,2).Geno # Site 4
Note that these predictions are based solely on the factors and therefore ignore the specific variance effects. Note the !ONLY qualifier. Without it, the predictions are not estimable because they include the mu term but no other fixed effects. The specific variance effects are not included in the ASReml output but can be calculated with a predict statement like (for site 4)
predict Geno !AVE Site 0 0 0 1 0 -.347 0.171 !ONLY xfa(Site,2).Geno
The predict statement predict Geno !AVE {1 0 1 0 1}/3}
gives predictions that include the mean and the specific variance effects.

Meta analysis of trial means

The combined analysis of experiments at the plot level discussed in the previous subsection is suitable for up to 10 experiments although an XFA structure would typically be fitted in place of US with more than 3 experiments. However, when it comes to later stages of selection, it is desirable to include as many experiments as possible representing different locations and seasons when comparing genotypes. This will commonly be over 30 experiments after 3 years of evaluation, and could easily be as many as 100 trials. Furthermore, few genotypes will be represented at all experiments.

We describe a two step approach. First, each experiment is analysed separately under a spatial model and predicted genotype means are produced along with a set of weights ( !TWOSTAGEWEIGHTS). Ideally these should be stored in a database so that they can be conveniently retrieved later for subsequent analysis. We also store the site mean yield and the residual variance along with details of the spatial model fitted.

For this example, we have extracted 2019 predicted lupin yields from the data base, with their weights. They represent 203 genotypes and 87 experiments conducted in 3 regions over 5 years. The yields range from 0.09300 to 5.613 with an average of 1.732. The experiment variances range 0.00037 to 0.04671 with an average of 0.02491. Note that the weights have been scaled by this average value (via transformation and back again in the model) so that variances have their natural scale. Following is code that fits 5 models to this data.
 !WORK 1 !NO !CONTINUE !RENAME !ARG  11 1 2     // !DOPART $1
 Title: ALBUS Second stage.
 #trial,year,region,variety,yield,rep,weight,ems
 #KFA02BURU,2002,NSW,KIEV-MUTANT,0.873,3,2136.562,0.0010000
  trial  !A year  !I region  !A variety  !A
  yield  rep  * weight   !*0.025 ems

 ALBUS.csv  !SKIP 1   !MAXIT 40

 !PART 11  # Shows trials 51 and 85 have minimal variance: fix .0001
  yield !wt=weight !DISP 0.25 ~ mu trial !r  coruh(trial).variety
 !PART 1 2 3 4
  yield !wt=weight !DISP 0.25 !SIGMAP ~ mu trial !r xfa(trial,$1).var
 # LogL 2911.52 +117.83 =3029.35 +118   =3152|3147
The equal correlation model is fitted first in part 11. After 40 iterations, it had converged with a LogL of 2783.3 with an average genetic correlation of 0.55. Continuing with XFA1 in part 1, the LogL increases to 2914.3 in another 40 iterations. Again, using !CONTINUE to start with the XFA1 values, part 2 fits an XFA2 model. After 40 iterations, the LogL had reached 3070.86. The parameters were still changing slightly but this provides two factors and permits a biplot of genotypes and experiments to be formed.

Proceeding to an XFA3 model the LogL increases nicely to 3177.52. An increase in LogL of 107 for 85 effects is about twice the 5% critical value. There is a table in the .res file which shows

 DISPLAY of variance partitioning for XFA structure in xfa3(trial).var
 Lvl  |----+----+----+----+----+----+----+----+----+----|  TotalVar %expl    PsiVar  Loadings
   1  |        1          2              3              |    0.0088  70.8    0.0026    0.0394   -0.0435   -0.0527
   2  1 2                   3                           |    0.0106  44.7    0.0058   -0.0093    0.0179   -0.0657
   3  |     1   2    3                                  |    0.0033  30.4    0.0023    0.0192    0.0168   -0.0190
   4  |    1                             2   3          |    0.0085  78.6    0.0018    0.0303    0.0712   -0.0269
   5  |           1                                     2    0.0061 100.0    0.0000    0.0380   -0.0682    0.0069
   6  |              1    2  3                          |    0.0074  45.9    0.0040    0.0475   -0.0263    0.0211
   7  |                               123               |    0.0045  67.9    0.0014    0.0537   -0.0089    0.0099
   8  |                  1 2    3                       |    0.0026  51.6    0.0013    0.0315   -0.0103    0.0156
   9  |            1                 3                  |    0.0075  62.0    0.0028    0.0446   -0.0014    0.0513
  10  |       1                                2    3   |    0.0090  92.5    0.0007    0.0384   -0.0769    0.0310
  11  |     1      2                              3     |    0.0072  88.3    0.0008    0.0285   -0.0330    0.0667
  12  |                          1 2  3                 |    0.0067  64.2    0.0024    0.0603    0.0164    0.0202
  13  |  12           3                                 |    0.0018  31.5    0.0012    0.0106   -0.0043    0.0205
  14  |                 1  2  3                         |    0.0044  47.6    0.0023    0.0397    0.0160    0.0156
  15  |                                              1  |    0.0791  94.2    0.0046    0.2721    0.0203   -0.0088
  16  |         1      2              3                 |    0.0371  63.8    0.0134    0.0843   -0.0760   -0.1037
  17  |                    1       3                    |    0.0171  57.8    0.0072    0.0846   -0.0122    0.0510
  18  |                          1           23         |    0.0320  80.6    0.0062    0.1306    0.0904    0.0240
  19  |      12             3                           |    0.0565  43.2    0.0321    0.0886    0.0393    0.1227
  20  |             1                      23           |    0.0168  76.0    0.0040    0.0675   -0.0893    0.0143
  21  |                                    1  2  3      |    0.0111  85.5    0.0016    0.0913   -0.0254    0.0229
  22  |                                     1 3         |    0.0148  79.4    0.0031    0.1061    0.0024    0.0219
  23  |      1        2  3                              |    0.0888  39.0    0.0542    0.1118    0.1283   -0.0750
  24  |                                      1         23    0.0031 100.0    0.0000    0.0497   -0.0247    0.0067
  25  |                   1 2                           |    0.0842  45.0    0.0463    0.1834    0.0564    0.0320
  26  |                                  12      3      |    0.0053  86.4    0.0007    0.0611   -0.0050    0.0281
  27  |                     1 2         3               |    0.0166  67.7    0.0054    0.0854   -0.0233   -0.0582
  28  1                                     2 3         |    0.0473  79.2    0.0099    0.0209    0.1877   -0.0422
  29  1                    23                           |    0.0172  43.5    0.0097   -0.0064    0.0845    0.0175
  30  |                             1      3            |    0.0092  74.4    0.0024    0.0744    0.0010    0.0359
  31  |                                1  2        3    |    0.0220  89.7    0.0023    0.1200    0.0359    0.0639
  32  |    1        2                                   3    0.0241 100.0    0.0000   -0.0491    0.0654   -0.1318
  33  |                        1           2 3          |    0.0161  77.3    0.0037    0.0899   -0.0613   -0.0249
  34  |                      1         2                3    0.0101  99.9    0.0000    0.0676   -0.0467    0.0577
  35  |                                          1  2 3 |    0.0704  96.0    0.0028    0.2449   -0.0711   -0.0506
  36  |                                     1           |    0.2242  76.0    0.0539    0.4111    0.0331    0.0140
  37  |                                       12        |    0.0996  81.6    0.0183    0.2841   -0.0231    0.0105
  38  | 13                                              |    0.0018   5.4    0.0017    0.0079   -0.0010   -0.0060
  39  |                  12                             3    0.0115 100.0    0.0000    0.0654    0.0182    0.0833
  40  1         2            3                          |    0.0500  45.9    0.0271   -0.0004    0.1017    0.1122
  41  |                    1        2        3          |    0.0246  78.5    0.0053    0.1021    0.0674    0.0663
  42  |        1               2                  3     |    0.0206  88.0    0.0025    0.0595    0.0813    0.0891
  43  |        12                               3       |    0.0151  84.6    0.0023    0.0534    0.0075    0.0992
  44  |                         12                      3    0.0094 100.0    0.0000    0.0694    0.0145    0.0661
  45  |            1                          2         3    0.0174 100.0    0.0000    0.0663    0.0973    0.0597
  46  |                             1  2                3    0.0018 100.0    0.0000    0.0331   -0.0095    0.0250
  47  |                      1  2                    3  |    0.0457  94.4    0.0025    0.1458    0.0520    0.1387
  48  |          1           2                          3    0.0088 100.0    0.0000    0.0446   -0.0463    0.0685
  49  |      1                                 2        3    0.0644 100.0    0.0000    0.0923    0.2107   -0.1069
  50  |1  2                                             |    0.0083   8.2    0.0076    0.0132    0.0212   -0.0075
  51  |   1                                             3    0.0011 100.0    0.0000    0.0090   -0.0035   -0.0315
  52  |                                        13       |    0.0126  83.1    0.0021    0.1019   -0.0011    0.0073
  53  |                          1          2           3    0.1146 100.0    0.0000    0.2494   -0.1605   -0.1633
  54  |                             1                  23    0.0623 100.0    0.0000    0.1917    0.1572    0.0279
  55  |                1              2                 3    0.0426 100.0    0.0000    0.1213    0.1126   -0.1232
  56  |               1             2                   |    0.0547  59.2    0.0223    0.1317    0.1222   -0.0074
  57  |                                       1 23      |    0.1356  85.6    0.0196    0.3307   -0.0639    0.0507
  58  |                                  13             |    0.1168  71.7    0.0331    0.2865   -0.0203   -0.0348
  59  |             1               2              3    |    0.0441  90.3    0.0043    0.1105    0.1209   -0.1139
  60  |                                         12      |    0.0956  85.0    0.0143    0.2842    0.0221    0.0008
  61  |                                         1 2     3    0.0322 100.0    0.0000    0.1650   -0.0285    0.0642
  62  |                                       1  2      |    0.1502  85.5    0.0218    0.3470    0.0879    0.0170
  63  |                                  12             3    0.1101 100.0    0.0000    0.2789    0.0316   -0.1769
  64  |                                       12     3  |    0.0849  94.9    0.0044    0.2622   -0.0243    0.1059
  65  |                             1         2 3       |    0.1147  84.9    0.0173    0.2611    0.1570    0.0674
  66  |           1          2                          3    0.0146 100.0    0.0000    0.0597    0.0569    0.0882
  67  |               1    2  3                         |    0.0643  48.9    0.0328    0.1420    0.0836   -0.0654
  68  |                                 1     2         3    0.0489 100.0    0.0000    0.1826    0.0736    0.1006
  69  |                                             1 23|    0.1296  97.7    0.0030    0.3449    0.0732    0.0484
  70  |                                       1  23     |    0.2187  87.2    0.0280    0.4181   -0.1126    0.0560
  71  |                     1  2                        |    0.0035  51.0    0.0017    0.0395   -0.0147    0.0002
  72  |            12                                3  |    0.0305  94.3    0.0017    0.0906   -0.0053    0.1431
  73  |      1                  2 3                     |    0.0618  55.7    0.0274    0.0905    0.1542   -0.0490
  74  |             1          2    3                   |    0.0133  60.6    0.0052    0.0605   -0.0551    0.0367
  75  | 1                   2                    3      |    0.1520  86.2    0.0210    0.0734   -0.2477    0.2535
  76  |                      1  2        3              |    0.0185  70.3    0.0055    0.0930    0.0298    0.0591
  77  |                           1    23               |    0.0029  67.5    0.0009    0.0403   -0.0170    0.0041
  78  |                  1          3                   |    0.0007  60.8    0.0003    0.0164    0.0017   -0.0127
  79  |                     123                         |    0.0649  48.8    0.0332    0.1676    0.0434    0.0409
  80  |    1         2                 3                |    0.0040  66.9    0.0013    0.0193   -0.0283    0.0386
  81  |                                      1 23       |    0.0041  83.3    0.0007    0.0560   -0.0153    0.0039
  82  |                      1 2        3               |    0.0042  68.4    0.0013    0.0439   -0.0142    0.0271
  83  |       1   2     3                               |    0.0221  36.7    0.0140    0.0590   -0.0431    0.0527
  84  |               1   2     3                       |    0.0248  52.4    0.0118    0.0899    0.0430   -0.0554
  85  | 12                                              3    0.0077 100.0    0.0000    0.0164   -0.0132    0.0853
  86  |                                 1        2      3    0.4146 100.0    0.0000   -0.5301    0.2802    0.2346
  87  |           1  3                                  |    0.0017  29.9    0.0012    0.0201   -0.0038    0.0101
   0  |----+----+----+----+----+----+----+----+-- Average    0.0442  75.0    0.0079    0.0991    0.0178    0.0184

The last line shows that about 75 percent of the variance is explained by the common effects.

The process can be repeated for XFA4, XFA5 etc but it becomes increasingly difficult to get convergence. In this round, the LogL for XFA4 increased to 3261 and then started jumping around. This behaviour is probably related to the fact that we are fitting about 430 variance parameters ro 2019 data points; modeling a GxE table which has only 11 percent of cells occupied. Nevertheless, we are doing better on this example than we were in 2008.

ASReml 3/4 produces orthogonal vectors when converged if the user does not impose constraints. Following is some R code for looking at the results after extracting the XFA parameters into a file say albus4.xfa.
 # copy XFA gammas from .asr file to albus4.xfa
 XFAm <- matrix(scan('../albus4.gam'),87,5) # Read PSI and loadings
 dimnames(XFAm) <- list(paste('S',1:87,sep='')
               ,c("Psi","Load_1","Load_2","Load_3","Load_4"))
 pairs(XFAm) # Create figure 16.8
 ss <- svd(XFAm[,-1])
 Lam <- XFAm[,-1] %*% ss$v
 colnames(Lam) <- c("Load_1","Load_2","Load_3","Load_4")
 Gvar <- Lam %*% t(Lam) + diag(XFAm[,1])
 cLam <- diag(1/sqrt(diag(Gvar))) %*% Lam
 XFAsln <- read.table('../albus4.sln') # Scores
 XFA4<-matrix(XFAsln$V3[XFAsln$V1=='xfa(trial,4).var'],203,91)
 scores <- XFA4[,88:91] %*% ss$v
 dimnames(scores)<-list(paste('V',1:203,sep='')
                 ,paste('Load_',1:4,sep=''))
 biplot(scores[,1:2],Lam[,1:2])  # Figure 16.9

Figure 1: Pairs plot for the lupin loadings

Figure 1 shows a pairs plot of the specific variances (Ψ) and loadings (Γ). Recall that the XFA variance structure was defined as Σ = ΓΓ' + Ψ. These values have not been finally rotated and loading 4 is not orthogonal to the others. The next step in the R script generates orthogonal loadings in Lam for use in the biplot.

While the XFA model was defined in terms of modelling the variance matrix, it can also be viewed as modelling the variance structure with underlying (latent) factors, hence the name factor analysis. In this case we have fitted four factors and the loadings are the regressions linking the site.variety BLUPs with the factor scores. The specific variances represent the lack of fit. The proportion of variance explained by the 4 factors is 89.9% calculated as mean(diag(Lam %*% t(Lam))/diag(Gvar)).
The separate amounts are 42.9%, 24.8%, 12.9% and 9.3% respectively. cLam is correlation of environments with factors. Examine the Ψ values to identify sites which buck the trends.

The variety effects with respect to the factors are reported in the .sln file. The next few commands extract them and rotate them for plotting in the biplot shown in figure 2. Well, thats a bit cluttered. Figure 3 omits the central cluster.
Figure 2: Biplot for the lupin factors 1 and 2
 mLam <- rep(1/87,87) %*% Lam #Get loading means
 sLam <- Lam - rep(mLam,rep(87,4)) # Centre Loadings
 dLam <- sqrt((sLam*sLam) %*% rep(1,4) )  # Distance from Centre
 dSco <- sqrt((scores*scores) %*% rep(1,4))

 biplot(Lam[dLam>0.1,1:2],scores[dSco>2,1:2])

Figure 3: Biplot plot for the lupin loadings

The following tables show the scores for the more extreme varieties, and the loadings for the more extreme sites.
 cbind(scores[dSco>2.8,],dSco[dSco>2.8])
      Loadg_1  Loadg_2  Loadg_3  Loadg_4    Dist
 V34  -2.4508   1.3773  -0.0762   0.8908   2.950
 V107 -1.5465  -0.0070   2.6319   0.3389   3.071
 V182  2.6397   0.9211   0.1007  -0.2059   2.805

 cbind(Lam[dLam>0.325,],dLam[dLam>.325])
       Load_1  Load_2  Load_3  Load_4  Dist
 S36 -0.4141 -0.0062 -0.0054  0.0801 0.3252
 S75 -0.0535  0.3655 -0.0185 -0.0114 0.3623
 S86  0.2199 -0.0116  0.2003  0.0276 0.3761
  • More examples

    Return to index