Multi-Environment Trials - Lupins
Introduction
In this section, we explore some models that can be fitted to
multi-environment trials, in particular showing the application of
factor analytic models.
A multi-environment trial is a series of experiments with treatments in common.
Our examples are drawn from field crops where the aim is to identify
genotypes which consistently yield well over a region in several seasons.
The traditional approach involved analysing the experiments separately,
saving the means, and then performing some analysis of the means.
This approach tends to ignore differences in experiment accuracy
and to require each genotype be tested in each experiment
(limiting the experiments and/or genotypes that can be considered together).
The methods described in this section build on spatial analysis,
incorporate trial accuracy and allow for unbalance in genotype
representation. For further reading, see Smith et al (2001, 2005).
Multi-environment trials have three basic forms. Early generation trials
have few sites and few replicates but many lines to compare. Current latestage trials
have more 3-4 replicates, 10-15 sites and 30-50 entries of elite lines grown at most sites.
National analyses draw together results from multiple latestage trials using
results from several years and regions, many sites and many lines.
We will consider these three cases.
An early generation trial
Early stage multi-environment trials typically have many genotypes but
limited seed. Consequently, within site replication of test lines is low (1 or 2)
so that lines can be tested at more locations. Traditionally,
grid plot designs have been used where standard/reference/check lines are
highly replicated using a systematic grid but partial replicate designs
are strongly advocated (Cullis et al. 2006).
This example involves 6 check lines and 330 test lines grown at three locations.
The first 2 locations were laid out in a 12 by 34 arrangement,
12 replicates of each check line and 1 of each test line.
The extra 6 plots were sown to a fillin variety which was also harvested.
The third location was laid out in a 15 by 28 arrangement and had
15 replicates of each check line. The data file MET.DAT
is sorted row within column within site.
The distributed data file has extra fields we will ignore.
The code for an initial combined analysis fitting a common random
test effect follows.
This code incorporates the results from
several preliminary runs involving separate spatial analyses of each site
ignoring test. These runs suggested a random row
term was required for site 3, col terms for all sites, AR1 was
unnecessary for the col dimension of the R structure for site 3,
and provided the inital values inserted in the code.
The !SECTION site qualifier allows ASReml to check
that the factor site does indeed correspond to the 3 sections
in the R structure. Notice in the 6 R structure lines that the
field dimensions are explicitly given in the first field, and the
row/col in the second field enables ASReml to sort/check the plots
in field order.
Early Generation MultiEnvironment Trial
seq
col 15 # Actually 12 12 and 15 for the sites respectively
row 34 # Actually 34 34 and 28 for the sites respectively
chks 7 # 0 is fillin, 1-6 are check lines, 7 is test line
test 336 # 0 is fillin or check, 7-336 is test line
geno 337 # 1-6 are checks, 7-336 are test lines, 337 is fillin
yld 1 !*.01
site 3
met.dat !SECTION site
yld ~ site chk.site,
!r at(site,3).row .02 at(site).col .90 .40 .036 test
predict chk site
predict test site chks 7
at(site).ar1(row).ar1(column)
The LogL from this run is -314.262 and the parameter estimates follow.
- - - Results from analysis of yld - - -
Model_Term Sigma Sigma Sigma/SE % C
site_1.col IDV_V 15 0.622603 0.622603 1.45 0 P
site_2.col IDV_V 15 0.159017 0.159017 1.40 0 P
site_3.col IDV_V 15 0.483282E-01 0.483282E-01 1.85 0 P
site_3.row IDV_V 34 0.235024E-01 0.235024E-01 2.77 0 P
test IDV_V 336 0.103071 0.103071 7.02 0 P
sat(site,1).ar1v(col).ar1(row) 408 effects
col AR_R 1 0.195921 0.195921 3.65 0 P
col AR_V 1 2.77133 2.77133 8.82 0 P
row AR_R 1 0.650396 0.650396 16.79 0 P
sat(site,2).ar1v(col).ar1(row) 408 effects
col AR_R 1 0.286842 0.286842 5.47 0 P
col AR_V 1 0.992600 0.992600 9.20 0 P
row AR_R 1 0.574437 0.574437 13.65 0 P
sat(site,3).idv(col).ar1(row) 420 effects
col ID_V 1 0.120458 0.120458 6.43 0 P
row AR_R 1 0.639344 0.639344 10.11 0 P
Wald F statistics
Source of Variation NumDF F-incr
8 site 3 1230.64
9 chk.site 20 11.36
Practitioners have taken two views on whether check or reference
lines should be fitted as fixed effects, or just treated as random effects in
the set of genotypes. It may effect the variance of the test/genotype effects.
Comparison of test lines with check lines is easier if all are in the
same random factor but this analysis takes the former approach.
Notice in passing that chk.site has 20 rather than 18 degrees of freedom
because of the fillin variety at the first 2 sites.
Our aim in this analysis is to get good predictions of test line effects.
To do this we can compare several models for the genetic variances and covariances
of test lines across sites.
We are interested in the common effect of genotype across sites,
but also to know to what extent individual sites diverge from
the common genotype rankings.
Our naive first model simply fits a common genetic effect, primarily to show
why this model is inadequate. It implies the genetic variance is the
same at all sites and the genetic correlation between sites is 1.
Both assumptions are unlikely given that the residual variances range
from 0.12 to 2.77.
Adding site.test to the random model
allows for a common covariance less than the common variance. It increases the
LogL from -314.262 to -310.879, highly significant and gives components
test IDV_V 336 0.865261E-01 0.865261E-01 5.45 0 P
site.test IDV_V 1008 0.438068E-01 0.438068E-01 2.60 0 P
This represents a genetic correlation of 0.66=0.08677 /( 0.08677+0.04418)
and an average genetic variance of 0.1308= 0.08677+0.04418, up from 0.1031.
However, the residual site variances are quite different (20 fold difference) so
it is likely that the genetic variances also differ between sites.
Our next model therefore fits a common correlation but heterogeneous variances.
We drop test from the list of model terms and put the CORUH
structure on the site component of site.test.
The ASReml code is
yld ~ site chk.site ,
!r at(site,3).row .02 at(site).col .90 .40 .036 coruh(site).test
at(site).ar1(row).ar1(column)
The LogL increases 22.4 (P<0.01) to -288.484 (P<0.01) and the components are
coruh(site).test 1008 effects
site COR_R 1 0.418390 0.418390 6.35 0 U
site COR_V 1 0.990953 0.990953 7.95 0 U
site COR_V 2 0.152282 0.152282 2.73 0 U
site COR_V 3 0.122242 0.122242 7.15 0 U
So the common correlation under this model is 0.42 (down from 0.66) and
the site variances are 0.99, 0.15 and 0.12 respectively; site 1 being particularly high.
This CORUH model would typically be the first model fitted.
In this case there is genetic variance at each site but sometimes
that will not be the case.
Is the assumption of common correlation justified? Finally we fit
an unstructured model ( us(site).test). This is equivalent to an XFA1 model ( xfa1(site).test) for 3
sites but with more sites, the XFA1 model might be
more parsimonious.
To get convergence, it is helpful to start from the results from the previous model.
If the last model run was the CORUH model, ASReml will start from there
if you specify !CONTINUE.
The resulting genetic variance matrix is:
.991 | .158 | .132
|
.158 | .074 | .078
|
.132 | .078 | .121
|
What next? Predicted values for the lines at each site are given in the
.pvs file by the predict statement
predict site test check 7 # Check 7 is the mean effect for test
and averaging sites, weighting by the genetic variances (from the
CORUH model) is given by
predict test chk 7 !ave site {1.00 2.56 2.85}/ 6.42
where the weights were calculated as 1/sqrt(c(0.991289,0.152347,0.122776)).
Five site MET example
This second example is also an early generation trial but
with more sites (5). It demonstrates a typical set of five models
fitted also to current latestage trials.
There were 330 genotypes
replicated twice at each site except 7 plots were sown to a fill in variety.
!RE !WORK 100 !NO !CONTINUE !ARG 1 2 3 4 // !DOPART $1
Title: met.a307.
Plot * Block * Entry * Column * Row *
Genotype !A Site !A 5 Year !I Environment !A
yield !*0.001
met307.csv !SKIP 1
!PART 1
tab Block Col Row Gen ~ Site !stats # tabulates structural factors
yield ~ mu Site at(Site).lin(Row) at(Site).lin(Col) mv , # fixed model
!r at(Site).Row at(Site).Col at(Site).Block coruh(Site).Geno # random model
!PART 2 3 4 5 # Fits Revisedd base model
yield ~ mu Site at(Site,1,2,4,5).lin(Row) at(Site,5).lin(Col) mv ,
!r at(Site,01,02,03,05).Row .003 .003 .0001 .0008 ,
at(Site).Col .06 .007 .002 .02 .001 ,
at(Site,2).Block .05 ,
!PART 2 // coruh(Site).Geno # Starting values inserted from PART 1
!PART 3 // us(Site).Geno
!PART 4 // xfa1(Site).Geno
!PART 5 // xfa2(Site).Geno
!PART 0 //residual at(Site).ar1(Row).ar1(Col)
Part 1 ended after 12 iterations with a LogL of 2563.66.
Parts 2, 3 and 4 fit a slightly simplified model. Part 2 (starting from Part 1 estimates)
converged in 7 iterations to LogL 2586.7. These LogLs are not
comparable because some small fixed effects were dropped from
part 2. Both these fitted the genetic variance matrix assuming equal
correlation among sites but different variances CORUH.
Factor Analytic models provide a parsimonious approach generalising the
covariance structure. Replacing CORUH
with XFA1
in part 3 increased the LogL to 2611.71, a substantial gain.
The XFA2 model in part 4 increased the LogL to 2621.57 after 10
iterations (not quite converged). Finally, fitting US
in part 5 resulted in a LogL of 2622.06. This has only one more
parameter than the XFA2 model which has 4 more free parameters
than the XFA1 model. Consequently the XFA2 model is the best
fit.
In this case the unstructured variance matrix can be fitted but
in general, especially with more than five sites, with high
genetic correlations and with fewer genotypes, a positive definite
unstructured variance matrix can not be fitted.
The final genetic
variance matrix was
Covariance/Variance/Correlation Matrix UnStructured Site.Geno
0.05947 0.6092 0.7241 0.5229 0.7736
0.04755 0.1025 0.6013 0.5910 0.6717
0.02119 0.02310 0.01440 0.4829 0.7818
0.04943 0.07334 0.02247 0.1503 0.4438
0.04657 0.05308 0.02316 0.04247 0.0609
Recall, the average correlation from the CORUH model was 0.626.
It is convenient at this point to explore the XFA model.
It is akin to Principal Components analysis. The XFA1 model
assumes a single latent genotype variable explains the covariance
while XFA2 assumes two latent genotype variables.
The XFA2 model forms the across site variance matrix as
ΓΓ'+Ψ with estimates from this example
of
| 0.0172 | 0 | 0 | 0 | 0
|
| 0 | 0.043 | 0 | 0 | 0
|
Ψ= | 0 | 0 | 0.004 | 0 | 0
|
| 0 | 0 | 0 | 0.000 | 0
|
| 0 | 0 | 0 | 0 | 0.008
|
and
|
| 0.186 | 0.088
|
| 0.237 | 0.055
|
Γ= | 0.089 | 0.049
|
| 0.347 | -.171
|
| 0.188 | 0.133
|
The elements in the diagonal matrix Ψ are known as
specific variances and represent the variation in genotype
effects that is site specific (not associated with the factors).
The elements of Γ are the loadings for the two factors
at the five sites. They represent the regression of the genotype
effects for each site on the latent factors and plotting them
maps the sites according to genetic similarity
highlighting that site 4 is most divergent.
The XFA2 variance matrix as presented in the .asr file (slightly reformated) is
Covariance/Variance/Correlation Matrix XFA xfa(Site,2).Geno
0.0594 0.6252 0.7120 0.5247 0.7762 0.7615 0.3627
0.0488 0.1024 0.6175 0.5883 0.6564 0.7391 0.1720
0.0208 0.0237 0.0144 0.4853 0.7850 0.7403 0.4089
0.0494 0.0727 0.0225 0.1490 0.4485 0.8983 -0.4394
0.0467 0.0518 0.0233 0.0427 0.0609 0.7628 0.5387
0.1857 0.2365 0.0889 0.3468 0.1883 1.000 0.000
0.0884 0.0550 0.0491 -0.1696 0.1330 0.000 1.000
The first 5 by 5 block is the variance matrix, directly
comparable to the US matrix displayed above, with
derived correlations in the upper right triangle.
The first 5 columns of the last 2 rows are the loadings, Γ',
being the covariances of the genotypes effects at the 5 sites
with the latent factors.
Similarly, the first 5 rows of the last 2 columns are
the correlations of the genotypes effects at the 5 sites
with the latent factors. The final 2 by 2 identity matrix
relates to the two factors.
The .sln file also contains genotype effects for the two
factors, that is the genotype scores,
as well as genotype effects for the 5 sites. For example
xfa(Site,2).Geno BLA3071.VV5866 0.1006 0.9392E-01
xfa(Site,2).Geno MTA3071.VV5866 0.1237 0.1680
xfa(Site,2).Geno PNA3071.VV5866 0.7299E-01 0.5347E-01
xfa(Site,2).Geno RSA3071.VV5866 -0.3641 0.1638
xfa(Site,2).Geno WTA3071.VV5866 0.2089 0.8414E-01
xfa(Site,2).Geno Factor-1.VV5866 -0.2370 0.3487
xfa(Site,2).Geno Factor-2.VV5866 1.661 0.5551
The site BLUPS are calculated as Γs+d where
s is the score
and d is
a lack of fit residual (with variance given by Ψ).
Thus
| | 0.1006 | | | | | | | 0.186 | 0.088 | | | | -.237 | | | | | | -.0015|
|
| | 0.1237 | | | | | | | 0.237 | 0.055 | | | | 1.661 | | | | | | .0885 |
|
| | 0.0730 | | | | = | | | 0.089 | 0.049 | | | | | + | | | .0127 |
|
| | -0.3641 | | | | | | | 0.347 | -.171 | | | | | | | | .0022 |
|
| | 0.2089 | | | | | | | 0.188 | 0.133 | | | | | | | | .0325 |
|
The factors will be close to orthogonal if the user has not applied
explicit constraints to the loadings, or they can be rotated as shown in the
next example to be orthogonal. Plotting the genotype scores on the factor axes
gives a two dimensional representation of them. Plotting the loadings gives
information on the similarity of sites with respect to genotype ranking.
Plotting both together provides a biplot.
The average genotype ranking is given by predicting genotype effects (BLUPs)
at the average
loadings (0.213, 0.031). Stability of genotype performance is
assessed by evaluating them at the extreme sites. Sites 1, 3 and 5 are similar
(average 0.154, 0.090) while site 2 is close to the average and 4 is quite
different.
The basic predict statements are therefore
predict Genotype !AVE site 5*0 0.154 0.090 !ONLY xfa(Site,2).Geno # Average 1 3 5
predict Genotype !AVE site 5*0 0.213 0.031 !ONLY xfa(Site,2).Geno # Average all
predict Genotype !AVE site 5*0 0.237 0.055 !ONLY xfa(Site,2).Geno # Site 2
predict Genotype !AVE site 5*0 0.347 -.171 !ONLY xfa(Site,2).Geno # Site 4
Note that these predictions are based solely on the factors and therefore
ignore the specific variance effects. Note the !ONLY
qualifier. Without it, the predictions are not estimable because
they include the mu term but no other fixed effects. The specific variance effects
are not included in the ASReml output but can be calculated with a
predict statement like (for site 4)
predict Geno !AVE Site 0 0 0 1 0 -.347 0.171 !ONLY xfa(Site,2).Geno
The predict statement
predict Geno !AVE {1 0 1 0 1}/3}
gives predictions that include the mean and the specific variance effects.
Meta analysis of trial means
The combined analysis of experiments at the plot level discussed in the previous subsection is suitable for up to
10 experiments although an XFA structure would typically be fitted
in place of US with more than 3 experiments. However,
when it comes to later stages of selection, it is desirable to
include as many experiments as possible representing different locations and
seasons when comparing genotypes.
This will commonly be over 30 experiments after 3 years of evaluation,
and could easily be as many as 100 trials. Furthermore, few genotypes will be represented at
all experiments.
We describe a two step approach. First, each experiment
is analysed separately under a spatial model and predicted
genotype means are produced along with a set of weights
( !TWOSTAGEWEIGHTS).
Ideally these should be stored in a database so that they can
be conveniently retrieved later for subsequent analysis.
We also store the site mean yield and the residual variance
along with details of the spatial model fitted.
For this example, we have extracted 2019 predicted lupin yields
from the data base, with their weights. They represent 203 genotypes
and 87 experiments conducted in 3 regions over 5 years.
The yields range from 0.09300 to 5.613 with an average of 1.732.
The experiment variances range 0.00037 to 0.04671 with an average of 0.02491.
Note that the weights have been scaled by this average value
(via transformation and back again in the model) so that variances
have their natural scale.
Following is code that fits 5 models to this data.
!WORK 1 !NO !CONTINUE !RENAME !ARG 11 1 2 // !DOPART $1
Title: ALBUS Second stage.
#trial,year,region,variety,yield,rep,weight,ems
#KFA02BURU,2002,NSW,KIEV-MUTANT,0.873,3,2136.562,0.0010000
trial !A year !I region !A variety !A
yield rep * weight !*0.025 ems
ALBUS.csv !SKIP 1 !MAXIT 40
!PART 11 # Shows trials 51 and 85 have minimal variance: fix .0001
yield !wt=weight !DISP 0.25 ~ mu trial !r coruh(trial).variety
!PART 1 2 3 4
yield !wt=weight !DISP 0.25 !SIGMAP ~ mu trial !r xfa(trial,$1).var
# LogL 2911.52 +117.83 =3029.35 +118 =3152|3147
The equal correlation model is fitted first in part 11. After 40 iterations,
it had converged with a LogL of 2783.3 with an average genetic
correlation of 0.55.
Continuing with XFA1 in part 1, the LogL increases to 2914.3
in another 40 iterations. Again, using !CONTINUE to start
with the XFA1 values, part 2 fits an XFA2 model.
After 40 iterations, the LogL had reached 3070.86. The parameters were still
changing slightly but this provides two factors and permits a biplot
of genotypes and experiments to be formed.
Proceeding to an XFA3 model the LogL increases nicely to 3177.52.
An increase in LogL of 107 for 85 effects
is about twice the 5% critical value. There is a table in the .res file
which shows
DISPLAY of variance partitioning for XFA structure in xfa3(trial).var
Lvl |----+----+----+----+----+----+----+----+----+----| TotalVar %expl PsiVar Loadings
1 | 1 2 3 | 0.0088 70.8 0.0026 0.0394 -0.0435 -0.0527
2 1 2 3 | 0.0106 44.7 0.0058 -0.0093 0.0179 -0.0657
3 | 1 2 3 | 0.0033 30.4 0.0023 0.0192 0.0168 -0.0190
4 | 1 2 3 | 0.0085 78.6 0.0018 0.0303 0.0712 -0.0269
5 | 1 2 0.0061 100.0 0.0000 0.0380 -0.0682 0.0069
6 | 1 2 3 | 0.0074 45.9 0.0040 0.0475 -0.0263 0.0211
7 | 123 | 0.0045 67.9 0.0014 0.0537 -0.0089 0.0099
8 | 1 2 3 | 0.0026 51.6 0.0013 0.0315 -0.0103 0.0156
9 | 1 3 | 0.0075 62.0 0.0028 0.0446 -0.0014 0.0513
10 | 1 2 3 | 0.0090 92.5 0.0007 0.0384 -0.0769 0.0310
11 | 1 2 3 | 0.0072 88.3 0.0008 0.0285 -0.0330 0.0667
12 | 1 2 3 | 0.0067 64.2 0.0024 0.0603 0.0164 0.0202
13 | 12 3 | 0.0018 31.5 0.0012 0.0106 -0.0043 0.0205
14 | 1 2 3 | 0.0044 47.6 0.0023 0.0397 0.0160 0.0156
15 | 1 | 0.0791 94.2 0.0046 0.2721 0.0203 -0.0088
16 | 1 2 3 | 0.0371 63.8 0.0134 0.0843 -0.0760 -0.1037
17 | 1 3 | 0.0171 57.8 0.0072 0.0846 -0.0122 0.0510
18 | 1 23 | 0.0320 80.6 0.0062 0.1306 0.0904 0.0240
19 | 12 3 | 0.0565 43.2 0.0321 0.0886 0.0393 0.1227
20 | 1 23 | 0.0168 76.0 0.0040 0.0675 -0.0893 0.0143
21 | 1 2 3 | 0.0111 85.5 0.0016 0.0913 -0.0254 0.0229
22 | 1 3 | 0.0148 79.4 0.0031 0.1061 0.0024 0.0219
23 | 1 2 3 | 0.0888 39.0 0.0542 0.1118 0.1283 -0.0750
24 | 1 23 0.0031 100.0 0.0000 0.0497 -0.0247 0.0067
25 | 1 2 | 0.0842 45.0 0.0463 0.1834 0.0564 0.0320
26 | 12 3 | 0.0053 86.4 0.0007 0.0611 -0.0050 0.0281
27 | 1 2 3 | 0.0166 67.7 0.0054 0.0854 -0.0233 -0.0582
28 1 2 3 | 0.0473 79.2 0.0099 0.0209 0.1877 -0.0422
29 1 23 | 0.0172 43.5 0.0097 -0.0064 0.0845 0.0175
30 | 1 3 | 0.0092 74.4 0.0024 0.0744 0.0010 0.0359
31 | 1 2 3 | 0.0220 89.7 0.0023 0.1200 0.0359 0.0639
32 | 1 2 3 0.0241 100.0 0.0000 -0.0491 0.0654 -0.1318
33 | 1 2 3 | 0.0161 77.3 0.0037 0.0899 -0.0613 -0.0249
34 | 1 2 3 0.0101 99.9 0.0000 0.0676 -0.0467 0.0577
35 | 1 2 3 | 0.0704 96.0 0.0028 0.2449 -0.0711 -0.0506
36 | 1 | 0.2242 76.0 0.0539 0.4111 0.0331 0.0140
37 | 12 | 0.0996 81.6 0.0183 0.2841 -0.0231 0.0105
38 | 13 | 0.0018 5.4 0.0017 0.0079 -0.0010 -0.0060
39 | 12 3 0.0115 100.0 0.0000 0.0654 0.0182 0.0833
40 1 2 3 | 0.0500 45.9 0.0271 -0.0004 0.1017 0.1122
41 | 1 2 3 | 0.0246 78.5 0.0053 0.1021 0.0674 0.0663
42 | 1 2 3 | 0.0206 88.0 0.0025 0.0595 0.0813 0.0891
43 | 12 3 | 0.0151 84.6 0.0023 0.0534 0.0075 0.0992
44 | 12 3 0.0094 100.0 0.0000 0.0694 0.0145 0.0661
45 | 1 2 3 0.0174 100.0 0.0000 0.0663 0.0973 0.0597
46 | 1 2 3 0.0018 100.0 0.0000 0.0331 -0.0095 0.0250
47 | 1 2 3 | 0.0457 94.4 0.0025 0.1458 0.0520 0.1387
48 | 1 2 3 0.0088 100.0 0.0000 0.0446 -0.0463 0.0685
49 | 1 2 3 0.0644 100.0 0.0000 0.0923 0.2107 -0.1069
50 |1 2 | 0.0083 8.2 0.0076 0.0132 0.0212 -0.0075
51 | 1 3 0.0011 100.0 0.0000 0.0090 -0.0035 -0.0315
52 | 13 | 0.0126 83.1 0.0021 0.1019 -0.0011 0.0073
53 | 1 2 3 0.1146 100.0 0.0000 0.2494 -0.1605 -0.1633
54 | 1 23 0.0623 100.0 0.0000 0.1917 0.1572 0.0279
55 | 1 2 3 0.0426 100.0 0.0000 0.1213 0.1126 -0.1232
56 | 1 2 | 0.0547 59.2 0.0223 0.1317 0.1222 -0.0074
57 | 1 23 | 0.1356 85.6 0.0196 0.3307 -0.0639 0.0507
58 | 13 | 0.1168 71.7 0.0331 0.2865 -0.0203 -0.0348
59 | 1 2 3 | 0.0441 90.3 0.0043 0.1105 0.1209 -0.1139
60 | 12 | 0.0956 85.0 0.0143 0.2842 0.0221 0.0008
61 | 1 2 3 0.0322 100.0 0.0000 0.1650 -0.0285 0.0642
62 | 1 2 | 0.1502 85.5 0.0218 0.3470 0.0879 0.0170
63 | 12 3 0.1101 100.0 0.0000 0.2789 0.0316 -0.1769
64 | 12 3 | 0.0849 94.9 0.0044 0.2622 -0.0243 0.1059
65 | 1 2 3 | 0.1147 84.9 0.0173 0.2611 0.1570 0.0674
66 | 1 2 3 0.0146 100.0 0.0000 0.0597 0.0569 0.0882
67 | 1 2 3 | 0.0643 48.9 0.0328 0.1420 0.0836 -0.0654
68 | 1 2 3 0.0489 100.0 0.0000 0.1826 0.0736 0.1006
69 | 1 23| 0.1296 97.7 0.0030 0.3449 0.0732 0.0484
70 | 1 23 | 0.2187 87.2 0.0280 0.4181 -0.1126 0.0560
71 | 1 2 | 0.0035 51.0 0.0017 0.0395 -0.0147 0.0002
72 | 12 3 | 0.0305 94.3 0.0017 0.0906 -0.0053 0.1431
73 | 1 2 3 | 0.0618 55.7 0.0274 0.0905 0.1542 -0.0490
74 | 1 2 3 | 0.0133 60.6 0.0052 0.0605 -0.0551 0.0367
75 | 1 2 3 | 0.1520 86.2 0.0210 0.0734 -0.2477 0.2535
76 | 1 2 3 | 0.0185 70.3 0.0055 0.0930 0.0298 0.0591
77 | 1 23 | 0.0029 67.5 0.0009 0.0403 -0.0170 0.0041
78 | 1 3 | 0.0007 60.8 0.0003 0.0164 0.0017 -0.0127
79 | 123 | 0.0649 48.8 0.0332 0.1676 0.0434 0.0409
80 | 1 2 3 | 0.0040 66.9 0.0013 0.0193 -0.0283 0.0386
81 | 1 23 | 0.0041 83.3 0.0007 0.0560 -0.0153 0.0039
82 | 1 2 3 | 0.0042 68.4 0.0013 0.0439 -0.0142 0.0271
83 | 1 2 3 | 0.0221 36.7 0.0140 0.0590 -0.0431 0.0527
84 | 1 2 3 | 0.0248 52.4 0.0118 0.0899 0.0430 -0.0554
85 | 12 3 0.0077 100.0 0.0000 0.0164 -0.0132 0.0853
86 | 1 2 3 0.4146 100.0 0.0000 -0.5301 0.2802 0.2346
87 | 1 3 | 0.0017 29.9 0.0012 0.0201 -0.0038 0.0101
0 |----+----+----+----+----+----+----+----+-- Average 0.0442 75.0 0.0079 0.0991 0.0178 0.0184
The last line shows that about 75 percent of the variance is explained by the common effects.
The process can be repeated for XFA4, XFA5 etc but it becomes increasingly difficult to
get convergence. In this round, the LogL for XFA4 increased to 3261 and then started jumping around.
This behaviour is probably related to the fact that we are fitting about 430 variance parameters ro 2019
data points; modeling a GxE table which has only 11 percent of cells occupied.
Nevertheless, we are doing better on this example than we were in 2008.
ASReml 3/4 produces orthogonal vectors when converged
if the user does not impose constraints. Following is
some R code for looking at the results
after extracting the XFA parameters into a file say albus4.xfa.
# copy XFA gammas from .asr file to albus4.xfa
XFAm <- matrix(scan('../albus4.gam'),87,5) # Read PSI and loadings
dimnames(XFAm) <- list(paste('S',1:87,sep='')
,c("Psi","Load_1","Load_2","Load_3","Load_4"))
pairs(XFAm) # Create figure 16.8
ss <- svd(XFAm[,-1])
Lam <- XFAm[,-1] %*% ss$v
colnames(Lam) <- c("Load_1","Load_2","Load_3","Load_4")
Gvar <- Lam %*% t(Lam) + diag(XFAm[,1])
cLam <- diag(1/sqrt(diag(Gvar))) %*% Lam
XFAsln <- read.table('../albus4.sln') # Scores
XFA4<-matrix(XFAsln$V3[XFAsln$V1=='xfa(trial,4).var'],203,91)
scores <- XFA4[,88:91] %*% ss$v
dimnames(scores)<-list(paste('V',1:203,sep='')
,paste('Load_',1:4,sep=''))
biplot(scores[,1:2],Lam[,1:2]) # Figure 16.9
Figure 1: Pairs plot for the lupin loadings
Figure 1 shows a pairs plot of the specific variances (Ψ)
and loadings (Γ). Recall that the XFA variance structure was
defined as Σ = ΓΓ' + Ψ.
These values have not been finally rotated and loading 4 is not
orthogonal to the others. The next step in the R script
generates orthogonal loadings in Lam for use in the biplot.
While the XFA model was defined in terms of modelling the variance
matrix, it can also be viewed as modelling the variance structure
with underlying (latent) factors, hence the name factor analysis.
In this case we
have fitted four factors and the loadings are the regressions
linking the site.variety BLUPs with the factor scores.
The specific variances represent the lack of fit.
The proportion of variance explained by the 4 factors is 89.9%
calculated as
mean(diag(Lam %*% t(Lam))/diag(Gvar)).
The separate amounts are 42.9%, 24.8%, 12.9% and 9.3% respectively.
cLam is correlation of environments
with factors. Examine the Ψ values to identify
sites which buck the trends.
The variety effects with respect to the factors are
reported in the .sln file. The next few commands
extract them and rotate them for plotting in the biplot
shown in figure 2. Well, thats a bit cluttered.
Figure 3 omits the central cluster.
Figure 2: Biplot for the lupin factors 1 and 2
mLam <- rep(1/87,87) %*% Lam #Get loading means
sLam <- Lam - rep(mLam,rep(87,4)) # Centre Loadings
dLam <- sqrt((sLam*sLam) %*% rep(1,4) ) # Distance from Centre
dSco <- sqrt((scores*scores) %*% rep(1,4))
biplot(Lam[dLam>0.1,1:2],scores[dSco>2,1:2])
Figure 3: Biplot plot for the lupin loadings
The following tables show the scores for the more extreme varieties,
and the loadings for the more extreme sites.
cbind(scores[dSco>2.8,],dSco[dSco>2.8])
Loadg_1 Loadg_2 Loadg_3 Loadg_4 Dist
V34 -2.4508 1.3773 -0.0762 0.8908 2.950
V107 -1.5465 -0.0070 2.6319 0.3389 3.071
V182 2.6397 0.9211 0.1007 -0.2059 2.805
cbind(Lam[dLam>0.325,],dLam[dLam>.325])
Load_1 Load_2 Load_3 Load_4 Dist
S36 -0.4141 -0.0062 -0.0054 0.0801 0.3252
S75 -0.0535 0.3655 -0.0185 -0.0114 0.3623
S86 0.2199 -0.0116 0.2003 0.0276 0.3761
More examples
Return to index