Using GRM matrices
One use of the GRM matrix is to allow more computationally efficient fitting
of random regression models associating u, a vector of f factor effects,
with v, a vector of m regression effects,
through the model u=Mv,
where the matrix M contains m regressor variables
for each of the f levels of the factor.
Direct fitting of the regression effects is facilitated by the "my basis function"
(the mbf function), which associates the regressor variables
with the levels of the factor, essentially fitting ZMv, where Z
is the design matrix linking observations to the levels of the factor.
But if m is much bigger than f,
it is more computationally efficient to fit an equivalent model Zu
with a variance structure for u based on MM'.
ASReml can read the matrix M
associated with a factor and group of regressor variables from a .grr file,
construct a GRM matrix (G=MM'/s),
fit the equivalent model and report both factor and regressor predictions.
One common case of this model is when u represents genotype effects,
the regressors represent SNP marker counts (typically 0/1/2) and v are marker effects.
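As a rough illustration of the quantities involved, the following sketch builds G=MM'/s and u=Mv for a toy matrix. The marker values, the effect vector and the function names grm and factor_effects are made up for this example; it is not ASReml code and says nothing about how ASReml forms the matrix internally.

```python
# Toy marker matrix M: f = 3 factor levels (rows) by m = 4 regressors (columns).
# All values are invented for illustration.
M = [
    [2, 1, 0, 2],
    [1, 1, 2, 0],
    [0, 2, 1, 1],
]


def grm(M, s=1.0):
    """Return G = M M' / s as an f x f list of lists."""
    f = len(M)
    return [
        [sum(a * b for a, b in zip(M[i], M[j])) / s for j in range(f)]
        for i in range(f)
    ]


def factor_effects(M, v):
    """Return u = M v for a vector v of m regression effects."""
    return [sum(x * e for x, e in zip(row, v)) for row in M]


G = grm(M)  # unscaled M M' (s = 1)
u = factor_effects(M, [0.5, -0.25, 0.0, 1.0])  # hypothetical marker effects
```

G has only f rows and columns however many regressors there are, which is why the equivalent model is cheaper when m is much bigger than f.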
The .grr file
is specified after any pedigree file and before the data file (with any other GRM files).
There may only be one .grr file.
It is assumed to contain a row for each level of the factor, each row containing
m regressor values. Optionally the factor level name associated with the i-th row
can be included before the relevant regressor values.
Also a heading row might include a name for each field/regressor variable.
Superfluous fields before the factor or regressor fields can be skipped
and superfluous rows before the regressor information can be skipped.
The syntax for specifying and reading the .grr file is
M.grr [!CSKIP c1] Factor [f] [!NOID] [!CSKIP c2] Regressors [m] [!NONAMES] [!SKIP s]
where
M.grr is the name of the file to be read,
!CSKIP c1 indicates c1 fields are to be skipped before the factor identifiers are read,
Factor is the name of the variable in the data that is associated with the regressors,
f sets the maximum number of levels (default 1000) of Factor with regressor data; ASReml will count the actual number,
!NOID indicates that the factor identifiers are not present in the .grr file,
!CSKIP c2 indicates c2 fields are to be skipped before the regressor variables are read,
Regressors is the name for the set of regressor variables,
m sets the number of regressor variables (default is the number of names found); must be set if there are extraneous fields to be ignored,
!SKIP s specifies how many lines are to be skipped before reading the regressor data,
!NONAMES indicates there is no line containing the individual names of the regressor variables;
otherwise names are taken from the first (non-skipped) line in the file.
If the factor identifiers are not present (!NOID), ASReml assumes that the order
of the factor classes in the data file matches the order in the .grr file.
If the factor identifiers are present, ASReml uses the identifiers obtained from the .grr file
to define the order of the factor classes when the data is read;
any extra identifiers in the data not in the .grr file are appended
at the end of the factor level name list.
If !NOID is set, identifiers in the .grr file are not needed and if present should be skipped using !CSKIP.
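The ordering rule for factor levels when identifiers are present can be pictured with a small sketch. The function name factor_levels and the identifier 900001 are hypothetical; this only mimics the rule described above and is not ASReml's implementation.

```python
def factor_levels(grr_ids, data_ids):
    """Order levels as in the .grr file, then append identifiers seen only in the data."""
    levels = list(grr_ids)
    seen = set(levels)
    for ident in data_ids:
        if ident not in seen:
            levels.append(ident)
            seen.add(ident)
    return levels


grr_ids = ["140099", "141099", "547853"]             # order taken from the .grr file
data_ids = ["547853", "900001", "140099", "900001"]  # 900001 is an invented extra clone
levels = factor_levels(grr_ids, data_ids)
```

Here levels keeps the three .grr identifiers in file order and places 900001 last, so the first rows of the factor always line up with the rows of M.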
Values are typically
TAB, COMMA or SPACE separated but may be packed (no separator) when
all values are integers 0/1/2. Missing values in the regression variables may be
represented by * or NA. Invalid data is also treated as missing.
Missing values are replaced by the mean of the respective regressor.
Alternative missing data methods that involve imputation from neighbouring markers have not been implemented.
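A minimal sketch of the file layout and of mean replacement follows, assuming a comma-separated file with a heading row and identifiers in the first field. The function names and the toy data are invented for illustration, and the sketch assumes every regressor has at least one observed value; it is not a description of ASReml's reader.

```python
MISSING = {"*", "NA"}


def parse_value(token):
    """Return a float, or None for a missing or invalid field."""
    token = token.strip()
    if token in MISSING:
        return None
    try:
        return float(token)
    except ValueError:
        return None  # invalid data is treated as missing


def read_grr(text):
    """Split comma-separated text into identifiers, regressor names and rows."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    names = [n.strip() for n in lines[0].split(",")[1:]]
    ids, rows = [], []
    for ln in lines[1:]:
        fields = ln.split(",")
        ids.append(fields[0].strip())
        rows.append([parse_value(t) for t in fields[1:]])
    return ids, names, rows


def impute_column_means(rows):
    """Replace each missing value by the mean of the observed values in its column."""
    m = len(rows[0])
    means = []
    for j in range(m):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if r[j] is None else r[j] for j in range(m)] for r in rows]


toy = """Genotype,snpA,snpB,snpC
g1,2,1,0
g2,*,1,2
g3,0,NA,x
"""
ids, names, rows = read_grr(toy)
filled = impute_column_means(rows)
```

In the toy data the three gaps (one *, one NA and one invalid field) are each replaced by the mean of their own column.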
Some general qualifiers are:
!SAVEGIV instructs ASReml to write the G matrix in .dgiv format,
!PSD s declares that the derived variance matrix may have up to s singularities,
!PEV requests calculation of the prediction error variances of the marker effects, which
are reported in the .mef
file. Calculation of prediction error variances is computationally very expensive,
!CENTRE [c] requests ASReml to centre the regressors at c if c is specified,
else at the individual regressor means;
without !CENTRE the G matrix is formed from uncentred regressors.
Other qualifiers relate specifically to whether the regressors are markers.
Markers are typically coded 0/1/2 being counts of the minor allele. However,
if they are imputed, they will take real values between 0 and 2. Since marker files may be huge,
!SMODE b sets the storage mode for the regressor data, indicating whether it is marker data:
b=2 sets 2-bit storage for strictly 0/1/2 marker data,
b=8 (the default) sets 8-bit storage, useful for marker data with imputed values having 2 digits
after the decimal point,
b=16 sets 16-bit storage, useful for imputed marker data with more than 2 digits after the decimal point,
and b=32 sets 32-bit real storage and should be used for non-marker data,
!RANGE l h indicates that the marker scores range over l:h
and are to be transformed to have a range 0:2,
!GSCALE s controls the scaling of the GRM matrix.
If unspecified, s=Σ 2p(1-p) is used for marker data and s=1 for non-marker data (!SMODE 32).
Scaling is often used with centred marker data to scale the MM' matrix so that it is a genomic matrix.
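The centring and scaling steps can be sketched numerically as below. This is a hedged illustration only: it assumes p for each marker is half the column mean of the 0/1/2 scores, assumes the !RANGE transformation is linear, and uses invented function names and data; the source does not confirm these details of ASReml's calculation.

```python
def rescale_range(x, lo, hi):
    """Map a score from the range lo:hi onto 0:2 (linear mapping assumed)."""
    return (x - lo) * 2.0 / (hi - lo)


def default_scale(M):
    """s = sum over markers of 2p(1-p), taking p as half the column mean (assumption)."""
    f = len(M)
    s = 0.0
    for j in range(len(M[0])):
        p = sum(row[j] for row in M) / (2.0 * f)
        s += 2.0 * p * (1.0 - p)
    return s


def centred_grm(M, s):
    """G = W W' / s, where W is M with each column centred at its own mean."""
    f, m = len(M), len(M[0])
    means = [sum(row[j] for row in M) / f for j in range(m)]
    W = [[row[j] - means[j] for j in range(m)] for row in M]
    return [
        [sum(a * b for a, b in zip(W[i], W[k])) / s for k in range(f)]
        for i in range(f)
    ]


# Toy 0/1/2 scores for 4 genotypes and 2 markers (invented values).
M = [[2, 0], [1, 2], [0, 1], [1, 1]]
s = default_scale(M)
G = centred_grm(M, s)
```

With column-mean centring every row of G sums to zero, so the centred matrix is singular, which is the kind of case the !PSD qualifier appears intended to allow for.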
Example
In this forestry example, there are multiple trees per clone and so
two clone model terms;
grm1(Clone) fits the additive genetic variance based on marker covariance,
Clone fits the non-additive variance, while
the residual represents the within-clone sampling variance.
!WORK 1
Nassau Clone Data
Nfam 71 !A
Nfemale 26 !A
Nmale 37 !A
Clone !A 860
rep 8
iblk 80
culture !A
DBH6
snpData.grr Clone
nassau.csv !MAXIT 30 !SKIP 1 !DFF -1
DBH6 ~ mu culture/rep !r grm1(Clon) 0.27 Clone 0.15 rep.iblk 0.31
Here snpData.grr first declares the Clone identifiers (taken from its first field)
in the correct order
and then supplies the marker scores; it looks like
Genotype,0-10024-01-114,0-10037-01-257,0-10040-02-394,...
140099,2,2,1,2,2,2,2,2,2,1,2,1,2,1,1,2,1,2,2,2,2,2,1,2...
141099,2,2,0,0,2,2,1,2,2,1,2,1,2,2,0,2,2,2,2,1,2,2,1,1...
...
547853,2,2,1,2,2,2,1,2,2,0,2,1,2,2,2,2,2,2,2,1,2,...
547966,2,2,1,1,1,2,0,2,2,1,2,2,2,2,2,2,2,2,2,1,2,...
548082,2,2,1,2,2,2,1,2,1,2,2,1,2,2,1,2,2,2,2,1,2,...
The primary output follows.
ASReml 4.1 [01 Apr 2014] Testing Pedigree Matrices against Marker Matrices for Variance Partition with Na
Build lg [15 Sep 2014] 64 bit Windows x64
16 Sep 2014 14:11:26.277 1024 Mbyte clonesHT6_2/clones
..
Nfam 71 !A
Nfemale 26 !A
Nmale 37 !A
Clone !A 860
MatOrder 914 !A
rep 8 !A
iblk 80 !A
prop 1 !A
culture 2 !A
treat 2 !A
measure 1 !A
CWAC6 !M-9
Class names for factor "Clone" are initialized from the .grr file.
Marker Header: Genotype,0-10024-01-114,0-10037-01-257,0
4854 Marker labels found
Marker labels 0-10024-01-114 ... UMN-CL98Contig1-
Notice: SNP data begins: 140099,2,2,1,2,2,2,2,2,2,1,2,1,2,1,1,
Notice: Markers coded -9 treated as missing.
Marker data [0/1/2] for 923 genotypes and 4854 markers read from snpData.grr
160414 missing marker values ( 3.6%) replaced by column average!
Marker values ranged 0.00 to 2.00
Marker Means ranged 1.00 to 2.00
Sigma2p(1-p) is 1057.12515
GIV1 snpData.grr 923 9 -947.91
QUALIFIERS: !MAXIT 30 !SKIP 1 !DFF -1
QUALIFIER: !DOPART 2 is active
Reading nassau.csv FREE FORMAT skipping 1 lines
Univariate analysis of HT6
Summary of 6399 records retained of 6795 read
Model term Size #miss #zero MinNon0 Mean MaxNon0 StndDevn
1 Nfam 71 0 0 1 36.3379 71
2 Nfemale 26 0 0 1 12.8823 26
3 Nmale 37 0 0 1 15.2285 37
Warning: More levels found in Clone than specified
4 Clone 926 0 0 1 464.6765 926
Warning: Fewer levels found in MatOrder than specified
5 MatOrder 914 0 0 1 432.5760 860
6 rep 8 0 0 1 4.4837 8
7 iblk 80 0 0 1 40.1164 80
8 tree 0 0 1.0000 7.473 14.00 4.018
9 row 0 0 1.0000 28.52 56.00 16.09
10 col 0 0 1.0000 10.50 20.00 5.760
Warning: Fewer levels found in prop than specified
11 prop 2 0 0 1 1.0000 1
12 culture 2 0 0 1 1.4945 2
13 treat 2 0 0 1 1.4945 2
Warning: Fewer levels found in measure than specified
14 measure 2 0 0 1 1.0000 1
15 SURV 0 6 1.0000 0.9991 1.0000 0.3061E-01
16 DBH6 4 0 0.3000E-01 11.29 18.80 2.400
17 HT6 Variate 0 0 76.20 838.6 1286. 163.6
18 HT8 83 0 91.44 1148. 1576. 170.6
19 CWAC6 3167 0 97.54 301.3 542.5 52.26
20 mu 1
21 culture.rep 16 12 culture : 2 6 rep : 8
Warning: GRM matrix is too SMALL
22 grm1(Clone) 923
23 rep.iblk 640 6 rep : 8 7 iblk : 80
Forming 2508 equations: 19 dense.
Initial updates will be shrunk by factor 0.316
Notice: LogL values are reported relative to a base of -30000.000
Notice: 11 singularities detected in design matrix.
1 LogL=-2845.13 S2= 8956.4 6390 df
2 LogL=-2798.45 S2= 8568.1 6390 df
3 LogL=-2758.19 S2= 8131.3 6390 df
4 LogL=-2741.14 S2= 7766.2 6390 df
5 LogL=-2740.55 S2= 7702.9 6390 df
6 LogL=-2740.54 S2= 7700.1 6390 df
- - - Results from analysis of HT6 - - -
Akaike Information Criterion 65489.09 (assuming 4 parameters).
Bayesian Information Criterion 65516.14
Model_Term Gamma Sigma Sigma/SE % C
rep.iblk IDV_V 640 0.307847 2370.47 13.00 0 P
grm1(Clone) GRM_V 923 0.275811 2123.79 5.82 0 P
Clone IDV_V 926 0.152452 1173.90 6.08 0 P
Residual SCA_V 6399 1.000000 7700.14 49.64 0 P
Wald F statistics
Source of Variation NumDF F-inc
20 mu 1 0.11E+06
12 culture 1 2616.00
21 culture.rep 6 30.44
23 rep.iblk 640 effects fitted
22 grm1(Clone) 923 effects fitted
4 Clone 926 effects fitted ( 66 are zero)
78 possible outliers: see .res file
Finished: 16 Sep 2014 14:12:50.574 LogL Converged
Notes:
Of the 926 clones identified, 860 have data and 923 have genomic data.
The .res file contains additional details about the analysis
including a listing of the larger marker effects. All marker effects are
reported in the .mef file.
Particular columns of the .grr data can be included in the model using the
grr(Factor,i) model term, where Factor is the factor associated with the regressors and i
specifies the number of the regressor variable to include.
Listing of the larger marker effects
368 0-12761-01-121 1.40736 0.00000
617 0-14383-01-111 1.26081 0.00000
777 0-15417-01-138 -1.25597 0.00000
1246 0-18644-02-210 1.22522 0.00000
1903 0-6963-01-202 -1.24800 0.00000
2102 0-8683-02-432 1.15496 0.00000
2445 2-1563-02-244 -1.35181 0.00000
2497 2-2167-01-413 -1.21339 0.00000
3180 2-8668-03-42 -1.21629 0.00000
3521 CL1577Contig1-03 -1.15833 0.00000
3802 CL2573Contig1-03 1.17005 0.00000
4195 CL595Contig1-01- -1.19330 0.00000
4351 UMN-1397-01-416 -1.34916 0.00000