Genetic Marker operators

Setting marker positions

!MMAP s assigns Haldane map positions (s) to marker variables and imputes values to the markers where they have missing values. This transformation will normally be used on a !G n factor where the n variables are the marker states for n markers in a linkage group in map order and coded [-1,1] (backcross) or [-1,0,1] (F2 design). s (length n+1) should be the n marker positions relative to a left telomere position of zero, and an extra value being the length of the linkage group (the position of the right telomere). The length (right telomere) may be omitted in which case the last marker is taken as the end of the linkage group. The positions may be given in Morgans or centiMorgans (if the length is greater than 10, it will be divided by 100 to convert to Morgans). Further details are provided in the User Guide.
 ChrAadd !G 10 !MM 1 ...
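
The unit rule and the Haldane map function mentioned above can be illustrated with a short sketch. This is not ASReml code: the function names and the example positions are hypothetical, and it assumes that every position in the list (not only the length) is rescaled when the length exceeds 10.

```python
import math

def to_morgans(positions):
    """Apply the unit rule described above: if the linkage-group length
    (the last value) exceeds 10, treat the values as centiMorgans and
    divide by 100. Rescaling every position is an assumption."""
    length = positions[-1]
    scale = 100.0 if length > 10 else 1.0
    return [p / scale for p in positions]

def haldane_recombination(d):
    """Haldane map function: recombination fraction for map distance d (Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

# Hypothetical map: three markers plus the right telomere, given in centiMorgans.
positions_cm = [10.0, 35.0, 80.0, 120.0]
positions = to_morgans(positions_cm)

# Recombination fractions between adjacent positions on the map.
gaps = [b - a for a, b in zip(positions, positions[1:])]
fractions = [haldane_recombination(d) for d in gaps]
print(positions)
print(fractions)
```

Under the Haldane function the recombination fraction rises from 0 for coincident loci towards 0.5 for distant ones.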

Dominance variables

!DOM A forms dominance covariables from a set of additive marker covariables previously declared with the !MM marker map qualifier. It assumes the argument A is an existing group of marker variables, relating to a linkage group defined using !MM, which represents additive marker variation coded [-1, 0, 1] (representing marker states aa, aA and AA respectively). It is a group transformation which takes the values X in the interval [-1,1] and calculates (|X|-0.5)*2, so that -1 and 1 become 1 and 0 becomes -1. The marker map is also copied and applied to this model term so that it can be the argument in a qtl().
  ChrAdom !DOM ChrAadd
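
The arithmetic of the dominance transformation can be checked with a minimal sketch outside ASReml. The function name and the example scores are hypothetical.

```python
def dominance_score(x):
    """Map an additive marker score in [-1, 1] to a dominance score
    using the formula quoted above: (|X| - 0.5) * 2."""
    return (abs(x) - 0.5) * 2

# Hypothetical additive scores for five individuals at one marker
# (aa = -1, aA = 0, AA = 1).
additive = [-1, 0, 1, 1, 0]
dominance = [dominance_score(x) for x in additive]
print(dominance)  # [1.0, -1.0, 1.0, 1.0, -1.0]
```

Both homozygotes map to 1 and the heterozygote to -1; an imputed additive score such as 0.5 maps to 0.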

Large Random Regression (GBLUP from large marker files)

One use of the GRM matrix is to fit a Genomic model where G=MM'/s, M is a centred matrix of marker scores (0, 1, 2) and s = 2Σ p_i(1-p_i), where p_i is the proportion for the minor allele of marker i. The marker file is recognised by the filename extension .mkr and is specified after any pedigree file and before the data file (with any other GRM files). There may only be one .mkr file. The standard format is to have SNP markers coded in the columns, Genotypes in the rows, the first column identifying (labelling) the Genotypes and the first row identifying the markers. ASReml does not utilize the information in the first row or first column. The syntax for the line is
SNP.mkr !IDS n !MARKERS m [!CENTRE !CSKIP c !SKIP s !HEADER h] where
SNP.mkr is the name of the marker file
!IDS n informs ASReml to expect markers for n genotypes
!MARKERS m informs ASReml to expect data for m markers
!CENTRE requests ASReml to centre the marker scores at the mean; this also reverses the coding if the mean is greater than one.
!CSKIP c tells ASReml to skip the first c columns in the file. Use it if the first field is not integer, or there is more than 1 field to skip.
!SKIP s tells ASReml to skip the first s rows in the file.
!HEADER 0 indicates that a numeric header line is not present. If the header is alphanumeric, use !SKIP 1 !HEADER 0.

Missing values in the marker file may be represented by *, NA or any number outside the range (-2,2). They are replaced by the mean of the marked levels.
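
The construction of G described above can be sketched in plain Python for a tiny marker matrix. This is an illustration only, not ASReml's implementation: the function name and the data are hypothetical, and it assumes the allele proportion for each marker is estimated as half the mean score and that missing scores are replaced by the marker mean before centring.

```python
def genomic_matrix(scores):
    """Form G = MM'/s from marker scores coded 0/1/2.

    scores: list of rows (one per genotype), each a list of marker
    scores, with None marking a missing value.
    Returns (G, s) where s = 2 * sum_i p_i * (1 - p_i).
    """
    n_markers = len(scores[0])

    # Marker means over the observed (non-missing) scores.
    means = []
    for j in range(n_markers):
        observed = [row[j] for row in scores if row[j] is not None]
        means.append(sum(observed) / len(observed))

    # Replace missing values by the marker mean, then centre at that mean.
    centred = [
        [(row[j] if row[j] is not None else means[j]) - means[j]
         for j in range(n_markers)]
        for row in scores
    ]

    # Assumption: p_i is estimated as half the mean score of marker i.
    # p(1-p) is symmetric, so it does not matter which allele is counted.
    s = 2.0 * sum((mu / 2.0) * (1.0 - mu / 2.0) for mu in means)

    # G = MM'/s, one entry per pair of genotypes.
    G = [[sum(a * b for a, b in zip(ri, rk)) / s for rk in centred]
         for ri in centred]
    return G, s

# Hypothetical scores: 3 genotypes by 3 markers, one missing value.
example = [[0, 1, 2],
           [1, 1, None],
           [2, 1, 0]]
G, s = genomic_matrix(example)
print(s)
for row in G:
    print(row)
```

A marker file in which every marker is fixed at 0 or 2 would give s = 0; the sketch does not guard against that case.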

Example


 !WORK 1
 Nassau Clone Data
 Nfam 71 !A
 Nfemale 26 !A
 Nmale 37 !A
 Nclone 857 !A   !L  Clones.txt !LSKIP 1
 rep 8 !A
 iblk 80 !A
 culture 2 !A
 DBH6	

 snpData.mkr  !SKIP 1 !HEAD 0  !CENTRE   !MARKERS 4854 !IDS 923

 nassau.csv !MAXIT 30 !SKIP 1 !DFF -1
 DBH6 ~  mu culture culture.rep !r grm1(Nclo) 0.27 Nclone 0.15 rep.iblk 0.31
Clones.txt contains the genotype identifiers (as used in the data file) as a list in the first field, in the order of the marker file snpData.mkr, which looks like
 Genotype,0-10024-01-114,0-10037-01-257,0-10040-02-394,...
 140099,2,2,1,2,2,2,2,2,2,1,2,1,2,1,1,2,1,2,2,2,2,2,1,2...
 141099,2,2,0,0,2,2,1,2,2,1,2,1,2,2,0,2,2,2,2,1,2,2,1,1...
 ...
 547853,2,2,1,2,2,2,1,2,2,0,2,1,2,2,2,2,2,2,2,1,2,...
 547966,2,2,1,1,1,2,0,2,2,1,2,2,2,2,2,2,2,2,2,1,2,...
 548082,2,2,1,2,2,2,1,2,1,2,2,1,2,2,1,2,2,2,2,1,2,...
Using GRM matrices

    One use of the GRM matrix is to allow more computationally efficient fitting of random regression models that associate u, a vector of f factor effects, with v, a vector of m regression effects, through the model u=Mv, where the matrix M contains m regressor variables for each of the f levels of the factor. Direct fitting of the regression effects is facilitated by the my basis function (mbf function), which associates the regressor variables with the levels of the factor, essentially fitting ZMv where Z is the design matrix linking observations to the levels of the factor. But if m is much bigger than f, it is more computationally efficient to fit an equivalent model Zu with a variance structure for u based on MM'. ASReml can read the matrix M associated with a factor and group of regressor variables from a .grr file, construct a GRM matrix (G=MM'/s), fit the equivalent model and report both factor and regressor predictions. One common case of this model is when u represents genotype effects, the regressors represent SNP marker counts (typically 0/1/2) and v are marker effects.

    The .grr file is specified after any pedigree file and before the data file (with any other GRM files). There may only be one .grr file. It is assumed to contain a row for each level of the factor, each row containing m regressor values. Optionally the factor level name associated with the i-th row can be included before the relevant regressor values. Also a heading row might include a name for each field/regressor variable. Superfluous fields before the factor or regressor fields can be skipped and superfluous rows before the regressor information can be skipped.

    The syntax for specifying and reading the .grr file is
    M.grr [!CSKIP c1] Factor [f] [!NOID] [!CSKIP c2] Regressors [m] [!NONAMES] [!SKIP s]
    where
    M.grr is the name of the file to be read,
    !CSKIP c1 indicates c1 fields are to be skipped before the factor identifiers are read,
    Factor is the name of the variable in the data that is associated with the regressors,
    f sets the maximum number of levels (default 1000) of Factor with regressor data; ASReml will count the actual number,
    !NOID indicates that the factor identifiers are not present in the .grr file,
    !CSKIP c2 indicates c2 fields are to be skipped before the regressor variables are read,
    Regressors is the name for the set of regressor variables,
    m sets the number of regressor variables (default is the number of names found); must be set if there are extraneous fields to be ignored,
    !SKIP s specifies how many lines are to be skipped before reading the regressor data,
    !NONAMES indicates there is no line containing the individual names of the regressor variables; otherwise names are taken from the first (non-skipped) line in the file.

    If the factor identifiers are not present ( !NOID), ASReml assumes that the order of the factor classes in the data file matches the order in the .grr file. If the factor identifiers are present, ASReml uses the identifiers obtained from the .grr file to define the order of the factor classes when the data is read; any extra identifiers in the data not in the .grr file are appended at the end of the factor level name list. If !NOID is set, identifiers in the .grr file are not needed and if present should be skipped using !CSKIP.
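
    The ordering rule for factor levels when identifiers are present can be sketched as follows. This is an illustration of the rule as stated, not ASReml code; the function name and the identifier 999001 are hypothetical.

```python
def factor_levels(grr_ids, data_ids):
    """Return the factor level order: identifiers from the .grr file first,
    in file order, then any extra identifiers met in the data file,
    appended in order of first appearance."""
    levels = list(grr_ids)
    seen = set(levels)
    for ident in data_ids:
        if ident not in seen:
            levels.append(ident)
            seen.add(ident)
    return levels

# Identifiers as they might appear in the .grr file and in the data file.
grr_ids = ["140099", "141099", "547853"]
data_ids = ["141099", "999001", "140099", "999001"]
print(factor_levels(grr_ids, data_ids))
# ['140099', '141099', '547853', '999001']
```

    Levels that occur only in the data file have no regressor data, which is why a factor can end up with more levels than the .grr file has rows.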

    Values are typically TAB, COMMA or SPACE separated but may be packed (no separator) when all values are integers 0/1/2. Missing values in the regression variables may be represented by * or NA. Invalid data is also treated as missing. Missing values are replaced by the mean of the respective regressor. Alternative missing data methods that involve imputation from neighbouring markers have not been implemented.

    Some general qualifiers are:
    !SAVEGIV instructs ASReml to write the G matrix in .dgiv format,
    !PSD s declares that the derived variance matrix may have up to s singularities,
    !PEV requests calculation of Prediction Error Variance of marker effects which are reported in the .mef file. Calculation of Prediction error variances is computationally very expensive,
    !CENTRE [c] requests ASReml to centre the regressors at c if c is specified, else at the individual regressor means; without !CENTRE the G matrix is formed from uncentred regressors.

    Other qualifiers relate specifically to whether the regressors are markers. Markers are typically coded 0/1/2 being counts of the minor allele. However, if they are imputed, they will take real values between 0 and 2. Since marker files may be huge,
    !SMODE b sets the storage mode for the regressor data, indicating whether it is marker data: b = 2 sets 2bit storage for strictly 0/1/2 marker data, b=8 (the default) sets 8bit storage useful for marker data with imputed values having 2 digits after the decimal, b = 16 sets 16bit storage useful for marker data with imputation with more than 2 digits and b = 32 sets 32bit real storage and should be used for non-marker data,
    !RANGE l h indicates the marker scores range l:h and are to be transformed to have a range 0:2,
    !GSCALE s controls the scaling of the GRM matrix. If unspecified, s=Σ 2p(1-p) is used for marker data and s=1 for non-marker data ( !SMODE 32). Scaling is often used with centred marker data to scale the MM' matrix so that it is a genomic matrix.
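
    The text does not spell out how !RANGE l h maps scores onto 0:2. A minimal sketch, assuming a simple linear rescaling (an assumption, not documented behaviour), is given below; the function name and the example coding are hypothetical.

```python
def rescale_to_0_2(x, low, high):
    """Linearly map a score from the range low:high onto 0:2.
    The linear form is an assumption about what !RANGE does."""
    return 2.0 * (x - low) / (high - low)

# Hypothetical markers coded -1/0/1, as if declared with !RANGE -1 1.
scores = [-1, 0, 1]
print([rescale_to_0_2(x, -1, 1) for x in scores])  # [0.0, 1.0, 2.0]
```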

    Example

    In this forestry example, there are multiple trees per clone and so two clone model terms;
    grm1(Clone) fits the additive genetic variance based on marker covariance,
    Clone fits the non-additive variance, while
    the residual represents the within clone sampling variance.
     !WORK 1
     Nassau Clone Data
      Nfam 71 !A
      Nfemale 26 !A
      Nmale 37 !A
      Clone  !A 860
      rep 8
      iblk 80
      culture  !A
      DBH6
    
     snpData.grr Clone
    
     nassau.csv !MAXIT 30 !SKIP 1 !DFF -1
     DBH6 ~  mu culture/rep  !r grm1(Clon) 0.27 Clone 0.15 rep.iblk 0.31
    
    where snpData.grr is first used to declare Clone identifiers (taken from the first field) in the correct order, and then contains the marker scores; it looks like
     Genotype,0-10024-01-114,0-10037-01-257,0-10040-02-394,...
     140099,2,2,1,2,2,2,2,2,2,1,2,1,2,1,1,2,1,2,2,2,2,2,1,2...
     141099,2,2,0,0,2,2,1,2,2,1,2,1,2,2,0,2,2,2,2,1,2,2,1,1...
     ...
     547853,2,2,1,2,2,2,1,2,2,0,2,1,2,2,2,2,2,2,2,1,2,...
     547966,2,2,1,1,1,2,0,2,2,1,2,2,2,2,2,2,2,2,2,1,2,...
     548082,2,2,1,2,2,2,1,2,1,2,2,1,2,2,1,2,2,2,2,1,2,...
    
    The primary output follows.
     ASReml 4.1 [01 Apr 2014] Testing Pedigree Matrices against Marker Matrices for Variance Partition with Na
       Build lg [15 Sep 2014]   64 bit  Windows x64
     16 Sep 2014 14:11:26.277   1024 Mbyte  clonesHT6_2/clones
    ..
     Nfam 71 !A
     Nfemale 26 !A
     Nmale 37 !A
     Clone !A 860
     MatOrder 914 !A
     rep 8 !A
     iblk 80 !A
     prop 1 !A
     culture 2 !A
     treat 2 !A
     measure 1 !A
     CWAC6 !M-9
     Class names for factor "Clone" are initialized from the .grr file.
     Marker Header: Genotype,0-10024-01-114,0-10037-01-257,0
     4854 Marker labels found
     Marker labels 0-10024-01-114 ... UMN-CL98Contig1-
     Notice: SNP data begins: 140099,2,2,1,2,2,2,2,2,2,1,2,1,2,1,1,
     Notice: Markers coded -9 treated as missing.
     Marker data [0/1/2] for 923 genotypes and 4854 markers read from snpData.grr
     160414 missing marker values ( 3.6%) replaced by column average!
     Marker values ranged 0.00 to 2.00
     Marker Means ranged 1.00 to 2.00
     Sigma2p(1-p) is 1057.12515
     GIV1 snpData.grr 923 9 -947.91
     QUALIFIERS: !MAXIT 30 !SKIP 1 !DFF -1
     QUALIFIER: !DOPART 2 is active
     Reading nassau.csv FREE FORMAT skipping 1 lines
     Univariate analysis of HT6
     Summary of 6399 records retained of 6795 read
     Model term           Size #miss #zero    MinNon0      Mean    MaxNon0   StndDevn
      1 Nfam                71     0     0          1   36.3379         71
      2 Nfemale             26     0     0          1   12.8823         26
      3 Nmale               37     0     0          1   15.2285         37
     Warning: More levels found in Clone than specified
      4 Clone              926     0     0          1  464.6765        926
     Warning: Fewer levels found in MatOrder than specified
      5 MatOrder           914     0     0          1  432.5760        860
      6 rep                  8     0     0          1    4.4837          8
      7 iblk                80     0     0          1   40.1164         80
      8 tree                       0     0     1.0000     7.473      14.00      4.018
      9 row                        0     0     1.0000     28.52      56.00      16.09
     10 col                        0     0     1.0000     10.50      20.00      5.760
     Warning: Fewer levels found in prop than specified
     11 prop                 2     0     0          1    1.0000          1
     12 culture              2     0     0          1    1.4945          2
     13 treat                2     0     0          1    1.4945          2
     Warning: Fewer levels found in measure than specified
     14 measure              2     0     0          1    1.0000          1
     15 SURV                       0     6     1.0000    0.9991     1.0000 0.3061E-01
     16 DBH6                       4     0 0.3000E-01     11.29      18.80      2.400
     17 HT6     Variate            0     0      76.20     838.6      1286.      163.6
     18 HT8                       83     0      91.44     1148.      1576.      170.6
     19 CWAC6                   3167     0      97.54     301.3      542.5      52.26
     20 mu                   1
     21 culture.rep         16  12 culture : 2   6 rep : 8
     Warning: GRM matrix is too SMALL
     22 grm1(Clone)        923
     23 rep.iblk           640   6 rep : 8   7 iblk : 80
     Forming 2508 equations: 19 dense.
     Initial updates will be shrunk by factor 0.316
     Notice: LogL values are reported relative to a base of -30000.000
     Notice: 11 singularities detected in design matrix.
     1 LogL=-2845.13 S2= 8956.4 6390 df
     2 LogL=-2798.45 S2= 8568.1 6390 df
     3 LogL=-2758.19 S2= 8131.3 6390 df
     4 LogL=-2741.14 S2= 7766.2 6390 df
     5 LogL=-2740.55 S2= 7702.9 6390 df
     6 LogL=-2740.54 S2= 7700.1 6390 df
     - - - Results from analysis of HT6 - - -
     Akaike Information Criterion 65489.09 (assuming 4 parameters).
     Bayesian Information Criterion 65516.14
     Model_Term                     Gamma      Sigma  Sigma/SE  % C
     rep.iblk      IDV_V   640   0.307847    2370.47     13.00  0 P
     grm1(Clone)   GRM_V   923   0.275811    2123.79      5.82  0 P
     Clone         IDV_V   926   0.152452    1173.90      6.08  0 P
     Residual      SCA_V  6399   1.000000    7700.14     49.64  0 P
     Wald F statistics
     Source of Variation    NumDF      F-inc
     20 mu                      1   0.11E+06
     12 culture                 1    2616.00
     21 culture.rep             6      30.44
     23 rep.iblk              640 effects fitted
     22 grm1(Clone)           923 effects fitted
      4 Clone                 926 effects fitted ( 66 are zero)
     78 possible outliers: see .res file
     Finished: 16 Sep 2014 14:12:50.574 LogL Converged
    Notes:
    Of 926 clones identified, 860 have data and 923 have genomic data.
    The .res file contains additional details about the analysis including a listing of the larger marker effects. All marker effects are reported in the .mef file.
    Particular columns of the .grr data can be included in the model using the grr(Factor,i) model term, where Factor is the factor associated with the regressors and i specifies which regressor variable (by number) to include.
     Listing of the larger marker effects
           368  0-12761-01-121     1.40736       0.00000
           617  0-14383-01-111     1.26081       0.00000
           777  0-15417-01-138    -1.25597       0.00000
          1246  0-18644-02-210     1.22522       0.00000
          1903  0-6963-01-202     -1.24800       0.00000
          2102  0-8683-02-432      1.15496       0.00000
          2445  2-1563-02-244     -1.35181       0.00000
          2497  2-2167-01-413     -1.21339       0.00000
          3180  2-8668-03-42      -1.21629       0.00000
          3521  CL1577Contig1-03  -1.15833       0.00000
          3802  CL2573Contig1-03   1.17005       0.00000
          4195  CL595Contig1-01-  -1.19330       0.00000
          4351  UMN-1397-01-416   -1.34916       0.00000
    
