My Basis Function

!MBF

!MBF mbf(v,n) f [ !SKIP k ] [ !FACTOR ] [ !FIELD s ] [ !KEY k ] [ !NOKEY ] [ !RFIELD r ] [ !RENAME t ] [ !SPARSE ]
specified on a separate line after the datafile line predefines the model term mbf(v,n) as a set of n covariates indexed by the data values in variable v. MBF stands for My Basis Function. It uses the same mechanism as the leg(), pol() and spl() model functions, but the covariates are supplied by the user. It is used to read in specialized design matrices indexed by a factor in the data, such as genetic marker covariables.

In its basic form, the file f should contain 1+n fields. The first (KEY) field contains the values that occur in the data variable or at which prediction is required, and the remaining n fields define the corresponding covariate values.
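As an illustration only, the following Python sketch writes a small text MBF file in this basic layout: one line per key value, with the numeric key in the first field followed by its n covariate values. The file name markers.csv, the comma separator and the marker values are hypothetical.

```python
import csv


def write_mbf(path, rows):
    """Write a basic MBF-style file: key field first, then the n covariates.

    rows maps each numeric key value to a list of n covariate values.
    """
    widths = {len(values) for values in rows.values()}
    if len(widths) != 1:
        raise ValueError("every key must have the same number of covariates")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for key in sorted(rows):
            writer.writerow([key, *rows[key]])


# Hypothetical genotypes: key values 1..3, n = 4 marker covariates each.
write_mbf("markers.csv", {1: [0, 1, 2, 1], 2: [2, 2, 0, 1], 3: [1, 0, 0, 2]})
```

Each resulting line, for example 1,0,1,2,1, is one key value followed by its four covariates.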

The file may instead be binary: 32-bit real values, indicated by the file extension .bin, or 64-bit double precision values, indicated by the file extension .dbl. Files in these formats are easily created in a preliminary run using the !SAVE qualifier. The advantage of a binary file is that it is read much more quickly. This matters when the file has many fields and is accessed repeatedly, for example

 !CYCLE 1:1000
 !MBF mbf(Geno) markers.dbl !key 1 !RFIELD $I !rename M$I
 ... !r M$I
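As the example reads, the job is run once for each cycle value 1 to 1000 and $I stands for the current value, so every cycle re-reads markers.dbl to pick out a single field. The Python sketch below assumes $I is replaced textually and shows the !MBF line each cycle effectively runs with; the helper name expand_cycle is hypothetical.

```python
def expand_cycle(template, values):
    """Substitute each cycle value for $I in the template lines (assumed textual)."""
    return [[line.replace("$I", str(i)) for line in template] for i in values]


template = ["!MBF mbf(Geno) markers.dbl !key 1 !RFIELD $I !rename M$I"]
runs = expand_cycle(template, range(1, 1001))
print(runs[34][0])
# !MBF mbf(Geno) markers.dbl !key 1 !RFIELD 35 !rename M35
```

With 1000 cycles the file is read 1000 times, which is why the faster binary format pays off here.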

If n is omitted, all fields after the key field are taken, unless !FACTOR is specified, in which case n is 1 and the covariate values are treated as codes for the levels of a multilevel factor.
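To picture the effect of !FACTOR, the Python sketch below maps each data record through its key to a level code and turns that code into a 0/1 indicator column. This is an assumed, conceptual view of the resulting design matrix, not ASReml's internal implementation; the function name, keys and codes are hypothetical.

```python
def factor_design(key_to_code, data_keys):
    """Build an indicator (0/1) design from a key -> level-code mapping.

    Levels are the sorted distinct codes; there is one row per data record.
    """
    levels = sorted(set(key_to_code.values()))
    column = {code: j for j, code in enumerate(levels)}
    design = []
    for k in data_keys:
        row = [0] * len(levels)
        row[column[key_to_code[k]]] = 1  # mark the level this record's key codes to
        design.append(row)
    return levels, design


# Hypothetical MBF file read with !FACTOR: each key followed by one level code.
key_to_code = {1: 3, 2: 1, 3: 3, 4: 2}
levels, X = factor_design(key_to_code, [1, 2, 4, 3, 1])
```

Records with keys 1 and 3 share level code 3, so they receive the same indicator column.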

!SKIP k is an optional qualifier requesting that the first k lines of the file be ignored.

!RENAME t changes the name of the term from mbf(...) to the new name t. For example

  !MBF  mbf(entry)  mlib/m35.csv !rename Marker35

This is necessary when several mbf(...) terms are being defined which would otherwise have the same name/label.

If the key values are the ordered sequence 1:N, the key field may be omitted if !NOKEY is specified. If the key is not in the first field, its location can be specified with !KEY k.
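The Python sketch below mirrors how !SKIP k, !KEY k and !NOKEY determine the key for each line of a text file. It assumes whitespace- or comma-separated numeric fields and, following the rule that the fields after the key field are taken when n is omitted, treats the fields after the key as the covariates. The function name read_keyed_rows is hypothetical.

```python
def read_keyed_rows(lines, key=1, nokey=False, skip=0):
    """Return {key value: covariates} for an MBF-style text file.

    key   -- 1-based field holding the key (mirrors !KEY k)
    nokey -- if True, keys are implied as 1..N by row order (mirrors !NOKEY)
    skip  -- number of leading lines to ignore (mirrors !SKIP k)
    """
    rows = {}
    for i, line in enumerate(lines[skip:], start=1):
        fields = [float(x) for x in line.replace(",", " ").split()]
        if nokey:
            rows[i] = fields  # key is the row number 1..N
        else:
            rows[fields[key - 1]] = fields[key:]  # covariates follow the key field
    return rows
```

For example, read_keyed_rows(["# header", "10 1 0 2"], skip=1) keys the covariates [1.0, 0.0, 2.0] by 10.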

To extract a single covariate from a file containing many, give the field to extract either in absolute terms with !FIELD s, or relative to the key field with !RFIELD r. For example

 !MBF mbf(variety,1) markers.csv !key 1 !RFIELD 35 !rename Marker35
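Assuming the relative field r counts from the key field, so that the absolute field is k + r, the example above reads field 36, the 35th covariate after the key. The Python sketch below expresses that arithmetic; the helper names absolute_field and extract are hypothetical.

```python
def absolute_field(key=1, field=None, rfield=None):
    """1-based field number selected by !FIELD s or !RFIELD r (assumed key + r)."""
    if (field is None) == (rfield is None):
        raise ValueError("give exactly one of field or rfield")
    return field if field is not None else key + rfield


def extract(line, key=1, field=None, rfield=None):
    """Return (key value, selected covariate) from one comma- or space-separated line."""
    fields = line.replace(",", " ").split()
    s = absolute_field(key, field, rfield)
    return float(fields[key - 1]), float(fields[s - 1])
```

Under this assumption, !key 1 !RFIELD 35 and !FIELD 36 select the same column.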

!SPARSE can be used when the covariates are predominantly zero. Each key value is followed by as many column,value pairs as are required to specify the nonzero elements of the design for that value of the key. The pairs should be arranged in increasing order of column within rows. A row may be continued on subsequent lines of the file, provided each incomplete line ends with a COMMA.
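The Python sketch below encodes dense covariate rows in this sparse layout and reads them back. For illustration it assumes comma separators throughout and 1-based column numbers; per_line, to_sparse_lines and parse_sparse_lines are hypothetical names, and the exact separators ASReml accepts should be checked.

```python
def to_sparse_lines(rows, per_line=3):
    """Encode {key: dense covariates} as key followed by column,value pairs.

    Only nonzero entries are written, in increasing (1-based) column order.
    A row with more than per_line pairs continues on the next line, and each
    incomplete line ends with a comma.
    """
    lines = []
    for key in sorted(rows):
        pairs = [f"{j},{v}" for j, v in enumerate(rows[key], start=1) if v != 0]
        chunks = [pairs[i:i + per_line] for i in range(0, len(pairs), per_line)] or [[]]
        for c, chunk in enumerate(chunks):
            text = ",".join(([str(key)] if c == 0 else []) + chunk)
            if c < len(chunks) - 1:
                text += ","  # incomplete line: the row continues below
            lines.append(text)
    return lines


def parse_sparse_lines(lines, ncols):
    """Rebuild {key: dense covariates} from sparse lines, joining continuations."""
    rows, buf = {}, ""
    for line in lines:
        buf += line.strip()
        if buf.endswith(","):
            continue  # row is incomplete, keep reading
        key, *rest = buf.split(",")
        dense = [0.0] * ncols
        for j, v in zip(rest[::2], rest[1::2]):
            dense[int(j) - 1] = float(v)
        rows[int(key)] = dense
        buf = ""
    return rows
```

A key whose covariates are all zero is written as the key alone, with no trailing comma, so it is not mistaken for a continued row.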

Restrictions:

The key field MUST be numeric. In particular, if the data field it relates to is an !A or !I encoded factor, the original (uncoded) level labels may not be specified in the MBF file; the coded levels must be specified instead. Because the MBF file is processed before the data file is read, the mapping to coded levels has not yet been defined in ASReml when the MBF file is processed, so the user must anticipate what it will be.
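One way to anticipate the coding is to reproduce it from the data file before writing the MBF file. The Python sketch below ASSUMES, for illustration only, that levels are coded 1, 2, ... in order of first appearance in the data; verify this against the coding ASReml actually uses for your factor before relying on it. The function names and labels are hypothetical.

```python
def anticipate_codes(data_labels):
    """Map original level labels to codes 1, 2, ... by order of first appearance (assumed)."""
    codes = {}
    for label in data_labels:
        if label not in codes:
            codes[label] = len(codes) + 1
    return codes


def recode_mbf_keys(mbf_rows, codes):
    """Replace original labels in the MBF key field with the anticipated numeric codes."""
    return {codes[label]: values for label, values in mbf_rows.items()}


# Hypothetical factor column from the data file, in record order.
codes = anticipate_codes(["B7", "A2", "B7", "C1"])
numeric_rows = recode_mbf_keys({"A2": [0, 1], "B7": [2, 0], "C1": [1, 1]}, codes)
```

A label present in the MBF rows but absent from the data raises a KeyError here, flagging a key that could not be coded.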

Comment:

If the MBF process is to be used repeatedly, ASReml will generally run much faster if the markers are written to separate files. For example, reading 10 files that each contain a single field is much faster than reading a single file containing 400 fields ten times to extract 10 different markers.
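A preliminary split can be done outside ASReml. The Python sketch below writes each covariate column of a comma-separated file to its own key,value file, named in the style of the earlier example mlib/m35.csv. It assumes the covariates follow the key field; the source file name and split_markers are hypothetical.

```python
import csv
from pathlib import Path


def split_markers(src, outdir, key_field=1):
    """Write each covariate column of src to outdir/m<j>.csv as key,value lines."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    with open(src, newline="") as f:
        rows = [r for r in csv.reader(f) if r]  # skip blank lines
    ncov = len(rows[0]) - key_field  # covariates are the fields after the key
    paths = []
    for m in range(1, ncov + 1):
        path = outdir / f"m{m}.csv"
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            for r in rows:
                writer.writerow([r[key_field - 1], r[key_field - 1 + m]])
        paths.append(path)
    return paths
```

Each single-field file can then be read with a line such as the !MBF mbf(entry) mlib/m35.csv !rename Marker35 example above.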

When missing values occur in the design, ASReml reports this and aborts the job unless !MVINCLUDE is specified, in which case missing values are treated as zeros. Use the !DV transformation to drop the records with the missing values.
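The Python sketch below mimics the two behaviours just described: stop when the design contains missing values, or replace them by zeros when the equivalent of !MVINCLUDE is requested. Representing a missing value as None, and the name prepare_design, are assumptions for illustration.

```python
def prepare_design(rows, mvinclude=False):
    """Handle missing (None) covariates as described for !MBF.

    Without mvinclude, any missing value is an error (the job would abort);
    with mvinclude, missing values are replaced by zeros.
    """
    missing = [(k, j) for k, values in rows.items()
               for j, v in enumerate(values, start=1) if v is None]
    if missing and not mvinclude:
        raise ValueError(f"missing values in design at (key, column): {missing}")
    return {k: [0.0 if v is None else v for v in values] for k, values in rows.items()}
```

Treating missing values as zeros keeps every record in the analysis, whereas dropping the affected records removes them entirely.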
