My Basis Function
!MBF
!MBF mbf(v,n) f [ !SKIP k ]
[ !FACTOR ]
[ !FIELD s ]
[ !KEY k ]
[ !NOKEY ]
[ !RFIELD r ]
[ !RENAME t ]
[ !SPARSE ]
Specified on a separate line after the datafile line, !MBF predefines the model term mbf(v,n) as a set of n covariates indexed by the data values in variable v.
MBF stands for My Basis Function. It uses the same mechanism as the leg(), pol() and spl() model functions, but with covariates supplied by the user. It is used to read in specialized design matrices indexed by a factor in the data, including genetic marker covariables.
In its basic form, the file f should contain 1+n fields. The first (KEY) field contains the values that occur in the data variable v, or at which prediction is required, and the remaining n fields define the corresponding covariate values.
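As a minimal sketch, assuming a plain comma-separated text file, the basic layout (one line per key value: the key, then its n covariates) could be generated in Python as follows. The helper name `write_mbf` and the marker values are hypothetical; the keys are kept numeric.

```python
import csv
import io


def write_mbf(rows, out):
    """Write one line per key value: the key first, then its n covariates.

    rows: dict mapping a numeric key to a list of n covariate values.
    out:  any writable text stream (an open file or io.StringIO).
    """
    writer = csv.writer(out, lineterminator="\n")
    for key in sorted(rows):
        writer.writerow([key] + list(rows[key]))


# Hypothetical data: 3 genotypes coded 1..3, each with n = 4 marker covariates.
markers = {
    1: [0, 1, 2, 1],
    2: [2, 2, 0, 1],
    3: [1, 0, 0, 2],
}
buf = io.StringIO()
write_mbf(markers, buf)
print(buf.getvalue(), end="")
```

Passing `open("markers.csv", "w", newline="")` instead of `io.StringIO()` would write a file that could then be named as f on the !MBF line.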
The file may also be in binary format: 32-bit real values, indicated by the file extension .bin, or 64-bit double precision values, indicated by the extension .dbl. Files in these formats are easily created in a preliminary run using the !SAVE qualifier.
The advantage of a binary file is that it is read much more quickly. This matters when the file has many fields and is accessed repeatedly, for example:
!CYCLE 1:1000
!MBF mbf(Geno) markers.dbl !key 1 !RFIELD $I !rename M$I
... !r M$I
If n is omitted, all fields after the key field are taken, unless !FACTOR is specified, in which case n is 1 and the covariate values are treated as coding for a multilevel factor.
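Conceptually, a single field that codes a multilevel factor corresponds to one indicator (0/1) design row per key. The sketch below shows that standard expansion. It illustrates factor coding under the assumption that level codes run 1..L; it is not a description of ASReml's internal implementation, and `factor_design` is a hypothetical name.

```python
def factor_design(codes, n_levels):
    """Expand a factor-coding field into indicator (0/1) columns.

    codes:    dict mapping key -> integer level code, assumed to lie in 1..n_levels.
    n_levels: number of factor levels.
    Returns a dict mapping key -> list of n_levels indicators.
    """
    design = {}
    for key, level in codes.items():
        if not 1 <= level <= n_levels:
            raise ValueError(f"level {level} for key {key} is outside 1..{n_levels}")
        row = [0] * n_levels
        row[level - 1] = 1  # exactly one 1 per key, in that level's column
        design[key] = row
    return design


# Hypothetical: four genotypes (keys 1..4) assigned to three families.
print(factor_design({1: 2, 2: 1, 3: 2, 4: 3}, 3))
# {1: [0, 1, 0], 2: [1, 0, 0], 3: [0, 1, 0], 4: [0, 0, 1]}
```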
!SKIP k is an optional qualifier requesting that the first k lines of the file be ignored.
!RENAME t changes the name of the term from mbf(...) to the new name t. For example
!MBF mbf(entry) mlib/m35.csv !rename Marker35
This is necessary when several mbf(...) terms are
being defined which would otherwise have the same name/label.
If the key values are the ordered sequence 1:N, the key field may be omitted if !NOKEY is specified. If the key is not in the first field, its location can be specified with !KEY k.
To extract a single covariate from a large set of covariates in the file, give the field to extract either in absolute terms with !FIELD s, or relative to the key field with !RFIELD r.
For example
!MBF mbf(variety,1) markers.csv !key 1 !RFIELD 35 !rename Marker35
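How !SKIP, !KEY, !NOKEY, !FIELD and !RFIELD pick values out of a text file can be sketched as below. This is a hypothetical helper (`select_mbf`), not ASReml code. It assumes comma-separated fields numbered from 1, and it assumes !RFIELD r refers to field k + r when the key is in field k, which is consistent with `!key 1 !RFIELD 35` yielding Marker35 in the example above.

```python
import csv


def select_mbf(lines, skip=0, key=1, field=None, rfield=None, nokey=False):
    """Hypothetical mimic of the field-selection qualifiers.

    lines:  text lines of a comma-separated MBF file.
    skip:   like !SKIP k, ignore the first `skip` lines.
    key:    like !KEY k, the 1-based field holding the key.
    field:  like !FIELD s, take only absolute field s.
    rfield: like !RFIELD r, take only the field r places after the key.
    nokey:  like !NOKEY, no key field; rows are keyed 1, 2, ..., N.
    Returns a dict mapping key value -> list of selected covariates.
    """
    table = {}
    for i, row in enumerate(csv.reader(lines[skip:]), start=1):
        vals = [float(v) for v in row]  # every field must be numeric
        if nokey:
            # Assumption: with no key field, rfield counts from the line start.
            k, after_key = i, vals
        else:
            k, after_key = vals[key - 1], vals[key:]
        if field is not None:
            table[k] = [vals[field - 1]]
        elif rfield is not None:
            table[k] = [after_key[rfield - 1]]
        else:
            table[k] = after_key  # default: every field after the key
    return table


lines = ["id,m1,m2,m3", "1,0,1,2", "2,2,0,1", "3,1,1,0"]
print(select_mbf(lines, skip=1, rfield=2))  # like !SKIP 1 !KEY 1 !RFIELD 2
# {1.0: [1.0], 2.0: [0.0], 3.0: [1.0]}
```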
!SPARSE can be used when the covariates are predominantly zero. Each key value is followed by as many column,value pairs as are required to specify the nonzero elements of the design for that value of key. The pairs should be arranged in increasing order of column within rows. A row may be continued on subsequent lines of the file provided each incomplete line ends with a comma.
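A reader for the !SPARSE layout might look like the following sketch. It assumes comma-separated fields and 1-based column numbers, takes the total number of columns n as an argument, and expands each row to dense form so the layout is easy to check. `parse_sparse_mbf` is a hypothetical name, not part of ASReml.

```python
def parse_sparse_mbf(lines, n):
    """Read key, column, value, column, value, ... rows into dense rows.

    A line ending in a comma is continued on the next line.
    Columns are assumed 1-based and must increase within a row.
    Returns a dict mapping key -> list of n values (zeros elsewhere).
    """
    records, pending = [], ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pending += line
        if pending.endswith(","):
            continue  # incomplete row: the next line continues it
        records.append(pending)
        pending = ""
    if pending:
        raise ValueError("last row ends with a comma but has no continuation")

    design = {}
    for rec in records:
        fields = [f.strip() for f in rec.split(",")]
        key, pairs = float(fields[0]), fields[1:]
        if len(pairs) % 2:
            raise ValueError(f"unpaired column/value entry for key {key}")
        row, last_col = [0.0] * n, 0
        for col_txt, val_txt in zip(pairs[0::2], pairs[1::2]):
            col = int(col_txt)
            if not last_col < col <= n:
                raise ValueError(f"column {col} out of order or range for key {key}")
            row[col - 1] = float(val_txt)
            last_col = col
        design[key] = row
    return design


# Hypothetical file: key 2's row is continued on the following line.
sparse = ["1,2,1.0,5,2.0", "2,1,1.0,", "3,1.0", "3,4,0.5"]
for key, row in parse_sparse_mbf(sparse, 5).items():
    print(key, row)
```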
Restrictions:
The key field MUST be numeric. In particular, if the data field it relates to is an !A or !I encoded factor, the original (uncoded) level labels may not be specified in the MBF file; the coded levels must be specified instead. The MBF file is processed before the data file is read, so the mapping to coded levels has not yet been defined in ASReml when the MBF file is processed. The user must therefore anticipate what that mapping will be.
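One way to anticipate the coding is sketched below. It assumes that the levels of the !A or !I factor are coded 1, 2, ... in order of first appearance in the data file (check this against the coding ASReml actually uses for your factor), and rewrites a label-keyed table with those codes. The names `anticipate_codes` and `recode_keys` and the genotype labels are hypothetical.

```python
def anticipate_codes(data_labels):
    """Code level labels 1, 2, ... in order of first appearance.

    ASSUMPTION: this mirrors the default coding of the !A/!I factor;
    confirm it for your own job before relying on it.
    """
    codes = {}
    for label in data_labels:
        if label not in codes:
            codes[label] = len(codes) + 1
    return codes


def recode_keys(label_rows, codes):
    """Replace label keys by the anticipated numeric codes."""
    missing = [lab for lab in label_rows if lab not in codes]
    if missing:
        raise ValueError(f"labels not found in the data: {missing}")
    return {codes[lab]: vals for lab, vals in label_rows.items()}


# Hypothetical genotype column, in data-file order, and label-keyed markers.
geno_column = ["G7", "G2", "G7", "G9", "G2"]
codes = anticipate_codes(geno_column)
print(codes)  # {'G7': 1, 'G2': 2, 'G9': 3}
keyed = recode_keys({"G2": [0, 1], "G7": [1, 1], "G9": [2, 0]}, codes)
print(sorted(keyed.items()))  # [(1, [1, 1]), (2, [0, 1]), (3, [2, 0])]
```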
Comment:
If this MBF process is to be used repeatedly, ASReml will generally run much faster if the markers are written to separate files. For example, ASReml reads 10 files, each containing a single field, much faster than it reads a single file containing 400 fields ten times to extract 10 different markers.
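A minimal sketch of that preprocessing step, assuming a comma-separated source file with the key in the first field: it writes one file per covariate holding just the key and that covariate. The function name `split_markers` and the output names `marker_<j>.csv` are hypothetical.

```python
import csv
import os
import tempfile


def split_markers(src_path, out_dir, skip=0):
    """Write one file per covariate column: the key field plus that field.

    Assumes the key is the first field and every row has the same length.
    Returns the list of written paths, one per covariate.
    """
    with open(src_path, newline="") as f:
        rows = list(csv.reader(f))[skip:]
    n = len(rows[0]) - 1  # number of covariate fields after the key
    paths = []
    for j in range(1, n + 1):
        path = os.path.join(out_dir, f"marker_{j}.csv")
        with open(path, "w", newline="") as out:
            writer = csv.writer(out, lineterminator="\n")
            for row in rows:
                writer.writerow([row[0], row[j]])
        paths.append(path)
    return paths


# Hypothetical source file with 3 keys and 3 marker fields.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "markers.csv")
with open(src, "w", newline="") as f:
    f.write("1,0,1,2\n2,2,0,1\n3,1,1,0\n")
written = split_markers(src, tmp)
print(written)
```

Each output file then has the basic 1+n layout with n = 1.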
When missing values occur in the design, ASReml reports this and aborts the job unless !MVINCLUDE is specified, in which case missing values are treated as zeros.
Use the !DV transformation to drop the records with the missing values.