Pedigree

Introduction

In an 'animal model' or 'sire model' genetic analysis we have data on a set of animals that are genetically linked via a pedigree. The genetic effects are therefore correlated and, assuming normal modes of inheritance, the correlation expected from additive genetic effects can be derived from the pedigree, provided all the genetic links are in the pedigree. The additive genetic relationship matrix (sometimes called the numerator relationship matrix) can be calculated from the pedigree, but it is actually the inverse relationship matrix that ASReml forms for analysis. Users new to this subject might find the notes by Julius van der Werf helpful: Mixed Models for Genetic analysis.pdf. For the more general situation where the pedigree-based inverse relationship matrix is not the appropriate or required matrix, the user can provide a general relationship matrix (GRM) or its inverse (GIV) explicitly in a .grm or .giv file (see General Relationship Matrix). In this chapter we consider data presented in Harvey (1977) using the command file harvey.as:
 Pedigree file example
  animal     !P
  sire       !A
  dam
  lines       2
  damage
  adailygain
 harvey.ped !ALPHA
 harvey.dat
 adailygain ~ mu lines,  !r animal 0.25
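
To make the idea of a pedigree-derived relationship matrix concrete, the following is a minimal Python sketch of the textbook tabular method for the additive relationship matrix. It is an illustration only and is not the routine ASReml uses (ASReml forms the inverse directly). The function name relationship_matrix and the tuple layout are assumptions made for this example, and parents must precede progeny.

```python
from fractions import Fraction

UNKNOWN = {"0", "*"}  # identities used for unknown parents


def relationship_matrix(pedigree):
    """Tabular-method A matrix for (individual, sire, dam) rows, parents first."""
    listed = {ind for ind, _, _ in pedigree}
    # Parents without their own line are treated as unrelated base individuals.
    implicit = []
    for _, sire, dam in pedigree:
        for parent in (sire, dam):
            if parent not in UNKNOWN and parent not in listed and parent not in implicit:
                implicit.append(parent)
    rows = [(p, "0", "0") for p in implicit] + list(pedigree)
    ids = [ind for ind, _, _ in rows]
    index = {ind: i for i, ind in enumerate(ids)}
    n = len(rows)
    a = [[Fraction(0)] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(rows):
        s = index.get(sire)  # None when the parent is unknown
        d = index.get(dam)
        known = [p for p in (s, d) if p is not None]
        # Relationship with each earlier individual: half the sum of that
        # individual's relationships with the known parents.
        for j in range(i):
            a[i][j] = a[j][i] = sum((a[j][p] for p in known), Fraction(0)) / 2
        # Diagonal is 1 + F, where F is half the relationship between parents.
        inbreeding = a[s][d] / 2 if len(known) == 2 else Fraction(0)
        a[i][i] = 1 + inbreeding
    return ids, a


# Two progeny of Sire1 and one progeny of Sire2, as in harvey.ped.
ids, a = relationship_matrix(
    [("101", "Sire1", "0"), ("102", "Sire1", "0"), ("109", "Sire2", "0")]
)
half_sibs = a[ids.index("101")][ids.index("102")]  # paternal half sibs
unrelated = a[ids.index("101")][ids.index("109")]  # different sires
```

Exact fractions are used so that the familiar values (1/2 between parent and progeny, 1/4 between half sibs) can be compared without rounding.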

Pedigree factor type

In ASReml the !P data field qualifier indicates that the corresponding data field has an associated pedigree. The file containing the pedigree (harvey.ped in the example) for animal is specified after all field definitions and before the datafile definition. See below for the first 20 lines of harvey.ped together with the corresponding lines of the data file harvey.dat. All individuals appearing in the data file must appear in the pedigree file. When all the pedigree information (Individual, MaleParent, FemaleParent) appears as the first three fields of the data file, the data file can double as the pedigree file. In this example the line harvey.ped !ALPHA could be replaced with harvey.dat !ALPHA. Typically additional individuals providing additional genetic links are present in the pedigree file.

The pedigree file

The pedigree file is used to define the genetic relationships for fitting a genetic animal model and is required if the !P qualifier is associated with a data field. The pedigree file
  • has three fields: the identities of an individual, its sire and its dam (or maternal grand sire if the !MGS qualifier is specified), in that order,
  • uses identity 0 or * for unknown parents,
  • may have an optional fourth field supplying inbreeding/selfing information, used if the !FGEN qualifier is specified,
  • must have a fourth field specifying the SEX of the individual if the !XLINK qualifier is specified,
  • is sorted so that the line giving the pedigree of an individual appears before any line where that individual appears as a parent,
  • is read free format; it may be the same file as the data file if the data file is free format and has the necessary identities in the first three fields, see below,
  • is specified on the line immediately preceding the data file line in the command file.
     harvey.ped              harvey.dat
     101 Sire1 0             101 Sire1 0 1 3 192 390 2241
     102 Sire1 0             102 Sire1 0 1 3 154 403 2651
     103 Sire1 0             103 Sire1 0 1 4 185 432 2411
     104 Sire1 0             104 Sire1 0 1 4 183 457 2251
     105 Sire1 0             105 Sire1 0 1 5 186 483 2581
     106 Sire1 0             106 Sire1 0 1 5 177 469 2671
     107 Sire1 0             107 Sire1 0 1 5 177 428 2711
     108 Sire1 0             108 Sire1 0 1 5 163 439 2471
     109 Sire2 0             109 Sire2 0 1 4 188 439 2292
     110 Sire2 0             110 Sire2 0 1 4 178 407 2262
     111 Sire2 0             111 Sire2 0 1 5 198 498 1972
     112 Sire2 0             112 Sire2 0 1 5 193 459 2142
     113 Sire2 0             113 Sire2 0 1 5 186 459 2442
     114 Sire2 0             114 Sire2 0 1 5 175 375 2522
     115 Sire2 0             115 Sire2 0 1 5 171 382 1722
     116 Sire2 0             116 Sire2 0 1 5 168 417 2752
     117 Sire3 0             117 Sire3 0 1 3 154 389 2383
     118 Sire3 0             118 Sire3 0 1 4 184 414 2463
     119 Sire3 0             119 Sire3 0 1 5 174 483 2293
     120 Sire3 0             120 Sire3 0 1 5 170 430 2303
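
The field-count and ordering requirements listed above can be checked before a run. The following Python sketch is one possible pre-check, not part of ASReml; the function name check_pedigree and the wording of its messages are invented for this illustration. It only looks at the first three free-format fields of each line.

```python
def check_pedigree(lines, unknown=("0", "*")):
    """Return a list of problems found in free-format pedigree lines."""
    problems = []
    seen = set()            # identities that already have their own line
    used_as_parent = set()  # identities already referenced as a parent
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 3:
            problems.append(f"line {number}: fewer than three fields")
            continue
        individual, sire, dam = fields[:3]
        # An individual's own line must come before any line naming it as a parent.
        if individual in used_as_parent:
            problems.append(
                f"line {number}: {individual} appears as a parent before its own line"
            )
        if individual in seen:
            problems.append(f"line {number}: {individual} is listed more than once")
        seen.add(individual)
        for parent in (sire, dam):
            if parent not in unknown:
                used_as_parent.add(parent)
    return problems


# Parents that never get their own line (such as Sire1 here) are not reported.
ok = check_pedigree(["101 Sire1 0", "102 Sire1 0"])
# Here Sire1 is defined after it has already been used as a parent.
bad = check_pedigree(["101 Sire1 0", "Sire1 0 0"])
```

A parent that has no line of its own is deliberately accepted, since such individuals are assumed to have unknown parents.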
    

    Reading in the pedigree file

    The syntax for specifying a pedigree file in the ASReml command file is
    pedigree-file [qualifiers]
  • the qualifiers are listed below,
  • the identities (individual, MaleParent, FemaleParent) are merged into a single list and the inverse relationship matrix is formed before the data file is read,
  • when the data file is read, data fields with the !P qualifier are recoded according to the combined identity list,
  • the inverse relationship matrix is automatically associated with factors coded from the pedigree file unless some other covariance structure is specified. The inverse relationship matrix is specified with the variance model name NRM (previously AINV),
  • the inverse relationship matrix is written to file ainverse.bin,
  • if ainverse.bin already exists ASReml assumes it was formed in a previous run and has the correct inverse; ainverse.bin is read, rather than the inverse being reformed (unless !MAKE is specified); this saves time when performing repeated analyses based on a particular pedigree; delete ainverse.bin or specify !MAKE if the pedigree is changed between runs,
  • identities are printed in the .sln file,
  • identities should be whole numbers less than 200,000,000 unless !ALPHA is specified,
  • pedigree lines for parents must precede their progeny,
  • unknown parents should be given the identity number 0,
  • if an individual appearing as a parent does not appear in the first column, it is assumed to have unknown parents, that is, parents with unknown parentage do not need their own line in the file,
  • identities may appear as both male and female parents, for example, in forestry.
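
To illustrate how an inverse relationship matrix can be formed directly from the pedigree lines, here is a Python sketch of the classical contribution rules that apply when no individual is inbred. This is a simplified teaching example and an assumption of this note, not ASReml's routine (which accounts for inbreeding); the function name a_inverse and the sparse dictionary layout are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def a_inverse(pedigree, unknown=("0", "*")):
    """Sparse A-inverse as {(row_id, col_id): value}, ignoring inbreeding."""
    ainv = defaultdict(Fraction)
    listed, parents_seen = set(), set()
    for individual, sire, dam in pedigree:
        listed.add(individual)
        parents = [p for p in (sire, dam) if p not in unknown]
        parents_seen.update(parents)
        # Without inbreeding the Mendelian sampling variance is 1, 3/4 or 1/2
        # for 0, 1 or 2 known parents; d_inv is its reciprocal.
        d_inv = {0: Fraction(1), 1: Fraction(4, 3), 2: Fraction(2)}[len(parents)]
        ainv[individual, individual] += d_inv
        for p in parents:
            ainv[individual, p] -= d_inv / 2
            ainv[p, individual] -= d_inv / 2
            for q in parents:
                ainv[p, q] += d_inv / 4
    # Parents without their own line are base individuals with unknown parents.
    for p in parents_seen - listed:
        ainv[p, p] += 1
    return dict(ainv)


# Two base parents and one progeny.
trio = a_inverse([("1", "0", "0"), ("2", "0", "0"), ("3", "1", "2")])
# Half sibs whose sire has no line of its own, as in harvey.ped.
sibs = a_inverse([("101", "Sire1", "0"), ("102", "Sire1", "0")])
```

Only non-zero elements are stored, which reflects why working with the inverse is attractive: each pedigree line touches at most nine elements.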

    Pedigree file qualifiers

  • !ALPHA indicates that the identities are alphanumeric with up to 120 characters; otherwise by default they are numeric whole numbers <200,000,000.
  • !DIAG causes the pedigree identifiers, the diagonal elements of the inverse of the relationship matrix and the inbreeding coefficients for the individuals (calculated as the diagonal of A - I) to be written to basename.aif.
  • !FGEN [f] indicates the individuals in the pedigree are inbred to some degree. The pedigree file contains a fourth field indicating the level of selfing or the level of inbreeding in a base individual. In the fourth field, 0 indicates a simple cross, 1 indicates selfed once, 2 indicates selfed twice, etc. A value between 0 and 1 for a base individual is taken as its inbreeding value. If the pedigree has implicit individuals (they appear as parents but not in the first field of the pedigree file), they will be assumed base non-inbred individuals unless their inbreeding level is set with !FGEN f where 0 < f < 1 is the inbreeding level of such individuals.
  • !GIV or !SAVE instructs ASReml to write out the A-inverse in the format of .giv files.
  • !Goffset o An alternative to group constraints (see !GROUPS below) is to shrink the group effects by adding the constant o (> 0.0) to the diagonal elements of A inverse pertaining to groups. When a constant is added, no adjustment of the degrees of freedom is made for genetic groups. Typically, o is small as it represents genetic variance relative to group variance.

    Use !Goffset -1 to add no offset but to suppress insertion of constraints where empty groups appear. The empty groups are then not counted in the DF adjustment.
  • !GROUPS g includes genetic groups in the pedigree. The first g lines of the pedigree identify genetic groups (with zero in both the sire and dam fields). All other lines must specify one of the genetic groups as sire or dam if the actual parent is unknown.

    You may insert Groups with no members to define constraints on groups, that is to associate groups into supergroups where the supergroup fixed effect is formally fitted separately in the model. A constraint is added to the inverse which causes the preceding set of groups which have members to have effects which sum to zero. The issue is to get the degrees of freedom correct and to get the correct calculation of the Likelihood, especially in bivariate cases where DF associated with groups may differ between traits. The !LAST qualifier is designed to help as without it, reordering may associate singularities in the A matrix with random effects which at the very least is confusing. When the A matrix incorporates fixed effects, the number of DF involved may not be obvious, especially if there is also a sparsely fitted fixed HYS factor. The number of Fixed effects (degrees of freedom) associated with GROUPS is taken as the declared number less twice the number of constraints applied. This assumes all groups are represented in the data, and that degrees of freedom associated with group constraints will be fitted elsewhere in the model.
  • !INBRED generates pedigree for inbred lines. Each cross is assumed to be selfed several times to stabilize as an inbred line as is usual for cereals, before being evaluated or crossed with another line. Since inbreeding is usually associated with strong selection, it is not obvious that a pedigree assumption of covariance of 0.5 between parent and offspring actually holds. Do not use the !INBRED qualifier with the !MGS or !SELF qualifiers.
  • !LONGINTEGER indicates the identifiers are numeric integers with fewer than 16 digits. The default is integer values with fewer than 9 digits. The alternative is alphanumeric identifiers with up to 20 characters, indicated by !ALPHA.
  • !MAKE tells ASReml to make the A-inverse (rather than trying to retrieve it from the ainverse.bin file).
  • !MGS indicates that the third identity is the sire of the dam rather than the dam.
  • !MEUWISSEN The default method for forming A inverse is based on the algorithm of Meuwissen and Luo (1992).
  • !QUASS The original routine for calculating A inverse in ASReml was based on the method of Quaas.
  • !REPEAT tells ASReml to ignore repeat occurrences of lines in the pedigree file. Use of this option will avoid the check that animals occur in chronological order, but chronological order is still required.
  • !SARGOLZAEI invokes an alternative procedure for computing A inverse developed by Sargolzaei et al. (2005).
  • !SELF s allows partial selfing when the third field is unknown. It indicates that progeny from a cross where the second parent (male_parent) is unknown are assumed to be from selfing with probability s and from outcrossing with probability (1-s). This is appropriate in some forestry tree breeding studies where seed collected from a tree may have been pollinated by the mother tree or pollinated by some other tree. Do not use the !SELF qualifier with the !INBRED or !MGS qualifiers.
  • !SKIP n allows you to skip n header lines at the top of the file.
  • !SORT causes ASReml to sort the pedigree into an acceptable order, that is parents before offspring, before forming the A-Inverse. The sorted pedigree is written to a file whose name has .srt appended to its name.
  • A pdf file pedigree.pdf contains details of these options.
  • !XLINK requests the formation of the (inverse) relationship matrix for the X chromosome as described by Fernando and Grossman (1990) for species where the male is XY and the female is XX. This NRM inverse matrix is formed in addition to the usual A inverse and can be accessed as GIV1 or as specified in the output. The pedigree must include a fourth field which codes the SEX of the individual. The actual code used is up to the user and deduced from the first line which is assumed to be a male. Thus, whatever string is found in the fourth field on the first line of the pedigree is taken to mean MALE and any other code found on other records is taken to mean FEMALE.
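
The reordering that !SORT performs (parents before offspring) can be pictured as a depth-first traversal of the pedigree. The Python sketch below is only an illustration of that idea under assumed names (sort_pedigree is hypothetical) and makes no claim about how ASReml implements the sort or which of several valid orders it produces.

```python
def sort_pedigree(records, unknown=("0", "*")):
    """Return (individual, sire, dam) records with parents before progeny."""
    by_id = {rec[0]: rec for rec in records}
    ordered = []
    state = {}  # 1 = currently being visited, 2 = already written out

    def visit(identity):
        if identity in unknown or identity not in by_id:
            return  # unknown or implicit parent: there is no line to place
        if state.get(identity) == 2:
            return
        if state.get(identity) == 1:
            raise ValueError(f"{identity} is recorded as its own ancestor")
        state[identity] = 1
        visit(by_id[identity][1])  # place the sire's line first
        visit(by_id[identity][2])  # then the dam's line
        state[identity] = 2
        ordered.append(by_id[identity])

    # Recursion depth equals the number of generations, so very deep
    # pedigrees would need an iterative version of this traversal.
    for rec in records:
        visit(rec[0])
    return ordered


shuffled = [("3", "1", "2"), ("1", "0", "0"), ("2", "0", "0")]
in_order = sort_pedigree(shuffled)
```

A record that names one of its own descendants as a parent cannot be ordered, so the sketch raises an error in that case instead of returning a partial result.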

    Genetic groups

    If all individuals belong to one genetic group, then use 0 as the identity of the parents of base individuals. However, if base individuals belong to various genetic groups this is indicated by the !GROUP qualifier and the pedigree file must begin by identifying these groups. All base individuals should have group identifiers as parents. In this case the identity 0 will only appear on the group identity lines, as in the following example where three sire lines are fitted as genetic groups.
     Genetic group example
      animal  !P
      sire  9 !A
      dam
      lines  2
      damage
      adailygain
     harveyg.ped  !ALPHA !MAKE !GROUP 3
     harvey.dat
      adailygain ~ mu !r animal 0.25 !GU
    

     G1 0 0
     G2 0 0
     G3 0 0
     Sire1 G1 G1
     Sire2 G1 G1
     Sire3 G1 G1
     Sire4 G2 G2
     Sire5 G2 G2
     Sire6 G3 G3
     Sire7 G3 G3
     Sire8 G3 G3
     Sire9 G3 G3
     101 Sire1 G1
     102 Sire1 G1
     103 Sire1 G1
      ...
     163 Sire9 G3
     164 Sire9 G3
     165 Sire9 G3
    
    It is usually appropriate to allocate a genetic group identifier where the parent is unknown.
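
A grouped pedigree like the one above can be prepared from an ungrouped one by listing the groups first and substituting a group identifier for each unknown parent. The following Python sketch shows one way to do this; the function name add_genetic_groups and the group_of mapping are assumptions made for the example, and the group assignments themselves must come from the user.

```python
def add_genetic_groups(records, group_of, unknown=("0", "*")):
    """Prefix group lines and replace unknown parents by group identifiers.

    `group_of` maps every individual that has an unknown parent to its group.
    Returns the new records and the number of groups (the value for !GROUPS).
    """
    groups = []
    for individual, sire, dam in records:
        if sire in unknown or dam in unknown:
            group = group_of[individual]  # KeyError if no group was assigned
            if group not in groups:
                groups.append(group)
    # Group lines come first, with zero in both parent fields.
    grouped = [(group, "0", "0") for group in groups]
    for individual, sire, dam in records:
        if sire in unknown:
            sire = group_of[individual]
        if dam in unknown:
            dam = group_of[individual]
        grouped.append((individual, sire, dam))
    return grouped, len(groups)


records = [("Sire1", "0", "0"), ("Sire4", "0", "0"), ("101", "Sire1", "0")]
group_of = {"Sire1": "G1", "Sire4": "G2", "101": "G1"}
grouped, n_groups = add_genetic_groups(records, group_of)
```

After this step the identity 0 appears only on the group lines, which matches the layout of the example pedigree.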

    Two pedigree terms!

    ASReml only provides directly for a single pedigree factor. For some crops, breeding is based on distinct breeding families, crossed in the last stage to produce hybrid production lines. To handle this, we need two runs of ASReml. The first run saves one NRM file as a GRM file for use in the second run, as in the following code.
      !PART 1
      Mline !P
      Fline !A
     ...
     Mline.ped !GIV !DIAG  !ALPHA
       #!GIV generates the file Hybrid1_A.giv
       #!DIAG generates Hybrid1.aif which contains the identifier names
    
     !PART 2   #reads in inverse additive relationship matrix generated in !PART 1
      Mline !A !L Hybrid1.aif !LSKIP 1  #associates identifier names with levels of Mline
      Fline !P                          #used in giv file
     ...
     Fline.ped !GIV !DIAG  !ALPHA
     Hybrid1_A.giv                      # Formed in part 1 from Mline.ped
     Hybrid.asd !SKIP 1
     ...
     ...     grm1(Mline) nrm(Fline)     #using new synonyms and functions
    
