Paired Case-Control study - Rice worms

Introduction

This data is concerned with an experiment conducted to investigate the tolerance of rice varieties to attack by the larvae of bloodworms. The data have been kindly provided by Dr. Mark Stevens, Yanco Agricultural Institute. A full description of the experiment is given by Stevens et al. (1999). Bloodworms are a significant pest of rice in the Murray and Murrumbidgee irrigation areas where they can cause poor establishment and substantial yield loss.

The experiment commenced with the transplanting of rice seedlings into trays. Each tray contained 32 seedlings and the trays were paired so that a control tray (no bloodworms) and a treated tray (bloodworms added) were grown in a controlled environment room for the duration of the experiment. At the end of this time rice plants were carefully extracted, the root system washed and root area determined for the tray using an image analysis system described by Stevens et al. (1999). Two pairs of trays, each pair corresponding to a different variety, were included in each run. A new batch of bloodworm larvae was used for each run. A total of 44 varieties was investigated with three replicates of each. Unfortunately the variety concurrence within runs was less than optimal. Eight varieties occurred with only one other variety, 22 with two other varieties and the remaining 14 with three different varieties.

In the next three sections we present an exhaustive analysis of these data using equivalent univariate and multivariate techniques. It is convenient to use two data files one for each approach. The univariate data file consists of factors pair, run, variety, tmt, unit and variate rootwt. The factor unit labels the individual trays, pair labels pairs of trays (to which varieties are allocated) and tmt is the two level bloodworm treatment factor (control/treated). The multivariate data file consists of factors variety and run and variates for root weight of both the control and exposed treatments (labelled yc and ye respectively).

Preliminary analyses indicated variance heterogeneity so that subsequent analyses were conducted on the square root scale. Figure 1 presents a plot of the treated and the control root area (on the square root scale) for each variety. There is a strong dependence between the treated and control root area, which is not surprising. The aim of the experiment was to determine the tolerance of varieties to bloodworms and thence identify the most tolerant varieties. The definition of tolerance should allow for the fact that varieties differ in their inherent seedling vigour (Figure 1). The original approach of the scientist was to regress the treated root area against the control root area and define the index of vigour as the residual from this regression. This approach is clearly inefficient since there is error in both variables. We seek to determine an index of tolerance from the joint analysis of treated and control root area.

Figure 1. Rice bloodworm data: Plot of square root of root weight for treated versus control

Standard analysis

The allocation of bloodworm treatments within varieties and varieties within runs defines a nested block structure of the form

 run/variety/tmt = run + run.variety + run.variety.tmt
               ( = run + pair + pair.tmt )
               ( = run + run.variety + units )

There is an additional blocking term, however, due to the fact that the bloodworms within a run are derived from the same batch of larvae whereas between runs the bloodworms come from different sources. This defines a block structure of the form

 run/tmt/variety = run + run.tmt + run.tmt.variety
               ( = run + run.tmt + pair.tmt )

Combining the two provides the full block structure for the design, namely

   run + run.variety + run.tmt + run.tmt.variety
 = run + run.variety + run.tmt + units
 = run + pair + run.tmt + pair.tmt

In line with the aims of the experiment the treatment structure comprises variety and treatment main effects and treatment by variety interactions. In the traditional approach the terms in the block structure are regarded as random and the treatment terms as fixed. The choice of treatment terms as fixed or random depends largely on the aims of the experiment. The aim of this example is to select the "best" varieties. The definition of best is somewhat more complex since it does not involve the single trait sqrt(rootwt) but rather two traits, namely sqrt(rootwt) in the presence/absence of bloodworms. Thus to minimise selection bias the variety main effects and thence the tmt.variety interactions are taken as random. The main effect of treatment is fitted as fixed to allow for the likely scenario that rather than a single population of treatment by variety effects there are in fact two populations (control and treated) with a different mean for each. There is evidence of this prior to analysis with the large difference in mean sqrt(rootwt) for the two groups (14.93 and 8.23 for control and treated respectively). The inclusion of tmt as a fixed effect ensures that BLUPs of tmt.variety effects are shrunk to the correct mean (treatment means rather than an overall mean).

The model for the data is given by
y = X τ + Z₁u₁+ Z₂u₂+ Z₃u₃ + Z₄u₄+ Z₅u₅+ e Eqn (1)
where y is a vector of length n=264 containing the sqrt(rootwt) values, τ corresponds to a constant term and the fixed treatment contrast and u₁... u₅ correspond to random variety, by variety, run, treatment by run and variety by run effects. The random effects and error are assumed to be independent Gaussian variables with zero means and variance structures Var(u_i)=σ²_iI_bi (where b_i is the length of u_i, i=1 ... 5) and Var(e) = σ²I_n.

The ASReml code for this analysis is

 Bloodworm data Dr M Stevens
  pair 132
  rootwt
  run 66
  tmt 2 !A
  id
  variety 44 !A
 rice.asd !skip 1 !DOPATH 1
 !PATH 1
 sqrt(rootwt) ~ mu tmt !r variety variety.tmt run pair run.tmt

 !PATH 2
 sqrt(rootwt) ~ mu tmt !r variety diag(tmt).variety +
         run pair diag(tmt !GU).run  uni(tmt,2)

 !PATH 3      # Equivalent model to PART 2
 sqrt(rootwt) ~ mu tmt !r  us(tmt).variety +
          pair us(tmt).run  uni(tmt,2)

The two paths in the input file define the two univariate analyses we will conduct. We consider the results from the analysis defined in PATH 1 first. A portion of the output file is

    5 LogL=-345.306     S2=  1.3216        262 df
    6 LogL=-345.267     S2=  1.3155        262 df
    7 LogL=-345.264     S2=  1.3149        262 df
    8 LogL=-345.263     S2=  1.3149        262 df


          - - - Results from analysis of sqrt(rootwt) - - -
 Akaike Information Criterion      702.53 (assuming 6 parameters).
 Bayesian Information Criterion    723.94

          Approximate stratum variance decomposition
 Stratum     Degrees-Freedom   Variance      Component Coefficients
 variety               44.40    26.0156         7.3     3.0     3.6     2.0     1.5     1.0
 run                   45.17    7.41702         0.0     3.5    -0.0     2.0     1.7     1.0
 variety.tmt           39.53    2.99833         0.0     0.0     2.7     0.0     0.2     1.0
 pair                  41.43    3.26838         0.0     0.0     0.0     2.0    -0.0     1.0
 run.tmt               52.38    5.12369         0.0     0.0     0.0     0.0     2.2     1.0
 Residual Variance     39.09    1.31486         0.0     0.0     0.0     0.0     0.0     1.0

 Model_Term                             Gamma         Sigma   Sigma/SE   % C
 variety                 IDV_V   44   1.80947       2.37920       3.01   0 P
 run                     IDV_V   66  0.244243      0.321145       0.59   0 P
 variety.tmt             IDV_V   88  0.374220      0.492047       1.78   0 P
 pair                    IDV_V  132  0.742328      0.976056       2.51   0 P
 run.tmt                 IDV_V  132   1.32973       1.74841       3.65   0 P
 Residual                SCA_V  264  1.000000       1.31486       4.42   0 P

                                    Wald F statistics
      Source of Variation           NumDF     DenDF    F-inc             Prob
    7 mu                                1      53.5  1484.27            <.001
    4 tmt                               1      60.4   469.36            <.001

The estimated variance components from this analysis are given in column (a) of table 1. The variance component for the variety main effects is large. There is evidence of tmt.variety interactions so we may expect some discrimination between varieties in terms of tolerance to bloodworms.

Table 1 Estimated variance components from univariate analyses of bloodworm data. (a) Model with homogeneous variance for all terms and (b) Model with heterogeneous variance for interactions involving tmt

	(a)	---	(b)	---
source		control		treated

variety	2.378		2.334
tmt.variety	0.492	1.505		-0.372
run	0.321		0.319
tmt.run	1.748	1.388		2.223
variety.run (pair)	0.976		0.987
tmt.pair	1.315	1.156		1.359

REML LogL	-345.256		-343.22

Given the large difference (p<<.001) between tmt means we may wish to allow for heterogeneity of variance associated with tmt. Thus we fit a separate variety variance for each level of tmt so that instead of assuming Var(u₂) = σ²₂I₈₈ we assume

Var(u₂) =	\| σ²_2c	0	\|	cross I₄₄
	\| 0	σ²_2t	\|

where σ²_2c and σ²_2t are the tmt.variety interaction variances for control and treated respectively. This model can be achieved using a diagonal variance structure for the treatment part of the interaction. To be consistent across all strata, we also fit a separate run variance for each level of tmt and heterogeneity at the residual level, by including the uni(tmt,2) term. We have chosen level 2 of tmt as we expect more variation for the exposed treatment and thus the extra variance component for this term should be positive. Had we instead specified level 1 then ASReml would have estimated a negative component, the qualifier !GU permits a negative variance component. But we have also specified !GU for the diag(tmt) variance structure to allow it to be negative; otherwise potentially negative estimates would be bound close to zero. The portion of the ASReml output for this analysis is

   9 LogL=-343.289     S2=  1.1540        262 df
  10 LogL=-343.230     S2=  1.1527        262 df

          - - - Results from analysis of sqrt(rootwt) - - -
 Akaike Information Criterion      704.46 (assuming 9 parameters).
 Bayesian Information Criterion    736.57

 Model_Term                             Gamma         Sigma   Sigma/SE   % C
 variety                 IDV_V   44   2.01823       2.32631       3.01   0 P
 run                     IDV_V   66  0.277130      0.319434       0.59  -1 P
 pair                    IDV_V  132  0.853713      0.984032       2.58   0 P
 uni(tmt,2)              IDV_V  264  0.176302      0.203215       0.32  -2 P
 Residual                SCA_V  264  1.000000       1.15265       2.76   0 P
 diag(tmt).variety               88 effects
 tmt                    DIAG_V    1   1.29824       1.49642       2.27   1 U
 tmt                    DIAG_V    2 -0.319767     -0.368580      -0.81   0 U
 diag(tmt).run                  132 effects
 tmt                    DIAG_V    1   1.20168       1.38512       2.16  -2 P
 tmt                    DIAG_V    2   1.92272       2.21622       3.07   1 P

                                   Wald F statistics
     Source of Variation           NumDF     DenDF    F-inc            P-inc
   7 mu                                1      56.5  1279.27            <.001
   4 tmt                               1      60.7   450.77            <.001

The estimated variance components from this analysis are given in column (b) of table 1. There is no significant variance heterogeneity at the residual or tmt.run level. This indicates that the square root transformation of the data has successfully stabilised the error variance. There is, however, significant variance heterogeneity for tmt.variety interactions with the variance being much greater for the control group. This reflects the fact that in the absence of bloodworms the potential maximum root area is greater. Note that the tmt.variety interaction variance for the treated group would be negative if it were not bounded at zero. The negative component is meaningful (and in fact necessary and obtained by use of the !GU option) in this context since it should be considered as part of the variance structure for the combined variety main effects and treatment by variety interactions. That is,

Var(1₂ cross u₁+ u₂) =	\|	σ²₁+ σ²_2c	σ²₁	\|	cross I₄₄ Eqn (2)
	\|	σ²₁	σ²₁+ σ²_2t	\|

The problem is avoided by using the us() structure as in part 3. The output then is

   4 LogL=-343.228     S2=  1.1563        262 df

          - - - Results from analysis of sqrt(rootwt) - - -
 Akaike Information Criterion      704.46 (assuming 9 parameters).
 Bayesian Information Criterion    736.57

 Model_Term                             Gamma         Sigma   Sigma/SE   % C
 pair                    IDV_V  132  0.854032      0.987517       2.59   0 P
 tmt,2                   IDV_V  264  0.176220      0.203763       0.32   0 P
 Residual                SCA_V  264  1.000000       1.15630       2.77   0 P
 us(tmt).variety                 88 effects
 tmt                     US_V  1  1   3.32087       3.83992       3.47   0 P
 tmt                     US_C  2  1   2.01936       2.33498       3.01   0 P
 tmt                     US_V  2  2   1.69738       1.96268       2.69   0 P
 us(tmt).run                    132 effects
 tmt                     US_V  1  1   1.47689       1.70773       2.62   0 P
 tmt                     US_C  2  1  0.275815      0.318924       0.59   0 P
 tmt                     US_V  2  2   2.20041       2.54433       3.20   0 P
 Covariance/Variance/Correlation Matrix US us(tmt).variety
   3.840      0.8505
   2.335       1.963
 Covariance/Variance/Correlation Matrix US us(tmt).run
   1.708      0.1530
  0.3189       2.544

                                   Wald F statistics
     Source of Variation           NumDF     DenDF    F-inc            P-inc
   7 mu                                1      56.4  1276.50            <.001
   4 tmt                               1      60.6   448.82            <.001

Using the estimates from table 1 this structure is estimated as

\|	3.84 2.33	\|	cross I₄₄
\|	2.33 1.96	\|

Thus the variance of the variety effects in the control group (also known as the genetic variance for this group) is 3.84. The genetic variance for the treated group is much lower (1.96). The genetic correlation is 2.33/√(3.84*1.96) = 0.85 which is strong, supporting earlier indications of the dependence between the treated and control root area (Figure 1).

A multivariate approach

In this simple case in which the variance heterogeneity is associated with the two level factor tmt, the analysis is equivalent to a bivariate analysis in which the two traits correspond to the two levels of tmt, namely sqrt(rootwt) for control and treated. The model for each trait is given by

y_j= Xτ_j+ Z_vu_vj + Z_ru_rj + e_j; (j = c,t) Eqn (3)

where y_j is a vector of length n=132 containing the sqrtroot values for variate j (j=c for control and j=t for treated), τ_j corresponds to a constant term and u_vj and u_rj correspond to random variety and run effects. The design matrices are the same for both traits. The random effects and error are assumed to be independent Gaussian variables with zero means and variance structures Var(u_vj)=σ_vj²I₄₄, Var(u_rj)=σ_rj²I₆₆ and Var(e_j) = σ²_jI₁₃₂. The bivariate model can be written as a direct extension of (3), namely

y = (I₂ cross X )τ + (I₂ cross Z_v)u_v+ (I₂ cross Z_r)u_r+ e^* Eqn (4)

where y = (y_c',y_t')', u_v= (u_vc',u_vt' )', u_r= (u_rc',u_rt' )' and e^* = (e_c,e_t )'.

There is an equivalence between the effects in this bivariate model and the univariate model of (1). The variety effects for each trait (u_v in the bivariate model) are partitioned in () into variety main effects and tmt.variety interactions so that u_v= 1₂ ⊗ u₁+ u₂. There is a similar partitioning for the run effects and the errors (see table 2).

Table 2. Equivalence of random effects in bivariate and univariate analyses

	bivariate	univariate
effects	(model 4)	(model 1)
trait.variety	`u_v`	`1₂ ⊗ u₁ + u₂`
trait.run	`u_r`	`1₂ ⊗ u₃ + u₄`
trait.pair	`ue^*`	`1₂ ⊗ u₅ + e`

In addition to the assumptions in the models for individual traits (3) the bivariate analysis involves the assumptions Cov(u_vc,u_vt') = σ_{v_ct}I₄₄, Cov(u_rc,u_rt') = σ_{r_ct}I₆₆ and Cov(e_ce_t) = σ_ctI₁₃₂.

Thus random effects and errors are correlated between traits. So, for example, the variance matrix for the variety effects for each trait is given by

Var(u_v =	\|	σ_vc²	σ_vct	\|	⊗ I₄₄
	\|	σ_vct	σ_vt²	\|

This unstructured form for trait.variety in the bivariate analysis is equivalent to the variety main effect plus heterogeneous tmt.variety interaction variance structure (2) in the univariate analysis. Similarly the unstructured form for trait.run is equivalent to the run main effect plus heterogeneous tmt.run interaction variance structure. The unstructured form for the errors ( trait.pair) in the bivariate analysis is equivalent to the pair plus heterogeneous error ( tmt.pair) variance in the univariate analysis. This bivariate analysis is achieved in ASReml as follows, noting that the tmt factor here is equivalent to traits.

 this is for the paired data
  id
  pair 132
  run 66
  variety 44 !A
  yc ye
 ricem.asd !skip 1 !X syc !Y sye
 sqrt(yc) sqrt(ye) ~ Trait !r us(Trait).variety us(Trait).run
 residual units.us(Trait)
 predict variety

A portion of the output from this analysis is

    7 LogL=-343.220     S2=  1.0000        262 df
    8 LogL=-343.220     S2=  1.0000        262 df


          - - - Results from analysis of sqrt(yc) sqrt(ye) - - -
 Akaike Information Criterion      704.44 (assuming 9 parameters).
 Bayesian Information Criterion    736.56

 Model_Term                             Sigma         Sigma   Sigma/SE   % C
 Residual                       264 effects
 Residual                US_V  1  1   2.14375       2.14375       4.44   0 P
 Residual                US_C  2  1  0.987450      0.987450       2.59   0 P
 Residual                US_V  2  2   2.34759       2.34759       4.62   0 P
 us(Trait).variety               88 effects
 Trait                   US_V  1  1   3.83990       3.83990       3.47   0 P
 Trait                   US_C  2  1   2.33429       2.33429       3.01   0 P
 Trait                   US_V  2  2   1.96206       1.96206       2.69   0 P
 us(Trait).run                  132 effects
 Trait                   US_V  1  1   1.70773       1.70773       2.62   0 P
 Trait                   US_C  2  1  0.318907      0.318907       0.59   0 P
 Trait                   US_V  2  2   2.54292       2.54292       3.20   0 P
 Covariance/Variance/Correlation Matrix US Residual
   2.144      0.4402
  0.9875       2.348
 Covariance/Variance/Correlation Matrix US us(Trait).variety
   3.840      0.8504
   2.334       1.962
 Covariance/Variance/Correlation Matrix US us(Trait).run
   1.708      0.1530
  0.3189       2.543

Table 3. Estimated variance parameters from bivariate analysis of bloodworm data

	control	treated
source	variance	variance	covariance
us(trait).variety	3.84	1.96	2.33
us(trait).run	1.71	2.54	0.32
us(trait).pair	2.14	2.35	0.99

The resultant REML LogL is identical to that of the heterogeneous univariate analysis (column (b) of table 2). The estimated variance parameters are given in Table 3. The predicted variety means in the .pvs file are used in the following section on interpretation of results. A portion of the file is presented below. There is a wide range in SED reflecting the imbalance of the variety concurrence within runs.

           Assuming Power transformation was (Y+   0.000)^ 0.500
  run            is ignored in the prediction (except where specifically included

  Trait      variety       Power_value  Stand_Error Ecode Retransformed approx_SE
  sqrt(yc)   AliCombo        14.9532         0.9181 E       223.5982     27.4571
  sqrt(ye)   AliCombo         7.9941         0.7993 E        63.9054     12.7790
  sqrt(yc)   Bluebelle       13.1033         0.9310 E       171.6969     24.3980
  sqrt(ye)   Bluebelle        6.6299         0.8062 E        43.9559     10.6901
  sqrt(yc)   C22             16.6679         0.9181 E       277.8192     30.6057
  sqrt(ye)   C22              8.9543         0.7993 E        80.1798     14.3140
   .              .               .             .   .            .           .
  sqrt(yc)   YRK1            15.1859         0.9549 E       230.6103     29.0012
  sqrt(ye)   YRK1             8.3356         0.8190 E        69.4817     13.6534
  sqrt(yc)   YRK3            13.3057         0.9549 E       177.0428     25.4106
  sqrt(ye)   YRK3             8.1133         0.8190 E        65.8264     13.2894
  SED: Overall Standard Error of Difference   1.215

Figure 2. BLUPs for treated for each variety plotted against BLUPs for control

Interpretation of results

Recall that the researcher is interested in varietal tolerance to bloodworms. This could be defined in various ways. One option is to consider the regression implicit in the variance structure for the trait by variety effects. The variance structure can arise from a regression of treated variety effects on control effects, namely u_vt = β u_vc + ε where the slope β = σ_vct / σ_vc². Tolerance can be defined in terms of the deviations from regression, ε. Varieties with large positive deviations have greatest tolerance to bloodworms. Note that this is similar to the researcher's original intentions except that the regression has been conducted at the genotypic rather than the phenotypic level. In Figure 2 the BLUPs for treated have been plotted against the BLUPs for control for each variety and the fitted regression line (slope = 0.61) has been drawn. Varieties with large positive deviations from the regression line include YRK3, Calrose, HR19 and WC1403.

Figure 3. Estimated deviations from regression of treated on control for each variety plotted against estimate for control

An alternative definition of tolerance is the simple difference between treated and control BLUPs for each variety, namely δ = u_vc - u_vt. Unless β=1 the two measures ε and δ have very different interpretations. The key difference is that ε is a measure which is independent of inherent vigour whereas δ is not. To see this consider Cov(ε){u_vc'} = Cov(u_vt-βu_vc){u_vc'} = ( σ_vct - [σ_vct/σ_vc²] σ_vc² ) I₄₄ = 0 whereas Cov(δ){u_vc'} = Cov(u_vc-u_vt){u_vc'} = (σ_vc² - σ_{v_ct} ) I₄₄

Figure 4. Estimated difference between control and treated for each variety plotted against estimate for control

The independence of ε and u_vc and dependence between δ and u_vc is clearly illustrated in Figures 3 and 4 In this example the two measures have provided very different rankings of the varieties. The choice of tolerance measure depends on the aim of the experiment. In this experiment the aim was to identify tolerance which is independent of inherent vigour so the deviations from regression measure is preferred.

More examples

Return to index