Variability in unbalanced data - Volts

In this example we illustrate an analysis of unbalanced data in which the main aim is to determine the sources of variation rather than assess the significance of imposed treatments. The data are taken from Cox and Snell (1981) and involve an experiment to examine the variability in the production of car voltage regulators. Standard production of regulators involves two steps. Regulators are taken from the production line to a setting station and adjusted to operate within a specified voltage range. From the setting station the regulator is then passed to a testing station where it is tested and returned if outside the required range.

The voltage of 64 regulators was set at 10 setting stations (setstat); between 4 and 8 regulators were set at each station. The regulators were each tested at four testing stations (teststat). The ASReml input file is presented below.
 Voltage data
  TestStat 4   # 4 testing stations tested each regulator
  SetStat  !A  # 10 setting stations each set 4-8 regulators
  Regulator 8   # regulators numbered within setting stations
  voltage
 voltage.asd !skip 1
 voltage ~ mu !r setstat setstat.regulatr teststat setstat.teststat
The factor Regulator numbers the regulators within each setting station. Thus the term SetStat.Regulator, not Regulator, is fitted to model regulator effects, while the other terms examine the effects of the setting and testing stations and their possible interaction. The abbreviated output is given below.
  LogL= 188.604     S2= 0.67074E-01    255 df
  LogL= 199.530     S2= 0.59303E-01    255 df
  LogL= 203.007     S2= 0.52814E-01    255 df
  LogL= 203.240     S2= 0.51278E-01    255 df
  LogL= 203.242     S2= 0.51141E-01    255 df
  LogL= 203.242     S2= 0.51140E-01    255 df

          - - - Results from analysis of voltage - - -
 Akaike Information Criterion     -396.48 (assuming 5 parameters).
 Bayesian Information Criterion   -378.78

          Approximate stratum variance decomposition
 Stratum     Degrees-Freedom   Variance      Component Coefficients
 TestStat               3.00   0.261510        64.0    -0.0     0.0     1.0
 Set                    8.57   0.512653         0.0    28.3     4.0     1.0
 Reg.Set               54.43   0.174248         0.0     0.0     4.0     1.0
 Residual Variance    189.00   0.511400E-01     0.0     0.0     0.0     1.0

 Model_Term                             Gamma         Sigma   Sigma/SE   % C
 TestStat                IDV_V    4  0.642752E-01  0.328704E-02   0.98   0 P
 Set                     IDV_V   10  0.233416      0.119369E-01   1.35   0 P
 Test.Set                IDV_V   40  0.101193E-06  0.517501E-08   0.00   0 B
 Reg.Set                 IDV_V   80  0.601817      0.307770E-01   3.64   0 P
 Residual                SCA_V  256  1.000000      0.511400E-01   9.72   0 P
   Warning: Code B - fixed at a boundary (!GP)       F - fixed by user
                ? - liable to change from P to B    P - positive definite
                C - Constrained by user (!VCC)      U - unbounded
                S - Singular Information matrix
The convergence criterion has been satisfied after six iterations. A warning message is printed below the summary of the variance components because the variance component for the SetStat.TestStat term has been fixed near the boundary.
The default constraint for variance components ( !GP) is to ensure that the REML estimate remains positive. Under this constraint, if an update for any variance component results in a negative value then ASReml sets that variance component to a small positive value. If this occurs in subsequent iterations the parameter is fixed to a small positive value and the code B replaces P in the C column of the summary table. The default constraint can be overridden using the !GU qualifier, but it is not generally recommended for standard analyses.
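The numbers in the output above can be cross-checked against one another. The following Python sketch is not part of ASReml; it assumes that Sigma is Gamma multiplied by the residual variance, that each stratum variance is the coefficient-weighted sum of the components, that the four coefficient columns refer in order to TestStat, Set, Reg.Set and Residual, and that the criteria are -2 LogL + 2k and -2 LogL + k ln(255) with k = 5. All function and variable names are illustrative.

```python
import math

# Values copied from the ASReml output above.
LOGL = 203.242      # converged REML log-likelihood
S2 = 0.051140       # residual variance at convergence
N_PARAMS = 5        # "assuming 5 parameters"
RESID_DF = 255      # df printed on the LogL lines (assumed to enter the BIC)

# (Gamma, Sigma) for the random terms that are not at the boundary.
components = {
    "TestStat": (0.0642752, 0.00328704),
    "Set": (0.233416, 0.0119369),
    "Reg.Set": (0.601817, 0.0307770),
}

# (stratum variance, coefficients). ASSUMPTION: coefficient columns are
# ordered TestStat, Set, Reg.Set, Residual.
strata = {
    "TestStat": (0.261510, [64.0, 0.0, 0.0, 1.0]),
    "Set": (0.512653, [0.0, 28.3, 4.0, 1.0]),
    "Reg.Set": (0.174248, [0.0, 0.0, 4.0, 1.0]),
}


def sigma_from_gamma(gamma, s2=S2):
    """Variance component implied by a variance ratio (Gamma)."""
    return gamma * s2


def stratum_variance(coeffs):
    """Coefficient-weighted sum of the variance components."""
    sigmas = [components[t][1] for t in ("TestStat", "Set", "Reg.Set")] + [S2]
    return sum(c * s for c, s in zip(coeffs, sigmas))


def aic(logl=LOGL, k=N_PARAMS):
    return -2.0 * logl + 2.0 * k


def bic(logl=LOGL, k=N_PARAMS, df=RESID_DF):
    return -2.0 * logl + k * math.log(df)


for term, (gamma, sigma) in components.items():
    print(term, "Sigma reported:", sigma, "Gamma*S2:", sigma_from_gamma(gamma))
for name, (reported, coeffs) in strata.items():
    print(name, "stratum reported:", reported, "rebuilt:", stratum_variance(coeffs))
print("AIC:", round(aic(), 2), "BIC:", round(bic(), 2))
```

The Set stratum is rebuilt only approximately because its coefficient (28.3) is printed to one decimal place.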

Figure 1 presents the residual plot, which indicates two unusual data values. Looking at the .res file we see
 STND RES     37   15.400         -4.93
 STND RES    190   15.400         -3.98
 STND RES    210   15.300         -6.68
 STND RES    211   17.800          8.93
 STND RES    235   16.700          3.90
 The preceding lines report the data record number,
     a data value to help identify the record, and,
     the scaled (by an approximate standard deviation) residual.
These values are successive observations, namely observations 210 and 211, being testing stations 2 and 3 for setting station 9 (J), regulator 2. These observations will not be dropped from the following analyses, for consistency with other analyses conducted by Cox and Snell (1981) and in the GenStat manual.


Figure 1. Residual plot for the voltage data

The REML log-likelihood from the model without the setstat.teststat term was 203.242, the same as the REML log-likelihood for the previous model. Table 1 presents a summary of the REML log-likelihood for the remaining terms in the model. The summary of the ASReml output for the current model is given below.

The Comp/SE column (labelled Sigma/SE in the output above) is printed by ASReml to give a guide as to the significance of the variance component for each term in the model. The statistic is simply the REML estimate of the variance component divided by the square root of the diagonal element (for each component) of the inverse of the average information matrix. The diagonal elements of the inverse of the expected (not the average) information matrix are the asymptotic variances of the REML estimates of the variance parameters. These Comp/SE statistics cannot be used to test the null hypothesis that the variance component is zero, but can be used as a guide.

The ratio for TestStat (0.98) raises the question of whether that component is significant. Formal testing with the likelihood ratio test, running jobs to drop each component in turn after first dropping Test.Set, shows that it is significant (see Table 1).

Table 1. REML LogL for the variance components in the voltage data

                       REML             -twice
 terms                 log-likelihood   difference   P-value
 - SetStat             200.31            5.864       .0077
 - SetStat.Regulator   184.15           38.19        .0000
 - TestStat            199.71            7.064       .0039
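
The figures in Table 1 can be reproduced approximately with a few lines of Python. The sketch below is an illustration rather than the procedure ASReml itself runs. It assumes the statistic is twice the drop in REML log-likelihood from 203.242, and that the P-value is half the upper-tail probability of a chi-square distribution on 1 degree of freedom, a common adjustment when a single variance component is tested on the boundary of its parameter space. The halving is inferred from the tabulated values, not stated in the text, and the function names are hypothetical.

```python
import math

FULL_LOGL = 203.242  # REML log-likelihood after dropping setstat.teststat

# REML log-likelihoods from Table 1, each with one further term removed.
REDUCED_LOGL = {
    "SetStat": 200.31,
    "SetStat.Regulator": 184.15,
    "TestStat": 199.71,
}


def chi2_1_upper_tail(x):
    """P(X > x) for X ~ chi-square(1), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


def boundary_lrt(logl_full, logl_reduced):
    """Return (-2 * log-likelihood difference, halved chi-square(1) P-value).

    ASSUMPTION: the null distribution is a 50:50 mixture of chi-square(0)
    and chi-square(1), so the chi-square(1) tail probability is halved.
    """
    stat = 2.0 * (logl_full - logl_reduced)
    return stat, 0.5 * chi2_1_upper_tail(stat)


for term, logl in REDUCED_LOGL.items():
    stat, p = boundary_lrt(FULL_LOGL, logl)
    print(f"- {term:<18s} {logl:8.2f} {stat:8.3f} {p:8.4f}")
```

Small differences in the last digit of the statistic are to be expected, because the log-likelihoods in the table are rounded.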