Variability in unbalanced data - Volts
In this example we illustrate an analysis of
unbalanced data in which the main aim is to determine the sources of
variation rather than assess the significance of imposed
treatments. The data are taken from Cox and Snell (1981) and involve
an experiment to examine the variability in the production of car
voltage regulators. Standard production of regulators involves two
steps. Regulators are taken from the production line to a
setting station and adjusted to operate within a
specified voltage range. From the setting station the regulator is
then passed to a testing station where it is tested and returned if
outside the required range.
The voltage of 64 regulators was set at
10 setting stations (setstat); between 4 and 8 regulators
were set at each station.
The regulators were each tested at four testing stations (teststat). The ASReml input file
is presented below.
Voltage data
TestStat 4 # 4 testing stations tested each regulator
SetStat !A # 10 setting stations each set 4-8 regulators
Regulator 8 # regulators numbered within setting stations
voltage
voltage.asd !skip 1
voltage ~ mu !r setstat setstat.regulatr teststat setstat.teststat
The factor Regulator numbers the regulators within each setting
station. Thus the term SetStat.Regulator, not Regulator, is fitted
to model regulator effects, while the other terms examine the effects
of the setting and testing stations and their possible interaction. The
abbreviated output is given below.
LogL= 188.604 S2= 0.67074E-01 255 df
LogL= 199.530 S2= 0.59303E-01 255 df
LogL= 203.007 S2= 0.52814E-01 255 df
LogL= 203.240 S2= 0.51278E-01 255 df
LogL= 203.242 S2= 0.51141E-01 255 df
LogL= 203.242 S2= 0.51140E-01 255 df
- - - Results from analysis of voltage - - -
Akaike Information Criterion -396.48 (assuming 5 parameters).
Bayesian Information Criterion -378.78
Approximate stratum variance decomposition
Stratum Degrees-Freedom Variance Component Coefficients
TestStat 3.00 0.261510 64.0 -0.0 0.0 1.0
Set 8.57 0.512653 0.0 28.3 4.0 1.0
Reg.Set 54.43 0.174248 0.0 0.0 4.0 1.0
Residual Variance 189.00 0.511400E-01 0.0 0.0 0.0 1.0
Model_Term Gamma Sigma Sigma/SE % C
TestStat IDV_V 4 0.642752E-01 0.328704E-02 0.98 0 P
Set IDV_V 10 0.233416 0.119369E-01 1.35 0 P
Test.Set IDV_V 40 0.101193E-06 0.517501E-08 0.00 0 B
Reg.Set IDV_V 80 0.601817 0.307770E-01 3.64 0 P
Residual SCA_V 256 1.000000 0.511400E-01 9.72 0 P
Warning: Code B - fixed at a boundary (!GP) F - fixed by user
? - liable to change from P to B P - positive definite
C - Constrained by user (!VCC) U - unbounded
S - Singular Information matrix
The convergence criterion has been satisfied after six
iterations. A warning message is printed below the summary of the
variance components because the variance component for
the SetStat.TestStat term has been fixed near the boundary.
The default
constraint for variance components (!GP) is to ensure that the REML estimate
remains positive. Under this constraint, if an update for any variance component results in a
negative value then ASReml sets that variance component to a small
positive value. If this occurs in subsequent iterations the parameter
is fixed to a small positive value and the code B
replaces P in the C column of the summary table.
The default constraint can be overridden using the !GU
qualifier, but this is not generally recommended for standard
analyses.
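The figures in the output above can be cross-checked by hand. The following is a minimal Python sketch, not part of ASReml, that uses only numbers copied from the listing. It assumes that Sigma is Gamma multiplied by the residual variance, that each stratum variance is the coefficient-weighted sum of the components, and that the information criteria are computed as -2 LogL + 2p and -2 LogL + p log(df) with p = 5 and df = 255. The BIC formula in particular is an assumption that happens to reproduce the printed value. The variable names are illustrative.

```python
import math

# Values copied from the ASReml output above.
logl = 203.242        # converged REML log-likelihood
n_params = 5          # "assuming 5 parameters"
resid_df = 255        # df printed on the LogL lines
residual = 0.0511400  # residual variance (Sigma in the Residual row)

# Gamma column: variance components as ratios to the residual variance.
gamma = {"TestStat": 0.0642752, "Set": 0.233416, "Reg.Set": 0.601817}

# Assumed relationship: Sigma = Gamma * residual variance.
sigma = {term: g * residual for term, g in gamma.items()}

# Stratum variances rebuilt from the printed coefficients.
# TestStat stratum: 64.0 * sigma(TestStat) + 1.0 * residual
# Reg.Set stratum:   4.0 * sigma(Reg.Set)  + 1.0 * residual
stratum_teststat = 64.0 * sigma["TestStat"] + residual
stratum_reg_set = 4.0 * sigma["Reg.Set"] + residual

# Information criteria (formulas assumed, see the note above).
aic = -2.0 * logl + 2.0 * n_params
bic = -2.0 * logl + n_params * math.log(resid_df)

for term, value in sigma.items():
    print(f"{term:9s} Sigma = {value:.6e}")
print(f"TestStat stratum variance = {stratum_teststat:.6f}")
print(f"Reg.Set stratum variance  = {stratum_reg_set:.6f}")
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```

The Set stratum is left out of the check because its coefficient (28.3) is printed to only one decimal place, so the reconstruction agrees with the reported 0.512653 only approximately.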
Figure 1 presents the residual plot, which indicates two
unusual data values. Looking at the .res file we see
STND RES 37 15.400 -4.93
STND RES 190 15.400 -3.98
STND RES 210 15.300 -6.68
STND RES 211 17.800 8.93
STND RES 235 16.700 3.90
The preceding lines report the data record number,
a data value to help identify the record, and
the residual scaled by an approximate standard deviation.
The two most extreme values are successive observations, namely
observations 210 and 211, being testing stations 2 and 3 for setting
station 9 (J), regulator 2. These observations will not be dropped
from the following analyses, for consistency with other analyses
conducted by Cox and Snell (1981) and in the GenStat manual.
Figure 1. Residual plot for the voltage data
The REML Loglikelihood from the model without the setstat.teststat term was
203.242, the same as the REML Loglikelihood for the previous model.
Table 1 presents a summary of the REML Loglikelihood for the
remaining terms in the model. The summary of the ASReml output for
the current model is given below. The column labelled Comp/SE
(shown as Sigma/SE in the output above) is
printed by ASReml to give a guide as to the significance of the
variance component for each term in the model. The statistic is simply
the REML estimate of the variance component divided by the square root
of the diagonal element (for each component) of the inverse of the
average information matrix. The diagonal elements of the inverse of the
expected (not the average)
information matrix are the asymptotic variances of the REML estimates
of the variance parameters. These Comp/SE statistics cannot be used
to test the null hypothesis that the variance component is zero, but can be used as a guide.
The Sigma/SE value for TestStat (0.98) raises the question of whether
that component is significant. Formal testing with the Likelihood Ratio Test,
running jobs to drop each component in turn after first dropping
Test.Set,
shows that it is significant (see Table 1).
Table 1. REML LogL for the variance components in the voltage data
terms | REML log-likelihood | -twice difference | P-value
- SetStat | 200.31 | 5.864 | .0077
- SetStat.Regulator | 184.15 | 38.19 | .0000
- TestStat | 199.71 | 7.064 | .0039
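The entries in Table 1 can be reproduced approximately with a short calculation. The sketch below is in Python and is not ASReml code. It assumes the P-values come from the usual boundary-adjusted likelihood ratio test for a single variance component, where the reference distribution is an equal mixture of a point mass at zero and a chi-square on 1 degree of freedom, so the chi-square tail probability is halved. The source does not state this, but the assumption matches the tabulated P-values. The function name is illustrative.

```python
import math

def boundary_lrt(logl_full, logl_reduced):
    """Return (statistic, p_value) for dropping one variance component.

    The statistic is twice the drop in REML log-likelihood. The p-value
    uses the 0.5*chi2(0) + 0.5*chi2(1) mixture, i.e. half the chi-square
    (1 df) upper tail, which equals 0.5 * erfc(sqrt(d / 2)).
    """
    d = 2.0 * (logl_full - logl_reduced)
    if d <= 0.0:
        return d, 1.0
    return d, 0.5 * math.erfc(math.sqrt(d / 2.0))

# REML log-likelihood of the model after dropping SetStat.TestStat.
LOGL_FULL = 203.242

# REML log-likelihoods with one further term dropped (from Table 1).
reduced = {
    "SetStat": 200.31,
    "SetStat.Regulator": 184.15,
    "TestStat": 199.71,
}

results = {term: boundary_lrt(LOGL_FULL, ll) for term, ll in reduced.items()}
for term, (d, p) in results.items():
    print(f"- {term:18s} -twice difference = {d:7.3f}  P = {p:.4f}")
```

Because the log-likelihoods are tabulated to two or three decimal places, the recomputed statistic for SetStat.Regulator (38.184) differs slightly from the printed 38.19.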