
GE Research & Development Center

Random and Fixed Factor ANOVA Models: Gauge R&R Studies

T.A. Early and R. Neagu

99CRD094, July 1999

Class 1

Technical Information Series

Copyright 1999 General Electric Company. All rights reserved.

Corporate Research and Development Technical Report Abstract Page

Title: Random and Fixed Factor ANOVA Models: Gauge R&R Studies
Author(s): T.A. Early, R. Neagu*
Phone: (518)387-6590, 8*833-6590
Component: Characterization and Environmental Technology Laboratory
Report Number: 99CRD094
Date: July 1999
Number of Pages: 10
Class: 1
Key Words: ANOVA, Gauge R&R, factor, fixed, random, effects

The effects of random and fixed factors on gauge repeatability and reproducibility analysis are discussed. The context of the gauge study along with the desired calculated outcome guide the gauge practitioner to the proper ANOVA model to be applied to the result of a gauge study. This paper discusses the practical aspects of gauge study factors and their impact on the interpretation of the experimental results. Sample calculations are given.

Manuscript received June 29, 1999

*Information Technology Laboratory

Random and Fixed Factor ANOVA Models: Gauge R&R Studies


T.A. Early and R. Neagu

The performance of the measurement system is an essential ingredient in any program designed to improve the quality of a product or process. At a minimum, the measurement system must be able to detect random fluctuations in the process or product to be improved¹. If this performance level is not available from the measurement system, the improvement program simply cannot proceed. Instead, program activity must focus on improving the measurement system.

A gage repeatability and reproducibility (R&R) study is a design of experiments (DOE) with the special purpose of understanding the source(s) of variability in a measurement system² that can be attributed to human, instrument or random effects. In a typical study involving a non-destructive test, several parts are measured by several operators several times. The variation observed when a single operator measures the same part repeatedly is called repeatability or pure error. The variation observed when one operator duplicates the measurement of another operator on the same part is called reproducibility. The parts themselves also contribute to the overall variation of the data obtained in a gage R&R study. Certainly, this can be a large source of variation if the parts used in the gage R&R study are very different. This variation due to different parts is not of primary interest, however. Rather, the intended purpose of the study is to determine how well similar parts can be differentiated by the measurement system.

Also of importance in a gage R&R study is the possible interaction of the parts and operators. In other words, do different operators measure different parts differently? For example, does one operator consistently get results below a second operator when measuring small parts while consistently getting results above the second operator when measuring large parts? A two-factor ANOVA model for a gage R&R study DOE is given by the equation³:

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$.   [1]
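To make the structure of Equation [1] concrete, the following Python sketch builds a balanced data set one term at a time: a grand mean, an operator effect, a part effect, an operator-part interaction, and a pure error for every repeat measurement. It assumes that all effects are random and normally distributed; the function name `simulate_gage_study`, the standard deviations, and the seed are hypothetical choices for illustration and do not come from this report.

```python
import random


def simulate_gage_study(a, b, n, mu, sd_operator, sd_part,
                        sd_interaction, sd_error, seed=0):
    """Return y[i][j][k] = mu + alpha_i + beta_j + (alpha*beta)_ij + eps_ijk."""
    rng = random.Random(seed)
    alpha = [rng.gauss(0.0, sd_operator) for _ in range(a)]  # operator effects
    beta = [rng.gauss(0.0, sd_part) for _ in range(b)]       # part effects
    y = []
    for i in range(a):
        row = []
        for j in range(b):
            ab_ij = rng.gauss(0.0, sd_interaction)           # interaction effect
            # n repeat measurements of part j by operator i, each with pure error
            cell = [mu + alpha[i] + beta[j] + ab_ij + rng.gauss(0.0, sd_error)
                    for _ in range(n)]
            row.append(cell)
        y.append(row)
    return y


# Hypothetical settings: 3 operators, 5 parts, 3 repeats per operator-part cell.
y = simulate_gage_study(a=3, b=5, n=3, mu=1000.0, sd_operator=10.0,
                        sd_part=15.0, sd_interaction=12.0, sd_error=5.0)
print(len(y), len(y[0]), len(y[0][0]))
```

Setting all four standard deviations to zero returns the grand mean for every measurement, which is a quick way to confirm that each term enters the sum exactly once.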

This model states that any measurement, $Y_{ijk}$, is made up of several terms. First is the grand mean of all possible observations, $\mu$. The next two terms are the main effect from the ith operator, $\alpha_i$, and the effect from the jth part, $\beta_j$. The fourth term is the effect from the part-operator interaction, $(\alpha\beta)_{ij}$. The final term is the contribution of operator repeatability on the kth measurement, or pure error, $\varepsilon_{ijk}$. For this paper, we will assume a balanced design, i.e., each one of the a operators measures each one of the b parts the same number of times, n.

¹ Early, T. A., Effective Management of Six Sigma Projects: A Balanced Approach, November 1997, GE CR&D Technical Information Series, 97CRD153.
² Montgomery, D. C. and Runger, G. C., 1993-4, Gauge Capability and Designed Experiments. Part I: Basic Methods, Quality Engineering 6(1), pp. 115-135.
³ Neter, J., Kutner, M. H., Nachtsheim, C. J. and Wasserman, W., Applied Linear Statistical Models, 4th edition, IRWIN, 1996.

For this ANOVA model, formulas for computing sums of squares can be established (see Appendix A). A mean square can then be calculated by dividing the sum of squares for each contribution by the degrees of freedom for that contribution. These mean square values can now be used to test the different properties of the terms used in the model. The null hypothesis, H0, states that none of the factors, parts, operators, or their interaction, has any effect on the individual measurements. That is to say, variation in the data cannot be attributed to any factor or interaction of factors.

It is important to note that the correct definition of a specific null hypothesis depends on the context of the gage R&R experiment. Of particular importance are the nature and source of the levels of the factors selected for the DOE gage study, as well as the kind of inference to be made from the results of the experiment⁴. In one scenario, parts and operators chosen for a gage R&R study might represent a small sample taken at random from a larger pool of parts and operators. In this scenario, the study is generally performed to infer the behavior of the larger pool of parts and operators based on the random sample taken from those collections. This is called a random factor levels model, and tests of significance are usually performed to decide whether the variance of each model term is significantly different from zero. For example, the null hypothesis for testing whether an operator-part interaction exists is:

$H_0: \sigma_{\alpha\beta}^2 = 0$.

In this scenario, the test statistic for the interaction term is calculated by dividing the mean square value for the interaction by the mean square for pure error. If this value is larger than the tabulated F-statistic, then the null hypothesis can be rejected and the alternative hypothesis, $H_a: \sigma_{\alpha\beta}^2 \neq 0$, is accepted. If the test statistic is less than the tabulated F-statistic, then the null hypothesis cannot be rejected. That is, compared to the pure noise in the measurement system, there is no supportable evidence that there is an operator-part interaction. In this case, the model can be recast without the interaction term. This simpler model can also be treated by the methods described in this paper, and that exercise is left to the reader.

For the model described in Equation [1], the test statistics for the main effects terms (part and operator) are calculated by dividing the mean square value for the main effect by the mean square of the interaction⁵. The test statistic is generated this way because of the underlying assumption regarding the random choice of factor levels, which implies certain characteristics about the expected mean squares of these models (vide infra).

The random factor level model can be contrasted with another scenario in which parts and operators are chosen for the gage R&R study for a specific reason. Perhaps all the operators of a measurement are selected for the DOE, or specific operators are chosen because the experimenter has a particular interest in comparing the performance of the chosen operators.

⁴ Feder, P. I., Some Differences between Fixed, Mixed and Random Effects Analysis of Variance Models, Journal of Quality Technology, April 1974, pp. 98-106.
⁵ For an alternate way of computing F statistics, see Wolach, A. H. and McHale, M. A. (1987), F ratios and quasi F ratios for fixed, mixed and random model ANOVAs, Behavior Research Methods, Instruments, & Computers 19, no. 4, pp. 409-412, and Burdick, R. K., Using confidence intervals to test variance components, Journal of Quality Technology, January 1994, pp. 30-38.

Specifically, in this fixed factor level model, inference about a larger population is not the goal of the study and in fact may have no meaning. Rather, specific comparisons of different factor levels are the point of the study. In this case, Equation [1] still defines the experimental model, but the meaning of the terms in the model is different: they no longer represent random variables; they are constants subject to some restrictions. To test whether an operator-part interaction exists, the null hypothesis is:

$H_0: \text{all } (\alpha\beta)_{ij} = 0$.

This hypothesis is fundamentally different from that formulated previously. The test statistic would be calculated in the same way as described in the first scenario. Because of the different inferential model in this scenario, however, test statistics for the presence of factor main effects are calculated differently: the mean square for each main effect term is divided by the mean square for pure error. As in the first scenario, the underlying assumption regarding this fixed factor level model implies certain characteristics about the expected mean squares of the model.

Fixed factor level models are called ANOVA model I. Random factor level models are called ANOVA model II. Mixed models, where some factors have random levels and other factors have fixed levels, are also possible and are called ANOVA model III. Table 1 lists the expected mean squares for each of these types for the model described in Equation [1]:
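Table 1. Expected mean square values are listed for each of the three ANOVA Models when interactions are important (model Equation [1]).

Mean Square   Model I (A and B fixed)                                             Model II (A and B random)                                   Model III (A fixed, B random)
MSA           $\sigma^2 + nb \sum_i \alpha_i^2 / (a-1)$                           $\sigma^2 + n\sigma_{\alpha\beta}^2 + nb\sigma_\alpha^2$    $\sigma^2 + n\sigma_{\alpha\beta}^2 + nb \sum_i \alpha_i^2 / (a-1)$
MSB           $\sigma^2 + na \sum_j \beta_j^2 / (b-1)$                            $\sigma^2 + n\sigma_{\alpha\beta}^2 + na\sigma_\beta^2$     $\sigma^2 + na\sigma_\beta^2$
MSAB          $\sigma^2 + n \sum_i \sum_j (\alpha\beta)_{ij}^2 / [(a-1)(b-1)]$    $\sigma^2 + n\sigma_{\alpha\beta}^2$                        $\sigma^2 + n\sigma_{\alpha\beta}^2$
MSE           $\sigma^2$                                                          $\sigma^2$                                                  $\sigma^2$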

Each expected mean square in this table is made up of one or more terms. For random factors, the terms are based on the expected variance of the entire population of levels of factors and interactions in the model, $\sigma_\alpha^2$, $\sigma_\beta^2$, or $\sigma_{\alpha\beta}^2$. For fixed factors, the expected mean squares are based on the true variance of the factor main effects. For mixed models, if a fixed factor interacts with a random factor, the expected mean square for that fixed factor also includes random factor variance terms.

The test statistic for making inferences about a model term is generated by dividing mean squares that have the same expectation under H0. In addition, under Ha the numerator mean square has a larger expectation than the denominator mean square. This is the only way to correctly generate a test statistic that can then be compared to the F-statistic. The test statistic will be larger than the comparable F-statistic if the model term contributes to the variance of the experimental data more than would be expected by pure chance. The test statistics for the three models are listed in Table 2:
Table 2. The test for each model term is listed for a two-factor DOE when interactions are important (model Equation [1]).

Test for Significance of:   Model I (A and B fixed)   Model II (A and B random)   Model III (A fixed, B random)
A                           MSA/MSE                   MSA/MSAB                    MSA/MSAB
B                           MSB/MSE                   MSB/MSAB                    MSB/MSE
AB Interaction              MSAB/MSE                  MSAB/MSE                    MSAB/MSE
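The rule that numerator and denominator must share the same expectation under H0 can be applied mechanically to the expected mean squares. The following Python sketch encodes the expected mean squares for the three models and, for each tested term, sets that term's contribution to zero and searches for the mean square with the matching expectation. The function names and the placeholder numeric values are assumptions made for illustration; only the equality of expectations matters, not the numbers themselves.

```python
def expected_ms(model, n, a, b, sigma2, a_term, b_term, ab_term):
    """Expected mean squares for ANOVA models I, II and III.

    a_term, b_term and ab_term stand for the variance component of a random
    term, or for the sum of squared effects divided by its degrees of freedom
    when the term is fixed.
    """
    ems = {"MSAB": sigma2 + n * ab_term, "MSE": sigma2}
    if model == "I":        # A and B fixed
        ems["MSA"] = sigma2 + n * b * a_term
        ems["MSB"] = sigma2 + n * a * b_term
    elif model == "II":     # A and B random
        ems["MSA"] = sigma2 + n * ab_term + n * b * a_term
        ems["MSB"] = sigma2 + n * ab_term + n * a * b_term
    elif model == "III":    # A fixed, B random
        ems["MSA"] = sigma2 + n * ab_term + n * b * a_term
        ems["MSB"] = sigma2 + n * a * b_term
    else:
        raise ValueError("model must be 'I', 'II' or 'III'")
    return ems


def denominator_for(model, term, n=3, a=3, b=5):
    """Return the mean square whose expectation equals that of MS<term> under H0."""
    # Arbitrary positive placeholders, exactly representable as floats.
    effects = {"A": 2.0, "B": 3.0, "AB": 5.0}
    effects[term] = 0.0     # impose the null hypothesis on the tested term
    ems = expected_ms(model, n, a, b, 1.0,
                      effects["A"], effects["B"], effects["AB"])
    for candidate in ("MSAB", "MSE"):
        if candidate != "MS" + term and ems[candidate] == ems["MS" + term]:
            return candidate
    return None


for model in ("I", "II", "III"):
    ratios = {t: "MS" + t + "/" + denominator_for(model, t) for t in ("A", "B", "AB")}
    print(model, ratios)
```

Running the loop reproduces the denominators of Table 2, including the asymmetry of model III, where the fixed factor A is tested against MSAB while the random factor B is tested against MSE.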

The expected mean squares provide a way to estimate the components of variance associated with each term in the model. We solve for the variance component of interest, then replace the expected mean squares with the observed mean squares; the estimator obtained this way is unbiased. For example, with a type II random factor model, the estimate⁶ of the variance of the AB interaction⁷ term is just:

$s_{\alpha\beta}^2 = \frac{MSAB - MSE}{n}$.   [2]

This is the expected variance in interaction of the entire population of operators with the entire population of parts. (We are estimating variance based on the finite data of the gage study, so here we replace $\sigma$ with s.) The variance of all operators and the variance of all parts can be estimated by:

$s_\alpha^2 = \frac{MSA - MSAB}{nb}$ and $s_\beta^2 = \frac{MSB - MSAB}{na}$.

Thus, an estimate of the gage variance is just the sum of the appropriate components:

$\sigma_{gage}^2 = \sigma^2 + \sigma_\alpha^2 + \sigma_{\alpha\beta}^2$.

The first component, estimated by MSE (pure error), is assigned to within-operator repeatability. The final two terms can be assigned to operator-to-operator reproducibility. It is interesting to compare this analysis with that from a mixed, type III model.
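The model II estimates can be collected in a few lines of Python. This is a minimal sketch: the function name and dictionary keys are illustrative, factor A is taken to be the operator and factor B the part, and the mean squares passed in are those of the Appendix B example (a = 3 operators, b = 5 parts, n = 3 repeats), used here only as sample input.

```python
def model2_variance_components(msa, msb, msab, mse, n, a, b):
    """Variance component estimates when operators (A) and parts (B) are random."""
    s2_error = mse                          # repeatability (pure error)
    s2_interaction = (msab - mse) / n       # operator-part interaction
    s2_operator = (msa - msab) / (n * b)    # operator-to-operator
    s2_part = (msb - msab) / (n * a)        # part-to-part, not part of the gage
    return {
        "error": s2_error,
        "interaction": s2_interaction,
        "operator": s2_operator,
        "part": s2_part,
        "gage": s2_error + s2_operator + s2_interaction,
    }


# Mean squares computed as SS / df from the Appendix B ANOVA table.
vc = model2_variance_components(msa=4440 / 2, msb=8190 / 4, msab=5226 / 8,
                                mse=886 / 30, n=3, a=3, b=5)
for name, value in vc.items():
    print(name, round(value, 1))
```

Because each estimate is a difference of mean squares, it can come out negative for some data sets; how to treat that case is outside the scope of this sketch.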

⁶ Confidence intervals for the estimates can be computed. For example, see Burdick, R. K. and Larsen, G. A., Confidence Intervals on measures of variability in R&R studies, Journal of Quality Technology, July 1997, pp. 261-273.
⁷ Montgomery, D. C. and Runger, G. C., 1993-4, Gauge Capability Analysis and Designed Experiments. Part II: Experimental Design Models and Variance Component Estimation, Quality Engineering 6, 289.

Suppose all of the operators involved in a measurement are included in the gage R&R study. Suppose also that ANOVA model III is applicable, that is, the part factor is random. Although the operator factor is fixed, the part factor being random makes the operator-part interaction a random effect. The component of variance due to the operator-part interaction can be estimated as shown in Equation [2]⁸. In this case, the operator factor is fixed, not random. The uninteresting variance component due to parts is easily estimated by:

$s_\beta^2 = \frac{MSB - MSE}{na}$.

The more interesting variation due to the operator cannot be estimated as a random variance because of the context of the problem. The variance due to operators cannot be modeled by a random variable. The gage data represent the performance of all the operators and not a random sample of operators from a larger pool. However, the operator-to-operator variability is accounted for when we compute the error sum of squares, SSE. The total sum of squares is broken down into SSOperator + SSPart + SSOperator*Part + SSE. Therefore, the variance components (see formulas in Appendix B) take into account the operator-to-operator variability by using MSE.

Estimates of contrasts (comparisons) for operator level means can be computed. For example, in a two-operator gage study, operator means can be calculated for each of the operators according to Appendix A. Of great interest is the bias difference, also called the contrast, L, of the two operators:

$L = \mu_{1\cdot} - \mu_{2\cdot} = \alpha_1 - \alpha_2$.

This contrast allows the direct comparison of measurements obtained by the two operators, irrespective of the magnitude of their bias. With the contrast definition of:

$L = \sum_i c_i \mu_{i\cdot}$, where $\sum_i c_i = 0$,

for this two-operator study, $c_1 = 1$ and $c_2 = -1$. The contrast between operators is not known exactly, however, because each $Y_{ijk}$ observation includes pure error. While the estimate of the contrast, $\hat{L} = \bar{Y}_{1\cdot\cdot} - \bar{Y}_{2\cdot\cdot}$, is simply the difference in the operator means, the estimated value of the variance of the contrast is:

$s_{\hat{L}}^2 = \frac{MSE}{nb} \sum_i c_i^2$.
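A contrast and its estimated variance are short calculations once the operator means and MSE are available. The Python sketch below assumes that each operator mean averages nb observations, consistent with the 15 observations per operator used in Appendix B; the function name and the two-operator numbers are hypothetical and serve only as an illustration.

```python
def contrast(means, coeffs, mse, n, b):
    """Return (estimate, estimated variance) of a contrast of operator means.

    means  : operator means, one per operator
    coeffs : contrast coefficients c_i, which must sum to zero
    mse    : mean square for pure error
    n, b   : repeats per cell and number of parts (n * b observations per mean)
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    variance = mse / (n * b) * sum(c * c for c in coeffs)
    return estimate, variance


# Hypothetical two-operator study with c1 = 1 and c2 = -1.
estimate, variance = contrast(means=[10.2, 9.9], coeffs=[1, -1],
                              mse=0.04, n=3, b=5)
print(round(estimate, 3), round(variance, 5))
```

The zero-sum check enforces the constraint on the coefficients, so a call that accidentally averages rather than compares operators is rejected instead of silently returning a number.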

Simply stated, the expected bias of the operators is obtained through a simple comparison of the average measurement of each operator in the gage R&R study. Therefore, this operator bias can be removed as a source of deviation in the measurement system. The gage variation becomes:

$\sigma_{gage}^2 = \sigma^2 + \sigma_{\alpha\beta}^2$.

⁸ Dolezal, K. K., Burdick, R. K. and Birch, N. J., April 1998, Analysis of a two-factor R & R study with fixed operators, Journal of Quality Technology, pp. 163-170.

What about the part factor in a gage R&R study? Are parts a random or fixed factor? Parts selected for a gage R&R study need to be as close as possible to the actual parts the measurement is designed to test. In fact, the parts selected may be a random selection of the actual parts to which the measurement is applied. On the face of it, parts appear to be a random factor. Nonetheless, variation of a gage study due to differences in parts is not a direct goal of the study. This disinterest in the variability of parts does not imply that parts are a fixed factor. The quantitative measure of part-operator interactions, which is a part of reproducibility, is important. This part-operator interaction needs to be characterized for all possible parts a measurement system might encounter, not just for the specific parts selected for the gage study. Therefore, parts should be randomly selected for a gage study. Thus, model I ANOVA, where all factors are fixed, will almost never be applied to a gage study. Whether model II or III is applied depends on the inferences to be made about the operator.

While non-destructive, fully crossed designs have been discussed in this article, destructive, nested designs can also be treated in a similar manner. In a simple destructive gage R&R study involving the two factors of operator and part (nested within operator), an understanding of part-operator interaction is not possible because part-to-part variability is confounded with all terms in the model. That is, different operators cannot measure the same part because it was destroyed in the measurement. Whether a factor is fixed or random still has the same significance in the generation of expected mean squares (and therefore, statistical significance). It is of utmost importance that parts chosen for a destructive gage study be as similar as possible, so that in this case, parts will be a fixed factor.

In conclusion, the context of a gage R&R study and a complete understanding of the inferences to be made from the results of the study determine how the experimental results can be statistically interpreted. Statistical tests and their associated probabilities vary in structure according to the nature of the levels of factors chosen in the study. In particular, fixed operator gage studies allow for the successful unbiasing of the main effect of operator on the experimental results. Finally, the interested reader is directed to Appendix B, where an example data set is used to illustrate the points of this article.

Appendix A Calculations of Mean Squares


The sum of squares calculation is described for the simple, two-factor experiment with the model equation:

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$.

Levels for the first factor, A, are specified by the index i running from one to a. For the second factor, B, levels are indexed by j, which runs from one to b. Replicate data at the same factor levels are indexed by k, which runs from one to n. Several means are defined:

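Grand mean:        $\bar{Y}_{\cdot\cdot\cdot} = \frac{1}{nab} \sum_i \sum_j \sum_k Y_{ijk}$

A level means:     $\bar{Y}_{i\cdot\cdot} = \frac{1}{nb} \sum_j \sum_k Y_{ijk}$

B level means:     $\bar{Y}_{\cdot j\cdot} = \frac{1}{na} \sum_i \sum_k Y_{ijk}$

Treatment means:   $\bar{Y}_{ij\cdot} = \frac{1}{n} \sum_k Y_{ijk}$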

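With these definitions, sums of squares (variances) can be calculated for all components of the model:

Term   Description                         Formula
SSTO   Total variance                      $\sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2$
SSE    Replication (pure error) variance   $\sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{ij\cdot})^2$
SSA    Variance due to factor A            $nb \sum_i (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$
SSB    Variance due to factor B            $na \sum_j (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2$
SSAB   Variance due to AB interaction      $n \sum_i \sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2 = n \sum_i \sum_j (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot})(\bar{Y}_{ij\cdot} - \bar{Y}_{\cdot j\cdot})$

SSTO = SSE + SSA + SSB + SSAB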

It is interesting to point out that the second form of SSAB contains no squared terms, but is always guaranteed to be >= 0.
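The formulas of this appendix translate directly into code for a balanced design. The Python sketch below computes every sum of squares from a nested list and also evaluates the second, product form of SSAB so that the two forms can be compared. The function name and the small 2 x 2 x 2 data set are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def sums_of_squares(y):
    """Sums of squares for balanced data y[i][j][k] (A level i, B level j, replicate k)."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / n for j in range(b)] for i in range(a)]    # treatment means
    a_mean = [sum(cell[i]) / b for i in range(a)]                      # A level means
    b_mean = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]  # B level means
    grand = sum(a_mean) / a                                            # grand mean
    cells = [(i, j) for i in range(a) for j in range(b)]
    return {
        "SSTO": sum((v - grand) ** 2 for i, j in cells for v in y[i][j]),
        "SSE": sum((v - cell[i][j]) ** 2 for i, j in cells for v in y[i][j]),
        "SSA": n * b * sum((m - grand) ** 2 for m in a_mean),
        "SSB": n * a * sum((m - grand) ** 2 for m in b_mean),
        "SSAB": n * sum((cell[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                        for i, j in cells),
        # Product form: no squared terms, yet equal to SSAB for balanced data.
        "SSAB_product_form": n * sum((cell[i][j] - a_mean[i]) * (cell[i][j] - b_mean[j])
                                     for i, j in cells),
    }


# Hypothetical data: 2 levels of A, 2 levels of B, 2 replicates per cell.
y = [[[10.0, 11.0], [14.0, 13.0]],
     [[12.0, 12.0], [13.0, 15.0]]]
ss = sums_of_squares(y)
print(ss)
```

For this small data set the components add up to the total (3 + 2 + 12.5 + 0.5 = 18), and both forms of SSAB give 0.5.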

Appendix B An Example
A fictitious non-destructive gage R&R was performed where all three operators used in a process measurement measured five parts three times each with the following results:
Operator   Part 1            Part 2            Part 3            Part 4            Part 5
1          1019 1017 1018     977  980  992     992 1004 1001     988  991  982     967  981  971
2          1031 1031 1025    1001 1007 1010    1010 1025 1019    1018 1018 1024     997  992 1002
3           990  991  986     962  966  952    1015 1020 1013    1023 1019 1027     980  990  976

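First, averages are calculated for each component:

Grand Mean:       $\bar{Y}_{\cdot\cdot\cdot} = 1000$
Operator Means:   $\bar{Y}_{1\cdot\cdot} = 992$, $\bar{Y}_{2\cdot\cdot} = 1014$, $\bar{Y}_{3\cdot\cdot} = 994$
Part Means:       $\bar{Y}_{\cdot 1\cdot} = 1012$, $\bar{Y}_{\cdot 2\cdot} = 983$, $\bar{Y}_{\cdot 3\cdot} = 1011$, $\bar{Y}_{\cdot 4\cdot} = 1010$, $\bar{Y}_{\cdot 5\cdot} = 984$

Treatment Means, $\bar{Y}_{ij\cdot}$ (rows: part j; columns: operator i):

        i = 1   i = 2   i = 3
j = 1   1018    1029     989
j = 2    983    1006     960
j = 3    999    1018    1016
j = 4    987    1020    1023
j = 5    973     997     982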

From these means, sums of squares (SS) can be calculated for each component of the model (see Appendix A). Degrees of freedom (df) for each term for this non-destructive test are also listed, along with the mean squares (MS). Using Table 2, F test statistics can be calculated for each term except, of course, the error term, which is the basic variance against which all other variances are referenced. Finally, probabilities (P) are calculated for each test statistic. This is the probability that random chance can account for the variance that is assigned to this term:

Term            SS      df   MS       F        P
Operator        4440    2    2220.0   3.398    0.085
Part            8190    4    2047.5   69.328   0.000
Operator*Part   5226    8    653.3    22.119   0.000
Error           886     30   29.5
Total           18742   44
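The SS, df, MS and F columns of this table can be retraced from the raw measurements. The Python sketch below is one way to do so using only the standard library; the variable names are illustrative, the F ratios follow the model III column of Table 2 (operator fixed, part random), and the P-values are omitted because they require an F distribution function that is not part of the standard library.

```python
# Appendix B data: data[operator][part] holds the three repeat measurements.
data = [
    [[1019, 1017, 1018], [977, 980, 992], [992, 1004, 1001],
     [988, 991, 982], [967, 981, 971]],
    [[1031, 1031, 1025], [1001, 1007, 1010], [1010, 1025, 1019],
     [1018, 1018, 1024], [997, 992, 1002]],
    [[990, 991, 986], [962, 966, 952], [1015, 1020, 1013],
     [1023, 1019, 1027], [980, 990, 976]],
]
a, b, n = len(data), len(data[0]), len(data[0][0])
cells = [(i, j) for i in range(a) for j in range(b)]

cell = [[sum(data[i][j]) / n for j in range(b)] for i in range(a)]     # treatment means
op_mean = [sum(cell[i]) / b for i in range(a)]                         # operator means
part_mean = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]  # part means
grand = sum(op_mean) / a                                               # grand mean

ss = {
    "Operator": n * b * sum((m - grand) ** 2 for m in op_mean),
    "Part": n * a * sum((m - grand) ** 2 for m in part_mean),
    "Operator*Part": n * sum((cell[i][j] - op_mean[i] - part_mean[j] + grand) ** 2
                             for i, j in cells),
    "Error": sum((v - cell[i][j]) ** 2 for i, j in cells for v in data[i][j]),
}
df = {"Operator": a - 1, "Part": b - 1,
      "Operator*Part": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
ms = {term: ss[term] / df[term] for term in ss}

# Model III test statistics: A against MSAB, B and AB against MSE.
f_stat = {
    "Operator": ms["Operator"] / ms["Operator*Part"],
    "Part": ms["Part"] / ms["Error"],
    "Operator*Part": ms["Operator*Part"] / ms["Error"],
}

for term in ss:
    f_text = format(f_stat[term], ".3f") if term in f_stat else ""
    print(term, round(ss[term]), df[term], round(ms[term], 1), f_text)
print("Total", round(sum(ss.values())), sum(df.values()))
```

Replacing the denominator of the Part ratio with the interaction mean square gives the model II statistic for parts, about 3.134.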

Note that the part factor test statistic and significance are very different from what would be calculated if this were a pure random factor model. (In that case, Part would have an F-test statistic of 3.134 and a P-value of 0.079.) Using the expected mean squares for model III given in Table 1, estimated variances are calculated for three terms:

Estimated variance           Formula                    Value
$\sigma^2$                   MSE                        29.5
$\sigma_{Part*Operator}^2$   $\frac{MSAB - MSE}{n}$     207.9
$\sigma_{Part}^2$            $\frac{MSB - MSE}{na}$     154.9
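These estimates can be checked directly from the mean squares in the ANOVA table. The Python sketch below is a plain evaluation of the listed formulas with illustrative variable names. Evaluating (MSB - MSE)/(na) gives roughly 224.2, whereas the tabulated part value of 154.9 corresponds to (MSB - MSAB)/(na), the model II form, so both are computed for comparison. The part component does not enter the gage variance in either case.

```python
n, a = 3, 3                                       # repeats per cell, operators
msb, msab, mse = 8190 / 4, 5226 / 8, 886 / 30     # SS / df from the ANOVA table

s2_error = mse                                    # pure error (repeatability)
s2_interaction = (msab - mse) / n                 # Part*Operator component
s2_part_vs_mse = (msb - mse) / (n * a)            # part component, formula as listed
s2_part_vs_msab = (msb - msab) / (n * a)          # model II form, for comparison

print(round(s2_error, 1), round(s2_interaction, 1),
      round(s2_part_vs_mse, 1), round(s2_part_vs_msab, 1))
```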

Variance for the operator factor cannot be estimated because the operator levels cannot be represented by a random variable⁹. Since operator is a fixed factor, it is reasonable to unbias each operator's measurements. (Note that this is justified even with an ANOVA P-value of greater than 0.05. This is because the P-value of the simple operator means is much less than 0.05, primarily based on the relatively low pure error and the nb = 15 different observations for each operator.) A contrast between any operators can be defined with the constraints defined in the main article. For example, a contrast between operator #1 and #3 can be developed:

$\hat{L} = \bar{Y}_{3\cdot\cdot} - \bar{Y}_{1\cdot\cdot} = 994 - 992 = 2$, with $c_1 = -1$ and $c_3 = 1$.


We can now estimate the variance of this contrast:

$s_{\hat{L}}^2 = MSE \left[ \frac{(1)^2}{15} + \frac{(-1)^2}{15} \right] = 3.93$.

The 95% confidence interval for the contrast is:

$\hat{L} \pm t(0.5,\, nb - 2)\, s_{\hat{L}} = 2 \pm 4.1$.

⁹ Minitab 12.2 Reference Manual: ANOVA Models and Gage R&R Studies.

This confidence interval contains zero, so the gage study does not support a significant contrast between operator #1 and #3. The contrast (bias) for operator #2 compared to operators #1 and #3 can be defined as:

$\hat{L} = \bar{Y}_{2\cdot\cdot} - \frac{\bar{Y}_{1\cdot\cdot} + \bar{Y}_{3\cdot\cdot}}{2} = 21$, so that $c_2 = 1$ and $c_1 = c_3 = -0.5$.

The estimated variance of this contrast is:

$s_{\hat{L}}^2 = MSE \left[ \frac{(1)^2}{15} + \frac{(-\frac{1}{2})^2}{15} + \frac{(-\frac{1}{2})^2}{15} \right] = 2.95$,

and the 95% confidence interval is $21 \pm 3.5$. This confidence interval does not contain zero, and the contrast is therefore significant. We are statistically justified in unbiasing the measurements of operator #2. Based on 15-sample operator means, the total gage variation now becomes:

$\sigma_{gage}^2 = \sigma^2 + \sigma_{Part*Operator}^2 = 237.4$.

This represents a 31% lower value than the value predicted by ANOVA model II erroneously applied to the same data.
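The closing calculations of this example can be retraced with a short Python sketch. Two assumptions are made here that are not stated in the text: the t multiplier is taken as 2.042, the tabulated two-sided 95% value for 30 degrees of freedom (the error degrees of freedom), and the model II gage variance is formed by adding the operator component (MSA - MSAB)/(nb) to the model III value. With these assumptions the half-widths and the 31% reduction agree with the quoted figures to the precision shown; the names used are illustrative.

```python
from math import sqrt

mse, msa, msab = 886 / 30, 4440 / 2, 5226 / 8   # SS / df from the ANOVA table
n, b = 3, 5                                     # repeats per cell, parts
op_mean = {1: 992.0, 2: 1014.0, 3: 994.0}       # operator means
T_CRIT = 2.042   # assumed: tabulated two-sided 95% t value for 30 df


def contrast_interval(coeffs):
    """coeffs maps operator number to contrast coefficient (must sum to zero)."""
    assert abs(sum(coeffs.values())) < 1e-12
    estimate = sum(c * op_mean[i] for i, c in coeffs.items())
    variance = mse * sum(c * c for c in coeffs.values()) / (n * b)
    return estimate, variance, T_CRIT * sqrt(variance)


l13 = contrast_interval({1: -1.0, 3: 1.0})           # operator #3 minus operator #1
l2 = contrast_interval({2: 1.0, 1: -0.5, 3: -0.5})   # operator #2 versus the other two

gage_model3 = mse + (msab - mse) / n                 # operator bias removed
gage_model2 = gage_model3 + (msa - msab) / (n * b)   # operators treated as random
reduction = 1.0 - gage_model3 / gage_model2

print([round(v, 2) for v in l13])
print([round(v, 2) for v in l2])
print(round(gage_model3, 1), round(gage_model2, 1), round(100 * reduction))
```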
