Correlation/Regression
2004 American Society for Quality. All Rights Reserved.
Recognize, Define, Measure, Analyze, Improve, Control
Correlation analysis is used to quantify the degree of association between variables.
Regression analysis is used to quantify the functional relationship between variables.
Six Sigma, A Quest for Process Perfection Attack Variation and Meet Goals
Version 2.1
Correlation: how to measure the linear relationship between two variables, and how to interpret the Pearson Correlation Coefficient, r.
Regression, Y = f(X): how to regress a dependent variable, Y, on an independent variable, X (simple linear regression), and how to interpret the Coefficient of Determination, R-Sq.
A quality manager wants to predict the strength of a plastic molding by destructively testing a coupon.
DESIGN
A chemical engineer, designing a new process, wants to investigate the relationship between a key input variable and the stack-loss of ammonia.
Terms
Correlation
Used when both Y and X are continuous. Measures the strength of the linear relationship between Y and X.
Metric: Pearson Correlation Coefficient, r (varies from -1.0 to +1.0).
Regression
Simple linear regression is used when both Y and X are continuous. It quantifies the relationship between Y and X (Y = b0 + b1X).
Metric: Coefficient of Determination, R-Sq (varies from 0.0 to 1.0, or 0% to 100%).
If none of the variation in Y is explained by X, R-Sq = 0.0. If all of the variation in Y is explained by X, R-Sq = 1.0.
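As a rough illustration of the two metrics, the sketch below computes r and R-Sq from paired data in plain Python. The function names `pearson_r` and `r_squared` and the X and Y values are hypothetical, chosen only for the example. For simple linear regression with an intercept, R-Sq equals r squared, which the last line lets you compare.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination for the least squares line Y = b0 + b1*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    ss_total = sum((yi - my) ** 2 for yi in y)                     # variation in Y
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained part
    return 1 - ss_resid / ss_total

x = [1, 2, 3, 4, 5]              # hypothetical X values
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical Y values
r = pearson_r(x, y)
r2 = r_squared(x, y)
print(round(r, 4), round(r2, 4), round(r ** 2, 4))
```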
Figure: Scatterplots of Y versus X for r = +1.0, r = -1.0, and r = 0.0.
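The three patterns can be reproduced with a small sketch, assuming hypothetical data: Y rising exactly with X, Y falling exactly with X, and a symmetric curve. The last case is a reminder that r measures only linear association: Y depends perfectly on X there, yet r = 0.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for paired lists x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [98, 99, 100, 101, 102, 103]              # hypothetical X values
r_up = pearson_r(x, [xi + 1 for xi in x])     # Y rises one-for-one with X
r_down = pearson_r(x, [-xi for xi in x])      # Y falls one-for-one with X

xs = [-2, -1, 0, 1, 2]
r_curve = pearson_r(xs, [v ** 2 for v in xs])  # Y = X**2: exact, but not linear

print(r_up, r_down, r_curve)
```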
Voltage for the same power supply is measured at Station 1 and Station 2.
Approach:
1. Select C1 Station 1 and C2 Station 2.
2. Click Select.
3. Confirm that Station 1 and Station 2 appear under Variables:.
4. Select Display p-values.
5. Select OK.
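Minitab reports the p-value directly. As a sketch of the statistic commonly behind it, assuming roughly bivariate-normal data, the standard test of "population correlation = 0" uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The r and n values below are hypothetical, and the p-value itself is not computed because the Python standard library has no t distribution.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom.

    Assumes n > 2 paired observations and -1 < r < 1.
    """
    if n <= 2:
        raise ValueError("need more than two observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1.0 - r ** 2)

# Hypothetical correlation and sample size, for illustration only
r, n = 0.6, 27
t = correlation_t(r, n)
print(f"t = {t:.3f} with {n - 2} degrees of freedom")
```

The two-sided p-value comes from a t table or statistical software with n - 2 degrees of freedom; for these hypothetical values t is well above the 0.05 critical value of about 2.06 for 25 degrees of freedom.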
Figure: Scatterplot of Station 1 versus Station 2.
Regression is used to fit lines and curves to data when the model is linear in its parameters (the bs). The fitted line quantifies the relationship between the predictor (input) variable, X, and the response (output) variable, Y.
A Green Belt is given the task of predicting voltage at Station 1 from the voltage at Station 2.
Approach:
Open Datafile\CORREL.mtw (the data are displayed in the Data Window).
Go to Stat > Regression > Fitted Line Plot.
1. Select C1 Station 1 and C2 Station 2.
2. Click Select.
3. Confirm that Station 1 appears as Response (Y): and Station 2 as Predictor (X):.
Figure: Fitted line plot of Station 1 versus Station 2.
How is the dependent variable, Station 1, related to the independent variable, Station 2? In other words, what is the regression of Station 1 on Station 2? From the Session Window, the regression equation is
Station 1 = 1.020 + 0.8729 Station 2
where the intercept, b0, is 1.020 and the slope, b1, is 0.8729.
The intercept, b0, is where the fitted (regression) line crosses the Y-axis, that is, the value of Y when X = 0. The slope, b1, is rise over run, ΔY/ΔX: the change in Y for a one-unit increase in X.
The coefficients b0 and b1 are estimates of the population parameters β0 and β1. They are linear coefficients: the model is linear in these parameters.
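To see the fitted equation at work, the sketch below plugs a Station 2 voltage into the Session Window equation. The function name `predict_station1` and the input voltages of 9.0 and 10.0 are illustrative assumptions; the coefficients are the ones reported for this regression.

```python
# Coefficients from the Session Window: Station 1 = 1.020 + 0.8729 Station 2
B0 = 1.020    # intercept, b0
B1 = 0.8729   # slope, b1

def predict_station1(station2):
    """Fitted (predicted) Station 1 voltage for a given Station 2 voltage."""
    return B0 + B1 * station2

y_hat = predict_station1(9.0)                          # 1.020 + 0.8729 * 9.0
step = predict_station1(10.0) - predict_station1(9.0)  # change in Y per one-unit change in X
print(round(y_hat, 4), round(step, 4))
```

The difference between the two predictions recovers the slope. The intercept is the fitted value at Station 2 = 0 V, well outside the voltages shown in the plot, so it is best read as a fitting constant rather than a physical prediction.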
What is the best fitted line between Time to Invoice and Items Ordered?
The best fitted line goes through the means of Y and X (shown by the cross in the scatterplot).
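Why the line must pass through the means: solving the least squares normal equations gives the intercept b0 = Ȳ - b1X̄, so evaluating the fitted line at X = X̄ returns Ȳ.

```latex
\hat{Y} = b_0 + b_1 X, \qquad b_0 = \bar{Y} - b_1 \bar{X}
\quad\Longrightarrow\quad
\hat{Y}\big|_{X = \bar{X}} = (\bar{Y} - b_1 \bar{X}) + b_1 \bar{X} = \bar{Y}
```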
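The least squares method chooses b0 and b1 to minimize the sum of the squared residuals. Setting the derivatives of that sum with respect to b0 and b1 to zero gives two equations, called the normal equations; solving them yields the intercept and the slope.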
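A minimal sketch of the normal equations in plain Python, assuming hypothetical data. For the line Y = b0 + b1X they are ΣY = n·b0 + b1·ΣX and ΣXY = b0·ΣX + b1·ΣX²; the hypothetical function `fit_least_squares` solves this 2 × 2 system by Cramer's rule, and the last lines check that any other slope gives a larger sum of squared residuals.

```python
def fit_least_squares(x, y):
    """Return (b0, b1) by solving the two normal equations:
         sum(y)     = n * b0      + b1 * sum(x)
         sum(x * y) = b0 * sum(x) + b1 * sum(x ** 2)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx            # zero only when every x is the same
    if det == 0:
        raise ValueError("X must take at least two distinct values")
    b0 = (sy * sxx - sx * sxy) / det   # Cramer's rule
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

def sse(x, y, b0, b1):
    """Sum of squared residuals for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]                    # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_least_squares(x, y)
best = sse(x, y, b0, b1)
# Nudging the slope in either direction can only increase the sum of squares
worse = min(sse(x, y, b0, b1 + d) for d in (-0.05, 0.05))
print(round(b0, 4), round(b1, 4), best < worse)
```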
A residual is the observed value of Y minus the fitted value. The sign of a residual shows where the point lies:
Positive: point above the fitted line
Zero: point on the fitted line
Negative: point below the fitted line
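A small sketch of the sign convention, assuming a hypothetical fitted line Ŷ = 10 + 0.5X and three hypothetical points; the helper names `residual` and `position` are illustrative.

```python
def residual(x, y, b0, b1):
    """Residual = observed Y minus the fitted value b0 + b1 * X."""
    return y - (b0 + b1 * x)

def position(e, tol=1e-9):
    """Classify a point relative to the fitted line by the sign of its residual."""
    if e > tol:
        return "above"      # positive residual
    if e < -tol:
        return "below"      # negative residual
    return "on"             # zero residual (within rounding tolerance)

b0, b1 = 10.0, 0.5                              # hypothetical fitted line
points = [(40, 32.0), (60, 40.0), (80, 46.0)]   # hypothetical (X, Y) points
res = [residual(x, y, b0, b1) for x, y in points]
labels = [position(e) for e in res]
print(list(zip(res, labels)))
```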
Statistical Significance
An analysis of variance (ANOVA) table informs us about the statistical significance of the regression analysis
The null hypothesis, H0: the regression results from common cause variation. When H0 is true, there is no statistically significant regression (the slope β1 = 0), and the best prediction of Y is the mean of Y.
As before, the p-value is used to evaluate the null hypothesis: if p is less than 0.05, reject the null hypothesis and conclude that the regression is statistically significant.
Approach:
Analysis of Variance
Source            DF       F        P
Regression         1   722.31   0.000
Residual Error    12
  Lack of Fit      3     1.98   0.188
  Pure Error       9
Total             13
No significant lack of fit: p = 0.188 >= 0.05
The sum of squares (SS) for Regression is the sum of the squared differences between each predicted value of Y and the mean of Y. The SS for Residual Error is the sum of the squared differences between each observed value and its predicted value, that is, the squared residuals. The SS for Residual Error can be further decomposed into SS Lack of Fit and SS Pure Error: SS Pure Error is the within-subgroup variation (among replicate Y values at the same X), and SS Lack of Fit is SS Residual Error minus SS Pure Error.
Every sum of squares (SS) has an associated number called its degrees of freedom (DOF). DOF is the number of independent pieces of information, involving the n independent responses Y1, Y2, ..., Yn, that are needed to compute the sum of squares. The SS about the mean needs (n - 1) pieces of information. The SS due to regression needs one piece of information, b1.
The SS of residuals needs (n - 2) pieces of information. In general, DOF for the SS of residuals equals (number of observations - number of parameters estimated).
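$$\text{DOF}_{\text{Total}} = n - 1, \qquad \text{DOF}_{\text{Pure Error}} = \sum_{j=1}^{p} (m_j - 1)$$
where p is the number of subgroups (distinct X values) and m_j is the number of observations in subgroup j.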
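DOF for Pure Error
X: 1.00 1.00 2.00 2.00 2.00 2.00 3.00 3.00 3.00 4.00 4.00 5.00 5.00 5.00
Subgroup #1 (X = 1.00), m = 2; Subgroup #2 (X = 2.00), m = 4; Subgroup #3 (X = 3.00), m = 3; Subgroup #4 (X = 4.00), m = 2; Subgroup #5 (X = 5.00), m = 3
DOF for Pure Error = (2 - 1) + (4 - 1) + (3 - 1) + (2 - 1) + (3 - 1) = 1 + 3 + 2 + 1 + 2 = 9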
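The same count can be automated. The sketch below, which assumes the X values listed above, groups replicates by X level with `collections.Counter` and splits the residual DOF into pure error and lack of fit; the function name `pure_error_dof` is hypothetical.

```python
from collections import Counter

def pure_error_dof(x_values):
    """DOF for Pure Error: sum of (m_j - 1) over subgroups of replicated X values."""
    counts = Counter(x_values)       # m_j = number of observations at each X level
    return sum(m - 1 for m in counts.values())

x = [1.00, 1.00, 2.00, 2.00, 2.00, 2.00, 3.00, 3.00, 3.00,
     4.00, 4.00, 5.00, 5.00, 5.00]
n = len(x)                            # number of observations
p = len(set(x))                       # number of subgroups (distinct X levels)
dof_total = n - 1
dof_residual = n - 2                  # two parameters estimated: b0 and b1
dof_pure = pure_error_dof(x)
dof_lack_of_fit = dof_residual - dof_pure   # equals p - 2
print(dof_total, dof_residual, dof_pure, dof_lack_of_fit)
```

These counts match the DF column of the Analysis of Variance table (13, 12, 9, and 3).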
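F is the ratio of mean squares, where each mean square (MS) is its sum of squares divided by its DOF:
$$F = \frac{MS_{\text{Regression}}}{MS_{\text{Residual Error}}} = \frac{SS_{\text{Regression}}/1}{SS_{\text{Residual Error}}/(n-2)}$$
$$SS_{\text{Regression}} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad SS_{\text{Residual Error}} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
$$SS_{\text{Pure Error}} = \sum_{j=1}^{p} \sum_{k=1}^{m_j} (Y_{jk} - \bar{Y}_j)^2, \qquad SS_{\text{Total}} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
where Ŷ_i is the fitted value, Ȳ is the overall mean, Y_jk is observation k in subgroup j, and Ȳ_j is the mean of subgroup j.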
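Putting the formulas together, the sketch below computes the sums of squares and both F ratios (regression, and lack of fit versus pure error) for replicated data. The Y values are hypothetical; the X values reuse the replicated layout from the pure error example, and the function name `regression_anova` is illustrative. The p-values Minitab reports would come from the F distribution with the corresponding DOF, which this standard-library sketch does not compute.

```python
from collections import defaultdict

def regression_anova(x, y):
    """Sums of squares, DOF, and F ratios for simple linear regression with a
    lack-of-fit test. Assumes replicated X values and at least three X levels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    fits = [b0 + b1 * a for a in x]

    ss_total = sum((b - my) ** 2 for b in y)
    ss_reg = sum((f - my) ** 2 for f in fits)
    ss_resid = sum((b - f) ** 2 for b, f in zip(y, fits))

    groups = defaultdict(list)                 # Y values grouped by X level
    for a, b in zip(x, y):
        groups[a].append(b)
    ss_pure = sum(sum((b - sum(g) / len(g)) ** 2 for b in g)
                  for g in groups.values())
    ss_lof = ss_resid - ss_pure

    p = len(groups)
    df_reg, df_resid, df_pure, df_lof = 1, n - 2, n - p, p - 2
    return {
        "ss_total": ss_total, "ss_reg": ss_reg, "ss_resid": ss_resid,
        "ss_pure": ss_pure, "ss_lof": ss_lof,
        "df": (df_reg, df_resid, df_lof, df_pure),
        "f_reg": (ss_reg / df_reg) / (ss_resid / df_resid),
        "f_lof": (ss_lof / df_lof) / (ss_pure / df_pure),
    }

x = [1.00, 1.00, 2.00, 2.00, 2.00, 2.00, 3.00, 3.00, 3.00,
     4.00, 4.00, 5.00, 5.00, 5.00]
y = [2.0, 2.4, 4.1, 3.8, 4.3, 3.9, 6.2, 5.8, 6.1,   # hypothetical responses
     8.1, 7.7, 9.9, 10.2, 10.0]
table = regression_anova(x, y)
print(table["df"], round(table["f_reg"], 2), round(table["f_lof"], 2))
```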
Analysis of Residuals
Residuals are used to test the adequacy of the prediction equation (model). In residual plots, three types of patterns indicate model inadequacy. These patterns will be dramatic, not subtle!
1. Fans
3. Curved bands
Approach:
Open Datafile\Residuals
Note: Fitted Line Plot does not provide Lack of Fit tests.
1. In the Fitted Line Plot dialog box, select Graphs.
2. Select Four in One Plot.
3. Select OK.
4. Select OK.
R-Sq is 89.7%, and the regression is significant. Can we do better? How do the residuals look?
Figure: Residual plots, including a histogram of the residuals and residuals versus observation order.
Figure: Normal probability plots of the residuals (Percent versus Residual, and Percent versus RESI1).
Residuals should be approximately normally distributed. Are they? First, store the residuals, then make a normal probability plot of the stored residuals (RESI1).
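A normal probability plot pairs each sorted residual with the standard normal quantile expected at its rank. The sketch below computes those pairs with `statistics.NormalDist` from the Python standard library (3.8+), assuming the plotting position (i - 0.5)/n, one common convention; Minitab may use a different formula. The residuals are hypothetical stand-ins for a stored column such as RESI1. If the points fall close to a straight line, the residuals are consistent with normality.

```python
from statistics import NormalDist

def normal_scores(residuals):
    """Return (residual, normal score) pairs for a normal probability plot.

    Plotting position (i - 0.5) / n is assumed; software packages differ slightly.
    """
    n = len(residuals)
    z = NormalDist()                   # standard normal: mean 0, sd 1
    ordered = sorted(residuals)
    return [(e, z.inv_cdf((i - 0.5) / n)) for i, e in enumerate(ordered, start=1)]

resid = [-1.2, 0.4, 2.1, -0.3, 0.9, -1.8, 0.1, -0.2]   # hypothetical residuals
pairs = normal_scores(resid)
for e, score in pairs:
    print(f"{e:6.2f}  {score:6.3f}")
```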
The plot of Residuals vs. Fits shows a curved band. Try Stat > Regression > Fitted Line Plot and select Quadratic.
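To compare the straight line with the quadratic outside Minitab, the sketch below fits both models by least squares, solving the normal equations with a small Gaussian elimination routine. The data are hypothetical and deliberately curved, and all function names are illustrative. The linear fit leaves the telltale curved band (positive residuals at both ends, negative in the middle), while the quadratic fit removes it.

```python
def solve(a, b):
    """Solve the square system a * c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef

def poly_fit(x, y, degree):
    """Least squares polynomial coefficients [b0, b1, ...] from the normal equations."""
    k = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve(a, b)

def fitted(coef, xi):
    return sum(c * xi ** i for i, c in enumerate(coef))

def r_sq(x, y, coef):
    my = sum(y) / len(y)
    ss_resid = sum((yi - fitted(coef, xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_resid / ss_total

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1 + 0.5 * xi + 0.3 * xi ** 2 for xi in x]   # hypothetical, deliberately curved
lin = poly_fit(x, y, 1)
quad = poly_fit(x, y, 2)
res_lin = [yi - fitted(lin, xi) for xi, yi in zip(x, y)]
print("linear R-Sq:", round(r_sq(x, y, lin), 4), "quadratic R-Sq:", round(r_sq(x, y, quad), 4))
print("linear residuals:", [round(e, 2) for e in res_lin])
```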
Figure: Quadratic fit results, including residual plots against Fitted Value and Observation Order.
Your Turn
Open Datafile\CORREG and analyze the data sets:
1. Do the variables correlate?
2. What is the prediction equation?
3. Is the regression statistically significant?
4. Does the analysis of residuals indicate anything unusual?
For the lack-of-fit tests, select Pure Error when your data are replicated; select Data Subsetting when your data are not replicated.