
Chapters 7 and 8

Saturday, September 13, 2014

10:42 AM

Correlation (r) = measure of the strength and direction of a linear relationship between two variables

Values of r range from -1 to 1
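A minimal Python sketch of computing r, assuming the usual Pearson formula (the sum of products of z-scores divided by n - 1). The function name `correlation` and the data are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r, computed as the average product of z-scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # r = sum(z_x * z_y) / (n - 1)
    return sum(
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Made-up data: points on an increasing line, so r should be (essentially) 1
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

Whatever the data, the result stays between -1 and 1; the sign gives the direction and the size gives the strength.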
Regression line
Explanatory variable (x) - horizontal axis
Response variable (y) - vertical axis

Homework

Regression line - explains how the value of the response variable changes as the value of the explanatory variable changes
Use the line to predict the value of the response variable for a given value of the explanatory variable
Need a mathematical formula for the line
Algebra y=mx+b
m=slope=rise/run=change in y / change in x
b=y-intercept=point where line crosses the y axis
Statistics
y^ = b0 + b1x - write the intercept first, then the slope
Y hat - predicted values
Values on the regression line
Summarizes the relationship between x and y
Observed values are called y
Points in the scatterplot
Residual values are called e
Residuals are the difference between the observed and predicted values (observed minus predicted)
Error - the amount of variation in y that the model cannot account for
e = y - y^
The most common way to place the regression line is least squares: it puts the line where the sum of the squared errors (residuals) is as small as possible
y^ = b0 + b1x

b1 = r(sy/sx)
b0 = y_ - b1x_
The regression line always goes through the point (x_, y_)
r is connected to the slope: b1 always has the same sign as r
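The slope and intercept formulas can be checked with a short Python sketch. It assumes sample standard deviations (n - 1 divisor) and uses made-up data; `least_squares_line` is a hypothetical helper name, not a JMP function.

```python
import math


def least_squares_line(xs, ys):
    """Return (b0, b1) using b1 = r * (sy / sx) and b0 = y_ - b1 * x_."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations of x and y
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Correlation r as the average product of z-scores
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * (s_y / s_x)      # slope: same sign as r because sy and sx are positive
    b0 = y_bar - b1 * x_bar   # intercept: forces the line through (x_, y_)
    return b0, b1


xs = [1, 2, 3, 4, 5]   # made-up example data
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(round(b0, 3), round(b1, 3))   # 2.2 0.6
print(round(b0 + b1 * 3, 3))        # 4.0, the mean of ys, at x_ = 3
```

Plugging x_ into the line gives b0 + b1x_ = (y_ - b1x_) + b1x_ = y_, which is why the line always passes through (x_, y_).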
Predicted value
- Predicted gas consumption when degree days is 43:
y^ = 1.089 + 0.189(43) = 9.216
Predict gas consumption when degree days is 24 - put in 24 instead of 43
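A tiny sketch of plugging values into the gas consumption line from these notes (b0 = 1.089, b1 = 0.189); `predict` is a hypothetical helper name.

```python
def predict(b0, b1, x):
    """Predicted value on the regression line: y^ = b0 + b1 * x."""
    return b0 + b1 * x


# Coefficients from the gas consumption example
b0, b1 = 1.089, 0.189
print(round(predict(b0, b1, 43), 3))  # 9.216
print(round(predict(b0, b1, 24), 3))  # 5.625
```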
Observed value is just y (no hat)
e = y - y^
How far the observed value is from the predicted value (observed minus predicted)

Residual positive - point is above the line - the model under predicted

Residual negative - point is below the line - the model over predicted
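A sketch of computing a residual and reading its sign. The observed value 10 is hypothetical, chosen only to compare with the predicted value 9.216 from the gas example; the helper names are also hypothetical.

```python
def residual(observed, predicted):
    """Residual e = y - y^ (observed minus predicted)."""
    return observed - predicted


def describe_residual(e):
    """Translate the sign of a residual into where the point sits."""
    if e > 0:
        return "above the line: under predicted"
    if e < 0:
        return "below the line: over predicted"
    return "on the line"


# Hypothetical observed gas consumption of 10 at 43 degree days
e = residual(10, 9.216)
print(round(e, 3), describe_residual(e))  # 0.784 above the line: under predicted
```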
Three sources of variability
- Observed y
Mean - y_
Standard deviation sy
Stat 101 Page 1

(JMP printout)

- Predicted y^
Mean - y_
Standard deviation - sy^
- Residuals e
Mean - 0
Standard deviation se
Square of a standard deviation = variance
(sy)^2 = (sy^)^2 + (se)^2
Variability in the observed values can be separated into the part explained by the least squares regression model, (sy^)^2, and the part left in the residuals, (se)^2
R^2 = (sy^)^2 / (sy)^2
Ratio written as a percentage
0 to 100%
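A sketch that checks the variance split (sy)^2 = (sy^)^2 + (se)^2 numerically for a least squares line and then forms R^2 as the ratio. The data are made up, and variances use the n - 1 divisor from Python's `statistics.variance` (the divisor cancels in the ratio).

```python
import statistics


def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


xs = [1, 2, 3, 4, 5]   # made-up example data
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
y_hat = [b0 + b1 * x for x in xs]          # predicted values
e = [y - yh for y, yh in zip(ys, y_hat)]   # residuals

var_y = statistics.variance(ys)            # (sy)^2
var_y_hat = statistics.variance(y_hat)     # (sy^)^2
var_e = statistics.variance(e)             # (se)^2

print(round(var_y, 3), round(var_y_hat + var_e, 3))  # 1.5 1.5
r_squared = var_y_hat / var_y
print(f"R^2 = {r_squared:.1%}")                      # R^2 = 60.0%
```

For these data r is about 0.775, and squaring it gives the same 0.6.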
R^2 closer to 0 - points are farther from the line, and less of the variability in y is explained
Points closer to the line - R^2 closer to 100%
R^2 can also be written as a proportion from 0 to 1
R^2 is the percentage of variation in the observed values of the response variable that can be explained by the linear regression model with x
If you have r, you can square it to get R^2
r^2 = R^2
r = ±sqrt(R^2)
Have to look at the graph (or the sign of the slope) to know whether r is + or -; JMP gives you R^2
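Going from R^2 back to r can be sketched in Python, taking the sign from the slope (since b1 = r(sy/sx) and standard deviations are positive). The function name is hypothetical; the numbers are the house price example from these notes (R^2 = 59.5%, slope 94.445).

```python
import math


def r_from_r_squared(r_squared, slope):
    """Recover r = ±sqrt(R^2), taking the sign from the regression slope."""
    if not 0 <= r_squared <= 1:
        raise ValueError("R^2 must be a proportion between 0 and 1")
    # math.copysign gives sqrt(R^2) with the same sign as the slope
    return math.copysign(math.sqrt(r_squared), slope)


# House price example: R^2 = 59.5%, slope = 94.445 (positive)
print(round(r_from_r_squared(0.595, 94.445), 3))  # 0.771
```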

Residual plot - has e on the vertical axis instead of y


Always has a horizontal line at 0
Each observed value has its own residual on a residual plot
Use a residual plot to answer: is a linear model appropriate for the relationship between the two variables?
Answer is yes if it:
Contains nothing of interest
Answer is yes, but caution is needed, if it:
Contains a megaphone-shaped pattern
Contains outliers
Answer is no if you see a curve on the residual plot
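A sketch of what a curved residual plot looks like in numbers: fitting a line to made-up data that follow y = x^2 leaves residuals that are positive at both ends and negative in the middle, the curve pattern that means a line is not appropriate. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(xs, ys):
    """Residuals e = y - y^ for the least squares line."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Made-up curved data: y = x^2
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
print(res)  # [2.0, -1.0, -2.0, -1.0, 2.0]: +, -, -, -, + traces a curve
```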
Observations outside the overall pattern (outliers)
Large residuals
High leverage
Influential
Non-influential
Does not usually affect placement of regression line
Does affect additional analyses

Outliers in the x-direction
Influential point
- Observation that affects the placement of the regression line
Non-influential
Outliers in the x-direction can greatly affect the results
What to do with outliers
- Make sure data points are recorded correctly
- Collect more data
- Conduct analysis with and without the outlier
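A sketch of the "with and without" comparison, using made-up data where the last point is far out in the x direction. `with_and_without` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def with_and_without(xs, ys, index):
    """Fit the line with every point, then again without the point at `index`."""
    with_point = fit_line(xs, ys)
    without_point = fit_line(xs[:index] + xs[index + 1:], ys[:index] + ys[index + 1:])
    return with_point, without_point


# Made-up data; the last point (10, 0) is far out in the x direction
xs = [1, 2, 3, 4, 5, 10]
ys = [2, 4, 5, 4, 5, 0]
(b0_with, b1_with), (b0_without, b1_without) = with_and_without(xs, ys, 5)
print(f"slope with the point:    {b1_with:.3f}")     # -0.341
print(f"slope without the point: {b1_without:.3f}")  # 0.600
```

Here removing one point flips the sign of the slope, so that point is influential.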


Outliers - bigger impact on smaller data sets
One outlier won't affect the results much in a large data set
Linear relationship
- Look at the scatterplot and residual plot
If not linear, do not summarize with a line
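Extrapolation - using the line to predict for x values outside the range of the data; the relationship between the explanatory and response variables is only valid over the range of the explanatory variable for which you have data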
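A sketch of guarding against extrapolation by refusing to predict outside the observed x range. The degree-day values are hypothetical (the notes do not give the data range for the gas example); the coefficients are the gas consumption ones used earlier in these notes.

```python
def predict_within_range(b0, b1, x, xs):
    """Return y^ = b0 + b1 * x, but only if x is inside the observed range of xs."""
    low, high = min(xs), max(xs)
    if not low <= x <= high:
        raise ValueError(f"x = {x} is outside the data range [{low}, {high}]: extrapolation")
    return b0 + b1 * x


# Hypothetical observed degree-day values (not from the notes)
degree_days = [10, 20, 30, 40, 50]

print(round(predict_within_range(1.089, 0.189, 43, degree_days), 3))  # 9.216
try:
    predict_within_range(1.089, 0.189, 80, degree_days)
except ValueError as err:
    print(err)  # x = 80 is outside the data range [10, 50]: extrapolation
```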
Lurking variables - variables not included in the analysis that have an effect on the variables being studied
Averaging - using averages makes the relationship between two variables appear stronger
Averaging removes variation
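A sketch of why averaged data look more strongly related. The made-up data put two y values at each x, one 3 above and one 3 below the line y = x; averaging each pair removes that scatter.

```python
import math


def correlation(xs, ys):
    """Pearson r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up individual data: at each x, one y is 3 above and one is 3 below y = x
xs = [1, 1, 2, 2, 3, 3]
ys = [4, -2, 5, -1, 6, 0]

# Average the y values for each x
groups = {}
for x, y in zip(xs, ys):
    groups.setdefault(x, []).append(y)
avg_x = sorted(groups)
avg_y = [sum(groups[x]) / len(groups[x]) for x in avg_x]

print(round(correlation(xs, ys), 3))        # 0.263 (individual points)
print(round(correlation(avg_x, avg_y), 3))  # 1.0 (averages)
```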

Clusters - conduct analysis separately on each group


Lurking variables - can cause clusters
Look at the data for explanations

Association is not causation


1. Find out which variable is explanatory and which is response
Explanatory - x, Response - y
2. Graph the data on a scatterplot
3. Describe the scatterplot
4. Find the correlation (r)
5. Find R^2
6. Regression - find the least squares regression equation, make predictions, find residuals
7. Make a residual plot
When poor sleep quality rating increases by 1, predicted happiness decreases by 0.658
If the slope is negative, use the word "decreases" in the sentence and give the slope without the minus sign
When poor sleep quality rating is 0, predicted happiness is 30.227.
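A sketch of turning a slope and intercept into the interpretation sentences, using the sleep and happiness coefficients from these notes (b1 = -0.658, b0 = 30.227). The function names are hypothetical.

```python
def slope_sentence(x_name, y_name, b1):
    """Interpret the slope: change in predicted y when x increases by 1."""
    if b1 > 0:
        change = f"increases by {b1}"
    elif b1 < 0:
        # Negative slope: say "decreases" and drop the minus sign
        change = f"decreases by {abs(b1)}"
    else:
        change = "does not change"
    return f"When {x_name} increases by 1, predicted {y_name} {change}."


def intercept_sentence(x_name, y_name, b0):
    """Interpret the intercept: predicted y when x is 0."""
    return f"When {x_name} is 0, predicted {y_name} is {b0}."


x_name, y_name = "poor sleep quality rating", "happiness"
print(slope_sentence(x_name, y_name, -0.658))
print(intercept_sentence(x_name, y_name, 30.227))
```

The intercept sentence only makes sense when x = 0 is meaningful for the data set.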
Residual = observed minus predicted
Y hat is the line - predicted values
Y are the dots - observed values
Residual plot - plot of the errors
Predictions are not appropriate with outliers or extrapolation
Out of data range refers only to the x direction
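59.5% of the variability in house prices in Saratoga can be explained using house size (R^2 = 59.5%).
Least squares regression equation - write it in terms of predicted price
Predicted price = -3.117 + 94.445(size)
(-3.117 = y-intercept, 94.445 = slope)
Sometimes the y-intercept is not meaningful for the data set (for example, when x = 0 is outside the range of the data)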
