Homework Assignment 4
MEMO
1. Draw a graph to represent the relationship. Ensure that bookings represent the
dependent variable (Y-axis) and Income the independent variable (X-axis).
[Figure: scatter plot of Bookings (Y-axis, 800 to 1400) against Income (X-axis, 20000 to 50000) with fitted line y = 0.0194x + 371.68]
ANOVA
             df   SS            MS            F            Significance F
Regression    1   208036.3214   208036.3214   34.0477481   0.000164917
Residual     10   61101.3453    6110.13453
Total        11   269137.6667

               Coefficients   Standard Error   t Stat        P-value       Lower 95%
Intercept      371.6758379    128.5571439      2.891133286   0.01607593    85.23267088
X Variable 1   0.019381393    0.00332155       5.835044825   0.000164917   0.011980518
3. Write down the equation of the regression model.
Bookings = 371.68 + 0.0194 * Income (y = 0.0194x + 371.68)
4. Describe the relationship between bookings and income in words. Also comment
on the significance of the relationship.
For every $1 increase in a town's income, the average number of airline bookings increases by
0.0194. The slope's P-value is 0.000165 (equal to Significance F and well below 0.05), so the
relationship is statistically significant.
5. Make a point and approximate 95% confidence interval estimate for another city
with given income of $39020.
Y = 371.68 + 0.01938(39020)
= 1127.937789
Approximate 95% confidence interval for the predicted value:
1127.94 +/- 2*(Standard error of the estimate)
1127.94 +/- 2*(78.16734951)
Therefore, the estimate lies between 971.60 and 1284.27 when income is $39,020.
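The arithmetic above can be checked with a short script. This is a minimal sketch that reuses the coefficients and the standard error of the estimate reported in this memo; the function names are illustrative only and do not come from the assignment.

```python
from typing import Tuple

# Values copied from the regression output in this memo.
INTERCEPT = 371.6758379
SLOPE = 0.019381393
STD_ERROR_OF_ESTIMATE = 78.16734951


def predict_bookings(income: float) -> float:
    """Point estimate of bookings for a given income."""
    return INTERCEPT + SLOPE * income


def approximate_interval(income: float, multiplier: float = 2.0) -> Tuple[float, float]:
    """Approximate 95% interval: point estimate +/- multiplier * standard error."""
    point = predict_bookings(income)
    margin = multiplier * STD_ERROR_OF_ESTIMATE
    return point - margin, point + margin


point = predict_bookings(39020)
lower, upper = approximate_interval(39020)
print(f"point estimate: {point:.2f}")
print(f"approximate 95% interval: {lower:.2f} to {upper:.2f}")
```

The multiplier of 2 is the rule-of-thumb approximation used in the memo. A stricter interval would use the t critical value for 10 degrees of freedom and account for how far $39,020 is from the mean income, which needs the raw data.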
6. Use the ANOVA part of the regression results to calculate the coefficient of
determination.
R2 = SS Regression / SS Total = 208036.3214 / 269137.6667 = 0.772973638. This means 77.3% of the
variation in the number of bookings (y) is explained by the variation in income (x). The remaining
22.7% is unexplained, i.e. due to error.
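As a quick check, the coefficient of determination can be recomputed from the sums of squares in the ANOVA table. The sketch below uses only the figures reported in this memo; the variable names are illustrative.

```python
# Sums of squares copied from the ANOVA table in this memo.
SS_REGRESSION = 208036.3214
SS_RESIDUAL = 61101.3453
SS_TOTAL = 269137.6667

# R squared is the share of total variation explained by the regression.
r_squared = SS_REGRESSION / SS_TOTAL
# The unexplained share is the residual sum of squares over the total.
unexplained = SS_RESIDUAL / SS_TOTAL

print(f"R squared:   {r_squared:.6f}")
print(f"unexplained: {unexplained:.6f}")
```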
7. Use the ANOVA part to calculate the standard error of the estimate.
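One way to obtain the standard error of the estimate from the ANOVA figures is to take the square root of the residual mean square (residual SS divided by residual df). The sketch below assumes that definition and uses the values reported in this memo; the result should agree with the 78.167 used in the interval for question 5.

```python
import math

# Residual row of the ANOVA table in this memo.
SS_RESIDUAL = 61101.3453
DF_RESIDUAL = 10  # n - 2 for a simple regression on 12 observations

# Mean square error, then its square root.
mean_square_error = SS_RESIDUAL / DF_RESIDUAL
std_error_of_estimate = math.sqrt(mean_square_error)

print(f"MSE: {mean_square_error:.5f}")
print(f"standard error of the estimate: {std_error_of_estimate:.5f}")
```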
ANOVA
             df   SS
Regression    1   208036.3214
Residual     10   61101.3453
Total        11   269137.6667

               Coefficients   Standard Error   t Stat
Intercept      371.6758379    128.5571439      2.891133286
X Variable 1   0.019381393    0.00332155       5.835044825

T-value (Intercept) = 371.6758379 / 128.5571439 = 2.891133286
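The t statistics in the table are each coefficient divided by its standard error. A minimal sketch of that division, using the values from this memo (the helper name `t_statistic` is illustrative):

```python
def t_statistic(coefficient: float, standard_error: float) -> float:
    """t statistic for testing whether a coefficient differs from zero."""
    return coefficient / standard_error


# Coefficients and standard errors copied from the regression output.
t_intercept = t_statistic(371.6758379, 128.5571439)
t_slope = t_statistic(0.019381393, 0.00332155)

print(f"t (Intercept):    {t_intercept:.4f}")
print(f"t (X Variable 1): {t_slope:.4f}")
```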
[Figure: Residuals Plot. Residuals (Y-axis, -150 to 150) against Income (X-axis, 0 to 60,000)]
The spread of the plotted residuals appears to increase as income increases, therefore
heteroscedasticity is present.
10. What is autocorrelation? Draw an appropriate plot and comment on the presence
(or not) of autocorrelation.
[Figure: residuals (Y-axis, -150 to 100) plotted against observation number (X-axis, 1 to 12)]
Because there is no clear zig-zag pattern, it is difficult to assess whether
autocorrelation is present.
However, the residuals trend upward over time, so an external variable may be affecting
the dataset; the plot should be stationary.
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .879 (a)   .773       .750                78.16735                     1.119
a. Predictors: (Constant), VAR00002
b. Dependent Variable: VAR00001
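The Durbin-Watson value in the model summary can be reproduced by hand. The sketch below applies the usual formula (sum of squared successive differences divided by the sum of squared residuals) to the standardized residuals listed in the outlier table of this memo. It assumes those residuals are in observation order and were all scaled by the same constant, in which case the statistic matches the one for the raw residuals.

```python
# Standardized residuals copied from the outlier table in this memo,
# assumed to be in observation order (1 to 12).
RESIDUALS = [
    -1.514441286, -1.519469601, -0.436767926, -0.558442637,
    0.07870575, 1.328092448, -0.66168675, 0.566937659,
    0.497007431, 0.719091741, -0.108353902, 1.609327072,
]


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:])
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


dw = durbin_watson(RESIDUALS)
print(f"Durbin-Watson: {dw:.3f}")
```

Values near 2 suggest no first-order autocorrelation, while values well below 2 point toward positive autocorrelation. A formal conclusion would compare the statistic with the Durbin-Watson critical bounds for this sample size.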
There are no outliers, as none of the standardized residuals fall outside the range
of -2 to 2.
Standard Residuals   Outlier
-1.514441286 no
-1.519469601 no
-0.436767926 no
-0.558442637 no
0.07870575 no
1.328092448 no
-0.66168675 no
0.566937659 no
0.497007431 no
0.719091741 no
-0.108353902 no
1.609327072 no
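The outlier check in the table can be expressed as a simple rule: flag any standardized residual whose absolute value exceeds 2. The sketch below applies that rule to the values listed above; the cut-off of 2 is the one used in this memo, and the variable names are illustrative.

```python
# Standardized residuals copied from the table in this memo.
STANDARD_RESIDUALS = [
    -1.514441286, -1.519469601, -0.436767926, -0.558442637,
    0.07870575, 1.328092448, -0.66168675, 0.566937659,
    0.497007431, 0.719091741, -0.108353902, 1.609327072,
]
THRESHOLD = 2.0  # cut-off used in this memo

# Keep only residuals outside the range -2 to 2.
outliers = [r for r in STANDARD_RESIDUALS if abs(r) > THRESHOLD]
# The residual furthest from zero, for reference.
largest = max(STANDARD_RESIDUALS, key=abs)

print(f"outliers: {outliers}")
print(f"largest absolute residual: {largest}")
```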
Tests of Normality
                               Kolmogorov-Smirnov (a)      Shapiro-Wilk
                               Statistic   df   Sig.       Statistic   df   Sig.
Standardized Predicted Value   .253        12   .033       .898        12   .151
a. Lilliefors Significance Correction
Kolmogorov-Smirnov test (Sig. = .033, below .05): residuals are not normal.
Shapiro-Wilk test (Sig. = .151, above .05): residuals are normal.
The two tests disagree, so the result is inconclusive.
15. Comment on whether your regression model is acceptable (conditions listed
on slide 22)