
Department of Statistics

The Wharton School


University of Pennsylvania

Statistics 621 Fall 2008


Final Exam
Read these instructions carefully.

This is a closed-book exam. You are allowed to use a calculator and one page (8.5 × 11 inches
or A4, both sides) of handwritten notes. No use of cellular telephones or other portable
electronics is permitted.

You have two hours for the exam. The computer output associated with one or more items
should be considered an essential part of the questions. The multiple-choice questions are
equally weighted; your grade will be based on the number of correct answers you provide.

Throughout this exam, significant means statistically significant.

Please note the following when filling in the answer form:

Mark the answer form using a #2 pencil. Erase changes completely.

Fill in your name and student id number on the answer form.

Mark the bubbles under your name and student id number.

Choose the one best answer by marking the item on the answer form.

STOP
DO NOT TURN THE PAGE UNTIL YOU ARE
INSTRUCTED TO PROCEED.
Statistics 621, Final Exam -2- Q1, 2008

(1) Mark b as the answer for question #1 on your answer form. This identifies your exam.

(2) Assuming the MRM holds, if the 95% confidence interval for the intercept β0 in a multiple
regression model is the interval [-18, 54], then the
a. Intercept β0 is zero.
b. Intercept is statistically significantly different from zero.
c. Standard error of the estimated intercept is about 18.
d. p-value for the test of H0: β0 = 0 is less than 0.05.
e. Value of R² for the model is close to zero.
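A reported interval can be read back into an estimate and a standard error. The Python sketch below assumes the usual rule of thumb that a 95% interval is roughly the estimate ± 2 standard errors (the exact multiplier is a t quantile close to 2):

```python
# Reported 95% confidence interval for the intercept
lower, upper = -18.0, 54.0

estimate = (lower + upper) / 2      # the interval is centered on the estimate
half_width = (upper - lower) / 2    # distance from the center to either end
approx_se = half_width / 2          # half-width is about 2 standard errors

# A two-sided 5% test of H0: beta0 = 0 rejects only if 0 lies outside the interval
contains_zero = lower <= 0 <= upper

print(estimate, approx_se, contains_zero)
```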
(3) The intercept is often an extrapolation in a regression model because
a. The population intercept β0 is 0.
b. The sample size used to fit the estimated model is too small.
c. The explanatory variables in the fitted model are collinear.
d. The fitted model has a highly leveraged outlier.
e. Zero lies outside the range of the explanatory variables.
(4) A narrow cluster of observations at the center of the leverage plot of an explanatory
variable in a multiple regression suggests
a. The effect of one or more highly leveraged observations.
b. That this variable is collinear with other explanatory variables.
c. That this variable is a statistically significant predictor of the response.
d. That this variable has a normal distribution.
e. The presence of a nonlinear relationship in the regression model.
(5) In order to build a regression model that estimates a constant marginal elasticity of sales
with respect to price, we must
a. Fit a simple regression of sales on log price.
b. Fit a linear regression of sales on price.
c. Fit a simple regression of log sales on log price.
d. Compute the derivative of sales with respect to price.
e. Form a categorical variable to represent different levels of price.
(6) The explanatory variables in a multiple regression are assumed by the multiple regression
model to
a. Be uncorrelated with each other.
b. Have normal distributions.
c. Be linearly related to the response.
d. Have equal variation.
e. Each consists of an iid sample of n cases from the studied population.
(7) The overall F-ratio in a multiple regression with K explanatory variables (found in the
analysis of variance table)
a. Tests the statistical significance of an added categorical variable.
b. Grows larger with a larger sample size.
c. Increases with the addition of explanatory variables to the model.
d. Tests H0: β1 = β2 = … = βK = 0.
e. Measures the impact of collinearity on a fitted multiple regression.
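The overall F-ratio can also be written in terms of R², the number of explanatory variables K, and the sample size n, using the standard identity F = (R²/K) / ((1 - R²)/(n - K - 1)). The sketch below assumes that identity and, as a check, plugs in the R², n, and K reported in the wind-speed output later in this exam, whose analysis of variance table lists F = 21.4954:

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F-ratio for a regression with k explanatory variables and n cases.

    Equals the model mean square divided by the error mean square, rewritten
    in terms of R^2.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(round(overall_f(0.42569, 61, 2), 2))
```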

(8) To check for the presence of heteroscedasticity in a multiple regression model, it is
recommended that we
a. Inspect the histogram for each explanatory variable.
b. Plot the residuals of the model versus the fitted values.
c. Inspect the collection of scatterplots in a scatterplot matrix.
d. Verify that each explanatory variable is statistically significant.
e. Look for collinearity between the response and each explanatory variable.
(9) In order to allow the effect of the explanatory variable Advertising (which measures
promotional spending) on the response Sales (which measures retail sales at various stores) to
depend upon the geographic Location of the stores (a categorical variable), we should
a. Transform Advertising to be on a logarithmic scale.
b. Include the explanatory variable Location in our model.
c. Include Location and its interaction with Advertising in our model.
d. Form the inverse regression of Advertising on Sales.
e. Transform both Advertising and Sales to be on a common logarithmic scale.
(10) A meandering pattern in the plot of the residuals from a regression model on time
indicates
a. The possible presence of autocorrelation.
b. The effects of collinear explanatory variables.
c. The presence of several highly leveraged outliers.
d. The lack of an adequate sample size.
e. That the underlying model errors are not normally distributed.
(11) If the p-value of the explanatory variable in an estimated simple regression is larger than
0.05, then (assume the conditions of the SRM)
a. The p-value of the associated overall F-ratio is less than 0.05.
b. The R² statistic is larger than 0.05.
c. The estimated slope is statistically significantly different from zero.
d. The R² statistic is less than 0.05.
e. The absolute value of the t-ratio of the estimated slope is less than 2.

Questions 12-21. The US Department of Agriculture regularly surveys farms in the US to
determine the amounts of various crops that are currently being planted. One survey asks for
the number of acres of land planted with wheat and the ultimate harvest from this land. The
data in this example consider the relationship between the number of acres planted with wheat
(Acres) and the number of bushels of wheat that were harvested (Wheat) in a sample of 60
farms.
[Figure: scatterplot of Wheat (bushels harvested) versus Acres (acres planted)]

RSquare 0.512
Root Mean Square Error 6700
Mean of Response 29430
Observations 60
Parameter Estimates
Term Estimate Std Error t Ratio Prob>|t|
Intercept 2950.04 4236.71 0.70 0.4893
Acres 37.68 4.83 7.81 <.0001

[Figure: plot of Residual versus Acres]

(12) Based on the fitted model, a farmer who plants 1,000 acres of wheat can expect to harvest,
on average, about
a) 2,950 bushels of wheat.
b) 37,680 bushels of wheat.
c) 40,630 bushels of wheat.
d) 6,718 bushels of wheat.
e) 29,430 bushels of wheat.
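A minimal sketch of the prediction arithmetic: plug the planted acreage into the fitted line from the output above (the function name is illustrative, not part of the output):

```python
# Fitted simple regression: Wheat = b0 + b1 * Acres
b0 = 2950.04   # estimated intercept (bushels)
b1 = 37.68     # estimated slope (bushels per acre)


def predict_wheat(acres: float) -> float:
    """Estimated average harvest, in bushels, for a farm planting `acres` acres."""
    return b0 + b1 * acres


print(round(predict_wheat(1000)))
```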
(13) The positive estimated intercept along with the regression output indicates
a) That the relationship between Acres and Wheat is nonlinear near the origin.
b) That land that is not in active production nonetheless yields wheat.
c) Outliers have distorted the fit of the line for smaller farms.
d) No serious problem and is not significantly different from zero.
e) Most of the farms in this sample are relatively small.
(14) The fitted model implies that, on average,
a) Each additional acre planted yields 37.68 bushels of wheat.
b) Farms require about 37.68 acres to produce a bushel of wheat.
c) Each additional acre planted yields 2,950 bushels of wheat.
d) Farms require about 2,950 acres to produce a bushel of wheat.
e) Acreage planted does not significantly affect the amount of wheat produced.
(15) If a farmer were to increase the amount of land planted with wheat from 1,200 to 1,300
acres, then we would expect, on average, that the amount of wheat produced would (assuming
the SRM)
a) Increase between about 3,285 and 4,251 bushels, with 95% confidence.
b) Increase between about 3,758 and 3,778 bushels, with 95% confidence.
c) Increase between about 2,802 and 4,734 bushels, with 95% confidence.
d) Increase between about 3,763 and 3,773 bushels, with 95% confidence.
e) Remain the same.
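Moving from 1,200 to 1,300 acres multiplies both the slope and its standard error by 100. A sketch of the interval, assuming the rough multiplier 2 in place of the t quantile with 58 degrees of freedom:

```python
# Slope estimate and its standard error from the Parameter Estimates table
b1, se_b1 = 37.68, 4.83
delta = 1300 - 1200   # change in acres planted

# Approximate 95% CI for the expected change in Wheat: delta * (b1 +/- 2 SE)
low = delta * (b1 - 2 * se_b1)
high = delta * (b1 + 2 * se_b1)
print(round(low), round(high))
```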
(16) Most of the farms in this survey have fewer than 1000 acres of wheat planted, while a few
others are much larger (up to about 1400 acres). This skewness in the number of acres planted
a) Violates the normality assumption of the simple regression model.
b) Produces a lack of constant variance in the response.
c) Introduces dependence in the underlying model errors.
d) Implies that the simple regression model omits an important predictor.
e) Has produced a few leveraged observations.
(17) An extension of this model developed by the Department of Agriculture adds 1 other
predictor, increasing R² to 0.75. This revised model, assuming it satisfies the MRM, would be
able to predict the wheat production of a typical individual farm with 95% probability to within
about
a) Plus or minus 4800 bushels.
b) Plus or minus 6700 bushels.
c) Plus or minus 9600 bushels.
d) Plus or minus 11,300 bushels.
e) Plus or minus 13,400 bushels.
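One way to turn a new R² into an RMSE is to recover the total sum of squares from the simple regression and reuse it. This sketch assumes the revised model is fitted to the same 60 farms (so the total sum of squares is unchanged) and treats ±2 RMSE as an approximate 95% prediction margin, ignoring the extra uncertainty from estimating the coefficients:

```python
import math

n = 60                          # farms in the shown survey
r2_old, rmse_old = 0.512, 6700.0
r2_new, k_new = 0.75, 2         # Acres plus one other predictor (assumed same farms)

# For the simple regression, SSE = RMSE^2 * (n - 2) and SSE = (1 - R^2) * SST
sst = rmse_old ** 2 * (n - 2) / (1 - r2_old)

# Residual SD of the revised model, with n - k_new - 1 error degrees of freedom
rmse_new = math.sqrt((1 - r2_new) * sst / (n - k_new - 1))

print(round(rmse_new), round(2 * rmse_new))
```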

[Figure: histogram of the residuals (Count) alongside a normal quantile plot of the residuals]

(18) The diagnostic plot shown above graphs the residuals from the fitted model of Wheat on
Acres. This diagnostic plot indicates that
a) The simple regression omits an important predictor.
b) The relationship between Wheat and Acres is not linear.
c) The underlying model errors are apparently dependent.
d) The underlying model errors apparently have constant variance.
e) The underlying model errors are approximately normally distributed.
(19) Suppose that the Department of Agriculture were to conduct a second survey, only this
time sampling 400 farms rather than 60 farms from the same population as the current survey.
In a simple regression of Wheat on Acres using the data from this larger survey we should
expect to find that, in comparison to the shown simple regression,
a) The R² summary statistic will be larger than 0.512.
b) The estimated intercept will be statistically significant.
c) The estimated slope will not be statistically significant.
d) The 95% confidence interval for the estimated slope will be longer.
e) The t-ratio for the estimated slope will be larger.
(20) When expanding the scope of the shown survey, if the Department of Agriculture accepts
the validity of the simple regression model for all US farms and wants to improve the accuracy
of the estimated slope as much as possible, it should seek to
a) Remove small farms that plant less than 650 acres of wheat.
b) Add more typical farms with about 1,000 acres of wheat planted.
c) Add more farms that plant the average number of acres of wheat.
d) Add farms that plant a wider range of acres of wheat than those in this survey.
e) Add farms that are adjacent to and of similar size as those in the shown model.
(21) Analysis indicated that a drought seriously affected farms in the middle of the US, but not those
along the Pacific coast. A drought reduces the yield of wheat per acre. If this expanded sample of 400
farms mixes farms from the central US and Pacific coast, then we should
a) Do a two-sample t-test to see if the effect of the drought is noticeable.
b) Add a categorical variable for location to the model.
c) Add a categorical variable for location and its interaction with size to the model.
d) Use a log transformation to capture the diminishing returns brought by drought.
e) Discard this sample since this mixing distorts the use of the data in regression.

Questions 22-32. A company is interested in obtaining funding for a wind-based power
generation plant. In order to secure the funding, they need to estimate the daily wind speed at a
height of 100 feet for the site in question over the period of a year. The company has only 60
days of recorded wind speeds (measured in miles per hour, mph) at the proposed site (called
Wind in the dataset), but a nearby meteorological station called MS1 has complete data (called
MS1). The goal is to build a model for the daily wind speed at the proposed site using the wind
speed at MS1. Analysts also believe that wind direction at MS1 could be a useful predictor.
This categorical variable has two levels: North or East (NorE) and South or West (SorW).
Output from a multiple regression of Wind against MS1 and Direction follows.
Summary of Fit
RSquare 0.42569
Root Mean Square Error 3.214302
Mean of Response 15.65966
Observations (or Sum Wgts) 61
Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        2         444.1698       222.085   21.4954     <.0001
Error       58         599.2409        10.332
C. Total    60        1043.4107
Indicator Parametrization Estimates
Term Estimate Std Error t Ratio Prob>|t|
Intercept 4.6342203 2.000155 2.32 0.0241
MS1 0.7089594 0.120678 5.87 <.0001
Direction[NorE] 1.3068352 0.901986 1.45 0.1528
[Figure: Residual by Predicted Plot]

(22) If the wind direction remains constant, then the fitted model estimates that an increase in the wind
speed at MS1 by 10 mph produces, approximately,
a) No effect on the wind speed at the proposed site.
b) An increase of 7.1 miles per hour in the wind speed at the proposed site.
c) An increase of 13.1 miles per hour in the wind speed at the proposed site if the wind is from
the north or east direction.
d) An increase of 13.1 miles per hour in the wind speed at the proposed site if the wind is from
the south or west direction.
e) A decrease of 1.3 miles per hour in the wind speed at the proposed site if the wind is from the
south or west direction.

(23) The R² summary of the fitted model implies that
a) The model includes about 43% of the annual wind-speed data.
b) Predicted wind speed implied by the regression is accurate on about 43% of days.
c) About 43% of the observations are within about 3.2 mph of the fitted line.
d) About 43% of the observations are within about 6.4 mph of the fitted line.
e) The fitted values of the model describe about 43% of the variation in wind speeds.
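R² is the model sum of squares divided by the total sum of squares in the analysis of variance table. A short check using the values in the wind-speed output:

```python
# Sums of squares from the Analysis of Variance table of the wind-speed model
ss_model = 444.1698
ss_total = 1043.4107

# R^2: fraction of the variation in Wind described by the fitted values
r2 = ss_model / ss_total
print(round(r2, 5))
```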
(24) Suppose the wind speed at MS1 on Friday was 5 miles per hour. If the wind on Saturday at MS1
is from the same direction but at a speed of 15 miles per hour, then how much on average does the
fitted model predict the wind speed to change at the development site?
a) Without knowing the wind direction this output cannot be used to answer the question.
b) 0.71 mph.
c) 7.1 mph.
d) 0 mph.
e) 6.5 mph.
(25) Assuming the MRM, the 95% confidence interval for the difference in expected wind speed at the
development site when the wind changes direction from NorE to SorW is approximately
a) From -0.4972 to 3.1108 with SorW being slower.
b) From 0.4048 to 2.2088 with SorW being slower.
c) From -0.4972 to 3.1108 with NorE being slower.
d) From 0.4048 to 2.2088 with NorE being slower.
e) From 0.634 to 8.634 with SorW being slower.
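Because Direction uses indicator parametrization with SorW as the omitted level, the Direction[NorE] coefficient estimates the expected NorE minus SorW difference at a fixed MS1 speed. A sketch of the approximate interval, assuming 2 in place of the t quantile:

```python
# Direction[NorE] estimate and standard error from the output
est, se = 1.3068352, 0.901986

# Approximate 95% CI for the NorE minus SorW difference at fixed MS1
low, high = est - 2 * se, est + 2 * se
print(round(low, 4), round(high, 4))
```

A positive estimate means NorE days are estimated to be faster than SorW days at the same MS1 speed.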
(26) If wind speed had been measured in kilometers per hour, rather than miles per hour, then which of
the following quantities would change between the two regressions?
a) R²
b) The t-ratio for MS1.
c) The p-value for MS1.
d) The slope for MS1.
e) The coefficient of the categorical variable.
(27) If you were to make a 95% prediction interval for wind speed for a day on which the wind speed
is 15 mph at MS1 from the NorE direction, then the interval would be approximately
a) (10.15, 23.00) mph.
b) (12.71, 19.14) mph.
c) (6.15, 19.0) mph.
d) (-0.45,0.45) mph.
e) None of the above.
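A sketch of the prediction arithmetic for the additive wind model. It uses fit ± 2 RMSE as a rough 95% prediction interval, which ignores the usually small extra uncertainty from estimating the coefficients; the helper name is illustrative:

```python
# Fitted model: Wind = b0 + b_ms1 * MS1 + b_nore * [Direction is NorE]
b0, b_ms1, b_nore = 4.6342203, 0.7089594, 1.3068352
rmse = 3.214302


def predict_wind(ms1: float, nore: bool) -> float:
    """Fitted wind speed (mph) at the proposed site."""
    return b0 + b_ms1 * ms1 + (b_nore if nore else 0.0)


fit = predict_wind(15, nore=True)
print(round(fit - 2 * rmse, 2), round(fit + 2 * rmse, 2))
```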
(28) Does the output suggest the presence of collinearity between MS1 and Direction?
a) The high value of R² suggests collinearity.
b) The estimated value of RMSE clearly indicates collinearity.
c) The large standard errors for Direction indicate collinearity.
d) There is no information in the output that suggests the presence of collinearity.
e) The residual plot strongly suggests the absence of collinearity.

(29) Were there collinearity between MS1 and Direction, which of the following statements would
provide the best interpretation?
a) The impact of wind speed at MS1 on the development site depends on the wind direction.
b) The speed at MS1 is associated with the direction the wind is blowing.
c) On windless days the wind always blows from the North.
d) Knowing the wind speed on a Monday at MS1 provides information about the wind speed for
the following Tuesday.
e) Collinearity is a meaningless concept when a categorical variable is involved.
(30) Comment on the residual plot shown with the summary of the estimated model:
a) As this data is observed over time, the residuals show autocorrelation.
b) The apparent U shape in the residual plot shows the model's lack of fit.
c) The plot does not indicate violations of the underlying regression assumptions.
d) There is clear evidence of heteroscedasticity in the plot.
e) This plot shows that outliers have distorted the model and made it unreliable.
(31) The following graph is the leverage plot for MS1.

This graph indicates that
a) There evidently was a hurricane during this time period.
b) The average wind speed at MS1 is more than 20 miles per hour.
c) The underlying linear model assumption is well founded.
d) There are some highly leveraged points in the dataset.
e) A transformation of MS1 would likely improve the model.
(32) Which of the following would not be a reasonable next step in the analysis of this dataset?
a) Plot the residuals versus levels of Direction to check the constant variance assumption.
b) Consider adding an interaction term to the model.
c) Find another meteorological station in the vicinity from which to collect additional data.
d) Transform the variable Direction to capture evident curvature in plots.
e) Create a plot of the residuals plotted against time.

Questions 33-37. The analysts responsible for the project decided to add an interaction term
between Direction and the wind speed at MS1 to the model. Output from this model follows.
Crosses in the plots below indicate a day when Direction was SorW and dots represent when
the Direction was NorE.

Summary of Fit
RSquare 0.523
Root Mean Square Error 2.95
Mean of Response 15.66
Observations (or Sum Wgts) 61
Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        3         546.1316       182.044   20.8666     <.0001
Error       57         497.2791         8.724
C. Total    60        1043.4107

Indicator Parametrization Estimates
Term Estimate Std Error t Ratio Prob>|t|
Intercept 8.985883 1.873487 1.81 0.0754
MS1 0.381676 0.112094 6.82 <.0001
Direction[NorE] -11.18558 3.746974 -2.99 0.0042
MS1*Direction[NorE] 0.76642 0.224188 3.42 0.0012

(33) In the fitted multiple regression model, the terms denoted MS1*Direction imply that
a) MS1 and Direction are correlated (confounded).
b) The association between changes in wind speeds at MS1 and the development site depends
upon the wind direction.
c) It is always windy at the development site when it is windy at MS1.
d) Interaction terms are meaningless for categorical variables.
e) The day of the week improves predictive accuracy of wind speed at the development site.
(34) Assuming the MRM, does the addition of the term MS1*Direction significantly improve the
accuracy of the model when predicting new observations?
a) No, because R² must always increase.
b) Yes, because the 95% confidence interval for the interaction term does not contain zero.
c) No, because the residual plot shows a clear breakdown in model assumptions.
d) Yes, because the overall F-ratio is significant.
e) No, because the value of the RMSE is too large.
(35) On a day when the wind direction is NorE and the wind speed at MS1 is 10 mph, then the shown
model predicts wind speed at the development site to be, approximately,
a) 15.66 mph.
b) 9.28 mph.
c) 1.62 mph.
d) 11.04 mph.
e) 5.45 mph.
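With an interaction, the MS1 slope differs by direction. A sketch assuming indicator parametrization with SorW as the baseline level, so the Direction[NorE] and MS1*Direction[NorE] terms apply only on NorE days (the function name is illustrative):

```python
# Interaction model: Wind = b0 + b1*MS1 + d*[NorE] + g*MS1*[NorE]
b0, b1 = 8.985883, 0.381676
d, g = -11.18558, 0.76642


def predict(ms1: float, nore: bool) -> float:
    """Fitted wind speed (mph) at the development site."""
    ind = 1.0 if nore else 0.0
    return b0 + b1 * ms1 + d * ind + g * ms1 * ind


print(round(predict(10, nore=True), 2), round(predict(10, nore=False), 2))
```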
(36) The plot of the residuals from the model shown on the previous page implies that
a) Outliers have made the estimated model unreliable.
b) There is more variability in predicted wind speed at the development site on days when the
wind is from the NorE direction.
c) The variance of the residuals depends on the wind direction.
d) The data are dependent observations due to the presence of autocorrelation.
e) There is evidence of a non-linear effect of MS1 on wind speed.
(37) On calm days, the wind speed at MS1 is less than 10 miles per hour; on windy days, it blows at
more than 25 miles per hour at MS1. Based on the fitted model, the highest estimated average wind
speeds at the proposed site are observed when the wind direction at MS1 is
a) On calm days from the North or East, and on windy days from the South or West.
b) On calm days from the South or West, and on windy days from the North or East.
c) On calm days from the North or East, and on windy days from the North or East.
d) On calm days from the South or West, and on windy days from the South or West.
e) On calm days from North, East, South, or West, and on windy days from the North or East.
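The two fitted lines from the interaction model cross where the NorE adjustment, the Direction[NorE] estimate plus the MS1*Direction[NorE] estimate times MS1, equals zero. A sketch that locates that point and checks the sign of the NorE minus SorW gap on either side (assuming SorW is the baseline level):

```python
# NorE shifts in intercept (d) and in MS1 slope (g) from the interaction model
d, g = -11.18558, 0.76642


def gap(ms1: float) -> float:
    """Estimated NorE minus SorW difference in mean wind speed at this MS1 speed."""
    return d + g * ms1


# The fitted lines cross where the gap is zero
crossover = -d / g
print(round(crossover, 1), round(gap(5), 2), round(gap(30), 2))
```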

Questions 38-44. A company makes cash advances to small businesses. One of the outcome
metrics that the company uses to judge the quality of an advance is the performance of the
advance after the first six months. Performance is defined as the ratio of the amount actually
paid back after six months to the amount that, when the loan was made, was expected to be paid
back by then. Ideally, this performance metric (called PRiSM, for performance ratio in six
months) should be 1, or close to 1.
The company is interested in identifying predictors of PRiSM. They have developed a
regression model of PRiSM using several explanatory variables. The explanatory variables that
appear in the regression model are defined in the following table. A summary of the regression
model follows the table. All logarithms refer to natural logarithms (base e).
Loan type: Either a first [O, for original] loan made to this merchant or a renewal [R] loan.
FICO: Fair-Isaac credit score of the individual guaranteeing the loan.
Years in business: The number of years that the merchant has been in business.
Current delinquent credit lines: Number of current delinquent or derogatory lines of credit.
Business type: Legal entity, including corporation (Corp), limited liability corporation (LLC), sole proprietor, or partnership.
Median income: The income per household in the zip code where the merchant does business (recorded in dollars).
ISO identifier: Two values, identifying whether the ISO is Hawthorne (H) or a different ISO.
Payment stress: Ratio of the expected monthly payment to the typical volume of monthly credit card transactions.
Summary of Fit
RSquare 0.276602
Root Mean Square Error 0.146335
Mean of Response 0.937145
Observations (or Sum Wgts) 990
Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       14         7.983223      0.570230   26.6289     <.0001
Error      975        20.878589      0.021414
C. Total   989        28.861812

Effect Tests
Source Nparm Sum of Squares F Ratio Prob > F
Log(Years Business) 1 2.2888226 106.7739 <.0001*
Loan.Type 1 1.0054689 46.9053 <.0001*
Business Type 3 0.8628517 13.4174 <.0001*
ISO Identifier 1 0.4845248 22.6032 <.0001*
FICO 1 0.4672788 21.7986 <.0001*
Loan.Type*FICO 1 1.0332935 48.2033 <.0001*
Log(Years Business)*BusinessType 3 0.4802592 7.4681 <.0001*
Current.Delinquent.Credit.Lines 1 0.0530002 2.4725 0.1162
Payment Stress 1 0.3281044 15.3061 <.0001*
Log Median Income 1 0.7113217 33.2177 <.0001

Indicator Function Parameterization
Term Estimate Std Error t Ratio Prob>|t|
Intercept 1.004291 0.652137 1.54 0.1242
Log(Years Business) 0.1119165 0.01138 9.83 <.0001*
Loan.Type[O] -0.55738 0.081384 -6.85 <.0001*
Business Type[Corp] 0.1649579 0.030572 5.40 <.0001*
Business Type[LLC] 0.148657 0.03781 3.93 <.0001*
Business Type[Partnership] -0.088988 0.06977 -1.28 0.2025
ISO Identifier[Hawthorne] -0.063469 0.01335 -4.75 <.0001*
FICO -0.000877 0.000124 -7.04 <.0001*
Loan.Type[O]*FICO 0.0009768 0.000141 6.94 <.0001*
Log(Years Business)*Business Type[Corp] -0.052709 0.01381 -3.82 0.0001*
Log(Years Business)*Bus.Type[LLC] -0.045837 0.019228 -2.38 0.0173*
Log(Years Business)*Bus.Type[Partnership] 0.0536029 0.03464 1.55 0.1321
Current.Delinquent.Credit.Lines -0.002196 0.001397 -1.57 0.1162
Payment Stress -0.049024 0.012531 -3.91 <.0001*
Log Median Income 0.074156 0.012866 5.76 <.0001

(38) The shown model predicted the PRiSM score for a possible loan to be ŷ = 1.15. Based on the
summary of the fitted model and the assumptions of the MRM, we can conclude that the probability
that the actual PRiSM for this loan is larger than 1 is approximately
a) 83%.
b) 95%.
c) 5%.
d) 50%.
e) 67%.
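Under the MRM an individual response is approximately normal around its prediction with standard deviation near the RMSE. This sketch assumes exactly that (ignoring coefficient uncertainty) and computes the normal probability with `math.erf` from the Python standard library:

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


y_hat, rmse = 1.15, 0.146335
# P(PRiSM > 1) when PRiSM is treated as Normal(y_hat, rmse^2)
p_above_1 = 1 - normal_cdf((1 - y_hat) / rmse)
print(round(p_above_1, 3))
```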

(39) When comparing loan applications from businesses that are identical but for the fact that one is a
renewal and one is an original loan, the estimated model implies that a 10-point increase in the FICO
score
a) Has no effect on expected PRiSM for either application.
b) Increases the expected PRiSM for the original loan by about 0.0010.
c) Decreases the expected PRiSM for either loan by about -0.0088.
d) Increases the expected PRiSM for the original loan by about 0.00977.
e) Increases the expected PRiSM for the original loan by about 0.0105.
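Because the model includes Loan.Type*FICO and, under the indicator parameterization, renewal (R) loans are the baseline level, the FICO slope for an original loan is the sum of the FICO and Loan.Type[O]*FICO estimates. A sketch of the 10-point comparison:

```python
b_fico = -0.000877        # FICO slope for renewal loans (baseline level)
b_fico_orig = 0.0009768   # additional FICO slope for original loans (interaction)

change_renewal = 10 * b_fico
change_original = 10 * (b_fico + b_fico_orig)
print(round(change_renewal, 5), round(change_original, 5))
```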

(40) The p-value of the intercept implies that, assuming the MRM,
a) 12.42% of loans result in a negative PRiSM score.
b) There is a 12.42% chance that these data are a sample from a population in which β0 = 0.
c) We should refit the model without the intercept, which is a large extrapolation.
d) The estimated intercept is not statistically significantly different from zero.
e) The probability that the population intercept is zero is 0.1242.

(41) The best interpretation of the estimated coefficient for the logarithm of median income is,
given fixed levels of the other explanatory variables, that
a) A $100 increase in median income increases the estimated PRiSM by about 7.4.
b) A 1% increase in median income increases the estimated PRiSM by about 0.074%.
c) A 1% increase in median income increases the estimated PRiSM by about 0.00074.
d) A $100 increase in median income increases the estimated PRiSM by about 0.074%.
e) A 1% increase in median income increases the estimated PRiSM by about 0.074.
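Since the predictor is the natural log of median income, multiplying income by (1 + p) shifts the estimate by the coefficient times log(1 + p), which is close to the coefficient times p for small p. A sketch for a 1% increase:

```python
import math

b_log_income = 0.074156   # estimate for Log Median Income

# Effect on estimated PRiSM of multiplying median income by 1.01
effect_1pct = b_log_income * math.log(1.01)
print(round(effect_1pct, 5))
```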
(42) A reasonable next step to take in the development of this model would be to
a) Remove the intercept from the model since it is not statistically significant.
b) Remove Log(Years Business)*Business.Type[Partnership] because it has the largest p-value.
c) Reduce the number of types of businesses distinguished in the fitted model.
d) Remove the number of delinquent credit lines from the model.
e) Check that the distribution of estimated PRiSM scores is approximately normal.
(43) The estimated PRiSM score for an original loan application from ISO Hawthorne for a 1-year-old
corporation with no delinquent credit lines, a FICO score of 600, in a community with $60,000 median
income, and payment stress of 0.10 is approximately
a) 0.83
b) 0.60
c) 0.12
d) 1.42
e) 0.44
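A sketch of the plug-in calculation using the estimates in the Indicator Function Parameterization table. It assumes that the levels not listed in the table (renewal loans, sole proprietors, other ISOs) are the omitted baselines, and it uses natural logs as stated in the question stem; the dictionary keys are shorthand labels, not names from the output:

```python
import math

# Estimates copied from the Indicator Function Parameterization table
coef = {
    "Intercept": 1.004291,
    "LogYears": 0.1119165,
    "LoanType[O]": -0.55738,
    "BusinessType[Corp]": 0.1649579,
    "ISO[Hawthorne]": -0.063469,
    "FICO": -0.000877,
    "LoanType[O]*FICO": 0.0009768,
    "LogYears*BusinessType[Corp]": -0.052709,
    "DelinquentLines": -0.002196,
    "PaymentStress": -0.049024,
    "LogMedianIncome": 0.074156,
}

years, fico, delinquent, income, stress = 1, 600, 0, 60_000, 0.10
log_years = math.log(years)   # log(1) = 0, so both years terms drop out

prism_hat = (
    coef["Intercept"]
    + coef["LogYears"] * log_years
    + coef["LoanType[O]"]                      # original loan
    + coef["BusinessType[Corp]"]               # corporation
    + coef["ISO[Hawthorne]"]                   # ISO is Hawthorne
    + coef["FICO"] * fico
    + coef["LoanType[O]*FICO"] * fico          # original-loan FICO adjustment
    + coef["LogYears*BusinessType[Corp]"] * log_years
    + coef["DelinquentLines"] * delinquent
    + coef["PaymentStress"] * stress
    + coef["LogMedianIncome"] * math.log(income)
)
print(round(prism_hat, 2))
```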
(44) The following two graphs are leverage plots for the fitted model.
[Figure: leverage plots for FICO (PRiSM Leverage Residuals versus FICO Leverage, P<.0001) and for current delinquent credit lines]

Based on these leverage plots, we should conclude that
a) Removing the leveraged observation (x) would produce a more significant slope for the
number of currently delinquent credit lines.
b) Neither explanatory variable contributes significantly to the estimated model.
c) The model requires a transformation of the FICO score to improve the fit.
d) Removing the leveraged observation (x) would produce a less significant slope for the number
of currently delinquent credit lines.
e) The distribution of the number of currently delinquent credit lines does not conform to the
assumptions of the MRM.
