
CLASSWORK 1-3

Bivariate analysis: Quantitative vs Quantitative

Correlation and Regression Analysis


Study case: GARBAGE
The accompanying data gives weights (in pounds) of plastic (and other materials) discarded by a
sample of households, along with the sizes of the households. The relationship between household
size and plastic discarded is important to the Census Bureau, which provided project funding,
because the presence of a correlation suggests that we can predict population size by analyzing
discarded garbage.
1. Make a scatterplot of these two variables. Find the correlation coefficient. Based on the
Census Bureau's motivation for the study, which variable should you plot on the y axis?
Describe the form, direction and strength for this relationship.

On the y-axis: the dependent variable, Household size (the variable the Bureau wants to predict).
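The correlation coefficient asked for in question 1 can be sketched in a few lines. This is a minimal illustration of the Pearson formula, not the SPSS procedure used in this classwork; the function name `pearson_r` and the sample values are made up for the example and are not the study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative values, NOT the classwork data
plastic_lb = [0.5, 1.0, 1.5, 2.0, 3.0]
household_size = [1, 2, 3, 3, 5]
print(pearson_r(plastic_lb, household_size))
```

A value near +1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.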

2. Do you think that correlation coefficient is significant?

The correlation coefficient for Household size with Plastic (r = 0.751) shows that the linear
relationship is strong and positive: the more plastic discarded, the larger the household tends
to be. The p-value (Sig. = 0.000, i.e. p < 0.001) for the correlation coefficient test is enough
evidence to conclude that the correlation is significant.

The correlation coefficient with Paper (r = 0.469) shows that the linear relationship is moderate
and positive: the more paper discarded, the larger the household tends to be. The p-value
(Sig. = 0.000) for the correlation coefficient test is enough evidence to conclude that the
correlation is significant.

The correlation coefficient with Metal (r = 0.620) shows that the linear relationship is
moderately strong and positive: the more metal discarded, the larger the household tends to be.
The p-value (Sig. = 0.000) for the correlation coefficient test is enough evidence to conclude
that the correlation is significant.

The correlation coefficient with Glass (r = 0.403) shows that the linear relationship is moderate
and positive: the more glass discarded, the larger the household tends to be. The p-value
(Sig. = 0.001) for the correlation coefficient test is enough evidence to conclude that the
correlation is significant.

There is enough statistical evidence to conclude that the correlations between Household size and
Plastic, Paper, Metal and Glass garbage discarded are all SIGNIFICANT.
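The significance test behind the "Sig." values converts r into a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below computes only that statistic; the p-value would still come from a t table or from software such as SPSS. The sample size `n_assumed = 60` is a placeholder, because the source does not state n, and the function name is hypothetical.

```python
from math import sqrt


def correlation_t_stat(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


# ASSUMPTION: placeholder sample size; the classwork does not report n
n_assumed = 60

# Correlations reported in the classwork
reported_r = {"Plastic": 0.751, "Paper": 0.469, "Metal": 0.620, "Glass": 0.403}
for material, r in reported_r.items():
    print(material, round(correlation_t_stat(r, n_assumed), 2))
```

For a fixed sample size, a larger |r| gives a larger |t| and therefore a smaller p-value, which is why Glass has the largest reported Sig. value of the four.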

3. Perform the regression according to your choice of y and x variables. Do you think that
the Census Bureau can indeed use data from plastic in garbage to predict population
size?

Regression Model:

$Y = \beta_0 + \beta_1 X + \varepsilon$

Where

Y = household size

X = Plastic discarded (pounds)

Developing Regression Analysis on SPSS setting Household size as the dependent variable
and Plastic as the independent variable, the output gives the following results:

LSR equation:

$\hat{Y} = b_0 + b_1 X$
$\hat{Y} = 1.08 + 1.376 X$

$b_0$ = 1.08 people
$b_1$ = 1.376 people per pound of plastic discarded

The predicted household size is 1.08 people when no plastic garbage is discarded. For each
additional pound of plastic discarded, the predicted household size increases by 1.376 people.
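The interpretation above can be checked numerically. The sketch below shows the textbook least-squares formulas for a single predictor, and a prediction helper that uses the coefficients reported by SPSS (1.08 and 1.376). The function names are hypothetical, and the small data set used to exercise `fit_line` is made up, not the study data.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1


def predict_household_size(plastic_lb, b0=1.08, b1=1.376):
    """Predicted household size from pounds of plastic (reported coefficients)."""
    return b0 + b1 * plastic_lb


# Made-up points lying exactly on y = 1 + 2x, to show fit_line recovers the line
print(fit_line([0, 1, 2], [1, 3, 5]))

# Prediction with the coefficients reported in the classwork
print(predict_household_size(2.0))
```

Raising the input by one pound raises the prediction by exactly the slope, which is the meaning of "1.376 people per pound".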

I think that the Census Bureau cannot reliably predict population size this way. First, a single
variable such as plastic cannot give the exact total population. Second, not every household
uses plastic or throws it in the garbage.

4. Perform a correlation analysis between Household_size and the other garbage variables.

5. Perform the regression analysis to find a prediction model for Household_size with the
other garbage variables.

Regression Model:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$

Where

Y = household size

$X_1$ = Plastic discarded (pounds)

$X_2$ = Paper discarded (pounds)

$X_3$ = Metal discarded (pounds)

$X_4$ = Glass discarded (pounds)

LSR equation:

$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4$

$\hat{Y} = 0.518 + 1.057 X_1 - 0.27 X_2 + 0.496 X_3 + 0.087 X_4$
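For readers without SPSS, the least-squares coefficients of a model with several predictors can be obtained by solving the normal equations (X'X) b = X'y. The sketch below does this in plain Python with Gauss-Jordan elimination. It is an illustration under stated assumptions, not the routine SPSS uses; the function name `fit_ols` and the tiny data set are made up, not the study data.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations.

    rows: one list of predictor values per observation (intercept is added here).
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y]
    a = []
    for i in range(p):
        xtx_row = [sum(row[i] * row[j] for row in X) for j in range(p)]
        xty = sum(row[i] * yi for row, yi in zip(X, y))
        a.append(xtx_row + [xty])
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or data are insufficient")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Made-up observations generated exactly from y = 1 + 2*x1 + 3*x2
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
print(fit_ols(rows, y))
```

With the real data, each row would hold the Plastic, Paper, Metal and Glass weights of one household, and y the household sizes.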

6. According to the F (Fisher) in ANOVA test, how good is this model?

The ANOVA test in Regression Analysis:

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$

No X-variable affects the Y-variable (the model is useless)

$H_1$: At least one $\beta_i \neq 0$

At least one X-variable affects the Y-variable (the model is at least somewhat useful)

This model explains 63.3% of the variation in household size (goodness of fit).
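The ANOVA F statistic is tied to the goodness of fit by F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where k is the number of predictors and n the number of observations. The sketch below evaluates that identity; `n_assumed = 60` is a placeholder because the source does not state the sample size, and the function name is hypothetical.

```python
def f_from_r_squared(r_squared, k, n):
    """Overall F statistic of a regression with k predictors and n observations."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    explained = r_squared / k                   # mean square share of the model
    unexplained = (1.0 - r_squared) / (n - k - 1)  # mean square share of the error
    return explained / unexplained


# ASSUMPTION: placeholder sample size; the classwork does not report n
n_assumed = 60

# R^2 = 0.633 with four predictors (Plastic, Paper, Metal, Glass)
print(round(f_from_r_squared(0.633, k=4, n=n_assumed), 2))
```

A large F relative to the F distribution with (k, n - k - 1) degrees of freedom leads to rejecting the null hypothesis that all slopes are zero.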

7. According to the t (Student) test for COEFFICIENTS, how good is the prediction for each
independent variable?

The t-test for the coefficients in Regression Analysis:

$H_0: \beta_i = 0$

Y does not depend on $X_i$ ($X_i$ is useless, you can remove it) - independent

$H_1: \beta_i \neq 0$

Y depends on $X_i$ ($X_i$ is useful, you can keep it) - dependent

According to the t-test p-values for the coefficients of Plastic and Metal (0.000), there is enough
statistical evidence to conclude that household size depends on Plastic and Metal garbage discarded.
It does not depend significantly on Paper (p-value = 0.425); therefore, Paper should be removed
from the model.
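The decision rule used above (keep a predictor when its t-test p-value is below the significance level, drop it otherwise) can be written as a small helper. The level alpha = 0.05 is an assumption, since the classwork does not state one, and the function name is hypothetical. Only the p-values reported above are used.

```python
def split_by_significance(p_values, alpha=0.05):
    """Split predictors into (keep, drop) lists using their t-test p-values."""
    keep = [name for name, p in p_values.items() if p < alpha]
    drop = [name for name, p in p_values.items() if p >= alpha]
    return keep, drop


# p-values reported in the classwork (0.000 means SPSS printed "0.000")
reported_p = {"Plastic": 0.000, "Metal": 0.000, "Paper": 0.425}

keep, drop = split_by_significance(reported_p)
print("keep:", keep)
print("drop:", drop)
```

Dropping predictors one at a time and refitting is usually safer than dropping several at once, because the remaining p-values change when the model changes.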

8. Remove the variables that you think are useless and fit a new regression model.
What is the difference in the goodness of fit between the previous model and the new
one? Which one is best?

This model explains 61.3% of the variation in household size, which is almost the same as the
previous model (63.3%).
The p-value (0.000) for the ANOVA F-test suggests that there is enough statistical evidence to
conclude that this model is useful.
According to the t-test p-values for the coefficients of Plastic (0.000) and Metal (0.008), there is
enough statistical evidence to conclude that Plastic and Metal are related to household size. Both
variables are useful to predict household size.
The final fitted model is:

$\hat{Y} = 0.554 + 1.081(\text{Plastic}) + 0.491(\text{Metal})$

Goodness of fit = 61.3%
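Because R^2 never decreases when predictors are added, one common way to compare the two models is adjusted R^2, which penalizes extra predictors: 1 - (1 - R^2)(n - 1) / (n - k - 1). The sketch below applies it to the two reported fits. The sample size `n_assumed = 60` is a placeholder because the source does not state n, and adjusted R^2 is offered here as one possible criterion, not the one the classwork used.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# ASSUMPTION: placeholder sample size; the classwork does not report n
n_assumed = 60

full_model = adjusted_r_squared(0.633, n_assumed, k=4)     # Plastic, Paper, Metal, Glass
reduced_model = adjusted_r_squared(0.613, n_assumed, k=2)  # Plastic, Metal
print(round(full_model, 3), round(reduced_model, 3))
```

When the two adjusted values are close, the smaller model is often preferred because it is simpler and uses only significant predictors.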
