Ice Cream Consumption

Introduction: Ice cream is one of the major frozen desserts in
the market. Many people enjoy it while watching their favorite
television programs or after dinner. Yet what are the key
factors that affect how much ice cream people consume? In this
paper we investigate the potential factors. The data were
collected from March 18, 1951 to July 11, 1953, a total of 30
four-week periods. The potential explanatory variables are the
price of the ice cream (Price), the weekly family income of the
consumers (Income), and the temperature (Temp).

Methodology: Since we are interested in the key factors that
affect ice cream consumption, we will examine every potential
variable in turn. Because the data were collected over time, we
first draw a time plot to reveal the relationship between ice
cream consumption and time. Then we use box plots to identify
outlying samples. Next, we use the backward selection method to
remove any irrelevant variable. Finally, we fit the regression
model and so obtain an equation to estimate ice cream
consumption.

Analysis: Before conducting any statistical tests, we note that
the ice cream consumption data form a time series. The samples
were collected every 4 weeks for 30 consecutive periods. We
therefore draw a time plot to examine whether any pattern is
present over the observation period. At the same time, we fit
the regression model of ice cream consumption (IC) vs. Date.

Figure 1: Regression Model of IC vs. Date

[Scatter plot of IC (vertical axis, 0.25 to 0.55) against Date
(horizontal axis, 0 to 30) with the fitted line
IC = 0.3337 + 0.0017 Date; N = 30, Rsq = 0.0492,
AdjRsq = 0.0152, RMSE = 0.0653]

By observation, the time plot displays a noticeable pattern.
The samples rise and fall over time. We attribute this to a
seasonal effect, because ice cream consumption was higher during
the summer and lower in the winter. Additionally, the regression
line has a positive slope over the observation period, although
with R-square = 0.0492 the linear trend alone accounts for only
about 5% of the variation. As a result, we believe the
time-series factor ‘Date’ may explain some of the change in mean
ice cream consumption.
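
The strength of the linear trend in Figure 1 can be checked from the reported R-square alone. The following is a worked sketch, assuming the usual simple-regression F test with 1 and N - 2 degrees of freedom. The 5% critical value of about 4.20 comes from standard F tables, not from the SAS output.

```latex
\[
F = \frac{R^2 / 1}{(1 - R^2)/(N - 2)}
  = \frac{0.0492}{0.9508 / 28} \approx 1.45,
\qquad
F_{0.05}(1,\,28) \approx 4.20 .
\]
```

Since 1.45 is below 4.20, the upward slope on its own is not statistically significant at the 0.05 level. This is consistent with the seasonal swings dominating the time plot.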
Since the time-series factor affects the model, we look for
outliers separately by ‘Year’. Hence, we draw 3 separate box
plots: Income vs. Year, Price vs. Year, and Temp vs. Year.

Box-plot 1 (Income vs. Year):
[Box plots of Income (vertical axis, 75 to 100) for Year 1, 2, and 3]

Box-plot 2 (Price vs. Year):
[Box plots of Price (vertical axis, 0.26 to 0.30) for Year 1, 2, and 3]

Box-plot 3 (Temp vs. Year):
[Box plots of Temp (vertical axis, 20 to 80) for Year 1, 2, and 3]

According to the box plots above, there are no outliers among
the 3 potential variables. Therefore, we move on to the backward
selection method to screen out any unneeded variable.
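
Whether a point counts as an outlier depends on the rule the box plot uses: a plot whose whiskers run to the minimum and maximum marks no outliers at all. As a hedged cross-check, the common 1.5 x IQR rule can be applied by hand to the 13 Income values for Year 2 in Appendix 2, whose quartiles are Q1 = 82 and Q3 = 86. This rule is an assumption here, since the paper does not state which rule its plots used.

```latex
\[
\mathrm{IQR} = Q_3 - Q_1 = 86 - 82 = 4, \qquad
Q_1 - 1.5\,\mathrm{IQR} = 76, \qquad
Q_3 + 1.5\,\mathrm{IQR} = 92 .
\]
```

Under this rule the Income value of 94 (period 23) lies just above the upper fence of 92, so it would be flagged as a mild outlier. The other Year 2 incomes (78 to 87) fall inside the fences. The reading of "no outliers" therefore depends on the whisker rule and may be worth rechecking if the 1.5 x IQR rule is preferred.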
The purpose of this study is to determine the main factors for
ice cream consumption. Therefore, we run backward selection,
with a 0.05 significance level to stay in the model, to
eliminate the ineffective variables among ‘Price’, ‘Income’, and
‘Temp’ in order to obtain a better model of ice cream
consumption.

Backward Elimination: Step 0

All Variables Entered: R-Square = 0.7190 and C(p) = 4.0000

                        Analysis of Variance

                              Sum of        Mean
Source             DF        Squares      Square    F Value    Pr > F

Model               3        0.09025     0.03008      22.17    <.0001
Error              26        0.03527     0.00136
Corrected Total    29        0.12552

             Parameter      Standard
Variable      Estimate         Error    Type II SS    F Value    Pr > F

Intercept      0.19732       0.27022    0.00072338       0.53    0.4718
Price         -1.04441       0.83436    0.00213          1.57    0.2218
Income         0.00331       0.00117    0.01082          7.97    0.0090
Temp           0.00346    0.00044555    0.08174         60.25    <.0001

Bounds on condition number: 1.1444, 9.9727

-------------------------------------------------------------------------------------------------

Backward Elimination: Step 1

Variable Price Removed: R-Square = 0.7021 and C(p) = 3.5669

                        Analysis of Variance

                              Sum of        Mean
Source             DF        Squares      Square    F Value    Pr > F

Model               2        0.08812     0.04406      31.81    <.0001
Error              27        0.03740     0.00139
Corrected Total    29        0.12552

             Parameter      Standard
Variable      Estimate         Error    Type II SS    F Value    Pr > F

Intercept     -0.11320       0.10828    0.00151          1.09    0.3051
Income         0.00353       0.00117    0.01261          9.10    0.0055
Temp           0.00354    0.00044496    0.08784         63.41    <.0001

Bounds on condition number: 1.1179, 4.4715

-------------------------------------------------------------------------------------------------

All variables left in the model are significant at the 0.0500 level.

                    Summary of Backward Elimination

        Variable    Number      Partial       Model
Step    Removed     Vars In    R-Square    R-Square      C(p)    F Value    Pr > F

   1    Price             2      0.0169      0.7021    3.5669       1.57    0.2218

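As a result of the backward selection, the variable ‘Price’ has
been removed because its p-value (0.2218) exceeds the 0.05 level
required to stay in the model. R-square dropped only slightly,
from 0.7190 to 0.7021, and C(p) fell from 4.0000 to 3.5669. The
overall F value rose from 22.17 to 31.81, mainly because the
model now uses 2 degrees of freedom instead of 3; the mean
square error is essentially unchanged (0.00136 vs. 0.00139). The
reduced model therefore fits almost as well with one fewer
variable, so we use it to predict ice cream consumption.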
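
The removal of Price can be traced directly from the two ANOVA tables. The following is a worked sketch using the standard partial F test and adjusted R-square formulas with n = 30. The full-model adjusted R-square is computed here and is not part of the SAS output shown.

```latex
\[
F_{\text{Price}}
  = \frac{(SSE_{\text{reduced}} - SSE_{\text{full}})/1}{MSE_{\text{full}}}
  = \frac{0.03740 - 0.03527}{0.00136} \approx 1.57
\]
\[
\bar{R}^2_{\text{full}} = 1 - (1 - 0.7190)\,\frac{29}{26} \approx 0.687,
\qquad
\bar{R}^2_{\text{reduced}} = 1 - (1 - 0.7021)\,\frac{29}{27} \approx 0.680
\]
```

The partial F of 1.57 matches the F value reported for Price in Step 0. The adjusted R-square changes very little, so dropping Price costs almost nothing in fit.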

Predicting Ice Cream Consumption by Temperature and Income

The REG Procedure
Dependent Variable: IC

                        Analysis of Variance

                              Sum of        Mean
Source             DF        Squares      Square    F Value    Pr > F

Model               2        0.08812     0.04406      31.81    <.0001
Error              27        0.03740     0.00139
Corrected Total    29        0.12552

Root MSE           0.03722    R-Square    0.7021
Dependent Mean     0.35943    Adj R-Sq    0.6800
Coeff Var         10.35446

                        Parameter Estimates

                   Parameter      Standard
Variable     DF     Estimate         Error    t Value    Pr > |t|

Intercept     1     -0.11320       0.10828      -1.05      0.3051
Temp          1      0.00354    0.00044496       7.96      <.0001
Income        1      0.00353       0.00117       3.02      0.0055

According to the regression output above, we obtain the
following regression model to estimate ice cream consumption:

IC = -0.11320 + 0.00354 * Temp + 0.00353 * Income
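
To illustrate how the fitted equation is used, here is a worked prediction for hypothetical inputs. Temp = 60 degrees Fahrenheit and Income = 85 dollars per week are illustrative values chosen inside the observed ranges, not figures from the study.

```latex
\[
\widehat{IC} = -0.11320 + 0.00354\,(60) + 0.00353\,(85)
             = -0.11320 + 0.21240 + 0.30005
             \approx 0.399 \ \text{pints per capita}
\]
```

Read this way, each additional degree Fahrenheit adds about 0.0035 pints per capita to the predicted consumption when Income is held fixed.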


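The following figure plots the residuals of the regression model
against the normal quantiles. The plot shows a noticeable
increasing trend. Because the horizontal axis is the normal
quantile, an increasing and roughly straight pattern is what
normally distributed residuals would produce.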

Figure 2: Residual plot for IC by Temp and Income

[Plot of the residuals (vertical axis, -0.075 to 0.100) against
Normal Quantile (horizontal axis, -3 to 3) for the model
IC = -0.1132 + 0.0035 Temp + 0.0035 Income; N = 30,
Rsq = 0.7021, AdjRsq = 0.6800, RMSE = 0.0372]

In the regression model, we notice that the variable ‘Temp’
depends on the season. In order to examine whether the
time-series factor ‘Date’ contributes an increasing trend to ice
cream consumption, we fit the regression model of IC vs. Temp
separately for each Year. Afterward, we superimpose these 3
regression lines on the same plot, in the hope that the overlay
will reveal what lies behind the time-series factor.
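
The visual comparison of the three yearly lines can also be checked numerically. The sketch below is one possible approach and is not part of the original analysis. It swaps in PROC GLM, which the paper does not use, to fit a single Temp slope with a separate intercept for each Year. The data set name ic_by_year is hypothetical; the values are the IC, Temp, and Year columns from Appendix 2.

```sas
/* Sketch only: IC, Temp and Year copied from Appendix 2.          */
/* The data set name ic_by_year is hypothetical.                   */
data ic_by_year;
  input IC Temp Year @@;   /* @@ reads several observations per line */
  datalines;
.386 41 1  .374 56 1  .393 63 1  .425 68 1  .406 69 1
.344 65 1  .327 61 1  .288 47 1  .269 32 1  .256 24 1
.286 28 2  .298 26 2  .329 32 2  .318 40 2  .381 55 2
.381 63 2  .470 72 2  .443 72 2  .386 67 2  .342 60 2
.319 44 2  .307 40 2  .284 32 2  .326 27 3  .309 28 3
.359 33 3  .376 41 3  .416 52 3  .437 64 3  .548 71 3
;
run;

/* One common Temp slope with a separate intercept for each Year.  */
/* The Type III test for Year asks whether the three lines are     */
/* shifted relative to one another after adjusting for Temp.       */
proc glm data=ic_by_year;
  class Year;
  model IC = Temp Year / solution ss3;
run; quit;
```

A small p-value for Year in the Type III table would support the reading that consumption shifted between years at the same temperature. A large one would suggest the lines differ only by chance.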
Figure 3: Overlay regression lines of IC vs. Temp by Year

[Overlay plot of IC (vertical axis, 0.25 to 0.55) against Temp
(horizontal axis, 20 to 80). Points are plotted with the symbols
1, 2, and 3 for the three years, and a fitted line is drawn for
each of Year 1951, Year 1952, and Year 1953.]

Conclusion: Figure 3 shows clearly that ice cream consumption
increases when the weather is hot and decreases on cold days.
Moreover, ice cream consumption increased from year to year
after March 18, 1951, at least over the 3 years of the study, as
indicated by Figure 3. Although our sample is relatively small,
we believe this is not a coincidence, because the ice cream
industry is prosperous today. For instance, Dreyer's Grand Ice
Cream Company introduces new ice cream flavors every year and
has had more than a billion dollars in annual revenue
(Reference 2). In this sense, later history is consistent with
our result.

Summary: The main factor that influences ice cream consumption
is the temperature. We may expect ice cream sales to be higher
in the summer and lower in the winter. Over the study period,
the demand for ice cream also increased from year to year.

Reference:
1. The Data and Story Library, Cornell University, NY
http://lib.stat.cmu.edu/DASL/Datafiles/IceCream.html
2. Dreyer’s Grand Ice Cream Holdings, Inc.
http://www.dreyersinc.com/about/index.asp

Appendix 1 (Codebook):

1. Date: Time period (1-30) of the study (from 3/18/51 to 7/11/53)
2. IC: Ice cream consumption in pints per capita
3. Price: Price of ice cream per pint in dollars
4. Income: Weekly family income in dollars
5. Temp: Mean temperature in degrees Fahrenheit (°F)
6. Year: Year within the study (1 = 1951, 2 = 1952, 3 = 1953)

Appendix 2 (Data):
Obs Date IC Price Income Temp Year

1 1 0.386 0.270 78 41 1
2 2 0.374 0.282 79 56 1
3 3 0.393 0.277 81 63 1
4 4 0.425 0.280 80 68 1
5 5 0.406 0.272 76 69 1
6 6 0.344 0.262 78 65 1
7 7 0.327 0.275 82 61 1
8 8 0.288 0.267 79 47 1
9 9 0.269 0.265 76 32 1
10 10 0.256 0.277 79 24 1
11 11 0.286 0.282 82 28 2
12 12 0.298 0.270 85 26 2
13 13 0.329 0.272 86 32 2
14 14 0.318 0.287 83 40 2
15 15 0.381 0.277 84 55 2
16 16 0.381 0.287 82 63 2
17 17 0.470 0.280 80 72 2
18 18 0.443 0.277 78 72 2
19 19 0.386 0.277 84 67 2
20 20 0.342 0.277 86 60 2
21 21 0.319 0.292 85 44 2
22 22 0.307 0.287 87 40 2
23 23 0.284 0.277 94 32 2
24 24 0.326 0.285 92 27 3
25 25 0.309 0.282 95 28 3
26 26 0.359 0.265 96 33 3
27 27 0.376 0.265 94 41 3
28 28 0.416 0.265 96 52 3
29 29 0.437 0.268 91 64 3
30 30 0.548 0.260 90 71 3

Appendix 3 (SAS code):


data ice_cream;
Input Date IC Price Income Temp Year;
Datalines;
1 .386 .270 78 41 1
2 .374 .282 79 56 1
3 .393 .277 81 63 1
4 .425 .280 80 68 1
5 .406 .272 76 69 1
6 .344 .262 78 65 1
7 .327 .275 82 61 1
8 .288 .267 79 47 1
9 .269 .265 76 32 1
10 .256 .277 79 24 1
11 .286 .282 82 28 2
12 .298 .270 85 26 2
13 .329 .272 86 32 2
14 .318 .287 83 40 2
15 .381 .277 84 55 2
16 .381 .287 82 63 2
17 .470 .280 80 72 2
18 .443 .277 78 72 2
19 .386 .277 84 67 2
20 .342 .277 86 60 2
21 .319 .292 85 44 2
22 .307 .287 87 40 2
23 .284 .277 94 32 2
24 .326 .285 92 27 3
25 .309 .282 95 28 3
26 .359 .265 96 33 3
27 .376 .265 94 41 3
28 .416 .265 96 52 3
29 .437 .268 91 64 3
30 .548 .260 90 71 3
;
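proc print data=ice_cream;
  title 'Data for Ice Cream Consumption';
run;

/* Box plots by Year. The data set is already in Year order, as   */
/* PROC BOXPLOT expects. With the default BOXSTYLE=SKELETAL the   */
/* whiskers extend to the minimum and maximum values.             */
proc boxplot data=ice_cream;
  title 'Boxplot for Income vs. Year';
  plot Income*Year;
run;

proc boxplot data=ice_cream;
  title 'Boxplot for Price vs. Year';
  plot Price*Year;
run;

proc boxplot data=ice_cream;
  title 'Boxplot for Temp vs. Year';
  plot Temp*Year;
run;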

/* Time plot of IC with the fitted trend line (Figure 1) */
proc reg data=ice_cream;
  title 'Regression Model of IC vs. Date';
  model IC = Date;
  plot IC*Date;
run; quit;

/* Backward elimination; a variable must be significant at 0.05 to stay */
proc reg data=ice_cream;
  model IC = Price Income Temp / selection=backward sls=.05 cp mse;
run; quit;

/* Final model and normal quantile plot of its residuals (Figure 2) */
proc reg data=ice_cream;
  title 'Predicting Ice Cream Consumption by Temperature and Income';
  model IC = Temp Income;
run;
  title 'Residual plot for IC by Temp and Income';
  plot residual.*nqq.;
run; quit;

/* Separate regression of IC on Temp for each Year */
proc sort data=ice_cream;
  by Year;
run;

proc reg data=ice_cream;
  by Year;
  title 'Predicting Ice Cream Consumption from Temperature by Year';
  model IC = Temp;
  output out=resids p=Fitted_IC;
run; quit;

proc print data=resids;
run;

/* Sort by Temp within Year so the joined fitted lines run left to right */
proc sort data=resids;
  by Year Temp;
run;

/* Copy the values into one set of plotting variables per year */
data resids;
  set resids;
  if Year=1 then do; IC1=IC; Temp1=Temp; Fit1=Fitted_IC; end;
  if Year=2 then do; IC2=IC; Temp2=Temp; Fit2=Fitted_IC; end;
  if Year=3 then do; IC3=IC; Temp3=Temp; Fit3=Fitted_IC; end;
run;

proc print data=resids;
run;

/* Symbols 1-3: data points labelled by year; symbols 4-6: fitted lines */
symbol1 cv=red   value='1'  i=none;
symbol2 cv=blue  value='2'  i=none;
symbol3 cv=black value='3'  i=none;
symbol4 cv=red   value=none i=join ci=red   line=1;
symbol5 cv=blue  value=none i=join ci=blue  line=2;
symbol6 cv=black value=none i=join ci=black line=3;

proc gplot data=resids;
  title 'Overlay regression lines of IC vs. Temp by Year';
  plot IC1*Temp1=1 IC2*Temp2=2 IC3*Temp3=3
       Fit1*Temp1=4 Fit2*Temp2=5 Fit3*Temp3=6 / overlay legend;
run; quit;
