Skittles Count Group Project

Hope Carter
Fawnna Hentish
Dylan Dusoe
Alyssa Hewitson
Introduction
Detailed Goal:
This paper will serve as a record of our group's hypothesis and the series of tests we used to
determine whether the hypothesis is correct or incorrect. Our group will use statistics to determine
whether bags of Skittles overall have more of one color than the rest. Statistics gives us a series
of steps for deciding whether the data support or contradict our hypothesis. Our group will be using
methods learned in class, such as organizing and analyzing collected data, drawing conclusions using
confidence intervals and hypothesis tests, and presenting our work in an organized paper at the
end. To begin, our hypothesis is that bags of Skittles have more RED candies than the other
colors.
Method:
We must first start our tests by collecting data. Since it is impossible to test every bag of
Skittles, we will use a sample to represent the entire population of Skittles. The
sample data used for this report was gathered by fourteen students of an evening Math 1040
class at Salt Lake Community College. Each of the fourteen students purchased one 2.17-ounce
bag of Original Skittles and recorded the color data from their individual bag. Each student
then e-mailed their individual data to the class's professor. The professor then compiled
all fourteen individual sets of data into one comprehensive list (Image A), which we as a group
will be using as a reference. Our group of four students was then tasked with organizing and
analyzing the sample data, drawing conclusions using confidence intervals and hypothesis tests,
and presenting our work in a well-organized paper for review.
Red Orange Yellow Green Purple Total

10 16 9 12 12 59
11 8 14 18 9 60
13 11 9 9 16 58
22 14 9 13 4 62
13 12 15 9 11 60
14 11 13 11 14 63
11 12 10 13 12 58
11 14 8 18 9 60
12 11 15 12 11 61
15 11 12 9 13 60
18 10 16 8 12 64
18 12 9 8 11 58
12 14 12 8 14 60
16 8 9 15 8 56
Image A
Organizing and Displaying Categorical Data: COLORS
Out of the fourteen individual 2.17-ounce bags of Original Skittles, there was a total of
839 candies. The total number of RED candies was 196, ORANGE was 164, YELLOW was
160, GREEN was 163, and PURPLE was 156. Below are pie charts showing the color percentages
for the entire class sample and for my individual bag. My individual bag of
Skittles contained 58 candies out of the class total of 839. All of my color percentages are smaller
than the corresponding percentages for the entire class, except for the PURPLE candies, where my
percentage (about 28%) is roughly one and a half times the class percentage (about 19%). I
originally expected that my percentages would be the same as those of
the entire class, since the candies are the same brand and the bags are the same size. However,
after compiling the data into graphs, I noticed that each individual bag has slightly
different percentages, but when you compile all the data as a whole, you start to recognize a
pattern of which colors will usually be the largest in number per bag of candy.
Sample Size of Candies: Entire Class
Color     Number
Red       196
Orange    164
Yellow    160
Green     163
Purple    156
(Pie chart, entire class: Red 23%, Orange 20%, Yellow 19%, Green 19%, Purple 19%)
My Individual Skittle Data
Color     Number
Red       13
Orange    11
Yellow    9
Green     9
Purple    16
(Pie chart, my individual bag: Red 22%, Orange 19%, Yellow 15.5%, Green 15.5%, Purple 28%)
Color     Occurrences   Percent of Total   Cumulative Percent
Red       196           23.36%             23.36%
Orange    164           19.55%             42.91%
Yellow    160           19.07%             61.98%
Green     163           19.43%             81.41%
Purple    156           18.59%             100.00%
(Chart, entire class: occurrences and cumulative percent by color)

My Individual Sample
Color     Occurrences   Percent of Total   Cumulative Percent
Red       13            22.41%             22.41%
Orange    11            18.97%             41.38%
Yellow    9             15.52%             56.90%
Green     9             15.52%             72.41%
Purple    16            27.59%             100.00%
(Chart, my individual sample: occurrences and cumulative percent by color)

Based on the charts showing the percentages for the entire class sample versus my
individual sample, I must conclude that the ratio of candy colors fluctuates from one individual
bag to the next. However, when you gather a large enough sample, like our entire class, you can
find some similarities and the percentages even out.
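
The percent and cumulative percent columns above can be reproduced directly from Image A. The following is a minimal sketch in Python, not the method we used in class; the names BAGS, COLORS, and percent_table are made up for illustration, and it assumes that my individual bag is the third row of Image A, since that row matches my counts (13, 11, 9, 9, 16).

```python
# Color counts per bag, copied from Image A, one row per bag:
# (red, orange, yellow, green, purple)
BAGS = [
    (10, 16, 9, 12, 12),
    (11, 8, 14, 18, 9),
    (13, 11, 9, 9, 16),
    (22, 14, 9, 13, 4),
    (13, 12, 15, 9, 11),
    (14, 11, 13, 11, 14),
    (11, 12, 10, 13, 12),
    (11, 14, 8, 18, 9),
    (12, 11, 15, 12, 11),
    (15, 11, 12, 9, 13),
    (18, 10, 16, 8, 12),
    (18, 12, 9, 8, 11),
    (12, 14, 12, 8, 14),
    (16, 8, 9, 15, 8),
]
COLORS = ("Red", "Orange", "Yellow", "Green", "Purple")


def percent_table(counts):
    """Return (color, count, percent of total, cumulative percent) rows."""
    total = sum(counts)
    rows = []
    running = 0
    for color, count in zip(COLORS, counts):
        running += count
        rows.append((color, count, 100 * count / total, 100 * running / total))
    return rows


# Add up each color across all fourteen bags to get the class totals.
class_counts = [sum(bag[i] for bag in BAGS) for i in range(len(COLORS))]
class_rows = percent_table(class_counts)

# Assumption: the third row of Image A is my individual bag.
my_rows = percent_table(BAGS[2])

for label, rows in (("Entire class", class_rows), ("My bag", my_rows)):
    print(label)
    for color, count, pct, cum in rows:
        print(f"  {color:<7}{count:>5}{pct:>8.2f}%{cum:>9.2f}%")
```

Running it for both samples makes the comparison concrete: the class percentages all sit close to 20%, while a single bag can stray much further from that.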
Organizing and Displaying Quantitative Data: the Number of Candies per Bag:
(Histogram and box plot: Number of Candies per Bag)

Shown above are a histogram and a box plot for the number of Skittles per bag for our
class sample. This information can also be seen in the table below, which shows the 5-number
summary as well as the mean and standard deviation.
Column   N    Mean   Standard Dev.   Min   Q1   Median   Q3   Max
Total    14   59.9   2.13            56    58   60       61   64
My observation of this data is that the distribution is approximately normal and isn't strongly
skewed in either direction. The graph did reflect what I expected to see, which was consistency:
the bags are sold by weight, so there shouldn't be a large range in the number per bag. This is
also shown by the standard deviation being only 2.13, which means that even two standard deviations
from the mean is only a four- or five-Skittle difference per bag. In my own bag of Skittles I had
63, which isn't far from the mean of 59.9. Our sample data included fourteen bags of Skittles.
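
The summary statistics in the table can be checked from the Total column of Image A. This is a small sketch using Python's standard statistics module. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data, which is the convention that reproduces the Q1 and Q3 values reported here; other quartile conventions can give slightly different numbers.

```python
import statistics

# Total candies per bag, from the Total column of Image A.
totals = [59, 60, 58, 62, 60, 63, 58, 60, 61, 60, 64, 58, 60, 56]

mean = statistics.mean(totals)
stdev = statistics.stdev(totals)  # sample standard deviation (divides by n - 1)

# Quartiles as medians of the lower and upper halves (assumed convention).
ordered = sorted(totals)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
median = statistics.median(ordered)
q3 = statistics.median(ordered[-half:])

five_number = (ordered[0], q1, median, q3, ordered[-1])
print(f"n = {len(totals)}, mean = {mean:.1f}, standard deviation = {stdev:.2f}")
print("min, Q1, median, Q3, max =", five_number)
print(f"two standard deviations = {2 * stdev:.2f} candies")
```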
Categorical data is sorted by a quality or attribute, and can't be measured on a
numerical scale. An example of this in our data is the distribution of colors of Skittles per bag,
because you can categorize a color but you can't do any math with it. Other examples of categorical
data would be yes or no questions, or pass or fail grades. There isn't a way to rank these things;
we can only organize them by category. Pie or bar charts are best fit for categorical data because
they compare the size of the categories, with each slice or bar showing a proportion of the whole.
Quantitative data is measured on a numeric scale, and you can do math with it. In our
data it would be the total number of Skittles per bag. It would also make sense to use quantitative
data when looking at heights, weights, or temperatures. This data is best represented by a
histogram or box plot, because they show how the data are distributed across each number, or class
of numbers. They also show the underlying structure of the data, and can be used to figure out if
the distribution is normal or skewed.
Confidence Intervals
A confidence interval is a range of values that, at a specified confidence level, is expected to
contain the population parameter. Confidence intervals are used in statistics to estimate a
parameter; the range of values reflects a margin of error, because we cannot be 100% certain of
the true value. How confident you want to be also affects the margin of error. For example, if you
want to be 99% confident, the range of numbers will be larger than if you only want to be 95% confident.

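For the first confidence interval, I constructed a 99% confidence interval estimate for the
true proportion of yellow candies. The sample proportion was 19.07% (160 of 839 candies) with a
margin of error of about 3.49 percentage points, so we can be 99% confident that the true
proportion of yellow candies is between about 15.6% and 22.6%.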
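
As a check on the yellow-candy interval, here is a short sketch of the usual one-proportion z interval, computed from the class totals reported earlier (160 yellow candies out of 839). The function name proportion_ci is made up for illustration, and the sketch assumes the normal approximation is appropriate for a sample this large.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Return the sample proportion, margin of error, and (low, high) interval."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Class totals from this report: 160 yellow candies out of 839.
p_hat, margin, (low, high) = proportion_ci(160, 839, 0.99)
print(f"sample proportion = {p_hat:.4f}")
print(f"margin of error   = {margin:.4f}")
print(f"99% interval      = ({low:.3f}, {high:.3f})")
```

With these inputs the sample proportion comes out near 0.19 and the margin of error near 0.035, so the interval runs from roughly 16% to 23%.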
For the second confidence interval, I constructed a 95% confidence interval for the mean
number of candies per bag. What I found is that the mean is around 60 candies per bag.
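
The interval for the mean number of candies per bag can be sketched the same way. Because there are only fourteen bags, this sketch uses a t interval with 13 degrees of freedom. The critical value 2.160 is hard-coded from a standard t-table, since Python's standard library has no t distribution; treat it as an assumption rather than a computed value.

```python
from math import sqrt
import statistics

# Total candies per bag, from the Total column of Image A.
totals = [59, 60, 58, 62, 60, 63, 58, 60, 61, 60, 64, 58, 60, 56]

# Assumption: two-tailed 95% critical value from a t-table, df = 14 - 1 = 13.
T_CRIT_95_DF13 = 2.160

n = len(totals)
mean = statistics.mean(totals)
standard_error = statistics.stdev(totals) / sqrt(n)
margin = T_CRIT_95_DF13 * standard_error
low, high = mean - margin, mean + margin

print(f"mean = {mean:.2f}, margin of error = {margin:.2f}")
print(f"95% interval for the mean = ({low:.2f}, {high:.2f})")
```

The interval is centered just below 60, which agrees with the "around 60 candies per bag" estimate.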
You would construct a confidence interval when you need to estimate a value for an entire
population but can only collect data from a sample. A confidence interval gives you an
estimate, with a margin of error, of what the true value for the population may be.
Hypothesis Tests
A hypothesis test uses data to determine whether to reject or fail to reject a
claim. Using a hypothesis test can help verify claims made about data, for example to determine
whether or not you are getting what you paid for as the customer. This is also useful for quality
control in manufacturing.
The first hypothesis test we had to do was to test the claim that 20% of all Skittles candies
are red. After calculating the critical values at a 0.05 significance level and the z-score, we
found that a 20% proportion of red Skittles is not plausible, and so we rejected the claim. Above
are the calculations for the first hypothesis test.
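
For readers who want to retrace the first test, the following sketch runs a one-proportion z-test on the class totals (196 red candies out of 839) against the claimed proportion of 0.20. It assumes a two-tailed test, since the claim is that the proportion equals 20%; the function name one_proportion_z_test is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha):
    """Two-tailed z-test of the claim that the population proportion is p0."""
    p_hat = successes / n
    # The standard error uses the claimed proportion p0, not the sample proportion.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z_crit, p_value, abs(z) > z_crit


# Class totals from this report: 196 red candies out of 839.
z, z_crit, p_value, reject = one_proportion_z_test(196, 839, 0.20, 0.05)
print(f"z = {z:.2f}, critical values = +/-{z_crit:.2f}, p-value = {p_value:.3f}")
print("reject the claim" if reject else "fail to reject the claim")
```

The test statistic falls outside the critical values, which matches our decision to reject the 20% claim.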
The second hypothesis test was to test the claim that the mean number of candies in a bag
of Skittles is 55. Using the critical values at a 0.01 significance level with 13 degrees of
freedom (n = 14 bags), we found that in order to fail to reject the claim, our t value must fall
between -3.012 and 3.012. In reality, the t value came out to be 8.61, which is far outside that
range; therefore, we reject the claim that the mean number of candies in a bag of Skittles is 55.
Above are the calculations for the second hypothesis test.
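
The second test can be sketched as a one-sample, two-tailed t-test on the fourteen bag totals. The critical value 3.012 is hard-coded from a standard t-table for a 0.01 significance level with 13 degrees of freedom; that value is an assumption of this sketch, because Python's standard library has no t distribution.

```python
from math import sqrt
import statistics

# Total candies per bag, from the Total column of Image A.
totals = [59, 60, 58, 62, 60, 63, 58, 60, 61, 60, 64, 58, 60, 56]

CLAIMED_MEAN = 55
# Assumption: two-tailed critical value from a t-table, alpha = 0.01, df = 13.
T_CRIT_99_DF13 = 3.012

n = len(totals)
mean = statistics.mean(totals)
stdev = statistics.stdev(totals)  # sample standard deviation
t_stat = (mean - CLAIMED_MEAN) / (stdev / sqrt(n))
reject = abs(t_stat) > T_CRIT_99_DF13

print(f"t = {t_stat:.2f}, critical values = +/-{T_CRIT_99_DF13}")
print("reject the claim" if reject else "fail to reject the claim")
```

Using the unrounded mean and standard deviation gives t of about 8.66; plugging in the rounded values 59.9 and 2.13 gives the 8.61 reported above. Either way, the statistic is far beyond the critical value.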
The conditions for doing interval estimates and hypothesis tests are that the sample needs
to be random, the sample size needs to be large enough (n>30), and the sample must be less than
10% of the population. Our sample was random and was less than 5% of the population, but we did
not have thirty or more bags in our sample. Since our sample size was smaller than thirty,
our conclusions about the population could be incorrect.
The sampling method could have been improved by collecting more sample data to meet the
requirements of hypothesis testing. Because our sample size was not large enough, the conclusions
drawn from this research cannot be said to be valid.
