
Andrew Nguyen

Math 1040

Skittle Term Project

Introduction: In this project, we will be using data on Skittles collected from each class member.
We will represent the data in different ways and break it down through charts and distributions.
We will also calculate different types of confidence intervals and carry out different hypothesis
tests with the class data.

My Skittle Bag Results


Color     Proportion
Green     .300
Yellow    .200
Red       .117
Orange    .267
Purple    .167

Proportion of Skittle Color Pie Chart (class data)

Color     Proportion
Green     0.220
Yellow    0.206
Red       0.198
Orange    0.189
Purple    0.187

Observation: In this pie chart, you can see that the colors are fairly evenly distributed in the
overall class data. The class data is quite different from my data. The only similarity is that
green is the most common color in both. Orange is the second most common color in my data, but in
the class data it is only the fourth most common. Yellow is the second most common color in the
pie chart but the third most common in my data. Red is the least common color in my data but the
third most common in the class data. Purple is the fourth most common color in my data but the
least common in the class data.
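The rank comparisons in this observation can be double-checked by sorting each set of proportions from largest to smallest, which is also the ordering a Pareto chart uses. The following is a minimal sketch; the function name `rank_colors` and the variable names are illustrative, and the numbers are the proportions reported above.

```python
# Proportions as reported above for my bag and for the class data.
my_bag = {"Green": 0.300, "Yellow": 0.200, "Red": 0.117,
          "Orange": 0.267, "Purple": 0.167}
class_data = {"Green": 0.220, "Yellow": 0.206, "Red": 0.198,
              "Orange": 0.189, "Purple": 0.187}


def rank_colors(proportions):
    """Return the colors sorted from most to least common (Pareto order)."""
    return sorted(proportions, key=proportions.get, reverse=True)


my_order = rank_colors(my_bag)
class_order = rank_colors(class_data)

# Print the two rankings side by side for comparison.
for rank, (mine, cls) in enumerate(zip(my_order, class_order), start=1):
    print(f"{rank}. my bag: {mine:<7} class: {cls}")
```

Green comes first in both rankings, while the remaining four colors land in different positions.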

Proportion of Skittle Color Pareto Chart (class data; y-axis: Proportion of Skittle, 0.17 to 0.23; x-axis: Skittle Color, in the order Green, Yellow, Red, Orange, Purple)

Observation: In this Pareto chart, you can see that the most common color in the Skittles bags is
green. This is true for both the class data and my own data, shown above the pie chart. The other
colors, however, are distributed differently from the data I collected. The Pareto chart shows the
same data as the pie chart, just presented in a different way, with the bars ordered from most to
least common.

Statistic            Total (candies per bag)
Mean                 59.9
Median               60.0
Mode                 59.0
Standard Deviation   2.28
Range                8.0
Minimum              55.0
Maximum              63.0
Sum                  1197.0
Count                20.0
Q1                   59
Q3                   62
IQR                  3

5-Number Summary

Minimum = 55.0
Q1 = 59.0
Q2 = 60.0
Q3 = 62.0
Maximum = 63.0
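Several of the summary values are tied together by simple arithmetic, which can be cross-checked in a few lines. This is only a sketch: the variable names are illustrative, and the inputs are the rounded values reported above rather than the raw bag counts, which are not listed in this report.

```python
# Values copied from the summary statistics (rounded as reported).
total_candies = 1197.0   # Sum
bag_count = 20           # Count
minimum, maximum = 55.0, 63.0
q1, q3 = 59.0, 62.0

mean = total_candies / bag_count   # mean = sum / count
data_range = maximum - minimum     # range = maximum - minimum
iqr = q3 - q1                      # IQR = Q3 - Q1

print(f"mean = {mean:.2f}, range = {data_range}, IQR = {iqr}")
```

The mean works out to 59.85, which rounds to the 59.9 reported, and the range and IQR agree with the reported 8.0 and 3.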
Observation: The shape of this distribution is skewed left. I think that if we had a sample
larger than 20 bags, the distribution would look more normal. Most of the data I collected lies
near the median, which is where we would expect most of the data to cluster.

Number of Candies Per Bag Histogram (x-axis: Number of Skittles, bins 51 to 65 and More; y-axis: Number of Bags, frequency 0 to 5)

Observation: This distribution does not have a clear shape. I thought the number of candies
would be more evenly distributed, because I assumed that the bags are filled by machines that
would be programmed to put roughly the same amount in every bag, but there is an outlier in this
distribution. I feel that the class data in this graph does not agree with that assumption,
mostly because it is so spread out.


Reflection: Quantitative data consist of numbers representing counts or measurements, while
categorical data consist of names or labels. I think graphs like pie charts and Pareto charts
make sense for categorical data but histograms and boxplots do not, because pie charts and Pareto
charts let you put things into categories such as color. The reverse holds for quantitative data:
histograms and boxplots work well but pie charts and Pareto charts do not, because the values are
numbers that you calculate with and cannot really sort into named categories. I think proportions
are a good way to summarize categorical data, because each category can be given a proportion,
while for quantitative data you would calculate summary statistics such as the mean or the mode
(the most frequently occurring value).

Confidence Interval: We use a confidence interval to give a range of values for estimating a
population parameter, instead of just a single value (a point estimate). The confidence interval
gives us a range in which we believe the true population value falls.

95% Confidence Interval Estimate:

Discussion: We are 95% confident that the true proportion of orange candies is between .169
and .211.
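The calculation behind an interval like this one can be sketched with the usual normal-approximation formula, p̂ ± z·sqrt(p̂·q̂/n). The inputs below are assumptions, since the worked calculation is not reproduced in this report: p̂ = 0.189 is the orange proportion from the class pie chart, and n = 1197 is the Sum from the summary statistics, taken here as the total number of candies sampled. The function name `proportion_ci` is illustrative. Because these inputs are rounded and assumed, the endpoints may differ from the reported interval in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Assumed inputs: orange proportion from the class data, and the Sum from the
# summary statistics used as the number of candies sampled.
low, high = proportion_ci(p_hat=0.189, n=1197, confidence=0.95)
print(f"95% CI for the orange proportion: ({low:.3f}, {high:.3f})")
```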


99% Confidence Interval Estimate:

Discussion: We are 99% confident that the true mean number of candies per bag is between
58.4 and 61.3.
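An interval for the mean with σ unknown uses the t distribution: x̄ ± t·s/sqrt(n). The sketch below assumes the inputs come from the summary statistics (x̄ = 1197/20, s = 2.28, n = 20) and takes the critical value 2.861 from a t table (19 degrees of freedom, 99% confidence) instead of computing it. The function name `mean_ci` is illustrative.

```python
from math import sqrt


def mean_ci(x_bar, s, n, t_crit):
    """t-based confidence interval for a population mean (sigma unknown)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Assumed inputs from the summary statistics: mean = Sum / Count, s = 2.28, n = 20.
# t_crit = 2.861 is the t-table value for 19 degrees of freedom at 99% confidence.
low, high = mean_ci(x_bar=1197 / 20, s=2.28, n=20, t_crit=2.861)
print(f"99% CI for the mean: ({low:.1f}, {high:.1f})")
```

With these inputs the endpoints round to 58.4 and 61.3, in agreement with the interval stated above.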


98% Confidence Interval Estimate:

Discussion: We are 98% confident that the standard deviation of the number of candies per bag
is between 1.7 and 3.6.
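An interval for the standard deviation is built from the chi-square distribution: sqrt((n−1)s²/χ²_R) < σ < sqrt((n−1)s²/χ²_L). The sketch below assumes s = 2.28 and n = 20 from the summary statistics and uses chi-square table values for 19 degrees of freedom at 98% confidence (36.191 and 7.633) instead of computing them. The function name `std_dev_ci` is illustrative.

```python
from math import sqrt


def std_dev_ci(s, n, chi2_right, chi2_left):
    """Chi-square confidence interval for a population standard deviation."""
    df = n - 1
    low = sqrt(df * s**2 / chi2_right)   # larger critical value gives the lower limit
    high = sqrt(df * s**2 / chi2_left)   # smaller critical value gives the upper limit
    return low, high


# Assumed inputs from the summary statistics: s = 2.28, n = 20.
# Table values for 19 degrees of freedom, 98% confidence:
# right-tail (area 0.01) = 36.191, left-tail (area 0.99) = 7.633.
low, high = std_dev_ci(s=2.28, n=20, chi2_right=36.191, chi2_left=7.633)
print(f"98% CI for the standard deviation: ({low:.1f}, {high:.1f})")
```

These inputs give endpoints that round to 1.7 and 3.6, matching the interval stated above.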


Hypothesis Testing: A hypothesis test is a procedure for testing a claim about a property of a
population. Its purpose is to help us decide, based on sample data, whether a claim about the
value of a population parameter should be rejected.

0.01 Significance Level:

Discussion: There is insufficient evidence to reject the claim that the true proportion of purple
Skittles is 20%.
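A test of this kind can be sketched with the one-proportion z statistic, z = (p̂ − p₀)/sqrt(p₀q₀/n). The inputs are assumptions, since the worked test is not reproduced in this report: p̂ = 0.187 is the purple proportion from the class pie chart, n = 1197 is the Sum from the summary statistics, and the test is treated as two-tailed because the claim is that the proportion equals 20%. The function name `proportion_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_hat, p0, n):
    """Two-tailed z test for a claim that a population proportion equals p0."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed inputs: purple proportion from the class data and total candies sampled.
z, p_value = proportion_z_test(p_hat=0.187, p0=0.20, n=1197)

alpha = 0.01
decision = "reject" if p_value < alpha else "fail to reject"
print(f"z = {z:.2f}, p-value = {p_value:.3f}: {decision} the claim")
```

Under these assumptions the p-value is well above 0.01, which is consistent with the conclusion stated above.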


0.05 Significance Level:

Discussion: There is sufficient evidence to reject the claim that the mean number of candies in a
bag of Skittles is 58.
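A test of a claim about a mean with σ unknown uses the statistic t = (x̄ − μ₀)/(s/sqrt(n)). The sketch below assumes the inputs come from the summary statistics (x̄ = 1197/20, s = 2.28, n = 20), treats the test as two-tailed, and takes the critical value 2.093 from a t table (19 degrees of freedom, α = 0.05). The function name `one_sample_t` is illustrative.

```python
from math import sqrt


def one_sample_t(x_bar, mu0, s, n):
    """Test statistic for a claim about a population mean (sigma unknown)."""
    return (x_bar - mu0) / (s / sqrt(n))


# Assumed inputs from the summary statistics: mean = Sum / Count, s = 2.28, n = 20.
t_stat = one_sample_t(x_bar=1197 / 20, mu0=58, s=2.28, n=20)

# Two-tailed t-table critical value for 19 degrees of freedom at alpha = 0.05.
t_crit = 2.093
decision = "reject" if abs(t_stat) > t_crit else "fail to reject"
print(f"t = {t_stat:.2f}, critical value = ±{t_crit}: {decision} the claim")
```

The test statistic falls beyond the critical value, which is consistent with rejecting the claim that the mean is 58.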

Reflection: There are several conditions that must be met before confidence intervals and
hypothesis tests can be calculated correctly.
Conditions for a Confidence Interval for Estimating a Population Proportion and for Testing a
Claim about a Population Proportion:
1. The sample is a simple random sample.
2. The conditions for the binomial distribution are satisfied. There is a fixed number of trials
and each trial is independent. There must be two categories of outcomes and the
probability has to remain the same for each trial.
3. There must be at least 5 successes and at least 5 failures (np ≥ 5 and nq ≥ 5).
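The third condition above is easy to check numerically. The sketch below uses a hypothetical helper, `proportion_conditions_met`, and assumes n = 1197 (the Sum from the summary statistics) with the claimed purple proportion p = 0.20. The first two conditions depend on how the sample was collected and cannot be verified by a calculation.

```python
def proportion_conditions_met(n, p):
    """Check the np >= 5 and nq >= 5 requirement for a proportion."""
    q = 1 - p
    return n * p >= 5 and n * q >= 5


# Assumed inputs: total candies sampled and the claimed purple proportion.
print(proportion_conditions_met(n=1197, p=0.20))

# A small sample with a rare outcome fails the requirement.
print(proportion_conditions_met(n=20, p=0.10))
```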

Conditions for Confidence Interval for Estimating a Population Mean with σ Not Known and for
Testing Claims about a Population Mean with σ Not Known:
1. The sample is a simple random sample.
2. The population is normally distributed or n > 30. (Either one can be satisfied or both can
be satisfied).

Conditions for Confidence Interval for Estimating a Population Standard Deviation or Variance:
1. The sample is a simple random sample.
2. The population must be normally distributed, no matter how large the sample size is.
Overall, the first condition, that the sample must be a simple random sample, was met.
However, there weren't two categories of outcomes. There were actually 5: red, orange, yellow,
green, and purple. The data can still be interpreted one color at a time so that there are
two categories. For example, we could say the color red or not the color red. I think that
the population of Skittles is normally distributed, because as more and more bags of Skittles are
made, the closer we'll get to a normal distribution. Our sample size was only 20 bags of Skittles,
which falls short of the requirement of more than 30.
There are two possible errors that could have been made in my hypothesis tests: a type 1 error
or a type 2 error. A type 1 error is when you reject a null hypothesis that is actually true. A
type 2 error is when you fail to reject a null hypothesis that is actually false. There are
several ways in which this sampling method could be improved. Our sample had only 20 bags of
Skittles, which is a fairly small amount. We could improve the method by increasing the number
of bags, so that we meet the condition of a sample size greater than 30. If we were to increase
the number of bags of Skittles, our proportions of Skittle colors would probably stay the same
or shift by a few decimal places. The class mean was 59.9 Skittles per bag, which seems correct
because we were 99% confident that the true mean number of candies per bag is between 58.4 and
61.3, and 59.9 is in the middle of that interval. We also tested the claim that the mean number
of Skittles per bag is 58, and we rejected that claim because the sample gave sufficient
evidence against it.


Project Reflection
This project has been a learning process for me. Throughout this project I had a
lot of trouble figuring out how to do some of the items. For example, I had
trouble figuring out which equation to use for the confidence intervals and
hypothesis tests because I missed that part of class. Once I started to figure out the
key terms in each question given to us in the project packet, I began to
understand the questions more and more, and I was able to figure out which equation
goes with which question.

Usually, when I can't figure out how to solve something, I give up or
just ask someone for help without even trying to solve the problem. For this
project, I decided to try figuring out on my own the problems that I didn't know how to do.
As I found problems that I couldn't do, I tried my best to solve them on my own. I
started to get frustrated and decided to give up again. A couple of hours later, I decided
to come back and try the problems again. After reading the questions over again and
comparing the questions to the equations, I slowly began to understand what went with
what.

I think that after doing this project, my problem-solving skills have improved. I don't
think that they are at their best, but I think that if I keep working on my problem-solving
skills now and in the future, they will improve. This project has helped me learn
many new things. For example, I learned how to create graphs for data, how to
determine which equation goes with which question when estimating a population parameter
or testing a hypothesis, and how to use each equation to solve for different
things.
