Interactive Lecture Notes
Solutions
Fall 2015 - Winter 2016
Dr. Brenda Gunderson
Department of Statistics
University of Michigan
Table of Contents
Introduction
1: Summarizing Data
2: Sampling, Surveys, and Gathering Data
3: Probability
4: Random Variables
5: Learning about a Population Proportion
  Part 1: Distribution for a Sample Proportion
  Part 2: Estimating Proportions with Confidence
  Part 3: Testing about a Population Proportion
6: Learning about the Difference in Population Proportions
  Part 1: Distribution for a Difference in Sample Proportions
  Part 2: Confidence Interval for a Difference in Population Proportions
  Part 3: Testing about a Difference in Population Proportions
7: Learning about a Population Mean
  Part 1: Distribution for a Sample Mean
  Part 2: Confidence Interval for a Population Mean
  Part 3: Testing about a Population Mean
8: Learning about a Population Mean Difference
  Part 1: Distribution for a Sample Mean Difference
  Part 2: Confidence Interval for a Population Mean Difference
  Part 3: Testing about a Population Mean Difference
9: Learning about the Difference in Population Means
  Part 1: Distribution for a Difference in Sample Means
  Part 2: Confidence Interval for a Difference in Population Means
  Part 3: Testing about a Difference in Population Means
10: ANOVA: Analysis of Variance
11: Relationships between Quantitative Variables: Regression
12: Relationships between Categorical Variables: Chi-Square
Stat 250 Gunderson Lecture Notes
Introduction
"Statistics... the most important science in the whole world: for upon it depends the practical application of every other science and of every art: the one science essential to all political and social administration, all education, all organization based on experience, for it only gives results of our experience."
Florence Nightingale, Statistician
Definitions:
Statistics are numbers measured for some purpose.
Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.
Course Goal: Learn various tools for using data to gain understanding and make sound decisions about the world around us.
1: Summarizing Data
"You must never tell a thing. You must illustrate it. We learn through the eye and not the noggin."
Will Rogers (1879-1935)
Simple summaries of data can tell an interesting story and are easier to digest than long lists. So we will begin by looking at some data.
Raw Data
Raw data correspond to numbers and category labels that have been collected or measured but have not yet been processed in any way. Below is a set of RAW DATA: information about a group of items, in this case individuals. The data set title is DEPRIVED, and it has information about a sample of n = 86 college students. For each student we are provided with their answer to the question "Do you feel that you are sleep deprived?" (yes or no) and their self-reported typical amount of sleep per night (in hours). The information we have is organized into variables. In this case these 86 college students are a subset of a larger population of all college students, so we have sample data.
Definitions:
A variable is a characteristic that differs from one individual to the next.
Sample data are collected from a subset of a larger population.
Population data are collected when all individuals in a population are measured.
A statistic is a summary measure of sample data.
A parameter is a summary measure of population data.
Types of Variables
We have 2 variables in our data set. Next we want to distinguish between the different types of variables: different types of variables provide different kinds of information, and the type will guide what kinds of summaries (graphs/numerical) are appropriate.
Think about it:
Could you compute the AVERAGE AMOUNT OF SLEEP for these 86 students? YES
Could you compute the AVERAGE SLEEP DEPRIVED STATUS for these 86 students? NO
(We could code the answers as numbers, but the coding would be arbitrary: 0 and 1, or any two values like 1 and 203.)
SLEEP DEPRIVED STATUS is said to be a CATEGORICAL variable, and AMOUNT OF SLEEP is a QUANTITATIVE variable.
Definitions:
A categorical variable places an individual or item into one of several groups or categories. When the categories have an ordering or ranking, it is called an ordinal variable.
A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense. Other names for a quantitative variable are measurement variable and numerical variable.
Try It!
For each variable listed below, give its type as categorical or quantitative.
Age (years): QUANTITATIVE
Typical Classroom Seat Location (Front, Middle, Back): CATEGORICAL
Number of songs on an iPod: QUANTITATIVE
Time spent studying material for this class in the last 24-hour period (in hours): QUANTITATIVE
Soft Drink Size (small, medium, large, supersized): CATEGORICAL (ordinal)
The "And then..." count recorded in a psychology study on children (details will be provided): the number of times a child says the phrase "and then" when asked to recall a story they just heard: QUANTITATIVE
Looking ahead: Later, when we talk about random variables, we will discuss whether a variable is modeled discretely (because its values are countable) or whether it would be modeled continuously (because it can take any value in an interval or collection of intervals). Go back through the list above and think about it: is each variable discrete or continuous?
DATA SET = DEPRIVED
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.
Feel Sleep   Amount Sleep        Feel Sleep   Amount Sleep
Deprived?    per Night (hours)   Deprived?    per Night (hours)
No           9                   No           8
No           7                   No           7
No           8                   No           9
Yes          7                   Yes          7
Yes          7                   Yes          7
Yes          8                   Yes          7
Yes          7                   Yes          7
Yes          8                   Yes          6
No           10                  No           8
No           8                   Yes          6
No           9                   No           9
No           8                   No           8
Yes          8                   Yes          7
Yes          4                   Yes          8
Yes          6                   No           8
No           8                   No           8
No           10                  Yes          7
No           4                   Yes          7
Yes          7                   Yes          7
Yes          8                   No           7
No           9                   Yes          7
No           9                   Yes          8
No           7                   No           7
Yes          8                   Yes          7
No           9                   Yes          7
No           9                   Yes          7
No           8                   Yes          8
No           6                   Yes          6
No           9                   Yes          6
Yes          7                   Yes          8
Yes          11                  No           9
Yes          7                   No           7
No           9                   Yes          8
Yes          7                   Yes          6
No           8                   Yes          7
Yes          7                   Yes          8
Yes          7                   Yes          5
Yes          9                   Yes          6
Yes          1                   No           7
Yes          7                   No           8
Yes          6                   Yes          8
No           8                   Yes          7
Yes          6                   Yes          6
Our data set is somewhat large, containing a lot of measurements in a long list. Presented as a table listing, we can view the record of a particular college student, but it is just a listing, and it is not easy to find the largest value for the amount of sleep or the number of students who felt they are sleep deprived. We would like to learn appropriate ways to summarize the data.
Summarizing Categorical Variables
Numerical Summaries
How would you go about summarizing the SLEEP DEPRIVED STATUS data? The first step is to simply count how many individuals/items fall into each category. Since percents are generally more meaningful than counts, the second step is to calculate the percent (or proportion) of individuals/items that fall into each category.
Count = Frequency

Sleep Deprived?   Count   Percent
Yes               51      (51/86)*100 = 59.3%
No                35      (35/86)*100 = 40.7%
Total             86      100%

The table above provides both the frequency distribution and the relative frequency distribution for the variable SLEEP DEPRIVED STATUS.
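The two steps above (count, then convert to percents) can be sketched in Python. This is a minimal illustration, not the R code used for the notes; it assumes the responses are stored as a list of "Yes"/"No" strings, rebuilds that list from the 51/35 counts instead of retyping the raw listing, and the helper name `frequency_table` is hypothetical.

```python
from collections import Counter


def frequency_table(labels):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(labels)          # step 1: count each category
    n = len(labels)
    # Step 2: percent = (count / n) * 100, rounded to one decimal as in the table.
    return {cat: (c, round(c / n * 100, 1)) for cat, c in counts.items()}


# Hypothetical list rebuilt from the summary counts (51 Yes, 35 No).
deprived = ["Yes"] * 51 + ["No"] * 35
table = frequency_table(deprived)
for category in ("Yes", "No"):
    count, pct = table[category]
    print(f"{category:>3}  {count:2d}  {pct:.1f}%")
```

Dividing by n before multiplying by 100 gives the relative frequency (proportion) column directly if the multiplication is dropped.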
Visual Summaries
There are two simple visual summaries for categorical data: a bar graph or a pie chart. Here is the table summary and bar graph made with R. If you were making one: Don't forget to label each axis and show some values on each axis!
counts:
Deprived
 No  Yes
 35   51
percentages:
Deprived
  No   Yes
40.7  59.3
[Figure: R bar graph of Deprived (No, Yes)]
Aside: Does it matter whether the No or Yes bar is given first? No. The variable is not ordinal here, so we should not comment on shape (i.e., do not use words like "skewed" or "increasing pattern" here).
Pie Chart: Another graph for categorical data, which helps us see what part of the whole each group forms.
Pie charts are not as easy to draw by hand.
It is not as easy to compare the sizes of pie pieces as it is to compare the heights of bars.
Thus we will prefer to use a bar graph for categorical data.
Exploring Features of Quantitative Data with Pictures
Recall our Sleep Deprived Data for n = 86 college students. We have data on two variables: sleep deprived status and hours of sleep per night. How would you go about summarizing the sleep hours data? These measurements do vary. How do they vary? What is the range of values? What is the pattern of variation?
Find the smallest value = 1 and the largest value = 11.
Take this overall range and break it up into intervals (of equal width). What might be reasonable here?
Perhaps by 2s; but we need to watch the endpoints.
Summary Table:

Class      Frequency    Relative Frequency   Percent
           (or count)   (or proportion)
[0, 2]     1            1/86 = 0.012         1.2%
(2, 4]     2            2/86 = 0.023         2.3%
(4, 6]     12           0.139                13.9%
(6, 8]     56           0.651                65.1%
(8, 10]    14           0.163                16.3%
(10, 12]   1            0.012                1.2%
Watch the endpoints: different software will handle the endpoints differently.
With the table, we can readily draw a histogram.
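As a sketch of how the summary table can be built, the Python below bins the 86 sleep values (copied from the DEPRIVED listing) into the classes [0, 2], (2, 4], ..., (10, 12]. It assumes every value lies between 0 and 12, and `class_frequencies` is a hypothetical helper; right-closed classes are one endpoint convention among several, which is why different software can produce different tables.

```python
import math

# Hours of sleep per night for the 86 students in DEPRIVED, in listing order.
sleep = [9, 7, 8, 7, 7, 8, 7, 8, 10, 8, 9, 8, 8, 4, 6, 8, 10, 4, 7, 8, 9, 9,
         7, 8, 9, 9, 8, 6, 9, 7, 11, 7, 9, 7, 8, 7, 7, 9, 1, 7, 6, 8, 6,
         8, 7, 9, 7, 7, 7, 7, 6, 8, 6, 9, 8, 7, 8, 8, 8, 7, 7, 7, 7, 7, 8,
         7, 7, 7, 7, 8, 6, 6, 8, 9, 7, 8, 6, 7, 8, 5, 6, 7, 8, 8, 7, 6]


def class_frequencies(data, width=2, upper=12):
    """Count values in [0, w], (w, 2w], ..., (upper - w, upper].

    Assumes every value lies between 0 and `upper`.
    """
    counts = [0] * (upper // width)
    for x in data:
        # Right-closed classes: x belongs to class k when k*w < x <= (k+1)*w.
        # A value of exactly 0 goes into the first class, closed on both ends.
        k = max(math.ceil(x / width) - 1, 0)
        counts[k] += 1
    return counts


labels = ["[0, 2]", "(2, 4]", "(4, 6]", "(6, 8]", "(8, 10]", "(10, 12]"]
counts = class_frequencies(sleep)
n = len(sleep)
for label, c in zip(labels, counts):
    # Frequency and relative frequency (proportion, four decimals) per class.
    print(f"{label:<9} {c:>3}  {c / n:.4f}")
```

Switching to left-closed classes such as [6, 8) would move every value of 8 into the next class, so the counts would change even though the data did not.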
Graph for quantitative data = Histogram:
[Figure: histogram of SLEEP HOURS drawn from the summary table]
Note: If we divide each count by 86, we would have proportions, but the picture would look the same.
Note: Each bar represents a class, and the base of the bar covers the class.
The above table and histogram show the distribution of this quantitative variable SLEEP HOURS, that is, the overall pattern of how often the possible values occur.
Remember to label the axes and add some values!
R Histograms (default on the left and customized on the right):
[Figure: default and customized R histograms of SLEEP HOURS]
How to interpret?
Look for the Overall Pattern
Three summary characteristics of the overall distribution of the data:
Shape (approximately symmetric, skewed, bell-shaped, uniform)
Location (center, average): approximately the middle value, or where the distribution would balance
Spread (variability): the range (overall, and then where most of the observations are)
Look for Deviations from the Overall Pattern
Outliers = data points that are not consistent with the bulk of the data.
Outliers should not be discarded without justification.
Describe the distribution for SLEEP HOURS:
Approximately bell-shaped, symmetric, unimodal distribution, centered around 7 hours, with most values between 4 and 10 hours.
No apparent outliers.
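What if you had some data and you made a histogram of it and it looked like this?
[Figure: two-peaked histogram with Count on the vertical axis and Response on the horizontal axis]
What would it tell you?
We would call this a bimodal distribution. There appear to be two subgroups of observations. It would be best to investigate why (e.g., maybe male/female or old/young). It may lead to analyzing the data separately for each group.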
Other comments
NO SPACE BETWEEN BARS! Unless there are no observations in that interval.
How Many Classes? Use your judgment: generally somewhere between 6 and 15 intervals.
It is better to use relative frequencies on the y-axis when comparing two or more sets of observations.
Software has defaults and many options to modify the settings.
One More Example:
A study was conducted in Detroit, Michigan to find out the number of hours children aged 8 to 12 years spent watching television on a typical day. A listing of all households in a certain housing area having children aged 8 to 12 years was first constructed. Out of the 100 households in this listing, a random sample of 20 households was selected, and all children aged 8 to 12 years in the selected households were interviewed.
The following histogram was obtained for all the children aged 8 to 12 years interviewed.
[Figure: histogram of hours spent watching television on a typical day]
How many are in the first class? 3
a. ... with a slight skewness to the left.
b. Assuming that all children interviewed are represented in the histogram, what is the total number of children interviewed?
3 + 6 + 9 + 10 + 4 = 32
c. What proportion of children spent less than 2 hours watching television?
(3 + 6)/32 = 0.281, or about 28%
d. Can you determine the maximum number of hours spent watching television by one of the interviewed children? If so, report it. If not, explain why not.
No. It is somewhere between 4 and 5 hours, but the exact value is not known for sure.
Numerical Summaries of Quantitative Variables
We have discussed some interesting features of a quantitative data set and learned how to look for them in pictures (graphs). Section 2.5 focuses on numerical summaries of the center and the spread of the distribution (appropriate for quantitative data only).
Notation for a generic raw set of data:
x1, x2, x3, ..., xn, where n = number of items in the data set, or sample size
Describing the Location or Center of a Data Set
Two basic measures of location or center:
Mean: the numerical average value. We represent the mean of a sample (called a statistic) by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$

Median: the middle value when the data are arranged from smallest to largest.
n odd: M = the middle observation; n even: M = the average of the two middle observations.
Try It! French Fries
Weight measurements for 16 small orders of French fries (in grams):
78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69
What should we do with the data first? Graph it!
Based on our histogram, the distribution of weight is unimodal and approximately symmetric, so computing numerical summaries is reasonable. The weights (in grams) range from the 60s to the lower 80s, centered around the lower 70s.
1. Compute the mean weight.
x̄ = (78 + 72 + 69 + ... + 80 + 69)/16 = 1176/16 = 73.5 grams
Does 73.5 make sense? (Yes, look at the histogram.) Would 83? (No.)
2. Compute the median weight.
Ordered: 63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83
(n + 1)/2 = (16 + 1)/2 = 8.5, so average the 8th and 9th observations => (72 + 74)/2 = 73 grams
Note: 8 observations are above 73 and 8 are below it.
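The mean and median calculations above can be checked with a short Python sketch. The hypothetical `median` helper follows the rule stated earlier (middle observation for odd n, average of the two middle observations for even n); the standard-library `statistics` module gives the same answers.

```python
import statistics

# Weights (grams) of the 16 small orders of French fries.
fries = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]


def median(values):
    """Middle value of the ordered data (average of the two middle ones if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # n odd: middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # n even: 8th and 9th here


mean_weight = sum(fries) / len(fries)    # 1176 / 16
print(mean_weight, median(fries))        # 73.5 73.0
print(statistics.mean(fries), statistics.median(fries))
```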
3. What if the smallest weight was incorrectly entered as 3 grams instead of 63 grams?
The median would stay the same. The mean would decrease.
Note: The mean is sensitive to extreme observations.
The median is resistant to extreme observations.
Most graphical displays would have detected such an outlying value.
So the median is better if there are outliers or the distribution is strongly skewed.
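A quick Python check of question 3, assuming the hypothetical data-entry error changes only the 63-gram order to 3 grams: the mean drops from 73.5 to 69.75 grams while the median stays at 73 grams.

```python
import statistics

fries = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
# Hypothetical typo: the 63-gram order recorded as 3 grams.
bad = [3 if w == 63 else w for w in fries]

print(statistics.mean(fries), statistics.median(fries))  # 73.5 73.0
print(statistics.mean(bad), statistics.median(bad))      # 69.75 73.0
```

The mean moves by 60/16 = 3.75 grams because the 60-gram error is spread over all 16 observations; the median only depends on the 8th and 9th ordered values, which did not change.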
Some Pictures: Mean versus Median
Describing Spread: Range and Interquartile Range
Midterms are returned and the average was reported as 76 out of 100. You received a score of 88. How should you feel? Happy to just be above average?
Often what is missing when the average of something is reported is a corresponding measure of spread or variability. Here we discuss various measures of variation, each useful in some situations, each with some limitations.
Range: measures the spread over 100% of the data.
Range = High value - Low value = Maximum - Minimum
Percentiles: The pth percentile is the value such that p% of the observations fall at or below that value.
Some common percentiles:
Median: 50th percentile, Q2 (the .50 quantile)
First quartile: 25th percentile, Q1 (the .25 quantile; the median of the values below the median)
Third quartile: 75th percentile, Q3 (the .75 quantile; the median of the values above the median)
Five Number Summary:

Variable Name and Units
(n = number of observations)
Median       M
Quartiles    Q1     Q3      <= IQR =>
Extremes     Min    Max     <= range =>

It provides a quick overview of the data values and information about the center and spread.
It divides the data set into approximate quarters.
Interquartile Range: measures the spread over the middle 50% of the data.
IQR = Q3 - Q1
Try it! French Fries Data
Ordered: 63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83
Find the five number summary: it provides measures of location and spread.

Weight of Fries (in grams)
(n = 16 orders)
Median       73
Quartiles    69     79
Extremes     63     83

Range: 83 - 63 = 20 grams
IQR: 79 - 69 = 10 grams
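The five number summary, range, and IQR can be sketched in Python using the quartile rule from these notes (Q1 = median of the values below the median, Q3 = median of the values above it). The helper names are hypothetical. Software may use other quartile conventions (R's default `quantile` interpolates), which can give slightly different quartiles on other data sets; for these 16 weights R reports the same 69 and 79.

```python
def median(values):
    """Median of a list (average of the two middle values when n is even)."""
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max) with quartiles as medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # values below the median
    upper = ordered[half + n % 2:]      # values above it (skip the middle one if n is odd)
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


fries = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
mn, q1, m, q3, mx = five_number_summary(fries)
print(mn, q1, m, q3, mx)              # 63 69.0 73.0 79.0 83
print("range:", mx - mn, "IQR:", q3 - q1)
```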
And confirming these values using R we have:
mean   sd       IQR   0%   25%   50%   75%   100%   n
73.5   6.0663   10    63   69    73    79    83     16
Example: Test Scores
The five number summary for the distribution of test scores for a very large math class is provided below:

Test Score (points)
(n = 1200 students)
Median       58
Quartiles    46     78
Extremes     34     95

1. What is the test score interval containing the lowest 25% of the students?
34 to 46 points
2. Suppose you scored a 46 on the test. What can you say about the percentage of students who scored higher than you?
About 75%
3. Suppose you scored a 50 on the test. What can you say about the percentage of students who scored higher than you?
Between 50% and 75%
4. If the top 25% of the students received an A on the test, what was the minimum score needed to get an A on the test?
A score of 78 or higher
Boxplots
A boxplot is a graphical representation of the five number summary.
Steps:
Label an axis with values to cover the minimum and maximum of the data.
Make a box with ends at the quartiles Q1 and Q3.
Draw a line in the box at the median M.
Check for possible outliers using the 1.5*IQR rule and, if there are any, plot them individually.
Extend lines from the ends of the box to the smallest and largest observations that are not possible outliers.
Note: Possible outliers are observations that are more than 1.5*IQR outside the quartiles, that is, observations that are below Q1 - 1.5*IQR or observations that are above Q3 + 1.5*IQR.
Try it! French Fries Data
Ordered: 63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83
The five number summary:

Weight of Fries (in grams)
(n = 16 orders)
Median       73
Quartiles    69     79
Extremes     63     83

From the boxplot shown, we see there are no points plotted separately, so there are no outliers by the 1.5(IQR) rule.
Verify there are no outliers using this rule.
IQR = 79 - 69 = 10 grams
1.5*IQR = 1.5(10) = 15 grams
Lower boundary (fence) = Q1 - 1.5*IQR = 69 - 15 = 54
Are there any observations that fall below this lower boundary?
No, so there are no low outliers.
Upper boundary (fence) = Q3 + 1.5*IQR = 79 + 15 = 94
Are there any observations that fall above this upper boundary?
No, so there are no high outliers.
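The fence check above can be sketched in Python. The helper name `outlier_fences` is hypothetical, and the quartiles 69 and 79 are taken from the five number summary rather than recomputed.

```python
fries = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]


def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences for the k*IQR rule (k = 1.5 in these notes)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = outlier_fences(69, 79)     # Q1 and Q3 from the five number summary
outliers = [w for w in fries if w < low or w > high]
print(low, high, outliers)             # 54.0 94.0 []
```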
What if the largest weight of 83 grams was actually 95 grams?
Ordered: 63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 95
Then the five number summary would be:

Weight of Fries (in grams)
(n = 16 orders)
Median       73
Quartiles    69     79
Extremes     63     95

The IQR and 1.5*IQR would be the same, so the boundaries for checking for possible outliers are again 54 and 94.
The modified boxplot when we have this one outlier is shown.
Why is the line extending out on the top side now drawn out to just 81?
81 is the largest value in the data set that is not an outlier.
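A small Python sketch of the modified boxplot logic for this altered data set: 95 is flagged as an outlier, and the upper whisker stops at 81, the largest observation inside the fences. The helper name `whisker_ends` is hypothetical.

```python
# Ordered weights with the 83-gram order replaced by 95 grams.
fries_95 = [63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 95]

q1, q3 = 69, 79                               # quartiles are unchanged by the new maximum
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # fences 54.0 and 94.0


def whisker_ends(values, low_fence, high_fence):
    """Smallest and largest observations that are not possible outliers."""
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


outliers = [v for v in fries_95 if v < low or v > high]
print(outliers, whisker_ends(fries_95, low, high))   # [95] (63, 81)
```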
Notes on Boxplots:
Side-by-side boxplots are good for... comparing 2 or more sets of observations.
Watch out: points plotted individually are... still part of the data set; don't ignore them!
Can't confirm... shape from a boxplot only (a histogram is better for showing shape).
When reading values from a graph, show what you are doing (so appropriate credit can be given on an exam/quiz).
Try It: Side-by-side boxplots
A random sample of 100 parents of grade school children was recently interviewed regarding the breakfast habits in their family. One question asked was whether their children take the time to eat breakfast (recorded as breakfast status Yes or No). The grades of the children in some core classes (e.g., reading, writing, math) were also recorded, and a standardized grade score (on a 10-point scale) was computed for each child. Side-by-side boxplots of the children's standardized grade scores are provided.
[Figure: side-by-side boxplots of Grades for breakfast status No and Yes]
a. What is (approximately) the lowest grade scored by a child who does have breakfast?
About 4.5 points
b. Complete the following sentence:
c. Consider the following statement: "The symmetry in the boxplot for the children not eating breakfast implies that the distribution for the grade scores of such students is bell-shaped." True or False? False
Features of Bell-Shaped Distributions
We have already described the distribution of our French fries weight data as being unimodal and somewhat symmetric. If we were to draw a curve to smooth out the tops of the bars of the histogram, it would resemble the shape of a bell, and thus could be called bell-shaped. One fairly common distribution of measurements with this shape has a special name: a normal distribution, or normal curve. We will see normal curves in more detail when we study random variables. When a distribution is somewhat bell-shaped (unimodal, indicating a fairly homogeneous set of measurements), a useful measure of spread is the standard deviation. In fact, the mean and the standard deviation are two summary measures that completely specify a normal curve.
Describing Spread with Standard Deviation
When the mean is used to measure center, the most common measure of spread is the standard deviation. The standard deviation is a measure of the spread of the observations from the mean. We will refer to it as a kind of average distance of the observations from the mean, but it is actually the square root of the average of the squared deviations of the observations from the mean (for a sample, the "average" divides by n - 1 rather than n). Since that is a bit cumbersome, we like to think of the standard deviation as, roughly, "the average distance the observations fall from the mean." Here is a quick look at the formula for the standard deviation when the data are a sample from a larger population:

$$s = \text{sample standard deviation} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Note: The squared standard deviation, denoted by s², is called the variance. We emphasize the standard deviation since it is in the original units.
Example: Consider this sample of n = 5 scores: 94, 97, 99, 103, 107.
The sample mean is 100 points. Let's measure their spread by considering how much each score deviates from the mean. Then consider the average of these deviations.
[Figure: dot plot of the five scores on an axis from 90 to 110, showing each score's deviation from the mean]
How the calculations are done:

x      x - 100   (x - 100)²
94     -6        36
97     -3         9
99     -1         1
103     3         9
107     7        49
500     0       104     <= sums (or totals) of the columns

Calculations:
1. x̄ = 500/5 = 100 (used in columns 2 and 3)
2. Variance: s² = 104/(5 - 1) = 26
3. Standard deviation: s = √26 ≈ 5.1
Note the number of decimal places used for s.
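The three calculation steps above can be retraced in Python as a sketch: the deviations sum to 0 (which is why they are squared), the squared deviations sum to 104, and dividing by n - 1 gives the sample variance.

```python
import math

scores = [94, 97, 99, 103, 107]
n = len(scores)
xbar = sum(scores) / n                       # 500 / 5 = 100
deviations = [x - xbar for x in scores]      # -6, -3, -1, 3, 7 (they sum to 0)
ss = sum(d ** 2 for d in deviations)         # 36 + 9 + 1 + 9 + 49 = 104
variance = ss / (n - 1)                      # sample variance divides by n - 1: 26
s = math.sqrt(variance)
print(xbar, sum(deviations), ss, variance, round(s, 1))   # 100.0 0.0 104.0 26.0 5.1
```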
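Try it! French Fries Data
Weight measurements for 16 small orders of French fries (in grams):
78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69
The mean was computed earlier to be 73.5. Find the standard deviation for these data.
s = √[((78 - 73.5)² + (72 - 73.5)² + ... + (69 - 73.5)²)/(16 - 1)] = √(552/15) = √36.8 ≈ 6.07 grams
Not much fun to do it by hand, but not too bad for a small number of observations. In general, we will have a calculator or computer do it for us.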
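Here is one way to "have a computer do it" in Python: the formula computed directly, checked against `statistics.stdev`, which also uses the n - 1 divisor. The result matches the sd of 6.0663 that R reported for these data.

```python
import math
import statistics

fries = [78, 72, 69, 81, 63, 67, 65, 75, 79, 74, 71, 83, 71, 79, 80, 69]
xbar = sum(fries) / len(fries)                  # 73.5
ss = sum((w - xbar) ** 2 for w in fries)        # sum of squared deviations: 552.0
s = math.sqrt(ss / (len(fries) - 1))            # sqrt(552 / 15) = sqrt(36.8)
print(round(s, 4), round(statistics.stdev(fries), 4))   # 6.0663 6.0663
```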
Interpretation:
The weights of small orders of French fries are roughly 6.1 grams away from their mean weight of 73.5 grams, on average.
OR
On average, the weights of small orders of French fries vary by about 6.1 grams from their mean weight of 73.5 grams.
Notes about the standard deviation:
s = 0 means... no spread (all observations are the same); otherwise s > 0.
Like the mean, s is... sensitive to extreme observations.
So use the mean and standard deviation for reasonably symmetric, bell-shaped distributions.
The five number summary is better for skewed distributions or if there are outliers.
Technical Note about the difference between population and sample:
Data sets are commonly treated as if they represent a sample from a larger population. A numerical summary based on a sample is called a statistic. The sample mean and sample standard deviation are two such statistics. However, if you have all measurements for an entire population, then a numerical summary would be referred to as a parameter.
The symbols for the mean and standard deviation for a population are different, and the formula for the standard deviation is also slightly different. A population mean is represented by the Greek letter μ (mu), and a population standard deviation is represented by the Greek letter σ (sigma). The formula for the population standard deviation is below.
You will see more about the distinction between statistics and parameters in the next chapter and beyond.
Population standard deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

where N is the size of the population.
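To see the effect of the different divisor, this Python sketch applies both formulas to the five scores from the earlier example. Treating those five scores as an entire population is purely hypothetical, for comparison only; `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by N.

```python
import statistics

scores = [94, 97, 99, 103, 107]
print(round(statistics.stdev(scores), 2))    # sample s, divides by n - 1: 5.1
print(round(statistics.pstdev(scores), 2))   # population sigma, divides by N: 4.56
```

Dividing by the larger N makes the population formula slightly smaller; the gap shrinks as the number of observations grows.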
Empirical Rule
For bell-shaped curves, approximately:
68% of the values fall within 1 standard deviation of the mean in either direction.
95% of the values fall within 2 standard deviations of the mean in either direction.
99.7% of the values fall within 3 standard deviations of the mean in either direction.
Try It! Amount of Sleep
The typical amount of sleep per night for college students has a bell-shaped distribution with a mean of 7 hours and a standard deviation of 1.7 hours.
About 68% of college students typically sleep between 5.3 and 8.7 hours per night.
Work: 7 - 1.7 = 5.3 and 7 + 1.7 = 8.7
Verify the values below that complete the sentences.
About 95% of college students typically sleep between 3.6 and 10.4 hours per night. (7 ± 2(1.7) = 7 ± 3.4)
About 99.7% of college students typically sleep between 1.9 and 12.1 hours per night. (7 ± 3(1.7) = 7 ± 5.1)
Draw a picture of the distribution showing the mean and intervals based on the empirical rule.
Suppose last night you slept 11 hours.
How many standard deviations from the mean are you? (11 - 7)/1.7 = 2.35
Suppose last night you slept only 5 hours.
How many standard deviations from the mean are you? (5 - 7)/1.7 = -1.18, that is, 1.18 standard deviations below the mean
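The empirical-rule intervals and the "how many standard deviations" questions above can be reproduced with a short Python sketch; both helper names are hypothetical. Results are rounded because values such as 1.7 are not stored exactly in floating point.

```python
def empirical_intervals(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3 (about 68%, 95%, 99.7%)."""
    return {k: (round(mean - k * sd, 2), round(mean + k * sd, 2)) for k in (1, 2, 3)}


def sds_from_mean(x, mean, sd):
    """Signed number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


print(empirical_intervals(7, 1.7))
# {1: (5.3, 8.7), 2: (3.6, 10.4), 3: (1.9, 12.1)}
print(round(sds_from_mean(11, 7, 1.7), 2), round(sds_from_mean(5, 7, 1.7), 2))
# 2.35 -1.18
```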
The standard deviation is a useful yardstick for measuring how far an individual value falls from the mean. The standardized score or z-score is the distance between the observed value and the mean, measured in terms of the number of standard deviations. Values that are above the mean have positive z-scores, and values that are below the mean have negative z-scores.
Standardized score or z-score:

$$z = \frac{\text{observed value} - \text{mean}}{\text{standard deviation}}$$
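Try It! Scores on a Final Exam
Scores on the final in a course have approximately a bell-shaped distribution. The mean score was 70 points and the standard deviation was 10 points.
Suppose Rob, one of the students, had a score that was 2 standard deviations above the mean.
What was Rob's score? 70 + 2(10) = 90 points
What can you say about the proportion of students who scored higher than Rob?
About 2.5% of the students scored higher than Rob's score of 90 points (based on the model): about 95% of scores fall within 2 standard deviations of the mean, and the remaining 5% is split equally between the two tails.
Sketching a picture may help.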
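A Python sketch of Rob's calculation. The empirical-rule answer (2.5%) comes from the 95% rule; for comparison, the exact tail area under a normal curve with mean 70 and standard deviation 10 is computed with the standard library's `statistics.NormalDist`. Using an exact normal curve is an added assumption, since the notes only say the scores are approximately bell-shaped.

```python
from statistics import NormalDist

mean, sd = 70, 10
rob = mean + 2 * sd                          # 2 SDs above the mean: 90 points

# Empirical rule: about 95% within 2 SDs, so about (100 - 95)/2 = 2.5% lie above.
empirical_share_above = (100 - 95) / 2

# Assumed exact normal model, for comparison only.
normal_share_above = 100 * (1 - NormalDist(mean, sd).cdf(rob))
print(rob, empirical_share_above, round(normal_share_above, 2))   # 90 2.5 2.28
```

The two answers differ slightly because 95% is itself a rounded version of the normal-curve area within 2 standard deviations.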
Summary Tools
Additional Notes
A place to jot down questions you may have and ask during office hours, take a few extra notes, write out an extra problem or summary completed in lecture, or create your own summary about these concepts.