Lesson Notes - Chapter 28: Comparing Multiple Means (ANOVA)

A researcher wishes to investigate the effectiveness of different treatments for removing bacteria from human hands: washing with water only, washing with regular soap, washing with antibacterial soap, and using an antibacterial spray containing 65% ethanol. Each morning, she used one, randomly chosen, cleaning technique on her hands, and then placed her hand on a sterile media plate (used for growing, and counting, bacteria). She replicated this procedure 8 times for each of the 4 treatments. Media plates were incubated for 2 days at constant temperature, and the number of bacteria on each plate was recorded.

[Figure: side-by-side boxplots of the numbers of bacteria recorded for each treatment.]

Are the treatments equally effective? There is some variation in the mean number of bacteria for each treatment method. But there is also variation from sample to sample within each group. We expect some variation between means, but whether or not this is statistically significant really depends upon the variances within each of the groups. Consider these two sets of data:

[Figure: two side-by-side boxplot panels with the same four group means but very different within-group spreads.]

The means for the 4 groups in each data set are the same (31, 36, 98, and 31), but we would conclude that in the left data set these means are all within what would normally be expected variation. In the right data set, these same differences between means are instead much larger than the expected variations.

For this reason, when comparing multiple means, the analysis is really an analysis of the variance, and this type of analysis is called Analysis of Variance (ANOVA). We need a statistic and a distribution to use to determine a p-value.
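The between-versus-within comparison just described can be sketched in code. Below is a minimal Python sketch using hypothetical data (not the data pictured in the notes): two data sets are built so their group means match exactly, but the within-group spreads differ, and the resulting F ratios show why only one of them would be judged significant.

```python
from statistics import mean, variance

def f_stat(groups):
    """F = MST / MSE for equal-sized groups (one-way ANOVA)."""
    n = len(groups[0])
    group_means = [mean(g) for g in groups]
    mst = n * variance(group_means)          # between-group mean square
    mse = mean(variance(g) for g in groups)  # within-group mean square (pooled)
    return mst / mse

# Two hypothetical data sets with identical group means (31, 36, 38, 31)
# but very different within-group spreads:
tight = [[30, 31, 32], [35, 36, 37], [37, 38, 39], [30, 31, 32]]
wide  = [[21, 31, 41], [26, 36, 46], [28, 38, 48], [21, 31, 41]]

print(f_stat(tight))  # large: the mean differences dwarf the within-group noise
print(f_stat(wide))   # small: the same mean differences are lost in the noise
```

The numerators are identical in both calls; only the denominator (within-group variability) changes, which is exactly the point of the ANOVA F-statistic.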
The statistic needs to compare the difference in the means of the groups to the variability expected (as determined by the variability within each group, somehow combined to include the variances of each of the groups).

This statistic should be larger if the difference between the group means is larger, and it should also become larger if the variability within the groups is smaller. This means the numerator of this statistic should address difference between group means, and the denominator of this statistic should address variability within the groups.

The numerator of this statistic should address difference between group means. If there were only two means to compare, we could just examine the difference. With multiple means, if we just treat each mean as a data value, then the variability could be measured by first finding the variance of this small set of mean values. For the hand-washing experiment:

    s^2 = 1245.08

But each data value is a mean of a sample, each of size n = 8, so we can determine the corresponding variance of the original observations by reversing the sampling-distribution relationship (the variance of a sample mean is sigma^2/n, so multiply by n):

    n * s^2 = 8 * (1245.08) = 9960.67

This estimate of sigma^2 addresses the difference between group means and is called the treatment mean square, written MS_T or MST.

The denominator of this statistic should address variability within the groups. For this, we need to consider the variances of each group, and somehow combine these into one overall measure of the total variability in all the groups. We started with subjects randomly assigned to treatments but all taken from the same population.
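As a check of this arithmetic, here is a short Python sketch that computes MS_T from the four hand-washing treatment means. The mean values used (37.5, 92.5, 106.0, and 117.0) are taken from the summary table near the end of these notes; their variance matches the s^2 = 1245.08 stated above.

```python
from statistics import variance

group_means = [37.5, 92.5, 106.0, 117.0]  # hand-washing treatment means
n = 8                                      # observations per group

s2 = variance(group_means)  # variance of the group means: about 1245.08
mst = n * s2                # treatment mean square: about 9960.67
print(s2, mst)
```

This mirrors the TI-84 procedure described below (1-Var Stats on the list of means, square the standard deviation, multiply by n).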
When we do hypothesis testing, our null hypothesis will assume that there is no difference between the treatments, so it is reasonable to assume the variance won't be affected by the treatments. So each group variance is estimating the same common sigma^2.

To combine, we'll simply pool the samples from the groups, and because n is the same for all groups, we can find this measure of variability within the groups by taking the average of the group variances. This estimate of sigma^2 addresses variability within the groups and is called the error mean square, written MS_E or MSE or s_p^2:

    MS_E = (s_1^2 + s_2^2 + s_3^2 + s_4^2) / 4 = 1410.10

The F-statistic

The statistic is then the ratio of these two estimates of the population variance sigma^2, and is called the F-statistic (in honor of Sir Ronald Fisher):

    F-statistic = MS_T / MS_E

Hypotheses: The effect we are testing is whether the difference in the means is significant, so the null hypothesis would be that there is no difference at all in the means:

    H0: mu_1 = mu_2 = mu_3 = mu_4
    Ha: The group means are not all equal.

If H0 is true, the difference in group means should be around the same as the difference within the groups, so the F-statistic should be around 1. If the difference in group means is large compared with the difference within the groups, the F-statistic will be higher than 1.

The F-distribution

In order to use the F-statistic to compute a p-value, we need a sampling distribution model for the F-statistic. This was investigated by Sir Ronald Fisher, and the resulting distribution is different from the Normal, t-, and Chi-square distributions we've already seen. It is called the F-distribution, and like the Chi-square distribution it is positive and one-sided. It also depends upon two different degrees of freedom, one for the numerator and one for the denominator of the F-statistic:

    Numerator:   df_num = k - 1, where k = # of groups
    Denominator: df_den = N - k, where N = kn (# of all samples combined)

Usually, these are just listed in order: k - 1 and N - k.

Back to the hand-washing example.
[Calculator output: 1-Var Stats summary (mean and variance) for each treatment group.]

- Entering the group means into L1, we can get the standard deviation with 1-Var Stats (35.2857), square it to get the variance, and multiply by n = 8 to get MS_T = 9960.64.
- Taking the average of the group variances gives us MS_E = 1410.10.

    F-statistic = MS_T / MS_E = 9960.64 / 1410.10 = 7.06

- We need the two degrees of freedom:

    df_num = k - 1 = 4 - 1 = 3
    df_den = N - k = 32 - 4 = 28

To compute the p-value, we use a calculator function:

    p-value = Fcdf(7.06, 999, 3, 28) = 0.0011

With such a low p-value, we reject H0. There is sufficient evidence that the group means are not all equal (the hand-washing methods are not equally effective).

Technology vs. tables

The F-distribution is particularly difficult to use without technology, because of the two degrees of freedom. For an F-distribution table, the rows and columns must each specify one of the degrees of freedom, so an entire page must be used for a single significance level (such as .05). Here is an excerpt from a portion of the table for the 0.05 level:

[Table excerpt: critical values of the F-distribution at the 0.05 level, indexed by numerator and denominator degrees of freedom.]

Much better to use a calculator or software which handles all the cases automatically.

Before we talk about conditions...

We need to consider two other ideas before we talk about the conditions for conducting F-statistic tests: that the ANOVA analysis can be thought of as a form of regression, and s_p as a standard deviation.

The ANOVA model

You can think of each individual data value (observation) as being a distance away from its group mean, and therefore made up of the group mean plus an error term called a residual. The i-th observation from the k-th group would be:

    y_ik = ybar_k + e_ik

Then e_ik would be the "error" or residual for this observation: e_ik = y_ik - ybar_k

The MS_E is the variance of these errors, and the MS_T comes from the variance of the group means.

We can think of this as being similar to regression analysis, where we have a fitted or predicted value for each observation, which is just the mean for the group: yhat_ik = ybar_k

So then the underlying 'true'
model assumes that the observations in each group vary around a true mean for the group, mu_k, and we could write this expression for the underlying ANOVA model:

    y_ik = mu_k + eps_ik

(mu_k in place of ybar_k, and eps in place of e, for the population)

Residual Standard Deviation

We've been using variances because they are easy to combine, but variance is not in the units of the problem; standard deviation is. If we want a measure of variability of the data, this would be the standard deviation of the residuals of the ANOVA model. This is called the residual standard deviation and represents the pooled standard deviation:

    s_p = sqrt(MS_E)

This standard deviation should be representative of the standard deviation of each of the groups (because these should be approximately equal). In the hand-washing example: s_p = sqrt(1410.14) = 37.6 bacteria colonies, which seems reasonable as a value representing the standard deviation of all of the groups.

Conditions

1) Plot the data first (show the side-by-side boxplots).

2) Independence / Randomization Condition:
   - Groups must be independent of each other.
   - Data within each treatment group must be independent too.
   - Data collected with suitable randomization.

3) Equal variance assumption
We need a pooled variance for MS_E, so variances of treatment groups must be approximately equal. Ways to check:
   - Look at side-by-side boxplots to see whether they have roughly the same spread, or look at side-by-side boxplots of the residuals (which moves all the centers together). If groups have different spreads, it makes MS_E larger, reducing the F-statistic, making it less likely we reject the null, so ANOVA usually fails on the 'safe side'. Because of this, we only worry about this condition if the spreads are quite different from each other.
   - Look for systematic changes in spread with changes in center (wider spread for higher center values, etc.). Often, this can be fixed with re-expression, especially for skewed groups.
   - Look at the residuals plotted against the predicted values.
     Look for larger predicted values leading to larger residuals (fanning).

4) Normal population assumption / Nearly Normal condition
   - Check this with a histogram or Normal probability plot of all the residuals together (we should use data from all groups; otherwise, we can be misled by small skews in each group).
   - Check for outliers within each group. Consider removing outliers within groups before proceeding.

The ANOVA table

Output for an ANOVA analysis is often presented in the form of an "ANOVA table." The particular format is structured to facilitate hand computation of the F-statistic (from the days when this was done by teams of people, by hand). Nonetheless, many modern software packages still produce ANOVA output in this form:

    Analysis of Variance Table
    Source   Sum of Squares   DF   Mean Square   F-ratio   P-value
    Soap          29882        3     9960.64      7.0636    0.0011
    Error         39484       28     1410.14
    Total         69366       31

The only pieces of information we would ever use from such a table:
   - The F-ratio (F-statistic): 7.0636
   - The Mean Square for Error (MS_E): 1410.14
   - The P-value, if provided: 0.0011

A complete example.

Sometimes we don't have access to the entire data set, but we have access to the summary statistics for each of multiple distributions and wish to compare the means.

An experiment is conducted to compare three weight-loss programs: low calorie, low fat, and low carbohydrate. 15 participants are randomly assigned to the groups, and weight is recorded (in pounds) at the beginning and end of an 8-week experiment period. The data considered are the weight-loss values for each participant, and the following data were recorded:

[Table: weight-loss values for the low-calorie, low-fat, and low-carbohydrate groups.]

We first do 1-Var Stats on each group to get summary statistics, and boxplots of L1, L2, and L3 for each treatment:

[Calculator output: summary statistics and side-by-side boxplots for the three groups.]

L1 (low calorie) does appear to have a substantially higher mean weight loss compared to the other two treatments, but is this difference of means significant?

Hypotheses:
    H0: mu_1 = mu_2 = mu_3
    Ha: The group means are not all equal.

Conditions:

1) Independence / Randomization Condition:
   - Groups must be independent of each other.
   - Data within each treatment group must be independent too.
   - Data collected with suitable randomization.

We can assume these data were collected appropriately. We do not have any reason to believe that there are any dependencies between samples in the data groups or between the groups.

2) Equal variance assumption and 3) Nearly Normal condition (checked as described earlier: compare the boxplot spreads, check a histogram or Normal probability plot of all the residuals together, and look for outliers within each group):

These boxplots look reasonably good (they are not drastically different widths and there are no outliers). The difference in widths is not an issue (recall that unequal spreads make MS_E larger and the F-statistic smaller, so ANOVA fails on the 'safe side'; we only become concerned if the spreads are quite different from each other). L3 does appear skewed right and seems less variable than L1 and L2, so there is some justification for further investigation, although many would proceed with the test at this point.

Unfortunately, investigating further is not very easy if you are using a calculator. One way to investigate is to make the sample size larger by combining the data together and looking at the residuals.
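The residual-pooling idea - subtract each group's mean from its values, then combine all the residuals into one list for a single normality check - can be sketched in Python. The three lists below are hypothetical stand-ins for L1, L2, and L3 (chosen only so their means are near the values discussed later; they are not the course data):

```python
from statistics import mean

# Hypothetical weight-loss lists standing in for L1, L2, L3:
groups = [
    [8.0, 5.5, 7.0, 6.5, 6.0],   # low calorie
    [3.5, 2.0, 4.0, 2.5, 3.0],   # low fat
    [2.0, 3.5, 2.5, 4.5, 3.0],   # low carbohydrate
]

# Subtract each group's mean from its values, then pool the residuals
# (the Python equivalent of the augment() procedure described below):
residuals = []
for g in groups:
    g_bar = mean(g)
    residuals.extend(y - g_bar for y in g)

print(len(residuals))  # 15 pooled residuals, one per participant
```

A histogram or Normal probability plot of `residuals` is then the same check the calculator procedure below produces.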
Each data point's residual is just the difference between it and its group mean. The calculator will not build the pooled histogram for you directly, but you can use this procedure to plot the histogram or NPP of the combined residuals:

- Make new lists L4, L5, and L6 for each group's residuals, subtracting each group's mean from its list:

    L4 = L1 - xbar_1,  L5 = L2 - xbar_2,  L6 = L3 - xbar_3

- Use the augment command (under 2nd-STAT, OPS) to join the residual lists up into one list, e.g.:

    augment(L4, L5) -> L4    (-> is the store command, STO-> above the ON button)
    augment(L4, L6) -> L4

You can then display a histogram or NPP stat plot of the combined list.

The histogram and Normal Probability Plot both show that the residuals (from all groups, taken as a single data set) are Nearly Normal and that the variances are equal enough to proceed with the ANOVA.

A complete example (continued). Perform the ANOVA test:

With the data in L1, L2, and L3, on a TI-84: STAT -> TESTS -> ANOVA(. On the command line, use 2nd-1, the comma key, 2nd-2, etc. to specify the lists, then Enter to execute:

    ANOVA(L1, L2, L3)

This output indicates that the F-statistic is 6.4176 with a p-value of .0127. The other entries correspond to the items in the ANOVA table (in which we are generally not interested).

With a significance level of .05, the p-value of .0127 is low, so we reject the null hypothesis. There is significant evidence that the group means are not all equal.

That means that the low-calorie diet's mean loss of 6.6 lbs (compared to means of about 3 lbs for the other diets) is statistically significant.

If you have only summary statistics. Some problems or situations may provide summary statistics and ask that you complete the analysis. For example, they may provide the numerator and denominator mean squares and ask you to find the F-statistic:

    If MS_T = 19.4667 and MS_E = 3.0333, what is the F-statistic?

    F-statistic = MS_T / MS_E = 19.4667 / 3.0333 = 6.4177

(Note: the TI-84 ANOVA output reports MS_T and MS_E as the Factor and Error mean squares.)

Other problems may give you an F-ratio value and ask you to determine the corresponding p-value.
As with the Normal and Chi-squared distributions, you would use the Fcdf command for the F-distribution:

    p-value = Fcdf(6.4177, 999, 2, 12) = .0127
              (lower bound, upper bound, numerator df, denominator df)

Unbalanced designs

The examples we've seen have each had treatment groups of the same sample size. This is called a balanced design. Even if we try for a balanced design, things happen in the real world that often make the data unbalanced: subjects drop out of the study, some of the data are found to be flawed and taken out, real-world experiments go awry.

Most of what we've developed works for unbalanced designs, except that certain computations get more complicated: n now must become n_k because it can be different for each group, and we can no longer compute the pooled variance with a simple average. But technology makes these adjustments automatically, so even using a TI-84, we are able to enter the data from the samples as lists and conduct the analysis in the same way.

Comparing Means - Now that we know the means are not all the same, what next?

If the result of an ANOVA test is that the group means are not all the same, which means are different enough to be considered significantly different?

If we fail to reject H0, then we stop, but if we reject H0, there is more we can say. One of the groups could legitimately serve as a control (e.g., water for the hand-washing experiment); then we could do a 2-sample difference of means test between each other group and that control group to see which differences in means are significant.

In the hand-washing experiment, we could also try comparing the means of all the soaps taken together to the mean of all the sprays taken together. More complicated combinations like these are called contrasts and are beyond the scope of an introductory statistics course.

But there is a way to do something like a 2-sample difference of means comparison between all the groups and a single group. For example, in the hand-washing experiment, the researcher noted that the bacteria count was very low for the alcohol treatment, but wanted to know if any of the other (more pleasant smelling) treatments were equally effective as alcohol. She really wanted to compare each treatment against the alcohol spray.
She could do separate difference of means tests on each group vs. alcohol, but each test poses a risk of a Type I error (mistakenly rejecting a true H0 - detecting a difference when, in fact, there was no significant difference between the means).

The problem with this approach is that if we have multiple groups, as we do more and more tests, the risk that we eventually make a Type I error increases, and eventually it is bigger than the alpha level of each individual test. If we have enough groups, it becomes very likely that we'll reject one of the null hypotheses by mistake, but we won't know which one was the true H0 which we mistakenly rejected.

There are several defenses against this problem, all called methods for multiple comparisons. All these methods first require that we be able to reject the overall null hypothesis with an ANOVA F-test. Once we've rejected the overall H0, then we can think about comparing several - or even all - pairs of group means with each other. Instead of doing a t-test on the difference, we could calculate a 95% confidence interval for the difference in means. (This is equivalent to a test with alpha = .05.)

The Margin of Error for such an interval would be:

    ME = t* x s_p x sqrt(1/n + 1/n) = t* x s_p x sqrt(2/n)

To reject the null hypothesis that two group means are equal, the difference between them must be larger than this ME. When used this way, we call the margin of error the least significant difference (LSD): if two group means differ by more than this amount, they are significantly different at the given alpha level.

For the hand-washing example: n = 8, s_p = sqrt(MS_E) = sqrt(1410.14) = 37.55 bacteria colonies. For 95% confidence with df = 28, t* = 2.048, so:

    LSD = 2.048 x 37.55 x sqrt(2/8) = 38.45 bacteria colonies

Any two washing methods whose means differ by more than 38.45 colonies could be said to be statistically significantly different at alpha = .05 by this method.
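The LSD arithmetic above can be reproduced in a few lines of Python. To keep this standard-library only, the critical value t* = 2.048 is taken from the notes rather than computed:

```python
from math import sqrt

n = 8              # observations per group
mse = 1410.14      # error mean square from the ANOVA table
s_p = sqrt(mse)    # pooled (residual) standard deviation, about 37.55
t_star = 2.048     # t critical value for 95% confidence, df = 28 (from the notes)

lsd = t_star * s_p * sqrt(2 / n)
print(round(lsd, 2))  # about 38.45 bacteria colonies
```

Any pair of group means differing by more than `lsd` would be declared significantly different at alpha = .05.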
Bonferroni Multiple Comparisons

The LSD is still a way to examine individual pairs. If we want to examine many pairs simultaneously, there are several methods that adjust the critical t* value so that the resulting confidence intervals provide appropriate tests for all the pairs while keeping the overall Type I error rate at or below alpha.

One such method is called the Bonferroni method. This method adjusts the LSD to allow for making many comparisons. The result is a wider margin of error called the minimum significant difference (MSD), found by replacing t* with a slightly larger number. This makes the confidence intervals wider for each contrast and the corresponding Type I error rates lower for each test, and keeps the overall Type I error rate at or below alpha. More specifically, the Bonferroni method splits the error rate equally among the J confidence intervals, finding each at confidence level 1 - alpha/J instead of the original 1 - alpha.

To signal this adjustment, we label the critical value t** instead of t*. For the hand-washing example: to make the 3 confidence intervals comparing the alcohol spray with the other 3 washing methods, and preserve our overall alpha risk at 5%, we'd construct each with a confidence level of:

    1 - (.05 / 3) = 1 - .0167 = .9833

For a confidence level of 98.33% with the denominator degrees of freedom (28), we can use the invT function:

    t** = invT(.99165, 28) = 2.5456

This is somewhat larger than the 2.048 value we would use for a single comparison, and gives the following adjusted LSD value:

    MSD = 2.5456 x 37.55 x sqrt(2/8) = 47.79 bacteria colonies

We then compute the difference between each mean and the control (alcohol in this case), and if a difference in means is above the MSD, then that difference can be considered statistically significant.

    Difference in means (compared to alcohol):
    Antibacterial soap    55.0
    Soap                  68.5
    Water                 79.5

All of these are above the 47.79 MSD, so all are significantly different than alcohol.
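The Bonferroni adjustment can be sketched the same way. The adjusted critical value t** = 2.5456 (invT(.99165, 28)) is taken from the notes, and the three differences from the alcohol-spray mean (55.0, 68.5, 79.5) are the values in the comparison table:

```python
from math import sqrt

n = 8
num_intervals = 3            # J = 3 comparisons against the alcohol spray
s_p = sqrt(1410.14)          # pooled standard deviation
t_dstar = 2.5456             # invT(.99165, 28), from the notes

msd = t_dstar * s_p * sqrt(2 / n)
print(round(msd, 2))         # about 47.8 bacteria colonies (47.79 in the notes)

# Differences from the alcohol-spray mean:
diffs = {"Antibacterial soap": 55.0, "Soap": 68.5, "Water": 79.5}
for method, d in diffs.items():
    print(method, d > msd)   # each difference exceeds the MSD
```

Note that the MSD uses the same formula as the LSD; only the critical value changes, widening each interval so the three tests together keep the overall alpha at 5%.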
It seems that alcohol spray is in a class by itself, and no other washing method is equivalent to it. If you want to compare all groups against all others, some statistical software packages will do exactly that and note which groups are equivalent (within statistical significance):

    Level                Mean    n   Group
    Alcohol spray        37.5    8     A
    Antibacterial soap   92.5    8     B
    Soap                106.0    8     B
    Water               117.0    8     B

This output shows that the alcohol is in one group and all three others are not statistically different from one another.
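Throughout these notes, the calculator's Fcdf(lower, 999, dfnum, dfden) integrates the F density from the observed F-statistic up to a large bound to get a p-value. As a final end-to-end check, here is a standard-library Python sketch that does the same integration numerically (Simpson's rule over [F, 999], mirroring the calculator's finite upper bound):

```python
from math import gamma

def f_pdf(x, d1, d2):
    """Density of the F-distribution with d1 (numerator) and d2 (denominator) df."""
    c = gamma((d1 + d2) / 2) / (gamma(d1 / 2) * gamma(d2 / 2))
    return c * (d1 / d2) ** (d1 / 2) * x ** (d1 / 2 - 1) \
             * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)

def f_cdf_tail(lower, upper, d1, d2, steps=100_000):
    """Integrate the F density from lower to upper (Simpson's rule, steps even)."""
    h = (upper - lower) / steps
    total = f_pdf(lower, d1, d2) + f_pdf(upper, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(lower + i * h, d1, d2)
    return total * h / 3

print(round(f_cdf_tail(7.0636, 999, 3, 28), 4))  # hand-washing p-value, about .0011
print(round(f_cdf_tail(6.4177, 999, 2, 12), 4))  # weight-loss p-value, about .0127
```

Both values agree with the Fcdf results quoted in the notes, which is all a table of F critical values could tell you with far less precision.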
