
Standard Deviation and Standard Error

Prof. John Dover

What are they for?


1. Imagine you have a pond with fish in it.
2. Imagine you stick your hand in the pond and pull out a fish (OK, you can use a net).
3. Consider this fish: it might be big, small or medium sized. Does it represent the sizes of the rest of the fish in the pond?
4. Unless it was the only fish there (and you put it back), it probably will not.
5. The obvious thing to do is pull some more fish out, measure them and take the average of their length, weight, or whatever!

So, you have a load of fish.

Fish sizes (cm): 21.4 12.8 5.0 5.8 15.0 17.2 19.4 16.6 12.4 13.8 22.0 10.4 14.6 12.6 17.8

And you have calculated the average (which we generally call by another name: the Mean). The Mean of these 15 fish lengths = 14.453 cm. But is the Mean a good representation of all the fish in the pond? [By the way, the mean is often shown in texts as a small x with a line over the top of it, i.e. x̄.] How variable is our data? Well, if we took the mean value away from each data item that made it up, perhaps we could then add up all the differences and work out an average difference. Maybe that would help us decide if our mean value was representative? The assumption would be that a small average difference would be good and a large one bad. Can we do that? Let's see! Oh, and another word for a difference is a deviation.

1. So, take the mean away from each data item


Fish sizes (cm)   Fish sizes - mean
21.4               6.95
12.8              -1.65
5.0               -9.45
5.8               -8.65
15.0               0.55
17.2               2.75
19.4               4.95
16.6               2.15
12.4              -2.05
13.8              -0.65
22.0               7.55
10.4              -4.05
14.6               0.15
12.6              -1.85
17.8               3.35

2. Then add up the deviations



Sum of the deviations = 0.00

Not much use!!!!!
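For readers who like to check the arithmetic, here is a minimal Python sketch of steps 1 and 2 (Python is my choice here, not something the handout uses; the name `fish_cm` is just an illustrative label for the 15 lengths above):

```python
# The 15 fish lengths (cm) from the example; the variable name is illustrative
fish_cm = [21.4, 12.8, 5.0, 5.8, 15.0, 17.2, 19.4, 16.6,
           12.4, 13.8, 22.0, 10.4, 14.6, 12.6, 17.8]

n = len(fish_cm)           # sample size, 15
mean = sum(fish_cm) / n    # about 14.453

# Step 1: take the mean away from each data item
deviations = [x - mean for x in fish_cm]

# Step 2: add up the deviations. Apart from tiny floating-point rounding
# error, the positive and negative deviations always cancel out exactly.
total = sum(deviations)

print(f"mean = {mean:.3f}")
print(f"sum of deviations = {abs(total):.2f}")  # abs() avoids printing "-0.00"
```

This is not a quirk of the fish data: deviations from the mean of any set of numbers add up to zero, which is exactly why they are "not much use" on their own.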

So, how do we get over the problem of our deviations adding up to zero? Actually, very easily!
Fish sizes (cm)   Fish sizes - mean   (Fish sizes - mean)²
21.4               6.95               48.3
12.8              -1.65               2.7
5.0               -9.45               89.4
5.8               -8.65               74.9
15.0               0.55               0.3
17.2               2.75               7.5
19.4               4.95               24.5
16.6               2.15               4.6
12.4              -2.05               4.2
13.8              -0.65               0.4
22.0               7.55               57.0
10.4              -4.05               16.4
14.6               0.15               0.0
12.6              -1.85               3.4
17.8               3.35               11.2

It is an old trick! If you square a number, the result can never be negative!!!!

So, now we have something more useful than a zero!!

Sum of the deviations = 0.00
Sum of the squared deviations = 344.8
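A similar Python sketch (same illustrative `fish_cm` list) squares each deviation before adding them up:

```python
fish_cm = [21.4, 12.8, 5.0, 5.8, 15.0, 17.2, 19.4, 16.6,
           12.4, 13.8, 22.0, 10.4, 14.6, 12.6, 17.8]

n = len(fish_cm)
mean = sum(fish_cm) / n

# Squaring each deviation means none of them can be negative,
# so they can no longer cancel each other out
squared_devs = [(x - mean) ** 2 for x in fish_cm]

# Add them up (this total is often called the "sum of squares")
sum_sq = sum(squared_devs)

print(f"sum of squared deviations = {sum_sq:.1f}")  # about 344.8
```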

Now, as I recall, we wanted to get an average deviation...


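So if we divide 344.8 by 15 (the number of data items) we get 22.99. But statisticians have found that if, instead of dividing by the sample size (15), we divide by the sample size minus 1 (14), we get a more accurate estimate of the variability of the population as a whole. This is because the sample mean is worked out from the same data, so the deviations from it are, on average, a little smaller than the deviations from the true population mean would be; dividing by n would therefore tend to underestimate the population's variability. This is the (n-1) bit you sometimes see in equations, where n = the sample size. So 344.8/14 = 24.63.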
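The n versus n-1 choice can be seen in a short Python sketch. It uses the standard library's `statistics` module, where `pvariance()` divides by n (the population formula) and `variance()` divides by n-1 (the sample formula); the list name is again illustrative:

```python
import statistics

fish_cm = [21.4, 12.8, 5.0, 5.8, 15.0, 17.2, 19.4, 16.6,
           12.4, 13.8, 22.0, 10.4, 14.6, 12.6, 17.8]

n = len(fish_cm)
mean = sum(fish_cm) / n
sum_sq = sum((x - mean) ** 2 for x in fish_cm)  # about 344.8

divide_by_n = sum_sq / n                  # about 22.99
divide_by_n_minus_1 = sum_sq / (n - 1)    # about 24.63: the sample variance

# Cross-check both versions against the standard library
assert abs(divide_by_n - statistics.pvariance(fish_cm)) < 1e-9
assert abs(divide_by_n_minus_1 - statistics.variance(fish_cm)) < 1e-9

print(f"divided by n:     {divide_by_n:.2f}")
print(f"divided by n - 1: {divide_by_n_minus_1:.2f}")
```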

BUT, we had to square the deviations to get rid of the zero problem, so this number is in squared units (cm² rather than cm) and is difficult to use directly alongside the mean. So what is the solution?

Easy! If we squared it before, we can take the square root now!!! √24.63 = 4.96

So, what have we done so far?


1. Well, without realising it, you now know how to calculate the variance: that was the value 24.63!!! The symbol for this is usually s².
2. And, by taking the square root of the variance, you have calculated the fabled Standard Deviation, i.e. in this case 4.96!!! Its symbol is usually s. Obvious really: s = √s².
So the Standard Deviation (SD) is really a kind of average deviation from the mean: it gives you a feel for the variability of your data. If the SD is very large compared with your mean, then the mean would be a poor representative of the lengths of the fish in the pond. If the SD is very small compared with the mean, then the mean would be a very good representative of the fish in the pond.
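In Python (again an illustrative choice), the variance and SD can be sketched like this; `statistics.stdev()` is used only to confirm that taking the square root of the sample variance gives the same answer. The last line simply expresses the SD as a fraction of the mean, in the spirit of the comparison just described:

```python
import math
import statistics

fish_cm = [21.4, 12.8, 5.0, 5.8, 15.0, 17.2, 19.4, 16.6,
           12.4, 13.8, 22.0, 10.4, 14.6, 12.6, 17.8]

variance = statistics.variance(fish_cm)  # s squared: divides by n - 1, about 24.63
sd = math.sqrt(variance)                 # s: back in cm rather than cm squared

# statistics.stdev() returns the sample standard deviation directly
assert math.isclose(sd, statistics.stdev(fish_cm))

# Comparing the SD with the mean gives a feel for how representative the mean is
mean = statistics.mean(fish_cm)
print(f"variance = {variance:.2f} cm^2, SD = {sd:.2f} cm")
print(f"SD is {sd / mean:.0%} of the mean")
```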

Presentation of data

Standard Deviations are usually given as Mean ± SD. In the example this would be 14.5 ± 4.96. If you recall from earlier, there were positive and negative deviations, which is why we have both a positive (+) and a negative (-) sign after the mean (±). One thing to note is that I have tidied up the data, giving the mean to one decimal place and the SD to two decimal places. This is the usual convention, whereby the SD is given to one more decimal place than the mean.
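The rounding convention just described (SD given to one more decimal place than the mean) is easy to build into a small formatting helper. `format_mean_sd` is a hypothetical Python function for illustration, not part of any package:

```python
def format_mean_sd(mean: float, sd: float, mean_dp: int = 1) -> str:
    """Hypothetical helper: 'mean ± SD', with the SD to one more decimal place."""
    sd_dp = mean_dp + 1
    return f"{mean:.{mean_dp}f} \u00b1 {sd:.{sd_dp}f}"

print(format_mean_sd(14.453, 4.963))  # 14.5 ± 4.96
```

The same helper works for Mean ± SE: `format_mean_sd(14.453, 1.281)` gives `14.5 ± 1.28`.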

What next?
There is another common statistic used to express variability: the Standard Error (SE). The standard error measures the precision of the sample mean; in other words, it tells you roughly how close the mean of your sample is likely to be to the mean of the whole population (for example, the mean of all the fish in our pond). Again, the smaller the SE relative to the mean, the better! If you have the variance, the Standard Error is easy to calculate. You simply divide the variance (24.63 in our example) by the sample size (15), then take the square root, i.e. 24.63/15 = 1.64; √1.64 = 1.28. As with the Standard Deviation, the Standard Error is usually given as Mean ± SE, so in this case it would be 14.5 ± 1.28.
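Here is the same SE calculation as a Python sketch (illustrative list name again). It also checks the equivalent form SE = SD/√n:

```python
import math
import statistics

fish_cm = [21.4, 12.8, 5.0, 5.8, 15.0, 17.2, 19.4, 16.6,
           12.4, 13.8, 22.0, 10.4, 14.6, 12.6, 17.8]

n = len(fish_cm)
variance = statistics.variance(fish_cm)  # about 24.63

# Divide the variance by the sample size, then take the square root
se = math.sqrt(variance / n)

# Equivalent: the SD divided by the square root of n
assert math.isclose(se, statistics.stdev(fish_cm) / math.sqrt(n))

print(f"SE = {se:.2f}")  # about 1.28
```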

The good news


Although I've shown you the breakdown of how SD and SE are calculated, it would be most unusual to do it by hand these days. You can get Excel, Minitab or other stats packages to calculate them for you very easily. For example, with Excel, open the Tools menu and choose Add-ins. You may have to install the Analysis ToolPak:

1. Tick its box by clicking on it.
2. Click OK.
3. The Tools menu then displays Data Analysis: click on it.

1. Highlight Descriptive Statistics and click OK.
2. A new box appears: click in the Input Range box.
3. Click on and highlight your data, including the title. [You can do more than one column of data, but in this example I'm just doing one.]

The data location appears in the Input Range box.

1. Now tick the Labels in first row box (because your selection includes the title).
2. Then click Output Range, followed by clicking in the white output range box, then on the cell of your spreadsheet where you would like the summary statistics to appear.
3. Then tick Summary statistics, followed by OK.

You then get your statistics.

A whole load of other statistics appear by default as well as the information you want (with some programs, such as Minitab, you can specify which appear). Count means the number of data items: in other words, the sample size.
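If you would rather script it than click through menus, Python's standard `statistics` module covers the statistics used in this handout. The `describe()` helper below is hypothetical and only loosely mirrors a few of the labels in Excel's Descriptive Statistics output; it is a sketch, not a replacement for the ToolPak:

```python
import math
import statistics

def describe(data):
    """Hypothetical helper returning a few Excel-style summary statistics."""
    n = len(data)
    sd = statistics.stdev(data)  # sample SD (divides by n - 1)
    return {
        "Mean": statistics.mean(data),
        "Standard Error": sd / math.sqrt(n),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Count": n,  # the sample size
    }

fish_cm = [21.4, 12.8, 5.0, 5.8, 15.0, 17.2, 19.4, 16.6,
           12.4, 13.8, 22.0, 10.4, 14.6, 12.6, 17.8]

for label, value in describe(fish_cm).items():
    print(f"{label:<20}{round(value, 3)}")
```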

Which to use? SD or SE
Most people think that SD is easier to understand, but the reality is that many scientists simply have favourite statistics they are comfortable with. Some like SD, some like SE. One author of a book (I will not name him) thinks that some people prefer SE over SD because SEs are always smaller!!! (For any sample of more than one item they are, since SE = SD/√n.)
Which do I prefer? I'm not telling...

A cautionary note
Don't make the mistake of comparing the absolute values of the SDs or SEs of means and assuming that the smaller values mean that the means are better descriptors of your data. For example, take two different means (say, numbers of two beetle species in 20 pitfall traps) with their associated statistics (in this case SEs): Beetle 1 = 10 ± 1.3 compared with Beetle 2 = 100 ± 9.8. Which mean is the better descriptor? You might think that Beetle 1, with an SE of 1.3, has the more accurate mean, because 9.8 for Beetle 2 is much bigger. You would be wrong: 1.3 is much bigger compared with a mean of 10 (13% of the mean) than 9.8 is compared with a mean of 100 (9.8% of the mean).

How could you reduce your SE values if they were very large? A simple answer, but it means more work: collect more data, because your sample size was too small.
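A quick Python sketch makes both points. The first part uses the beetle figures above and compares each SE with its own mean. The second part is purely illustrative: it assumes the fish SD stays at about 4.96 as more fish are measured, and shows SE = SD/√n falling as the sample size n grows:

```python
import math

# Part 1: judge each SE relative to its own mean, not by its absolute size
beetles = {"Beetle 1": (10, 1.3), "Beetle 2": (100, 9.8)}  # (mean, SE)
relative_se = {name: se / mean for name, (mean, se) in beetles.items()}

for name, (mean, se) in beetles.items():
    print(f"{name}: {mean} \u00b1 {se}, SE is {relative_se[name]:.1%} of the mean")

# Part 2 (assumption): if the SD stays at about 4.96 as more fish are
# measured, the SE shrinks in proportion to 1 / sqrt(n)
sd = 4.96
se_by_n = {n: sd / math.sqrt(n) for n in (15, 60, 240)}

for n, se in se_by_n.items():
    print(f"n = {n:3d}: SE = {se:.2f}")
```

Quadrupling the sample size only halves the SE, which is why "collect more data" can mean quite a lot more work.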

If you want the formulas:


n = sample size
x = a data item
x̄ = the mean
Σ = sum (add up) everything to the right of this sign (but do the things in brackets and the squaring first)
s² = variance
s = standard deviation

Variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Standard Deviation:
$$SD = s = \sqrt{s^2}$$

Standard Error:
$$SE = \frac{s}{\sqrt{n}}$$

Note: all these terms refer to your sample, not to the whole population (which you have taken the sample from). So strictly we should describe them as the sample variance, the sample standard deviation and the sample standard error.
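As a worked check, substituting the fish example into these formulas (using the rounded values from earlier) gives:

```latex
\begin{aligned}
s^2 &= \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{344.8}{15 - 1} = 24.63 \\
s   &= \sqrt{s^2} = \sqrt{24.63} = 4.96 \\
SE  &= \frac{s}{\sqrt{n}} = \frac{4.96}{\sqrt{15}} = 1.28
\end{aligned}
```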
