
Basic Statistics

Measures of Central Tendency

Variable
A variable is a quantity which can vary from one value to another. Common examples of variables are barometer readings, temperature, rainfall records, wages, heights, etc. Variables are of two kinds.
(i) Continuous variables: Quantities which can assume any numerical value within a certain range are called continuous variables. For example, if we consider the height H of a child at various ages, we observe that as the child grows from 110 cm to 160 cm (say), his height takes all possible values within this range. Thus, the height H of a child at various ages is an example of a continuous variable.
(ii) Discrete (or discontinuous) variables: Quantities which cannot assume all possible values are called discrete variables. For example, the number of children x in a family can take any of the values 0, 1, 2, 3, ..., but cannot be 2.1, 2.46 or 3.783. Thus, the number of children x is an example of a discrete variable.

Constant: A quantity is a constant if it can assume only one value.

Frequency Distribution
A frequency distribution is defined when the following two pieces of information are specified:
(i) The values which the variable takes, and
(ii) The number of times (i.e. the frequency) each value is taken by the variable.

Consider the marks obtained by 50 students in a statistics paper, arranged according to their roll numbers, the maximum marks allotted to the paper being 100:
17, 71, 70, 14, 22, 00, 55, 59, 23, 87, 93, 21, 22, 50, 87, 54, 70, 52, 87, 74, 63, 87, 28, 04, 17, 49, 85, 81, 76, 52, 21, 86, 50, 87, 26, 87, 50, 60, 32, 40, 30, 90, 27, 89, 81, 21, 14, 30, 37, 22.
The data given above is in raw form, i.e. we cannot conclude anything from it directly. Such data is called ungrouped data. When the data is arranged in ascending or descending order of magnitude, it is said to be arranged in an array. Suppose the data is arranged in the intervals 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100. This is done by the method called the 'tally method'. In this method, we prepare a frequency table as follows (a mark equal to a class limit, such as 50, is counted in the higher class):

Marks     Number of students (f)   Cumulative frequency
0-10      2                        2
10-20     4                        6
20-30     10                       16
30-40     4                        20
40-50     2                        22
50-60     8                        30
60-70     2                        32
70-80     5                        37
80-90     11                       48
90-100    2                        50

In the above example, 'marks' is the variable x and the 'number of students' against the marks is called the frequency f. Thus, the frequency of the class 50-60 is 8. The value of the variable which is mid-way between the upper and lower limits is called the mid-point (or mid-value) of the class. In the above example, the mid-values of the classes are respectively 5, 15, 25, ..., 95.

The cumulative frequency corresponding to a class is the total of all the frequencies up to and including that class. In the above example, the cumulative frequency of the class 0-10 is 2 (since there is no class above it). The cumulative frequencies are shown in the last column.
For the table, the following two forms of the frequency distribution may also be used.

Cumulative frequency 'more than'
Marks       Number of students
above 90    2
above 80    13
above 70    18
above 60    20
above 50    28
above 40    30
above 30    34
above 20    44
above 10    48
above 0     50

Cumulative frequency 'less than'
Marks       Number of students
under 10    2
under 20    6
under 30    16
under 40    20
under 50    22
under 60    30
under 70    32
under 80    37
under 90    48
under 100   50

Measures of Central Tendency
The following are the five measures of central tendency.
1. Arithmetic mean
2. Geometric mean
3. Harmonic mean
4. Median
5. Mode

Arithmetic Mean (AM)
If $X_1, X_2, X_3, \ldots, X_n$ are the given n observations, then their AM, usually denoted by $\bar{X}$, is given by
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X}{n}$$

Weighted Arithmetic Mean
In the calculation of the simple average, each item of the series is considered equally important, but there may be cases where the items do not have equal importance and some of them are comparatively more important than others. In such cases, proper weightage is to be given to the various items, the weight attached to each item being proportional to the importance of the item in the distribution. Let $W_1, W_2, W_3, \ldots, W_n$ be the weights attached to the variable values $X_1, X_2, X_3, \ldots, X_n$ respectively. Then the weighted arithmetic mean, usually denoted by $\bar{X}_W$, is given by
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$

Sometimes, to calculate the AM, we employ a well-defined short-cut method which is explained in the example below. In this method, we take any value of the variable as the 'assumed mean' and then find the actual mean using
$$\bar{X} = A + h\left(\frac{1}{N}\sum f_i u_i\right)$$
where A is the assumed mean, h is the class size, $u_i = \frac{x_i - A}{h}$ and $N = \sum f_i$.
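The tallying step and the two mean formulas above can be tried out on the 50 marks listed earlier. The following is only a minimal sketch: the variable names, the choice A = 55 and the rule that a boundary mark goes into the higher class are assumptions made for illustration.

```python
from collections import Counter
from itertools import accumulate

# The 50 marks as listed in the raw data (00 and 04 written as 0 and 4).
marks = [
    17, 71, 70, 14, 22, 0, 55, 59, 23, 87, 93, 21, 22, 50, 87, 54, 70,
    52, 87, 74, 63, 87, 28, 4, 17, 49, 85, 81, 76, 52, 21, 86, 50, 87,
    26, 87, 50, 60, 32, 40, 30, 90, 27, 89, 81, 21, 14, 30, 37, 22,
]

h = 10  # class size

# Tally: class k covers marks from k*h up to (but not including) (k+1)*h.
# Putting a boundary mark such as 50 in the higher class is an assumption.
tally = Counter(m // h for m in marks)
freqs = [tally.get(k, 0) for k in range(10)]
cum_freqs = list(accumulate(freqs))          # 'less than' cumulative frequencies
mids = [k * h + h / 2 for k in range(10)]    # mid-values 5, 15, ..., 95
N = sum(freqs)

# Weighted arithmetic mean, with the class frequencies as weights.
direct_mean = sum(f * x for f, x in zip(freqs, mids)) / N

# Short-cut method: u_i = (x_i - A) / h, mean = A + h * (sum of f_i u_i) / N.
A = 55  # assumed mean, picked arbitrarily from the mid-values
u = [(x - A) / h for x in mids]
shortcut_mean = A + h * sum(f * ui for f, ui in zip(freqs, u)) / N

for k, (f, cf) in enumerate(zip(freqs, cum_freqs)):
    print(f"{k * h}-{(k + 1) * h}: f = {f}, c.f. = {cf}")
print(f"direct mean = {direct_mean:.1f}, short-cut mean = {shortcut_mean:.1f}")
```

Both routes give the same grouped mean; the short-cut version only replaces the products f·x by the smaller products f·u.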

Ex. 1 Two variables X and Y assume the values $X_1 = 2$, $X_2 = -5$, $X_3 = 4$, $X_4 = -8$ and $Y_1 = -3$, $Y_2 = -8$, $Y_3 = 10$, $Y_4 = 6$, respectively. Calculate
(1) $\sum X$ (2) $\sum Y$ (3) $\sum XY$
(4) $\sum X^2$ (5) $\sum Y^2$
(6) $(\sum X)(\sum Y)$ (7) $\sum XY^2$
(8) $\sum (X + Y)(X - Y)$

Sol. (1) $\sum X = -7$
(2) $\sum Y = 5$
(3) $\sum XY = 26$
(4) $\sum X^2 = 109$
(5) $\sum Y^2 = 209$
(6) $(\sum X)(\sum Y) = -35$ (using (1) and (2))
(7) $\sum XY^2 = -190$
(8) $\sum (X + Y)(X - Y) = \sum (X^2 - Y^2) = 109 - 209 = -100$ (using (4) and (5))

Ex. 2 Find the mean wage from the data given below.
Wage (Amount in Rs.):   800   820   860   900   920   980   1000
Number of workers:      7     14    19    25    20    10    5

Sol. Let the assumed mean be A = 900 and h = 20.

$x_i$    $f_i$    $x_i - A$    $u_i = (x_i - A)/h$    $f_i u_i$
800      7        -100         -5                     -35
820      14       -80          -4                     -56
860      19       -40          -2                     -38
900      25       0            0                      0
920      20       20           1                      20
980      10       80           4                      40
1000     5        100          5                      25
$\sum f_i = 100$                                      $\sum f_i u_i = -44$

Here N = 100, A = 900 and h = 20.
$$\therefore \text{Mean} = \bar{X} = A + h\left(\frac{1}{N}\sum_{i=1}^{n} f_i u_i\right) = 900 + 20\left(-\frac{44}{100}\right) = 891.2$$
Hence, mean wage = Rs. 891.2.
With the help of this method, the multiplication of inconvenient numbers is avoided.

Median
The median is that value of the variable which divides the group into two equal parts. One part comprises all the values greater than the median and the other part comprises all the values less than the median.

Calculation of Median
For individual observations
Step 1: Arrange the observations $x_1, x_2, \ldots, x_n$ in ascending or descending order of magnitude.
Step 2: Determine the total number of observations, say, n.
Step 3: If n is odd, then the median is the value of the $\left(\frac{n+1}{2}\right)$th observation. If n is even, then the median is the AM of the values of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th observations.

For discrete frequency distribution
Step 1: Find the cumulative frequencies (c.f.).
Step 2: Find $\frac{N}{2}$, where $N = \sum_{i=1}^{n} f_i$.
Step 3: See the cumulative frequency (c.f.) just greater than $\frac{N}{2}$ and determine the corresponding value of the variable, which is the median.

For grouped or continuous frequency distribution
Step 1: Obtain the frequency distribution.
Step 2: Prepare the cumulative frequency column and obtain $N = \sum f_i$. Find $\frac{N}{2}$.

Step 3: See the cumulative frequency just greater than $\frac{N}{2}$ and determine the corresponding class. This class is known as the median class.
Step 4: Use the formula
$$\text{Median} = L + \left(\frac{N/2 - F}{f}\right) \times h$$
where L = lower limit of the median class, f = frequency of the median class, h = width (size) of the median class, F = cumulative frequency of the class preceding the median class, and $N = \sum f_i$.

Note: The mean deviation of any distribution is minimum when it is taken about the median.

Ex. 3 Calculate the median for the following distribution:
Class:       5-10   10-15   15-20   20-25   25-30   30-35   35-40   40-45
Frequency:   5      6       15      10      5       4       2       2

Sol.
Class      Frequency   Cumulative frequency
5-10       5           5
10-15      6           11
15-20      15          26
20-25      10          36
25-30      5           41
30-35      4           45
35-40      2           47
40-45      2           49
           N = 49

Here N = 49 ⇒ $\frac{N}{2} = \frac{49}{2} = 24.5$. The cumulative frequency just greater than $\frac{N}{2}$ is 26 and the corresponding class is 15-20. Thus 15-20 is the median class such that L = 15, f = 15, F = 11, h = 5.
$$\therefore \text{Median} = L + \frac{\frac{N}{2} - F}{f} \times h = 15 + \frac{24.5 - 11}{15} \times 5 = 15 + \frac{13.5}{15} \times 5 = 19.5$$

Mode
Mode is the value which occurs most frequently in a set of observations.
The mode may or may not exist, and even if it does exist, it may not be unique. A distribution having a unique mode is called 'unimodal' and one having more than one is called 'multimodal'.

Examples
The set 2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, 18 has mode 9.
The set 3, 5, 8, 10, 12, 15, 16 has no mode.
The set 2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9 has two modes, 4 and 7, and is called bimodal.

Ex. 4 Find the mean, median, and mode for the sets
(1) 3, 5, 2, 6, 5, 9, 5, 2, 8, 6 and
(2) 51.6, 48.7, 50.3, 49.5, 48.9

Sol. (1) Arranged in an array, the numbers are 2, 2, 3, 5, 5, 5, 6, 6, 8 and 9.
Mean = 5.1; Median = arithmetic mean of the two middle numbers = $\frac{1}{2}(5 + 5) = 5$; Mode = number most frequently occurring = 5.
(2) Arranged in an array, the numbers are 48.7, 48.9, 49.5, 50.3 and 51.6.
Mean = 49.8; Median = middle number = 49.5; Mode = non-existent.
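The answers to Ex. 4 can be checked with Python's standard `statistics` module. The helper `modes` below is a hypothetical name introduced for this sketch; it assumes the convention used in the examples above, namely that a set in which no value repeats has no mode.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:  # every value occurs once: treated here as 'no mode'
        return []
    return sorted(v for v, c in counts.items() if c == top)

set1 = [3, 5, 2, 6, 5, 9, 5, 2, 8, 6]
set2 = [51.6, 48.7, 50.3, 49.5, 48.9]

for name, data in (("(1)", set1), ("(2)", set2)):
    print(name, "mean =", round(mean(data), 2),
          "median =", median(data),
          "mode(s) =", modes(data))
```

The same helper returns two values for the bimodal set 2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9 and an empty list for 3, 5, 8, 10, 12, 15, 16.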

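For grouped data, the median-class steps and the formula used in Ex. 3 can be sketched as follows. The function name `grouped_median` and its argument layout are assumptions for illustration, and the sketch assumes classes of equal width.

```python
from itertools import accumulate

def grouped_median(lower_limits, width, freqs):
    """Median = L + ((N/2 - F) / f) * h for equal-width classes."""
    N = sum(freqs)
    half = N / 2
    cum = list(accumulate(freqs))
    # Median class: first class whose cumulative frequency is just greater than N/2.
    k = next(i for i, c in enumerate(cum) if c > half)
    F = cum[k - 1] if k > 0 else 0  # c.f. of the class preceding the median class
    return lower_limits[k] + (half - F) / freqs[k] * width

# Data of Ex. 3
lower_limits = [5, 10, 15, 20, 25, 30, 35, 40]
class_freqs = [5, 6, 15, 10, 5, 4, 2, 2]
ex3_median = grouped_median(lower_limits, 5, class_freqs)
print(round(ex3_median, 2))
```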
Calculation of Mode
In the case of a frequency distribution, the mode is the value of the variable corresponding to the maximum frequency. In the case of a continuous frequency distribution, the class corresponding to the maximum frequency is called the modal class and the value of the mode is obtained as
$$\text{Mode} = l + \frac{h(f_1 - f_0)}{(f_1 - f_0) - (f_2 - f_1)}$$
where
l = lower limit of the modal class
h = magnitude (width) of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
The above formula can be rephrased as
$$\text{Mode} = L_1 + \left(\frac{D_1}{D_1 + D_2}\right) c$$
where
$L_1$ = lower class boundary of the modal class (i.e. the class containing the mode)
$D_1$ = excess of the modal frequency over the frequency of the next lower class
$D_2$ = excess of the modal frequency over the frequency of the next higher class
c = size of the modal class interval.

Ex. 5 Compute the mode of the following distribution:
Class intervals:   0-7   7-14   14-21   21-28   28-35   35-42   42-49
Frequency:         19    25     36      72      51      43      28

Sol. The maximum frequency is 72, so the modal class is 21-28, with l = 21, h = 7, $f_1 = 72$, $f_0 = 36$ and $f_2 = 51$.
$$\text{Mode } (M_0) = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h = 21 + \frac{72 - 36}{144 - 36 - 51} \times 7 = 21 + \frac{252}{57} = 21 + 4.42 = 25.42$$

Relationship Between Mean (M), Median (Md) and Mode (Mo)
In the case of a symmetrical distribution, Mean = Median = Mode.
In the case of a 'moderately' asymmetrical distribution, Mode = 3 Median - 2 Mean.

Measures of Dispersion
The various measures of dispersion are as follows:
1. Range
2. Mean deviation
3. Standard deviation
4. Quartile deviation
5. 10-90 percentile range

Range
It is the difference between the two extreme observations of a distribution. Let $X_{max}$ be the greatest observation and $X_{min}$ the smallest observation of the variable. Then, Range = $X_{max} - X_{min}$.
Uses
1. Quality control
2. Shares
3. Weather forecast

Ex. 6 Find the range of the sets
(1) 12, 6, 7, 3, 15, 10, 18, 5
(2) 9, 3, 8, 8, 9, 8, 9, 18

Sol. After arranging the terms of (1), we get 3, 5, 6, 7, 10, 12, 15, 18. Therefore, range = 18 - 3 = 15.
After arranging the terms of (2), we get 3, 8, 8, 8, 9, 9, 9, 18. Therefore, range = 18 - 3 = 15.
Note that the range is not a very good measure of dispersion because, as in this example, the range is the same but the variation is greater in the first series than in the second. The semi-interquartile range and the 10-90 percentile range were designed to improve on the deficiencies of the range.
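The point made in the note to Ex. 6 can be illustrated numerically. In this sketch the second yardstick, the mean absolute deviation about the arithmetic mean, is an assumption chosen for illustration and is not part of the original example; the function names are hypothetical.

```python
def value_range(values):
    """Range = greatest observation minus smallest observation."""
    return max(values) - min(values)

def mean_abs_deviation(values):
    """Average absolute distance of the observations from their mean."""
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)

series1 = [12, 6, 7, 3, 15, 10, 18, 5]
series2 = [9, 3, 8, 8, 9, 8, 9, 18]

for label, s in (("(1)", series1), ("(2)", series2)):
    print(label, "range =", value_range(s),
          "mean absolute deviation =", mean_abs_deviation(s))
```

Both series have range 15, but the first is more spread out around its mean (4.25 against 2.25), which the range alone cannot show.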

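The modal-class formula applied in Ex. 5 can be sketched in the same way. The name `grouped_mode` is hypothetical; the sketch assumes equal class widths, takes the first class with the maximum frequency as the modal class, and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2) for equal-width classes."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                   # preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0      # succeeding class
    return lower_limits[k] + width * (f1 - f0) / (2 * f1 - f0 - f2)

# Data of Ex. 5
class_lower = [0, 7, 14, 21, 28, 35, 42]
class_freq = [19, 25, 36, 72, 51, 43, 28]
ex5_mode = grouped_mode(class_lower, 7, class_freq)
print(round(ex5_mode, 2))
```

The denominator vanishes when the modal class and both neighbours have equal frequencies; this sketch does not handle that case.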
Mean Deviation (MD)
If $X_1, X_2, X_3, \ldots, X_n$ are n given observations, then the mean deviation (MD) about A is given by
$$MD = \frac{1}{n}\sum |X_i - A| = \frac{1}{n}\sum |d_i|, \quad \text{where } d_i = X_i - A.$$
The modulus removes the effect of the sign of each deviation, so that only its absolute value is counted.

In the case of a frequency distribution, the mean deviation about A is given by
$$MD = \frac{1}{n}\sum f_i |X_i - A| = \frac{1}{n}\sum f_i |d_i|, \quad \text{where } n = \sum f_i.$$
When A is the arithmetic mean $\bar{X}$, this is written $\overline{|X - \bar{X}|}$.

Ex. 7 Determine the percentage of the students' heights in the following table that fall within the ranges
(1) $\bar{X} \pm MD$ (2) $\bar{X} \pm 2\,MD$ (3) $\bar{X} \pm 3\,MD$
where MD is the mean deviation of the heights of the 100 male students.

Height (inches)   Number of students
60-62             5
63-65             18
66-68             42
69-71             27
72-74             8
                  Total = 100

Sol.
Height (in)   Class mark (X)   $|X - \bar{X}| = |X - 67.45|$   Frequency (f)   $f|X - \bar{X}|$
60-62         61               6.45                            5               32.25
63-65         64               3.45                            18              62.10
66-68         67               0.45                            42              18.90
69-71         70               2.55                            27              68.85
72-74         73               5.55                            8               44.40
                                                               N = Σf = 100    $\sum f|X - \bar{X}| = 226.50$

Hence $\bar{X} = 67.45$ in. and MD = 226.50/100 = 2.265, taken as 2.26 in. below.

(1) The range from 65.19 to 69.71 inches is $\bar{X} \pm MD = 67.45 \pm 2.26$. This range includes all the individuals in the third class + $\frac{1}{3}(65.5 - 65.19)$ of the students in the second class + $\frac{1}{3}(69.71 - 68.5)$ of the students in the fourth class (since the class-interval size is 3 inches, the upper class boundary of the second class is 65.5 inches, and the lower class boundary of the fourth class is 68.5 inches). The number of students in the range $\bar{X} \pm MD$ is
$$42 + \frac{0.31}{3}(18) + \frac{1.21}{3}(27) \approx 55,$$
which is 55% of the total.
(2) The range from 62.93 to 71.97 inches is $\bar{X} \pm 2\,MD = 67.45 \pm 2(2.26) = 67.45 \pm 4.52$. The number of students in the range $\bar{X} \pm 2\,MD$ is
$$18 - \left(\frac{62.93 - 62.5}{3}\right)(18) + 42 + 27 + \left(\frac{71.97 - 71.5}{3}\right)(8) \approx 86,$$
which is 86% of the total.
(3) The range from 60.67 to 74.23 inches is $\bar{X} \pm 3\,MD = 67.45 \pm 3(2.26) = 67.45 \pm 6.78$. The number of students in the range $\bar{X} \pm 3\,MD$ is
$$5 - \left(\frac{60.67 - 59.5}{3}\right)(5) + 18 + 42 + 27 + 8 - \left(\frac{74.5 - 74.23}{3}\right)(8) \approx 97,$$
which is 97% of the total.
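The arithmetic of Ex. 7 can be reproduced with a short script. This is a sketch under two stated assumptions: heights are spread evenly within each class (the same linear interpolation the solution uses), and the counts use MD = 2.26 as in the worked solution rather than the unrounded 2.265. The names `count_within` and `md_used` are hypothetical.

```python
# Class boundaries (lower, upper) in inches and class frequencies from Ex. 7.
bounds = [(59.5, 62.5), (62.5, 65.5), (65.5, 68.5), (68.5, 71.5), (71.5, 74.5)]
freqs = [5, 18, 42, 27, 8]
class_marks = [(lo + hi) / 2 for lo, hi in bounds]  # 61, 64, 67, 70, 73

N = sum(freqs)
mean_height = sum(f * x for f, x in zip(freqs, class_marks)) / N
md = sum(f * abs(x - mean_height) for f, x in zip(freqs, class_marks)) / N

def count_within(lo, hi):
    """Estimated number of students with height in [lo, hi], assuming an
    even spread of heights inside each class."""
    total = 0.0
    for (a, b), f in zip(bounds, freqs):
        overlap = max(0.0, min(b, hi) - max(a, lo))
        total += f * overlap / (b - a)
    return total

md_used = 2.26  # value used in the worked solution (assumption of this sketch)
counts = [count_within(mean_height - k * md_used, mean_height + k * md_used)
          for k in (1, 2, 3)]
for k, c in zip((1, 2, 3), counts):
    print(f"mean ± {k} MD: about {c:.2f} students, i.e. {round(c)}%")
```

Because N = 100, each estimated count is also the percentage of the total.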

Standard Deviation
If $X_1, X_2, \ldots, X_N$ is a set of N observations, then its standard deviation is given by
$$\sigma = \sqrt{\frac{1}{N}\sum (X_i - \bar{X})^2}$$
where $\bar{X}$ is the AM. An alternative formula for the standard deviation is
$$\sigma = \sqrt{\frac{\sum f_i X_i^2}{N} - \left(\frac{\sum f_i X_i}{N}\right)^2} = \sqrt{\overline{X^2} - \bar{X}^2}$$
The root mean square deviation about any point a is defined as
$$S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - a)^2}{N}}$$
where a is any value, not necessarily the arithmetic mean. Of all such deviations, the minimum is that for which $a = \bar{X}$. This is so because the sum of the squares of the deviations of a set of numbers $X_i$ from any number a is minimum if and only if $a = \bar{X}$. This property provides an important reason for defining the standard deviation as above.
Note that the root mean square deviation about the origin is also called the quadratic mean and is given by
$$S = \sqrt{\frac{1}{N}\sum x_i^2}$$
Also note that σ is independent of origin but not of scale. This means that if each of the observations is increased by a constant k, σ does not change. But if each observation is multiplied by k, σ changes to $|k|\sigma$. Therefore, if each observation is transformed to $\frac{a x_i + b}{c}$, σ transforms to $\left|\frac{a}{c}\right|\sigma$.

Variance
It is the square of the standard deviation. Therefore,
$$\text{variance} = \sigma^2 = \frac{1}{N}\sum (X_i - \bar{X})^2.$$
Generally, $s^2$ represents the sample variance and $\sigma^2$ represents the population variance. Sample variance means the variance of a sample drawn from a population.
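The two standard-deviation formulas and the change-of-origin and change-of-scale property can be checked numerically. In this sketch the data set is series (1) of Ex. 6, reused only for illustration, and the constants a, b, c and the function names are hypothetical.

```python
from math import sqrt, isclose

def pop_sd(values):
    """Standard deviation: sqrt of the mean squared deviation from the AM."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / n)

def pop_sd_alt(values):
    """Alternative formula: sqrt(mean of squares - square of mean)."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    m = sum(values) / n
    return sqrt(mean_of_squares - m * m)

data = [12, 6, 7, 3, 15, 10, 18, 5]   # series (1) of Ex. 6, for illustration
a, b, c = 3, 4, -2                    # hypothetical constants
shifted = [x + 7 for x in data]               # change of origin only
transformed = [(a * x + b) / c for x in data]  # change of origin and scale

sigma = pop_sd(data)
print("sigma =", round(sigma, 4), " variance =", round(sigma ** 2, 4))
print("same by both formulas:", isclose(sigma, pop_sd_alt(data)))
print("unchanged by shift:", isclose(pop_sd(shifted), sigma))
print("scaled by |a/c|:", isclose(pop_sd(transformed), abs(a / c) * sigma))
```

The alternative formula can lose precision in floating point when the mean is large compared with the spread, so the first form is usually the safer one to compute with.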
