
Graphical Representation of Statistical Data


GRAPHICAL REPRESENTATION: Tabulation is a good method of condensing data and presenting it in a readily understandable form, but many people have no taste for figures and would prefer a form of presentation that avoids them. This purpose is achieved by presenting statistical data in visual form. The visual display of statistical data in the form of points, lines, areas and other geometrical forms and symbols is, in the most general terms, known as graphical representation. With this method, statistical data can be studied without working through the figures presented in tables. The main forms of visual representation are described in the sections that follow. The basic difference between a graph and a diagram is that a graph represents data by a continuous curve, usually drawn on graph paper, while a diagram is any other one-, two- or three-dimensional form of visual representation.

1) Simple Bar Diagrams (Simple Bar Charts): Simple bar diagrams are used to represent geographical, historical, numerical and qualitative data. Vertical or horizontal bars are drawn to represent the data when the differences between the quantities are not very large. The quantities may be arranged in ascending or descending order, but time series data are kept in their natural time order. (A time series consists of numerical data collected, observed or recorded at more or less regular intervals of time, such as each hour, day, month, quarter or year.)

Suggestions for Constructing Bar Charts:
1. Construct the bars horizontally when the categorized observations are the outcomes of a categorical variable.
2. Construct the bars vertically when the categorized observations are the outcomes of a numerical variable.
3. Give all bars the same width so as not to mislead the reader; only the length should differ.
4. Leave spaces between bars ranging from one-half the width of a bar to the full width of a bar.
5. Include scales and guidelines, which are useful aids in reading a chart.
6. Indicate the zero point (origin) and label the axes of the chart.
7. Place any keys for interpreting the chart within the body of the chart or below it. Footnotes or source notes, when appropriate, go after the title of the chart or at the bottom edge of the chart's frame.
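The suggestions above can be illustrated with a small text-based sketch that uses only the Python standard library instead of a plotting package. The function name `bar_chart` and the `#` characters are illustrative assumptions. Every bar has the same one-character thickness, lengths are measured from a zero origin, and categories are sorted from largest to smallest unless `sort_desc=False` is passed, as it should be for a time series.

```python
def bar_chart(data, width=40, sort_desc=True):
    """Return the lines of a horizontal text bar chart for {label: value}.

    Assumes all values are non-negative and at least one is positive.
    """
    items = list(data.items())
    if sort_desc:
        # Arrange categories in descending order; keep time series unsorted.
        items.sort(key=lambda kv: kv[1], reverse=True)
    largest = max(value for _, value in items)
    label_width = max(len(label) for label, _ in items)
    lines = []
    for label, value in items:
        # Bar length is proportional to the value, measured from zero.
        length = round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {'#' * length} {value}")
    return lines


# Squash World Open champions, 1976-2001 (data from the example below).
champions = {"Pakistan": 14, "Australia": 6, "United Kingdom": 2,
             "Canada": 2, "New Zealand": 1}

for line in bar_chart(champions, width=28):
    print(line)
```

A plotting library would draw proper rectangles, but the same two decisions apply: scale every bar from zero against the largest value, and decide whether the categories should be ranked or left in their natural order.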

a) Categorical Data:
1) Example: Squash World Open champions from 1976 to 2001.

Country          No. of times
Pakistan         14
Australia        6
United Kingdom   2
Canada           2
New Zealand      1

[Chart: Squash World Open Champions, horizontal bars of No. of times by country]

2) Example: The world's largest deserts by area in km². Source: Microsoft Encarta

Desert           Area (km²)
Sahara           9,100,000
Gobi             1,300,000
Patagonian       670,000
Rub al Khali     650,000
Great Sandy      390,500
Great Victoria   390,500

[Chart: Deserts, horizontal bars of area (km²) by desert]

b) Numerical Data:
1) Example: The following data show the consumption of raw material by industry in Pakistan during 1999-2004. Source: APTMA

Period    Consumption ('000 kg)
1999-00   1,566,348
2000-01   1,673,280
2001-02   1,755,669
2002-03   1,943,197
2003-04   1,938,678

[Chart: Consumption of Raw Material, vertical bars of consumption ('000 kg) by period]

2) Example: The following data show Pakistan's share of world trade in cotton yarn, in percent of world trade. Source: World Textile Demand

Year   Share (%)
1999   26.1
2000   27.3
2001   26.9
2002   27
2003   23.8

[Chart: Share, vertical bars of share (%) by year]

1. The following table gives the birth rate per thousand of different countries over a certain period of time.

Country   Birth rate (per thousand)
India     33
Germany   15
UK        20
China     40
Denmark   30
Sweden    15

DIAGRAM: [Chart: vertical bars of birth rate by country]

2. The following table shows the prices of various brands of cars with a 2-litre engine.

Car          Value in rupees
Toyota       1,554,000
Nissan       1,734,000
Honda        2,500,000
Mercedes     5,800,000
BMW          3,300,000
Mitsubishi   2,015,000

DIAGRAM: [Chart: vertical bars of VALUE IN RUPEES by CARS]

Data Array: The arrangement of raw data by observations in either ascending or descending order.

2) Multiple Bar Diagrams: A multiple bar chart shows two or more characteristics corresponding to the values of a common variable in the form of grouped bars, whose lengths are proportional to the values of the characteristics and each of which is shaded or coloured differently to aid identification. It is a good device for comparing two or more kinds of information. For example, the imports, exports and production of a country can be compared from year to year by grouping the three bars together.
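A grouped bar chart can be sketched in text form as well. In this minimal sketch, which assumes a hypothetical helper `grouped_bars` and a chosen number of units per symbol, each category of the common variable gets one group of bars, and each characteristic gets its own symbol in place of shading or colour.

```python
def grouped_bars(categories, series, unit, symbols="#*+="):
    """Return the lines of a text grouped (multiple) bar chart.

    categories: labels of the common variable, e.g. periods
    series: {name: [one value per category]}
    unit: the value represented by one symbol
    Each series gets its own symbol, standing in for different shading;
    zip() silently ignores series beyond the number of symbols supplied.
    """
    name_width = max(len(name) for name in series)
    lines = []
    for i, category in enumerate(categories):
        lines.append(category)  # one group of bars per category
        for (name, values), symbol in zip(series.items(), symbols):
            value = values[i]
            bar = symbol * round(value / unit)
            lines.append(f"  {name:<{name_width}} {bar} {value:,}")
    return lines


# Yarn production ('000 kg) by Punjab and Sindh (data from the example below).
periods = ["2000-01", "2001-02", "2002-03", "2003-04"]
yarn = {"Punjab": [1_239_358, 1_283_964, 1_353_027, 1_411_817],
        "Sindh": [365_029, 391_492, 417_513, 428_411]}

for line in grouped_bars(periods, yarn, unit=100_000):
    print(line)
```

Because both series share one scale (`unit`), the bars within and across groups can be compared directly, which is the point of a multiple bar chart.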

1) Example: The following data show the production of yarn by Punjab and Sindh from 2000 to 2004. Source: APTMA

Period    Punjab ('000 kg)   Sindh ('000 kg)
2000-01   1,239,358          365,029
2001-02   1,283,964          391,492
2002-03   1,353,027          417,513
2003-04   1,411,817          428,411

[Chart: grouped vertical bars of production ('000 kg) by period, Punjab and Sindh]

2) Example: The following data show the marks obtained by the top position holders (boys and girls) in the Federal Board HSSC-II examination, 2005. Source: www.fbise.edu.pk

Discipline        Boys   Girls
Pre-Engineering   978    1003
Pre-Medical       974    979
Humanities        896    909
Commerce          844    854

[Chart: Top position in Federal Board, grouped bars of marks obtained by discipline, Boys and Girls]

1. The table below gives data relating to the exports and imports of a certain country X (in thousands of dollars) during the five years ending in 2004-05.

Year      Export   Import
2000-01   315      249
2001-02   423      234
2002-03   123      300
2003-04   278      250
2004-05   350      190

DIAGRAM: [Chart: grouped bars of $ IN THOUSANDS by YEARS, EXPORT and IMPORT]

2. The table given below shows the share prices (in rupees) of P.T.C.L.A, OGDCL and PPL on the KSE on the trading days of a week.

Day     P.T.C.L.A   OGDCL    PPL
MON     62.58       115.45   208.30
TUES    63.15       118.90   204.15
WED     65.69       119.95   202.36
THURS   66.70       123.30   205.70
FRI     68.85       125.50   210.95

DIAGRAM: [Chart: grouped bars of VALUE IN RUPEES by TRADING DAYS, P.T.C.L.A, OGDCL and PPL]

3) Component Bar Chart: A component bar chart is an effective technique in which each bar is divided into two or more sections, proportional in size to the component parts of the total displayed by that bar. The component parts shown as sections of the bar are shaded or coloured differently to increase the overall effectiveness of the diagram. Component bar charts are used to show how the various components accumulate to the total, either in actual values or as percentages of the total. They are also known as sub-divided bars.
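The only arithmetic behind a component bar is a running total: each section starts where the previous one ended. The sketch below, with a hypothetical helper `component_segments`, computes those section boundaries either as percentages of the bar total (a 100% component bar) or in the original units, using Python's standard `itertools.accumulate`.

```python
from itertools import accumulate


def component_segments(components, percent=True):
    """Return (name, start, end) for each section of one component bar.

    With percent=True the bar is scaled to 100 (a percentage component bar);
    otherwise the sections are stacked in the original units.
    """
    values = list(components.values())
    total = sum(values)
    if percent:
        values = [100 * v / total for v in values]
    ends = list(accumulate(values))   # running totals give section tops
    starts = [0] + ends[:-1]          # each section starts where the last ended
    return list(zip(components, starts, ends))


# Rates per 1000 for Africa, from the births/deaths/migrants example below.
africa = {"Births": 35.9, "Deaths": 14.2, "Migrants": 1.7}
for name, start, end in component_segments(africa):
    print(f"{name:<9} {start:6.1f}% to {end:6.1f}%")

# Salary package of a teacher (in thousands), stacked in absolute units.
teacher = {"Basic pay": 25, "Medical": 5, "Transport": 4, "House rent": 8}
print(component_segments(teacher, percent=False))
```

The percentage form suits comparisons of composition between bars of different totals (such as the regional rates), while the absolute form keeps bar heights equal to the totals (such as the salary packages).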

1) Example: The following data show the rates of births, deaths and migrants per 1000 in different regions of the world. Source: Microsoft Encarta

Region   Births   Deaths   Migrants   (rate per 1000)
World    21.2     8.9      1.5
Africa   35.9     14.2     1.7
Asia     20.7     7.6      2
Europe   10.2     11.4     1.1

[Chart: World Population Rates, 100% component bars of births, deaths and migrants by region]

2) Example: The following data show World War II casualties of some countries. Source: Microsoft Encarta

Country       Military killed   Military wounded   Prisoners of war
India         24,250            64,250             79,500
New Zealand   12,250            19,250             8,500
Australia     23,250            39,750             26,250
Canada        37,500            53,250             9,750

[Chart: World War 2 Casualties, 100% component bars by country, sections Military killed, Military wounded and Prisoners of war]

1. The following table shows the number of students in various technologies in certain years.

Year   Spinning   Weaving   Processing   Garments   Total
2001   40         45        48           55         188
2002   30         32        35           38         135
2003   35         40        46           50         171
2004   50         55        62           65         232

DIAGRAM: [Chart: component bars of NO. OF STUDENTS by YEARS, sections Spinning, Weaving, Processing and Garments]

2. The following table shows the salary packages of university staff (in thousands).

Staff                 Basic pay   Medical   Transport   House rent   Total
Teacher               25          5         4           8            42
Assistant Professor   30          5         4           12           51
Professor             40          5         4           14           63
Dean                  50          5         4           16           75
Vice Chancellor       60          5         4           20           89

DIAGRAM: [Chart: component bars of SALARY IN THOUSANDS by STAFF, sections Basic salary, Medical, TA DA and House rent]

4) Pie Chart: A pie diagram, also known as a sector diagram, is a graphic device consisting of a circle divided into sectors (pie-shaped pieces) whose areas are proportional to the parts into which the whole quantity is divided. The sectors are shaded or coloured differently to show the relationship of the parts to the whole.
Procedure for constructing a pie chart: Draw a circle of any convenient radius. Since a circle contains 360°, the whole quantity to be displayed is equated to 360°, and each component part or category receives the same proportion of 360° that it bears to the whole quantity. These proportions, i.e. the angles, are calculated as
$$\text{Angle} = \frac{\text{component part}}{\text{whole quantity}} \times 360^\circ$$
Then divide the circle into sectors by constructing these angles at the centre with a protractor and drawing the corresponding radii.
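The angle formula above translates directly into code. This is a minimal sketch with a hypothetical helper `pie_angles`; it multiplies by 360 before dividing so that whole-number data give exact angles. Applied to the World Cup counts in the first example below, it gives exactly 90° for Brazil (4 of 16 titles), whereas the table's 86.4° comes from the rounded percentage (24%) used there.

```python
def pie_angles(parts):
    """Return the sector angle in degrees for each part of the whole.

    Angle = component part / whole quantity * 360 (multiplied first to
    limit floating-point rounding).
    """
    whole = sum(parts.values())
    return {name: value * 360 / whole for name, value in parts.items()}


# Mr. Ted's yearly expenditure (from an example below).
expenses = {"Tuition fees": 6000, "Books and lab": 2000,
            "Clothing and cleaning": 2000, "Room and boarding": 12000,
            "Transportation": 3000, "Insurance": 1000, "Sundry expenses": 4000}

for name, angle in pie_angles(expenses).items():
    print(f"{name:<22} {angle:6.1f} deg")
```

A useful check when drawing by hand is that the computed angles add up to 360°.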

1) Example: The data show the number of football (soccer) World Cups won by different teams from 1930 to 1998. Source: www.google.com

Country        No. of times won   Percentage   Out of 360°
Uruguay        2                  13           46.8
Italy          3                  19           68.4
West Germany   3                  19           68.4
Brazil         4                  24           86.4
England        1                  6            21.6
Argentina      2                  13           46.8
France         1                  6            21.6

[Chart: No. of times won, pie chart by country]

2) Example: The given data show the exports of the USA in billions of dollars from 1975 to 2000. Source: Microsoft Encarta

Year   Exports (billion dollars)   Percentage (%)   Out of 360°
1975   132.6                       4                14.4
1980   271.8                       9                32.4
1985   288.8                       9                32.4
1990   537.2                       17               61.2
1995   794.2                       26               93.6
2000   1065.7                      35               126

[Chart: Exports (Billion Dollars), pie chart by year]

1. The following table shows the yearly expenditure of Mr. Ted, a college undergraduate, in various categories.

Category                Expenditure   Percentage   Out of 360°
Tuition fees            6000          20           72
Books and lab           2000          6.67         24
Clothing and cleaning   2000          6.67         24
Room and boarding       12000         40           144
Transportation          3000          10           36
Insurance               1000          3.33         12
Sundry expenses         4000          13.33        48
Total                   30000         100          360

DIAGRAM: [Chart: pie chart of expenditure by category]

2. The pie chart below shows the fractions of dogs in a dog competition in seven different groups of dog breeds. Suppose 1000 dogs entered the competition in all.

Group                No. of dogs   Percentage   Out of 360°
Sporting group       240           24           86.4
Working group        210           21           75.6
Hound group          160           16           57.6
Terrier group        160           16           57.6
Toy group            120           12           43.2
Non-sporting group   50            5            18
Herding group        60            6            21.6
Total                1000          100          360

DIAGRAM: [Chart: pie chart of number of dogs by group]

5) Frequency Polygons:
1) Example: The following data show the marks obtained by the students of a class in a quiz.

Marks              10  11  12  13  14  15  16  17  18  19  20
No. of students     5   7  11  15  24  16  11   6   3   2  10

[Chart: Result of quiz, frequency polygon of No. of students against Marks]
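The points of a frequency polygon are simply the (value, frequency) pairs joined by straight lines. A common convention, assumed here, is to close the polygon on the horizontal axis by adding a zero-frequency point one step beyond each end (the glucose and lunatic tables below already include such zero end points). The helper name `frequency_polygon` is illustrative, and it assumes equally spaced values.

```python
def frequency_polygon(values, freqs):
    """Return the (x, f) vertices of a frequency polygon.

    Assumes equally spaced values (class marks). A zero-frequency point is
    added one step before the first value and one step after the last, so
    that the polygon closes on the horizontal axis.
    """
    values = list(values)
    step = values[1] - values[0]
    xs = [values[0] - step] + values + [values[-1] + step]
    fs = [0] + list(freqs) + [0]
    return list(zip(xs, fs))


# Quiz marks and number of students (from the example above).
marks = range(10, 21)
students = [5, 7, 11, 15, 24, 16, 11, 6, 3, 2, 10]
for x, f in frequency_polygon(marks, students):
    print(x, f)
```

Joining these vertices in order, with any plotting tool or by hand on graph paper, gives the polygon.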

2) Example: The following data show the monthly wages of persons in dollars.

Wages ($)        50  55  60  70  75
No. of persons   12   3   7   8  19

[Chart: frequency polygon of No. of persons against wage]

1. The following table shows the frequency distribution of the quantity of glucose in 100 people.

Quantity of glucose   64  68  72  76  80  84  88  92  96 100 104 108 112 116 120 124 128 132 136 140 144
Frequency              0   1   0   0   2   8   5  14  18  11  18   6   8   5   3   1   0   0   0   0   0

DIAGRAM: [Chart: frequency polygon of FREQUENCY against VALUE OF GLUCOSE]

2. The following table shows the number of lunatics and their frequency.

No. of lunatics    0  25 100 200 300 400 500 550
Frequency          0   2   6   2   0   3   1   0

DIAGRAM: [Chart: frequency polygon of frequency against NO. OF LUNATICS]

6) Cumulative Frequency Polygons:
1) Example: The data below form a frequency table.

Midpoint (x)   Frequency (f)   Cumulative frequency (c.f.)
25             18              18
35             25              43
45             44              87
55             88              175
65             91              266
75             97              363

[Chart: Cumulative frequency polygon of cumulative frequency against midpoint]

2) Example: The data show the wickets taken by a cricket player in his debut.

Wickets taken (x)   No. of times (f)   Cumulative frequency (c.f.)
1                   22                 22
2                   19                 41
3                   20                 61
4                   12                 73
5                   9                  82
6                   3                  85

[Chart: Cumulative frequency polygon of cumulative frequency against wickets]
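Each cumulative frequency is the running total of the frequencies up to that value, and the last one equals the total number of observations. The sketch below, built on the standard `itertools.accumulate` (the helper name `ogive_points` is illustrative), returns the points of the cumulative frequency polygon together with cumulative percentages.

```python
from itertools import accumulate


def ogive_points(xs, freqs):
    """Return (x, cumulative frequency, cumulative %) for each value of x."""
    cumulative = list(accumulate(freqs))  # running totals of the frequencies
    total = cumulative[-1]
    return [(x, cf, 100 * cf / total) for x, cf in zip(xs, cumulative)]


# Wickets taken (x) and number of times (f), from the example above.
wickets = range(1, 7)
times = [22, 19, 20, 12, 9, 3]
for x, cf, pct in ogive_points(wickets, times):
    print(f"{x}  {cf:3d}  {pct:5.1f}%")
```

The same function reproduces the c.f. column of the midpoint example (18, 43, 87, 175, 266, 363).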

1. The following table shows the frequency distribution of the quantity of glucose in 100 people.

Quantity of glucose   64  68  72  76  80  84  88  92  96 100 104 108 112 116 120 124 128 132 136 140 144
Frequency              0   1   0   0   2   8   5  14  18  11  18   6   8   5   3   1   0   0   0   0   0
Cumulative frequency   0   1   1   1   3  11  16  30  48  59  77  83  91  96  99 100 100 100 100 100 100

DIAGRAM: [Chart: cumulative frequency polygon of CUMULATIVE FREQUENCY against QUANTITY OF GLUCOSE]

2. The following table shows the number of lunatics and their frequency.

No. of lunatics        0  25 100 200 300 400 500 550
Frequency              0   2   6   2   0   3   1   0
Cumulative frequency   0   2   8  10  10  13  14  14

DIAGRAM: [Chart: cumulative frequency polygon of CUMULATIVE FREQUENCY against NUMBER OF LUNATICS]

7) Pareto Diagrams:
Definition: A Pareto diagram is a bar graph that arranges information so that priorities for process improvement can be established. It is used to determine which characteristic is the major contributor in a process. The diagram is constructed by ranking the data by frequency of occurrence and plotting the bars in descending order.
Purposes: To display the relative importance of data, and to direct efforts to the biggest improvement opportunity by highlighting the vital few in contrast to the useful many.
Pareto diagrams are named after Vilfredo Pareto, an Italian sociologist and economist, who observed toward the end of the 19th century that a large share of wealth was held by a small share of the population. The chart is similar to a histogram or bar chart, except that the bars are arranged in decreasing order from left to right along the abscissa. The fundamental idea behind using Pareto diagrams for quality improvement is that the first few contributing causes of a problem (as presented on the diagram) usually account for the majority of the result. Thus, targeting these major causes for elimination gives the most cost-effective improvement scheme.
How to Construct:
1. Determine the categories and the units for comparing the data, such as frequency, cost or time.
2. Total the raw data in each category, then determine the grand total by adding the totals of all categories.
3. Re-order the categories from largest to smallest.
4. Determine the cumulative percent of each category (i.e., the sum of that category and all categories that precede it in the rank order, divided by the grand total and multiplied by 100).
5. Draw and label the left-hand vertical axis with the unit of comparison, such as frequency, cost or time.
6. Draw and label the horizontal axis with the categories, listed from left to right in rank order.
7. Draw and label the right-hand vertical axis from 0 to 100 percent. The 100 percent mark should line up with the grand total on the left-hand vertical axis.
8. Beginning with the largest category, draw a bar for each category representing its total.
9. Draw a line graph, beginning at the right-hand corner of the first bar, to represent the cumulative percent for each category as measured on the right-hand axis.
10. Analyze the chart. Usually the top 20% of the categories will comprise roughly 80% of the cumulative total.
Tips: Create before-and-after comparisons of Pareto charts to show the impact of improvement efforts. Construct Pareto charts using different measurement scales, such as frequency, cost or time. Pareto charts are useful displays of data for presentations. Use objective data, rather than team members' opinions, to perform Pareto analysis. If there is no clear distinction between the categories (if all bars are roughly the same height, or half of the categories are required to account for 60 percent of the effect), consider organizing the data differently and repeating the Pareto analysis. Pareto analysis is most effective when the problem at hand is defined in terms of shrinking the PV to a customer target, for example, reducing defects or eliminating non-value-added time in a process.
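Steps 2 to 4 above (totalling, ranking and computing the cumulative percent) are the computational core of a Pareto diagram; the remaining steps are drawing. A minimal sketch, with the helper name `pareto_table` as an illustrative assumption:

```python
def pareto_table(totals):
    """Steps 2-4: rank categories and compute percent and cumulative percent.

    totals: {category: total in the unit of comparison}
    Returns (category, total, percent, cumulative percent), largest first.
    """
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    grand_total = sum(value for _, value in ranked)
    rows, running = [], 0
    for category, value in ranked:
        running += value
        rows.append((category, value,
                     100 * value / grand_total,
                     100 * running / grand_total))
    return rows


# Areas of continents in million sq. miles (from an example below).
areas = {"Europe": 3.9, "Asia": 17.1, "South America": 6.9,
         "Africa": 11.7, "North America": 9.4}
for category, value, pct, cum in pareto_table(areas):
    print(f"{category:<14} {value:5.1f} {pct:5.1f}% {cum:6.1f}%")
```

The third column gives the bar heights in percent and the fourth the points of the cumulative line read on the right-hand axis; rounded, they match the continent table below (35, 24, 19, 14, 8 and 35, 59, 78, 92, 100).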

1) Example: The following data show the price of polyester staple fibre at Karachi in Rs/kg. Source: APTMA

Month      Price (Rs/kg)   Percentage (%)   Cumulative percentage
May        111.72          17.8             17.8
April      111.55          17.78            35.58
March      106.95          17.05            52.63
January    103.5           16.5             69.13
February   103.5           16.5             85.63
June       90.15           14.37            100

[Chart: Polyester rate, Pareto chart of price by month]

2) Example: The following data show the areas of continents in million sq. miles.

Continent       Area (million sq. miles)   Percentage (%)   Cumulative percentage
Asia            17.1                       35               35
Africa          11.7                       24               59
North America   9.4                        19               78
South America   6.9                        14               92
Europe          3.9                        8                100

[Chart: Area of continents, Pareto chart of area (%) by continent]

1. The table given below shows the problems reported with a computer.

Problem                Frequency   Fraction
Setup Difficulty       80          0.296296
Not Easy to Use        45          0.166667
Unspecified            25          0.092593
Not Fast Enough        22          0.081481
Too Slow               15          0.055556
Incompatible           10          0.037037
Internet Inoperative   10          0.037037
Too Heavy              9           0.033333
Too Loud               6           0.022222
Too Small              6           0.022222
Power Inop             5           0.018519
Bad Color              4           0.014815
Screen Small           3           0.011111
Dim Screen             3           0.011111
Cord Too Short         3           0.011111
Slow Internet          3           0.011111
Too Fast               3           0.011111
Floppy Slow            3           0.011111
Too Big                2           0.007407
Case Smells            2           0.007407
No Printer             2           0.007407
No Books               2           0.007407
Pwr Btn Stiff          1           0.003704
Won't Start            1           0.003704
Won't Print            1           0.003704
Didn't Talk            1           0.003704
No Movies              1           0.003704
No Help                1           0.003704
No Manuals             1           0.003704

DIAGRAM: [Chart: Pareto Chart, bars of # Observations by problem group with a cumulative-fraction line on a 0 to 1 right-hand axis; the smallest categories are grouped as "Other"]

2. The following table shows the defects in production in a factory.

Type                       Subtotal   % of total
High turn-on speed         18         14.754
High ripple current        38         31.147
High leakage               12         9.836
Low output at low speed    15         12.295
Low output at high speed   7          5.737
Dead unit                  4          3.278
Bad regulator              22         18.032
Bad voltage setpoint       6          4.918

DIAGRAM: [Chart: Pareto chart of defect subtotals by type]

8) Pictographs:
1) Example: The following data show the bikes assembled by a company on different weekdays from Monday to Saturday. Scale: 1 bike picture represents 5 bikes.

Day         Production
Monday      12
Tuesday     23
Wednesday   17
Thursday    27
Friday      10
Saturday    22

[Chart: Production of bikes, pictograph of production by weekday]
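With a scale of one picture per 5 bikes, the number of pictures for each day is the production divided by 5, and any remainder is drawn as part of a picture. The sketch below, with a hypothetical helper `pictograph` and `@` standing in for the bike picture, shows that arithmetic in text form.

```python
def pictograph(data, per_symbol, symbol="@"):
    """Return text rows of a pictograph.

    Each symbol stands for `per_symbol` units; a remainder is shown as a
    fraction of a symbol (a partial picture in a drawn pictograph).
    """
    label_width = max(len(label) for label in data)
    rows = []
    for label, value in data.items():
        full, rest = divmod(value, per_symbol)
        partial = f" +{rest}/{per_symbol}" if rest else ""
        rows.append(f"{label:<{label_width}} {symbol * full}{partial} ({value})")
    return rows


# Bikes assembled per day; scale: 1 picture = 5 bikes (from the example above).
bikes = {"Monday": 12, "Tuesday": 23, "Wednesday": 17,
         "Thursday": 27, "Friday": 10, "Saturday": 22}
for row in pictograph(bikes, per_symbol=5):
    print(row)
```

For Monday, for example, 12 bikes give two full pictures plus two-fifths of a third picture.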

2) Example: The following data show the gold medals won by different nations at the Athens Olympics 2004. Source: www.olympic.org

Nation      Gold won
USA         35
China       32
Russia      27
Australia   17
Japan       16

[Chart: Gold medals won in Athens Olympics, pictograph with 1 gold symbol = 4 units]

9) Fishbone Diagram: Dr. Kaoru Ishikawa, a Japanese quality-control statistician, invented the fishbone diagram, so it is also called the Ishikawa diagram. The fishbone diagram is an analysis tool that provides a systematic way of looking at effects and the causes that create or contribute to those effects; because of this function, it is also called a cause-and-effect diagram. Its design looks much like the skeleton of a fish, which is why it is most often called the fishbone diagram. Whatever name you choose, the value of the diagram is that it helps teams categorize the many potential causes of problems or issues in an orderly way and identify root causes.

When should a fishbone diagram be used?

Does the team...
1. Need to study a problem or issue to determine the root cause?
2. Want to study all the possible reasons why a process is beginning to have difficulties, problems or breakdowns?
3. Need to identify areas for data collection?
4. Want to study why a process is not performing properly or producing the desired results?

1) Example: This diagram shows the cause-and-effect relationship for heavy school bags; the effect is back pain.

2) Example: This diagram shows some possible reasons for a high reject rate of machine parts.


EXAMPLE # 1:
Draw a fishbone diagram of a doorknob showing its parts.

GRAPH:


EXAMPLE # 2:
Draw a fishbone diagram of Biological Warfare Disease.

GRAPH:


1. The CEO of a call center wants to know why not all calls are answered and how to improve the center's ability to handle calls. DIAGRAM:

2. The following fishbone diagram illustrates the causes of a project deadline not being met. DIAGRAM:

10) Histogram:
1) Example: The following data show the number of workers in different factories.

No. of workers   No. of factories   Class width (h)
70-75            15                 5
75-80            9                  5
80-85            25                 5
85-90            18                 5
90-95            27                 5

2) Example: The following data show the marks obtained by 50 students in a Physics exam.

Marks obtained   No. of students   Class width (h)
0-10             3                 10
10-20            6                 10
20-30            12                10
30-40            18                10
40-50            11                10
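The tables above list the class width h next to each frequency. In a histogram the area of each bar is proportional to its frequency, so the bar height is the frequency density f/h; when all widths are equal, as in both examples, the heights are simply proportional to the frequencies. A minimal sketch, with the helper name `histogram_heights` and the unequal-width classes as hypothetical illustrations:

```python
def histogram_heights(classes, freqs):
    """Return bar heights as frequency density f / h for each class.

    classes: list of (lower, upper) class limits; h = upper - lower.
    Using f / h keeps each bar's area proportional to its frequency,
    which matters when class widths differ.
    """
    return [f / (upper - lower) for (lower, upper), f in zip(classes, freqs)]


# Physics marks of 50 students (from the example above): equal widths h = 10.
marks = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [3, 6, 12, 18, 11]
print(histogram_heights(marks, students))

# Hypothetical unequal classes: the wider class gets a lower bar.
print(histogram_heights([(0, 10), (10, 30)], [10, 10]))
```

In the hypothetical case both classes hold 10 observations, but the class twice as wide is drawn half as tall, so the two bars have equal areas.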

11) Stem and Leaf Diagram: A stem-and-leaf display separates data entries into leading digits, or stems, and trailing digits, or leaves. For example, since the annual costs (in $000) in the private-institution data set all have two-digit integer parts, the tens and units columns are the leading digits and the remaining column (the tenths column) is the trailing digit. Thus an entry of 26.4 (corresponding to $26,400) has a stem of 26 and a leaf of 4. The following figure depicts the stem-and-leaf display of the annual cost of attending the 50 sampled private colleges and universities.
Steps to follow in constructing a stem-and-leaf display:
1. Divide each observation in the data set into two parts, the stem and the leaf.
2. List the stems in order in a column, starting with the smallest stem and ending with the largest.
3. Proceed through the data set, placing the leaf for each observation in the appropriate stem row.
Depending on the data, a display can use one, two or five lines per stem; two-line stems are widely used.
Advantages of a stem-and-leaf display over a frequency distribution (considered in the next section):
1. The original data are preserved.
2. A stem-and-leaf display arranges the data in an orderly fashion and makes it easy to determine certain numerical characteristics discussed in the following chapter.
3. The classes and the numbers falling in them are quickly determined once we have selected the digits to use for the stems and leaves.
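The three construction steps above can be sketched for whole-number data, where the units digit is the leaf and the remaining digits form the stem (one line per stem). The helper name `stem_and_leaf` is an illustrative assumption, and it handles non-negative integers only.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return {stem: leaves} for non-negative integers.

    The units digit is the leaf and the remaining leading digits form the
    stem; leaves are listed in ascending order and empty stems are kept.
    """
    rows = defaultdict(str)
    for x in sorted(data):             # sorting first puts leaves in order
        stem, leaf = divmod(x, 10)     # step 1: split into stem and leaf
        rows[stem] += str(leaf)        # step 3: place leaf in its stem row
    # Step 2: list every stem from smallest to largest, even empty ones.
    return {stem: rows.get(stem, "") for stem in range(min(rows), max(rows) + 1)}


# Prices of items in a utility store, in rupees (from example 2 below).
prices = [10, 14, 62, 55, 15, 75, 63, 20, 20, 36, 22, 35, 29, 23, 40,
          10, 60, 10, 20, 58, 15, 35, 30, 17, 18, 85, 18, 10, 90, 29]

for stem, leaves in stem_and_leaf(prices).items():
    print(f"{stem:>2} | {leaves:<12} ({len(leaves)})")
```

The leaf counts printed in parentheses form the frequency column, which is why a stem-and-leaf display doubles as a frequency distribution while keeping every original value.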

1) Example: The following data show the sugar levels of 50 patients.

122 176 118 170 109
132 189 192 122 118
128 190 185 132 187
140 156 173 137 176
110 188 169 109 151
170 166 154 165 176
165 165 148 197 145
112 145 122 129 189
145 178 113 167 145
167 133 111 178 123

Blood Sugar Level (stem = hundreds and tens digits, leaf = units digit; 10 | 9 means 109)

Stem   Leaf       Frequency
10     99         2
11     012388     6
12     222389     6
13     2237       4
14     055558     6
15     146        3
16     5556779    7
17     00366688   8
18     57899      5
19     027        3

2) Example: The following data show the prices of different goods in Pakistani rupees.
10 14 62 55 15 75 63 20 20 36 22 35 29 23 40 10 60 10 20 58 15 35 30 17 18 85 18 10 90 29

Price of Items in a Utility Store (stem = tens digit, leaf = units digit; 1 | 0 means 10)

Stem   Leaf         Frequency
1      0000455788   10
2      0002399      7
3      0556         4
4      0            1
5      58           2
6      023          3
7      5            1
8      5            1
9      0            1

Box-and-Whisker Plot: A box-and-whisker plot is a visual display of the five-number summary. It is a simplified boxplot taught to beginners and does not show outliers: the whiskers extend all the way to the minimum and maximum values, regardless of how far out they may be.

A box-and-whisker graph is used to display a set of data so that you can easily see where most of the numbers are. For example, suppose you were to catch and measure the length of 13 fish in a lake:

A box and whisker plot is based on medians. The first step is to rewrite the data in order, from smallest length to largest:

Now find the median of all the numbers. Notice that since there are 13 numbers, the middle one will be the seventh number:

This must be the median (middle number) because there are six numbers on each side. The next step is to find the lower median, which is the middle of the lower six numbers. The exact centre is half-way between 8 and 9, which is 8.5. Now find the upper median, which is the middle of the upper six numbers. The exact centre is half-way between 14 and 14, which must be 14.

Now you are ready to construct the actual box-and-whisker graph. First, draw an ordinary number line that extends far enough in both directions to include all the numbers in your data:

First, locate the main median 12 using a vertical line just above your number line:

Now locate the lower median 8.5 and the upper median 14 with similar vertical lines:

Next, draw a box using the lower and upper median lines as endpoints:

Finally, the whiskers extend out to the data's smallest number 5 and largest number 20:

This is a box & whisker plot!

But what does it mean? What information about the data does this graph give you?

Well, it's clear from the graph that the lengths of the fish ranged from 5 cm to 20 cm. This gives you the range of the data: 20 - 5 = 15 cm. You also know that the median, or middle value, was 12 cm. Since the three medians represent middle points, they split the data into four equal parts. In other words:
1. one quarter of the data numbers are less than 8.5
2. one quarter of the data numbers are between 8.5 and 12
3. one quarter of the data numbers are between 12 and 14
4. one quarter of the data numbers are greater than 14

The shading below, as an example, shows the quarter of the numbers that are between 12 and 14:

Here is a picture of the quarter of the data that is between 8.5 and 12. Notice that the data is more spread out here:

This picture is showing where half the data numbers are. Half of all the fish caught had a length between 8.5 and 14 centimetres:
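The five values that define the plot (smallest value, lower median, median, upper median, largest value) can be computed with a short sketch. The fish lengths themselves are not reproduced in this document, so the sketch below, which uses Python's standard `statistics.median`, applies the same median-of-halves method to the utility-store prices from the stem-and-leaf example. The function name is an illustrative assumption; other textbooks and software packages compute quartiles with slightly different rules, so their values can differ from these.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, lower median, median, upper median, maximum).

    Follows the method in the text: with an odd count the overall median is
    left out of both halves; with an even count the data split evenly.
    Assumes at least two values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


# Utility-store prices from the stem-and-leaf example (30 values).
prices = [10, 14, 62, 55, 15, 75, 63, 20, 20, 36, 22, 35, 29, 23, 40,
          10, 60, 10, 20, 58, 15, 35, 30, 17, 18, 85, 18, 10, 90, 29]
print(five_number_summary(prices))
```

For the prices this gives a box from 17 to 55 with the median at 26 and whiskers to 10 and 90; the long upper whisker shows that the higher prices are much more spread out than the lower ones.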

Graphical representation of Textile Related data



1. SIMPLE BAR DIAGRAM

a) Exported man-made fiber

Year      Quantity of fiber exported
1993-94   25,422
1994-95   61,485
1995-96   28,714
1996-97   48,484
1997-98   34,015
1998-99   34,515
1999-00   22,716
2000-01   28,524
2001-02   45,665
2002-03   66,653
2003-04   54,878

[Chart: EXPORTED MAN MADE FIBER, vertical bars of QUANTITY OF FIBERS by YEARS]

b) Consumption of cotton

Year      Cotton consumption
1990-91   1,128,978
1991-92   1,257,399
1992-93   1,318,892
1993-94   1,511,610
1994-95   1,412,732
1995-96   1,509,955
1996-97   1,444,368
1997-98   1,471,169
1998-99   1,441,923
1999-00   1,566,348
2000-01   1,673,280
2001-02   1,755,669
2002-03   1,943,197
2003-04   1,938,678

[Chart: Consumption of cotton, vertical bars by year]

2. MULTIPLE BAR DIAGRAM

a) Production of cloth province-wise

Period    Punjab    Sindh     N.W.F.P   Baluchistan
1993-94   223,789   158,693   6,255     0
1994-95   174,293   143,720   3,828     0
1995-96   189,559   137,422   0         0
1996-97   208,107   125,388   0         0
1997-98   212,813   256,258   8         2
1998-99   230,018   154,519   24        0
1999-00   292,536   144,634   20        0
2000-01   309,634   180,530   0         0
2001-02   355,370   256,952   0         2
2002-03   317,981   264,164   0         0
2003-04   358,962   255,963   0         0

[Chart: Cloth Production Province Wise, grouped bars of quantity by year for Punjab, Sindh, N.W.F.P and Baluchistan]

b) Textile-related exports (percentage of textile material exported)

Year      Yarn    Cloth   Others
1971-72   55.6    35.5    9
1979-80   32.87   38.9    28.14
1989-90   33.2    22.32   44.35
1996-97   28.09   25.12   46.77
1997-98   23.7    25.56   50.72
1998-99   20.72   24.45   55
1999-00   20.95   21.44   57.59
2000-01   20.6    19.8    59.59
2001-02   16      20      64
2002-03   13      19      68
2003-04   14      21      65

[Chart: Textile related exports, grouped bars of yarn, cloth and others by year]

3. COMPONENT BAR DIAGRAM

a) Production of yarn (percentage of yarn produced)

Year      Blended   Coarse   Medium   Fine
1997-98   23        45       27       4
1998-99   26        47       24       2
1999-00   25        49       23       2
2000-01   25        50       22       3
2001-02   26        49       21       3
2002-03   25        46       25       4
2003-04   20        50       25       5

[Chart: Yarn Production, 100% component bars of quantity (%) by year, sections Blended, Coarse, Medium and Fine]

b) Production of cloth (percentage of cloth produced)

Year      Grey   Bleached   Dyed & printed   Blended
1971-72   64     17         19               -
1981-82   61     10         16               13
1991-92   52     6          21               21
1997-98   61     4          19               16
1998-99   51     7          25               17
1999-00   60     3          23               14
2000-01   57     4          25               14
2001-02   56     3          27               14
2002-03   51     5          28               16
2003-04   49     6          30               15

[Chart: Production of cloth, 100% component bars by year, sections Grey, Bleached, Dyed & printed and Blended]

4. FREQUENCY POLYGONS

a) Export of wool yarn

Year      Quantity
1995-96   455,693
1996-97   212,452
1997-98   512,862
1998-99   421,481
1999-00   512,971
2000-01   512,467
2001-02   544,217
2002-03   519,329
2003-04   458,962

[Chart: Wool Yarn Export, frequency polygon of quantity by year]

b) Export of cloth

Year      Quantity of cloth exported
1971-72   409,808
1979-80   545,768
1989-90   1,017,868
1996-97   1,257,430
1997-98   1,271,272
1998-99   1,355,166
1999-00   1,574,876
2000-01   1,735,824
2001-02   1,957,353
2002-03   2,036,321
2003-04   2,378,900

[Chart: Export of cloth, frequency polygon of quantity by year]

5. CUMMULATIVE FREQUENCY POLYGONES


A) Export of cotton yarn
Year      Quantity (kg)  % of cotton yarn exported  Cumulative %
1991-92   505,863        7.53                       7.53
1992-93   555,294        8.26                       15.79
1993-94   578,648        8.61                       24.40
1994-95   522,091        7.77                       32.17
1995-96   535,889        7.97                       40.14
1996-97   508,188        7.56                       47.71
1997-98   461,919        6.87                       54.58
1998-99   421,481        6.27                       60.85
1999-00   512,971        7.63                       68.49
2000-01   545,134        8.11                       76.60
2001-02   544,217        8.10                       84.70
2002-03   519,329        7.73                       92.42
2003-04   509,097        7.58                       100.00
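The percentage and cumulative percentage columns come from two steps: divide each year's quantity by the total, then keep a running sum of those percentages. A minimal Python sketch of that arithmetic, using the cotton-yarn quantities from the table (`percent_and_cumulative` is an illustrative name only):

```python
from itertools import accumulate

YEARS = ["1991-92", "1992-93", "1993-94", "1994-95", "1995-96", "1996-97", "1997-98",
         "1998-99", "1999-00", "2000-01", "2001-02", "2002-03", "2003-04"]
QUANTITY_KG = [505_863, 555_294, 578_648, 522_091, 535_889, 508_188, 461_919,
               421_481, 512_971, 545_134, 544_217, 519_329, 509_097]


def percent_and_cumulative(values):
    """Return each value as a % of the total, and the running (cumulative) %."""
    total = sum(values)
    percents = [100 * v / total for v in values]
    cumulative = list(accumulate(percents))  # running sum of the percentages
    return percents, cumulative


percents, cumulative = percent_and_cumulative(QUANTITY_KG)
for year, q, p, c in zip(YEARS, QUANTITY_KG, percents, cumulative):
    print(f"{year}  {q:>9,}  {p:6.2f}  {c:7.2f}")
```

The last cumulative value comes out at 100 (up to floating-point rounding), which is a quick check that no year was dropped.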

[Figure: Export of cotton yarn, cumulative frequency polygon of cumulative % against year]

B) Export of cloth
Year      Quantity of cloth exported  % quantity  Cumulative %
1996-97   1,257,430                   9.27        9.27
1997-98   1,271,272                   9.37        18.64
1998-99   1,355,166                   9.99        28.63
1999-00   1,574,876                   11.61       40.24
2000-01   1,735,824                   12.79       53.03
2001-02   1,957,353                   14.43       67.46
2002-03   2,036,321                   15.01       82.47
2003-04   2,378,900                   17.53       100.00

[Figure: Export of cloth, cumulative frequency polygon of cumulative % against year, 1996-97 to 2003-04]

6. PARETO DIAGRAM
A) Export of yarn
Year      Quantity of yarn exported  % quantity  Cumulative %
1996-97   1,411,519                  18.90       18.90
1997-98   1,153,542                  15.45       34.35
2003-04   1,141,219                  15.28       49.63
2000-01   1,076,600                  14.42       64.04
1999-00   1,071,616                  14.35       78.39
2002-03   926,358                    12.40       90.79
1998-99   345,169                    4.62        95.42
2001-02   342,369                    4.58        100.00
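A Pareto diagram uses the same running percentage as a cumulative frequency polygon, but the categories are first sorted from largest to smallest. The Python sketch below enters the yarn-export quantities in calendar order so the sorting step is visible; `pareto_table` is a hypothetical helper name.

```python
from itertools import accumulate

# Quantity of yarn exported (year -> quantity), in calendar order.
YARN_EXPORTS = {
    "1996-97": 1_411_519,
    "1997-98": 1_153_542,
    "1998-99": 345_169,
    "1999-00": 1_071_616,
    "2000-01": 1_076_600,
    "2001-02": 342_369,
    "2002-03": 926_358,
    "2003-04": 1_141_219,
}


def pareto_table(data):
    """Sort categories by value (descending) and add % and cumulative % columns."""
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    total = sum(data.values())
    shares = [100 * value / total for _, value in ordered]
    return [
        (label, value, share, cum)
        for (label, value), share, cum in zip(ordered, shares, accumulate(shares))
    ]


for label, value, share, cum in pareto_table(YARN_EXPORTS):
    print(f"{label}  {value:>9,}  {share:6.2f}  {cum:7.2f}")
```

The bars of the diagram are drawn from the `share` column in this sorted order, and the line from the `cum` column.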

[Figure: Export of yarn, Pareto diagram with % quantity (bars) and cumulative % (line) by year]

B) Cotton prices month-wise in the year 2004-05
Month  Price (Rs per maund)  % value  Cumulative % value
Jul    2615                  10.06    10.06
Aug    2273                  8.74     18.81
Mar    2272                  8.74     27.55
June   2251                  8.66     36.21
Apr    2233                  8.59     44.80
May    2217                  8.53     53.33
Sep    2205                  8.48     61.81
Feb    2164                  8.33     70.13
Jan    2035                  7.83     77.96
Oct    1940                  7.46     85.43
Nov    1923                  7.40     92.82
Dec    1865                  7.18     100.00

[Figure: Cotton prices month-wise in year 2004-05, Pareto diagram with % value (bars) and cumulative % value (line) by month]

7. PIE CHART
A) World-wide fiber production in the year 2004 (quantity measured in 1000 tonnes)
Type of fiber    Quantity of fiber produced  % quantity  Angle of sector (degrees)
Man-made fiber   34,560                      56.86       204.72
Cotton           24,900                      40.97       147.50
Wool             1,215                       1.99        7.19
Silk             97                          0.16        0.57
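Each sector angle is the category's share of the total multiplied by 360 degrees. The Python sketch below recomputes the angles from the quantities in the table (`sector_angles` is an illustrative name). Rounded to two decimals, its results can differ from the printed table in the last digit, for example 204.73 rather than 204.72 for man-made fiber, since the table appears to truncate rather than round.

```python
# Fiber production in 2004, thousand tonnes (copied from the table above).
FIBER_PRODUCTION = {
    "Man-made fiber": 34_560,
    "Cotton": 24_900,
    "Wool": 1_215,
    "Silk": 97,
}


def sector_angles(data):
    """Angle of each pie sector: (value / total) * 360 degrees."""
    total = sum(data.values())
    return {label: 360 * value / total for label, value in data.items()}


angles = sector_angles(FIBER_PRODUCTION)
for label, angle in angles.items():
    print(f"{label:<15} {angle:7.2f} degrees")
```

The angles always sum to 360, so a pie chart drawn from them closes exactly whatever the rounding used in the printed table.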

[Figure: World-wide fiber production in year 2004, pie chart with sectors for man-made fiber, cotton, wool and silk]

B) Province-wise production of cloth in the year 1998-99
Province     Quantity of cloth produced  % quantity  Angle of sector (degrees)
Punjab       230,018                     59.81       215.32
Sindh        154,519                     40.18       144.65
N.W.F.P.     24                          0.0062      0.022
Baluchistan  0                           0           0

[Figure: Production of cloth province-wise in year 1998-99, pie chart with sectors for Punjab, Sindh, N.W.F.P. and Baluchistan]

Multiple Choice Questions


1. Consider the following output from DataDesk when analyzing the pH values of the 1986 data collected on precipitation events.

Summary stats for 1986
NumNumeric = 55
NumNonNumeric = 0
NumCases = 55
Mean = 6.0673
Median = 6.1000
Std Deviation = 0.47339
Range = 2.4000
Minimum = 4.6000
Maximum = 7
75-th %ile = 6.4000

2. Which of the following is NOT CORRECT?
a. The 25th percentile is about 5.9.
b. Some outliers appear to be present below a pH of 5.4.
c. About 95% of the observations have pH values in the approximate range 6 ± 1.
d. About 10% of the values are in the range 5.8 to 6.0.
e. About 75% of the values are less than 6.4.
Answer: (d)

3. The following is a histogram showing the actual frequency of the closing prices on the New York exchange of a particular stock. Based on the above frequency histogram for the New York Stock Exchange, the class that contains the 80th percentile is:
a. 20-30
b. 10-20
c. 40-50
d. 50-60
e. 30-40
Answer: (e)

4. A histogram of the heights of 39 plants is as follows:

The 75th percentile of the height distribution is approximately:
a. 9.4
b. 9.7
c. 7.7
d. 7.5
e. 10.0
Answer: (b)

5. The weights of the male and female students in a class are summarized in the following boxplots:

Which of the following is NOT correct?
a. About 50% of the male students have weights between 150 and 185 lbs.
b. About 25% of female students have weights more than 130 lbs.
c. The median weight of male students is about 162 lbs.
d. The mean weight of female students is about 120 because of symmetry.
e. The male students have less variability than the female students.
Answer: (e)

6. Consider the following box plots of the grades in a course in statistics for each sex, drawn according to the convention that the whiskers reach the 10th and 90th percentiles.

Which of the following is correct?
a. The mean grade of the female students is about 72.
b. The median grade for the male students is about 68.
c. About 25% of female students get grades above 72.
d. About 10% of male students get grades below 60.
e. About 50% of female students get grades between 62 and 82.
Answer: (e)

7. Consider the following box plot of the yield of barley, drawn using the convention that the whiskers reach the 10th and 90th percentiles.

Which of the following is NOT correct?
a. The mean is about 200 g/400 m².
b. The median is about 180 g/400 m².
c. About 50% of the yields are between 160 and 220 g/400 m².
d. About 25% of the yields are below 220 g/400 m².
e. About 10% of the yields are below 130 g/400 m².
Answer: (d)

8. Consider the following ogive of the scores of students in an introductory statistics course:

A grade of C or C+ is assigned to a student who scores between 55 and 70. The percentage of students that obtained a grade of C or C+ is:
a. 25%
b. 30%
c. 20%
d. 50%
e. 15%
Answer: (c)

9. Which of the following is NOT correct about constructing histograms?
a. The approximate number of classes is 1 + 3.3 log10(n).
b. All class intervals should be of equal width.
c. The bars of the histogram are centred over the class mark (midpoint).
d. The first and last classes should be open-ended to account for extreme points.
e. There should be no spaces between bars.
Answer: (d)

10. Forest companies routinely take samples from tracts that have been replanted to monitor the growth of the trees. Suppose that in a recent sample of two tracts, the diameter of the trees was measured with the following results:

Tract  Trees  Range (mm)
A      75     232-315
B      210    215-250

Histograms to compare the two groups are to be constructed. Which of the following is NOT recommended?
a. The number of classes in Group A and Group B should be around 7 or 8.
b. The class width of both groups will be 10 mm.
c. The class bounds will be 232-242 mm, 242-252 mm, etc. for Group A and 215-225 mm, 225-235 mm, etc. for Group B.
d. The vertical scale for both groups should be relative frequency (%).
e. The two histograms will be stacked and aligned so that the vertical axes are the same and the horizontal axes are identical.
Answer: (c)

11. Consider the following SAS procedure to construct a histogram.

PROC CHART DATA=BARLEY;
   VBAR YIELD / TYPE=PERCENT MIDPOINTS=90 TO 300 BY 30;
RUN;

Which of the following statements is correct?
a. The first bar will contain only values from 75 to 105.
b. All yields below 90 will not be used.
c. The vertical axis will be labelled with the actual frequency in each class.
d. Several charts will be produced, one for each distinct year.
e. The last class will include values above 315.
Answer: (e)

12. For each student in a class, the sex and weight (in kilograms) are recorded. Consider the following SAS program:

DATA STUDENTS;
   INPUT SEX $ WEIGHT;
   DATALINES;
F 62
...   (<-- more data here)
M 78
;
PROC SORT DATA=STUDENTS;
   BY SEX;
PROC CHART DATA=STUDENTS;
   VBAR WEIGHT / TYPE=PERCENT MIDPOINTS=55 TO 95 BY 10;
   BY SEX;
RUN;

Which of the following statements is correct?
a. The first class will contain weights from 55 to 65 kg.
b. All weights below 55 kilograms will be discarded.
c. One vertical bar chart will be produced; it will contain the weights of all of the students, and the males and females will be indistinguishable.
d. The vertical axis will be labelled "frequency".
e. Two separate vertical bar charts will be produced, one for males and one for females.
Answer: (e)

13. A single stem-and-leaf plot is a useful tool because:
a. it includes the average and the standard deviation.
b. it shows the percentage distribution of the data values.
c. it enables us to examine the data values for the presence of trends, cycles, and seasonal variation.
d. it enables us to locate the centre of the data, see the overall shape of the distribution, and look for marked deviations from the overall shape.
e. it enables us to compare this dataset against others of a similar kind.
Answer: (d)

14. Forty students wrote a Statistics examination having a maximum of 50 marks. The mark distribution is given in the following stem-and-leaf plot:

0|28
1|2245
2|01333358889
3|001356679
4|22444466788
5|000

The third quartile of the mark distribution is equal to:
a. 75
b. 44
c. 32
d. 37.5
e. 30
Answer: (b)

15. Thirty students wrote a statistics examination having a maximum of 50 marks. The mark distribution is given in the following stem-and-leaf plot:

0|9
1|225
2|013335889
3|00136679
4|02244478
5|0

The median mark is equal to:
a. 30.5
b. 30.0
c. 25.0
d. 28.5
e. 44.0
Answer: (a)

16. Rainwater was collected in water collectors at thirty different sites near an industrial basin and the amount of acidity (pH level) was measured. The following stem-and-leaf diagram shows the pH values, which ranged from 2.6 to 6.3.

Stem | Leaves
2 | 679
3 | 237789
4 | 1222446899
5 | 0556788
6 | 0233

The median acidity is:
a. 4.2
b. 4.4
c. 4.5
d. 4.6
e. Average of 15 and 16.
Answer: (c)

17. Refer to the previous question. Which of the following box plots is correct?
Answer: (e)

18. Refer to the previous question. The interquartile range is:
a. 7.75
b. 23.25
c. 5.625
d. 3.77
e. 1.855
Answer: (e)

19. The following is a stem plot of the birth weights of male babies born to the smoking group. The stems are in units of kg.

Stem | Leaves
2 | 3,4,6,7,7,8,8,8,9
3 | 2,2,3,4,6,7,8,9
4 | 1,2,2,3,4
5 | 3,5,5,6

The median birth weight is:
a. 13.5
b. 3.2
c. 3.5
d. 3.7
e. Average of 13 and 14.
Answer: (c)

20. Refer to the previous question. The first quartile (25th percentile) of the weights is:
a. 2.3
b. 2.7
c. .25
d. 6.5
e. 2.8
Answer: (e)

21. Which one of the following statements is FALSE?
a. Pie charts are better than bar graphs for comparing relative sizes.
b. Data that are nominal scale are presented using frequency tables.
c. Means and standard deviations of ordinal data are meaningless.
d. The scatter plot is the basic graphic tool for investigating relationships between two interval or ratio scaled variables.
e. Box plots are a good choice for comparing the distribution of values among groups.
Answer: (a)

22. Which of the following is NOT CORRECT?
a. The scatterplot is the basic graphical tool for investigating relationships between two continuous interval or ratio scaled variables.
b. The frequency table is useful for summarizing data from a nominal scaled variable.
c. Means and standard deviations of nominal or ordinal scaled variables are useful summary measures.
d. Pie charts don't perform well because people have difficulty in accurately quantifying angles.
e. Boxplots perform well for comparing groups because it is relatively straightforward to see how the mean and median change over the groups.
Answer: (c)

23. The following is a display comparing the favorite TV shows, selected from a specified set, by gender. Each person had to select one preferred show from the three shows given below:

              <--------------------- Percent ------------------->
Sample            10   20   30   40   50   60   70   80   90  100
Size          |----|----|----|----|----|----|----|----|----|----|
100  Male:    sssssssssssssssssssssfffffffffffffffbbbbbbbbbbbbbbb
300  Female:  sssssssssssffffffffffffffffffffffffffffffbbbbbbbbbb

ssss = Star Trek; ffff = Friends; bbbb = Baywatch

Which of the following is FALSE?
a. A greater percentage of males have Star Trek as their favorite show than females.
b. More females (in absolute number) selected Baywatch as their favorite show than males.
c. About 1/5 of females surveyed selected Star Trek as their favorite show from the three given.
d. The modal favorite show (from the three specified) for males is Star Trek.
e. About twice as many females (in relative numbers) enjoy Star Trek than males.
Answer: (e)

24. An experiment was conducted to investigate the effect of a new weed killer in suppressing weed germination in onion crops. Two chemicals were used: the standard weed killer (C) and the new chemical (W). Both chemicals were tested at high and low concentrations, and the % weed germination was measured on each of 50 plots for each treatment combination. Here are some box plots of the results, where the whiskers extend to the min and max of the data.

[Figure: Box plots of % weed germination on a 0 to 50 scale for W-low conc., C-low conc., W-high conc. and C-high conc.]

Which of the following is NOT a feature of this data?
a. At either high or low concentrations, the new chemical (W) gives better control of weed germination than the control (C).
b. Fewer weeds germinate at higher concentrations of both chemicals.
c. The results from the control chemical are less variable than the new chemical.
d. High or low concentrations of either chemical have approximately the same effects on weed germination.
e. Some of the results from the low concentration of weed killer W have fewer weeds germinating than some of the results from the high concentration of W.
Answer: (d)
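Several of the questions above (14 to 16 and 18 to 20) ask for a median or quartile read off a stem-and-leaf plot. Textbooks use slightly different percentile conventions; the Python sketch below assumes the (n + 1)p position rule with linear interpolation, which is one common choice and not necessarily the one behind every answer key. The helper names are hypothetical. With the data from the questions it gives 44 for question 14, 30.5 for question 15, 4.5 for question 16, 3.5 for question 19 and 2.8 for question 20; for question 18 it gives an interquartile range of about 1.85 (3.775 to 5.625), closest to option e.

```python
# Stem-and-leaf data copied from questions 14, 15, 16 and 19: (stem, leaf digits).
MARKS_40 = [(0, "28"), (1, "2245"), (2, "01333358889"), (3, "001356679"),
            (4, "22444466788"), (5, "000")]
MARKS_30 = [(0, "9"), (1, "225"), (2, "013335889"), (3, "00136679"),
            (4, "02244478"), (5, "0")]
PH = [(2, "679"), (3, "237789"), (4, "1222446899"), (5, "0556788"), (6, "0233")]
BIRTH_WEIGHT_KG = [(2, "346778889"), (3, "22346789"), (4, "12234"), (5, "3556")]


def expand_stem_leaf(rows, divisor=1):
    """Turn (stem, leaves) pairs into a sorted list of values.

    Each leaf is one digit, so a value is (10 * stem + leaf) / divisor;
    use divisor=10 when the leaf is the first decimal place (2|6 = 2.6).
    """
    values = [(10 * stem + int(leaf)) / divisor for stem, leaves in rows for leaf in leaves]
    return sorted(values)


def percentile(values, p):
    """p-th quantile (0 < p < 1) of sorted values, (n + 1)p rule with interpolation."""
    n = len(values)
    position = (n + 1) * p          # 1-based position in the ordered data
    lower = int(position)           # whole part of the position
    fraction = position - lower     # how far to move towards the next value
    if lower < 1:
        return values[0]
    if lower >= n:
        return values[-1]
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


marks40 = expand_stem_leaf(MARKS_40)
marks30 = expand_stem_leaf(MARKS_30)
ph = expand_stem_leaf(PH, divisor=10)
weights = expand_stem_leaf(BIRTH_WEIGHT_KG, divisor=10)

print("Q14 third quartile:", percentile(marks40, 0.75))
print("Q15 median:", percentile(marks30, 0.5))
print("Q16 median pH:", round(percentile(ph, 0.5), 3))
print("Q18 IQR:", round(percentile(ph, 0.75) - percentile(ph, 0.25), 3))
print("Q19 median weight:", round(percentile(weights, 0.5), 3))
print("Q20 first quartile:", round(percentile(weights, 0.25), 3))
```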

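Questions 9 to 12 rely on two small calculations: the approximate number of classes from 1 + 3.3 log10(n) (Sturges' rule), and the class interval implied by each midpoint listed in a MIDPOINTS= option. The Python sketch below covers only that arithmetic, with hypothetical helper names; how SAS PROC CHART treats values falling outside the outer intervals is not modelled here and should be checked against the SAS documentation.

```python
import math


def sturges_classes(n):
    """Approximate number of histogram classes, k = 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)


def classes_from_midpoints(first, last, step):
    """Nominal class intervals for equally spaced integer midpoints.

    Each class extends half a class width (step / 2) either side of its
    midpoint, e.g. midpoint 90 with step 30 gives the interval 75-105.
    """
    half = step / 2
    return [(m - half, m + half) for m in range(first, last + 1, step)]


# Question 10: tracts A (75 trees) and B (210 trees).
for tract, n in [("A", 75), ("B", 210)]:
    print(f"Tract {tract}: about {sturges_classes(n):.1f} classes")

# Question 11: MIDPOINTS=90 TO 300 BY 30
print(classes_from_midpoints(90, 300, 30))
# Question 12: MIDPOINTS=55 TO 95 BY 10
print(classes_from_midpoints(55, 95, 10))
```

For the two tracts the rule gives roughly 7.2 and 8.7 classes; MIDPOINTS=90 TO 300 BY 30 corresponds to nominal intervals from 75-105 up to 285-315, and MIDPOINTS=55 TO 95 BY 10 starts with 50-60 rather than 55-65.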
