
Chapter 1

What is Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data. It deals with large sets of numbers or with populations, which may consist not only of people but also of animals or things within a defined group or area. The term is also used for the graphs, tables, averages, and other measures that summarize sets of data. Statistics has two phases: descriptive statistics and inferential statistics. Descriptive statistics focuses on collecting, organizing, and presenting data. Inferential (or analytical) statistics builds on descriptive statistics; its main concerns are analyzing and interpreting the data.

Application of Statistics:
- Education
- Psychology
- Economics
- HRM (food-tasting statistics)
- Sports
- Government (NSO)
- Social
- Medicine
- Telecommunications
- Sociology
- Engineering
- Etc.


Basic Concepts
A. Several Important Terms:

Population
- must be universal.
- the entire collection of all possible observations.

Sample
- must be a subset.
- obtained by random sampling.
- a subset of the statistical population from which it is derived.

Variable
- a property or characteristic that is observed or measured.
- any information measurable on every element of the population.

Census
- the process of collecting information from a population.

Survey
- the process of collecting information from a sample.

Parameter
- a measure used to describe a population.
- usually denoted by a lowercase Greek letter.

Variables
A variable is a quantity characterized from a sample or population that assumes a succession of observed values. Equivalently, a variable is any information measurable on every element of the population. Variables can be classified as either quantitative or qualitative.

A qualitative variable is not expressed numerically, because it differs in kind rather than in degree among elementary units. Qualitative data provide labels, or names, for categories of like items. A qualitative variable can be dichotomous or multinomial.
a) Dichotomous variable: has exactly two categories. E.g. gender: male or female.
b) Multinomial variable: has more than two categories. E.g. color, job title, religion, etc.

A quantitative variable is normally expressed numerically because it differs in degree rather than in kind among the members of the group. Quantitative data measure either how much or how many of something. A quantitative variable can be discrete or continuous.

a) Discrete variable: produces numerical responses that arise from count data. E.g. number of children in the family, number of teachers in a faculty, number of schools in Manila, etc.
b) Continuous variable: takes on numerical responses that arise from measured data. E.g. weight, height, volume, etc.

Scales of Measurement

a) Nominal: values are simply labels, names, or categories without any explicit or implicit ordering. It is the lowest level of measurement, also known as the categorical scale. E.g. gender, student number, color, etc.
b) Ordinal: values are labels, names, or categories with an explicit or implicit ordering. The data can be ranked, but the distance between two labels cannot be determined. E.g. student classification, job position, military ranks, etc.
c) Interval: values can be ordered, and they may be added or subtracted but not meaningfully multiplied or divided. The interval scale has all the features of ordinal measurement, and in addition the differences between values are meaningful.
d) Ratio: possesses the characteristics of ordinal and interval data, and in addition the ratios between values are meaningful. All types of arithmetic operations can be performed with such data because these numbers have a natural or true zero point that denotes the complete absence of the characteristic they measure, thus making the ratio of any two such numbers independent of the unit of measurement.

Source of Data
a) Primary data is obtained personally from primary sources such as government agencies.

b) Secondary data is obtained from secondary sources such as newspapers and magazines.

Collection of Data
- Direct or Interview
- Indirect or Questionnaire
- Observation
- Experimentation
- Use of documents or any existing records

Qualities of a Good Questionnaire


1. Must be clear
2. Structured in a way that the interviewee answers the questions easily
3. Units of measure must be specified
4. Questions must be specific
5. Not too long

Methods of Presenting Data


a) Tabular form is used when the data are presented in rows and columns, arranged in ascending or descending order of some measure. e.g.

Year    Profit (million pesos)
2000     5.0
2001     4.7
2002     6.9
2003     7.0
2004     7.1
2005    10.6
2006    12.8
2007    15.1

b) Textual form is utilized when the data to be presented are purely qualitative. e.g.

Profit of Elemental Corporation from 2000 to 2007
In 2000, profit reached 5 million pesos, but it went down to 4.7 million pesos the next year because of the calamities that took place. With the introduction of the new marketing system, profit went up to 6.9 million pesos in 2002. During the next two years, 2003 and 2004, profit increased by only 0.1 million pesos per year. The revision of management's marketing system, introduced at the beginning of 2005, gave the best increase in profit, 3.5 million pesos (from 7.1 to 10.6 million pesos), and the better increases were maintained in the subsequent years.

c) Graphic form is the most popular way of presenting statistical data. It presents the information from a table in condensed, visual form.

Chapter 2
Methods of Data Organization
Array: a listing of the values in either ascending or descending order.

Frequency Distribution: a grouping of the values into mutually exclusive classes, showing the number of observations occurring in each class.

Some Basic Definitions


a.) Class Boundaries: reflect the continuous property of the data. They are also called absolute class limits. If the class mark $X$ and the class width $i$ are given, the lower and upper class boundaries may be obtained using the formulas $LCB = X - \frac{i}{2}$ and $UCB = X + \frac{i}{2}$.

b.) Class Mark (X): the midpoint of the class interval, obtained by averaging the lower and upper class boundaries or, equivalently, the lower limit (LL) and the upper limit (UL): $X = \frac{LL + UL}{2}$

c.) Relative Frequency: the frequency of a class expressed as a proportion of the total number of observations.
d.) Less than cumulative frequency: the total number of observations whose values do not exceed the upper limit of the class.
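As a rough illustration of the class mark and class boundary definitions, the following Python sketch works through a small set of classes. The class limits are made-up values and the function names are hypothetical; the sketch assumes integer limits, so adjacent classes are one unit apart and each boundary lies half a unit beyond its limit.

```python
# Hypothetical class limits (lower limit, upper limit), assumed to be integers.
class_limits = [(10, 14), (15, 19), (20, 24), (25, 29)]


def class_mark(lower, upper):
    """Midpoint of a class: the average of its lower and upper limits."""
    return (lower + upper) / 2


def class_boundaries(lower, upper, gap=1):
    """Boundaries lie half of the gap between adjacent classes beyond each limit."""
    return (lower - gap / 2, upper + gap / 2)


class_marks = [class_mark(ll, ul) for ll, ul in class_limits]
boundaries = [class_boundaries(ll, ul) for ll, ul in class_limits]

for (ll, ul), x, (lcb, ucb) in zip(class_limits, class_marks, boundaries):
    print(f"{ll}-{ul}: class mark = {x}, boundaries = {lcb} to {ucb}")
```

With these classes the width is 5, so each boundary is also the class mark plus or minus 2.5, which is a quick way to check the results.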

Other Frequency Distributions


a.) Relative Frequency Distribution: obtained by multiplying the quotient of the frequency and the total frequency (n or N) by 100: $RF\% = \frac{f}{n} \times 100$

b.) Cumulative Frequency Distribution: has two kinds, less than (<cf) and greater than (>cf). The less-than cumulative frequency distribution is obtained by accumulating the frequencies starting from the lowest class, while the greater-than cumulative frequency distribution is obtained by accumulating the frequencies starting from the highest class.

c.) Relative Cumulative Frequency Distribution: obtained by expressing each cumulative frequency in percent form: $RCF\% = \frac{cf}{n} \times 100$
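The three distributions described above can be sketched in a few lines of Python. The class frequencies here are invented for illustration, and the variable names are not from the original notes.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest.
frequencies = [4, 7, 6, 3]
n = sum(frequencies)  # total frequency

# Relative frequency in percent: (f / n) * 100 for each class.
relative_pct = [f * 100 / n for f in frequencies]

# Less-than cumulative frequency: accumulate from the lowest class upward.
less_than_cf = list(accumulate(frequencies))

# Greater-than cumulative frequency: accumulate from the highest class downward.
greater_than_cf = list(accumulate(frequencies[::-1]))[::-1]

# Relative cumulative frequency: each less-than cf expressed in percent.
relative_less_than_cf = [cf * 100 / n for cf in less_than_cf]

print("f   :", frequencies)
print("rf% :", relative_pct)
print("<cf :", less_than_cf)
print(">cf :", greater_than_cf)
print("<rcf:", relative_less_than_cf)
```

The last less-than cumulative frequency and the first greater-than cumulative frequency should both equal n, which is a useful check on a hand-built table.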

Graphic Presentation of Frequency Distribution


a.) Histogram: a bar graph of a frequency distribution with the class boundaries on the horizontal axis and the frequencies on the vertical axis.

b.) Frequency Polygon: a line graph of a frequency distribution with the class marks on the horizontal axis and the frequencies on the vertical axis.

c.) Ogives: the graphical presentations of the less-than and greater-than cumulative frequency distributions. The less-than ogive is constructed with the cumulative frequencies on the vertical axis and the upper class boundaries on the horizontal axis. For the greater-than ogive, the lower class boundaries are plotted on the horizontal axis.

Chapter 3
The Mean
The arithmetic mean, or simply the mean, is the sum of the observations divided by the number of observations. For a set of observed data with n observations, it is equal to the sum of the values of these observations divided by n.

Ungrouped Data: $\bar{x} = \frac{\sum X}{n}$

Grouped Data: $\bar{x} = \frac{\sum fX}{n}$, where f is the class frequency, X is the class mark, and n is the total frequency.

Properties:
a.) It always exists for quantitative variables.
b.) It is unique.
c.) It takes every item of the data into account; thus it is easily affected by extreme values.

e.g. 13, 18, 13, 14, 13, 16, 14, 21, 13


The mean is the usual average, so:

(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 135 ÷ 9 = 15
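The same calculation can be checked in Python. The ungrouped data are the nine values from the example; the grouped part uses made-up class marks and frequencies and assumes the usual grouped formula, the sum of f times X divided by the total frequency.

```python
# Ungrouped data from the worked example.
data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
mean = sum(data) / len(data)  # 135 / 9
print("ungrouped mean:", mean)

# Grouped data: hypothetical class marks (X) and frequencies (f).
class_marks = [12, 17, 22, 27]
frequencies = [4, 7, 6, 3]

# Grouped mean: sum of f * X divided by the total frequency.
grouped_mean = sum(f * x for f, x in zip(frequencies, class_marks)) / sum(frequencies)
print("grouped mean:", grouped_mean)
```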

The Median
The median, denoted by $\tilde{x}$, is the middlemost value in an ordered array of data. It is the middle value when there is an odd number of items in the array. For an even number of items, the median is the mean of the two middle items.

If n is odd: $\tilde{x} = X_{(n+1)/2}$

If n is even: $\tilde{x} = \frac{X_{n/2} + X_{(n/2)+1}}{2}$

Properties:
a.) Not easily affected by extreme values
b.) It always exists and is unique

e.g. 13, 13, 13, 13, 14, 14, 16, 18, 21

There are nine numbers in the list, so the middle one is the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th number, which gives a median of 14.
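A minimal Python sketch of the odd and even cases follows. The function name `median` is chosen here for illustration; Python's standard `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th item, which is index n // 2 when counting from 0.
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th items.
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(median(data))              # 5th value of the ordered list
print(median([13, 13, 14, 16]))  # even case: mean of the 2nd and 3rd values
```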

The Mode
The mode is the third measure of central tendency and is the most unstable and the weakest of the three. It is defined as the value that occurs with the highest frequency and more than once, that is, the observation that occurs most frequently in the data set. It is commonly denoted by $\hat{x}$. Among the measures of central tendency in a set of data, only the mode can have more than one value or no value at all.

Properties:
a.) No calculations are required (for the ungrouped mode)
b.) It may not exist
c.) It may not be unique

e.g. 13, 13, 13, 13, 14, 14, 16, 18, 21

The mode is the number that is repeated more often than any other, so 13 is the mode.
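Because the mode may be missing or may not be unique, a small Python sketch can return a list of modes instead of a single value. The function name `modes` is hypothetical, and it follows the definition used here: a value counts as a mode only if it occurs more than once.

```python
from collections import Counter


def modes(values):
    """Return every value that has the highest frequency, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest < 2:
        return []  # no value occurs more than once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


data = [13, 13, 13, 13, 14, 14, 16, 18, 21]
print(modes(data))             # one mode
print(modes([1, 2, 3]))        # no mode
print(modes([1, 1, 2, 2, 3]))  # two modes
```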

Chapter 4
Range
Range is the difference between the highest value and the lowest value in a set of ungrouped data. For grouped data, it is the difference between the upper limit of the highest class and the lower limit of the lowest class. The range is the length of the smallest interval that contains all the data. It is calculated by subtracting the smallest observation from the greatest and provides an indication of statistical dispersion.

For Ungrouped Data: $R = HV - LV$

For Grouped Data: $R = UL_{HC} - LL_{LC}$

Where:
HV = highest value in the data set
LV = lowest value in the data set
$UL_{HC}$ = upper limit of the highest class
$LL_{LC}$ = lower limit of the lowest class
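Both versions of the range are one-line calculations. In this sketch the ungrouped data reuse the nine values from the Chapter 3 examples, while the class limits for the grouped case are invented for illustration.

```python
# Ungrouped data: highest value minus lowest value.
data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
data_range = max(data) - min(data)  # 21 - 13
print("ungrouped range:", data_range)

# Grouped data: hypothetical class limits, ordered from lowest to highest class.
class_limits = [(10, 14), (15, 19), (20, 24), (25, 29)]
upper_limit_highest = class_limits[-1][1]
lower_limit_lowest = class_limits[0][0]
grouped_range = upper_limit_highest - lower_limit_lowest  # 29 - 10
print("grouped range:", grouped_range)
```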

Standard Deviation
Another measure of the variability of a set of data is the standard deviation, which is often used in science as a measure of the precision with which an experiment has been done. It is calculated by taking the square root of the average of the squared deviations from the mean.

Variance
It is the square of the standard deviation.

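Sample variance: $s^2 = \frac{\sum (X - \bar{x})^2}{n - 1}$

Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

Formula for grouped data: $s^2 = \frac{\sum f(X - \bar{x})^2}{n - 1}$

Where:
$\bar{x}$ = mean
X = data (the class mark, for grouped data)
f = frequency
n = total number of frequencies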
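As a worked check, the sketch below computes the variance and standard deviation of the nine values used in the Chapter 3 examples. It assumes the usual conventions: the sample variance divides the sum of squared deviations by n - 1, and the population variance divides by the number of values.

```python
import math

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
sum_sq = sum((x - mean) ** 2 for x in data)

sample_variance = sum_sq / (n - 1)   # treats the data as a sample
population_variance = sum_sq / n     # treats the data as the whole population

sample_sd = math.sqrt(sample_variance)
population_sd = math.sqrt(population_variance)

print("sum of squared deviations:", sum_sq)
print("sample variance:", sample_variance, "sample sd:", round(sample_sd, 4))
print("population variance:", round(population_variance, 4), "population sd:", round(population_sd, 4))
```

Python's standard `statistics` module offers `variance`, `pvariance`, `stdev`, and `pstdev` for the same quantities, which can be used to cross-check a manual calculation.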

Coefficient of Variation
The coefficient of variation is used to compare the variability of two or more sets of data, including sets that pertain to different kinds of measurement. It expresses the standard deviation as a percentage of the mean.

Coefficient of variation of sample data: $CV = \frac{s}{\bar{x}} \times 100\%$

Coefficient of variation of population: $CV = \frac{\sigma}{\mu} \times 100\%$
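A short Python sketch of the sample coefficient of variation, assuming the standard definition (standard deviation divided by the mean, times 100). The data are the nine values from the Chapter 3 examples, and the helper name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
cv = coefficient_of_variation(data)
print(f"CV = {cv:.2f}%")
```

Because the result is a percentage with no units, it can be compared across data sets measured in different units, for example heights in centimeters against weights in kilograms.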

 

Other Descriptive Measures


Skewness
A frequency curve that is not symmetrical about the mean is said to be skewed. If it tails off to the right, we describe it as positively skewed; if it tails off to the left, we say it is negatively skewed. The relationship between the mean and the median indicates the direction of skewness. If the mean is greater than the median, we have a positively skewed curve; if the mean is less than the median, we have a negatively skewed curve. With the use of the standard deviation, it is possible to obtain a measure of skewness that indicates both the direction and the magnitude of the skewness of a frequency distribution. It is called the Pearsonian coefficient of skewness (sk), and the formula is: $sk = \frac{3(\bar{x} - \tilde{x})}{s}$, where $\bar{x}$ is the mean, $\tilde{x}$ is the median, and s is the standard deviation.

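The Pearsonian coefficient can be tried on the nine values from the Chapter 3 examples. This sketch assumes the common form of the coefficient, three times the difference between the mean and the median, divided by the sample standard deviation; the function name is hypothetical.

```python
import statistics


def pearson_skewness(values):
    """Pearsonian coefficient of skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation
    return 3 * (mean - median) / s


data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
sk = pearson_skewness(data)
print(f"sk = {sk:.4f}")
```

For these data the mean (15) is greater than the median (14), so the coefficient is positive, which matches the rule that a mean above the median indicates positive skewness.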

Kurtosis
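Kurtosis is the degree of peakedness or flatness of a distribution, usually judged relative to the normal curve. It is measured by the coefficient of kurtosis, denoted by K.

For grouped data: $K = \frac{\sum f(X - \bar{x})^4}{n s^4}$

For ungrouped data: $K = \frac{\sum (X - \bar{x})^4}{n s^4}$

Types of kurtosis: leptokurtic (more peaked than the normal curve), mesokurtic (like the normal curve), and platykurtic (flatter than the normal curve).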

Chapter 5
Probability
A probability is a numerical measure of the likelihood that an event will occur. e.g. A pair of dice is rolled 50 times and the sum is recorded each time. The results are:

4 7 6 5 10 10 10 5 6 10
9 5 7 4 4 8 5 10 4 6
5 6 11 11 3 3 6 4 7 8
8 7 7 4 10 11 3 8 8 4
3 8 7 3 7 5 4 11 9 5

To approximate the probability that the sum is equal to 6, count the number of 6's in the experiments (5) and divide by the total number of experiments (50). That is, the probability of observing a 6 is roughly the relative frequency of 6's.

PROBABILITY (SUM IS 6) is approximately (# of 6's) / (# of tosses) = 5 / 50 = 0.1

In general, the probability of an event can be approximated by the relative frequency, or proportion of times that the event occurs.

PROBABILITY (EVENT) is approximately (# of times event occurs) / (# of experiments)
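The relative-frequency estimate can be reproduced directly from the 50 recorded results. In this Python sketch the helper name `relative_frequency` is hypothetical; the data are copied from the example.

```python
# The 50 recorded results from the example.
sums = [
    4, 7, 6, 5, 10, 10, 10, 5, 6, 10,
    9, 5, 7, 4, 4, 8, 5, 10, 4, 6,
    5, 6, 11, 11, 3, 3, 6, 4, 7, 8,
    8, 7, 7, 4, 10, 11, 3, 8, 8, 4,
    3, 8, 7, 3, 7, 5, 4, 11, 9, 5,
]


def relative_frequency(outcomes, event):
    """Proportion of outcomes for which the event (a true/false function) occurs."""
    occurrences = sum(1 for outcome in outcomes if event(outcome))
    return occurrences / len(outcomes)


p_six = relative_frequency(sums, lambda s: s == 6)
p_at_least_ten = relative_frequency(sums, lambda s: s >= 10)

print("P(sum is 6) is approximately", p_six)
print("P(sum is at least 10) is approximately", p_at_least_ten)
```

If the results are sums of two fair dice, the exact probability of a sum of 6 is 5/36, about 0.139, so an estimate based on only 50 rolls can differ noticeably from the true value.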

Basic Rules in Probability


1. For any event A, $0 \le p(A) \le 1$.
2. $p(A) + p(A') = 1$, so $p(A) = 1 - p(A')$ and $p(A') = 1 - p(A)$.
3. The probability of an impossible event (an event that cannot occur) is equal to zero, that is, $p(\emptyset) = 0$.
4. For any event A comprising the outcomes $e_1, e_2, \ldots, e_n$, $p(A) = p(e_1) + p(e_2) + \cdots + p(e_n)$.
5. $p(S) = 1$, where S = sample space.
6. Addition Rule
a) For mutually exclusive events, the probability that one or the other will occur equals the sum of their probabilities. In symbols, $p(A \cup B) = p(A) + p(B)$.
b) For non-mutually exclusive events, the probability that only event A or only event B occurs, or that both A and B occur, equals the sum of their probabilities minus the probability that both A and B occur. General Addition Rule: $p(A \cup B) = p(A) + p(B) - p(A \cap B)$.
7. Multiplication Rule
a) For independent events (sampling with replacement): $p(A \cap B) = p(A) \cdot p(B)$
b) For dependent events (sampling without replacement): $p(A \cap B) = p(A) \cdot p(B/A)$

Where p(B/A) is read as the probability of B given A, which means the probability that B occurs if A occurs or has occurred.
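The addition and multiplication rules can be checked with exact fractions. The sketch below uses two assumed examples that are not from the original notes: one roll of a fair six-sided die for the addition rules, and drawing two aces from a standard 52-card deck for the multiplication rules.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (assumed example).
S = {1, 2, 3, 4, 5, 6}


def p(event):
    """Probability of an event as (outcomes in event) / (outcomes in S)."""
    return Fraction(len(event & S), len(S))


A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3
C = {1}        # roll is 1; mutually exclusive with A

# Complement rule: p(A) + p(A') = 1.
p_not_A = 1 - p(A)

# Addition rule for mutually exclusive events.
p_A_or_C = p(A) + p(C)

# General addition rule for events that can occur together.
p_A_or_B = p(A) + p(B) - p(A & B)

# Multiplication rule: two aces in two draws from a 52-card deck.
two_aces_with_replacement = Fraction(4, 52) * Fraction(4, 52)     # independent draws
two_aces_without_replacement = Fraction(4, 52) * Fraction(3, 51)  # p(A) * p(B/A)

print("p(A') =", p_not_A)
print("p(A or C) =", p_A_or_C)
print("p(A or B) =", p_A_or_B)
print("two aces, with replacement:", two_aces_with_replacement)
print("two aces, without replacement:", two_aces_without_replacement)
```

Using `Fraction` keeps every probability exact, so each result can be compared directly with a count of the outcomes in the combined event.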

Chapter 6
The Normal Distribution
The normal distribution is the most important and most commonly used distribution in the entire field of statistics because it provides the foundation for many other statistical analyses that have been developed.

Graph of Normal Distribution

[Figure: normal curve showing the areas within 1, 2, and 3 standard deviations of the mean: 68.3%, 95.5%, and 99.7%]

Properties of the normal curve:
1. The graph of the normal distribution is a bell-shaped curve with the highest point over the mean.
2. The curve is symmetrical about the vertical axis through the mean.
3. The tails or ends of the curve are asymptotic to the horizontal axis.
4. The mean, median, and mode are equal under the normal curve.
5. The area under the normal curve is equal to one, or 100%.
6. The normal curve may be divided into 3 parts on each side of the vertical axis through the mean. Each part is one standard deviation wide, measured as +1 from the mean to the first part on the right and -1 from the mean to the first part on the left.
7. Regardless of the values of the mean and standard deviation, the area that lies within 1 standard deviation of the mean is 68.3%, within 2 standard deviations of the mean is 95.5%, and within 3 standard deviations of the mean is 99.7%.
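The areas in property 7 can be checked with `NormalDist` from Python's standard `statistics` module. The helper name `area_within` and the example mean of 100 and standard deviation of 15 are assumptions made for illustration.

```python
from statistics import NormalDist


def area_within(k, dist):
    """Area under the normal curve within k standard deviations of the mean."""
    upper = dist.mean + k * dist.stdev
    lower = dist.mean - k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


standard = NormalDist()               # mean 0, standard deviation 1
other = NormalDist(mu=100, sigma=15)  # hypothetical mean and standard deviation

areas = {k: area_within(k, standard) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} sd: {area * 100:.1f}% (standard), "
          f"{area_within(k, other) * 100:.1f}% (mean 100, sd 15)")
```

The two columns agree because the areas depend only on the number of standard deviations from the mean, not on the particular mean or standard deviation. The two-standard-deviation area is about 95.45%, which some texts round to 95.4% and others to 95.5%.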

Chapter 7

Hypothesis Testing
Hypothesis testing is one of the most important tools for applying statistics to real-life problems. Often, before making a decision, a study or analysis of the problem situation must be done first to avoid failure, especially when the decision concerns business. In attempting to reach a decision, it is useful to make an educated guess or assumption about the population involved. Such an educated guess or assumption is called a statistical hypothesis, and the process of evaluating it is called statistical analysis or statistical testing.

A statistical hypothesis is an assumption or educated guess concerning one or more population parameters of a distribution. The hypothesis may or may not be accepted on the basis of information taken from a sample. The hypothesis being tested, which is set up to determine whether it can be accepted or not, is called the null hypothesis, denoted as $H_0$. It is sometimes called the statement of no difference, that is, the statement that nothing has changed. Another hypothesis is stated, opposing the null hypothesis, to provide a choice when the null hypothesis is not accepted. This is called the alternative hypothesis, denoted as $H_a$.
