
Statistics: the practice or science of collecting and analyzing numerical data in large quantities,
especially for the purpose of inferring proportions in a whole from those in a representative
sample. Also, a branch of mathematics dealing with the collection, analysis, interpretation, and
presentation of masses of numerical data.
Descriptive statistics are brief descriptive coefficients that summarize a given data set, which
can represent either the entire population or a sample. The measures used to describe the data
set are measures of central tendency and measures of variability (dispersion).
Inferential statistics: mathematical methods that use probability theory to deduce (infer) the
properties of a population from the properties of a data sample drawn from it. Inferential
statistics are also concerned with the precision and reliability of the inferences they help to draw.
Sample: A selection taken from a larger group (the "population") that is examined in order to
find out something about the larger group.

Population: The whole group from which a sample is taken.


Variables can be classified as categorical (also called qualitative) or quantitative (also called numerical).

Categorical. Categorical variables take on values that are names or labels. The color of a ball (e.g., red,
green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of categorical
variables.

Quantitative. Quantitative variables are numerical. They represent a measurable quantity. For example,
when we speak of the population of a city, we are talking about the number of people in the city - a
measurable attribute of the city. Therefore, population would be a quantitative variable.

Nominal data: categorical data that name or label an object, animal, person, or any other
non-number. Numbers may be assigned to the categories, but they serve only as labels.
Ordinal data: data that can be ordered and ranked, but not measured, such as levels of
achievement, prizes, rankings, and placements. As with nominal data, ordinal values cannot be
meaningfully added, subtracted, multiplied, or divided.

Nominal: Nominal data have no order and thus only give names or labels to various categories.

Ordinal: Ordinal data have order, but the interval between measurements is not meaningful.

Interval: Interval data have meaningful intervals between measurements, but there is no true starting
point (zero).

Ratio: Ratio data have the highest level of measurement. Both ratios and intervals between
measurements are meaningful because there is a true starting point (zero).
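Temperature is a convenient way to see the difference between interval and ratio data. The sketch below assumes two hypothetical readings: on the Celsius scale (interval) the difference between them is meaningful but their ratio is not, while on the Kelvin scale (ratio, with absolute zero as a true zero) the ratio is meaningful.

```python
# Two hypothetical temperature readings, used only for illustration.
celsius_a = 10.0
celsius_b = 20.0

# Interval scale (Celsius): the difference between readings is meaningful.
difference = celsius_b - celsius_a

# The ratio of Celsius values is NOT meaningful: 0 degrees C is not an absence of heat,
# so 20 degrees C is not "twice as hot" as 10 degrees C.
celsius_ratio = celsius_b / celsius_a


def to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin, a ratio scale whose zero is absolute zero."""
    return celsius + 273.15


# Ratio scale (Kelvin): 0 K is a true zero, so ratios of readings are meaningful.
kelvin_ratio = to_kelvin(celsius_b) / to_kelvin(celsius_a)

print(f"Difference:    {difference} degrees")
print(f"Celsius ratio: {celsius_ratio:.2f}  (misleading)")
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")
```

The Kelvin ratio is only about 1.035, far from the 2.0 suggested by the Celsius values.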
Continuous data can take any value within a range, so there are infinitely many possible values.
Discrete data are numeric data that can take only certain separate values, such as counts; the
number of possible values is finite (or countable).

A probability sampling method is any method of sampling that uses some form of random
selection. To have a random selection method, you must set up a process or procedure that
gives every unit in the population a known, nonzero probability of being chosen; in the simplest
designs, such as simple random sampling, these probabilities are equal. Humans have long
practiced various forms of random selection, such as picking a name out of a hat or drawing the
short straw. These days, computers are usually used to generate the random numbers on which
random selection is based. Common probability sampling methods include simple random,
systematic, and stratified sampling.
In non-probability sampling, subjects are chosen to be part of the sample in non-random ways.
Three common non-probability sampling methods are convenience, quota, and judgmental sampling.

Probability Sampling: Each member of the population has a known probability of being selected.

Non-probability Sampling: Members do not have a known probability of being selected, as in
convenience or voluntary response surveys.

Simple random sampling is the basic sampling technique where we select a group of subjects
(a sample) for study from a larger group (a population). Each individual is chosen entirely by
chance and each member of the population has an equal chance of being included in the sample.
Every possible sample of a given size has the same chance of selection.
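In Python, the standard library's random.sample draws members without replacement so that every subset of the requested size is equally likely, which matches the definition above. The function name, the population of ID numbers, and the fixed seed below are assumptions for illustration (the seed only makes the draw repeatable).

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members at random; each subset of size n is equally likely."""
    rng = random.Random(seed)  # independent generator so the seed affects only this draw
    return rng.sample(population, n)


# Hypothetical population: member ID numbers 1 to 1000.
population = list(range(1, 1001))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```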
Slovin's formula is used to estimate an appropriate sample size from a population of known size.
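Slovin's formula is usually written as n = N / (1 + N e^2), where N is the population size and e is the acceptable margin of error (for example, 0.05 for 5%). A minimal sketch, assuming the common convention of rounding up to the next whole subject; the function name is illustrative.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e**2), rounded up to a whole number."""
    if population_size <= 0 or not 0 < margin_of_error < 1:
        raise ValueError("expected N > 0 and 0 < e < 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)  # a fraction of a subject cannot be sampled, so round up


print(slovin_sample_size(1000, 0.05))
```

For N = 1000 and e = 0.05, the formula gives 1000 / 3.5, about 285.7, so roughly 286 subjects.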

Systematic sampling is a method of choosing a random sample from among a larger population. The process
typically involves first selecting a starting point in the larger population (often chosen at random within the
first interval) and then obtaining subsequent observations using a constant interval between samples. Hence,
if the total population was 1,000, a systematic sample of 100 data points within that population would
involve observing every 10th data point.
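The procedure above can be sketched as follows, assuming the sampling interval k is the population size divided by the sample size (rounded down) and the starting point is drawn at random from the first k units. The function name and example population are illustrative.

```python
import random


def systematic_sample(population, n, seed=None):
    """Take every k-th member after a random start, where k = len(population) // n."""
    k = len(population) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return population[start::k][:n]  # trim in case N is not an exact multiple of n


# Hypothetical population of 1,000 numbered units; sample 100 of them (k = 10).
population = list(range(1, 1001))
sample = systematic_sample(population, 100, seed=1)
print(sample[:5])
```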
Stratified sampling is a probability sampling technique wherein the researcher divides the entire
population into different subgroups or strata, then randomly selects the final subjects
proportionally from the different strata.
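A sketch of stratified sampling with proportional allocation, assuming each stratum's share of the sample equals its share of the population and members within a stratum are chosen by simple random sampling. The function name, strata, and counts are hypothetical. Because each stratum's allocation is rounded, the total can differ slightly from the requested size when the proportions do not divide evenly.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, total_n, seed=None):
    """Proportional stratified sample: each stratum gets round(total_n * its share)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # group units by stratum
    sample = []
    for members in strata.values():
        k = round(total_n * len(members) / len(units))  # proportional allocation
        sample.extend(rng.sample(members, k))  # simple random sample within the stratum
    return sample


# Hypothetical population: 600 first-year, 300 second-year, 100 third-year students.
students = (
    [(i, "first") for i in range(600)]
    + [(i, "second") for i in range(600, 900)]
    + [(i, "third") for i in range(900, 1000)]
)
sample = stratified_sample(students, stratum_of=lambda s: s[1], total_n=50, seed=7)
```

With 50 students requested, the allocation is 30, 15, and 5 from the three years.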
Cluster sampling is a sampling method in which the population is divided into groups (clusters),
some clusters are selected at random, and every member of each selected cluster is included in the
sample. This differs from stratified sampling in that entire clusters are used as the sample, rather
than randomly selected members from every group.
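A minimal sketch of one-stage cluster sampling, in which whole clusters are chosen at random and every member of each chosen cluster is kept. The function name, classrooms, and pupils are hypothetical.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random, keep every member."""
    chosen = random.Random(seed).sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical clusters: five classrooms and their pupils.
classrooms = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1", "d2"],
    "E": ["e1", "e2", "e3"],
}
selected = cluster_sample(classrooms, 2, seed=3)
sample = [pupil for pupils in selected.values() for pupil in pupils]
print(sorted(selected), sample)
```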

Methods of Data Collection


For this tutorial, we will cover four methods of data collection.

Census. A census is a study that obtains data from every member of a population. In most studies, a
census is not practical, because of the cost and/or time required.

Sample survey. A sample survey is a study that obtains data from a subset of a population, in order to
estimate population attributes.

Experiment. An experiment is a controlled study in which the researcher attempts to understand cause-and-effect
relationships. The study is "controlled" in the sense that the researcher controls (1) how subjects are assigned
to groups and (2) which treatments each group receives.
In the analysis phase, the researcher compares group scores on some dependent variable. Based on the
analysis, the researcher draws a conclusion about whether the treatment (independent variable) had a
causal effect on the dependent variable.

Observational study. Like experiments, observational studies attempt to understand cause-and-effect
relationships. However, unlike in an experiment, the researcher is not able to control (1) how subjects are
assigned to groups and/or (2) which treatments each group receives.

Different Ways of Data Presentation


FREQUENCY DISTRIBUTION:

Data can be presented in various forms depending on the type of data collected. A frequency distribution is a
table showing how often each value (or set of values) of the variable in question occurs in a data set. A
frequency table is used to summarize categorical or numerical data. Frequencies are also presented as relative
frequencies, that is, the percentage of the total number in the sample.
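A frequency table with relative frequencies can be built with collections.Counter from Python's standard library. The function name and the color observations below are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency, relative frequency in percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, 100 * count / total) for value, count in sorted(counts.items())]


# Hypothetical categorical data: colors of balls drawn from a bag.
colors = ["red", "blue", "red", "green", "red", "blue", "green", "red"]
for value, frequency, percent in frequency_table(colors):
    print(f"{value:<6} {frequency:>3} {percent:6.1f}%")
```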

GRAPHICAL METHODS:
Frequency distributions are usually illustrated graphically by plotting various types of graphs:
Bar graph - A bar graph is a way of summarizing a set of categorical data. It displays the data using a
number of rectangles, of the same width, each of which represents a particular category. Bar graphs can
be displayed horizontally or vertically and they are usually drawn with a gap between the bars
(rectangles).
Histogram - A histogram is a way of summarizing data that are measured on an interval scale (either
discrete or continuous). It is often used in exploratory data analysis to illustrate the features of the
distribution of the data in a convenient form.
Pie chart - A pie chart is used to display a set of categorical data. It is a circle, which is divided into
segments. Each segment represents a particular category. The area of each segment is proportional to the
number of cases in that category.
Line graph - A line graph is particularly useful when we want to show the trend of a variable over time.
Time is displayed on the horizontal axis (x-axis) and the variable is displayed on the vertical axis (y-axis).
1. Measures of central tendency:

Measures of central tendency are measures of the location of the middle or the center of a distribution.
The most frequently used measures of central tendency are the mean, median and mode.

What are the measures of central tendency?

A measure of central tendency (also referred to as measures of centre or central location) is a summary measure that
attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution.
There are three main measures of central tendency: the mode, the median and the mean. Each of these measures describes a
different indication of the typical or central value in the distribution.

What is the mode?


The mode is the most commonly occurring value in a distribution.
Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
This table shows a simple frequency distribution of the retirement age data.
Age (years)    Frequency
54             3
55             1
56             1
57             2
58             2
60             2

The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.
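Python's statistics module (3.8 and later) provides multimode, which returns every most-common value, so it also reports bimodal and multimodal distributions. Applied to the retirement-age data above, plus two small illustrative lists:

```python
from statistics import multimode

retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
print(multimode(retirement_ages))  # [54]

# The mode also works for categorical data (hypothetical dog breeds).
print(multimode(["collie", "terrier", "collie", "shepherd"]))  # ['collie']

# A bimodal example: two values tie for the highest frequency.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```

When every value is different, multimode returns all of them, which is another way of seeing that such data have no useful mode.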

Advantage of the mode:


The mode has an advantage over the median and the mean because it can be found for both numerical and categorical (non-numerical) data.

Limitations of the mode:


There are some limitations to using the mode. In some distributions, the mode may not reflect the centre of the distribution very
well. When the distribution of retirement age is ordered from lowest to highest value, it is easy to see that the centre of the
distribution is 57 years, but the mode is lower, at 54 years.
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
It is also possible for there to be more than one mode for the same distribution of data (bimodal or multimodal). The
presence of more than one mode can limit the ability of the mode to describe the centre or typical value of the distribution,
because a single value to describe the centre cannot be identified.
In some cases, particularly where the data are continuous, the distribution may have no mode at all (i.e. if all values are
different).
In cases such as these, it may be better to use the median or mean, or to group the data into appropriate intervals and
find the modal class.
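Finding the modal class can be sketched by assigning each value to an equal-width interval and counting. The function name, interval width, starting point, and height measurements below are assumptions for illustration.

```python
from collections import Counter


def modal_class(values, width, start):
    """Return the interval (low, high) containing the most values.

    Intervals are [start, start + width), [start + width, start + 2*width), ...
    If several intervals tie, the one reached first in the data is returned.
    """
    counts = Counter(int((v - start) // width) for v in values)
    index, _ = counts.most_common(1)[0]
    low = start + index * width
    return low, low + width


# Hypothetical continuous data (heights in cm) with no repeated values, so no ordinary mode.
heights = [161.2, 163.5, 165.1, 166.8, 167.4, 168.9, 172.3, 175.6]
print(modal_class(heights, width=5, start=160))  # (165, 170)
```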

What is the median?


The median is the middle value in a distribution when the values are arranged in ascending or descending order.
The median divides the distribution in half (there are 50% of observations on either side of the median value). In a distribution
with an odd number of observations, the median value is the middle value.
Looking at the retirement age distribution (which has 11 observations), the median is the middle value, which is 57 years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value is the mean of the two middle values. In the
following distribution, the two middle values are 56 and 57, therefore the median equals 56.5 years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
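The rule above can be sketched directly and checked against statistics.median from Python's standard library, using both example distributions. The helper name middle_value is illustrative.

```python
from statistics import median


def middle_value(values):
    """Median by the rule above: sort, then take the middle value (odd count)
    or the mean of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
even_ages = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
print(middle_value(odd_ages), median(odd_ages))    # 57 57
print(middle_value(even_ages), median(even_ages))  # 56.5 56.5
```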

Advantage of the median:


The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency
when the distribution is not symmetrical.

Limitation of the median:


The median cannot be identified for nominal categorical data, as such data cannot be logically ordered.

What is the mean?


The mean is the sum of the value of each observation in a dataset divided by the number of observations. This is also
known as the arithmetic average.
Looking at the retirement age distribution again:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623) and dividing by the
number of observations (11), which gives approximately 56.6 years.
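The same arithmetic in Python, checked against statistics.mean from the standard library:

```python
from statistics import mean

retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
total = sum(retirement_ages)            # 623
average = total / len(retirement_ages)  # 623 / 11
print(total, round(average, 1))         # 623 56.6
print(round(mean(retirement_ages), 1))  # 56.6
```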

Advantage of the mean:


The mean can be used for both continuous and discrete numeric data.

Limitations of the mean:


The mean cannot be calculated for categorical data, as the values cannot be summed.
As the mean includes every value in the distribution, it is influenced by outliers and skewed distributions.
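To see the effect of an outlier, the sketch below replaces one retirement age of 60 with a hypothetical value of 90 (for example, a data-entry error) and compares the mean and median before and after.

```python
from statistics import mean, median

retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
with_outlier = retirement_ages[:-1] + [90]  # hypothetical: last 60 recorded as 90

print(round(mean(retirement_ages), 1), median(retirement_ages))  # 56.6 57
print(round(mean(with_outlier), 1), median(with_outlier))        # 59.4 57
```

The mean rises by about 2.7 years while the median does not change.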

What else do I need to know about the mean?

The population mean is indicated by the Greek symbol μ (pronounced mu). When the mean is calculated from a sample, it is
indicated by the symbol x̄ (pronounced x-bar).
