
Basic Statistical Concepts and Methods

Ahmed-Refat AG Refat
FOM-ZU
Ahmed-Refat-ZU

Definition of Statistics
Statistics is the science of dealing with numbers.
It is used for the collection, summarization, presentation and analysis of data.
Statistics provides a way of organizing data to get information on a wider and more formal (objective) basis than relying on personal experience (subjective).

Uses of medical statistics
Medical statistics are used in:
1- Planning, monitoring and evaluating community health care programs.
2- Epidemiological research studies.
3- Diagnosis of community health problems.
4- Comparison of health status and diseases in different countries, and in one country over the years.
5- Forming standards for the different biological measurements, such as weight and height.
6- Differentiating between diseased and normal groups.

Types of data
Any aspect of an individual that is measured is called a variable. Variables are either:
1- Quantitative or 2- Qualitative.

1- Quantitative data
Quantitative data are numerical data, of two types:
A- Discrete data: usually whole numbers, such as the number of cases of a certain disease or the number of hospital beds (no decimal fraction).
B- Continuous data: measurements on a continuous scale, e.g. height, weight, age (a decimal fraction can be present).

2- Qualitative data
Qualitative data are non-numerical data, subdivided into two types:
A- Categorical data: purely descriptive data that imply no ordering of any kind, such as sex or area of residence.
B- Ordinal data: data that imply some kind of ordering, such as:
Level of education
Socio-economic status
Degree of severity of disease

Presentation Of Data
The first step in statistical analysis is to present the data in a way that is easy to understand.
The two basic ways of presenting data are:
1. Tabular presentation
2. Graphical presentation

Tabulation
Some rules for constructing tables:
1- The table must be self-explanatory.
2- Title: written at the top of the table to define precisely the content, the place and the time.
3- Clear headings for the columns and rows, with units of measurement.
4- The size of the table depends on the number of classes, usually between 2 and 10 rows or classes. The choice depends on the form of the data and the requirements of the distribution. Too few classes may obscure some information, and too many will not differ from the raw data.

Types of tables
For qualitative data, draw a simple table, e.g. a list table: count the number of observations (frequencies) in each category.
For quantitative data, we have to form a frequency distribution table.

List tables (2 columns, one value for each measured variable)
Frequency distribution tables

Types of tables: List
A table consisting of two columns, the first giving an identification of the observational unit and the second giving the value of the variable for that unit.
Example: number of patients in each hospital department

Department       Number of patients
Medicine         100
Surgery          80
ENT              28
Ophthalmology    30

Frequency Distribution tables
FDTs are used for the presentation of qualitative (and quantitative discrete) data, by recording the number of observations in each category.
These counts are called frequencies.
No classes, no intervals.

Frequency Distribution tables
An FDT for quantitative continuous data consists of a series of classes (intervals) together with the number of observations (frequency) whose values fall within the interval of each class.

Frequency Distribution tables
EXAMPLE (1): Assume we have a group of 20 individuals whose blood groups were as follows: A, AB, AB, O, B, A, A, B, B, AB, O, AB, AB, A, B, B, B, A, O, A. We want to present these data in a table.
What type of data is this?

How to Construct a Frequency Distribution table
Four steps (title, table, number, %):
1- Put a title
2- Draw columns and rows
3- Enumerate the individuals in each category
4- Calculate the relative frequency (%)

How to Construct a Frequency Distribution table
Four steps
1- Put a title, e.g. "Distribution of the studied individuals according to their blood group".
2- Draw a table (columns and rows):
First column heading > studied variable (blood group)
2nd column heading > frequency (number)
3rd column heading > percentage (%)

Frequency Distribution tables
3- Enumerate the individuals in each blood group, i.e. individuals with blood group A are 6, those with blood group B are 6, AB are 5 and blood group O are 3.
Make sure that the total number of individuals in all blood groups is 20 (the number of the studied group).

Frequency Distribution tables
4- Calculate the relative frequency (%) of each blood group by dividing the frequency of that group by the total number of individuals and multiplying by 100,
i.e. the percentage of group A = 6/20 x 100 = 30%, and likewise group B = 6/20 x 100 = 30%, group AB = 5/20 x 100 = 25% and group O = 3/20 x 100 = 15%. The final table will be:

Blood group    Number    %
A              6         30
B              6         30
AB             5         25
O              3         15
Total          20        100
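The four steps for Example (1) can also be checked with a short program. This is only a minimal sketch in Python, not part of the original slides; the function name `frequency_table` is an illustrative choice.

```python
from collections import Counter

# Blood groups of the 20 individuals listed in Example (1).
blood_groups = ["A", "AB", "AB", "O", "B", "A", "A", "B", "B", "AB",
                "O", "AB", "AB", "A", "B", "B", "B", "A", "O", "A"]


def frequency_table(values):
    """Return {category: (frequency, percentage)} for a list of observations."""
    counts = Counter(values)          # step 3: enumerate each category
    total = len(values)
    # step 4: relative frequency = frequency / total x 100
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


table = frequency_table(blood_groups)
print("Blood group  Number  %")
for group in ("A", "B", "AB", "O"):
    n, pct = table[group]
    print(f"{group:<13}{n:<8}{pct:.0f}")
```

The same function works for any qualitative or discrete variable, because it needs no classes or intervals.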

Frequency Distribution tables
What is your conclusion?

Frequency Distribution tables
We can conclude from this table that blood groups A and B are the most common groups and the rarest is group O (judging by the percentage of each group).
So presenting data in a table helps in deducing facts and simplifies the information compared with raw data.

Frequency Distribution tables
EXAMPLE (3): The following data are systolic blood pressure measurements (mmHg) of 30 patients with hypertension. Present these data in a frequency table:
150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178, 195, 200, 180, 156, 173, 188, 173, 189, 190, 177, 186, 177, 174, 155, 164, 163, 172, 160.
What type of data is this?

Frequency Distribution tables
Four steps
1- Put a title, e.g. "Frequency distribution of blood pressure measurements (mmHg) among a group of hypertensive patients".
2- Draw a table (columns and rows):
First column heading > studied variable (blood pressure, mmHg)
2nd column heading > frequency (number)
3rd column heading > percentage (%)

Frequency Distribution tables
3- In the first column we have to classify blood pressure into categories or classes, because we have a large sample (N = 30) and the measured variable is of continuous type (not discrete as in the previous examples).

Frequency Distribution tables
Construction of classes
1- Calculate the range of the observations: subtract the lowest blood pressure value from the highest (the highest was 200 and the lowest was 150), so the difference is 50.
2- Determine the number of classes and the width of the class intervals: let the class interval be 10, so we will have 50/10 = 5 classes.
3- Enumerate the frequency by the tally method.
4- Calculate the exact frequency and the relative frequency.

Frequency Distribution tables
Construction of classes
Determine the number of classes you want to display (not too few, about 2, and not too many, more than 8). It is a matter of trial and judgment.
Let the class interval = 10 mmHg; we will have 5 classes.
If we choose 5 mmHg as the class interval (width), we will obtain 10 classes (too long a table).
We must maintain a constant width for all intervals.
Choose the upper and lower limits of the classes, starting with the lowest value, i.e. 150.
List the intervals in order, every 10.
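A minimal Python sketch of the class construction for Example (3), using the 30 readings given in that example. The slides do not say which class the highest value (200) belongs to; placing it in the last class is an assumption made here so that exactly 5 classes result. The function name `class_frequencies` is illustrative.

```python
# Systolic blood pressure readings (mmHg) from Example (3).
sbp = [150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178,
       195, 200, 180, 156, 173, 188, 173, 189, 190, 177, 186,
       177, 174, 155, 164, 163, 172, 160]


def class_frequencies(values, width):
    """Tally values into equal-width classes starting at the lowest value."""
    lowest, highest = min(values), max(values)
    # Number of classes = range / class width (assumes it divides evenly).
    n_classes = (highest - lowest) // width
    counts = [0] * n_classes
    for v in values:
        # Assumption: the highest value is counted in the last class.
        index = min((v - lowest) // width, n_classes - 1)
        counts[index] += 1
    return lowest, counts


lowest, counts = class_frequencies(sbp, 10)
for i, f in enumerate(counts):
    print(f"{lowest + i * 10}-   {f:>2}   {100 * f / len(sbp):.1f}%")
```

Calling `class_frequencies(sbp, 5)` instead gives the 10-class table that the slides describe as too long.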

2-Graphical Presentation
The diagram should:
Be simple
Be easy to understand
Save a lot of words
Be self-explanatory
Have a clear title indicating its content
Be fully labeled
The y axis (vertical) is usually used for frequency.

2-Graphical Presentation
Graphic presentations are used to illustrate and clarify information. Tables are essential in the presentation of scientific data, and diagrams are complementary: they summarize these tables in an easy, attractive and simple way.

Graphical Presentation
1- Bar chart
It is used for presenting discrete or qualitative data.
It represents the measured value (or %) by separated rectangles of constant width whose lengths are proportional to the frequency.
Types:
>>> Simple
>>> Multiple
>>> Component

Graphical Presentation
1- Bar chart - Simple
[Figure: simple bar chart, "Mean maternal age of three studied groups". X axis: the studied groups (group I, group II, group III). Y axis: mean age in years, 24 to 27.]

Graphical Presentation
1- Bar chart
Multiple bar chart: each observation has more than one value, represented by a group of bars. Examples: percentage of males and females in different countries, percentage of deaths from heart disease in old and young age, mode of delivery (cesarean or vaginal) in different female age groups.

Graphical Presentation
1- Bar chart - Multiple
[Figure: multiple bar chart showing males and females for cancer and anemia.]

Graphical Presentation
1- Bar chart
Component bar chart: subdivision of a single bar to indicate the composition of the total, divided into sections according to their relative proportion.

Graphical Presentation
1- Bar chart
Component bar chart:
For example, two countries are compared in their socio-economic standard of living. Each bar represents one country, the height of the bar is 100%, and it is divided horizontally into 3 components (low, moderate and high socio-economic classes), each class being represented by a different color or shading.

Graphical Presentation
1- Bar chart - Component
[Figure: component bar chart, "Comparison between Egypt and USA in socio-economic standard of living". X axis: Egypt, USA. Y axis: percentage of population, 0% to 100%. Components: low, moderate, high.]

Graphical Presentation
2- Pie diagram:
It consists of a circle whose area represents the total frequency (100%), divided into segments.
Each segment represents a proportional part of the total frequency.

Graphical Presentation
2- Pie diagram:
[Figure: pie diagram, "Percentage of causes of child death in Egypt": diarrhea 50%, chest infection 30%, congenital 10%, accident 10%.]

Graphical Presentation
3- Histogram:
It is very similar to the bar chart, with the difference that the rectangles or bars are adjacent (without gaps).
It is used for presenting a class frequency table (continuous data).
Each bar represents a class; its height represents the frequency (number of cases) and its width represents the class interval.

Graphical Presentation
3- Histogram:
[Figure: histogram, "Distribution of studied group according to their height". X axis: height in cm, classes 100-, 110-, 120-, 130-, 140-, 150-. Y axis: number of individuals, 0 to 30.]
Graphical Presentation
4- Frequency Polygon
It is derived from a histogram by connecting the midpoints of the tops of the rectangles in the histogram.
The line connecting the centers of the histogram rectangles is called the frequency polygon.
We can draw the polygon without the rectangles, so we get a simpler form of line graph.
A special type of frequency polygon is the Normal Distribution Curve.

Graphical Presentation
5- Scatter diagram
It is useful for representing the relationship between two numeric measurements, each observation being represented by a point corresponding to its value on each axis.

[Figure: scatter diagram, "Correlation between NAG and albumin creatinine ratio in a group of early diabetics". X axis: albumin creatinine ratio, 0 to 0.35. Y axis: NAG, 0 to 35.]
This scatter diagram shows a positive (direct) relationship between NAG and the albumin/creatinine ratio among diabetic patients.

In a negative correlation, the points are scattered in a downward direction, meaning that the relation between the two studied measurements is inverse, i.e. if one measure increases the other decreases, as shown in the following graph.
[Figure: scatter diagram, "Correlation between Doppler velocimetry (RI) and baby birth weight". X axis: baby weight in kg, 1.5 to 4.5. Y axis: RI, 0 to 1.]

Graphical Presentation
6- Line graph:
It is a diagram showing the relationship between two numeric variables (like the scatter diagram), but the points are joined together to form a line (either a broken line or a smooth curve).
[Figure: line graph, "Changes in body temperature of a patient after use of antibiotic". X axis: time in hours. Y axis: temperature, 36 to 39.5.]

Normal Distribution Curve

Normal Distribution curve
The NDC is a graphical presentation (frequency polygon) of any quantitative biologic variable.
The Normal Distribution Curve is the frequency polygon of a quantitative variable measured in a large number of individuals.
It is a form of presentation of the frequency distribution of biologic variables such as weight, height, hemoglobin level and blood pressure, or any continuous data.
It occupies a major role in the techniques of statistical analysis.

Characteristics of Normal Distribution curve
1- It is a bell-shaped, continuous curve.
2- It is symmetrical, i.e. it can be divided vertically into two equal halves.
3- The tails never touch the base line but extend to infinity in either direction.
4- The mean, median and mode values coincide.
5- It is described by two parameters: the arithmetic mean determines the location of the center of the curve, and the standard deviation represents the scatter around the mean.

Areas under the normal curve
X ± 1 SD (one SD on each side of the mean) = about 68% of the area.
X ± 2 SD (two SD on each side of the mean) = about 95% of the area.
X ± 3 SD (three SD on each side of the mean) = about 99.7% of the area.
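These areas can be verified numerically. The sketch below is not from the slides; it relies on the standard identity that the fraction of a normal distribution within k standard deviations of the mean equals erf(k / √2), and the function name `area_within` is illustrative.

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean ± {k} SD covers {100 * area_within(k):.1f}% of the area")
```

The rounded figures 68%, 95% and 99.7% are the ones normally quoted; the exact multiplier for 95% is 1.96 rather than 2.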

Skewed data
If we represent collected data by a frequency polygon and the resulting curve does not resemble the normal distribution curve (with all its characteristics), then these data are not normally distributed.

Causes of Skewed Curve
Not normally distributed data
The curve may be skewed to the right or to the left side.
This is because the data collected are from:
1. a certain heterogeneous group
2. or a diseased or abnormal population
Therefore the results obtained from these data cannot be applied or generalized to the whole population.

The NDC can be used to distinguish normal from abnormal measurements.
Example:
Suppose we have the NDC for hemoglobin levels in a population of normal adult males with mean ± SD = 11 ± 1.5.
We obtain a hemoglobin reading of 8.1 for an individual and want to know whether he is normal or anemic.
If this reading lies within the area under the curve covering 95% of normal values (i.e. mean ± 2 SD), he will be considered normal. If his reading is lower, he is anemic.

The normal range for hemoglobin in this example will be:
The higher level of hemoglobin: 11 + 2 (1.5) = 14.
The lower level of hemoglobin: 11 - 2 (1.5) = 8.
i.e. the normal range of hemoglobin for adult males is from 8 to 14.
Our reading (8.1) lies within the 95% of this population, therefore this individual is considered normal.
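The hemoglobin example can be written as a small check. This is a sketch only; the function names `normal_range` and `is_within_range` are illustrative, and the multiplier k = 2 follows the mean ± 2 SD rule used in the example.

```python
def normal_range(mean, sd, k=2):
    """Return (lower, upper) limits of mean ± k SD."""
    return mean - k * sd, mean + k * sd


def is_within_range(reading, mean, sd, k=2):
    """True if the reading lies inside mean ± k SD."""
    low, high = normal_range(mean, sd, k)
    return low <= reading <= high


low, high = normal_range(11, 1.5)
print(f"normal range: {low} to {high}")
print("reading 8.1 is normal:", is_within_range(8.1, 11, 1.5))
```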

Data Summarization
To summarize data, we need to use one or two parameters that can describe the data:
1. Measures of central tendency, which describe the center of the data
2. Measures of dispersion, which show how the data are scattered around its center

Measures of central tendency
A variable usually has a point (center) around which the observed values lie. These averages are also called measures of central tendency. The three most commonly used averages are:
1. The arithmetic mean
2. The median
3. The mode

1- The arithmetic mean:
The sum of the observations divided by the number of observations:
$\bar{x} = \frac{\sum x}{n}$
Where:
$\bar{x}$ = the mean
$\sum$ denotes "the sum of"
x = the values of the observations
n = the number of observations

1- The arithmetic mean:
Example: in a study, the ages of 5 students were 12, 15, 10, 17, 13.
Mean = sum of observations / number of observations
Then the mean $\bar{x}$ = (12 + 15 + 10 + 17 + 13) / 5 = 13.4 years

Calculation of Mean for Frequency Distribution Data
In the case of frequency distribution data we calculate the mean by this equation:
$\bar{x} = \frac{\sum f x}{n}$
where f = frequency.
For example: we want to calculate the mean incubation period of this group.


Calculation of Mean for Frequency Distribution Data with class intervals
If the data are presented in a frequency table with class intervals, we calculate the mean by the same equation, $\bar{x} = \frac{\sum f x_1}{n}$, where $x_1$ denotes the midpoint of the class interval.
Example: calculate the mean blood pressure of the following group:
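The table for this example did not survive in the text, so the sketch below does not reproduce it. As an assumption for illustration only, it applies the class-interval formula to the five systolic blood pressure classes of Example (3) (150-, 160-, 170-, 180-, 190-, width 10), with frequencies tallied from the 30 readings listed there. The function name `grouped_mean` is illustrative.

```python
def grouped_mean(lower_limits, width, frequencies):
    """Mean of grouped data: sum(f * class midpoint) / n."""
    midpoints = [low + width / 2 for low in lower_limits]
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n


# Assumed illustration data: classes and tallies from Example (3).
lower_limits = [150, 160, 170, 180, 190]
frequencies = [6, 6, 8, 6, 4]

mean_bp = grouped_mean(lower_limits, 10, frequencies)
print(f"mean blood pressure = {mean_bp:.1f} mmHg")
```

Because every value in a class is replaced by the class midpoint, the result approximates the mean of the raw readings rather than reproducing it exactly.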

2- Median
It is the middle observation in a series of observations after arranging them in ascending or descending order.
The rank of the median is (n + 1)/2 if the number of observations is odd, and n/2 if the number is even.

2- Median
Calculate the median of the following data: 5, 6, 8, 9, 11. Here n = 5, which is odd.
- The rank of the median = (n + 1)/2, i.e. (5 + 1)/2 = 3.
The median is the third value in this group when the data are arranged in ascending (or descending) order.
- So the median is 8 (the third value).

2- Median
- If the number of observations is even, the median is calculated as follows:
e.g. 5, 6, 8, 9
n = 4
- The rank of the median = n/2, i.e. 4/2 = 2. The median is the second value of that group. If the data are arranged in ascending order the second value is 6, and if arranged in descending order it is 8; therefore the median is the mean of both observations, i.e. (6 + 8)/2 = 7.

2- Median
For simplicity we can apply the same equation used for odd numbers, i.e. (n + 1)/2. The median rank will be (4 + 1)/2 = 2.5, i.e. the median lies between the second and the third values, 6 and 8; take their mean = 7.
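Both cases of the median rule can be sketched in a few lines of Python. The function name `median` is illustrative; Python's standard library offers `statistics.median`, which gives the same results.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the value at rank (n + 1) / 2.
        return ordered[middle]
    # Even n: the mean of the values at ranks n / 2 and n / 2 + 1.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 6, 8, 9, 11]))  # odd number of observations
print(median([5, 6, 8, 9]))      # even number of observations
```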

3- Mode
The most frequently occurring value in the data is the mode. It is found as follows:
Example: 5, 6, 7, 5, 10. The mode in these data is 5, since the number 5 is repeated twice.
Sometimes there is more than one mode, and sometimes there is no mode, especially in a small set of observations.

3- Mode
Example: 20, 18, 14, 20, 13, 14, 30, 19. There are two modes, 14 and 20.
Example: 300, 280, 130, 125, 240, 270. This set has no mode.
Unimodal, bimodal, no mode
Advantages and disadvantages of the measures of central tendency:
- Mean: it is the preferred measure of central tendency since it takes into account each individual observation, but its main disadvantage is that it is affected by extreme values.

Advantages and disadvantages of the measures of central tendency:
- Median: it is a useful descriptive measure if there are one or two extremely high or low values.
- Mode: it is seldom used.

Measures of Dispersion
The measures of dispersion describe the degree of variation, scatter or dispersion of the data around its central value (dispersion = variation = spread = scatter).
1. Range (R)
2. Variance (V)
3. Standard Deviation (SD)
4. Coefficient of Variation (CoV)

1- Range:
It is the difference between the largest and smallest values.
It is the simplest measure of variation.
Disadvantages: it is based on only two of the observations and gives no idea of how the other observations are arranged between these two.
Also, it tends to be larger as the size of the sample increases.

2- Variance
If we want the average of the differences between the mean and each observation in the data, we have to subtract each value from the mean, sum these differences and divide the sum by the number of observations:
$V = \frac{\sum (\bar{x} - x_i)}{n}$

2- Variance
$V = \frac{\sum (\bar{x} - x_i)}{n}$
The value of this equation will be equal to zero, because the differences between each value and the mean have negative and positive signs that cancel to zero on algebraic summation.

2- Variance
To overcome this zero, we square the difference between the mean and each value so that the sign will always be positive. Thus we get:
$V = \frac{\sum (\bar{x} - x_i)^2}{n - 1}$

3- Standard Deviation (SD)
The main disadvantage of the variance is that it is expressed in the square of the units used. So it is more convenient to express the variation in the original units by taking the square root of the variance. This is called the standard deviation (SD).
Therefore $SD = \sqrt{V}$,
i.e. $SD = \sqrt{\frac{\sum (\bar{x} - x_i)^2}{n - 1}}$

4- Coefficient of variation (CoV)
The coefficient of variation expresses the standard deviation as a percentage of the sample mean.
C.V. = SD / mean x 100
The C.V. is useful when we are interested in the relative size of the variability in the data.
Example: if we have the observations 5, 7, 10, 12 and 16, their mean will be 50/5 = 10.
$SD = \sqrt{\frac{25 + 9 + 0 + 4 + 36}{5 - 1}} = \sqrt{\frac{74}{4}} = 4.3$
C.V. = 4.3 / 10 x 100 = 43%

Example
Calculate the mean, variance, SD and CV from the following measurements: 5, 7, 10, 12 and 16.
Mean = (5 + 7 + 10 + 12 + 16) / 5 = 10.
Variance = (25 + 9 + 0 + 4 + 36) / (5 - 1) = 74/4 = 18.5
$SD = \sqrt{18.5} = 4.3$
C.V. = 4.3 / 10 x 100 = 43%

Example
Another set of observations is 2, 2, 5, 10 and 11. Their mean = 30/5 = 6.
$SD = \sqrt{\frac{16 + 16 + 1 + 16 + 25}{5 - 1}} = \sqrt{\frac{74}{4}} = 4.3$
C.V. = 4.3 / 6 x 100 = 71.7%
Both sets of observations have the same SD but differ in C.V., because the data in the first group are homogeneous (so the C.V. is not high), while the data in the second group are heterogeneous (so the C.V. is high).
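The comparison of the two groups can be reproduced with Python's standard `statistics` module, whose `stdev` function divides by n - 1 like the SD formula used in these slides. The helper name `describe` is illustrative.

```python
import statistics


def describe(values):
    """Return (mean, sample SD, coefficient of variation in %)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD: divides by n - 1
    cv = 100 * sd / mean
    return mean, sd, cv


for group in ([5, 7, 10, 12, 16], [2, 2, 5, 10, 11]):
    mean, sd, cv = describe(group)
    print(f"{group}: mean = {mean}, SD = {sd:.1f}, C.V. = {cv:.1f}%")
```

The two groups share the same SD, so only the C.V. reveals that the second group varies more relative to its mean.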

Example
In a study where age was recorded, the observed values were 6, 8, 9, 7, 6, and the number of observations was 5.
Calculate the mean, SD, range, mode and median.
The mean = sum of observations / their number = (6 + 8 + 9 + 7 + 6) / 5 = 7.2

Examples
The variance = sum of the squared differences (mean minus observation) / (number of observations - 1):
$V = \frac{(7.2 - 6)^2 + (7.2 - 8)^2 + (7.2 - 9)^2 + (7.2 - 7)^2 + (7.2 - 6)^2}{5 - 1} = \frac{(1.2)^2 + (-0.8)^2 + (-1.8)^2 + (0.2)^2 + (1.2)^2}{4} = \frac{6.8}{4} = 1.7$
- So the variance = 1.7

Examples
- The S.D. = $\sqrt{1.7}$ = 1.3
Range = 9 - 6 = 3
The mode is 6.
The median: first we have to arrange the data in ascending order, i.e. 6, 6, 7, 8, 9.
The rank of the median = (n + 1)/2, i.e. (5 + 1)/2 = 3, therefore the median is the third value, i.e. median = 7.

Inferential statistics
Inference involves making a generalization about a larger group of individuals on the basis of a subset or sample.

Inferential statistics
Hypothesis Testing
In hypothesis testing we want to find out whether the observed variation among samples is explained by chance alone (i.e. random sampling variation) or is due to a real difference between groups.

Hypothesis Testing
It involves conducting a test of statistical significance, quantifying the chance that random sampling variation may account for the observed results.
In hypothesis testing, we are asking whether the sample mean, for example, is consistent with a certain hypothesized value for the population mean.

Hypothesis Testing
The method used to assess a hypothesis is known as a significance test.
Significance testing is a method for assessing whether a result is likely to be due to chance or due to a real effect.

Hypothesis Testing
Steps
>>> Formulate a hypothesis
>>> Collect the data
>>> Test your hypothesis
>>> Accept or reject your hypothesis

Null and alternative hypotheses
In hypothesis testing, specific hypotheses (the null and the alternative hypothesis) are formulated and tested.
The null hypothesis H0 means: X1 = X2, or X1 - X2 = 0.
This means that there is no difference between x1 and x2.
The alternative hypothesis H1 means: X1 > X2 or X1 < X2.

Null and alternative hypotheses
The alternative hypothesis H1 means X1 > X2 or X1 < X2; this means that there is a difference between x1 and x2.
If we reject the null hypothesis, i.e. there is a difference between the two readings, it is either H1: x1 < x2 or H2: x1 > x2.
In other words, the null hypothesis is rejected because x1 is different from x2.

General principles of significance tests
1. Set up a null hypothesis and its alternative.
2. Find the value of the test statistic.
3. Refer the value of the test statistic to a known distribution which it would follow if the null hypothesis were true.

General principles of significance tests
4. Conclude that the data are consistent or inconsistent with the null hypothesis.
If the data are not consistent with the null hypothesis, the difference is said to be statistically significant. If the data are consistent with the null hypothesis, we accept it, i.e. the difference is statistically insignificant.

General principles of significance tests: P < 0.05
In medicine, we usually consider differences significant if the probability (P) is less than 0.05. This means that if the null hypothesis is true, we shall make a wrong decision (rejecting it) less than 5 times in a hundred.

Tests of significance
The selection of the test of significance depends essentially on the type of data that we have.
1- Quantitative data (means and SD): t test, paired t test and ANOVA
2- Qualitative data: Chi-squared test and Z test

Tests of significance
Comparison of means:
1- Comparing two means of large samples using the normal distribution (z test, or SND: standard normal deviate):
If we have a large sample size, i.e. 60 or more, and it follows a normal distribution, then we use the z-test.
$z = \frac{\text{population mean} - \text{sample mean}}{SD}$
If the absolute value of z is greater than 2, then there is a significant difference.

Tests of significance
This is because the normal range for any biological reading lies within the population mean ± 2 SD (this range includes 95% of the area under the normal distribution curve).

Student's t-test
2- Comparing two means of small samples using the t-test:
If we have a small sample size (less than 60), we can use the t distribution instead of the normal distribution.
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}}$
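The slides give no worked numbers for the t statistic, so the sketch below is illustrative only. It computes t as the difference of the two means divided by the square root of (SD1 squared / n1 + SD2 squared / n2), and, as an assumption, reuses the two groups from the coefficient of variation examples (means 10 and 6, SD 4.3, n = 5 each). The function name `t_statistic` is hypothetical.

```python
import math


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t = (mean1 - mean2) / sqrt(SD1^2 / n1 + SD2^2 / n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Assumed illustration data: summary values of the two C.V. example groups.
t = t_statistic(10, 4.3, 5, 6, 4.3, 5)
print(f"t = {t:.2f}")
```

The computed t is then compared with the tabulated value of the t distribution at the appropriate degrees of freedom.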

t-test
The value of t is compared with the values in the table of the t distribution at the corresponding degrees of freedom. If the value of t is less than that in the table, then the difference between the samples is insignificant.
If the t value is larger than that in the table, the difference is significant, i.e. the null hypothesis is rejected.


Paired t-test
3- Paired t-test:
If we are comparing repeated observations in the same individual, or the difference between paired data, we have to use the paired t-test, where the analysis is carried out using the mean and standard deviation of the differences between each pair.

ANOVA
4- Comparing several means:
Sometimes we need to compare more than two means. This can be done by using several t-tests, which is not only tedious but can also lead to spurious significant results. Therefore we have to use what is called analysis of variance, or ANOVA.

ANOVA
4- Comparing several means:
There are two main types: one-way analysis of variance and two-way analysis of variance. One-way analysis of variance is appropriate when the subgroups to be compared are defined by just one factor, for example a comparison between the means of different socio-economic classes. Two-way analysis of variance is used when the subdivision is based upon more than one factor.

ANOVA
The main idea in the analysis of variance is that we have to take into account the variability within the groups and between the groups. The value of F is equal to the ratio of the between-groups mean square to the within-groups mean square.
F = between-groups MS / within-groups MS

Chi-Squared Test
b- Qualitative variables:
1) Chi-squared test:
Qualitative data are arranged in a table formed by rows and columns. The categories of one variable define the rows and the categories of the other variable define the columns.

Chi-Squared Test
A chi-squared test is used to test whether there is an association between the row variable and the column variable or, in other words, whether the distribution of individuals among the categories of one variable is independent of their distribution among the categories of the other.
$\chi^2 = \sum \frac{(O - E)^2}{E}$

Chi-Squared Test
1) Chi-squared test:
Degrees of freedom = (rows - 1) x (columns - 1)
O = observed value in the table
E = expected value, calculated as follows:
E = Rt x Ct / GT
(total of row x total of column / grand total)
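The chi-squared statistic is the sum of (O - E) squared divided by E over all cells, with each expected value E computed as row total x column total / grand total, and degrees of freedom (rows - 1) x (columns - 1). The table used in the original worked example is not available in the text, so the 2 x 2 counts in this sketch are hypothetical and serve only to illustrate the calculation. The function name `chi_squared` is illustrative.

```python
def chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = row total x column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table of observed counts (not from the slides).
chi2, df = chi_squared([[10, 20], [30, 40]])
print(f"chi-squared = {chi2:.2f}, degrees of freedom = {df}")
```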

Chi-Squared Test
From the chi-squared tables, at degrees of freedom = (3 rows - 1) x (3 columns - 1) = 2 x 2 = 4, the critical value at the 0.05 level of significance is 9.49. Therefore we conclude that there is a significant relation between socioeconomic level and the degree of intelligence (because the calculated value of chi-squared is greater than that of the table).

Z Test
2) Z test for comparing two percentages:
$z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$
where p1 = percentage in the 1st group, p2 = percentage in the 2nd group, q1 = 100 - p1, q2 = 100 - p2, n1 = sample size of group 1, n2 = sample size of group 2. The Z test is significant (at the 0.05 level) if the absolute value of the result is greater than 2.

Z Test
Example: the number of anemic patients in group 1, which includes 50 patients, is 5, and the number of anemic patients in group 2, which contains 60 patients, is 20. To find out whether groups 1 and 2 are statistically different in the prevalence of anemia, we calculate the z test.
p1 = 5/50 = 10%, p2 = 20/60 = 33%, q1 = 100 - 10 = 90, q2 = 100 - 33 = 67

Z Test
$z = \frac{|10 - 33|}{\sqrt{\frac{10 \times 90}{50} + \frac{33 \times 67}{60}}} = \frac{23}{\sqrt{18 + 36.85}} = \frac{23}{7.4} = 3.1$
Therefore there is a statistically significant difference between the percentages of anemia in the studied groups (because z > 2).
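The anemia example can be recomputed with a short function. This is a sketch, with `z_two_percentages` as an illustrative name; it takes the absolute difference so that the order of the groups does not matter, and it uses the rounded percentages (10% and 33%) from the example.

```python
import math


def z_two_percentages(p1, n1, p2, n2):
    """z = |p1 - p2| / sqrt(p1*q1/n1 + p2*q2/n2), with p and q in percent."""
    q1, q2 = 100 - p1, 100 - p2
    return abs(p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


z = z_two_percentages(10, 50, 33, 60)
print(f"z = {z:.1f}")
print("significant at the 0.05 level:", z > 2)
```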

Correlation & regression
c- Correlation and regression:
Correlation measures the closeness of the association between two continuous variables, while linear regression gives the equation of the straight line that best describes the relationship and enables the prediction of one variable from the other.

Correlation & regression
1- Correlation:
In correlation, the closeness of the association is measured by the correlation coefficient, r. The values of r range between +1 and -1.
A value of one (+1 or -1) means perfect correlation, while 0 means no correlation. If the r value is near zero, it means weak correlation, while near one it means strong correlation. The sign (- or +) denotes the direction of the correlation.

Correlation
1- Correlation:
A positive (+ve) correlation means that if one variable increases, the other one increases as well, while a negative (-ve) correlation means that when one variable increases, the other one decreases.

Linear regression
2- Linear regression:
Similar to correlation, linear regression is used to determine the relation between two variables and to predict the change in one variable due to changes in the other. For linear regression, the independent variable has to be distinguished from the dependent variable.

Linear regression
2- Linear regression:
Linear regression not only allows assessment of the presence of an association between the independent and dependent variables, but also allows the prediction of the dependent variable for a particular value of the independent variable. However, regression should not be used for prediction outside the range of the original data. A t-test is also used to assess the level of significance. The dependent variable in linear regression must be a continuous one.

[Figure: scatter diagram, "Correlation between Doppler velocimetry (RI) and baby birth weight". X axis: baby weight in kg, 1.5 to 4.5. Y axis: RI, 0 to 1.]
Multiple regression
3- Multiple regression:
Situations frequently occur in which we are interested in the dependency of a dependent variable on several independent variables, not just one. The test of significance used is the analysis of variance (F test).

1. How do you select a representative sample of 100 students from a primary school? Use all possible methods of sample selection.
2. How do you select a primary school from a rural area and another school from an urban area in Egypt?

What type of sample is each of the following?
1. Lottery to select a winner
2. Hospitalized patients with SLE
3. Every 6th patient coming to an outpatient clinic
4. Random 20 females and 20 males out of a group of 100 persons
5. All workers in a factory chosen from all factories in a certain governorate

Present the following data by a suitable table & graph
Infant mortality rates in 2006 in some countries were as follows: Egypt = 25/1000, USA = 10/1000, Sweden = 12/1000 and Pakistan = 30/1000.

Present the following data by a suitable table & graph
The body weights (Kg) of a group of male children were as follows:
12, 22, 18, 17, 28, 20, 16, 21, 19, 16, 27, 21 Kg
and for a group of female children were as follows:
16, 23, 19, 29, 18, 22, 17, 15, 21, 21, 24 Kg

The weight (Kg) of a pregnant
