
Chapter 5. Data Entry and Processing
Course: Economic Research Methods
Faculty of Development Economics
University of Economics Ho Chi Minh City

5.1 Introduction

This chapter aims to show students:
- how to enter, process and analyze data;
- exploratory data analysis techniques;
- how to use cross-tabulation to test relationships between categorical variables;
- how to use analytical statistics to test hypotheses.

5.2 Data analysis process

Figure 5.1 Exploration, testing and analysis steps in the research process
- Prepare the research proposal
- Collect and prepare the data
- Preliminary analysis plan
- Data exploration: descriptive analysis of the variables; cross-tabulation of the variables; data display (histograms, box plots, Pareto charts, stem-and-leaf displays, AID, etc.); visual display of the data
- Redefine the hypotheses
- Analyze and interpret the data: data analysis, hypothesis testing
- Research report
- Decision making

5.3 Data entry

5.3.1 Organizing data on the computer
Objectives:
- to make data entry convenient;
- to make editing the data easy.

Implementation:
- General rule: variable names should be short and abbreviated (Vietnamese without diacritics, or English). Variable names should follow the required naming rules.
- Using Excel: easy to operate and edit, but storage space is limited and its statistical and econometric tools are not sufficient for analysis.
- Using SPSS: storage space is practically unlimited, and its statistical and econometric tools are fully developed for analysis needs. Declaring (defining) the data is mandatory and takes time.

Figure 5.2 How to enter data into the SPSS spreadsheet

Defining variable types

Figure 5.3 How to define the attributes of qualitative and quantitative variables
- Define the variable's label (description).
- Define the variable's category values (value labels).
- Define the variable's measurement scale.
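The three attributes above (label, value labels, measurement scale) can be mirrored outside SPSS in a small codebook. The sketch below is a hypothetical Python structure, not an SPSS API; the variable names `gender` and `age` are illustrative, while the coding 1 = male, 0 = female follows the grouping used in Example 3 later in this chapter.

```python
from dataclasses import dataclass, field

MEASURES = ("nominal", "ordinal", "scale")  # SPSS measurement levels


@dataclass
class Variable:
    name: str                      # short name, no diacritics
    label: str                     # descriptive label shown in output
    measure: str                   # one of MEASURES
    value_labels: dict = field(default_factory=dict)  # code -> category label

    def __post_init__(self):
        if self.measure not in MEASURES:
            raise ValueError(f"unknown measurement scale: {self.measure}")

    def describe(self, code):
        """Return the category label for a code, or the raw value if unlabelled."""
        return self.value_labels.get(code, code)


codebook = {
    "gender": Variable("gender", "User gender", "nominal", {0: "female", 1: "male"}),
    "age": Variable("age", "Age of motorbike user", "scale"),
}

print(codebook["gender"].describe(1))  # male
print(codebook["age"].describe(39))    # 39
```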

5.4 Data cleaning

5.4.1 Detecting outliers in the data
a. Using Excel: the Max and Min functions, the AutoFilter tool, and Scatter charts

Figure 5.4 The Scatter chart tool in Excel
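Checking Max and Min and filtering for impossible values amounts to a range check. A minimal Python sketch of the same idea follows; the records and the valid ranges (15 to 100 years of age, 0 to 31 days of use per month) are hypothetical assumptions chosen for illustration.

```python
def find_out_of_range(records, column, low, high):
    """Return (row index, value) pairs whose value lies outside [low, high]."""
    return [(i, row[column]) for i, row in enumerate(records)
            if not low <= row[column] <= high]


# Hypothetical survey rows containing two typing errors
rows = [
    {"age": 25, "days": 20},
    {"age": 250, "days": 18},   # age mistyped
    {"age": 41, "days": 45},    # more days than a month has
]

ages = [row["age"] for row in rows]
print(max(ages), min(ages))                      # 250 25, like Excel's Max and Min
print(find_out_of_range(rows, "age", 15, 100))   # [(1, 250)]
print(find_out_of_range(rows, "days", 0, 31))    # [(2, 45)]
```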

b. Using SPSS: Scatter charts and the Frequency, Bar Chart, Pie Chart and Box Plot (in Explore) tools

b. Using SPSS: Scatter chart
[Figure: scatter plot of motorbike brand (Motobike Names) against Number of used days in a month]

b. Using SPSS: the Frequency and Explore tools
Figure 5.6 The Frequency and Explore tools in SPSS

b. Using SPSS: the Frequency tool

Motobike Names
                      Frequency   Percent   Valid Percent   Cumulative Percent
Honda AirBlade            10        10.0        10.0              10.0
Honda Future Neo           8         8.0         8.0              18.0
Yamaha Sirius              7         7.0         7.0              25.0
Yamaha Jupiter            13        13.0        13.0              38.0
Honda Wave                24        24.0        24.0              62.0
Yamaha Cygnus              4         4.0         4.0              66.0
SYM Attila                11        11.0        11.0              77.0
Honda Dream                6         6.0         6.0              83.0
Honda @                    7         7.0         7.0              90.0
Others                    10        10.0        10.0             100.0
Total                    100       100.0       100.0

b. Using SPSS: the Pie Chart and Bar Chart tools
[Figure: pie chart and bar chart of Motobike Names. Shares: Honda Wave 24.0%, Yamaha Jupiter 13.0%, SYM Attila 11.0%, Honda AirBlade 10.0%, Others 10.0%, Honda Future Neo 8.0%, Yamaha Sirius 7.0%, Honda @ 7.0%, Honda Dream 6.0%, Yamaha Cygnus 4.0%]
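The Percent and Cumulative Percent columns follow directly from the counts. The sketch below recomputes them for the ten brands; since the survey has no missing values, Valid Percent equals Percent and is not computed separately.

```python
def frequency_table(counts):
    """Return (category, frequency, percent, cumulative percent) rows."""
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for category, n in counts.items():     # dicts keep insertion order
        percent = 100 * n / total
        cumulative += percent
        rows.append((category, n, round(percent, 1), round(cumulative, 1)))
    return rows


brands = {"Honda AirBlade": 10, "Honda Future Neo": 8, "Yamaha Sirius": 7,
          "Yamaha Jupiter": 13, "Honda Wave": 24, "Yamaha Cygnus": 4,
          "SYM Attila": 11, "Honda Dream": 6, "Honda @": 7, "Others": 10}

for row in frequency_table(brands):
    print(f"{row[0]:<18}{row[1]:>4}{row[2]:>7}{row[3]:>7}")
```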

b. Using SPSS: the Histogram tool

A histogram is the conventional way to display ratio or interval data.
- A histogram groups the values of a variable into intervals.
- It is drawn as bars, one per interval, whose heights show how many values fall in each interval.

Histograms are very useful for (1) showing every interval of a distribution and (2) examining the shape of the distribution, such as its skewness and kurtosis.

Note: histograms cannot be used for nominal variables.
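The grouping step can be sketched in a few lines of Python. The ages below are hypothetical, and the interval width of 10 years is an arbitrary choice; SPSS picks its own intervals.

```python
from collections import Counter


def histogram_counts(values, width, start):
    """Count values in consecutive intervals [start + k*width, start + (k+1)*width)."""
    bins = Counter((v - start) // width for v in values)
    return [(start + k * width, bins.get(k, 0))
            for k in range(min(bins), max(bins) + 1)]


ages = [18, 19, 21, 22, 22, 24, 27, 31, 33, 35, 41, 43, 44, 46, 52, 58, 63, 76]
for lower, n in histogram_counts(ages, width=10, start=10):
    print(f"{lower}-{lower + 9}: {'*' * n}")   # one bar per 10-year interval
```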

b. Using SPSS: the Histogram tool

Example 5.2 Distribution of the age of motorbike users
[Figure: histogram of Age of motorbike user, horizontal axis 20 to 75; Std. Dev = 14.42, Mean = 39, N = 100.00]

b. Using SPSS: stem-and-leaf displays
- When a stem-and-leaf display is rotated 90° to the left, it has the same shape as a histogram.
- Each row of the display is called a stem, and each data value shown on a stem is called a leaf.

b. Using SPSS: stem-and-leaf displays

Age of motorbike user Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

     6.00        1 .  889999
    18.00        2 .  000111122222233344
     8.00        2 .  55677788
    13.00        3 .  0012233334444
     4.00        3 .  5556
    12.00        4 .  123333334444
    13.00        4 .  5555566777789
    10.00        5 .  0123344444
     9.00        5 .  566667779
     2.00        6 .  03
     4.00        6 .  5567
      .00        7 .
     1.00        7 .  6

 Stem width:  10
 Each leaf:   1 case(s)

5.3 Stem-and-leaf display of the variable Age of motorbike user

b. Using SPSS: box plots

A box plot, also called a box-and-whisker plot, gives another visual picture of the location, spread, shape, tail length and outliers of a distribution. A box plot summarizes five statistics of a distribution: the median, the upper and lower quartiles, and the largest and smallest observed values.

The main components of a box plot are:
- The rectangular box contains the middle 50% of the data values.
- The line across the center of the box marks the median.
- The two edges of the box mark the first and third quartiles, i.e. the 25th and 75th percentiles of the data.
- The whiskers extend from the upper and lower edges of the box to the largest and smallest values that lie no more than 1.5 interquartile ranges from the box edges.

Anatomy of a box plot (labels from top to bottom):
- Values more than 3 box lengths above the 75th percentile (extremes)
- Values more than 1.5 box lengths above the 75th percentile (outliers)
- Largest observed value that is not an outlier
- Third quartile (75th percentile)
- Median
- 50% of cases have values within the box
- First quartile (25th percentile)
- Smallest observed value that is not an outlier
- Values more than 1.5 box lengths below the 25th percentile (outliers)
- Values more than 3 box lengths below the 25th percentile (extremes)
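The fences described above can be computed directly. The sketch below uses Python's `statistics.quantiles` (default "exclusive" method) for the quartiles; SPSS draws Explore box plots from Tukey's hinges, so its quartiles may differ slightly on small samples. The data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends, outliers (>1.5 IQR) and extremes (>3 IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1                                   # length of the box
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    extremes = [v for v in values if v < outer[0] or v > outer[1]]
    outliers = [v for v in values
                if (v < inner[0] or v > inner[1]) and v not in extremes]
    inside = [v for v in values if inner[0] <= v <= inner[1]]
    return {"q1": q1, "median": median, "q3": q3,
            "whiskers": (min(inside), max(inside)),
            "outliers": outliers, "extremes": extremes}


data = [12, 15, 17, 18, 20, 21, 22, 24, 25, 40, 60]
print(box_plot_summary(data))
# q1 17.0, median 21.0, q3 25.0, whiskers (12, 25), outliers [40], extremes [60]
```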

b. Using SPSS: box plots

5.4 Box plots of the variables Age of motorbike user and Number of used days in a month
[Figure: side-by-side box plots of Age of motorbike user and Number of used days, vertical axis 0 to 100; N = 100 for each variable]

5.5 Descriptive statistical analysis

5.5.1 Quantitative descriptive statistics
- Using Excel: the Descriptive Statistics tool in the Data Analysis function.
- Using SPSS: the Frequency, Descriptives and Explore tools under Descriptive Statistics.

Descriptive statistics describe three aspects of the data: central tendency, variability, and the shape of the distribution.

Measures of central tendency
- The mean is the sum of all data values divided by the number of values.
- The median is the value in the middle position when the data are sorted in order; it is the midpoint of the distribution. When the number of observations is even, the median is the average of the two central observations.
- The mode is the value that occurs most frequently in the data set.
- The range is the difference between the largest and the smallest value in the data set.
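Python's standard `statistics` module implements these measures directly. The ten monthly usage figures below are hypothetical; because their count is even, the median is the average of the two central values (21 and 22).

```python
import statistics

days = [20, 22, 15, 30, 21, 18, 25, 22, 10, 26]   # hypothetical days of use

mean = statistics.mean(days)        # sum / count
median = statistics.median(days)    # average of the two middle values (even count)
mode = statistics.mode(days)        # most frequent value
value_range = max(days) - min(days)

print(mean, median, mode, value_range)   # 20.9 21.5 22 20
```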

Measures of variability
- The variance (σ²) is the average of the squared deviations of the observations from the mean. The sample variance reported by SPSS divides the sum of squared deviations by n − 1 rather than n.
- The standard deviation (SD, σ), the square root of the variance, measures how widely the data are spread around the mean.
- The standard error of the mean (s.e. = SD/√n) measures the range within which the population mean (μ) can be expected to lie, with a given probability, based on the sample mean.
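The sketch below contrasts the population formula (divide by n) with the sample formula (divide by n − 1) on five hypothetical values, then uses the figures reported in Table 5.6 (Variance 207.909, Std. Deviation 14.42, N = 100) to check SD = √variance and s.e. = SD/√n.

```python
import math
import statistics

sample = [4.0, 8.0, 6.0, 5.0, 7.0]            # hypothetical values, mean 6

print(statistics.pvariance(sample))           # 2.0 = 10 / 5 (population, divide by n)
print(statistics.variance(sample))            # 2.5 = 10 / 4 (sample, divide by n - 1)
sd = statistics.stdev(sample)                 # square root of the sample variance
print(round(sd / math.sqrt(len(sample)), 4))  # 0.7071, standard error of the mean

# Table 5.6: Age of motorbike user, N = 100
print(round(math.sqrt(207.909), 2))           # 14.42, the reported Std. Deviation
print(round(14.42 / math.sqrt(100), 2))       # 1.44, the reported Std. Error of the mean
```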

Measures of shape
- Skewness measures how far the distribution leans toward one side.
- A distribution is negatively skewed (left-skewed) when its left tail is longer and most of the data are concentrated on the right side of the distribution.
- A distribution is positively skewed (right-skewed) when its right tail is longer and most of the data are concentrated on the left side of the distribution.
- Skewness is positive for a right-skewed and negative for a left-skewed distribution; the stronger the skew, the further the skewness value lies from 0.

Figure 5.10 The normal distribution curve and its properties
Figure 5.11 Left-skewed and right-skewed distributions compared with the normal distribution

Measures of shape (continued)
- Kurtosis measures how peaked or flat a distribution is relative to the normal distribution, whose kurtosis is 0 (SPSS reports excess kurtosis). A distribution is peaked when kurtosis is positive and flat when it is negative.
- For a normal distribution, both skewness and kurtosis equal 0. Dividing the skewness and the kurtosis by their standard errors lets us judge whether a distribution is normal: when either ratio is below −2 or above +2, the distribution is treated as not normal.

Descriptive statistics with SPSS: the Descriptives tool
Figure 5.13 The descriptive statistics options of the Descriptives tool
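SPSS does not print these ratios; they can be computed from its output. The sketch below uses the standard-error formulas for skewness and kurtosis commonly used by statistical packages (an assumption about SPSS's internals, though they reproduce its reported values such as .241 and .478 for N = 100), then applies the ±2 rule to the skewness (.242) and kurtosis (−.948) of Age of motorbike user in Table 5.6.

```python
import math


def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of (excess) kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def looks_normal(skew, kurt, n, limit=2.0):
    """Rule of thumb: both ratios to their standard errors must lie within ±limit."""
    ratios = (skew / se_skewness(n), kurt / se_kurtosis(n))
    return all(-limit <= r <= limit for r in ratios), ratios


ok, ratios = looks_normal(0.242, -0.948, 100)   # Table 5.6, Age of motorbike user
print(round(se_skewness(100), 3), round(se_kurtosis(100), 3))   # 0.241 0.478
print([round(r, 2) for r in ratios], ok)                         # [1.0, -1.98] True
```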

Table 5.6 Descriptive statistics of the variable Age of motorbike user

                     Statistic   Std. Error
N                       100
Range                    58
Minimum                  18
Maximum                  76
Mean                   39.01        1.44
Std. Deviation         14.42
Variance              207.909
Skewness                .242        .241
Kurtosis               -.948        .478

Descriptive statistics with SPSS: the Explore tool
The Explore tool is well suited to detailed descriptive statistics of variables broken down by another categorical variable (the factor variable).
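Because each leaf in the stem-and-leaf display of ages is one respondent, the ages can be rebuilt from it and Table 5.6 reproduced. In the sketch below the leaves are copied from that display; the single leaf on the last stem is taken to be 6, an assumption based on the reported maximum of 76. `statistics.variance` divides by n − 1, as SPSS does.

```python
import statistics

# (stem, leaves) rows of the display; stem width 10, each leaf = 1 case
ROWS = [(1, "889999"), (2, "000111122222233344"), (2, "55677788"),
        (3, "0012233334444"), (3, "5556"), (4, "123333334444"),
        (4, "5555566777789"), (5, "0123344444"), (5, "566667779"),
        (6, "03"), (6, "5567"), (7, ""), (7, "6")]

ages = [10 * stem + int(leaf) for stem, leaves in ROWS for leaf in leaves]

print(len(ages), min(ages), max(ages), max(ages) - min(ages))   # 100 18 76 58
print(round(statistics.mean(ages), 2))       # 39.01
print(round(statistics.variance(ages), 3))   # 207.909
print(round(statistics.stdev(ages), 2))      # 14.42
```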

Table 5.7 Descriptive statistics of the variables Age of motorbike user and Number of used days in a month, by user gender

                                                Age of motorbike user      Number of used days in a month
User gender                                     Statistic   Std. Error     Statistic   Std. Error
female  Mean                                    38.46       2.11           20.71       1.07
        95% Confidence Interval  Lower Bound    34.19                      18.54
        for Mean                 Upper Bound    42.74                      22.88
        5% Trimmed Mean                         38.13                      20.95
        Median                                  41.00                      22.00
        Variance                                183.205                    47.212
        Std. Deviation                          13.54                      6.87
        Minimum                                 18                         5
        Maximum                                 76                         32
        Range                                   58                         27
        Interquartile Range                     …                          …
        Skewness                                …           .369           …           .369
        Kurtosis                                …           .724           …           .724
male    Mean                                    39.39       1.97           19.76       1.01
        95% Confidence Interval  Lower Bound    35.45                      17.74
        for Mean                 Upper Bound    43.33                      21.79
        5% Trimmed Mean                         38.87                      19.90
        Median                                  42.00                      21.00
        Variance                                228.173                    60.460
        Std. Deviation                          15.11                      7.78
        Minimum                                 19                         7
        Maximum                                 65                         30
        Range                                   46                         23
        Interquartile Range                     …                          …
        Skewness                                .292        .311           -.175       .311
        Kurtosis                                -.932       .613           -1.271      .613
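Explore's group statistics and the overall Descriptives output should agree. The sketch below combines the female and male rows of Table 5.7 (N, Mean, Variance; the group sizes 41 and 59 come from the gender crosstabulation in 5.5.2) into the overall mean and variance of Table 5.6, and recomputes each group's standard error of the mean; small differences come from rounding in the reported values.

```python
import math

# (N, mean, variance) of Age of motorbike user, from Table 5.7
groups = {"female": (41, 38.46, 183.205), "male": (59, 39.39, 228.173)}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# total sum of squares = within-group part + between-group part
ss_total = sum((n - 1) * v + n * (m - grand_mean) ** 2 for n, m, v in groups.values())
variance = ss_total / (n_total - 1)

print(round(grand_mean, 2), round(variance, 1))   # close to Table 5.6: 39.01 and 207.909
for name, (n, m, v) in groups.items():
    print(name, round(math.sqrt(v / n), 2))       # Std. Error of the mean: 2.11, 1.97
```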

5.5.2 Qualitative descriptive statistics
a. Using the Basic Table tool in SPSS

b. Using the Cross-Tabulation tool in SPSS

Cross-tabulation is a technique for comparing data from two or more categorical or nominal variables, such as gender. A cross-tabulation is a table whose rows and columns represent the levels, or coded values, of each categorical or nominal variable.

Cross-tabulation is the first step in identifying relationships between variables. When a cross-tabulation is built for statistical testing, it is called a contingency table, and the test used to assess whether the categorical variables are independent of each other is the χ² (chi-square) test.

Table. Age-group distribution of motorbike users by brand (Count, Row %)

                    Age groups
Motobike Names      under 20    under 30    under 40    under 50    under 60    older than 60
Honda AirBlade      2  20.0%    3  30.0%    3  30.0%    1  10.0%    1  10.0%
Honda Future Neo                4  50.0%                2  25.0%    2  25.0%
Yamaha Sirius                   1  14.3%                1  14.3%    2  28.6%    3  42.9%
Yamaha Jupiter                  4  30.8%    1   7.7%    4  30.8%    4  30.8%
Honda Wave          1   4.2%    2   8.3%    8  33.3%    7  29.2%    5  20.8%    1   4.2%
Yamaha Cygnus                   1  25.0%    1  25.0%    1  25.0%    1  25.0%
SYM Attila          3  27.3%    4  36.4%    1   9.1%    2  18.2%                1   9.1%
Honda Dream                     3  50.0%    1  16.7%    1  16.7%                1  16.7%
Honda @                         2  28.6%                1  14.3%    4  57.1%
Others                          2  20.0%    2  20.0%    5  50.0%                1  10.0%

Table. Gender distribution of motorbike users by brand

Motobike Names * User gender Crosstabulation (Count)
                    User gender
Motobike Names      female   male   Total
Honda AirBlade         3       7      10
Honda Future Neo       4       4       8
Yamaha Sirius          3       4       7
Yamaha Jupiter         6       7      13
Honda Wave             9      15      24
Yamaha Cygnus          2       2       4
SYM Attila             5       6      11
Honda Dream            2       4       6
Honda @                3       4       7
Others                 4       6      10
Total                 41      59     100

User gender * Motobike Names Crosstabulation (brands shown as rows)
Count = observed count; Exp. = Expected Count; % G = % within User gender; % B = % within Motobike Names; % T = % of Total

                   female                               male                                 Total
Motobike Names     Count  Exp.   % G     % B     % T    Count  Exp.   % G     % B     % T    Count  Exp.    % G     % B     % T
Honda AirBlade       3    4.1    7.3%   30.0%   3.0%      7    5.9   11.9%   70.0%   7.0%     10   10.0   10.0%  100.0%  10.0%
Honda Future Neo     4    3.3    9.8%   50.0%   4.0%      4    4.7    6.8%   50.0%   4.0%      8    8.0    8.0%  100.0%   8.0%
Yamaha Sirius        3    2.9    7.3%   42.9%   3.0%      4    4.1    6.8%   57.1%   4.0%      7    7.0    7.0%  100.0%   7.0%
Yamaha Jupiter       6    5.3   14.6%   46.2%   6.0%      7    7.7   11.9%   53.8%   7.0%     13   13.0   13.0%  100.0%  13.0%
Honda Wave           9    9.8   22.0%   37.5%   9.0%     15   14.2   25.4%   62.5%  15.0%     24   24.0   24.0%  100.0%  24.0%
Yamaha Cygnus        2    1.6    4.9%   50.0%   2.0%      2    2.4    3.4%   50.0%   2.0%      4    4.0    4.0%  100.0%   4.0%
SYM Attila           5    4.5   12.2%   45.5%   5.0%      6    6.5   10.2%   54.5%   6.0%     11   11.0   11.0%  100.0%  11.0%
Honda Dream          2    2.5    4.9%   33.3%   2.0%      4    3.5    6.8%   66.7%   4.0%      6    6.0    6.0%  100.0%   6.0%
Honda @              3    2.9    7.3%   42.9%   3.0%      4    4.1    6.8%   57.1%   4.0%      7    7.0    7.0%  100.0%   7.0%
Others               4    4.1    9.8%   40.0%   4.0%      6    5.9   10.2%   60.0%   6.0%     10   10.0   10.0%  100.0%  10.0%
Total               41   41.0  100.0%   41.0%  41.0%     59   59.0  100.0%   59.0%  59.0%    100  100.0  100.0%  100.0% 100.0%
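The Expected Count entries are row total × column total / N, and the χ² test of independence mentioned in 5.5.2 sums (observed − expected)² / expected over all cells. The sketch below applies that to the gender-by-brand counts; the 5% critical value for 9 degrees of freedom (16.919) is taken from a χ² table.

```python
def chi_square_independence(table):
    """Return (chi-square, degrees of freedom, expected counts) for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Counts by brand, in crosstabulation order (Honda AirBlade ... Others)
female = [3, 4, 3, 6, 9, 2, 5, 2, 3, 4]
male = [7, 4, 4, 7, 15, 2, 6, 4, 4, 6]

stat, df, expected = chi_square_independence([female, male])
print([round(e, 1) for e in expected[0]])   # the female Expected Count row
print(round(stat, 2), df, stat > 16.919)    # 1.43 9 False -> independence not rejected
```

Here most expected counts are below 5, which makes the χ² approximation rough; the sketch does not correct for that.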

5.6 Hypothesis testing

5.6.1 Hypothesis testing

Objectives and types of research questions
- The purpose of hypothesis testing is to decide whether a hypothesis is accurate on the basis of the sample data collected. We assess the accuracy of hypotheses by applying statistical techniques, and we judge the importance of differences that are statistically significant.
- The classical, or sampling-theory, approach takes an objective view of probability based on the analysis of sample data. A hypothesis is formulated and is then rejected or not rejected on the basis of the sample data collected.

General objective: relationships between variables
- Specific objective: compare groups. Type of question/hypothesis: difference. Type of statistic: difference statistics (e.g. t-test, ANOVA).
- Specific objective: degree of association, related variables. Type of question/hypothesis: association. Type of statistic: association statistics (e.g. correlation, regression).
General objective: purely descriptive
- Specific objective: summarize data. Type of question/hypothesis: descriptive. Type of statistic: descriptive statistics (e.g. mean, percentage).

5.6.2 Hypothesis testing procedure

Formulating the null hypothesis H0 and the alternative hypothesis H1
- Research question: Is there a difference in age between men and women?
  H0: there is no difference in age between men and women (H0: μmale = μfemale).
  H1: there is a difference in age between men and women (H1: μmale ≠ μfemale).
- Research question: Is there any relationship between gender and motorbike brand?
  H0: there is no relationship between gender and motorbike brand (H0: GM = 0).
  H1: there is a relationship between gender and motorbike brand (H1: GM ≠ 0).
- Research question: Does the level of motorbike use differ between age groups?
  H0: the age groups do not differ in their level of motorbike use (H0: μ1 = μ2 = … = μk).
  H1: the age groups differ in their level of motorbike use (H1: not all μi are equal).

Steps:
1. State the hypotheses.
2. Choose the statistical test.
3. Choose the desired significance level.
4. Compute the difference value (the test statistic).
5. Obtain the critical test value.
6. Interpret the test result.

Probability values (p values)
- Most statistical software reports results with probability values (p values).
- The p value is the probability of obtaining a result at least as extreme as the value actually observed, given that the null hypothesis H0 is true.

Testing procedure with p values:
1. State the null and alternative hypotheses.
2. Choose the desired significance level.
3. Obtain the p value.
4. Compare the p value with the significance level and decide.
5. Interpret the test result.

Probability values (p values)
- The p value is compared with the significance level (α), and on that basis the hypothesis is rejected or not rejected.
- If the p value is smaller than the significance level, the hypothesis is rejected (p < α: reject H0).
- If the p value is equal to or greater than the significance level, the hypothesis is not rejected (p ≥ α: do not reject H0).

Significance tests: types of tests
- There are two types: parametric and nonparametric.
- Parametric tests are powerful tools for scale data (interval, ratio).
- Nonparametric tests are tools for nominal and ordinal data.
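For test statistics that follow a standard normal distribution under H0 (such as the Z values SPSS prints for asymptotic tests), the two-sided p value and the decision rule above can be sketched with Python's `statistics.NormalDist`. The z values used here are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. a result at least as extreme."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p, alpha=0.05):
    """Reject H0 when p < alpha; otherwise (p >= alpha) do not reject."""
    return "reject H0" if p < alpha else "do not reject H0"


for z in (2.5, 1.0):
    p = two_sided_p(z)
    print(z, round(p, 4), decide(p))   # 2.5 0.0124 reject H0 / 1.0 0.3173 do not reject H0
```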

Parametric tests
Parametric tests require several assumptions:
- The observations must be independent of each other.
- The observations must be drawn from normally distributed populations.
- The populations should have equal variances.
- The measurement scale must be interval or ratio (scale) so that the calculations can be performed.

Nonparametric tests
Nonparametric tests require fewer assumptions:
- They do not require the observations to be drawn from normally distributed populations.
- They do not require the populations to have equal variances.
- They are the only way to handle nominal data.
- They are the appropriate way to handle ordinal data, although parametric tests can also be applied.
- They are easy to understand and use.

How to choose an appropriate statistical test?

To choose an appropriate statistical test, consider three questions:
- Does the test involve one sample, two samples, or more than two (k) samples?
- If there are two or k samples, are they independent of each other?
- What type of data is involved (nominal, ordinal, scale)?

Statistical techniques by measurement scale and type of test

Nominal
- One-sample case: Binomial; χ² one-sample test
- Two samples, related: McNemar
- Two samples, independent: Fisher exact test; χ² two-sample test
- k samples, related: Cochran Q
- k samples, independent: χ² for k samples
Ordinal
- One-sample case: Kolmogorov-Smirnov one-sample test; Runs test
- Two samples, related: Sign test; Wilcoxon matched-pairs test
- Two samples, independent: Median test; Mann-Whitney U; Kolmogorov-Smirnov; Wald-Wolfowitz
- k samples, related: Friedman two-way ANOVA
- k samples, independent: Median extension; Kruskal-Wallis one-way ANOVA
Interval and ratio
- One-sample case: t-test; Z test
- Two samples, related: t-test for paired samples
- Two samples, independent: t-test; Z test
- k samples, related: Repeated-measures ANOVA
- k samples, independent: One-way ANOVA; n-way ANOVA
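The three questions and the table can be encoded as a lookup. The sketch below is a hypothetical helper that only restates the table; it does not check assumptions such as normality or equal variances.

```python
# (scale, number of samples: 1, 2 or "k", related?) -> tests listed in the table
TESTS = {
    ("nominal", 1, None): ["Binomial", "chi-square one-sample test"],
    ("nominal", 2, True): ["McNemar"],
    ("nominal", 2, False): ["Fisher exact test", "chi-square two-sample test"],
    ("nominal", "k", True): ["Cochran Q"],
    ("nominal", "k", False): ["chi-square for k samples"],
    ("ordinal", 1, None): ["Kolmogorov-Smirnov one-sample test", "Runs test"],
    ("ordinal", 2, True): ["Sign test", "Wilcoxon matched-pairs test"],
    ("ordinal", 2, False): ["Median test", "Mann-Whitney U",
                            "Kolmogorov-Smirnov", "Wald-Wolfowitz"],
    ("ordinal", "k", True): ["Friedman two-way ANOVA"],
    ("ordinal", "k", False): ["Median extension", "Kruskal-Wallis one-way ANOVA"],
    ("scale", 1, None): ["t-test", "Z test"],
    ("scale", 2, True): ["t-test for paired samples"],
    ("scale", 2, False): ["t-test", "Z test"],
    ("scale", "k", True): ["Repeated-measures ANOVA"],
    ("scale", "k", False): ["One-way ANOVA", "n-way ANOVA"],
}


def suggest_tests(scale, samples, related=None):
    """scale: 'nominal', 'ordinal' or 'scale' (interval/ratio); samples: 1, 2 or more."""
    k = samples if samples <= 2 else "k"
    return TESTS[(scale, k, None if samples == 1 else related)]


print(suggest_tests("scale", 2, related=False))    # age of men vs women (Example 3)
print(suggest_tests("ordinal", 6, related=False))  # six independent age groups
```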

5.6.3 Data analysis
a. Excel: the Correlation, Anova and Regression tools in the Data Analysis function
b. SPSS: the Compare Means and Nonparametric Tests tools

5.7 Some specific applications

1. One-Sample T Test
One-sample tests are used when we have a single sample and want to test whether it comes from a particular population. For example:
- Is there a difference between an observed frequency and a standard frequency based on theory?
- Is there a difference between an observed proportion and an expected proportion?

Example 1 (parametric test)
- We have data on the sales growth rates of 9 firms.
- The standard growth rate is 6.5% per year.
- Hypothesis: the average sales growth rate of the 9 firms does not differ from the standard rate (6.5% per year).
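The one-sample t statistic is t = (x̄ − μ0) / (s/√n) with n − 1 degrees of freedom. The nine growth rates below are hypothetical, since the chapter does not list the data, and the two-sided 5% critical value for 8 degrees of freedom (2.306) comes from a t table rather than from a p value as in SPSS.

```python
import math
import statistics

growth = [5.8, 7.1, 6.0, 6.9, 7.4, 5.5, 6.2, 6.8, 7.0]  # hypothetical %/year, 9 firms
mu0 = 6.5                                               # standard growth rate
T_CRIT = 2.306                                          # t(0.975, df = 8)

n = len(growth)
se = statistics.stdev(growth) / math.sqrt(n)            # standard error of the mean
t = (statistics.mean(growth) - mu0) / se

print(round(t, 3), "reject H0" if abs(t) > T_CRIT else "do not reject H0")
```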

1. One-Sample T Test. Example 1 (parametric test)
Analyze → Compare Means → One-Sample T Test (WHY?)

Interpreting the results of Example 1 (parametric test)
- P value (Sig. 2-tailed) > 0.05.
- The difference between the average sales growth rate of the 9 firms and the standard rate is not statistically significant at the 0.05 level.
- Do not reject the hypothesis: the average sales growth rate of the 9 firms does not differ significantly from the standard rate (6.5% per year).

2. One-Sample Chi-Square Test
Example 2 (nonparametric test)
- Motorbike usage survey data.
- Hypothesis H0: all motorbike brands have an equal chance of being chosen by users.
- Analyze → Nonparametric Tests → Chi-Square

- We have 100 observations and 10 motorbike brands. Each brand has a 10% chance of being chosen, so the expected count is 10 motorbikes per brand.
- However, the differences between the observed and expected counts are large for individual brands.
- With P value < 0.05, we reject H0 and conclude that users do not choose the motorbike brands equally.
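The one-sample χ² statistic compares each brand's observed count with the expected 10 and sums (O − E)²/E. The sketch below uses the brand counts from the Frequency table in 5.4; the 5% critical value for 9 degrees of freedom (16.919) comes from a χ² table.

```python
observed = {"Honda AirBlade": 10, "Honda Future Neo": 8, "Yamaha Sirius": 7,
            "Yamaha Jupiter": 13, "Honda Wave": 24, "Yamaha Cygnus": 4,
            "SYM Attila": 11, "Honda Dream": 6, "Honda @": 7, "Others": 10}

n = sum(observed.values())           # 100 respondents
k = len(observed)                    # 10 brands
expected = n / k                     # 10 per brand if every brand is equally likely

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1
CRITICAL_5PCT = 16.919               # chi-square(0.95, df = 9)

print(round(chi2, 2), df, "reject H0" if chi2 > CRITICAL_5PCT else "do not reject H0")
# 28.0 9 reject H0
```

Honda Wave alone contributes (24 − 10)²/10 = 19.6 of the total, which is why the hypothesis of equal shares fails.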

3. Two-Sample T Test
There are two kinds of t tests for two samples:
- Unpaired (independent) t test: for two samples that are independent of each other, e.g. men and women, different groups of people, occupational groups, etc.
- Paired t test: for two related samples, e.g. one group of people before and after exposure to some factor.

Example 3. Motorbike usage survey data
- Hypothesis: the mean age of male and female motorbike users is the same.
- Analyze → Compare Means → Independent-Samples T Test

- Select the variable Age as the Test Variable(s).
- Grouping Variable: Group 1 = 1 (male); Group 2 = 0 (female).

Independent Samples Test (Age of motorbike user)

                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                              F        Sig.           t        df        Sig.         Mean         Std. Error   95% Confidence Interval
                                                                         (2-tailed)   Difference   Difference   of the Difference
                                                                                                                Lower      Upper
Equal variances assumed       1.239    .268           -.315    98        .754         -.93         2.95         -6.77      4.92
Equal variances not assumed                           -.321    91.785    .749         -.93         2.89         -6.66      4.81
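Both rows of this output can be recovered from the group summaries in Table 5.7 (N, Mean, Variance). The sketch below computes the pooled-variance t test (equal variances assumed) and the Welch version (not assumed), subtracting the male mean from the female mean, which matches the reported Mean Difference of −.93 (38.46 − 39.39); small gaps in t come from the rounded means.

```python
import math


def two_sample_t(n1, m1, v1, n2, m2, v2):
    """t, df and standard error of m1 - m2, pooled and Welch (unequal variances)."""
    diff = m1 - m2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    a, b = v1 / n1, v2 / n2
    se_welch = math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return {"equal variances assumed": (diff / se_pooled, n1 + n2 - 2, se_pooled),
            "equal variances not assumed": (diff / se_welch, df_welch, se_welch)}


# female (group 1) vs male (group 2): N, Mean, Variance of Age from Table 5.7
result = two_sample_t(41, 38.46, 183.205, 59, 39.39, 228.173)
for label, (t, df, se) in result.items():
    print(f"{label}: t = {t:.3f}, df = {df:.3f}, SE = {se:.2f}")
```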

- The P values (Sig. (2-tailed)) are much higher than α = 0.05.
- We do not reject the hypothesis and conclude that there is no significant difference in mean age between male and female motorbike users.

4. Two-Sample Nonparametric Test
Example 4. Motorbike usage survey data
- Hypothesis: male and female users choose motorbike brands in the same way.
- Analyze → Nonparametric Tests → Two-Independent-Samples

Mann-Whitney Test
Test Statistics(a)                 Motobike Names
Mann-Whitney U                     1200.000
Wilcoxon W                         2970.000
Z                                  -.067
Asymp. Sig. (2-tailed)             .946
a. Grouping Variable: User gender

Two-Sample Kolmogorov-Smirnov Test
Test Statistics(a)                              Motobike Names
Most Extreme Differences    Absolute            .045
                            Positive            .045
                            Negative            -.018
Kolmogorov-Smirnov Z                            .224
Asymp. Sig. (2-tailed)                          1.000
a. Grouping Variable: User gender

Conclusion: do not reject the hypothesis; male and female users do not differ significantly in their choice of motorbike brand.
Analyze → Nonparametric Tests → Two-Independent-Samples
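U and W are linked by U = W − n(n + 1)/2, where W is the rank sum of one group. With n = 59 (male) this gives exactly the reported U, so W here appears to be the male rank sum. The sketch then standardizes U with its normal approximation; it omits the tie correction SPSS applies (brands are heavily tied), which changes Z only slightly here.

```python
import math

n_female, n_male = 41, 59
w = 2970.0                                   # Wilcoxon W from the output

u = w - n_male * (n_male + 1) / 2            # 2970 - 1770 = 1200, the reported U
mean_u = n_female * n_male / 2               # expected U under H0
sd_u = math.sqrt(n_female * n_male * (n_female + n_male + 1) / 12)  # no tie correction
z = (u - mean_u) / sd_u

print(u, round(z, 3))                        # 1200.0 -0.067
```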

5. One-Way ANOVA (Parametric Test)
- The statistical method for testing the hypothesis that the means of several populations are equal is analysis of variance (ANOVA).
- One-way ANOVA uses single-factor, fixed-effects models to compare the effect of one treatment or factor on a continuous dependent variable.

Example 5. Motorbike usage survey data
- Hypothesis: motorbike users in different age groups do not differ in their average number of days of use per month.
Analyze → Compare Means → One-Way ANOVA

ANOVA: Number of used days in a month
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups       1428.944        5     285.789     6.737   .000
Within Groups        3987.806       94      42.423
Total                5416.750       99

- P value < 0.05.
- Conclusion: reject the hypothesis; motorbike users in different age groups differ in their average number of days of use per month.
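In the ANOVA table each Mean Square is Sum of Squares / df, and F = MS(between) / MS(within). The sketch below checks F from the reported sums of squares, then rebuilds the between-groups sum of squares from the group sizes and means shown in the post hoc output that follows; the within-groups sum cannot be recovered from means alone, so it is taken from the table.

```python
# (N, mean Number of used days in a month) per age group, from the post hoc table
groups = {"under 20": (6, 18.33), "under 30": (26, 22.62), "under 40": (17, 24.12),
          "under 50": (25, 17.96), "under 60": (19, 14.47), "older than 60": (7, 26.14)}
SS_WITHIN = 3987.806                      # from the ANOVA table

k = len(groups)
n = sum(size for size, _ in groups.values())
df_between, df_within = k - 1, n - k     # 5 and 94

grand_mean = sum(size * m for size, m in groups.values()) / n
ss_between = sum(size * (m - grand_mean) ** 2 for size, m in groups.values())

f_reported = (1428.944 / df_between) / (SS_WITHIN / df_within)
f_rebuilt = (ss_between / df_between) / (SS_WITHIN / df_within)
print(round(f_reported, 3))                        # 6.737
print(round(ss_between, 1), round(f_rebuilt, 2))   # near 1428.944 and 6.737 (rounded means)
```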

Number of used days in a month: homogeneous subsets

                                      Subset for alpha = .05
                 Age groups      N    1        2        3
Tukey HSD(a,b)   under 60       19    14.47
                 under 50       25    17.96    17.96
                 under 20        6    18.33    18.33
                 under 30       26             22.62    22.62
                 under 40       17             24.12    24.12
                 older than 60   7                      26.14
                 Sig.                 .695     .198     .769
Duncan(a,b)      under 60       19    14.47
                 under 50       25    17.96    17.96
                 under 20        6    18.33    18.33
                 under 30       26             22.62    22.62
                 under 40       17                      24.12
                 older than 60   7                      26.14
                 Sig.                 .175     .101     .215
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 12.013.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
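Footnote a refers to the harmonic mean of the six group sizes, k / Σ(1/nᵢ), which the Tukey and Duncan procedures use in place of a common group size when groups are unequal. A quick check with the sizes from the table:

```python
import statistics

sizes = [19, 25, 6, 26, 17, 7]             # N per age group in the post hoc table
by_formula = len(sizes) / sum(1 / n for n in sizes)
print(round(by_formula, 3), round(statistics.harmonic_mean(sizes), 3))   # 12.013 12.013
```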

5. One-Way ANOVA (Parametric Test)

Age Group        Value   Grouping
Under 60         14.5    a
Under 50         18.0    ab
Under 20         18.3    ab
Under 30         22.6    bc
Under 40         24.1    bc
Older than 60    26.1    c

Figure. Distribution of the average number of days of motorbike use per month by user age

6. Nonparametric Test for k-Independent Samples
Example 6. Motorbike usage survey data
- Hypothesis: motorbike users in different age groups do not differ in their choice of motorbike brand.
- Analyze → Nonparametric Tests → k Independent Samples

Kruskal-Wallis Test

Ranks: Motobike Names
Age groups        N     Mean Rank
under 20          6     46.25
under 30         26     49.40
under 40         17     50.62
under 50         25     55.66
under 60         19     45.87
older than 60     7     52.07
Total           100

Test Statistics(a,b)    Motobike Names
Chi-Square              1.493
df                      5
Asymp. Sig.             .914
a. Kruskal Wallis Test
b. Grouping Variable: Age groups

P value > 0.05. Conclusion: do not reject the hypothesis; the choice of motorbike brand does not differ significantly between users in different age groups.
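The Kruskal-Wallis statistic can be recovered from the Ranks table: H = 12 / (N(N + 1)) · Σ nᵢ(R̄ᵢ − (N + 1)/2)², divided by the tie correction 1 − Σ(t³ − t)/(N³ − N). The sketch assumes, as the output implies, that the test ranks the coded brand values, so every brand forms one tie group whose size is its frequency in the Frequency table of 5.4. The 5% critical value for 5 degrees of freedom (11.070) comes from a χ² table.

```python
# (N, Mean Rank of Motobike Names) per age group, from the Ranks table
groups = {"under 20": (6, 46.25), "under 30": (26, 49.40), "under 40": (17, 50.62),
          "under 50": (25, 55.66), "under 60": (19, 45.87), "older than 60": (7, 52.07)}
tie_sizes = [10, 8, 7, 13, 24, 4, 11, 6, 7, 10]   # respondents per brand

N = sum(n for n, _ in groups.values())            # 100
h = 12 / (N * (N + 1)) * sum(n * (r - (N + 1) / 2) ** 2 for n, r in groups.values())
correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (N ** 3 - N)
h_corrected = h / correction
df = len(groups) - 1

print(round(h_corrected, 3), df, "reject H0" if h_corrected > 11.070 else "do not reject H0")
```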
