
Chapter 5.

Data Entry and Processing
Course: Economic Research Methods, Faculty of Development Economics, University of Economics Ho Chi Minh City

5.1 Introduction

This chapter shows students:
- How to enter, process, and analyze data.
- Exploratory data analysis techniques.
- How to use cross-tabulation to test relationships between categorical variables.
- How to use analytical statistics to test hypotheses.

5.2 The data analysis process

Figure 5.1 Exploration, testing, and analysis steps in the research process
1. Research proposal, with a preliminary analysis plan
2. Data collection and preparation
3. Data analysis and interpretation:
   - Descriptive analysis of the variables
   - Cross-tabulation of the variables
   - Data presentation (histogram, box plots, Pareto, stem-and-leaf, AID, etc.)
   - Data visualization
   - Revision of the hypotheses
4. Hypothesis testing
5. Research report
6. Decision making

5.3 Data entry

5.3.1 Laying out the data on the computer
Objectives:
- Make data entry convenient.
- Make later corrections to the data easy.

Implementation:
- General rule: keep variable names short and abbreviated (unaccented Vietnamese or English), and name variables according to a fixed convention.
- Excel: easy to use and edit, but storage is limited and its statistical and econometric tools are not sufficient for analysis.
- SPSS: storage is practically unlimited and its statistical and econometric tools cover the needs of analysis, but every variable must be declared first, which takes time.

Figure 5.2 Entering data into the SPSS data sheet

Defining the variable type

Figure 5.3 Defining the attributes of qualitative and quantitative variables

Setting the variable label (description)

Setting the category values of the variable

Setting the measurement scale of the variable

5.4 Data cleaning

5.4.1 Detecting anomalous values in the data
a. Using Excel: the Max and Min functions, the Auto Filter tool, and Scatter charts

Figure 5.4 The Scatter chart tool in Excel

b. Using SPSS: Scatter plots, the Frequency tool, Bar Chart, Pie Chart, and Box Plot in Explore

b. Using SPSS: Scatter plot
[Scatter plot with points labelled by motorbike brand: Honda AirBlade, Honda Future Neo, Yamaha Sirius, Yamaha Jupiter, Honda Wave, Yamaha Cygnus, SYM Attila, Honda Dream, Honda @, Others. Axis label: Number of used days in a month]

b. Using SPSS: the Frequency and Explore tools

Figure 5.6 The Frequency and Explore tools in SPSS

b. Using SPSS: the Frequency tool

Motobike Names      Frequency   Percent   Valid Percent   Cumulative Percent
Honda AirBlade             10      10.0            10.0                 10.0
Honda Future Neo            8       8.0             8.0                 18.0
Yamaha Sirius               7       7.0             7.0                 25.0
Yamaha Jupiter             13      13.0            13.0                 38.0
Honda Wave                 24      24.0            24.0                 62.0
Yamaha Cygnus               4       4.0             4.0                 66.0
SYM Attila                 11      11.0            11.0                 77.0
Honda Dream                 6       6.0             6.0                 83.0
Honda @                     7       7.0             7.0                 90.0
Others                     10      10.0            10.0                100.0
Total                     100     100.0           100.0

b. Using SPSS: the Pie Chart and Bar Chart tools
[Pie chart and bar chart of Motobike Names, showing the same counts and percentage shares as the frequency table]
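The Percent and Cumulative Percent columns of a frequency table can be recomputed by hand, which is a quick way to spot a category that was mistyped during data entry. The sketch below is only an illustration in Python, not SPSS output; the helper name `frequency_table` is hypothetical, and the counts are the ones reported in the Frequency output.

```python
from itertools import accumulate

# Counts per brand, as reported in the SPSS Frequency output.
counts = {
    "Honda AirBlade": 10, "Honda Future Neo": 8, "Yamaha Sirius": 7,
    "Yamaha Jupiter": 13, "Honda Wave": 24, "Yamaha Cygnus": 4,
    "SYM Attila": 11, "Honda Dream": 6, "Honda @": 7, "Others": 10,
}


def frequency_table(counts):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    running = accumulate(counts.values())  # running totals, in table order
    return [
        (name, n, 100.0 * n / total, 100.0 * cum / total)
        for (name, n), cum in zip(counts.items(), running)
    ]


table = frequency_table(counts)
for name, n, pct, cum in table:
    print(f"{name:<18}{n:>4}{pct:>8.1f}{cum:>8.1f}")
```

A category name that appears with a tiny count (for example a misspelled brand) would show up as an extra row, which is how the Frequency tool helps with data cleaning.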

b. Using SPSS: the Histogram tool

A histogram is a conventional way to display interval or ratio data. It groups the values of a variable into intervals and draws one bar per interval to represent the data values that fall in it.

A histogram is very useful for (1) displaying all the intervals in a distribution and (2) examining the shape of the distribution, such as its skewness and kurtosis.

Note: a histogram cannot be used for nominal variables.

Example 5.2 Distribution of the age of motorbike users
[Histogram of Age of motorbike user: Mean = 39, Std. Dev = 14.42, N = 100]

b. Using SPSS: Stem-and-Leaf Displays

Each row of the display is called a stem, and each data value shown on a stem is called a leaf. When a stem-and-leaf display is rotated 90° to the left, it has a shape similar to a histogram.
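To make the stem and leaf idea concrete, here is a minimal sketch in Python that builds such a display from a list of integers. The ages are hypothetical, and the function `stem_and_leaf` is an illustrative helper, not part of SPSS. Unlike the SPSS display, it does not split each stem into a lower and an upper half.

```python
from collections import defaultdict


def stem_and_leaf(values, stem_width=10):
    """Group integers by stem (value // stem_width); leaves are the remainders."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // stem_width].append(v % stem_width)
    return {
        stem: "".join(str(leaf) for leaf in leaves)
        for stem, leaves in sorted(stems.items())
    }


# Hypothetical ages, for illustration only.
ages = [18, 18, 19, 21, 22, 22, 25, 30, 34, 41, 43, 45, 57, 60]
display = stem_and_leaf(ages)
for stem, leaves in display.items():
    # Frequency, stem, and leaves, in the same column order SPSS uses.
    print(f"{len(leaves):>3}  {stem} | {leaves}")
```

Each printed row is one stem; the number of leaves on a row is the bar height the histogram would show for that interval.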

5.3 Stem-and-leaf display of the variable Age of motorbike users

Age of motorbike user Stem-and-Leaf Plot

 Frequency    Stem &  Leaf
     6.00        1 .  889999
    18.00        2 .  000111122222233344
     8.00        2 .  55677788
    13.00        3 .  0012233334444
     4.00        3 .  5556
    12.00        4 .  123333334444
    13.00        4 .  5555566777789
    10.00        5 .  0123344444
     9.00        5 .  566667779
     2.00        6 .  03
     4.00        6 .  5567
      .00        7 .
     1.00        7 .  6

 Stem width:  10
 Each leaf:   1 case(s)

b. Using SPSS: Box Plots

A box plot, also called a box-and-whisker plot, gives another visual picture of the location, spread, shape, tail length, and outliers of a distribution. It summarizes five statistics of a distribution: the median, the upper and lower quartiles, and the largest and smallest observed values.

The main components of a box plot are:
- The rectangular box contains 50% of the data values.
- The line in the middle of the box is the median.
- The two edges of the box are the first and third quartiles (the 25th and 75th percentiles of the data).
- The whiskers extend from the upper and lower edges of the box to the largest and smallest values that lie within 1.5 times the interquartile range, measured from the edges of the box.

Reading the plot from top to bottom:
- Extremes: values more than 3 box lengths above the third quartile (75th percentile)
- Outliers: values more than 1.5 box lengths above the third quartile (75th percentile)
- Largest observed value that is not an outlier
- Third quartile (75th percentile)
- Median; 50% of the cases have values inside the box
- First quartile (25th percentile)
- Smallest observed value that is not an outlier
- Outliers: values more than 1.5 box lengths below the first quartile (25th percentile)
- Extremes: values more than 3 box lengths below the first quartile (25th percentile)
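The outlier and extreme rules of the box plot can be sketched in a few lines of Python. This is an illustration under stated assumptions: the data are hypothetical, `boxplot_flags` is an illustrative helper, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from the hinges SPSS uses.

```python
import statistics


def boxplot_flags(values):
    """Return (q1, q3, outliers, extremes) using the 1.5 and 3 box-length rules."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    box = q3 - q1  # box length (interquartile range)
    outliers, extremes = [], []
    for v in values:
        # Distance outside the box; negative when the value is inside it.
        distance = max(q1 - v, v - q3)
        if distance > 3 * box:
            extremes.append(v)
        elif distance > 1.5 * box:
            outliers.append(v)
    return q1, q3, outliers, extremes


# Hypothetical "days used per month"; 47 and 70 cannot be real values.
days = [5, 12, 14, 15, 18, 20, 21, 22, 24, 25, 26, 30, 47, 70]
q1, q3, outliers, extremes = boxplot_flags(days)
print(q1, q3, outliers, extremes)
```

Values flagged this way are candidates for checking against the original questionnaires, not values to delete automatically.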

5.4 Box plots of the variables Age of motorbike users and Number of days used per month
[Box plots of Age of motorbike user (N = 100) and Number of used days (N = 100)]

5.5 Descriptive statistical analysis

5.5.1 Descriptive statistics for quantitative data

Using Excel: the Descriptive Statistics tool in Data Analysis.
Using SPSS: the Frequency, Descriptives, and Explore tools in the Descriptive Statistics menu.

Descriptive statistics describe the central tendency, the variability, and the shape of the distribution of the data.

Measures of central tendency
- The mean is the sum of all data values divided by the number of values.
- The median is the value in the middle of the data set once it is sorted. It is the midpoint of the distribution. When the number of observations is even, the median is the average of the two middle observations.
- The mode is the value that occurs most often in the data set.
- The range is the difference between the largest and smallest values in the data set (a measure of spread).
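The four measures defined in this part of the chapter (mean, median, mode, range) can be checked on a small sample. The following sketch uses Python's standard `statistics` module; the ages are hypothetical and chosen so that the number of observations is even.

```python
import statistics

# Hypothetical sample with an even number of observations.
ages = [18, 21, 21, 25, 30, 34, 41, 45]

mean = statistics.mean(ages)         # sum of the values / number of values
median = statistics.median(ages)     # average of the two middle values (25 and 30)
mode = statistics.mode(ages)         # most frequent value
value_range = max(ages) - min(ages)  # largest value minus smallest value

print(mean, median, mode, value_range)
```

Because the sample has eight observations, the median is not one of the data values; it is the average of the fourth and fifth sorted values.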

Measures of shape
Skewness measures how far the distribution leans to one side.
- A distribution is left-skewed (negative skew) when its left tail is longer and most of the data are concentrated on the right of the distribution.
- A distribution is right-skewed (positive skew) when its right tail is longer and most of the data are concentrated on the left of the distribution.
- With right skew the skewness value is positive; with left skew it is negative. The stronger the skew, the further the skewness value lies from 0.

Measures of variability
- The variance (σ²) is the average of the squared deviations of the observations from the mean.
- The standard deviation (SD; σ) measures how widely the data are spread around the mean.
- The standard error of the mean (s.e.) measures the range within which the population mean (μ) can be expected to lie, with a given probability, based on the sample mean.
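As a rough illustration of the measures of variability, the sketch below computes the variance, standard deviation, and standard error of the mean for a small hypothetical sample. It assumes the sample formulas (division by n - 1), which is what statistical packages normally report. The last line cross-checks the standard error against the summary figures SPSS reports for Age of motorbike user in this chapter (N = 100, SD = 14.42).

```python
import math
import statistics

# Hypothetical days of use per month, for illustration only.
days = [5, 12, 20, 22, 25, 30]

variance = statistics.variance(days)  # sample variance, divides by n - 1
sd = statistics.stdev(days)           # square root of the sample variance
se_mean = sd / math.sqrt(len(days))   # standard error of the mean

# Cross-check with the reported Age statistics: SD = 14.42, N = 100.
se_age = 14.42 / math.sqrt(100)

print(variance, round(sd, 3), round(se_mean, 3), round(se_age, 3))
```

The cross-check gives about 1.44, which agrees with the Std. Error of the mean in Table 5.6.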

Figure 5.10 The normal distribution curve and its properties

Figure 5.11 Left-skewed and right-skewed distributions compared with the normal distribution

Measures of shape (continued)

Kurtosis measures how peaked or flat a distribution is compared with the normal distribution (whose kurtosis is 0). The distribution is peaked when the kurtosis is positive and flat when it is negative. For a normal distribution, both skewness and kurtosis equal 0. The ratio of the skewness or kurtosis value to its standard error indicates whether the distribution is normal: when the ratio is less than -2 or greater than +2, the distribution is not normal.

Descriptive statistics with SPSS: the Descriptives tool

Figure 5.13 Descriptive statistics options in the Descriptives tool
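The ratio rule for skewness and kurtosis can be checked numerically. The sketch below is an illustration, not SPSS code: it assumes the usual large-sample formulas for the standard errors of skewness and excess kurtosis, and it plugs in the values reported for Age of motorbike user in Table 5.6 (N = 100, skewness .242, kurtosis -.948). The function names are hypothetical.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1.0) / ((n - 3) * (n + 5)))


def within_normal_range(statistic, standard_error, limit=2.0):
    """Rule of thumb: the ratio must lie between -limit and +limit."""
    return abs(statistic / standard_error) <= limit


n = 100
skew_ok = within_normal_range(0.242, se_skewness(n))
kurt_ok = within_normal_range(-0.948, se_kurtosis(n))
print(round(se_skewness(n), 3), round(se_kurtosis(n), 3), skew_ok, kurt_ok)
```

Both ratios fall inside the -2 to +2 band, although the kurtosis ratio is close to the lower limit, so the Age variable passes this rule of thumb only narrowly.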

Table 5.6 Descriptive statistics of the variable Age of motorbike users

Age of motorbike user    Statistic    Std. Error
N                              100
Range                           58
Minimum                         18
Maximum                         76
Mean                         39.01          1.44
Std. Deviation               14.42
Variance                   207.909
Skewness                      .242          .241
Kurtosis                     -.948          .478

Descriptive statistics with SPSS: the Explore tool

The Explore tool is well suited to detailed descriptive statistics of variables broken down by another categorical variable (factor variable).

5.7 Descriptive statistics of the variables Age of motorbike users and Number of days used per month, by gender

                                   Age of motorbike user      Number of used days in a month
User gender: male                  Statistic    Std. Error    Statistic    Std. Error
Mean                                   39.39          1.97        19.76          1.01
95% CI for Mean, Lower Bound           35.45                      17.74
95% CI for Mean, Upper Bound           43.33                      21.79
5% Trimmed Mean                        38.87                      19.90
Median                                 42.00                      21.00
Variance                             228.173                     60.460
Std. Deviation                         15.11                       7.78
Minimum                                   18                          5
Maximum                                   76                         32
Range                                     58                         27
Interquartile Range                    28.00                      15.00
Skewness                                .292          .311        -.175          .311
Kurtosis                               -.932          .613       -1.271          .613

                                   Age of motorbike user      Number of used days in a month
User gender: female                Statistic    Std. Error    Statistic    Std. Error
Mean                                   38.46          2.11        20.71          1.07
95% CI for Mean, Lower Bound           34.19                      18.54
95% CI for Mean, Upper Bound           42.74                      22.88
5% Trimmed Mean                        38.13                      20.95
Median                                 41.00                      22.00
Variance                             183.205                     47.212
Std. Deviation                         13.54                       6.87
Minimum                                   19                          7
Maximum                                   65                         30
Range                                     46                         23
Interquartile Range                    23.00                      11.00
Skewness                                .118          .369        -.513          .369
Kurtosis                              -1.089          .724        -.838          .724

5.5.2 Descriptive statistics for qualitative data
a. Using the Basic Table tool in SPSS

b. Using the Cross-Tabulation tool in SPSS
Cross-tabulation is a technique for comparing data from two or more categorical or nominal variables, for example gender. A cross-tabulation uses tables whose columns and rows represent the levels or coded values of each categorical or nominal variable. It is the first step in identifying relationships between variables. When a cross-tabulation is built for statistical testing, it is called a contingency table, and the test used to judge whether the categorical variables are independent of each other is χ² (chi-square).

Table. Age groups of motorbike users by brand (Count and Row %)

Motobike Names      under 20     under 30     under 40     under 50     under 60     older than 60
Honda AirBlade      2 (20.0%)    3 (30.0%)    3 (30.0%)    1 (10.0%)    1 (10.0%)
Honda Future Neo                 4 (50.0%)                 2 (25.0%)    2 (25.0%)
Yamaha Sirius                    1 (14.3%)                 1 (14.3%)    2 (28.6%)    3 (42.9%)
Yamaha Jupiter                   4 (30.8%)    1 (7.7%)     4 (30.8%)    4 (30.8%)
Honda Wave          1 (4.2%)     2 (8.3%)     8 (33.3%)    7 (29.2%)    5 (20.8%)    1 (4.2%)
Yamaha Cygnus                    1 (25.0%)    1 (25.0%)    1 (25.0%)    1 (25.0%)
SYM Attila          3 (27.3%)    4 (36.4%)    1 (9.1%)     2 (18.2%)                 1 (9.1%)
Honda Dream                      3 (50.0%)    1 (16.7%)    1 (16.7%)                 1 (16.7%)
Honda @                          2 (28.6%)                 1 (14.3%)    4 (57.1%)
Others                           2 (20.0%)    2 (20.0%)    5 (50.0%)                 1 (10.0%)


Table. Gender of motorbike users by brand

Motobike Names * User gender Crosstabulation (Count)

Motobike Names      female   male   Total
Honda AirBlade           3      7      10
Honda Future Neo         4      4       8
Yamaha Sirius            3      4       7
Yamaha Jupiter           6      7      13
Honda Wave               9     15      24
Yamaha Cygnus            2      2       4
SYM Attila               5      6      11
Honda Dream              2      4       6
Honda @                  3      4       7
Others                   4      6      10
Total                   41     59     100

User gender * Motobike Names Crosstabulation

User gender: female
Motobike Names      Count   Expected Count   % within User gender   % within Motobike Names   % of Total
Honda AirBlade          3              4.1                   7.3%                     30.0%         3.0%
Honda Future Neo        4              3.3                   9.8%                     50.0%         4.0%
Yamaha Sirius           3              2.9                   7.3%                     42.9%         3.0%
Yamaha Jupiter          6              5.3                  14.6%                     46.2%         6.0%
Honda Wave              9              9.8                  22.0%                     37.5%         9.0%
Yamaha Cygnus           2              1.6                   4.9%                     50.0%         2.0%
SYM Attila              5              4.5                  12.2%                     45.5%         5.0%
Honda Dream             2              2.5                   4.9%                     33.3%         2.0%
Honda @                 3              2.9                   7.3%                     42.9%         3.0%
Others                  4              4.1                   9.8%                     40.0%         4.0%
Total                  41             41.0                 100.0%                     41.0%        41.0%

User gender: male
Motobike Names      Count   Expected Count   % within User gender   % within Motobike Names   % of Total
Honda AirBlade          7              5.9                  11.9%                     70.0%         7.0%
Honda Future Neo        4              4.7                   6.8%                     50.0%         4.0%
Yamaha Sirius           4              4.1                   6.8%                     57.1%         4.0%
Yamaha Jupiter          7              7.7                  11.9%                     53.8%         7.0%
Honda Wave             15             14.2                  25.4%                     62.5%        15.0%
Yamaha Cygnus           2              2.4                   3.4%                     50.0%         2.0%
SYM Attila              6              6.5                  10.2%                     54.5%         6.0%
Honda Dream             4              3.5                   6.8%                     66.7%         4.0%
Honda @                 4              4.1                   6.8%                     57.1%         4.0%
Others                  6              5.9                  10.2%                     60.0%         6.0%
Total                  59             59.0                 100.0%                     59.0%        59.0%

Total rows: for each brand, Count and Expected Count equal the brand total (10, 8, 7, 13, 24, 4, 11, 6, 7, 10; N = 100), % within User gender equals % of Total, and % within Motobike Names is 100.0%.
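The Expected Count column of the crosstabulation, and the chi-square statistic built from it, can be reproduced with a short sketch. This is an illustration in plain Python rather than SPSS output; `chi_square_independence` is a hypothetical helper, and the decision uses the tabulated critical value 16.919 for 9 degrees of freedom at α = 0.05 instead of an exact p value.

```python
# Counts per brand (AirBlade ... Others), taken from the crosstabulation.
female = [3, 4, 3, 6, 9, 2, 5, 2, 3, 4]
male = [7, 4, 4, 7, 15, 2, 6, 4, 4, 6]


def chi_square_independence(rows):
    """Return (statistic, degrees of freedom, expected counts) for a table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


stat, dof, expected = chi_square_independence([female, male])
CRITICAL_9DF_05 = 16.919  # chi-square critical value, df = 9, alpha = 0.05
reject_independence = stat > CRITICAL_9DF_05
print(round(stat, 3), dof, reject_independence)
```

The statistic is far below the critical value, so independence of gender and brand is not rejected. Several expected counts are below 5, so the chi-square approximation is rough for this table.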

5.6 Hypothesis testing

5.6.1 Hypothesis testing
The purpose of hypothesis testing is to judge the accuracy of a hypothesis on the basis of the sample data collected. We assess the accuracy of hypotheses by applying statistical techniques, and we evaluate the importance of statistically significant differences. The classical approach, or sampling theory, takes an objective view of probability based on the analysis of sample data: a hypothesis is formulated, and it is then rejected or not rejected according to the sample collected.

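Purposes and types of research questions

General purpose: explore relationships between variables
- Specific purpose: compare groups. Type of question/hypothesis: difference. Type of statistic: difference statistics (e.g. t-test, ANOVA).
- Specific purpose: find the degree of relationship between variables. Type of question/hypothesis: associational. Type of statistic: associational statistics (e.g. correlation, regression).

General purpose: description only
- Specific purpose: summarize data. Type of question/hypothesis: descriptive. Type of statistic: descriptive statistics (e.g. mean, percentage).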

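Formulating the null hypothesis H0 and the alternative hypothesis

1. Research question: Is there an age difference between men and women?
   H0: There is no age difference between men and women (H0: μ_male = μ_female).
   H1: There is an age difference between men and women (H1: μ_male ≠ μ_female).
2. Research question: Is there any association between gender and motorbike brand?
   H0: There is no association between gender and motorbike brand (H0: the association between gender G and brand M = 0).
   H1: There is an association between gender and motorbike brand (H1: the association between gender G and brand M ≠ 0).
3. Research question: Does the level of motorbike use differ between age groups?
   H0: There is no difference between age groups in the level of use (H0: the mean level of use is equal in every age group).
   H1: There is a difference between age groups in the level of use (H1: the mean level of use differs between age groups).

5.6.2 The hypothesis testing procedure
1. State the hypothesis.
2. Choose the statistical test.
3. Choose the desired significance level.
4. Compute the calculated difference value.
5. Obtain the critical test value.
6. Interpret the test result.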

When the software reports p values, the procedure becomes:
1. State the null hypothesis and the alternative hypothesis.
2. Choose the desired significance level.
3. Obtain the p value.
4. Compare the p value with the significance level and make a decision.
5. Interpret the test result.

Probability values (p values)

Most statistical software reports results with p values. The p value is the probability of obtaining a result at least as extreme as the one actually observed, given that H0 is true.

The p value is compared with the significance level (α), and H0 is rejected or not rejected on that basis:
- If the p value is smaller than the significance level (p < α), reject H0.
- If the p value is equal to or greater than the significance level (p ≥ α), do not reject H0.

Significance tests: types of tests

There are two types: parametric and nonparametric. Parametric tests are the more powerful tools because they work with scale data (interval, ratio). Nonparametric tests are used for nominal and ordinal data.

Parametric tests

Parametric tests require several assumptions:
- The observations must be independent of one another.
- The observations must be drawn from normally distributed populations.
- The populations should have equal variances.
- The measurement must be at scale level (interval or ratio) so that arithmetic operations can be performed.

Nonparametric tests

Nonparametric tests require fewer assumptions:
- They do not require the observations to be drawn from normally distributed populations.
- They do not require the populations to have equal variances.
- They are the only way to handle nominal data.
- They are the proper way to handle ordinal data, although parametric tests can sometimes be applied.
- They are easy to understand and easy to use.

How to choose an appropriate statistical test?

To choose an appropriate statistical test, consider three questions:
- Does the test involve one sample, two samples, or more than two samples (k)?
- If there are two samples or more than two samples (k), are they independent of one another?
- What type of data is involved (nominal, ordinal, scale)?

Recommended statistical techniques by data type and test situation

Nominal
- One-sample case: Binomial; χ² one-sample test
- Two samples, related: McNemar
- Two samples, independent: Fisher exact test; χ² two-sample test
- k samples, related: Cochran Q
- k samples, independent: χ² for k samples

Ordinal
- One-sample case: Kolmogorov-Smirnov one-sample test; Runs test
- Two samples, related: Sign test; Wilcoxon matched-pairs test
- Two samples, independent: Median test; Mann-Whitney U; Kolmogorov-Smirnov; Wald-Wolfowitz
- k samples, related: Friedman two-way ANOVA
- k samples, independent: Median extension; Kruskal-Wallis one-way ANOVA

Interval and ratio
- One-sample case: T-test; Z test
- Two samples, related: T-test for paired samples
- Two samples, independent: T-test; Z test
- k samples, related: Repeated-measures ANOVA
- k samples, independent: One-way ANOVA; N-way ANOVA
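The three questions and the table of recommended techniques amount to a lookup. The sketch below encodes the table as a Python dictionary; the names `TESTS` and `suggest_tests` and the key labels ("one", "two", "k") are illustrative choices, not part of the original material.

```python
# (scale, samples, related) -> tests; related is None for the one-sample case.
TESTS = {
    ("nominal", "one", None): ["Binomial", "Chi-square one-sample test"],
    ("nominal", "two", True): ["McNemar"],
    ("nominal", "two", False): ["Fisher exact test", "Chi-square two-sample test"],
    ("nominal", "k", True): ["Cochran Q"],
    ("nominal", "k", False): ["Chi-square for k samples"],
    ("ordinal", "one", None): ["Kolmogorov-Smirnov one-sample test", "Runs test"],
    ("ordinal", "two", True): ["Sign test", "Wilcoxon matched-pairs test"],
    ("ordinal", "two", False): ["Median test", "Mann-Whitney U",
                                "Kolmogorov-Smirnov", "Wald-Wolfowitz"],
    ("ordinal", "k", True): ["Friedman two-way ANOVA"],
    ("ordinal", "k", False): ["Median extension", "Kruskal-Wallis one-way ANOVA"],
    ("scale", "one", None): ["T-test", "Z test"],
    ("scale", "two", True): ["T-test for paired samples"],
    ("scale", "two", False): ["T-test", "Z test"],
    ("scale", "k", True): ["Repeated-measures ANOVA"],
    ("scale", "k", False): ["One-way ANOVA", "N-way ANOVA"],
}


def suggest_tests(scale, samples, related=None):
    """Answer the three questions: data type, number of samples, related or not."""
    if samples == "one":
        related = None  # independence is not a question with a single sample
    return TESTS[(scale, samples, related)]


# Age (scale) compared between two independent groups (men and women).
print(suggest_tests("scale", "two", related=False))
```

For example, comparing days of use (scale data) across six independent age groups leads to `suggest_tests("scale", "k", related=False)`, that is, one-way ANOVA.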

5.6.3 Data analysis
a. Excel: the Correlation, Anova, and Regression tools in Data Analysis
b. SPSS: the Compare Means and Nonparametric Tests tools

5.7 Some specific applications
1. One-Sample T Test

One-sample tests are used when we have a single sample and want to test whether it comes from a particular population. For example:
- Is there a difference between the observed frequencies and some standard frequencies based on theory?
- Is there a difference between an observed proportion and some expected proportion?

Example 1 (parametric test)
We have data on the sales growth rates of 9 firms. The benchmark growth rate is 6.5% per year. Hypothesis: the average sales growth rate of the 9 firms does not differ from the benchmark rate (6.5% per year).

Analyze > Compare Means > One-Sample T Test (Why?)

Interpreting the results of Example 1 (parametric test)
- The p value (Sig. 2-tailed) is > 0.05.
- The difference between the average sales growth rate of the 9 firms and the benchmark rate is not statistically significant at the 0.05 level.
- We do not reject the hypothesis: the average sales growth rate of the 9 firms does not differ from the benchmark rate (6.5% per year).
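The slides do not list the nine growth rates, so the sketch below uses hypothetical figures purely to show how the one-sample t statistic is formed. It compares the statistic with the tabulated two-tailed critical value for 8 degrees of freedom (2.306 at α = 0.05) rather than computing an exact p value.

```python
import math
import statistics

# Hypothetical sales growth rates (% per year) for 9 firms.
growth = [5.2, 7.1, 6.8, 4.9, 8.0, 6.1, 7.4, 5.5, 6.9]
benchmark = 6.5  # benchmark growth rate from Example 1

n = len(growth)
standard_error = statistics.stdev(growth) / math.sqrt(n)
t_stat = (statistics.mean(growth) - benchmark) / standard_error

T_CRITICAL_8DF = 2.306  # two-tailed, alpha = 0.05, df = n - 1 = 8
reject_h0 = abs(t_stat) > T_CRITICAL_8DF
print(round(t_stat, 3), reject_h0)
```

With these illustrative numbers the t statistic is small in absolute value, so H0 is not rejected, the same kind of conclusion as in Example 1.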

2. One-Sample Chi-Square Test

Example 2 (nonparametric test)
Motorbike use survey data. Hypothesis H0: every motorbike brand has the same chance of being chosen by users.
Analyze > Nonparametric Tests > Chi-Square

There are 100 observations and 10 motorbike brands. Each brand has a 10% chance of being chosen, so the expected count is 10 motorbikes per brand. However, the differences between the observed N and the expected N are large for individual brands.
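The comparison of observed and expected counts can be sketched directly from the frequency table of brands. This is an illustration in plain Python, not SPSS output; it uses the tabulated chi-square critical value for 9 degrees of freedom (16.919 at α = 0.05) instead of an exact p value.

```python
# Observed users per brand, from the frequency table of Motobike Names.
observed = [10, 8, 7, 13, 24, 4, 11, 6, 7, 10]

n = sum(observed)
expected = n / len(observed)  # equal chance: 10 users per brand
chi_square = sum((o - expected) ** 2 / expected for o in observed)
dof = len(observed) - 1

CRITICAL_9DF_05 = 16.919  # chi-square critical value, df = 9, alpha = 0.05
reject_h0 = chi_square > CRITICAL_9DF_05
print(round(chi_square, 1), dof, reject_h0)
```

Most of the statistic comes from Honda Wave (24 observed against 10 expected), which alone contributes 19.6.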

With a p value < 0.05, we reject H0 and conclude that users do not choose the motorbike brands equally.

3. Two-Sample T Test

There are two kinds of T Test for two samples:
- Unpaired (independent T Test): for two samples that are independent of each other, e.g. men and women, groups of people, occupational groups, etc.
- Paired (paired T Test): for two related samples, e.g. one group of people before and after exposure to some factor.

Example 3. Motorbike use survey data
Hypothesis: the mean age of male and female motorbike users is the same.

Analyze > Compare Means > Independent-Samples T Test
- Select the variable Age for Test Variable(s).
- Grouping Variable: Group 1 = 1 (male); Group 2 = 0 (female)
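Before reading the SPSS output, the test statistic for Example 3 can be approximated by hand from the group summaries in Table 5.7 (means, variances, and group sizes by gender). The sketch below is an illustration; `t_test_from_summary` is a hypothetical helper that returns both the pooled-variance result and the unequal-variance (Welch) result.

```python
import math


def t_test_from_summary(m1, var1, n1, m2, var2, n2):
    """Two-sample t statistics from group means, variances, and sizes."""
    diff = m1 - m2
    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch standard error and degrees of freedom.
    a, b = var1 / n1, var2 / n2
    se_welch = math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return {
        "t_pooled": diff / se_pooled, "df_pooled": n1 + n2 - 2,
        "t_welch": diff / se_welch, "df_welch": df_welch,
    }


# Female group first, then male group, using the figures in Table 5.7.
result = t_test_from_summary(38.46, 183.205, 41, 39.39, 228.173, 59)
print({k: round(v, 3) for k, v in result.items()})
```

Both t values are well inside the range from -2 to +2, so a difference in mean age of less than one year gives no evidence against the hypothesis.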

Independent Samples Test: Age of motorbike user

Levene's Test for Equality of Variances: F = 1.239, Sig. = .268

t-test for Equality of Means
                                   t        df   Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower, Upper)
Equal variances assumed        -.315        98              .754              -.93                    2.95   -6.77, 4.92
Equal variances not assumed    -.321    91.785              .749              -.93                    2.89   -6.66, 4.81

The p values (Sig. (2-tailed)) are far above α = 0.05. We do not reject the hypothesis and conclude that there is no difference in mean age between male and female motorbike users.

4. Two-Sample Nonparametric Test

Example 4. Motorbike use survey data
Hypothesis: male and female users choose motorbike brands in the same way.
Analyze > Nonparametric Tests > Two-Independent Samples

Mann-Whitney Test

Test Statistics (grouping variable: User gender)
                           Motobike Names
Mann-Whitney U                   1200.000
Wilcoxon W                       2970.000
Z                                   -.067
Asymp. Sig. (2-tailed)               .946

Two-Sample Kolmogorov-Smirnov Test

Test Statistics (grouping variable: User gender)
                                        Motobike Names
Most Extreme Differences   Absolute               .045
                           Positive               .045
                           Negative              -.018
Kolmogorov-Smirnov Z                              .224
Asymp. Sig. (2-tailed)                           1.000

Conclusion: we do not reject the hypothesis and conclude that male and female users choose motorbike brands in the same way.

5. One-Way ANOVA (Parametric Test)

The statistical method for testing the hypothesis that several population means are equal is analysis of variance (ANOVA). One-way ANOVA uses single-factor, fixed-effects models to compare the effect of one treatment or one factor on a continuous dependent variable.

Example 5. Motorbike use survey data
Hypothesis: there is no difference in the average number of days of use per month among motorbike users in different age groups.
Analyze > Compare Means > One-Way ANOVA

ANOVA: Number of used days in a month

                  Sum of Squares    df   Mean Square       F    Sig.
Between Groups          1428.944     5       285.789   6.737    .000
Within Groups           3987.806    94        42.423
Total                   5416.750    99

The p value is < 0.05. Conclusion: reject the hypothesis; there is a difference in the average number of days of use per month among motorbike users in different age groups.

Number of used days in a month: homogeneous subsets (subset for alpha = .05)

Tukey HSD (a, b)
Age groups         N   Subset 1   Subset 2   Subset 3
under 60          19      14.47
under 50          25      17.96      17.96
under 20           6      18.33      18.33
under 30          26                 22.62      22.62
under 40          17                 24.12      24.12
older than 60      7                            26.14
Sig.                       .695       .198       .769

Duncan (a, b)
Age groups         N   Subset 1   Subset 2   Subset 3
under 60          19      14.47
under 50          25      17.96      17.96
under 20           6      18.33      18.33
under 30          26                 22.62      22.62
under 40          17                            24.12
older than 60      7                            26.14
Sig.                       .175       .101       .215

Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 12.013.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
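The F ratio in the ANOVA table is the between-groups mean square divided by the within-groups mean square. The sketch below shows the full calculation on a tiny hypothetical data set and then cross-checks the reported F using only the sums of squares and degrees of freedom from the table; `one_way_anova` is an illustrative helper, not SPSS code.

```python
import statistics


def one_way_anova(groups):
    """Return (SS between, SS within, F) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Between groups: group size times squared distance of group mean from grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Within groups: squared distance of each value from its own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio


# Hypothetical days of use for three small groups.
groups = [[10, 12, 14], [20, 22, 24], [15, 17, 19]]
ss_b, ss_w, f_small = one_way_anova(groups)

# Cross-check with the SPSS table: F = (SS between / 5) / (SS within / 94).
f_reported = (1428.944 / 5) / (3987.806 / 94)
print(ss_b, ss_w, f_small, round(f_reported, 3))
```

The cross-check reproduces the F value shown in the ANOVA table to three decimal places.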

Figure. Average number of days of motorbike use per month by age of user

Age group        Value   Grouping (Tukey HSD subsets)
Under 60          14.5   a
Under 50          17.9   ab
Under 20          18.3   ab
Under 30          22.6   bc
Under 40          24.1   bc
Older than 60     26.1   c

6. Nonparametric Test for k-Independent Samples

Example 6. Motorbike use survey data
Hypothesis: there is no difference in motorbike brand among users in different age groups.
Analyze > Nonparametric Tests > k Independent Samples

Kruskal-Wallis Test

Ranks: Motobike Names
Age groups         N   Mean Rank
under 20           6       46.25
under 30          26       49.40
under 40          17       50.62
under 50          25       55.66
under 60          19       45.87
older than 60      7       52.07
Total            100

Test Statistics (Kruskal Wallis Test; grouping variable: Age groups)
               Motobike Names
Chi-Square              1.493
df                          5
Asymp. Sig.              .914

The p value is > 0.05. Conclusion: we do not reject the hypothesis; motorbike users in different age groups choose motorbike brands in the same way.
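The Chi-Square value in the Kruskal-Wallis output can be approximated from the Ranks table alone. The sketch below is an illustration under stated assumptions: it uses the standard H statistic written in terms of mean ranks, applies the usual correction for ties (every user of the same brand shares one tied value, so the tie group sizes are the brand frequencies), and works from the rounded mean ranks printed by SPSS. `kruskal_wallis_h` is a hypothetical helper.

```python
def kruskal_wallis_h(sizes, mean_ranks, tie_counts=()):
    """Kruskal-Wallis H from group sizes and mean ranks, corrected for ties."""
    n = sum(sizes)
    centre = (n + 1) / 2  # mean rank expected under H0
    h = 12.0 / (n * (n + 1)) * sum(
        size * (rank - centre) ** 2 for size, rank in zip(sizes, mean_ranks)
    )
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts) / (n ** 3 - n)
    return h / correction


sizes = [6, 26, 17, 25, 19, 7]                           # N per age group
mean_ranks = [46.25, 49.40, 50.62, 55.66, 45.87, 52.07]  # from the Ranks table
brand_counts = [10, 8, 7, 13, 24, 4, 11, 6, 7, 10]       # sizes of the tie groups

h = kruskal_wallis_h(sizes, mean_ranks, brand_counts)
CRITICAL_5DF_05 = 11.070  # chi-square critical value, df = 5, alpha = 0.05
reject_h0 = h > CRITICAL_5DF_05
print(round(h, 2), reject_h0)
```

The result agrees with the reported Chi-Square of 1.493 up to the rounding of the mean ranks, and it is far below the critical value, which matches the conclusion of Example 6.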
