Public Health
Nguyen Van Kinh
Department of Epidemiology, University of Medicine and Pharmacy, Ho Chi Minh City
Center for Public Health Development (PHD)
Compiled from publicly available materials of the University of California, LA, USA
Not for commercial use
July 2010
Preface
This document grew out of the author's good fortune to be exposed to, and work on, a large number of studies at the Department of Epidemiology, University of Medicine and Pharmacy, Ho Chi Minh City. It also answers a need and a request raised at a gathering with public health (YTCC) students from Hue and Can Tho. Public health graduates in Ho Chi Minh City have voiced the same need, saying that after graduation they risk forgetting everything about epidemiology and medical statistics.
This document will be written in several stages, and each stage will add further material on epidemiology and medical statistics. The author hopes that for version 2 many other authors will contribute articles on these topics, so that together they form a book that is updated periodically.
This first version presents a number of statistical tests using Stata. Each test comes with a short description, followed by the Stata command and its output, together with a brief interpretation of the results. To pick the right test, you need to determine whether your variable is quantitative, categorical, ordinal or dichotomous, and whether or not it is normally distributed. Version 1 covers only some of the most common tests.
Note, however, that given the variety of data across disciplines and the continual updating of statistical theory, readers are advised to consult an expert before deciding to use a particular test in their own research.
Because this version is published online and is entirely non-commercial, readers are responsible for their own use of the material.
Nguyen Van Kinh
How to use this document:
Font: commands to be typed into Stata's command box
Font:
Categorical variables are variables that identify a characteristic of the study subjects; they include
-
Additional notes
The associations used as examples may not be biologically meaningful; here we focus only on the statistics and on how to compute and interpret them.
The compiler of this book assumes that readers have already studied basic biostatistics and basic epidemiology.
What to expect from the second edition: additional epidemiology concepts, basic concepts, and more advanced analyses (One-way repeated measures ANOVA, Repeated measures logistic regression, Factorial ANOVA, Factorial logistic regression, Analysis of covariance, Discriminant analysis, Multivariate multiple regression, Canonical correlation, Factor analysis, bootstrap analysis). Readers can take "nonparametric" to mean that the assumptions mentioned above are not required (the notion of a nonparametric test should become clear by the end of this document).
Please send comments to nguyenkinh@ytecongcong.com
or to the Center for Public Health Development (PHD),
Project Office, Faculty of Public Health, 159 Hung Phu, Ward 8, District 8, Ho Chi Minh City
About the ytcc dataset
One-sample t test
This test lets us check whether a sample mean (of a normally distributed quantitative variable) differs significantly from a hypothesized value. For example, suppose we want to test whether the mean writing score (variable viet) differs significantly from 50. We can type the following command:
ttest viet=50
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    viet |     200      52.775    .6702372    9.478586    51.45332    54.09668
------------------------------------------------------------------------------
    mean = mean(viet)                                             t =   4.1403
Ho: mean = 50                                    degrees of freedom =      199

    Ha: mean < 50               Ha: mean != 50
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0001
The mean writing score of the students in this sample is 52.775, which differs significantly from the test value of 50 (which could be, for instance, the mean score at another school). We conclude that this group of students has a significantly higher mean writing score than 50.
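The t statistic in the output can be recomputed by hand from the summary figures. The short Python sketch below is only a cross-check outside Stata, not part of the original workflow; the variable names are our own, and the numbers are copied from the output above.

```python
import math

# Summary statistics copied from the Stata output above.
n = 200          # Obs
mean = 52.775    # Mean of viet
sd = 9.478586    # Std. Dev. of viet
mu0 = 50         # hypothesized mean (Ho: mean = 50)

std_err = sd / math.sqrt(n)       # standard error of the mean
t_stat = (mean - mu0) / std_err   # one-sample t statistic
df = n - 1                        # degrees of freedom

print(f"Std. Err. = {std_err:.7f}  t = {t_stat:.4f}  df = {df}")
```

The standard error, t and degrees of freedom obtained this way should agree with the Std. Err., t and degrees of freedom lines printed by ttest.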
One-sample median test
This test tells us whether the sample median differs significantly from a hypothesized value. We use the variable viet again, but here we do not need to assume that the score is normally distributed (we only assume that the writing score is at least ordinal). We test whether the median writing score (viet) differs significantly from 50.
signrank viet=50
Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |      126       13429     10048.5
    negative |       72        6668     10048.5
        zero |        2           3           3
-------------+---------------------------------
         all |      200       20100       20100

unadjusted variance    671675.00
adjustment for ties     -1760.25
adjustment for zeros       -1.25
                      ----------
adjusted variance      669913.50

Ho: viet = 50
             z =   4.130
    Prob > |z| =   0.0000
The results show that the median writing score in this group of students differs significantly from 50.
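Most figures in the signed-rank output follow from simple formulas. The Python sketch below is a cross-check rather than Stata's own code: it rebuilds the expected rank sum, the unadjusted variance, the zero adjustment and z from the counts in the table. The tie adjustment cannot be derived without the raw data, so it is copied from the output; the formulas were chosen because they reproduce the printed numbers.

```python
import math

# Counts and sums copied from the signrank output above.
n = 200              # all observations
n_zero = 2           # observations equal to the hypothesized value
sum_pos = 13429      # sum of ranks of the positive differences
adj_ties = -1760.25  # copied from the output (needs the raw data to derive)

# Zeros take the lowest ranks; the remaining rank sum is split evenly under Ho.
zero_rank_sum = n_zero * (n_zero + 1) / 2
expected = (n * (n + 1) / 2 - zero_rank_sum) / 2

unadjusted_var = n * (n + 1) * (2 * n + 1) / 24
adj_zeros = -n_zero * (n_zero + 1) * (2 * n_zero + 1) / 24
adjusted_var = unadjusted_var + adj_ties + adj_zeros

z = (sum_pos - expected) / math.sqrt(adjusted_var)
print(expected, unadjusted_var, adj_zeros, adjusted_var, round(z, 3))
```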
Binomial test
The one-sample binomial test checks whether the proportion of one level of a categorical variable (the level coded 1) differs significantly from a hypothesized value. For example, suppose we want to know whether the proportion of males in the sample (variable gioitinh) differs significantly from a hypothesized value of 50%. We can do this as follows.
bitest gioitinh=.5
    Variable |        N   Observed k   Expected k   Assumed p   Observed p
-------------+------------------------------------------------------------
    gioitinh |      200        109          100       0.50000      0.54500

  Pr(k >= 109)            = 0.114623  (one-sided test)
  Pr(k <= 109)            = 0.910518  (one-sided test)
  Pr(k <= 91 or k >= 109) = 0.229247  (two-sided test)
The results show no statistically significant difference (p = .2292). In other words, the proportion of males in the sample does not differ significantly from the hypothesized value of 50%.
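The three probabilities reported by bitest are exact binomial tail sums. As a hedged illustration (plain Python, with a helper name of our own choosing), they can be reproduced from N = 200, k = 109 and p = 0.5:

```python
from math import comb

def binom_upper_tail(n, k, p):
    """Pr(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, k, p = 200, 109, 0.5   # values from the bitest output above

p_upper = binom_upper_tail(n, k, p)            # Pr(k >= 109)
p_lower = 1 - binom_upper_tail(n, k + 1, p)    # Pr(k <= 109)
# Two-sided: outcomes at least as far from the expected 100 as 109 is.
p_two_sided = (1 - binom_upper_tail(n, 92, p)) + p_upper  # Pr(k <= 91 or k >= 109)

print(f"{p_upper:.6f} {p_lower:.6f} {p_two_sided:.6f}")
```

With p = 0.5 the distribution is symmetric, which is why the two-sided probability is simply twice the upper tail.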
Independent two-sample t test
This test is used when we want to compare the means of a normally distributed dependent variable between two independent groups. Suppose we want to know whether the writing score differs between males and females.
ttest viet, by(gioitinh)
Ha: diff != 0
Pr(|T| > |t|) = 0.0002
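The results show a statistically significant difference in the mean writing score between males and females (t = 3.7341, p = .0002). In other words, with the group labels used in this dataset, the group labelled nam has a significantly higher mean writing score than the group labelled nu (54.99 versus 50.12).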
Wilcoxon-Mann-Whitney test
The Wilcoxon-Mann-Whitney test is the counterpart of the independent two-sample t test, used when we do not assume that the dependent variable is a normally distributed quantitative variable (we only assume that it is at least ordinal). The Stata command for the Wilcoxon-Mann-Whitney test has the same structure as the one for the independent two-sample t test. We repeat the analysis above, but this time without assuming that the dependent variable viet is normally distributed.
ranksum viet, by(gioitinh)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test

    gioitinh |      obs    rank sum    expected
-------------+---------------------------------
          nu |       91        7792      9145.5
         nam |      109       12308     10954.5
-------------+---------------------------------
    combined |      200       20100       20100

unadjusted variance    166143.25
adjustment for ties      -852.96
                      ----------
adjusted variance      165290.29
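Ytecongcong.COM | YTCC Online
The results show a statistically significant difference between males and females in the distribution of the writing score (z = -3.329, p = 0.0009). We can tell which group ranks higher by comparing the actual rank sum (column rank sum) with the expected rank sum (the value expected when the null hypothesis holds). In the table, the rank sum for nam (12308) is above its expected value (10954.5), while the rank sum for nu (7792) is below its expected value (9145.5), so the nam group has the higher ranks.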
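The expected rank sums, the unadjusted variance and the z statistic quoted above can be recomputed from the group sizes and the rank sum. The Python sketch below is a cross-check under the usual normal approximation, not Stata's implementation; the tie-adjusted variance is copied from the output because deriving it needs the raw data.

```python
import math

# Values copied from the ranksum output above.
n_nu, n_nam = 91, 109
rank_sum_nu = 7792
adjusted_var = 165290.29   # copied from the output (tie adjustment needs raw data)

N = n_nu + n_nam
expected_nu = n_nu * (N + 1) / 2      # expected rank sum under Ho
expected_nam = n_nam * (N + 1) / 2
unadjusted_var = n_nu * n_nam * (N + 1) / 12

z = (rank_sum_nu - expected_nu) / math.sqrt(adjusted_var)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * Pr(Z > |z|)

print(expected_nu, expected_nam, unadjusted_var, round(z, 3), round(p_two_sided, 4))
```

A negative z for the first group (nu) means its ranks are lower than expected, which matches the comparison of rank sums described above.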
The chi-square test is used when we want to know whether there is an association between two categorical variables. Suppose we want to know whether the type of school attended (loaitruong) is associated with the student's gender (gioitinh). Remember that the chi-square test assumes that the expected value of every cell is 5 or more. This assumption is met in the example below. It may not hold for other datasets, however; see the section on Fisher's exact test.
tabulate loaitruong gioitinh, chi2
  Loai hinh |      Gioi tinh
     truong |        nu        nam |     Total
------------+----------------------+----------
Truong cong |        77         91 |       168
  Truong tu |        14         18 |        32
------------+----------------------+----------
      Total |        91        109 |       200

          Pearson chi2(1) =   0.0470   Pr = 0.828
The results show no statistically significant association between the type of school attended and gender (degrees of freedom = 1, chi-square = 0.0470, p = 0.828).
Let us look at another example and examine the association between gender (gioitinh) and socioeconomic status (ktxh). In this example, one (or both) of the variables may have more than two levels, and the variables do not need to have the same number of levels. The variable gioitinh has 2 levels (male and female) and the variable ktxh has 3 levels (low, middle and high).
tabulate gioitinh ktxh, chi2
           |         Kinh te xa hoi
 Gioi tinh |      Thap      Trung        Cao |     Total
-----------+---------------------------------+----------
        nu |        15         47         29 |        91
       nam |        32         48         29 |       109
-----------+---------------------------------+----------
     Total |        47         95         58 |       200

          Pearson chi2(2) =   4.5765   Pr = 0.101
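Both Pearson chi-square statistics above can be recomputed from the cell counts, which also makes the "expected value of at least 5" check explicit. The Python sketch below is an illustration with function names of our own; the p-value helper only covers 1 and 2 degrees of freedom, where the chi-square tail has a simple closed form.

```python
import math

def pearson_chi2(table):
    """Return (chi2, df, smallest expected count) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2, min_expected = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, min_expected

def chi2_pvalue(chi2, df):
    """Upper-tail p-value; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")

school = [[77, 91], [14, 18]]            # loaitruong by gioitinh
status = [[15, 47, 29], [32, 48, 29]]    # gioitinh by ktxh

chi2_school, df_school, min_exp_school = pearson_chi2(school)
chi2_status, df_status, min_exp_status = pearson_chi2(status)
p_school = chi2_pvalue(chi2_school, df_school)
p_status = chi2_pvalue(chi2_status, df_status)
print(round(chi2_school, 4), round(p_school, 3), round(chi2_status, 4), round(p_status, 3))
```

In both tables the smallest expected count is well above 5, so the assumption stated earlier is satisfied.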
Fisher's exact = 0.597
The results show no statistically significant association between race and type of school (p = 0.597). Note that Fisher's exact test does not have a "test statistic"; it computes the p-value directly.
                          R-squared     =  0.1776
                          Adj R-squared =  0.1693

      Source |  Partial SS    df       MS           F     Prob > F
 ------------+----------------------------------------------------
       Model |  3175.69786     2  1587.84893      21.27     0.0000
             |
 chuongtrinh |  3175.69786     2  1587.84893      21.27     0.0000
             |
    Residual |  14703.1771   197   74.635417
 ------------+----------------------------------------------------
       Total |   17878.875   199   89.843593
The mean writing score differs significantly among the programs of study. However, we do not know whether the difference lies between only two of the programs or among all three. To see the mean writing score (viet) for each program, we can use the tabulate command with the summarize option as follows.
tabulate chuongtrinh, summarize(viet)
     Chuong |     Summary of Diem bai viet
  trinh hoc |        Mean   Std. Dev.       Freq.
------------+------------------------------------
  Tong quat |   51.333333   9.3977754          45
  Hoc thuat |   56.257143   7.9433433         105
  Chuyen ng |       46.76   9.3187544          50
------------+------------------------------------
      Total |      52.775    9.478586         200
From these results we can see that students in the academic program (Hoc thuat) have the highest mean score, while students in the specialized program (Chuyen ng) have the lowest.
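The ANOVA table and the group summary are two views of the same information: the sums of squares, F and R-squared can be rebuilt from the group sizes, means and standard deviations alone. The Python sketch below is a cross-check with variable names of our own, using the figures printed by tabulate.

```python
# (Freq., Mean, Std. Dev.) per program, copied from the tabulate output above.
groups = {
    "Tong quat": (45, 51.333333, 9.3977754),
    "Hoc thuat": (105, 56.257143, 7.9433433),
    "Chuyen ng": (50, 46.76, 9.3187544),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups (Model) and within-groups (Residual) sums of squares.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
r_squared = ss_between / (ss_between + ss_within)

print(round(ss_between, 2), round(ss_within, 2), round(f_stat, 2), round(r_squared, 4))
```

Small discrepancies in the last digits are expected because the means and standard deviations are themselves rounded in the output.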
If some writing scores are tied, a correction factor is applied and yields a slightly different chi-square value. With tied ranks, subjects with equal scores all receive the sum of the ranks they occupy divided by the number of tied subjects; for example, if 3 subjects with the same score occupy ranks 5, 6 and 7, their tied rank is (5+6+7)/3 = 6.0. The example above shows that, with or without the correction for ties, the writing scores differ significantly among the 3 programs of study.
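The tied-rank rule described above can be written in a few lines. The Python sketch below is an illustration only: the function name and the example scores are hypothetical, chosen so that three equal scores occupy positions 5, 6 and 7 as in the text.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2   # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

# Hypothetical scores: the three 40s occupy positions 5, 6 and 7.
scores = [31, 33, 35, 36, 40, 40, 40, 44]
tied = average_ranks(scores)
print(tied)
```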
Paired t test
The paired (samples) t test is used when we have 2 related observations (such as one characteristic measured twice on the same subject) and want to know whether the means of these normally distributed quantitative variables differ. For example, to see whether the mean writing score equals the mean reading score, we do the following.
ttest doc = viet
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     doc |     200       52.23    .7249921    10.25294    50.80035    53.65965
    viet |     200      52.775    .6702372    9.478586    51.45332    54.09668
---------+--------------------------------------------------------------------
    diff |     200       -.545    .6283822    8.886666   -1.784142    .6941424
------------------------------------------------------------------------------
     mean(diff) = mean(doc - viet)                                t =  -0.8673
 Ho: mean(diff) = 0                              degrees of freedom =      199

 Ha: mean(diff) < 0           Ha: mean(diff) != 0
 Pr(T < t) = 0.1934         Pr(|T| > |t|) = 0.3868
The results show no statistically significant difference between the writing score (viet) and the reading score (doc) (t = -0.8673, p = 0.3868).
The results show no statistically significant difference between the reading score (doc) and the writing score (viet).
If, based on experience, you believe that the differences between the reading score (doc) and the writing score (viet) are not ordinal and can only be classified as positive or negative, you can consider the sign test instead of the signed-rank test. We use the same variables as in the example above, but no longer assume that the differences are ordinal.
signtest doc = viet
Sign test

        sign |    observed    expected
-------------+------------------------
    positive |          88        92.5
    negative |          97        92.5
        zero |          15          15
-------------+------------------------
         all |         200         200

One-sided tests:
  Ho: median of doc - viet = 0 vs.
  Ha: median of doc - viet > 0
      Pr(#positive >= 88) =
         Binomial(n = 185, x >= 88, p = 0.5) =  0.7688

  Ho: median of doc - viet = 0 vs.
  Ha: median of doc - viet < 0
      Pr(#negative >= 97) =
         Binomial(n = 185, x >= 97, p = 0.5) =  0.2783

Two-sided test:
  Ho: median of doc - viet = 0 vs.
  Ha: median of doc - viet ~= 0
      Pr(#positive >= 97 or #negative >= 97) =
         min(1, 2*Binomial(n = 185, x >= 97, p = 0.5)) =  0.5565
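As the output states, the sign test drops the 15 zero differences and treats the remaining 185 signs as Binomial(185, 0.5). A small Python sketch of that calculation follows; the helper name is our own and the counts are taken from the table above.

```python
from math import comb

def upper_tail_half(n, k):
    """Pr(X >= k) for X ~ Binomial(n, 0.5), using exact integer sums."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

positive, negative, zero = 88, 97, 15   # counts from the signtest output
n = positive + negative                 # zeros are excluded: n = 185

p_positive = upper_tail_half(n, positive)                 # Ha: median > 0
p_negative = upper_tail_half(n, negative)                 # Ha: median < 0
p_two_sided = min(1.0, 2 * upper_tail_half(n, max(positive, negative)))

print(f"{p_positive:.4f} {p_negative:.4f} {p_two_sided:.4f}")
```

The two-sided p-value, about 0.56, indicates no significant difference between doc and viet under the sign test.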
McNemar's test
.8571429    .2379799    2.978588   (exact)
McNemar's chi-square statistic shows no statistically significant difference between the proportions of correct and incorrect answers to these two questions.
Friedman test
This test is used when there is one independent variable with 2 or more levels measured on the same subjects, and a dependent variable that is not normally distributed (but is at least ordinal). We use the Friedman test to determine whether the writing score (viet), the reading score (doc) and the math score (toan) differ. The null hypothesis in this test is that the distribution of ranks is the same for each type of score. To run the test we first need to download it by typing findit friedman and choosing to install it. In addition, we need to transpose the data with the xpose command, so that the observed subjects become columns and the variables become rows.
Friedman's chi-square value is 0.6175 with a p-value of 0.7344, which is not statistically significant. There is therefore no evidence that the distributions of the three types of score differ.
Correlation
Correlation is used when you want to know whether there is a linear relationship between 2 (or more) normally distributed quantitative variables. For example, we can look at the relationship between two continuous quantitative variables, the writing score (viet) and the reading score (doc).
corr doc viet
(obs=200)

             |      doc     viet
-------------+------------------
         doc |   1.0000
        viet |   0.5968   1.0000
In the next example we look at the correlation between a dichotomous variable, gender (gioitinh), and a continuous quantitative variable, the writing score (viet). Although correlation assumes that the variables are quantitative and normally distributed, we can use a dummy variable when examining the correlation.
corr gioitinh viet
(obs=200)

             | gioitinh     viet
-------------+------------------
    gioitinh |   1.0000
        viet |   0.2565   1.0000
In the first example, the correlation between doc and viet is 0.5968. By squaring this coefficient and multiplying by 100, we obtain the percentage of variability that the two variables share. In the example above, rounding 0.5968 to 0.6 and squaring gives 0.36, which multiplied by 100 is 36%. So the reading score shares about 36% of its variability with the writing score. In the second example, the squared coefficient is 0.06579225, meaning that gender (gioitinh) shares about 6.6% of its variability with the writing score (viet).
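The squaring step is simple enough to check directly. The Python lines below are a sketch with a function name of our own; they use the unrounded coefficients from the two corr outputs, so the first percentage comes out slightly below the 36% obtained after rounding to 0.6.

```python
def shared_variance_percent(r):
    """Percentage of variability shared by two variables with correlation r."""
    return 100 * r ** 2

r_doc_viet = 0.5968        # from "corr doc viet"
r_gioitinh_viet = 0.2565   # from "corr gioitinh viet"

pct_doc_viet = shared_variance_percent(r_doc_viet)
pct_gioitinh_viet = shared_variance_percent(r_gioitinh_viet)
print(f"{pct_doc_viet:.2f}% {pct_gioitinh_viet:.2f}%")
```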
We look at the relationship between the writing score (viet) and the reading score (doc); in other words, we want to predict the writing score from the reading score.
regress viet doc
------------------------------------------------------------------------------
        viet |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         doc |   .5517051   .0527178    10.47   0.000     .4477446    .6556656
       _cons |   23.95944   2.805744     8.54   0.000     18.42647    29.49242
------------------------------------------------------------------------------
We see that the relationship between viet and doc is positive (.5517051), and with t = 10.47 and p = 0.000 we conclude that it is statistically significant. We can therefore say that there is a statistically significant positive linear relationship between the reading score and the writing score.
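To connect the coefficient table to the idea of predicting viet from doc, the fitted line can be written out explicitly. The Python sketch below copies the coefficients from the regress output; the function name and the example reading score of 50 are our own illustrative choices.

```python
# Coefficients copied from the "regress viet doc" output above.
coef_doc = 0.5517051    # slope for doc
se_doc = 0.0527178      # standard error of the slope
intercept = 23.95944    # _cons

t_doc = coef_doc / se_doc   # t statistic reported in the table

def predict_viet(doc_score):
    """Predicted writing score for a given reading score."""
    return intercept + coef_doc * doc_score

example_prediction = predict_viet(50)   # hypothetical student with doc = 50
print(round(t_doc, 2), round(example_prediction, 2))
```

Each additional point on the reading score raises the predicted writing score by about 0.55 points.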
The results show a statistically significant association between doc and viet (rho = 0.6167, p = 0.000).
                                                  Number of obs   =        200
                                                  LR chi2(1)      =       0.56
                                                  Prob > chi2     =     0.4527
                                                  Pseudo R2       =     0.0020
------------------------------------------------------------------------------
    gioitinh |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         doc |  -.0104367   .0139177    -0.75   0.453    -.0377148    .0168415
       _cons |   .7260875   .7419612     0.98   0.328    -.7281297    2.180305
------------------------------------------------------------------------------
Multiple regression
Multiple regression is very similar to simple regression; the only difference is that in multiple regression the equation contains 2 or more predictor variables. For example, we predict the writing score from gender (gioitinh), the reading score (doc), the math score (toan), the science score (khoahoc) and the social studies score (khxh).
regress viet gioitinh doc toan khoahoc khxh
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  5,   194) =   58.60
       Model |  10756.9244     5  2151.38488           Prob > F      =  0.0000
    Residual |   7121.9506   194  36.7110855           R-squared     =  0.6017
-------------+------------------------------           Adj R-squared =  0.5914
       Total |   17878.875   199   89.843593           Root MSE      =   6.059

------------------------------------------------------------------------------
        viet |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    gioitinh |   5.492502   .8754227     6.27   0.000     3.765935     7.21907
         doc |   .1254123   .0649598     1.93   0.055    -.0027059    .2535304
        toan |   .2380748   .0671266     3.55   0.000     .1056832    .3704665
     khoahoc |   .2419382   .0606997     3.99   0.000     .1222221    .3616542
        khxh |   .2292644   .0528361     4.34   0.000     .1250575    .3334713
       _cons |   6.138759   2.808423     2.19   0.030      .599798    11.67772
------------------------------------------------------------------------------
The results show that the overall model is statistically significant (F = 58.60, p = 0.0000). Furthermore, all predictor variables are statistically significant except the reading score (doc).
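The summary statistics in the upper right of the regression output all derive from the sums of squares and degrees of freedom in the upper left. The Python sketch below is a cross-check with variable names of our own; it recomputes F, R-squared, adjusted R-squared and Root MSE from the Model and Residual rows.

```python
import math

# Sums of squares and degrees of freedom copied from the regress output above.
ss_model, df_model = 10756.9244, 5
ss_resid, df_resid = 7121.9506, 194

ss_total = ss_model + ss_resid
df_total = df_model + df_resid

ms_model = ss_model / df_model    # mean square, Model
ms_resid = ss_resid / df_resid    # mean square, Residual

f_stat = ms_model / ms_resid
r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_resid
root_mse = math.sqrt(ms_resid)

print(round(f_stat, 2), round(r_squared, 4), round(adj_r_squared, 4), round(root_mse, 3))
```

An R-squared of about 0.60 means the five predictors together account for roughly 60% of the variability in the writing score.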
                                                  Number of obs   =        200
                                                  LR chi2(2)      =      27.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -123.90902                       Pseudo R2       =     0.1009

------------------------------------------------------------------------------
    gioitinh | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         doc |   .9314488   .0182578    -3.62   0.000     .8963428    .9679298
        viet |   1.112231   .0246282     4.80   0.000     1.064993    1.161564
------------------------------------------------------------------------------
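The odds ratios in this table are exponentiated logistic coefficients, and the z statistics and confidence limits are computed on the coefficient (log-odds) scale. The Python sketch below illustrates that relationship; the function name is our own, and it assumes the standard error of the odds ratio was obtained by the delta method (odds ratio times the coefficient's standard error), which is consistent with the printed figures.

```python
import math
from statistics import NormalDist

def odds_ratio_summary(odds_ratio, se_or, level=0.95):
    """Return (z, lower, upper) for an odds ratio and its standard error."""
    coef = math.log(odds_ratio)       # coefficient on the log-odds scale
    se_coef = se_or / odds_ratio      # delta method, inverted (assumption)
    z = coef / se_coef
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    lower = math.exp(coef - crit * se_coef)
    upper = math.exp(coef + crit * se_coef)
    return z, lower, upper

z_doc, lo_doc, hi_doc = odds_ratio_summary(0.9314488, 0.0182578)
z_viet, lo_viet, hi_viet = odds_ratio_summary(1.112231, 0.0246282)
print(round(z_doc, 2), round(lo_doc, 4), round(hi_doc, 4))
print(round(z_viet, 2), round(lo_viet, 4), round(hi_viet, 4))
```

An odds ratio below 1 (doc) corresponds to a negative coefficient and a negative z, while an odds ratio above 1 (viet) corresponds to a positive one; neither confidence interval contains 1.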