
Research Methods Class, CUD Program, UPNT03

Lesson 4 (STATA)

USING STATA 10.0 FOR STATISTICAL ANALYSIS IN SCIENTIFIC RESEARCH
ANALYSIS OF VARIANCE (ONE-WAY ANOVA),
CORRELATION AND LINEAR REGRESSION
Dr. Tăng Kim Hồng, MD, PhD
1. Analysis of variance

- Open the PULSE file for analysis.

- The STATA menu sequence is as follows:
+ Statistics -> Linear models and related -> ANOVA/MANOVA -> One-way ANOVA ->
choose the variable to be tested (response variable) and the grouping variable (factor variable)
-> choose the type of test for comparing the groups (Multiple-comparison tests) -> Submit
(this assumes that the normality assumption for the response variable is satisfied)


. oneway weight activity, bonferroni scheffe tab


            |          Summary of weight
   activity |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |          66           0           1
          1 |   65.888889   12.830086           9
          2 |   63.360656   10.773475          61
          3 |   65.047619   9.2977212          21
------------+------------------------------------
      Total |   64.021739    10.53198          92

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      84.0496781      3   28.0165594      0.25     0.8638
 Within groups      10009.9068     88   113.748941
------------------------------------------------------------------------
    Total           10093.9565     91   110.922599

Bartlett's test for equal variances:  chi2(2) =   1.2830  Prob>chi2 = 0.527

note: Bartlett's test performed on cells with positive variance:
      1 single-observation cells not used


                    Comparison of weight by activity
                                (Bonferroni)
Row Mean-|
Col Mean |          0          1          2
---------+---------------------------------
       1 |   -.111111
         |      1.000
         |
       2 |   -2.63934   -2.52823
         |      1.000      1.000
         |
       3 |   -.952381    -.84127    1.68696
         |      1.000      1.000      1.000

                    Comparison of weight by activity
                                (Scheffe)
Row Mean-|
Col Mean |          0          1          2
---------+---------------------------------
       1 |   -.111111
         |      1.000
         |
       2 |   -2.63934   -2.52823
         |      0.996      0.931
         |
       3 |   -.952381    -.84127    1.68696
         |      1.000      0.998      0.942
The p-value of Bartlett's test (the test of equal variances) is 0.527, well above 0.05, so there is no
evidence that the variance of weight differs between the physical activity groups.
The p-value of the F test is 0.8638, so there is no evidence that mean weight differs between
the physical activity groups.
The multiple-comparison tests, Bonferroni and Scheffe, give similar results: no pairwise
difference is significant (all p > 0.9). This suggests that there is no statistically significant difference in mean
weight between groups with different levels of physical activity.
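The F statistic and its p-value in the ANOVA table can be reproduced by hand from the two mean squares, which may help when reading the output. The lines below are a minimal sketch using STATA's display command with the numbers copied from the table above; Ftail() is STATA's upper-tail F distribution function, and the results should agree with the F and Prob > F columns up to rounding.

```stata
* F = MS between groups / MS within groups
display 28.0165594 / 113.748941

* Upper-tail probability of F with (3, 88) degrees of freedom
display Ftail(3, 88, 28.0165594 / 113.748941)
```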
2. Correlation
- Open the PULSE file for analysis.
- Before computing a correlation coefficient, we should draw a scatter plot to examine visually
the relationship between the 2 variables.
- STATA command: scatter y x, where variable y forms the vertical axis and variable
x forms the horizontal axis. Suppose we examine the relationship between the 2 variables pulse1 and pulse2.
. twoway (scatter pulse2 pulse1)

[Figure: scatter plot of pulse2 (vertical axis, 40 to 120) against pulse1 (horizontal axis, 50 to 100)]

The scatter plot suggests a linear relationship between the 2 variables pulse1 and pulse2.
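To judge linearity more easily, a fitted line can be overlaid on the scatter plot. This is a minimal sketch, assuming the PULSE file is loaded; the lfit and lowess plot types are standard twoway options but were not used in the original handout.

```stata
* Scatter plot of pulse2 against pulse1 with a least-squares line overlaid
twoway (scatter pulse2 pulse1) (lfit pulse2 pulse1)

* Same plot with a lowess smoother, which does not assume a straight line
twoway (scatter pulse2 pulse1) (lowess pulse2 pulse1)
```

If the lowess curve stays close to the straight line, a linear description of the relationship is reasonable.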


As we know from the theory section: if the variables x and y are normally distributed, the
correlation between them is measured by the correlation coefficient r, also called the
Pearson correlation coefficient or the product-moment correlation coefficient.
If the 2 variables x and y are not normally distributed, the correlation is measured by the
Spearman correlation coefficient.
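The choice between the Pearson and Spearman coefficients therefore depends on whether the two variables look normally distributed. One possible way to check this in STATA is sketched below, assuming the PULSE file is loaded; the choice of the Shapiro-Wilk test and quantile plots is an illustration, not a procedure prescribed by the handout.

```stata
* Shapiro-Wilk normality test for each variable
swilk pulse1 pulse2

* Normal quantile plots; points close to the diagonal suggest approximate normality
qnorm pulse1
qnorm pulse2
```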
- To compute the Pearson correlation coefficient, the STATA menu sequence is as follows:
+ Statistics -> Summaries, tables, and tests -> Summary and descriptive statistics ->
Correlations and covariances -> choose the variables to be correlated -> Submit


- The result is as follows:
. correlate pulse1 pulse2
(obs=90)

             |   pulse1   pulse2
-------------+------------------
      pulse1 |   1.0000
      pulse2 |   0.5999   1.0000

Unlike many other software packages, STATA's correlate command does not report the p-value of the test
that the correlation coefficient differs from zero, so this p-value has to be obtained in another
way. In this case p < 0.001, so we can conclude that the 2 variables are truly
correlated, with a correlation coefficient of 0.60.
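Two possible ways of obtaining this p-value are sketched below, assuming the PULSE file is loaded. The first uses the pwcorr command with its sig option; the second computes the t statistic by hand from r = 0.5999 and n = 90 (numbers copied from the output above) and looks up the two-sided tail probability with ttail().

```stata
* Pairwise correlation with its p-value and the number of observations
pwcorr pulse1 pulse2, sig obs

* By hand: t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 = 88 degrees of freedom
display 0.5999 * sqrt(90 - 2) / sqrt(1 - 0.5999^2)

* Two-sided p-value from the t distribution
display 2 * ttail(88, 0.5999 * sqrt(90 - 2) / sqrt(1 - 0.5999^2))
```

The t statistic is about 7.0, which corresponds to a p-value far below 0.001.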
- To compute the Spearman correlation coefficient, the STATA menu sequence is as follows:
+ Statistics -> Summaries, tables, and tests -> Non-parametric tests of hypotheses ->
Spearman's rank correlation -> choose the variables to be correlated -> Submit


- The result is as follows:
. spearman pulse1 pulse2, stats(rho)
 Number of obs =      90
Spearman's rho =  0.6393

Test of Ho: pulse1 and pulse2 are independent
    Prob > |t| =  0.0000

In this case, STATA does report the p-value of the test that the correlation coefficient
differs from zero (Prob > |t| = 0.0000), so no separate calculation is needed. Here
p < 0.001, so we can conclude that the 2 variables are truly correlated,
with a correlation coefficient of 0.64.
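Spearman's rho can be understood as the Pearson correlation computed on the ranks of the data. The sketch below illustrates this, assuming the PULSE file is loaded; rank1 and rank2 are hypothetical variable names introduced only for this illustration, and the first line uses the stats() option to print the coefficient, the number of observations and the p-value together.

```stata
* Coefficient, number of observations and p-value in one display
spearman pulse1 pulse2, stats(rho obs p)

* Rank each variable, using only observations where both are present
egen rank1 = rank(pulse1) if !missing(pulse1, pulse2)
egen rank2 = rank(pulse2) if !missing(pulse1, pulse2)

* The Pearson correlation of the ranks reproduces Spearman's rho
correlate rank1 rank2
```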
3. Simple linear regression
3.1 The linear regression equation
- Before reporting the results of a linear regression, we must make sure that the data
satisfy the assumptions (see the theory section).
- Open the PULSE file for analysis.
- As when computing a correlation coefficient, we should first draw a
scatter plot of Y (vertical axis) against X (horizontal axis).
- Suppose we examine the regression of weight on height (weight
is the outcome variable and height is the explanatory variable).

. twoway (scatter weight height)

[Figure: scatter plot of weight (vertical axis, 40 to 100) against height (horizontal axis, 150 to 190)]

- To run the regression, the STATA menu sequence is as follows:
+ Statistics -> Linear models and related -> Linear regression -> choose the variables for the
regression -> Submit

- The result is as follows:
. regress weight height
      Source |       SS       df       MS              Number of obs =      92
-------------+------------------------------           F(  1,    90) =  142.15
       Model |  6180.73703     1  6180.73703           Prob > F      =  0.0000
    Residual |  3913.21949    90  43.4802166           R-squared     =  0.6123
-------------+------------------------------           Adj R-squared =  0.6080
       Total |  10093.9565    91  110.922599           Root MSE      =   6.594

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   .9015106     .075613    11.92   0.000     .7512921    1.051729
       _cons |  -90.59713    12.98666    -6.98   0.000    -116.3974   -64.79685
------------------------------------------------------------------------------

The upper part of the table gives the sums of squares (usually not used). We see
that the number of subjects is 92. R-squared = 0.6123 tells us
that 61% of the variation in weight is explained by its linear relationship with height.
The lower part of the table gives the estimates of the Y intercept α (a) (labelled
_cons in the output) and the slope β. The regression equation is:
weight = -90.6 + 0.9 × height.
For the test of H0: β = 0, the test statistic is t = 11.9 with p < 0.001, so we reject
the null hypothesis H0 that the slope equals 0 and conclude that there is a statistically significant linear
relationship between height and weight.
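The quantities discussed above can be recomputed from the output, which makes the link between the two halves of the table explicit. This is a minimal sketch, assuming the PULSE file is loaded; _b[] and _se[] are the coefficients and standard errors that STATA stores after regress, and the height of 170 is a hypothetical value chosen only for illustration.

```stata
* Run the regression so that the stored results are available
regress weight height

* R-squared = model SS / total SS (numbers copied from the table above)
display 6180.73703 / 10093.9565

* t statistic for the slope = coefficient / standard error
display _b[height] / _se[height]

* Predicted weight for a height of 170 (hypothetical value)
display _b[_cons] + _b[height] * 170
```

With the coefficients shown in the table, the last line gives a predicted weight of about 62.7.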
3.2 Predicting results after running a regression
After we run the regress command, STATA stores the estimated coefficients and their
standard errors in variables called system variables. These system variables
remain in memory until we run another regression command. They are
used by the predict command to compute:
- predicted values,
- residuals,
- the standard error of the estimated mean of Y,
- the standard error of an individual predicted value of Y.
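Each quantity in the list above corresponds to an option of predict. The sketch below assumes the PULSE file is loaded; resid, se_mean and se_ind are hypothetical variable names introduced for this illustration, and the residual plot at the end is an optional extra check not described in the handout.

```stata
regress weight height

* Residuals: observed minus predicted weight
predict resid, residuals

* Standard error of the estimated mean of Y at each height
predict se_mean, stdp

* Standard error of an individual predicted value of Y (forecast)
predict se_ind, stdf

* Residuals against fitted values, a common check of the regression assumptions
rvfplot, yline(0)
```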
For example, to create a new variable called pweight containing the predicted values of weight
based on height in the data set, we use the following command:
predict pweight
* Note: the predict command does not display any results; to see the result of creating
pweight, we use the sum command:
. sum weight pweight

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      weight |        92    64.02174    10.53198         42         95
     pweight |        92    64.02174     8.24137   46.43248   77.98535

4. Multiple linear regression

Exercise: Suppose we want to examine how the weight of malnourished children varies with
height and age. The outcome variable is y = wgt and the explanatory variables
are x1 = hgt and x2 = age. A random sample of 12 children is drawn from the sick children of ward A.
Their weight (wgt), height (hgt) and age (age) are as follows:
Weight (wgt)   Height (hgt)   Age (age)
64             57             8
71             59             10
53             49             6
67             62             11
55             51             8
58             50             7
77             55             10
57             48             9
56             42             10
51             42             6
76             61             12
68             57             9
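One way to enter these 12 observations without the Data Editor is the input command. The sketch below uses the variable names wgt, hgt and age from the exercise and the values from the table above; clear discards any data currently in memory, so save other work first.

```stata
clear
input wgt hgt age
64 57 8
71 59 10
53 49 6
67 62 11
55 51 8
58 50 7
77 55 10
57 48 9
56 42 10
51 42 6
76 61 12
68 57 9
end

* Check that all 12 observations were read correctly
list
summarize wgt hgt age
```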
- STATA command:
+ Statistics -> Multivariate analysis -> MANOVA, multivariate regression and related
-> Multivariate regression -> enter the variables for the regression -> Submit
or type the command directly:
regress wgt hgt age


- The result is as follows:
. regress wgt hgt age
      Source |       SS       df       MS              Number of obs =      12
-------------+------------------------------           F(  2,     9) =   15.95
       Model |  692.822607     2  346.411303           Prob > F      =  0.0011
    Residual |  195.427393     9  21.7141548           R-squared     =  0.7800
-------------+------------------------------           Adj R-squared =  0.7311
       Total |      888.25    11       80.75           Root MSE      =  4.6598

------------------------------------------------------------------------------
         wgt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hgt |    .722038    .2608051     2.77   0.022     .1320559     1.31202
         age |   2.050126    .9372256     2.19   0.056    -.0700253    4.170278
       _cons |   6.553048    10.94483     0.60   0.564    -18.20587    31.31197
------------------------------------------------------------------------------

The output shows F = 346.411/21.714 = 15.95 with (2, 9) degrees of freedom. The p-value of the F test,
0.0011, indicates that we should reject H0 and conclude that a significant proportion of the variation in
weight is explained by height and age.
R-squared = 0.78 tells us that height and age together explain 78% of the variation
in weight through their linear relationship with it.
Adjusted R-squared is the proportion of the variation in y explained by the regression equation after
adjusting for the number of explanatory variables. Here adjusted R-squared = 0.73, so after adjusting
R-squared for the 2 variables entered in the model, we can explain about 73% of the variation in y.
The regression equation has a = 6.55, b1 = 0.72, b2 = 2.05:
wgt = 6.55 + 0.72 × hgt + 2.05 × age.
This means that if height = 0 and age = 0, weight is 6.55 pounds. For each
1 cm increase in height, weight increases by 0.72 pounds (with age held constant), and for each
1-year increase in age, weight increases by 2.05 pounds (with height held constant).
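The summary statistics quoted above can be checked by hand, and the stored coefficients can be used for prediction. This is a minimal sketch, assuming the 12 observations of wgt, hgt and age are in memory; the adjusted R-squared formula is the standard one with n = 12 observations and k = 2 explanatory variables, and the values hgt = 55 and age = 9 are hypothetical.

```stata
* F = model MS / residual MS (numbers copied from the table above)
display 346.411303 / 21.7141548

* Adjusted R-squared = 1 - (1 - R2) * (n - 1) / (n - k - 1)
display 1 - (1 - 0.7800) * (12 - 1) / (12 - 2 - 1)

* Fit the model, then predict weight for hgt = 55 and age = 9 (hypothetical values)
regress wgt hgt age
display _b[_cons] + _b[hgt] * 55 + _b[age] * 9
```

With the coefficients shown in the table, the prediction is about 64.7.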


PRACTICE EXERCISES
Exercise 1
Use the file lowbwt to answer the following questions:
a) Create a two-way scatter plot of the variables sbp (systolic blood pressure) and gestage
(gestational age).
b) Does the plot suggest anything about the relationship between these variables?
c) Taking sbp as the outcome variable, write the regression equation describing the relationship
between sbp and gestational age. Interpret the slope and the y-intercept of the line.
d) At the 0.05 significance level, test the hypothesis H0: β = 0.
e) Predict the systolic blood pressure at a gestational age of 31 weeks.
Exercise 2
1) Use STATA to enter the following data, where:
Y = Mean arterial blood pressure (mm Hg)
X1 = Age (years)
X2 = Weight (kg)
X3 = Body surface area (m2)
X4 = Duration of hypertension (years)
X5 = Pulse (beats/minute)
X6 = Stress level score

2) Which variables are significantly associated with mean arterial blood pressure?
