
MULTIPLE LINEAR REGRESSION

Lê Tấn Phùng*

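Concept
Multiple linear regression (MLR) works like simple linear regression, except that instead
of a single independent variable it has two or more. Each independent variable has its own
slope coefficient. The multiple linear regression equation can be written as:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_n X_{in} + \varepsilon_i$$
where $\beta_0$ is the intercept, $\beta_1, \dots, \beta_n$ are the slopes of the $n$
independent variables, and $\varepsilon_i$ is the error term for observation $i$.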

Steps in an MLR analysis

Step 1:
Estimate the regression equation (the model).

Step 2:
Evaluate the model. This includes:
1. Statistical significance of the model (ANOVA F test)
2. Significance of each independent variable (Xi)
3. Checking the model assumptions, also called checking model validity. This includes:
a. Residual analysis: the residuals are independent, have constant variance
(homoscedasticity), and have mean zero.
b. Normality of the residuals
c. Multicollinearity
4. Interpreting Adj-R2
5. Simplifying the model (parsimony)

Step 3:
Prediction, which includes:
- Confidence intervals for the model coefficients.
- Hypothesis tests on each independent variable.
- Gauging the effect of each independent variable xi on y: the standardized
coefficient (standardized β).
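The steps above map onto a short sequence of Stata commands. The following is a minimal
sketch, assuming a dataset is already in memory with a hypothetical outcome y and
hypothetical predictors x1, x2, and x3; these names are placeholders and do not come from
the examples in this document.

```stata
* Hypothetical variable names: y (outcome), x1 x2 x3 (predictors).

* Step 1: estimate the model. The output includes the ANOVA F test,
* Adj R-squared, and a t test and 95% confidence interval per coefficient.
regress y x1 x2 x3

* Step 2: check the assumptions.
predict double r, residuals   // store the residuals in a new variable r
kdensity r, normal            // residual density against a normal curve
rvfplot, yline(0)             // residuals versus fitted values
estat vif                     // variance inflation factors

* Step 3: standardized coefficients.
regress y x1 x2 x3, beta
```

The // comments are valid in a do-file; when typing the commands interactively, leave them
out. The estat vif command is the documented postestimation form of the vif command used
later in this text.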

* Physician, Master of Public Health

Examples of multiple regression using Stata

Multiple regression is illustrated with 2 examples using 2 different models:
- Multiple linear regression in which all independent variables are continuous
- Multiple regression in which the independent variables include both continuous and
categorical variables

Example 1: A survey of 25 students recorded the final course score after the students sat
3 tests at 3 different points in time. The variables are named final, exam1, exam2, and
exam3. The question is whether the final score is linearly related to the scores on the
3 tests.
First, check whether the variables are linearly related to one another by drawing a
scatterplot matrix of the 4 variables. In Stata, use the graph matrix command. The result,
shown in the figure below, indicates linear relationships among the variables.

[Figure: scatterplot matrix of final, exam1, exam2, and exam3]
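The scatterplot matrix described above can be requested as follows. This is a sketch that
assumes the 25-student dataset is loaded with the variable names used in the text; the
half option and the correlate command are additions for illustration and are not part of
the original analysis.

```stata
* Scatterplot matrix of the outcome and the three test scores.
* The half option draws only the lower triangle of the matrix.
graph matrix final exam1 exam2 exam3, half

* Pairwise correlations as a numeric companion to the plot.
correlate final exam1 exam2 exam3
```

Strong correlations among exam1, exam2, and exam3 in the correlate output would be an early
hint of the multicollinearity that is examined later with VIF.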

Step 1: Estimate the regression model. Running the regression in Stata gives the following
result:


. regress final exam1 exam2 exam3

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =  670.09
       Model |  13731.5148     3  4577.17161           Prob > F      =  0.0000
    Residual |  143.445179    21  6.83072279           R-squared     =  0.9897
-------------+------------------------------           Adj R-squared =  0.9882
       Total |    13874.96    24  578.123333           Root MSE      =  2.6136

------------------------------------------------------------------------------
       final |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exam1 |   .3559382   .1213889     2.93   0.008     .1034962    .6083802
       exam2 |   .5425188   .1008495     5.38   0.000     .3327908    .7522467
       exam3 |   1.167444   .1030141    11.33   0.000     .9532148    1.381674
       _cons |  -4.336102   3.764226    -1.15   0.262    -12.16424    3.492034
------------------------------------------------------------------------------
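Several figures in the output above can be reproduced from other numbers in the same
table, which is a useful way to see what each one means. The display commands below are a
sketch that uses only values copied from the output, so they run without the dataset.

```stata
* F statistic = Model MS / Residual MS
display 4577.17161 / 6.83072279

* p-value of that F statistic with (3, 21) degrees of freedom
display Ftail(3, 21, 670.09)

* Adjusted R-squared = 1 - (Residual SS / df) / (Total SS / df)
display 1 - (143.445179 / 21) / (13874.96 / 24)

* t statistic for exam1 = coefficient / standard error
display .3559382 / .1213889

* 95% confidence limits for exam1, using the t distribution with 21 df
display .3559382 - invttail(21, 0.025) * .1213889
display .3559382 + invttail(21, 0.025) * .1213889
```

Up to rounding, the results should agree with the F, Adj R-squared, t, and confidence
interval entries printed by regress. The p-value from Ftail is far below 0.0001, which is
why Stata prints Prob > F as 0.0000.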

Step 2: Evaluate the model, including:

1. Statistical significance of the model: the regression model above is statistically
significant, because the F test gives F = 670.09 with p < 0.0001 (the p-value is
reported as Prob > F = 0.0000 in the output above).
2. Significance of the independent variables, that is, of the coefficients attached to
them. In the regression output, the coefficients of exam1, exam2, and exam3 are 0.3559,
0.5425, and 1.1674 respectively, and all three are statistically significant
(p = 0.008, p < 0.001, and p < 0.001). They are interpreted as follows: when the exam1
score increases by 1 unit, the student's final score increases by 0.3559 points,
provided that the exam2 and exam3 scores are held constant. The other coefficients are
interpreted in the same way.
3. Checking the model assumptions, sometimes called regression diagnostics. This
includes checking that the dependent variable is linearly related to the independent
variables, and checking the residuals (normality, independence, constant variance).
o Checking linearity between the dependent variable and the independent variables:
the simplest way is to draw a scatterplot matrix. This is the figure shown above.
o Checking the residuals: to check that the residuals are normally distributed, plot
their density together with a normal density for comparison. In Stata, the
kdensity command with the normal option produces the graph shown below:

[Figure: kernel density estimate of the residuals with a normal density overlaid;
kernel = epanechnikov, bandwidth = .85]

To check independence and constant variance of the residuals, the simple approach is to
draw a scatterplot of the residuals against the fitted values of the dependent variable.
In this example we plot r (the residuals) against the fitted values of final. Stata's
rvfplot command draws this graph, as shown below:
rvfplot, yline(0)

[Figure: residuals versus fitted values of final, with a horizontal line at 0]

The scatterplot shows no particular shape or trend, and the values scatter around a mean
of zero (the red horizontal line). The graph suggests residuals that are independent, have
constant variance, and have mean zero.
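The residual checks described above can be sketched as a short do-file. It assumes the
25-student dataset is loaded; r follows the text's name for the residuals, while yhat is a
hypothetical name introduced here. The swilk and estat hettest commands are formal tests
added as a complement to the plots and were not part of the original analysis.

```stata
* Refit the model quietly so the postestimation commands have results to use.
quietly regress final exam1 exam2 exam3

* Residuals (r) and fitted values (yhat; hypothetical variable name).
predict double r, residuals
predict double yhat, xb

* Normality: density plot against a normal curve, then a Shapiro-Wilk test.
kdensity r, normal
swilk r

* Constant variance: the same picture as rvfplot, drawn by hand,
* followed by the Breusch-Pagan / Cook-Weisberg test.
scatter r yhat, yline(0)
estat hettest
```

With only 25 observations the formal tests have limited power, so they are best read
together with the plots rather than in place of them.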
4. Checking multicollinearity: Stata's vif command (Variance Inflation Factor) computes
the VIF values. The results show that all 3 variables have VIF > 5, so these independent
variables deserve a second look. As a rule of thumb, VIF < 5 is good, VIF between 5 and
10 is acceptable, and the model should be reconsidered when VIF > 10. Here all three
values fall in the 5-10 range.
. vif

    Variable |       VIF       1/VIF
-------------+----------------------
       exam1 |      7.81    0.128093
       exam2 |      5.59    0.178990
       exam3 |      5.16    0.193750
-------------+----------------------
    Mean VIF |      6.19
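The VIF of a predictor can be reproduced by regressing that predictor on the other
predictors: the 1/VIF column (the tolerance) is 1 minus the R-squared of that auxiliary
regression. The sketch below assumes the 25-student dataset is loaded; the first line
needs no data because it uses a value copied from the table above.

```stata
* From the table: VIF is the reciprocal of the 1/VIF column.
display 1 / 0.128093

* Auxiliary regression of exam1 on the other two predictors.
quietly regress exam1 exam2 exam3
display "R-squared   = " e(r2)
display "Tolerance   = " 1 - e(r2)
display "VIF (exam1) = " 1 / (1 - e(r2))
```

A tolerance of 0.128 for exam1 means that about 87% of its variation is already explained
by exam2 and exam3, which is why its coefficient has the widest relative standard error of
the three.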

Step 3:
- Confidence intervals for the coefficients: see the regression output above.
- Gauging the effect of each coefficient: to examine how strongly each independent
variable affects the dependent variable in multiple linear regression, the standardized
beta coefficient is used.
Stata's standardized coefficients for this example (using the beta option right after
the regress command) are as follows:

. regress final exam1 exam2 exam3, beta

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =  670.09
       Model |  13731.5148     3  4577.17161           Prob > F      =  0.0000
    Residual |  143.445179    21  6.83072279           R-squared     =  0.9897
-------------+------------------------------           Adj R-squared =  0.9882
       Total |    13874.96    24  578.123333           Root MSE      =  2.6136

------------------------------------------------------------------------------
       final |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
       exam1 |   .3559382   .1213889     2.93   0.008                 .1817819
       exam2 |   .5425188   .1008495     5.38   0.000                 .2821267
       exam3 |   1.167444   .1030141    11.33   0.000                 .5712626
       _cons |  -4.336102   3.764226    -1.15   0.262                        .
------------------------------------------------------------------------------

The standardized beta coefficients appear in the rightmost column, under the heading
Beta. Based on these coefficients, we can conclude:
o exam3 has a larger effect on the final score than exam2, and exam2 a larger effect
than exam1 (because 0.5713 > 0.2821 > 0.1818).
o When the exam1 score increases by 1 standard deviation (not 1 unit as above), the
final score increases by 0.1818 standard deviations. exam2 and exam3 are interpreted
in the same way.
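A standardized coefficient is the ordinary coefficient rescaled by the ratio of the
standard deviations of the predictor and the outcome. The sketch below reproduces the Beta
entry for exam1 by hand; it assumes the 25-student dataset is loaded, and the scalar names
sd_x and sd_y are hypothetical names chosen for this illustration.

```stata
* Fit the model so that _b[exam1] is available.
quietly regress final exam1 exam2 exam3

* Standard deviation of the predictor.
quietly summarize exam1
scalar sd_x = r(sd)

* Standard deviation of the outcome.
quietly summarize final
scalar sd_y = r(sd)

* Standardized beta = b * sd(x) / sd(y)
display "Standardized beta for exam1 = " _b[exam1] * sd_x / sd_y
```

The result should agree with the Beta column printed by regress with the beta option.
Because the Total MS in the ANOVA table is the sample variance of final, sd_y can also be
checked without the data as display sqrt(578.123333).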

Example 2: A multivariable analysis was carried out to examine patient satisfaction scores
(ranging from 26 to 130 points) in relation to age, distance from home to the hospital,
and marital status. The variables are named, in that order, scalescore, age, distance_r,
and marital. Distance from home to the hospital is divided into 4 levels: 5 km or less,
5-10 km, >10-20 km, and >20 km. Marital status has 4 categories: single, married,
divorced, widowed.
So in this example the independent variables comprise 1 continuous variable (age) and 2
categorical variables (distance to the hospital and marital status).
To run this regression, the categorical variables must first be converted into several
binary variables (dummy variables, also called indicator variables). For example, marital
has 4 distinct values, so we can create the variables marital1, marital2, marital3, and
marital4 to represent the 4 values of marital as follows:
Marital1: equals 1 if single, 0 otherwise
Marital2: equals 1 if married, 0 otherwise
Marital3: equals 1 if divorced, 0 otherwise
Marital4: equals 1 if widowed, 0 otherwise
Binary variables for distance to the hospital (distance_r) are created in the same way.
Thus, from 1 categorical variable with n distinct values we create n binary variables,
each taking only the value 0 or 1.
In the regression, one binary variable serves as the reference and is left out of the
model (the other binary variables are compared against this reference).
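The dummy-variable construction described above can be done by hand with tabulate. This is
a sketch that assumes the patient dataset is loaded and that marital and distance_r are
coded 1 to 4 in the order listed in the text; the stub dist is a hypothetical name chosen
for this illustration.

```stata
* Create marital1-marital4, one 0/1 variable per category of marital.
tabulate marital, generate(marital)

* Create dist1-dist4 in the same way for distance_r (dist is a hypothetical stub).
tabulate distance_r, generate(dist)

* Leave out marital1 and dist1 so that they act as the reference categories.
regress scalescore age dist2 dist3 dist4 marital2 marital3 marital4
```

Including all n dummies of one categorical variable together with the constant would make
them perfectly collinear, which is why one category per variable has to be left out.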

Returning to the example above, the regression analysis gives the following results:

Step 1: Specify the regression model
. xi: regress scalescore age i.distance_r i.marital
i.distance_r      _Idistance__1-4     (naturally coded; _Idistance__1 omitted)
i.marital         _Imarital_1-4       (naturally coded; _Imarital_1 omitted)

      Source |       SS       df       MS              Number of obs =     830
-------------+------------------------------           F(  7,   822) =   10.82
       Model |  10080.4753     7   1440.0679           Prob > F      =  0.0000
    Residual |  109371.619   822  133.055498           R-squared     =  0.0844
-------------+------------------------------           Adj R-squared =  0.0766
       Total |  119452.095   829  144.091791           Root MSE      =  11.535

------------------------------------------------------------------------------
  scalescore |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0474926   .0276755     1.72   0.087    -.0068304    .1018155
_Idistance~2 |  -1.441733   1.065138    -1.35   0.176    -3.532444    .6489789
_Idistance~3 |   3.085925   1.422831     2.17   0.030     .2931151    5.878735
_Idistance~4 |   6.846812   1.019919     6.71   0.000     4.844859    8.848764
 _Imarital_2 |     1.3647   1.383331     0.99   0.324    -1.350578    4.079978
 _Imarital_3 |   4.039166   4.593545     0.88   0.379    -4.977293    13.05562
 _Imarital_4 |   4.446256   2.285178     1.95   0.052     -.039215    8.931728
       _cons |   77.98277   1.558111    50.05   0.000     74.92442    81.04111
------------------------------------------------------------------------------

Note: the regression command above was run in Stata 10. From Stata 12 onward, the xi:
prefix is no longer needed before the regress command.
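Without the xi: prefix, the same model is written with factor-variable notation. The
sketch below assumes the patient dataset is loaded in a recent Stata version; the second
command, which changes the reference category of distance_r, is an illustration and was
not part of the original analysis.

```stata
* Same model as above, using factor-variable notation instead of xi:
regress scalescore age i.distance_r i.marital

* Optional: use level 4 of distance_r (>20 km) as the reference instead of level 1.
regress scalescore age ib4.distance_r i.marital
```

With factor variables the coefficient rows are labelled by variable and level instead of
by generated names such as _Idistance~2, and no extra variables are added to the dataset.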
Explanation of some notation:
In the example above, the variables _Idistance~2, _Idistance~3, and _Idistance~4 are
generated from the variable distance_r. _Idistance~1 does not appear because it serves as
the reference variable and is therefore not shown in the output table. The coefficients
on these variables express a direct comparison with the reference: _Idistance~2 is
compared with _Idistance~1, _Idistance~3 with _Idistance~1, and _Idistance~4 with
_Idistance~1. The same applies to _Imarital_2, _Imarital_3, and _Imarital_4.
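Because every coefficient is a comparison with the reference group, a comparison between
two non-reference groups has to be built from two coefficients. One way to do this is
lincom, sketched below under the assumption that the patient dataset is loaded; the full
variable names _Idistance__3 and _Idistance__4 are taken from the xi: header in the output
above, where the coefficient table shows them abbreviated.

```stata
* Refit the model so that lincom has estimation results to work with.
xi: regress scalescore age i.distance_r i.marital

* Difference between group 4 (>20 km) and group 3 (>10-20 km).
lincom _Idistance__4 - _Idistance__3
```

The point estimate is the difference of the two coefficients, 6.846812 - 3.085925, or about
3.76 points; lincom adds the standard error, t test, and confidence interval for that
difference.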
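Step 2:
- Significance of the model: from the output above, the F test gives F = 10.82 with
p < 0.0001 (the p-value is reported as Prob > F = 0.0000 in the output). The model is
therefore statistically significant (p < 0.01).
- Interpreting the coefficient of each independent variable: here we have coefficients
for age (age), distance to the hospital (distance_r), and marital status (marital).
The results are interpreted as follows:
o When the patient's age increases by 1 year, the satisfaction score increases by
0.047 points, other variables held constant; this coefficient is not statistically
significant at the 0.05 level (p = 0.087).
o For distance to the hospital (distance_r), only groups 3 and 4 differ from group 1
in a statistically significant way (p = 0.030 and p < 0.001). We can therefore say
that group 3 (>10-20 km) scores 3.09 points higher on satisfaction than group 1
(5 km or less), and group 4 (>20 km) scores 6.85 points higher than group 1.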

o For marital status, no group differs from group 1 (single) in a statistically
significant way (p > 0.05).
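The t tests above compare each category with the reference one at a time. Whether a
categorical variable matters as a whole can be examined with a joint test of all its
dummies. The sketch below uses testparm for this; it assumes the patient dataset is loaded
and is an addition for illustration, not part of the original analysis.

```stata
* Refit the model so that testparm has estimation results to work with.
xi: regress scalescore age i.distance_r i.marital

* Joint test that all marital dummies have zero coefficients.
testparm _Imarital_*

* Joint test that all distance dummies have zero coefficients.
testparm _Idistance__*
```

A joint test is also the natural basis for the parsimony step: a categorical variable is
kept or dropped as a whole, not one dummy at a time.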
- Interpreting Adj-R2: the output shows Adj-R2 = 0.0766, meaning that about 7.7% of the
variation in scalescore is explained by this model.
- Residual analysis:
o Normality of the residuals: confirmed by the figure below, produced with the
command kdensity r with the normal option (r denotes the residuals):

[Figure: kernel density estimate of the residuals with a normal density overlaid;
kernel = epanechnikov, bandwidth = 2.55]

o Constant variance and independence: confirmed by the figure below, produced with
the command rvfplot, yline(0):

[Figure: residuals versus fitted values of scalescore, with a horizontal line at 0]

- Checking multicollinearity: running the vif command after the regression, as in
Example 1, shows that no variable has VIF > 5. There is therefore no sign of
multicollinearity.

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
 _Imarital_4 |      2.05    0.487912
 _Imarital_2 |      1.76    0.568755
         age |      1.33    0.751877
_Idistance~2 |      1.18    0.849218
_Idistance~4 |      1.17    0.851353
_Idistance~3 |      1.14    0.879844
 _Imarital_3 |      1.10    0.908484
-------------+----------------------
    Mean VIF |      1.39

Step 3:
- Confidence intervals for each coefficient: see the regression output above.
- Gauging the effect of each independent variable: based on the standardized
coefficients, as in Example 1. Using the beta option after the regress command gives
the following standardized coefficients:

. xi: regress scalescore age i.distance_r i.marital, beta
i.distance_r      _Idistance__1-4     (naturally coded; _Idistance__1 omitted)
i.marital         _Imarital_1-4       (naturally coded; _Imarital_1 omitted)

      Source |       SS       df       MS              Number of obs =     830
-------------+------------------------------           F(  7,   822) =   10.82
       Model |  10080.4753     7   1440.0679           Prob > F      =  0.0000
    Residual |  109371.619   822  133.055498           R-squared     =  0.0844
-------------+------------------------------           Adj R-squared =  0.0766
       Total |  119452.095   829  144.091791           Root MSE      =  11.535

------------------------------------------------------------------------------
  scalescore |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         age |   .0474926   .0276755     1.72   0.087                 .0660505
_Idistance~2 |  -1.441733   1.065138    -1.35   0.176                -.0490217
_Idistance~3 |   3.085925   1.422831     2.17   0.030                   .07717
_Idistance~4 |   6.846812   1.019919     6.71   0.000                 .2428217
 _Imarital_2 |     1.3647   1.383331     0.99   0.324                 .0436584
 _Imarital_3 |   4.039166   4.593545     0.88   0.379                 .0307896
 _Imarital_4 |   4.446256   2.285178     1.95   0.052                 .0929658
       _cons |   77.98277   1.558111    50.05   0.000                        .
------------------------------------------------------------------------------

The results above show that living more than 20 km from the hospital has the largest
effect on the patient satisfaction score (standardized beta coefficient of 0.24).

MAIN REFERENCES

Lecture notes from the course Advanced Quantitative Research Methods, course code HLN706,
Queensland University of Technology, Australia.

Chen, X., Ender, P., Mitchell, M. and Wells, C. (2003). Regression with Stata, from
http://www.ats.ucla.edu/stat/stata/webbooks/reg/default.htm
