LOGISTIC REGRESSION
Lê Tấn Phùng*

BASIC INFORMATION
1. There are three main types of logistic regression model:
- Binary logistic regression, often called simply logistic regression. This is the model most commonly encountered and most widely used in research. Its dependent variable is binary, meaning it takes only 2 values: for example alive or dead, diseased or not diseased, success or failure, exposed or unexposed.
- Nominal logistic regression: used when the dependent variable is a nominal variable with more than 2 values. For example, if the dependent variable is occupation, the possible values might be farmer, worker, civil servant, retiree. These values usually carry no ranking or order.
- Ordinal logistic regression: used when the dependent variable has more than 2 values and those values are ranked or ordered. For example, awareness can be classified as high, medium, or low; treatment outcome can be classified as good, fair, or poor, and so on.
The latter two types are called polytomous logistic regression. This article covers only binary logistic regression, which is usually referred to simply as logistic regression.
2. Logistic regression:
- Is the classic regression model built for a binary dependent variable.
- Models the odds ratio (OR).
- Uses the logit as its link function (see the discussion of link functions in the article on Generalized Linear Models), written as:

$$\log\left(\frac{\text{odds}_{\text{cases}}}{\text{odds}_{\text{controls}}}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$

3. How to present logistic regression results
- Descriptive statistics: percentages and other counts relevant to the data
- Proportions across groups
- Do not report the beta coefficients; report only the odds ratios (OR). The odds ratio is e raised to the power beta: $OR = e^{\beta}$
- Crude odds ratios
- Adjusted odds ratios with 95% confidence intervals
- An interpretation of what the results mean
- A plot, if there is an interaction.

* Physician, Master of Public Health
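The link between a probability, its odds, the logit, and the rule OR = e^β listed above can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the coefficient 0.098 is borrowed from the read row of the SPSS output shown later in this article.

```python
import math

def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Log odds: the link function used by logistic regression."""
    return math.log(odds(p))

def inv_logit(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor: OR = e^beta."""
    return math.exp(beta)

# A probability of 0.5 means even odds, so its log odds are zero.
print(logit(0.5))
# Coefficient for read taken from the SPSS table later in the article.
print(round(odds_ratio(0.098), 3))
```

Working on the log-odds scale keeps the model linear in the predictors, while exponentiating a coefficient returns it to the odds-ratio scale that is reported to readers.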
WORKED EXAMPLE OF LOGISTIC REGRESSION
The example below illustrates logistic regression using two widely used statistical packages, SPSS and Stata. To make the results easy to follow and compare, a single example is analysed in both packages.
The dataset, named hsb2, was downloaded from the UCLA website at the following address:
http://www.ats.ucla.edu/stat/data/hsb2.sav
The dataset contains information on 200 students together with their academic scores. It includes the following variables:
id: student ID
female: student's sex, 1 if female, 0 if male
race: ethnicity, 1: Hispanic, 2: Asian, 3: African, 4: White
ses: socioeconomic status, from low through middle to high, coded 1, 2, 3 respectively
schtyp: school type, 1 for public school, 2 for private school
prog: type of program, 1: general, 2: academic, 3: vocational
read, write, math, science: scores in reading, writing, mathematics, and science respectively
socst: social studies score
We create a new variable named honcomp from the variable write: cases with write >= 60 receive honcomp = 1, and all other cases receive honcomp = 0. (honcomp is short for honors composition.) honcomp is therefore a binary variable. It serves as the dependent variable in the logistic regression analysis that follows.
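The recoding rule just described can be written out as a short Python sketch. The function name and the sample scores are hypothetical and serve only to show the cut-off at 60; they are not taken from hsb2.

```python
def make_honcomp(write_score):
    """Return 1 when write >= 60, otherwise 0 (the rule described above)."""
    return 1 if write_score >= 60 else 0

# Hypothetical write scores, for illustration only (not taken from hsb2).
write_scores = [52, 59, 60, 67, 44]
honcomp = [make_honcomp(w) for w in write_scores]
print(honcomp)  # [0, 0, 1, 1, 0]
```

Note that a score of exactly 60 falls in the honcomp = 1 group because the rule uses >= rather than >.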
The question is how writing quality (good or not good) relates to the student's reading score, science score, and socioeconomic status. In other words, we want the relationship between the dependent variable honcomp and 3 independent variables: read, science, and ses. Of these three, ses is categorical and the other two are continuous.
Analysis with SPSS
After opening the file hsb2.sav, create the new variable honcomp using the rule above.
Run the logistic regression via: Analyze, Regression, Binary Logistic.
Move honcomp into the Dependent box. Then move read, science, and ses, one after another, into the Covariates box.
Because ses is categorical, next choose Categorical in this window. A new window appears. In it, select ses in the Covariates box on the left and move it to the Categorical Covariates box on the right. Leave the defaults unchanged.
Choose Continue.
Next choose Options and tick CI for exp(B). Choose Continue, and finally choose OK.
The SPSS output gives the following results:
Case Processing Summary

Unweighted Cases(a)                        N    Percent
Selected Cases    Included in Analysis   200      100.0
                  Missing Cases            0         .0
                  Total                  200      100.0
Unselected Cases                           0         .0
Total                                    200      100.0
a. If weight is in effect, see classification table for the total number of cases.

The table above gives a preliminary summary of the dataset: 200 cases were included in the analysis (Included in Analysis), no cases had missing data (Missing Cases), and no cases were left unselected (Unselected Cases).

Dependent Variable Encoding

Original Value    Internal Value
.00               0
1.00              1

The table above shows how the dependent variable is coded in the source data (Original Value) and how SPSS codes it internally (Internal Value). The two codings are identical.

Categorical Variables Codings

                 Frequency    Parameter coding
                              (1)       (2)
ses   low           47       1.000      .000
      middle        95        .000     1.000
      high          58        .000      .000

The table above shows how the categorical variable ses is coded, together with the frequency of each value (Frequency).
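The parameter coding in this table is ordinary indicator (dummy) coding with the last category as the reference. A minimal Python sketch of that scheme is given below; the function name dummy_code is my own and is not part of SPSS.

```python
def dummy_code(value, categories, reference):
    """Indicator coding: one 0/1 column per non-reference category."""
    return [1 if value == c else 0 for c in categories if c != reference]

SES_LEVELS = [1, 2, 3]  # 1 = low, 2 = middle, 3 = high

# Reference = 3 (high), matching the SPSS table: columns are ses(1), ses(2).
for ses in SES_LEVELS:
    print(ses, dummy_code(ses, SES_LEVELS, reference=3))
```

The reference category is the one coded as all zeros, so every reported effect for ses is a comparison against that group. Passing reference=1 instead would reproduce a coding in which the low group is the baseline.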

Block 0: Beginning Block

Classification Table(a,b)

                                      Predicted
                                      honors composition    Percentage
Observed                              .00       1.00        Correct
Step 0  honors composition   .00      147          0          100.0
                             1.00      53          0             .0
        Overall Percentage                                     73.5
a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation

                      B      S.E.     Wald    df    Sig.    Exp(B)
Step 0  Constant   -1.020    .160   40.540     1    .000      .361

Variables not in the Equation

                               Score    df    Sig.
Step 0  Variables   read      47.906     1    .000
                    science   34.862     1    .000
                    ses       14.783     2    .001
                    ses(1)      .302     1    .582
                    ses(2)     8.666     1    .003
        Overall Statistics    58.644     4    .000

The three tables above describe Block 0, that is, the analysis when no independent variable has been entered into the model. We will not dwell on these tables because our interest is in the model containing all 3 independent variables. The tables that follow give the results for that full model, under the heading Block 1. The regression method used is Enter, which puts all 3 independent variables into the model at the same time. This distinguishes it from the backward, forward, stepwise, and block methods. This article covers only the Enter method.
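Although the Block 0 tables are not the focus here, their numbers follow directly from the observed counts of honcomp (147 zeros and 53 ones). The sketch below, which assumes nothing beyond those two counts, reproduces the constant, its Exp(B), and the overall percentage correct.

```python
import math

n_zero, n_one = 147, 53   # observed counts of honcomp = 0 and honcomp = 1
n = n_zero + n_one

# With no predictors, the constant is the log of the observed odds.
constant = math.log(n_one / n_zero)
exp_b = math.exp(constant)

# The empty model predicts the larger group for every case.
percent_correct = 100 * max(n_zero, n_one) / n

print(round(constant, 3), round(exp_b, 3), round(percent_correct, 1))
```

This is why the 73.5% in Block 0 is a useful baseline: any model with predictors should classify at least that well.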
Block 1: Method = Enter

Omnibus Tests of Model Coefficients

                 Chi-square    df    Sig.
Step 1  Step       65.588       4    .000
        Block      65.588       4    .000
        Model      65.588       4    .000

The table above tests the model coefficients as a whole. Step 1 is the first step in fitting the logistic model in SPSS. Because we use only the Enter method, there is just one step. With the block or stepwise methods, the output would show additional steps (step 2, step 3, and so on).
The Chi-square and Sig. columns give the chi-square test statistic and its p-value. The chi-square values are identical for Step, Block, and Model because we are using the Enter method rather than the stepwise or block methods. The p-value is below 0.001, so the model is statistically significant.
The df column gives the model's degrees of freedom: 4, one each for read and science and two for the ses dummy variables.
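The omnibus chi-square is a likelihood-ratio statistic: the drop in -2 log likelihood from the constant-only model to the fitted model. The sketch below recomputes it from the observed counts and from the -2 Log likelihood of 165.701 reported in the Model Summary table further down. The closed-form p-value is valid only for 4 degrees of freedom, which is the case here; a statistics library would normally be used instead.

```python
import math

n_zero, n_one = 147, 53
n = n_zero + n_one
p = n_one / n

# Log likelihood of the constant-only model (every case gets probability p).
ll_null = n_one * math.log(p) + n_zero * math.log(1 - p)

# -2 Log likelihood of the fitted model, from the Model Summary table.
neg2ll_model = 165.701

lr_chi2 = -2 * ll_null - neg2ll_model   # likelihood-ratio chi-square
df = 4                                  # read, science, two ses dummies

# For df = 4 the chi-square upper-tail probability has the closed form
# exp(-x/2) * (1 + x/2).
p_value = math.exp(-lr_chi2 / 2) * (1 + lr_chi2 / 2)

print(round(lr_chi2, 2), p_value < 0.001)
```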
Model Summary

Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       165.701(a)           .280                    .408
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

The table above summarises the model. The -2 Log likelihood column gives the model's -2 log likelihood (usually written -2LL). On its own this value is usually not very informative.
The Cox & Snell R Square and Nagelkerke R Square columns are pseudo-R² values. In logistic regression, pseudo-R² is not used the way R² is used in linear regression. These values can be used to compare different models fitted to the same dataset with the same dependent variable in order to see which is better: the better model has the larger pseudo-R². More on pseudo-R² can be found on the UCLA website:
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
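As a rough check, the pseudo-R² values can be recomputed from the log likelihoods using the standard Cox & Snell, Nagelkerke, and McFadden definitions. The sketch below assumes only the -2 Log likelihood from the Model Summary table and the observed counts of honcomp (53 ones, 147 zeros); the variable names are my own.

```python
import math

n = 200
ll_model = -165.701 / 2                  # from the -2 Log likelihood above
p = 53 / n                               # observed proportion with honcomp = 1
ll_null = 53 * math.log(p) + 147 * math.log(1 - p)   # constant-only model

# Cox & Snell: 1 - (L_null / L_model)^(2/n)
cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)

# Nagelkerke: Cox & Snell rescaled so that its maximum is 1
nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n))

# McFadden: the Pseudo R2 that Stata reports for this model
mcfadden = 1 - ll_model / ll_null

print(round(cox_snell, 3), round(nagelkerke, 3), round(mcfadden, 4))
```

The three measures differ because they rescale the same likelihood improvement in different ways, which is one reason they are better used for comparing models than for describing a single model.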

Classification Table(a)

                                      Predicted
                                      honors composition    Percentage
Observed                              .00       1.00        Correct
Step 1  honors composition   .00      132         15           89.8
                             1.00      26         27           50.9
        Overall Percentage                                     79.5
a. The cut value is .500

The table above cross-classifies the dependent variable honcomp. The Observed rows give the two observed values of the variable, 0 and 1. The Predicted columns give the value of honcomp predicted by the model. The table therefore shows how often the model's predictions agree with what was actually observed. In this example, for cases with honcomp equal to 0 the model predicts 132 correctly and 15 incorrectly, so 89.8% are predicted correctly (Percentage Correct column). The row for honcomp equal to 1 is read the same way.
Overall Percentage is the model's overall rate of correct prediction, 79.5% in this case. Compared with Block 0 above, the model predicts better (up from 73.5% to 79.5%).
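The percentages in the classification table are simple ratios of the four cell counts. A small Python sketch, using only the counts printed in the Step 1 table, makes the arithmetic explicit; the dictionary layout is my own choice.

```python
# Counts from the Step 1 classification table: (observed, predicted) -> n
table = {(0, 0): 132, (0, 1): 15, (1, 0): 26, (1, 1): 27}

# Percentage correct within each observed group.
correct_0 = 100 * table[(0, 0)] / (table[(0, 0)] + table[(0, 1)])
correct_1 = 100 * table[(1, 1)] / (table[(1, 0)] + table[(1, 1)])

# Overall percentage correct across all 200 cases.
overall = 100 * (table[(0, 0)] + table[(1, 1)]) / sum(table.values())

print(round(correct_0, 1), round(correct_1, 1), round(overall, 1))  # 89.8 50.9 79.5
```

The overall figure hides an imbalance: the model classifies the honcomp = 0 group far better than the honcomp = 1 group at the 0.5 cut value.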
Variables in the Equation

                                                                   95% C.I. for EXP(B)
                       B      S.E.     Wald    df   Sig.   Exp(B)   Lower     Upper
Step 1(a)  read       .098    .025   15.199     1   .000    1.103   1.050     1.158
           science    .066    .027    5.867     1   .015    1.068   1.013     1.127
           ses                         6.690     2   .035
           ses(1)     .058    .532     .012     1   .913    1.060    .373     3.010
           ses(2)   -1.013    .444    5.212     1   .022     .363    .152      .867
           Constant -9.561   1.662   33.112     1   .000     .000
a. Variable(s) entered on step 1: read, science, ses.

The table above carries most of the information about how each variable enters the model.
The B column gives the values in the logistic regression equation, in other words the coefficient for each independent variable. These values are on the log-odds scale, following the equation below:

$$\log\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$$

Substituting the values from the table gives the logistic equation for this example:

$$\log\left(\frac{p}{1-p}\right) = -9.561 + 0.098 \times \text{read} + 0.066 \times \text{science} + 0.058 \times \text{ses(1)} - 1.013 \times \text{ses(2)}$$

This equation describes the relationship between the dependent variable and the independent variables. The dependent variable is expressed on the logit scale. Each coefficient tells us how much the log odds of the dependent variable rise (or fall) when the corresponding independent variable rises (or falls) by 1 unit while the other independent variables are held constant. Because these coefficients are in log-odds units, they are converted to odds ratios (OR) for easier interpretation; these appear in the Exp(B) column.
The S.E., Wald, df, and Sig. columns give the standard error of the coefficient, the Wald test statistic, the degrees of freedom, and the p-value, respectively.
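To show how the fitted equation is used, the sketch below converts the log odds into a predicted probability of honcomp = 1. The coefficients are the rounded B values from the table above; the function name and the example student (read = 60, science = 60) are hypothetical.

```python
import math

# Rounded coefficients from the B column of the SPSS table.
COEF = {"const": -9.561, "read": 0.098, "science": 0.066,
        "ses1": 0.058, "ses2": -1.013}

def predicted_probability(read, science, ses):
    """Predicted P(honcomp = 1); ses is 1 (low), 2 (middle), or 3 (high)."""
    ses1 = 1 if ses == 1 else 0   # dummy for low vs high
    ses2 = 1 if ses == 2 else 0   # dummy for middle vs high
    log_odds = (COEF["const"] + COEF["read"] * read
                + COEF["science"] * science
                + COEF["ses1"] * ses1 + COEF["ses2"] * ses2)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical student scoring 60 in both subjects, at two ses levels.
p_high = predicted_probability(60, 60, ses=3)
p_middle = predicted_probability(60, 60, ses=2)
print(round(p_high, 3), round(p_middle, 3))
```

Because the coefficients are rounded to three decimals, predictions from this sketch will differ slightly from those SPSS would save for the same inputs.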
Using the ORs to interpret the results of this example:
read: The odds of honcomp = 1 are multiplied by 1.103 for each 1-unit increase in the read score, holding the science score and socioeconomic status (ses) constant.
science: The odds of honcomp = 1 are multiplied by 1.068 for each 1-unit increase in the science score, holding the read score and socioeconomic status (ses) constant.
ses: The table shows that the p-value for ses is statistically significant. However, because ses is a categorical variable, the interpretation is slightly different. Note that here ses has been converted into dummy variables with value 3 (high) as the reference category. Which value serves as the reference is set under Reference category (Last or First) after choosing Categorical... in the binary logistic dialog for this variable.
There is no OR for ses itself, because ses as such is not entered into the model; it is replaced by the dummy variables. The results show that only ses(2) differs significantly from the reference group 3 (high). We can therefore say that the odds of honcomp = 1 for students of middle socioeconomic status are 0.363 times those of students of high socioeconomic status, that is, lower by 1 - 0.363 = 0.637 (about 64%), holding the read and science scores constant. The value 0.867 is the upper limit of the 95% confidence interval, not the odds ratio.
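The Exp(B) column and its confidence limits can be recovered from B and S.E. with the usual Wald interval, exp(B ± 1.96 × S.E.). The sketch below applies this to the rounded values in the SPSS table, so the last digit may differ from the printed output; the function name is my own.

```python
import math

def or_with_ci(b, se, z=1.96):
    """Odds ratio and approximate 95% Wald confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# (B, S.E.) pairs copied from the Variables in the Equation table.
rows = {
    "read": (0.098, 0.025),
    "science": (0.066, 0.027),
    "ses(1)": (0.058, 0.532),
    "ses(2)": (-1.013, 0.444),
}

results = {name: or_with_ci(b, se) for name, (b, se) in rows.items()}
for name, (or_value, lower, upper) in results.items():
    change = 100 * (or_value - 1)   # percent change in the odds
    print(f"{name}: OR={or_value:.3f} CI=({lower:.3f}, {upper:.3f}) {change:+.1f}%")
```

An interval that excludes 1, as for read, science, and ses(2), corresponds to a coefficient that is significant at the 5% level; the interval for ses(1) contains 1.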
Analysis with Stata
The detailed interpretation is the same as in the SPSS section. This section shows the Stata commands used and the Stata output.
. xi: logistic honcomp read science i.ses
i.ses _Ises_1-3 (naturally coded; _Ises_1 omitted)

Logistic regression Number of obs = 200
LR chi2(4) = 65.59
Prob > chi2 = 0.0000
Log likelihood = -82.850368 Pseudo R2 = 0.2836

------------------------------------------------------------------------------
honcomp | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | 1.102714 .0276551 3.90 0.000 1.049822 1.158271
science | 1.068141 .0290699 2.42 0.015 1.012658 1.126664
_Ises_2 | .3426752 .1800591 -2.04 0.042 .1223538 .9597268
_Ises_3 | .943259 .5022617 -0.11 0.913 .3321907 2.678393
------------------------------------------------------------------------------

With Stata version 11 or later, the xi: prefix in front of the logistic command is not needed. The output above was produced with Stata version 10.
Note that in this Stata output the reference category for ses is 1 (low) rather than 3 as in SPSS. This can be changed by adding the ib3. prefix to the variable name ses, as follows (run in Stata version 12, to illustrate the difference between the two versions):
. logistic honcomp read science ib3.ses

Logistic regression Number of obs = 200
LR chi2(4) = 65.59
Prob > chi2 = 0.0000
Log likelihood = -82.850368 Pseudo R2 = 0.2836

------------------------------------------------------------------------------
honcomp | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | 1.102714 .0276552 3.90 0.000 1.049822 1.158272
science | 1.068141 .0290702 2.42 0.015 1.012657 1.126664
|
ses |
1 | 1.060154 .5645084 0.11 0.913 .3733562 3.010335
2 | .3632885 .1611263 -2.28 0.022 .152309 .8665186
|
_cons | .0000704 .000117 -5.75 0.000 2.71e-06 .0018278
------------------------------------------------------------------------------
The results are identical to the SPSS analysis above when ses = 3 is used as the reference category.
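The two Stata outputs describe the same fitted model; only the baseline for ses differs. Odds ratios against one reference can be converted to another reference by division, as the sketch below shows using the values printed in the ib3.ses output. The function name rebase is my own.

```python
# Odds ratios relative to ses = 3 (high), from the ib3.ses output above.
or_vs_high = {1: 1.060154, 2: 0.3632885, 3: 1.0}

def rebase(odds_ratios, new_reference):
    """Re-express odds ratios relative to a different reference category."""
    base = odds_ratios[new_reference]
    return {level: value / base for level, value in odds_ratios.items()}

# Switch the baseline to ses = 1 (low), as in the first Stata run.
or_vs_low = rebase(or_vs_high, new_reference=1)
print({level: round(value, 4) for level, value in or_vs_low.items()})
```

The rebased values for levels 2 and 3 agree with the _Ises_2 and _Ises_3 odds ratios in the first output. Standard errors and p-values cannot be converted this way, which is why refitting with the desired reference category is still the usual approach.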

Main references:
Lecture notes from the course Advanced Quantitative Research Methods, unit code HLN706, Queensland University of Technology, Australia.
http://www.ats.ucla.edu/stat/spss/output/logistic.htm
