
Using Stata to Analyze Data from the Household Living Standards Survey (VLSS) *

Contents

Chapter I: General introduction to Stata
  1. How data are organized and stored in Stata (Dataset in Stata)
  2. Starting and exiting Stata (Open and exit)
  3. The Stata 7 interface (Stata interface)
  4. The work log (Log file)
  5. Entering and saving data (Use, input and save)

Chapter II: Working with data
  1. Stata command syntax
  2. Operators and functions
  3. Data reporting
  4. Data manipulation
  5. Weights in the VHLSS (Weight)

Chapter III: Hypothesis testing and regression analysis
  1. Estimation and hypothesis testing
  2. Correlation and regression

Chapter IV: Graphs
  1. Drawing graphs (Graph)
  2. Some commonly used graph types
  3. Saving and displaying graphs (Saving and graph using)

Chapter V: Programming in Stata
  1. General introduction to do-files
  2. Local and global macros
  3. Scalars and matrices (Scalar and matrix)
  4. Conditional statements and loops
  5. Introduction to ado files

References

Appendix

Chapter I: General introduction to Stata

1. How data are organized and stored in Stata (Dataset in Stata)

Stata is statistical software for managing data, analyzing data, and drawing graphs. Stata stores information about the characteristics of the objects under study. Data stored in Stata can be displayed as a table, as in the following example:
hhcode   headname       hhsize   incomepc
101      Nguyen Van A   6        2100
102      Le Thi B       5        3210
103      Tran Van C     10       1200

Observation (record)
Each row of the data table is called an observation, or a record, and stores the data for one object under study. The example above has 3 observations, which store the household code (hhcode), the name of the household head (headname), the household size (hhsize), and the per capita income (incomepc) of 3 households.
Variable (field; attribute)
Information about the objects under study is collected and stored according to their characteristics. These characteristics are called variables, or fields. Variables are the columns of the data table. The example above has 4 variables, named hhcode, headname, hhsize, and incomepc. A variable name is 1 to 32 characters long and begins with a letter or an underscore (_). Variable names may contain only letters, digits, and underscores. Other special characters cannot be used in variable names.
Identifying variables
Usually some of the variables are used to identify observations; these are called identifying variables. They make it possible to tell the observations apart. Each observation has its own value of these variables. In the example above, the identifying variable is hhcode: it takes one value for each observation.
Characteristics of variables
Variables can be given labels (annotations). For example, the variable hhcode can be labeled "Household code".
A variable can be formatted as a numeric variable or a string variable, with different storage types. Numeric variables can be stored as byte, int, long, float, or double. String variables can be stored as str1 through str80, depending on their length.
Numeric storage type   Size (bytes)   Minimum          Maximum         Kind
byte                   1              -127             126             Integer
int                    2              -32,767          32,766          Integer
long                   4              -2,147,483,647   2,147,483,646   Integer
float                  4              -10^36           10^36           Real
double                 8              -10^308          10^308          Real
Numeric variables can be discrete or continuous. Variables such as household size, sex of the household head, geographic region, and education level are discrete variables (also called categorical variables). These variables can be stored as byte, int, or long. Continuous variables such as household income and expenditure are stored as float or double.
String variables store text. For example, headname is a string variable that stores the name of the household head.
String storage type   Bytes   Maximum length
str1                  1       1
str2                  2       2
...
str80                 80      80
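The storage types above can be tried on a small dataset. The following is a minimal sketch, not part of the original material: it re-creates the three-household example table with explicit storage types, and the label text is an assumption chosen for illustration.

```stata
* Toy data mirroring the example table (values taken from the text above)
clear
input long hhcode str15 headname byte hhsize float incomepc
101 "Nguyen Van A" 6 2100
102 "Le Thi B" 5 3210
103 "Tran Van C" 10 1200
end
* Attach a label to the identifying variable
label variable hhcode "Household code"
* describe lists the storage type of every variable
describe
* confirm stops with an error if a variable does not have the stated type
confirm long variable hhcode
confirm string variable headname
confirm byte variable hhsize
```

Declaring the type in input is optional; without it Stata chooses a storage type itself.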

2. Starting and exiting Stata (Open and exit)

Stata is started like other application programs, by clicking the icon of the file wstata.exe in Windows Explorer, or by choosing Start -> Program -> Stata. To leave the program, type exit in the Stata Command window or choose Exit from the File menu.
3. The Stata 7 interface (Stata interface) (1)
When Stata starts, its interface appears. It consists of the menu bar at the top, the tool bar below it, and the windows.

(1) Stata 8 has an interface similar to Stata 7. The biggest difference is that Stata 8 adds a Statistics item to the menu bar. This item lets users run a number of statistical commands through dialog boxes instead of typing them in the Command window.

Stata windows
The Stata windows are opened by choosing items from the Windows menu on the menu bar. The windows are:
Results          Displays commands and their results
Graph            Displays graphs
Viewer           Displays help and the contents of text files
Command          Used to type commands
Review           Displays the commands that have been run
Variables        Displays the list of variables in the dataset
Data editor      Displays and edits data in table form
Do-file editor   Displays the program editing window
Menu bar

Clicking the menu bar and the items in it makes Stata carry out various commands. The menu bar contains the following groups of commands:

File
  Open                 Open a data file
  View                 View Stata files in the Viewer window
  Save                 Save the data file
  Save as              Save the data file under a new name
  File name            Choose a file name to insert into the Command window
  Log                  Close, open, or review a log file
  Save graph           Save a graph file
  Print graph          Print a graph
  Print results        Print results
  Exit                 Exit Stata
Edit
  Copy text            Copy text
  Copy tables          Copy tables
  Paste                Paste
  Table copy options   Options for copying tables
  Graph copy options   Options for copying graphs (not available in Stata 7)
Prefs                  Options for colors, fonts, and sizes
Windows
  Results              Open the Results window
  Graph                Open the Graph window
  Log                  Open the log file window
  Viewer               Open the help window and view file contents
  Command              Open the Command window
  Review               Open the window of commands already run
  Variables            Open the window listing the variables in the dataset
  Help/Search          Open the help window
  Data editor          Open the window showing the data in table form
  Do-file editor       Open the program writing window
Help                   Help on using Stata

Tool bar

The buttons on the tool bar carry out commonly used Stata commands. Moving the pointer over a button displays a hint. The buttons are:
Open (use)                     Open a Stata data file
Save                           Save the data file to disk
Print results                  Print the contents of the Results window
Begin log                      Open, close, or view a log file
Start viewer                   Open the help window
Bring Dialog Window to front   Bring the dialog window to the front
Bring Result Window to front   Bring the Results window to the front
Bring Graph Window to front    Bring the Graph window to the front
Do-file editor                 Open the program editing window
Data editor                    Open the data editing window
Data browser                   Open the data viewing window
Clear -more- condition         Clear the more condition
Break                          Stop a running command or program

4. The work log (log file)

When working with Stata, users usually want to keep a record of the session, including the commands, messages, and results obtained. Stata records the session with the log using command.
Syntax:
log using (path\filename) [, append replace [ text | smcl ] ]
Options:
append    Append the session record to an existing file
replace   Overwrite an existing file with the session record
text      Create the log as plain text (file extension .log)
smcl      Create the log in SMCL format (file extension .smcl); this is the default
Example:
log using baitap1              Creates the file baitap1 in the current directory to record the session; the default extension is smcl
. log using baitap1
------------------------------------------------------------------------------
       log:  C:\baitap1.smcl
  log type:  smcl
 opened on:  17 Feb 2004, 15:32:03
log using baitap1, replace     Creates the file baitap1, overwriting the existing file baitap1
log using d:\baitap2, text     Creates the file baitap2 on drive D as plain text (extension .log)
log using d:\baitap2, append   Continues recording the session in the file baitap2 on drive D

Files with the smcl extension can be converted to text files with the translate command.
Example:
translate baitap1.smcl exercise1.log
log off
This command temporarily stops recording the session in the open log/smcl file.
log on
This command resumes recording the session in the open log file. It is used after a log has been opened with log using and suspended with log off.
log close
This command closes and saves the open log file.
Notes:
- Stata can also record only what the user types in the Command window, which later helps in writing programs based on the session record. Syntax:
cmdlog using (path\filename) [, append replace]
cmdlog {off | on | close}
- To view log/smcl files, choose File/Log/View from the menu bar (or type view (filename) in the Command window); they can also be opened with other text editors such as MS Word or Notepad.
5. Entering and saving data (Use, input and save)

Opening an existing dataset:
Syntax:
use (path\filename)
This command opens the Stata file, with extension .dta, given by the filename.
Example:
use ho1.dta                          Opens the file ho1.dta in the current directory
use "D:\VHLSS 2004\ho1.dta", clear   Opens the file ho1.dta in the folder VHLSS 2004 on drive D
A Stata dataset can also be opened with Open on the File menu or with the Open (use) button on the tool bar.
If the data file is large, we must set the amount of memory available to Stata with the command:
set memory #[k|m]
Example:
set mem 32m
set mem 32000k
Entering data
There are several ways to enter data from the keyboard into Stata's memory.
- Use the Stata editor window to enter data, or type edit in the Command window. Then enter the data in spreadsheet style in this window.
- Use the command: input [variable list + formats if needed]
Then use the keyboard to enter the values of the variables for each observation in turn. Values are separated by one space. End data entry with the end command.
Example:
. input hhcode str15 name income
        hhcode             name     income
  1. 101 "Nguyen Van A" 1200
  2. 102 "Nguyen Van B" 1350
  3. 103 "Tran Thi C" 2310
  4. end
Stata can import data from other database files. These data files must first be saved as text (for example with Excel), with one observation per line and values separated by commas or tabs. Then use the insheet command to read the data into Stata.
Syntax:
insheet [variable list] using (text filename) [, [no]names comma tab clear]
This command reads the observations of the text file into Stata's memory; the variable list gives the names of the variables to be created.
Options:
[no]names   Whether variable names are read from the first line of the text file
comma       States that the values in the text file are separated by commas
tab         States that the values in the text file are separated by tabs
clear       The data read in replace the data currently in Stata's memory
Example:
. insheet using c:\income.txt
(3 vars, 4 obs)
. insheet maho hoten thunhap using c:\income.txt
(note: variable names in file ignored)
(3 vars, 4 obs)
Saving data
Syntax:
save (path\filename) [, replace]
This command saves the data in Stata's memory to the file given by the filename. If the replace option is specified, the data overwrite the existing file of the same name.
Data can also be saved with Save and Save as on the menu bar, or with the Save button on the tool bar.
Note: see also the infile and outfile commands.
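The commands of this section can be combined into one round trip: enter data, save it, clear memory, and open it again. The sketch below uses made-up values and a temporary file name (ho_demo is a hypothetical name, created with tempfile so that no real path is touched).

```stata
* Hypothetical toy data typed in with input
clear
input hhcode income
101 1200
102 1350
103 2310
end
* tempfile reserves a scratch file name that Stata removes afterwards
tempfile ho_demo
save "`ho_demo'"
* Saving again under the same name requires the replace option
save "`ho_demo'", replace
* Empty the memory, then read the file back
clear
use "`ho_demo'", clear
count
```

With a real file the quoted macro would simply be replaced by a path such as the ones used in the examples above.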

Chapter II: Working with data

1. Stata command syntax

The basic structure of a Stata command is as follows:
[by variable list:] command [variable list] [expression] [condition] [range] [weight] [, options]
In Stata's Help, the command syntax is presented in English as follows:
[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]
Square brackets mark optional elements.
Notes:
- Stata commands are written in lowercase.
- Variable names are case sensitive. For example, within the same dataset, Ho_ten and ho_ten are 2 different variables.
- Optional elements are shown in square brackets [ ]. They may or may not appear in a command. Required arguments (variable names) are placed in angle brackets < >. A command cannot run if its required arguments are not given.
- Some Stata commands can be abbreviated. For example, summarize can be abbreviated to sum. In this document, the underlined part of a command's syntax is its abbreviation.
- The examples in this document use data from the 1998 Living Standards Survey conducted by the General Statistics Office. In particular, the aggregate expenditure file Hhexp98n.dta is used frequently.

by variable list (by varlist): Stata runs the command separately for each value of the variables in the variable list. The data must be sorted by these variables before the command is run.
Example:
. sort sex
. by sex: sum rlpcex1

-> sex = 1
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    4375    2980.906    2430.648    357.318   45801.71

-> sex = 2
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    1624    3748.368    3231.241   376.9805   30624.77

. sort sex urban98
. by sex urban98: sum rlpcex1

-> sex = 1, urban98 = Rural
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    3344    2308.134    1345.671    357.318   24386.43

-> sex = 1, urban98 = Urban
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    1031     5163.01    3602.245   682.9575   45801.71

-> sex = 2, urban98 = Rural
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |     925    2553.448    1776.178   376.9805   25527.95

-> sex = 2, urban98 = Urban
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |     699    5329.628    3962.946   1057.797   30624.77
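The sort step and the by prefix can also be written as one command with bysort. The sketch below assumes a Stata release that provides bysort and uses a made-up dataset; pcexp is a hypothetical variable standing in for rlpcex1.

```stata
* Hypothetical toy data, deliberately not sorted by sex
clear
input byte sex float pcexp
1 100
2 300
1 200
2 500
end
* bysort sorts by sex and then runs summarize once per group
bysort sex: summarize pcexp
* Without the sort, "by sex: summarize pcexp" would stop with a "not sorted" error
```

After bysort the data remain sorted by sex, so later by sex: commands run without another sort.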

Variable list (varlist)

Gives the list of variables affected by the command. If no variable is given, the Stata command acts on all variables.
Example:
. sum hhsize sex reg7
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292    1.954292          1         19
         sex |    5999    1.270712    .4443645          1          2
        reg7 |    5999     4.01917    2.145305          1          7
. sum
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
    househol |    5999    19617.86    11201.92        101      38820
        year |    5999    97.94666    .2247337         97         98
       month |    5999    6.340723    3.011082          1         12
--Break--
r(1);

This sum command displays basic statistics for all variables in the dataset.
Condition (if exp)
Stata runs the command only on the observations for which the expression is true.
Example:
. sum poor if reg7==1
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |     859    .4982538    .5002882          0          1

This command acts only on the observations for which reg7 equals 1.
Range (in range)
Gives the range of observations affected by the command. The range can take the following forms:
sum poor in 10       Computes the mean of poor for observation 10 (which is simply the value of poor at the 10th observation)
sum poor in 10/100   Computes the mean of poor for observations 10 through 100
sum poor in f/100    Computes the mean of poor for the first observation through observation 100
sum poor in 100/l    Computes the mean of poor for observation 100 through the last observation
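The four range forms can be checked on a dataset where the answer is obvious. The sketch below is an illustration on made-up data: x is a hypothetical variable equal to the observation number, so the statistics that summarize leaves in r() are easy to verify.

```stata
* 100 observations with x = 1, 2, ..., 100
clear
set obs 100
gen x = _n
* A single observation: the mean is the value itself
summarize x in 10
display r(mean)
* Observations 10 through 100
summarize x in 10/100
display r(N)
* f stands for the first observation, l for the last
summarize x in f/5
summarize x in 96/l
display r(min)
```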

Weight
Allows calculations that use weights. The weight option is discussed in detail in section 5 of this chapter.
Options
Many Stata commands have their own options. Options are given after a comma.
Example:
The sum command has the detail option, which computes additional statistics beyond the mean and the standard deviation.
. sum rlpcex1, detail
               comp.M&Reg price adj.pc tot exp
-------------------------------------------------------------
      Percentiles      Smallest
 1%     682.9575        357.318
 5%     1012.433       366.2792
10%     1238.088       376.9805       Obs                5999
25%     1671.054       381.3502       Sum of Wgt.        5999

50%     2397.042                      Mean           3188.667
                        Largest       Std. Dev.      2692.567
75%     3711.917       26944.64
90%     5940.803       30624.77       Variance        7249918
95%      8045.32        31066.5       Skewness       3.791027
99%     14163.04       45801.71       Kurtosis       29.21398

Notes:
- Stata allows commands and options to be abbreviated. In this document, an underlined part of a command means that the command can be abbreviated to the underlined characters. For example, use can be abbreviated to u.
- The syntax of commands in this document is written in English so that readers can compare it with the Help in Stata.

2. Operators and functions

Operators
Operators in Stata are written as follows:
Symbol   Meaning
Arithmetic
+        Addition
-        Subtraction
*        Multiplication
/        Division
^        Power
Relational
>        Greater than
<        Less than
>=       Greater than or equal to
<=       Less than or equal to
==       Equal to
~=       Not equal to
!=       Not equal to
Logical
~        Not
|        Or
&        And
Note:
In expressions, == is used to test equality, for example after if. A single = is used for assignment in commands that create variables.
Example:
gen RRD=0
replace RRD=1 if reg8==1
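The relational and logical operators can be combined in one expression. The sketch below runs the example above on made-up data; urban and the variables both, either, and notreg1 are hypothetical names introduced only for illustration.

```stata
* Hypothetical toy data
clear
input byte reg8 byte urban
1 1
1 0
2 1
3 0
end
* = assigns, == tests equality
gen RRD = 0
replace RRD = 1 if reg8 == 1
* A true expression evaluates to 1, a false one to 0
gen byte both    = (reg8 == 1 & urban == 1)
gen byte either  = (reg8 == 1 | urban == 1)
gen byte notreg1 = !(reg8 == 1)
list
```

Parentheses around each test are not required, but they make the order of evaluation explicit.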
Functions
Functions are used in the expressions (exp) of Stata commands. If Y is a function f(X1, X2, ..., Xn), a Stata function computes the value of Y from given values of Xi. Stata has 8 types of functions:
Mathematical functions
Statistical functions
Random numbers          Functions that return random numbers
String functions        Functions that work on character strings
Special functions
Date functions
Time-series functions
Matrix functions
Example:
gen absx=abs(x)
gen log_exp=log(rlpcex1)
The specific notation for these functions can be found under help functions.
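A few mathematical functions can be tried on values where the result is known. The sketch below uses made-up data; rootx and lnx are hypothetical variable names. In Stata, log() and ln() both return the natural logarithm.

```stata
* Hypothetical toy data: one negative value, zero, one positive value
clear
input x
-4
0
9
end
gen absx = abs(x)
* Square root of the absolute value
gen rootx = sqrt(abs(x))
* ln() returns missing when its argument is zero or negative
gen lnx = ln(x)
list
```

The missing values in lnx are a reminder to check the range of a variable before taking logarithms of it.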
3. Data reporting
3.1. Clearing Stata's memory
Syntax:
clear
This command removes the data from Stata's memory, ready to start work on a new file.
3.2. Help on Stata commands
Syntax:
help <Stata command>
This command displays the help for a Stata command; the command name must be typed in full and exactly.
Example:
. help sum
help for sum not found
try help contents or search sum
. help summarize
-------------------------------------------------------------------------------
help for summarize                                      (manual: [R] summarize)
-------------------------------------------------------------------------------
Summary statistics
...
Note:
Help can be searched by keyword with the search command. search can also be run from the Search item on the Help menu.
3.3. Describing the data
Syntax:
describe [variable list]
This command displays general information, such as variable name, format, and variable label, for the variables in the variable list of the open data file. If no variable is given, describe displays the information for all variables.
Example:
. des househol year month vlssmphs

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
househol        long   %12.0g                 household code
year            float  %9.0g                  Year of interview
month           float  %9.0g                  Month of interview
vlssmphs        byte   %8.0g       source     1 if vlss, 2 if mphs

3.4. Displaying the values of variables

Syntax:
list [variable list] [condition] [range] [, nolabel]
This command displays the values of the variables in the variable list. The nolabel option displays numeric values instead of value labels.
Example:
. list househol farm in 1/5

       househol       farm
  1.      36307       farm
  2.      28002       farm
  3.      36017       farm
  4.      32418   non farm
  5.      15215   non farm

. list househol farm in 1/5, nolabel

       househol       farm
  1.      36307          1
  2.      28002          1
  3.      36017          1
  4.      32418          0
  5.      15215          0

3.5. Displaying strings and expressions

Syntax:
display ["string"] [expression]
This command displays a string or the value of an expression.
Example:
. dis "So lieu VLSS 1998"
So lieu VLSS 1998

. dis 120*100/30
400
3.6. Editing and viewing data
Syntax:
edit   [variable list] [condition] [range] [, nolabel]
browse [variable list] [condition] [range] [, nolabel]
The edit command opens the Data editor window so that the user can edit and enter data. The nolabel option displays numeric values instead of value labels. The command can also be chosen from the Data editor item on the Windows menu.
The browse command is like edit but does not allow the data to be changed.
3.7. Counting observations
Syntax:
count [condition] [range]
This command counts the observations selected by the condition (exp) and the range (range). If neither is given, it displays the number of observations in the dataset.
Example:
. count
 5999
. count if reg7==1
  859
. count if reg7==1 & urban98==1
  187
. count if reg7==1 & urban98==0
  672
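In the example above the urban and rural counts for region 1 add up to the regional total (187 + 672 = 859). The same consistency check can be scripted with the result that count leaves in r(N). The sketch below uses made-up data, and the local macro names are hypothetical.

```stata
* Hypothetical toy data: three households in region 1, one in region 2
clear
input byte reg7 byte urban98
1 1
1 0
1 0
2 1
end
count if reg7==1
local total = r(N)
count if reg7==1 & urban98==1
local urban = r(N)
count if reg7==1 & urban98==0
local rural = r(N)
* The two subgroups partition region 1 as long as urban98 has no missing values
assert `total' == `urban' + `rural'
```

The local macros must be defined and used in the same do-file or the same interactive session.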

3.8. Basic statistics
Syntax:
summarize [variable list] [weight] [condition] [range] [, detail]
This command computes and displays basic statistics for the variables in the variable list. The detail option displays additional statistics such as kurtosis, skewness, and percentiles.
Example:
. sum rlpcex1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    5999    3188.667    2692.567    357.318   45801.71

. sum rlpcex1, detail

               comp.M&Reg price adj.pc tot exp
-------------------------------------------------------------
      Percentiles      Smallest
 1%     682.9575        357.318
 5%     1012.433       366.2792
10%     1238.088       376.9805       Obs                5999
25%     1671.054       381.3502       Sum of Wgt.        5999

50%     2397.042                      Mean           3188.667
                        Largest       Std. Dev.      2692.567
75%     3711.917       26944.64
90%     5940.803       30624.77       Variance        7249918
95%      8045.32        31066.5       Skewness       3.791027
99%     14163.04       45801.71       Kurtosis       29.21398

3.9. Displaying general information about variables

Syntax:
inspect [variable list] [condition] [range]
This command describes the data of numeric variables. It reports the number of negative, positive, integer, and missing values of each variable.
Example:
. gen x=invnorm(uniform())
. inspect x

x:                                          Number of Observations
---                                                              Non-
                                          Total    Integers   Integers
|      #                  Negative         2964                   2964
|      #                  Zero
|      #                  Positive         3035                   3035
|      #                                  -----       -----      -----
|   #  #  #               Total            5999                   5999
|.  #  #  #  .            Missing
+----------------------                   -----
-3.918931      3.641588                    5999
(More than 99 unique values)

Note: see also the codebook command.


3.10. Frequency tables
One-way frequency tables
Syntax:
tabulate <variable name> [weight] [condition] [range] [, missing nolabel]
tab1 <variable list> [weight] [condition] [range] [, missing nolabel]
This command creates a one-way frequency table of the given variable. Here tabulate accepts only 1 variable; if more than 1 variable is given, Stata creates a two-way frequency table.
Options:
missing   Treats observations with missing values as a category of their own
nolabel   Displays the numeric values of the variable instead of its value labels
Example:
. tab sex

  Gender of |
    HH.head |
  (1:M;2:F) |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |       4375       72.93       72.93
          2 |       1624       27.07      100.00
------------+-----------------------------------
      Total |       5999      100.00

. tab1 urban98 reg7

-> tabulation of urban98

1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

-> tabulation of reg7

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00
Two-way frequency tables

Syntax:
tabulate <variable 1> <variable 2> [weight] [condition] [range] [, chi2 missing nofreq cell column row]
tab2 <variable list> [weight] [condition] [range] [, chi2 missing nofreq cell column row]
The tabulate command computes and displays the two-way frequency table of the 2 given variables. The tab2 command creates a two-way frequency table for each pair of variables in the variable list.
Example:
. tab urban98 farm

   1:urban |  Type of HH (1:farm;
       98; |      0:nonfarm)
0:rural 98 |  non farm       farm |     Total
-----------+----------------------+----------
     Rural |      1021       3248 |      4269
     Urban |      1540        190 |      1730
-----------+----------------------+----------
     Total |      2561       3438 |      5999

Options:
chi2      Tests the hypothesis that the two variables are independent
missing   Treats observations with missing values as a category of their own
nofreq    Does not display frequencies
cell      Displays the relative frequency (%) of each cell
column    Displays the relative frequency (%) of each cell within its column
row       Displays the relative frequency (%) of each cell within its row

Example:
. tab reg7 urban98, cell nof

           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |     11.20       3.12 |     14.32
   region2 |     13.05       6.53 |     19.59
   region3 |     10.00       1.80 |     11.80
   region4 |      8.37       4.20 |     12.57
   region5 |      6.13       0.00 |      6.13
   region6 |      8.57       8.48 |     17.05
   region7 |     13.84       4.70 |     18.54
-----------+----------------------+----------
     Total |     71.16      28.84 |    100.00

. tab farm urban98, column row

Type of HH | 1:urban 98; 0:rural
  (1:farm; |          98
0:nonfarm) |     Rural      Urban |     Total
-----------+----------------------+----------
  non farm |      1021       1540 |      2561
           |     39.87      60.13 |    100.00
           |     23.92      89.02 |     42.69
-----------+----------------------+----------
      farm |      3248        190 |      3438
           |     94.47       5.53 |    100.00
           |     76.08      10.98 |     57.31
-----------+----------------------+----------
     Total |      4269       1730 |      5999
           |     71.16      28.84 |    100.00
           |    100.00     100.00 |    100.00
3.11. Summary tables with tabulate, summarize()

Syntax:
tabulate <variable 1> <variable 2> [weight] [condition] [range] , summarize(variable 3) [means standard freq missing]
This command creates a one-way or two-way table defined by variable 1 or variable 2; each cell shows the mean, standard deviation, and frequency of variable 3.
Example:
. tab farm urban98, sum(poor)

       Means, Standard Deviations and Frequencies of poor

Type of HH | 1:urban 98; 0:rural
  (1:farm; |          98
0:nonfarm) |     Rural      Urban |     Total
-----------+----------------------+----------
  non farm |  .2791381  .06168831 | .14837954
           | .44879538  .24066673 | .35554523
           |      1021       1540 |      2561
-----------+----------------------+----------
      farm | .42302956  .12105263 |  .4063409
           |  .4941161  .32705022 | .49122109
           |      3248        190 |      3438
-----------+----------------------+----------
     Total |  .3886156  .06820809 | .29621604
           | .48749275  .25217555 | .45662551
           |      4269       1730 |      5999

Options:
means      Displays only the mean in each cell
standard   Displays only the standard deviation in each cell
freq       Displays only the frequency in each cell
missing    Treats observations with missing values as a category of their own

Example:
. replace poor=poor*100
(1777 real changes made)
. format poor %4.2f
. tab reg7 urban98, sum(poor) means

                Means of poor

           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |     61.46       8.02 |     49.83
   region2 |     32.57       5.87 |     23.66
   region3 |     44.83      10.19 |     39.55
   region4 |     37.25      11.51 |     28.65
   region5 |     47.28          . |     47.28
   region6 |     12.45       2.16 |      7.33
   region7 |     35.78      10.28 |     29.32
-----------+----------------------+----------
     Total |     38.86       6.82 |     29.62

3.12. Summary tables with tabstat

Syntax:
tabstat <variable list> [weight] [condition] [range] [, statistics(statname [...]) by(variable name) missing format[(%fmt)]]
This command computes statistics of the variables in the variable list for each value of the categorical variable given in by(variable name).
Example:
. tabstat rlfood rlhhex1, stats(mean median) by(reg7)

Summary statistics: mean, p50
  by categories of: reg7 (Code by 7 regions)

   reg7 |    rlfood   rlhhex1
--------+--------------------
region1 |  5595.556  9560.349
        |  5350.916  8536.373
--------+--------------------
region2 |  6419.427  12951.14
        |  5664.145  9997.146
--------+--------------------
region3 |  5692.201  10885.38
        |  5369.411  9022.334
--------+--------------------
region4 |  6512.576  13525.41
        |  5790.046  11077.51
--------+--------------------
region5 |  5894.983  11217.05
        |  5380.505  9421.447
--------+--------------------
region6 |  9746.158  23515.01
        |  8428.743  18514.39
--------+--------------------
region7 |  6556.616  13068.11
        |  6066.128  11043.99
--------+--------------------
  Total |  6787.898  14010.74
        |  5951.567  10733.19
-----------------------------

Options:
statistics(statname [...])   Gives the statistics to compute for the variable list
by(variable name)            Gives the categorical variable
missing                      Treats missing values of the categorical variable as a category of their own
format[(%fmt)]               Gives the display format of the results

Stata allows the following statistics in statistics(statname [...]):
statname   Meaning
mean       Mean
count      Number of observations
n          Same as count (number of observations)
sum        Sum
max        Maximum
min        Minimum
range      Range = maximum - minimum
sd         Standard deviation
sdmean     Standard deviation of the mean = standard deviation / (number of observations)^0.5 (current Stata releases name this statistic semean)
skewness   Skewness of the distribution
kurtosis   Kurtosis
median     Median (same as p50)
p1         1st percentile
p5         5th percentile
p10        10th percentile
p25        25th percentile
p50        50th percentile (median)
p75        75th percentile
p90        90th percentile
p95        95th percentile
p99        99th percentile
iqr        Interquartile range = p75 - p25
q          Equivalent to "p25 p50 p75"

Example:
. tabstat rlpcex1, stats(mean sd q) by(reg7) format(%5.1f)

Summary for variables: rlpcex1
     by categories of: reg7 (Code by 7 regions)

   reg7 |      mean        sd       p25       p50       p75
--------+--------------------------------------------------
region1 |    2174.8    1265.1    1328.0    1792.1    2710.8
region2 |    3294.0    2511.9    1816.7    2532.5    3822.0
region3 |    2503.3    1918.0    1489.7    2001.2    2808.1
region4 |    2933.7    2260.5    1697.9    2362.2    3471.4
region5 |    2087.3    1285.4    1217.3    1850.8    2700.5
region6 |    5257.5    4005.7    2676.7    4154.1    6431.8
region7 |    2931.1    2137.2    1680.1    2321.9    3414.7
--------+--------------------------------------------------
  Total |    3188.7    2692.6    1671.1    2397.0    3711.9
-----------------------------------------------------------
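The formula for the standard deviation of the mean, standard deviation / (number of observations)^0.5, can be checked by hand against tabstat. The sketch below uses made-up data and assumes a current Stata release, where the statistic is requested as semean.

```stata
* Hypothetical toy data
clear
input x
2
4
4
6
end
* summarize leaves the standard deviation in r(sd) and the count in r(N)
quietly summarize x
display r(sd) / sqrt(r(N))
* tabstat reports the same quantity in the semean column
tabstat x, statistics(mean sd semean n)
```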

3.13. Summary tables with table

Syntax:
table <row variable> [column variable [supercolumn variable]] [condition] [range] [weight] [, contents(clist) row col format(%fmt) missing]
This command computes the statistics of the variables given in contents and lays them out as a table, in which the rows are defined by the row variable and the columns by the column variable (and the supercolumn variable). The row and column variables are categorical variables.
Example:
. table reg7 urban98 farm, contents(mean poor)

----------------------------------------------------
          |    Type of HH (1:farm; 0:nonfarm) and
          |          1:urban 98; 0:rural 98
Code by 7 | ---- non farm ----    ------ farm ------
regions   |    Rural     Urban       Rural     Urban
----------+-----------------------------------------
  region1 | 19.35484  6.015038     65.7377  12.96296
  region2 | 26.66667  4.624278    33.96524  15.21739
  region3 | 40.98361  10.11236     45.8159  10.52632
  region4 |     21.6  11.63793    42.44032        10
  region5 | 30.76923              49.24012
  region6 | 15.04065  2.195609    10.07463         0
  region7 | 38.62816  10.04184    34.35805  11.62791
----------------------------------------------------

Options:
contents(clist)   Lists the variables and statistics. The statistic names are the same as for tabstat
row               Displays the total statistic over the rows
col               Displays the total statistic over the columns
format(%fmt)      Gives the display format of the results
missing           Treats missing values of the categorical variables as a category of their own

Example:
. table reg7 urban98 farm, contents(mean poor) row col format(%4.2f)

------------------------------------------------------
          | Type of HH (1:farm; 0:nonfarm) and 1:urban
          |              98; 0:rural 98
Code by 7 | ----- non farm -----   ------- farm ------
regions   | Rural  Urban  Total    Rural  Urban  Total
----------+-------------------------------------------
  region1 | 19.35   6.02  10.26    65.74  12.96  61.45
  region2 | 26.67   4.62  11.29    33.97  15.22  32.70
  region3 | 40.98  10.11  27.96    45.82  10.53  44.47
  region4 | 21.60  11.64  15.13    42.44  10.00  40.81
  region5 | 30.77         30.77    49.24         49.24
  region6 | 15.04   2.20   6.43    10.07   0.00   9.78
  region7 | 38.63  10.04  25.39    34.36  11.63  32.72
          |
    Total | 27.91   6.17  14.84    42.30  12.11  40.63
------------------------------------------------------

. table urban98 farm, contents(mean poor sd poor) row col format(%4.2f)

----------------------------------------
1:urban   |
98;       |    Type of HH (1:farm;
0:rural   |        0:nonfarm)
98        | non farm      farm     Total
----------+-----------------------------
    Rural |    27.91     42.30     38.86
          |    44.88     49.41     48.75
          |
    Urban |     6.17     12.11      6.82
          |    24.07     32.71     25.22
          |
    Total |    14.84     40.63     29.62
          |    35.55     49.12     45.66
----------------------------------------

. table urban98 farm, contents(mean rlpcex1 mean rlhhex1) row col format(%4.2f)

----------------------------------------
1:urban   |
98;       |    Type of HH (1:farm;
0:rural   |        0:nonfarm)
98        | non farm      farm     Total
----------+-----------------------------
    Rural |  2835.83   2212.12   2361.29
          | 13242.03  10120.89  10867.36
          |
    Urban |  5476.86   3232.17   5230.33
          | 22984.44  11903.19  21767.43
          |
    Total |  4423.95   2268.49   3188.67
          | 19100.41  10219.39  14010.74
----------------------------------------

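4. Editing and modifying data (Data manipulation)

4.1. Creating new variables

Creating a variable with generate

Syntax:
generate <new variable> = expression [if condition] [in range]
This command creates a new variable whose values are given by the specified
expression. Observations that do not satisfy the condition, or that fall
outside the range, receive a missing value.
Example:
. gen poor = 1 if rlpcex1 < 1790
(4222 missing values generated)
. gen nonpoor=1 if rlpcex1 >= 1790
(1777 missing values generated)
Creating dummy variables with tabulate ... generate
Syntax:
tabulate <categorical variable>, generate(new variable)
The generate option can be combined with tab to create dummy variables. The
new variables are named <new variable>1, <new variable>2, <new variable>3,
and so on. They are the dummy variables built from the values of the
categorical variable.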
Example:

. tab reg7, gen(region)

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00
. tab1 region1 region2

-> tabulation of region1

reg7==regio |
         n1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       5140       85.68       85.68
          1 |        859       14.32      100.00
------------+-----------------------------------
      Total |       5999      100.00
-> tabulation of region2
reg7==regio |
         n2 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4824       80.41       80.41
          1 |       1175       19.59      100.00
------------+-----------------------------------
      Total |       5999      100.00

Here the variable reg7 takes 7 values, from 1 to 7, so 7 dummy variables,
region1 to region7, are created. The variable region1 equals 1 if reg7
equals 1 and 0 otherwise. Likewise, region7 equals 1 if reg7 equals 7.
In the example above, tabulate ... generate is equivalent to the following
7 commands:
gen region1=(reg7==1)
gen region2=(reg7==2)
...
gen region7=(reg7==7)
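The rule behind these dummy variables can be sketched outside Stata. The short Python fragment below is only an illustration: the sample values of reg7 and the helper name make_dummies are assumptions made for this sketch, not part of the VLSS data or of Stata.

```python
# Sketch: one 0/1 indicator per category, as "tab reg7, gen(region)" produces.
# The reg7 values below are made up for illustration.
reg7 = [1, 3, 7, 2, 1]


def make_dummies(values, categories):
    """Return {category: [1 if value == category else 0, ...]}."""
    return {c: [1 if v == c else 0 for v in values] for c in categories}


dummies = make_dummies(reg7, range(1, 8))
print(dummies[1])  # indicator for reg7 == 1
print(dummies[7])  # indicator for reg7 == 7
```

Every observation has exactly one dummy equal to 1, which is why one of the dummies is normally left out when they are used together as regressors.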
Creating a variable with egen
Syntax:
egen <new variable> = fcn(arguments) [if condition] [in range] [, by(variable)]
This command creates a new variable from the function named by fcn. The new
variable takes the same value for every observation (or, when by() is
specified, for every observation within a group). The function can be:
count(exp)    Counts the observations of the expression
mean(exp)     Gives the mean of the expression
median(exp)   Gives the median of the expression
sd(exp)       Gives the standard deviation of the expression
For the other functions see help egen.

Example:
. egen sumexp=sum(rlpcex1)
. sum sumexp

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      sumexp |    5999    1.91e+07           0   1.91e+07   1.91e+07
. egen g=median( food+ nonfood1)
. sum g
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
           g |    5999     11063.6           0    11063.6    11063.6
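The zero standard deviation in both summaries shows that egen stores one statistic in every observation. A minimal Python sketch of that behaviour, with and without grouping, is given below; the function name egen_mean and the sample numbers are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def egen_mean(values, groups=None):
    """Return a list as long as `values` holding the overall mean, or the
    mean of each observation's own group when `groups` is given."""
    if groups is None:
        overall = mean(values)
        return [overall] * len(values)
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    group_means = {g: mean(vs) for g, vs in buckets.items()}
    return [group_means[g] for g in groups]


print(egen_mean([1, 2, 3, 6]))                          # same value everywhere
print(egen_mean([1, 3, 10, 20], ["a", "a", "b", "b"]))  # constant within group
```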

Replacing the values of a variable
Syntax:
replace <variable> = expression [if condition] [in range]
This command replaces the values of an existing variable with the new values
given by the expression.
Example:
replace poor=poor*100
replace pcexp = hhexp/hhsize
Creating a categorical variable with encode
Syntax:
encode <variable> [if condition] [in range], generate(new variable)
This command creates a new numeric categorical variable whose codes correspond
to the values of the string variable named in the command (codes are assigned
in alphabetical order).
Example:
. gen str15(mucsong) = "Kha"
. drop mucsong
. gen mucsong="Rat ngheo"
type mismatch
r(109);
. gen str15(mucsong)="Rat ngheo"
. replace mucsong="Ngheo" if rlpcex1<1790 & rlpcex1>1290
(1087 real changes made)
. replace mucsong="Khong ngheo" if rlpcex1>=1790
(4222 real changes made)
. tab mucsong

        mucsong |      Freq.     Percent        Cum.
----------------+-----------------------------------
    Khong ngheo |       4222       70.38       70.38
          Ngheo |       1087       18.12       88.50
      Rat ngheo |        690       11.50      100.00
----------------+-----------------------------------
          Total |       5999      100.00
. sum mucsong
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     mucsong |       0
. encode mucsong, gen(ma_ms)
. tab ma_ms
      ma_ms |      Freq.     Percent        Cum.
------------+-----------------------------------
Khong ngheo |       4222       70.38       70.38
      Ngheo |       1087       18.12       88.50
  Rat ngheo |        690       11.50      100.00
------------+-----------------------------------
      Total |       5999      100.00
. sum ma_ms
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       ma_ms |    5999    1.411235    .6871957          1          3

Creating a variable with xtile

Syntax:
xtile <new variable> = expression [weight] [if condition] [in range] [,
nquantiles(#)]
This command creates a grouping variable that classifies the expression by
quantile, where nquantiles(#) gives the number of quantiles.

Example: create an expenditure quintile variable

. xtile quinexp= rlpcex1, nq(5)
. tab quinexp
5 quantiles |
 of rlpcex1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |       1200       20.00       20.00
          2 |       1200       20.00       40.01
          3 |       1200       20.00       60.01
          4 |       1200       20.00       80.01
          5 |       1199       19.99      100.00
------------+-----------------------------------
      Total |       5999      100.00
. tab quinexp, sum( rlpcex1)

            | Summary of comp.M&Reg price adj.pc
5 quantiles |              tot exp
 of rlpcex1 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   1184.3975   261.20537        1200
          2 |   1803.6331   151.66604        1200
          3 |   2408.4867    211.5407        1200
          4 |   3390.1065   403.08913        1200
          5 |    7160.021   3690.3672        1199
------------+------------------------------------
      Total |   3188.6671   2692.5673        5999
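The idea of splitting observations into equal-sized groups by rank can be sketched in a few lines of Python. This is a simplified rank-based rule written for illustration: Stata derives the groups from percentile cut points and treats ties and group boundaries differently, so counts can differ by one from the xtile output above. The function name and the sample values are assumptions.

```python
def quantile_groups(values, nq):
    """Assign each value to one of nq groups of (nearly) equal size by rank.
    Simplified rule: group = ceil(rank * nq / n), ranks starting at 1."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, index in enumerate(order, start=1):
        groups[index] = (rank * nq + n - 1) // n  # integer ceiling
    return groups


expenditure = [50, 10, 40, 20, 30, 60, 100, 90, 80, 70]
quintile = quantile_groups(expenditure, 5)
print(quintile)
```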

4.2. Renaming variables

Syntax:
rename <old variable name> <new variable name>
This command changes the name of a variable from the old name to the new one.
Example:
rename poor nguoingheo
rename rlpcex1 chitieu
4.3. Dropping variables and observations
Syntax:
drop <variable list>              Deletes the variables named in the list.
drop if <condition>               Deletes the observations that satisfy the
                                  condition.
drop in <range> [if condition]    Deletes the observations in the range (and,
                                  if given, satisfying the condition).
keep <variable list>              Keeps the variables named in the list; all
                                  other variables are deleted.
keep if <condition>               Keeps the observations that satisfy the
                                  condition; all other observations are
                                  deleted.
keep in <range> [if condition]    Keeps the observations in the range (and,
                                  if given, satisfying the condition); all
                                  other observations are deleted.
Example:
drop poor urban98    Deletes the two variables poor and urban98.
drop if sex==1       Deletes the observations for which sex equals 1.
drop in 1/20         Deletes observations 1 to 20.
keep househol        Keeps only the variable househol; the other variables
                     are deleted.
keep in f/50         Keeps the first through the 50th observation; the other
                     observations are deleted.

4.4. Recoding the values of a categorical variable

Syntax:
recode <variable name> old value = new value [if condition] [in range]
This command changes the values of a categorical variable according to the
rules that follow the variable name.
Example:
. recode sex 0=1
(0 changes made)
. recode sex . = 0
(0 changes made)
. recode hhsize 1/5=1 6/10 = 2 * = 3
(5785 changes made)
. tab hhsize
  Household |
       size |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |       4164       69.41       69.41
          2 |       1786       29.77       99.18
          3 |         49        0.82      100.00
------------+-----------------------------------
      Total |       5999      100.00
. tab urban98
1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

. recode urban98 0=1 1=0
(5999 changes made)

. tab urban98
1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       1730       28.84       28.84
      Urban |       4269       71.16      100.00
------------+-----------------------------------
      Total |       5999      100.00

Note that recode changes only the numeric codes. The value labels stay
attached to the codes 0 and 1, so after this swap the 1730 households shown
as Rural are the ones that were originally urban.
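The rule list in recode hhsize 1/5=1 6/10=2 *=3 is applied to each value in turn, and the first rule that matches decides the new value. The Python sketch below mimics that logic under stated assumptions: the function name recode, the tuple representation of ranges, and the sample household sizes are all chosen for illustration.

```python
def recode(values, rules, default=None):
    """Map each value with the first matching (low, high) -> new rule.
    `default` plays the role of the '*' rule; None leaves values unchanged."""
    recoded = []
    for value in values:
        for (low, high), new in rules:
            if low <= value <= high:
                recoded.append(new)
                break
        else:  # no rule matched
            recoded.append(value if default is None else default)
    return recoded


hhsize = [1, 5, 6, 10, 11, 19]
size_group = recode(hhsize, [((1, 5), 1), ((6, 10), 2)], default=3)
print(size_group)
```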

4.5. Labelling variables

Labelling a variable
Syntax:
label variable <variable name> "variable label"
This command attaches a label, a string of characters, to the variable.
Example:
. gen ngheo=poor
. des ngheo
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ngheo           float  %9.0g
. tab ngheo
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00
. label var ngheo "Nguoi co thu nhap duoi chuan ngheo"
. tab ngheo
   Nguoi co |
   thu nhap |
 duoi chuan |
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00
. des ngheo
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ngheo           float  %9.0g                  Nguoi co thu nhap duoi chuan
                                                ngheo


Attaching value labels to a categorical variable

label define <label set name> # "label" [# "label" ...] [, add modify]
label dir
label list <label set name>
label drop {label set name [label set name ...] | _all}
label values <variable name> [label set name]
The label define command assigns labels to a set of numeric values. The name
of the label set follows the keyword define, # is a numeric value, and "label"
is the string that corresponds to that value. There are two options: add
appends new values and their labels to an existing label set, and modify
allows the values and labels of an existing label set to be changed.
The label dir command lists the existing label sets, and label list shows the
contents of the named label set. The label drop command deletes existing
label sets.
Example:
Create a label set named nngheo in which the value 1 means poor and 0 means
non-poor.
. label define nngheo 0 "Khong ngheo" 1 "Ngheo"
. label dir
nngheo
region
loaiho
diploma
urban
agegroup
. label list nngheo
nngheo:
           0 Khong ngheo
           1 Ngheo
. label drop _all
. label dir

The label values command attaches the labels of a label set to the numeric
values of a categorical variable. (A label set removed with label drop has to
be defined again before it can be attached.)
Example:
. tab ngheo
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00
. list ngheo in 1/5

         ngheo
  1.         1
  2.         0
  3.         1
  4.         1
  5.         0

. label values ngheo nngheo

. tab ngheo
      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
Khong ngheo |       4222       70.38       70.38
      Ngheo |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00
. list ngheo in 1/5
             ngheo
  1.         Ngheo
  2.   Khong ngheo
  3.         Ngheo
  4.         Ngheo
  5.   Khong ngheo

4.6. Sorting data
Syntax:
sort <variable list> [in range]
gsort [+|-]variable name [[+|-]variable name [...]]
The sort command arranges the observations in ascending order of the values
of the variables named in the variable list.
The gsort command arranges the observations in ascending order of a variable
when it is prefixed with + (this is also the default), or in descending order
when it is prefixed with -.
Example:

sort reg7 hhsize     Sorts the observations in ascending order of the region
                     variable reg7; within each region the observations are
                     sorted in ascending order of household size, hhsize.

gsort reg7 -hhsize   Sorts the observations in ascending order of the region
                     variable reg7, but within each region the observations
                     are sorted in descending order of household size,
                     hhsize.
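The difference between the two orderings can be reproduced with a sort key that negates the variable to be sorted in descending order. The Python sketch below uses four made-up households; the variable names follow the text, everything else is an assumption for illustration.

```python
# Four made-up households: region code and household size.
households = [
    {"reg7": 2, "hhsize": 4},
    {"reg7": 1, "hhsize": 3},
    {"reg7": 1, "hhsize": 6},
    {"reg7": 2, "hhsize": 7},
]

# Like "sort reg7 hhsize": both keys ascending.
ascending = sorted(households, key=lambda h: (h["reg7"], h["hhsize"]))

# Like "gsort reg7 -hhsize": region ascending, size descending within region.
mixed = sorted(households, key=lambda h: (h["reg7"], -h["hhsize"]))

print([(h["reg7"], h["hhsize"]) for h in ascending])
print([(h["reg7"], h["hhsize"]) for h in mixed])
```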
4.7. Combining data
Aggregating data: the collapse command
Syntax:
collapse <statistics list> [weight] [if condition] [in range] [,
by(variable list)]
where:
The statistics list names the statistics and the variables they apply to. The
statistics are written as in section 3.12 of this chapter.
The collapse command creates a new dataset that contains the variables named
in the statistics list, with values computed by the corresponding statistic.
The observations of the old dataset are grouped by the distinct values of the
variables named in by(variable list).
Example:
We have a data file with the income and expenditure of the individual members
of each household:
ma_tv   ma_ho   thunhap   chitieu
1       101     200       500
2       101     1200      400
3       101     0         200
4       101     0         200
1       102     3200      500
2       102     1200      320
3       102     200       200
1       103     300       500
2       103     2100      250
3       103     0         300
4       103     0         300
1       104     4300      800
2       104     3500      500
3       104     300       500
4       104     0         300
5       104     0         200
6       104     0         200
We use the collapse command to create a file with the average income and
expenditure of each household, and add one variable for household size.

. gen quimo=1
. collapse (mean) thunhap (mean) chitieu (sum) quimo, by(ma_ho)
The new dataset looks like this:
ma_ho   thunhap   chitieu   quimo
101     350       325       4
102     1533.33   340       3
103     600       337.5     4
104     1350      416.667   6
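The aggregation performed by collapse can be checked by hand. The following Python sketch regroups the member-level records listed above by household and computes the two means and the member count; the helper name collapse_by_household is an assumption made for this illustration.

```python
from collections import defaultdict

# (ma_tv, ma_ho, thunhap, chitieu) for each household member, as in the text.
members = [
    (1, 101, 200, 500), (2, 101, 1200, 400), (3, 101, 0, 200), (4, 101, 0, 200),
    (1, 102, 3200, 500), (2, 102, 1200, 320), (3, 102, 200, 200),
    (1, 103, 300, 500), (2, 103, 2100, 250), (3, 103, 0, 300), (4, 103, 0, 300),
    (1, 104, 4300, 800), (2, 104, 3500, 500), (3, 104, 300, 500),
    (4, 104, 0, 300), (5, 104, 0, 200), (6, 104, 0, 200),
]


def collapse_by_household(rows):
    """Mean income, mean expenditure and member count for each household."""
    grouped = defaultdict(list)
    for _, ma_ho, thunhap, chitieu in rows:
        grouped[ma_ho].append((thunhap, chitieu))
    result = {}
    for ma_ho, items in sorted(grouped.items()):
        n = len(items)
        result[ma_ho] = {
            "thunhap": sum(income for income, _ in items) / n,
            "chitieu": sum(spend for _, spend in items) / n,
            "quimo": n,  # equivalent to (sum) of a variable equal to 1
        }
    return result


households = collapse_by_household(members)
for ma_ho, row in households.items():
    print(ma_ho, round(row["thunhap"], 2), round(row["chitieu"], 3), row["quimo"])
```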
Joining data: the merge command
Syntax:
merge [variable list] using <using file name> [, update replace]
The merge command joins the observations of the dataset currently open in
Stata (called the master dataset) with the corresponding observations of
another dataset named after the keyword using (called the using dataset) to
form one new dataset. The variables in the variable list are called
identifying variables, and both datasets must be sorted by them with sort (or
gsort) before merge is run.
Example:
We have the following two datasets:
thunhap.dta
ma_ho   thunhap   chitieu   quimo
101     350       325       4
102     1533.33   340       3
103     600       337.5     4
104     1350      416.667   6
dialy.dta
ma_ho   thanhthi   vung
204     0          1
102     1          4
103     0          3
104     0          6
The merge is carried out as follows:
. use "C:\dialy.dta", clear
. sort ma_ho
. save "C:\dialy.dta", replace
file C:\dialy.dta saved
. use "C:\thunhap.dta", clear
. sort ma_ho
. merge ma_ho using "C:\dialy.dta"
ma_ho was byte now int

. edit
The resulting dataset looks like this:
ma_ho   thunhap   chitieu   quimo   thanhthi   vung   _merge
101     350       325       4       .          .      1
102     1533.33   340       3       1          4      3
103     600       337.5     4       0          3      3
104     1350      416.667   6       0          6      3
204     .         .         .       0          1      2
The result contains one additional variable named _merge, which takes the
following values:
_merge==1   the observation comes from the master dataset only
_merge==2   the observation comes from the using dataset only
_merge==3   the observation comes from both the master and the using dataset

Options:
When the two datasets have variables in common, the following options control
how the data are handled:
update    If a common variable is missing in the master dataset, the missing
          value is filled in with the value of that variable from the using
          dataset.
replace   The values of the common variables in the master dataset are
          replaced by the values from the using dataset (specified together
          with update).
If neither option is given then, by default, the values of the variables in
the master dataset are left unchanged.
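The meaning of the _merge codes can be traced with a small Python sketch that joins the two example datasets on ma_ho. It reproduces only the default behaviour described above (master values are kept for common variables); the function name merge_on_key and the dictionary layout are assumptions made for illustration.

```python
master = {  # thunhap.dta, keyed by ma_ho
    101: {"thunhap": 350, "chitieu": 325, "quimo": 4},
    102: {"thunhap": 1533.33, "chitieu": 340, "quimo": 3},
    103: {"thunhap": 600, "chitieu": 337.5, "quimo": 4},
    104: {"thunhap": 1350, "chitieu": 416.667, "quimo": 6},
}
using = {  # dialy.dta, keyed by ma_ho
    204: {"thanhthi": 0, "vung": 1},
    102: {"thanhthi": 1, "vung": 4},
    103: {"thanhthi": 0, "vung": 3},
    104: {"thanhthi": 0, "vung": 6},
}


def merge_on_key(master_rows, using_rows):
    """Join two keyed datasets; _merge is 1 (master only), 2 (using only)
    or 3 (both). Variables absent from a source are simply not set."""
    merged = {}
    for key in sorted(set(master_rows) | set(using_rows)):
        row = dict(master_rows.get(key, {}))
        for name, value in using_rows.get(key, {}).items():
            row.setdefault(name, value)  # master value wins for common names
        in_master, in_using = key in master_rows, key in using_rows
        row["_merge"] = 3 if in_master and in_using else (1 if in_master else 2)
        merged[key] = row
    return merged


result = merge_on_key(master, using)
for ma_ho, row in result.items():
    print(ma_ho, row)
```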
Appending data: the append command
Syntax:
append using <file name>
This command appends the dataset named after using to the dataset currently
open, matching variables that have the same name and format. The number of
observations in the new dataset is the sum of the numbers of observations in
the two datasets.
Example: the file thunhap2.dta is as follows
ma_ho   thunhap   chitieu   gioitinh
105     1350      425       1
106     1500      370       0
107     800       556       0
108     1500      417       0
109     2500      540       1
The two datasets are joined with the append command as follows:
. use "C:\thunhap.dta", clear
. append using "C:\thunhap2.dta"

. edit
The resulting dataset looks like this:
ma_ho   thunhap   chitieu   quimo   gioitinh
101     350       325       4       .
102     1533.33   340       3       .
103     600       337.5     4       .
104     1350      416.667   6       .
105     1350      425       .       1
106     1500      370       .       0
107     800       556       .       0
108     1500      417       .       0
109     2500      540       .       1
Note: see also the expand command, which is used to create duplicate
observations.
4.8. Reshaping data
Syntax:
reshape wide <variable names>, i(variable list) [ j(variable name [values]) ... ]
reshape long <variable names>, i(variable list) [ j(variable name [values]) ... ]
reshape wide
reshape long
This command converts data from wide form to long form (reshape long) and
from long form to wide form (reshape wide). i(variable list) names the
identifying variables that distinguish the observations from one another in
the wide form (called level-1 observations). j(variable name) names the
variable that distinguishes the level-2 observations in the long form.
Example 1:
Data in wide form can be viewed as a matrix like this:
 -i-            -------------- xij --------------
maho   quimo   thunhap95   thunhap96   thunhap97
101    5       4500        4400        5400
102    4       3400        3300        3700
103    6       5000        5400        5500
These data are converted to long form as follows:
 -i-            -j-   -xij-
maho   quimo   nam   thunhap
101    5       95    4500
101    5       96    4400
101    5       97    5400
102    4       95    3400
102    4       96    3300
102    4       97    3700
103    6       95    5000
103    6       96    5400
103    6       97    5500
The reshape command is written as follows:

. reshape long thunhap, i(maho) j(nam)
(note: j = 95 96 97)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        3   ->       9
Number of variables                   5   ->       4
j variable (3 values)                     ->   nam
xij variables:
          thunhap95 thunhap96 thunhap97   ->   thunhap
-----------------------------------------------------------------------------
* And convert back from long form to wide form as follows
. reshape wide thunhap, i(maho) j(nam)
(note: j = 95 96 97)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   4   ->       5
j variable (3 values)               nam   ->   (dropped)
xij variables:
                                thunhap   ->   thunhap95 thunhap96 thunhap97
-----------------------------------------------------------------------------

Example 2:
We have data in the following table:
maho   sotien1   nguon1        sotien2   nguon2
101    1200      Ngan hang A   2000      Ngan hang A
102    1300      Ngan hang B   .         .
103    2500      Ngan hang A   1000      Ngan hang C
104    3000      Ngan hang A   2000      Ngan hang B
This table is converted to long form as follows:

. reshape long sotien nguon, i(maho) j(lanvay)
(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        4   ->       8
Number of variables                   5   ->       4
j variable (2 values)                     ->   lanvay
xij variables:
                        sotien1 sotien2   ->   sotien
                          nguon1 nguon2   ->   nguon
-----------------------------------------------------------------------------

The long table looks like this:

maho   lanvay   sotien   nguon
101    1        1200     Ngan hang A
101    2        2000     Ngan hang A
102    1        1300     Ngan hang B
102    2        .
103    1        2500     Ngan hang A
103    2        1000     Ngan hang C
104    1        3000     Ngan hang A
104    2        2000     Ngan hang B
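The wide-to-long conversion amounts to emitting one output row per combination of an i observation and a j value. The Python sketch below applies this to the income data of Example 1; the function name reshape_long and its argument names are assumptions chosen for the illustration.

```python
wide = [
    {"maho": 101, "quimo": 5, "thunhap95": 4500, "thunhap96": 4400, "thunhap97": 5400},
    {"maho": 102, "quimo": 4, "thunhap95": 3400, "thunhap96": 3300, "thunhap97": 3700},
    {"maho": 103, "quimo": 6, "thunhap95": 5000, "thunhap96": 5400, "thunhap97": 5500},
]


def reshape_long(rows, stub, j_name, j_values):
    """Turn columns stub<j> into one column `stub` plus an index column j_name."""
    long_rows = []
    for row in rows:
        # Variables that do not start with the stub are copied to every row.
        fixed = {k: v for k, v in row.items() if not k.startswith(stub)}
        for j in j_values:
            new_row = dict(fixed)
            new_row[j_name] = j
            new_row[stub] = row[f"{stub}{j}"]
            long_rows.append(new_row)
    return long_rows


long_data = reshape_long(wide, "thunhap", "nam", [95, 96, 97])
for row in long_data:
    print(row)
```

Three wide observations with three years each give nine long observations, matching the counts reported by reshape in Example 1.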

5. Weights in the VHLSS (Weight)

5.1. Weights in sample surveys
In a sample survey the observations are selected at random, but they usually
have different probabilities of selection. The weight equals the inverse of
the probability of being selected into the sample. If observation i has
weight wi, then observation i in the sample represents wi units of the
population. Estimates and inference about the population must take the
sampling weights into account; otherwise the results will be biased.
Example:
Suppose the Red River Delta consists of two provinces, Hà Nội and Nam Định,
with populations of 4.5 million and 500 thousand respectively. We want to
draw a random sample of 500 observations to study income in the Red River
Delta as a whole and in each of the two provinces. Allocating the sample in
proportion to population would give 450 households in Hà Nội and 50 in Nam
Định. However, because the sample is drawn at random over the whole region,
it may contain no observations from Nam Định at all, or only very few. To
make the sample representative of each province, it is better to select 400
observations in Hà Nội and 100 observations in Nam Định.
If average income is 900 thousand per month in Hà Nội and 300 thousand per
month in Nam Định, the average income of the whole Red River Delta cannot be
computed as (900 + 300)/2, because the observations in the sample were not
selected in proportion to the size of the provinces. Each observation in Hà
Nội represents 11250 units of the region (4500000/400). This is the weight of
the observation, the inverse of its probability of selection. Each
observation in Nam Định represents 5000 units of the region (500000/100).
The income of the Red River Delta is then computed as:

$$
\text{Income} = \frac{900 \times 400 \times 11250 + 300 \times 100 \times 5000}{400 \times 11250 + 100 \times 5000} = 840
$$
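The weighted average can be verified numerically. The Python sketch below builds the 500 sample observations of the example, gives each one a weight equal to population divided by sample size in its province (4500000/400 = 11250 and 500000/100 = 5000), and compares the weighted and unweighted means. The function name weighted_mean is an assumption made for this illustration.

```python
def weighted_mean(values, weights):
    """Sum of value*weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# 400 sampled observations in the first province, 100 in the second.
incomes = [900] * 400 + [300] * 100
weights = [4_500_000 / 400] * 400 + [500_000 / 100] * 100

print(weighted_mean(incomes, weights))  # population-weighted average income
print(sum(incomes) / len(incomes))      # unweighted sample average
```

The unweighted sample average is pulled towards the over-sampled smaller province, which is the bias the weights correct.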

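The VLSS 1998 has two weights. The first is the household weight, the
variable wt, which is the number of households in Viet Nam that each sampled
household represents. The second is the household member weight, hhsizewt,
which is the number of people in Viet Nam that each member of the household
represents. The household member weight equals the household weight
multiplied by household size.
Example: weights in the VLSS 1998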
. tab reg7, sum(wt)
  Code by 7 |      Summary of sample weight
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   3218.4296   850.74246         859
    region2 |   3133.7277   849.12325        1175
    region3 |   3185.1794   801.74266         708
    region4 |     2199.37   492.37202         754
    region5 |   1336.3098   269.14747         368
    region6 |   1963.8964   528.69328        1023
    region7 |   2938.2122   547.72125        1112
------------+------------------------------------
      Total |   2688.5003   900.01379        5999
. tab reg7, sum(hhsizewt)
  Code by 7 |       Summary of =hhsize*wt
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   15790.857   7555.7552         859
    region2 |   12656.003   5970.9089        1175
    region3 |   14814.504   7236.7592         708
    region4 |   10794.537    5235.562         754
    region5 |    7564.731   3185.9336         368
    region6 |   9447.7077   4535.0816        1023
    region7 |   14653.702   6639.8297        1112
------------+------------------------------------
      Total |   12636.546   6597.6574        5999
. di 2688.5003*5999
16128313
. di 12636.546*5999
75806639

The two products are the sums of the weights over the 5999 sampled
households: about 16.1 million households and about 75.8 million people
represented by the sample.


5.2. Types of weights

Stata allows the following 4 types of weights:

fweights:   frequency weights. Stata interprets the weight as the number of
            times each observation is repeated in the calculation.

pweights:   sampling weights. Stata interprets the weight as the inverse of
            the probability of being selected into the sample, that is, the
            number of population units that each sample observation
            represents.

aweights:   analytical weights. Stata interprets the weight as inversely
            proportional to the variance of the observation.

iweights:   importance weights. These weights indicate the importance of
            each observation.

For the living standards survey, commands use the pweights and fweights
types.

Example:
. sum poor
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |    5999     29.6216    45.66255          0        100
. sum poor [fw=hhsize]
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |   28509    34.17517    47.43051          0        100
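With [fw=hhsize] each household is counted hhsize times, which is why the number of observations rises from 5999 households to 28509 persons. The Python sketch below shows, on four made-up households, that a frequency-weighted mean equals the plain mean of the data with every observation repeated as many times as its weight; the function name fw_mean and the numbers are assumptions for illustration.

```python
def fw_mean(values, freqs):
    """Frequency-weighted mean: each value counts `freq` times."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)


# Made-up data: poverty indicator (0 or 100) and household size.
poor = [100, 0, 100, 0]
hhsize = [6, 3, 5, 2]

# Repeat each household's value once per household member.
expanded = [v for v, f in zip(poor, hhsize) for _ in range(f)]

print(fw_mean(poor, hhsize))          # weighted mean over households
print(sum(expanded) / len(expanded))  # plain mean over persons
print(len(expanded))                  # the "Obs" Stata would report
```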
. tab reg7 urban98

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |       672        187 |       859
   region2 |       783        392 |      1175
   region3 |       600        108 |       708
   region4 |       502        252 |       754
   region5 |       368          0 |       368
   region6 |       514        509 |      1023
   region7 |       830        282 |      1112
-----------+----------------------+----------
     Total |      4269       1730 |      5999

. tab reg7 urban98 [fw= hhsizewt]

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |  11993763    1570583 |  13564346
   region2 |  11057932    3812871 |  14870803
   region3 |   9582621     906048 |  10488669
   region4 |   5618709    2520372 |   8139081
   region5 |   2783821          0 |   2783821
   region6 |   4545303    5119702 |   9665005
   region7 |  13220727    3074190 |  16294917
-----------+----------------------+----------
     Total |  58802876   17003766 |  75806642

. tab reg7 urban98 , sum(hhsize) means

          Means of Household size
           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1205357  3.7326203 | 4.8183935
   region2 |  4.045977  4.0459184 | 4.0459574
   region3 | 4.6666667  4.6759259 | 4.6680791
   region4 | 4.8027888  5.1190476 | 4.9084881
   region5 | 5.7065217          . | 5.7065217
   region6 | 5.0719844  4.7131631 | 4.8934506
   region7 | 5.1373494  4.3971631 | 4.9496403
-----------+----------------------+----------
     Total | 4.8702272  4.4612717 |  4.752292
. tab reg7 urban98 [fw=wt], sum(hhsize) means
Means and Number of Observations of Household size
           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1328749  3.6698008 | 4.9063857
           |   2336656     427975 |   2764631
-----------+----------------------+----------
   region2 | 4.0564115   3.987975 | 4.0386415
           |   2726038     956092 |   3682130
-----------+----------------------+----------
   region3 | 4.6508908  4.6530097 | 4.6510738
           |   2060384     194723 |   2255107
-----------+----------------------+----------
   region4 | 4.8136253   5.132367 | 4.9080132
           |   1167251     491074 |   1658325
-----------+----------------------+----------
   region5 | 5.6609112          . | 5.6609112
           |    491762          0 |    491762
-----------+----------------------+----------
   region6 | 5.0486426  4.6174858 | 4.8106956
           |    900302    1108764 |   2009066
-----------+----------------------+----------
   region7 | 5.1494132  4.3925283 | 4.9872852
           |   2567424     699868 |   3267292
-----------+----------------------+----------
     Total | 4.8003065  4.3841133 | 4.7002214
           |  12249817    3878496 |  16128313
.
. table reg7 urban98 , c(mean poor) col row format(%4.1f)
------------------------------
          | 1:urban 98; 0:rural
Code by 7 |         98
regions   | Rural  Urban  Total
----------+-------------------
  region1 |  61.5    8.0   49.8
  region2 |  32.6    5.9   23.7
  region3 |  44.8   10.2   39.5
  region4 |  37.3   11.5   28.6
  region5 |  47.3          47.3
  region6 |  12.5    2.2    7.3
  region7 |  35.8   10.3   29.3
          |
    Total |  38.9    6.8   29.6
------------------------------
. table reg7 urban98 [pw=hhsizewt], c(mean poor) col row format(%4.1f)
------------------------------
          | 1:urban 98; 0:rural
Code by 7 |         98
regions   | Rural  Urban  Total
----------+-------------------
  region1 |  65.2    8.3   58.6
  region2 |  36.1    7.0   28.7
  region3 |  51.3   14.3   48.1
  region4 |  43.6   16.6   35.2
  region5 |  52.4          52.4
  region6 |  13.0    2.9    7.6
  region7 |  42.0   15.3   36.9
          |
    Total |  45.5    9.2   37.4
------------------------------

Chapter III: Hypothesis testing and regression analysis

1. Estimation and hypothesis testing

1.1. Estimating a mean with a confidence interval

Syntax:
ci [variable list] [weight] [if condition] [in range] [, level(#)
binomial poisson exposure(variable name) total]
This command computes the standard error and the confidence interval for the
sample mean under the normal, binomial, and Poisson distributions.
Options:
level(#)                  sets the confidence level for the interval
                          estimate. # takes a value from 10 to 99; the
                          default is 95.
binomial                  applies to the binomial distribution
poisson                   applies to the Poisson distribution
exposure(variable name)   applies to the Poisson distribution; the variable
                          gives the exposure (usually a length of time or an
                          area) within which the events recorded in the
                          variable list occurred
total                     used together with the by prefix; requests a
                          confidence interval for all groups combined as
                          well.


Example:
. ci poor

    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    5999     29.6216    .5895501        28.46587    30.77733

. sort reg7
. by reg7: ci poor, total
_______________________________________________________________________________
-> reg7 = region1
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     859    49.82538    1.706961        46.47507    53.17569
_______________________________________________________________________________
-> reg7 = region2
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    1175    23.65957    1.240357        21.22601    26.09314
_______________________________________________________________________________
-> reg7 = region3
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     708    39.54802    1.838899        35.93767    43.15838
_______________________________________________________________________________
-> reg7 = region4
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     754    28.64721     1.64759         25.4128    31.88163
_______________________________________________________________________________
-> reg7 = region5
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |     368    47.28261    2.606121         42.1578    52.40741
_______________________________________________________________________________
-> reg7 = region6
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    1023    7.331378    .8153306        5.731465    8.931292
_______________________________________________________________________________
-> reg7 = region7
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    1112    29.31655    1.365709        26.63689    31.99621
_______________________________________________________________________________
-> Total
    Variable |     Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
        poor |    5999     29.6216    .5895501        28.46587    30.77733
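The standard error and interval reported by ci can be approximated from the summary statistics alone. The Python sketch below uses the figures for poor shown earlier (5999 observations, mean 29.6216, standard deviation 45.66255) and a normal critical value. This is an assumption made for simplicity: Stata uses the t distribution, which for a sample of this size gives nearly the same limits. The function name mean_ci is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(n, mean, sd, level=95):
    """Standard error and normal-approximation confidence limits of a mean."""
    se = sd / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 200)  # about 1.96 for level=95
    return se, mean - z * se, mean + z * se


se, lower, upper = mean_ci(5999, 29.6216, 45.66255)
print(round(se, 5), round(lower, 2), round(upper, 2))
```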

Note:
The estimation commands can also be used when only the sample parameters are
known. These are called commands using immediate arguments. They are very
useful when the original data on the variable are not available.
cii <number of observations> <mean> <standard deviation> [,
level(#) ] (normal distribution)
cii <number of observations> <number of successes> [, level(#) ]
(binomial distribution)
Here #obs is the number of observations and #succ is the number of times the
variable takes the value that counts as a success (usually the value 1).
cii <exposure> <number of events>, poisson [ level(#) ]
(Poisson distribution)
Example:
. cii 5999 1777, level(90)
                                                        -- Binomial Exact --
    Variable |     Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+-------------------------------------------------------------
             |    5999     .296216     .005895        .2865107    .3060676
. cii 12 27, poisson
                                                         -- Poisson Exact --
    Variable | Exposure       Mean    Std. Err.       [95% Conf. Interval]
-------------+-------------------------------------------------------------
             |      12        2.25    .4330127        1.483144    3.273587

1.2. Statistical hypothesis tests

1.2.1. Tests on the sample mean

Zero-one distribution
Syntax:
prtest <variable> = # [if condition] [in range] [, level(#)]
This command tests a hypothesis about the proportion of a variable that
follows the zero-one distribution (Ho: p = p0).
Example:
. prtest poor=0.44 if reg7==1
One-sample test of proportion                   poor: Number of obs =      859
------------------------------------------------------------------------------
Variable |      Mean   Std. Err.        z    P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
    poor |  .4982538   .0170597   29.2065   0.0000       .4648174    .5316901
------------------------------------------------------------------------------
                       Ho: proportion(poor) = .44
   Ha: poor < .44           Ha: poor ~= .44            Ha: poor > .44
     z =  3.440                z =  3.440                 z =  3.440
 P < z = 0.9997           P > |z| = 0.0006            P > z = 0.0003
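The z statistic in this output uses the standard error computed under Ho, sqrt(p0(1-p0)/n), not the standard error shown in the table row. A Python sketch of the calculation follows. It takes the 428 poor households out of 859 in region 1 (the count reported by the bitest example for the same sample); the function name prtest_one_sample is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prtest_one_sample(successes, n, p0):
    """z test of Ho: p = p0 for a 0/1 variable, with two-sided p-value."""
    p_hat = successes / n
    se_under_h0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_under_h0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_two_sided


p_hat, z, p_value = prtest_one_sample(428, 859, 0.44)
print(round(p_hat, 7), round(z, 3), round(p_value, 4))
```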

prtest <variable 1> = <variable 2> [if condition] [in range] [, level(#)]

This command tests the hypothesis that the proportions of the two named
variables are equal (Ho: pX = pY).
Example: test whether the poverty rates of region 2 and region 4 differ:
. gen poor2=poor if reg7==2
(4824 missing values generated)
. gen poor4=poor if reg7==4
(5245 missing values generated)
. prtest poor2 = poor4
Two-sample test of proportion                  poor2: Number of obs =     1175
                                               poor4: Number of obs =      754
------------------------------------------------------------------------------
Variable |      Mean   Std. Err.        z    P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |  .2365957   .0123983   19.0829   0.0000       .2122955    .2608959
   poor4 |  .2864721    .016465   17.3989   0.0000       .2542014    .3187429
---------+--------------------------------------------------------------------
    diff | -.0498764    .020611                         -.0902732   -.0094796
         | under Ho:   .0203666  -2.44893   0.0143
------------------------------------------------------------------------------
           Ho: proportion(poor2) - proportion(poor4) = diff = 0
   Ha: diff < 0              Ha: diff ~= 0              Ha: diff > 0
     z = -2.449                z = -2.449                 z = -2.449
 P < z = 0.0072           P > |z| = 0.0143            P > z = 0.9928

prtest <variable> [if condition] [in range], by(grouping variable)
[level(#)]

This command tests the hypothesis that the proportions in the two groups
defined by the grouping variable are equal (Ho: pX1 = pX2).
Example:
. prtest poor, by(sex)
Two-sample test of proportion                      1: Number of obs =     4375
                                                   2: Number of obs =     1624
------------------------------------------------------------------------------
Variable |      Mean   Std. Err.        z    P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |     .3248     .00708   45.8755   0.0000       .3109234    .3386766
       2 |  .2192118   .0102661    21.353   0.0000       .1990906     .239333
---------+--------------------------------------------------------------------
    diff |  .1055882   .0124708                          .0811459    .1300304
         | under Ho:   .0132673   7.95855   0.0000
------------------------------------------------------------------------------
               Ho: proportion(1) - proportion(2) = diff = 0
   Ha: diff < 0              Ha: diff ~= 0              Ha: diff > 0
     z =  7.959                z =  7.959                 z =  7.959
 P < z = 1.0000           P > |z| = 0.0000            P > z = 0.0000


Binomial distribution

Syntax:
bitest <variable> = #p [weight] [if condition] [in range]
This command tests a hypothesis about the parameter p of the binomial
distribution (the probability of success in a trial) for the named variable
(Ho: p = p0).
Example:
. bitest poor=0.44 if reg7==1
    Variable |        N   Observed k   Expected k   Assumed p   Observed p
-------------+------------------------------------------------------------
        poor |      859          428       377.96     0.44000      0.49825

  Pr(k >= 428)             = 0.000344  (one-sided test)
  Pr(k <= 428)             = 0.999732  (one-sided test)
  Pr(k <= 328 or k >= 428) = 0.000660  (two-sided test)

. bitesti 859 428 0.44

        N   Observed k   Expected k   Assumed p   Observed p
------------------------------------------------------------
      859          428       377.96     0.44000      0.49825

  Pr(k >= 428)             = 0.000344  (one-sided test)
  Pr(k <= 428)             = 0.999732  (one-sided test)
  Pr(k <= 328 or k >= 428) = 0.000660  (two-sided test)


Normal distribution

Syntax:
ttest <variable> = # [if condition] [in range] [, level(#) ]
This command tests a hypothesis about the value of the mean of the named
random variable, which is assumed to follow the normal distribution
(Ho: μ = μ0).
Example:
. ttest rlpcex1=3200

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 rlpcex1 |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
------------------------------------------------------------------------------
Degrees of freedom: 5998
                        Ho: mean(rlpcex1) = 3200
   Ha: mean < 3200          Ha: mean ~= 3200            Ha: mean > 3200
     t =  -0.3260              t =  -0.3260               t =  -0.3260
 P < t =   0.3722          P > |t| =   0.7444          P > t =   0.6278


ttest <variable 1> = <variable 2> [if condition] [in range] [, unpaired
unequal level(#) ]
This command tests the hypothesis that two variables have equal means
(Ho: $\mu_X = \mu_Y$).

Options:
unpaired    The data in the two variables are not paired
unequal     The variances of the two variables are not equal

Example:
. ttest poor2=poor4, unpaired unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |    1175    .2365957    .0124036     .425173    .2122601    .2609314
   poor4 |     754    .2864721    .0164759    .4524128     .254128    .3188163
---------+--------------------------------------------------------------------
combined |    1929    .2560912    .0099404     .436586    .2365962    .2755863
---------+--------------------------------------------------------------------
    diff |           -.0498764    .0206229               -.0903285   -.0094243
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom:  1532.64

               Ho: mean(poor2) - mean(poor4) = diff = 0

     Ha: diff < 0             Ha: diff ~= 0             Ha: diff > 0
       t =  -2.4185             t =  -2.4185             t =  -2.4185
   P < t =   0.0079         P > |t| =   0.0157        P > t =   0.9921
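
With the unequal option, the standard error of the difference combines the two group standard errors, and the degrees of freedom come from Satterthwaite's approximation. The following sketch reproduces both from the rounded figures in the table; the scalar names are arbitrary and small rounding differences are to be expected.

```stata
* Group standard errors and sample sizes, retyped from the table
scalar se1 = .0124036
scalar se2 = .0164759
scalar n1  = 1175
scalar n2  = 754

* Difference in means, its standard error, and the t statistic
scalar d    = .2365957 - .2864721
scalar se_d = sqrt(se1^2 + se2^2)
scalar t    = d/se_d

* Satterthwaite's approximate degrees of freedom
scalar df = (se1^2 + se2^2)^2 / (se1^4/(n1 - 1) + se2^4/(n2 - 1))
display "se(diff) = " se_d "   t = " t "   df = " df
```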

ttest <variable> [if condition] [in range], by(group variable) [ unequal
level(#) ]
This command tests the hypothesis that the mean of the variable is the
same in the two groups defined by the group variable
(Ho: $\mu_{X_1} = \mu_{X_2}$).

Example:
. ttest rlpcex1, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |    4375    2980.906    36.74795    2430.648    2908.862    3052.951
       2 |    1624    3748.368    80.18189    3231.241    3591.097    3905.638
---------+--------------------------------------------------------------------
combined |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
---------+--------------------------------------------------------------------
    diff |           -767.4613     77.6155               -919.6156   -615.3071
------------------------------------------------------------------------------
Degrees of freedom: 5997

                  Ho: mean(1) - mean(2) = diff = 0

     Ha: diff < 0             Ha: diff ~= 0             Ha: diff > 0
       t =  -9.8880             t =  -9.8880             t =  -9.8880
   P < t =   0.0000         P > |t| =   0.0000        P > t =   1.0000

1.2.2. Testing the standard deviation

Syntax:
sdtest <variable> = # [if condition] [in range] [, level(#) ]
sdtest <variable 1> = <variable 2> [if condition] [in range] [, level(#) ]
sdtest <variable> [if condition] [in range] , by(group variable)
[ level(#) ]
This command tests the standard deviation of a normally distributed
random variable, identified by the variable name. Its syntax parallels
that of the ttest command.
Example:
. sum rlpcex1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71

. sdtest rlpcex1=2700

One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 rlpcex1 |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
------------------------------------------------------------------------------

                        Ho: sd(rlpcex1) = 2700

                        chi2(5998) = 5965.022

  Ha: sd(rlpcex1) < 2700    Ha: sd(rlpcex1) ~= 2700    Ha: sd(rlpcex1) > 2700
   P < chi2 = 0.3838        2*(P < chi2) = 0.7676       P > chi2 = 0.6162
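
The chi-squared statistic of the one-sample variance test is (n - 1) times the ratio of the sample variance to the hypothesized variance. A minimal sketch that recomputes it from the rounded summary figures, followed by the immediate form sdtesti (the period stands for the mean, which the test does not need):

```stata
* chi2 = (n - 1) * (sample sd / hypothesized sd)^2
scalar chi2 = (5999 - 1)*(2692.567/2700)^2
display "chi2(5998) = " chi2

* Immediate form: #obs {#mean | .} #sd #hypothesized sd
sdtesti 5999 . 2692.567 2700
```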

2. Correlation and regression analysis (Correlation and regression)

2.1. Correlation analysis
Syntax:
correlate [variable list] [weight] [if condition] [in range] [,
means covariance _coef wrap]
This command computes the matrix of correlation coefficients, or of
covariances, for the variables in the variable list. Only observations
with no missing value in any of the listed variables are used.
Options:
means        Also display summary statistics: mean, standard deviation,
             minimum, and maximum
covariance   Report the covariance matrix instead of the correlation
             coefficients
_coef        Compute the correlation matrix of the coefficients from the
             most recent estimation
wrap         Display the rows of the matrix unbroken even when many
             variables are listed

Example:
. corr hhsize poor rlpcex1 sex
(obs=5999)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
        poor |   0.2425   1.0000
     rlpcex1 |  -0.2172  -0.4452   1.0000
         sex |  -0.2570  -0.1028   0.1267   1.0000

. corr hhsize poor rlpcex1 sex, means cov
(obs=5999)

    Variable |       Mean   Std. Dev.        Min        Max
-------------+---------------------------------------------
      hhsize |   4.752292   1.954292          1         19
        poor |    .296216   .4566255          0          1
     rlpcex1 |   3188.667   2692.567    357.318   45801.71
         sex |   1.270712   .4443645          1          2

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |  3.81926
        poor |  .216435  .208507
     rlpcex1 | -1142.93 -547.335  7.2e+06
         sex | -.223195 -.020849  151.543   .19746
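
The two matrices are linked: a correlation coefficient is the covariance divided by the product of the two standard deviations. The sketch below checks this for hhsize and poor using the rounded figures printed above (the scalar name r is arbitrary).

```stata
* corr(hhsize, poor) = cov(hhsize, poor) / (sd(hhsize) * sd(poor))
scalar r = .216435/(1.954292*.4566255)
display "corr(hhsize, poor) = " r
```

The result should match the 0.2425 shown in the correlation matrix.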

pwcorr [variable list] [weight] [if condition] [in range] [, obs
sig print(#) star(#)]
This command computes the correlation coefficient for each pair of
variables in the variable list.
Options:
obs         Display the number of observations used for each coefficient
sig         Display the significance level of each coefficient
print(#)    Display only the coefficients whose significance level is
            below the level given
star(#)     Mark with a star the coefficients whose significance level is
            below the level given

Example:
. pwcorr hhsize poor rlpcex1 sex, obs sig star(5)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
             |
             |     5999
             |
        poor |   0.2425*  1.0000
             |   0.0000
             |     5999     5999
             |
     rlpcex1 |  -0.2172* -0.4452*  1.0000
             |   0.0000   0.0000
             |     5999     5999     5999
             |
         sex |  -0.2570* -0.1028*  0.1267*  1.0000
             |   0.0000   0.0000   0.0000
             |     5999     5999     5999     5999
             |

pcorr <variable> <variable list> [weight] [if condition] [in range]
This command computes the partial correlation coefficient of the named
variable with each variable in the variable list, holding the other
listed variables constant.
Example:
. pwcorr poor hhsize rlpcex1 sex

             |     poor   hhsize  rlpcex1      sex
-------------+------------------------------------
        poor |   1.0000
      hhsize |   0.2425   1.0000
     rlpcex1 |  -0.4452  -0.2172   1.0000
         sex |  -0.1028  -0.2570   0.1267   1.0000

2.2. Regression analysis

Ordinary least squares (OLS)
Syntax:
regress <dependent variable> [variable list] [weight] [if condition]
[in range] [, options]
This command estimates, by ordinary least squares, the coefficients of
the regression of the dependent variable on the independent variables
(the variable list).
Example:
. reg rlpcex1 reg7 sex hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F(  3,  5995) =  194.88
       Model |  3.8639e+09     3  1.2880e+09           Prob > F      =  0.0000
    Residual |  3.9621e+10  5995  6609032.15           R-squared     =  0.0889
-------------+------------------------------           Adj R-squared =  0.0884
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2570.8

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   240.9633    15.5905    15.46   0.000     210.4003    271.5263
         sex |   403.2984   77.38324     5.21   0.000     251.5994    554.9974
      hhsize |  -305.6382   17.70692   -17.26   0.000    -340.3501   -270.9263
       _cons |   3160.201   155.6576    20.30   0.000     2855.056    3465.346
------------------------------------------------------------------------------
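
The summary statistics in the header of the regress output can be derived from the sums of squares in the Source table. The sketch below uses the rounded values printed there, so the results agree with the header only to about four significant digits; the scalar names are arbitrary.

```stata
* Sums of squares retyped from the Source table (rounded)
scalar ssm = 3.8639e+09
scalar ssr = 3.9621e+10

* R-squared: share of the total sum of squares explained by the model
scalar r2 = ssm/(ssm + ssr)

* F statistic: model mean square over residual mean square
scalar F = (ssm/3)/(ssr/5995)

* Root MSE: square root of the residual mean square
scalar rmse = sqrt(ssr/5995)
display "R-squared = " r2 "   F = " F "   Root MSE = " rmse
```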

Options:
level(#)      Set the confidence level for the confidence intervals of
              the coefficients
noconstant    Omit the constant term (intercept) from the regression
noheader      Display only the table of coefficients
beta          Display standardized coefficients, which allow the relative
              effects of the variables to be compared

Maximum likelihood

Syntax:
probit <dependent variable> [variable list] [weight] [if condition]
[in range] [, options]
This command regresses the dependent variable on the variables in the
variable list by maximum likelihood. The dependent variable is usually a
dummy variable taking the values 0 and 1.
Example:
. probit poor reg7 sex hhsize

Iteration 0:   log likelihood = -3645.1363
Iteration 1:   log likelihood = -3367.2185
Iteration 2:   log likelihood = -3364.8032
Iteration 3:   log likelihood = -3364.8025

Probit estimates                                  Number of obs   =       5999
                                                  LR chi2(3)      =     560.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -3364.8025                       Pseudo R2       =     0.0769

------------------------------------------------------------------------------
        poor |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   -.116342   .0084551   -13.76   0.000    -.1329136   -.0997703
         sex |  -.1284525   .0422247    -3.04   0.002    -.2112113   -.0456937
      hhsize |   .1808115   .0095806    18.87   0.000     .1620338    .1995892
       _cons |  -.8088731   .0824798    -9.81   0.000    -.9705306   -.6472157
------------------------------------------------------------------------------
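
The likelihood-ratio statistic and the pseudo R2 in the probit header both come from two log likelihoods: the one at iteration 0 (constant only) and the final one. A minimal sketch using the values printed in the iteration log, with arbitrary scalar names:

```stata
* Log likelihoods from the iteration log
scalar ll0 = -3645.1363
scalar ll1 = -3364.8025

* LR chi2 = 2 * (final log likelihood - constant-only log likelihood)
scalar lr = 2*(ll1 - ll0)

* Pseudo R2 = 1 - final / constant-only
scalar pr2 = 1 - ll1/ll0
display "LR chi2(3) = " lr "   Pseudo R2 = " pr2
```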

Predicting the dependent variable and the residuals

Syntax:
predict <new variable name> [if condition] [in range] [, xb stdp resid]
Run after regress (or probit), this command creates a new variable whose
values are computed according to the option specified.
Options:
xb       the fitted value of the dependent variable from the regression:
         $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
stdp     the standard error of the fitted value:
         $SE_i = \sqrt{Var(\hat{\beta}_0) + X_i^2 Var(\hat{\beta}_1) + 2 X_i Cov(\hat{\beta}_0, \hat{\beta}_1)}$
resid    the residual:
         $e_i = Y_i - \hat{Y}_i$

Example:
predict exphat, xb
Creates a new variable exphat holding the fitted values of the dependent
variable, computed from the estimated regression coefficients.
predict expres, resid
Creates the variable expres holding the residuals.
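
To see how the xb and resid options relate, the sketch below fits the earlier regression again and checks that the observed value minus the fitted value equals the residual. It assumes the VLSS household data set used in this chapter is in memory; the variable names yhat_chk, res_chk, and gap are hypothetical and chosen so as not to clash with exphat and expres.

```stata
* Assumes the data set with rlpcex1, reg7, sex, and hhsize is in memory
regress rlpcex1 reg7 sex hhsize

* Fitted values and residuals from the regression just estimated
predict yhat_chk, xb
predict res_chk, resid

* Observed minus fitted minus residual should be zero up to rounding
generate gap = rlpcex1 - yhat_chk - res_chk
summarize gap
```

Because predict stores its results in single precision by default, the minimum and maximum of gap may be tiny nonzero numbers rather than exact zeros.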
Testing the coefficients of a regression
Syntax:
test [expression = value]
test [variable list]
testparm <variable list> [, equal ]
The test command tests hypotheses about the coefficients of the most
recently estimated regression.
Examples:
test urban98 =2000
Tests the hypothesis that the coefficient on urban98 equals 2000
test region1 = region2
Tests the hypothesis that the coefficient on region1 equals the
coefficient on region2
test region1 = (region2+region3)/2
Tests the hypothesis that the coefficient on region1 equals the average
of the coefficients on region2 and region3
test region1 region2 region3
Tests the hypothesis that the coefficients on region1, region2, and
region3 are all equal to 0
testparm region*
Tests the hypothesis that the coefficients on region1 through region7
are all equal to 0

. tab reg7, gen(region)

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00
. reg rlpcex1 urban98 region* sex educyr98 hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F( 10,  5988) =  382.87
       Model |  1.6960e+10    10  1.6960e+09           Prob > F      =  0.0000
    Residual |  2.6525e+10  5988  4429712.49           R-squared     =  0.3900
-------------+------------------------------           Adj R-squared =  0.3890
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2104.7

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     urban98 |   1995.163   66.46943    30.02   0.000     1864.859    2125.467
     region1 |  -923.7066   132.8334    -6.95   0.000    -1184.108   -663.3052
     region2 |  -362.6047   130.2254    -2.78   0.005    -617.8934    -107.316
     region3 |  -558.0354   137.1551    -4.07   0.000    -826.9089   -289.1619
     region4 |  -100.7586   135.8372    -0.74   0.458    -367.0486    165.5313
     region5 |  (dropped)
     region6 |   1742.688   131.9928    13.20   0.000     1483.934    2001.441
     region7 |   151.9854   128.0272     1.19   0.235    -98.99396    402.9648
         sex |   270.9142   66.61031     4.07   0.000     140.3339    401.4944
    educyr98 |   153.3281   6.836934    22.43   0.000     139.9253     166.731
      hhsize |   -257.691   14.73741   -17.49   0.000    -286.5816   -228.8004
       _cons |   2362.355   178.3197    13.25   0.000     2012.784    2711.926
------------------------------------------------------------------------------

. test urban98 =2000

 ( 1)  urban98 = 2000.0

       F(  1,  5988) =    0.01
            Prob > F =    0.9420

. test region1 = region2

 ( 1)  region1 - region2 = 0.0

       F(  1,  5988) =   34.57
            Prob > F =    0.0000

. test region1 = (region2+region3)/2

 ( 1)  region1 - .5 region2 - .5 region3 = 0.0

       F(  1,  5988) =   27.80
            Prob > F =    0.0000

. test region1 region2 region3

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0

       F(  3,  5988) =   20.22
            Prob > F =    0.0000

. testparm region*

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0
 ( 4)  region4 = 0.0
 ( 5)  region5 = 0.0
 ( 6)  region6 = 0.0
 ( 7)  region7 = 0.0
       Constraint 5 dropped

       F(  6,  5988) =  148.55
            Prob > F =    0.0000
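
For a hypothesis about a single coefficient, the F statistic reported by test is the square of the corresponding t statistic. As an illustrative check of the test that the coefficient on urban98 equals 2000, the sketch below uses the coefficient and standard error printed in the regression table (scalar names are arbitrary):

```stata
* Coefficient and standard error of urban98, retyped from the regression table
scalar b  = 1995.163
scalar se = 66.46943

* t statistic for Ho: coefficient = 2000, and the equivalent F(1, 5988)
scalar t = (b - 2000)/se
scalar F = t^2
display "t = " t "   F = " F
```

The value of F is about 0.005, which test displays as 0.01 after rounding to two decimals.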

Chapter IV: Graphs

1. Drawing graphs (graph)
Syntax:
graph [variable list] [weight] [if condition] [in range] [,
graph_type specific_options common_options]
where:
graph_type          The type of graph to draw
specific_options    Options that apply to one particular type of graph
common_options      Options that can be used with every type of graph,
                    such as titles and axis labels
Stata can draw the following 8 types of graph (graph_type):

(1) Two-way scatterplots

. graph rlpcex1 age
[Figure: scatterplot of rlpcex1 (comp.M&Reg price adj.pc tot exp, 357.318
to 45801.7) against age (Age of household head, 16 to 95)]

(2) Two-way scatterplot matrices

. gr rlpcex1 age educyr98 hhsize, matrix
[Figure: scatterplot matrix of rlpcex1 (357.318 to 45801.7), age (Age of
household head, 16 to 95), educyr98 (schooling year of HH.head, 0 to 22),
and hhsize (Household size, 1 to 19)]

(3) Histograms

. gr rlpcex1, bin(50) normal
[Figure: histogram of rlpcex1 with 50 bins and a normal curve; vertical
axis Fraction, 0 to .329888]

(4) One-way scatterplots

. gr rlpcex1, oneway
[Figure: one-way scatterplot of rlpcex1, 357.318 to 45801.71]

(5) Box-and-whisker plots

[Figure: box-and-whisker plot of rlpcex1, 357.318 to 45801.7]

(6) Bar charts

. sort reg7
. gr poor, bar means by(reg7)
[Figure: bar chart of the mean of poor by reg7; vertical axis up to
.498254]

(7) Pie charts

. for num 1/7: gen poorX=poor if reg7==X
-> gen poor1=poor if reg7==1
(5140 missing values generated)
-> gen poor2=poor if reg7==2
(4824 missing values generated)
-> gen poor3=poor if reg7==3
(5291 missing values generated)
-> gen poor4=poor if reg7==4
(5245 missing values generated)
-> gen poor5=poor if reg7==5
(5631 missing values generated)
-> gen poor6=poor if reg7==6
(4976 missing values generated)
-> gen poor7=poor if reg7==7
(4887 missing values generated)
. graph poor1-poor7, pie
[Figure: pie chart with slices 24% poor1, 16% poor2, 16% poor3, 12% poor4,
10% poor5, 4% poor6, 18% poor7]

(8) Star charts

graph_type is star
[Figure: star charts, one per car model from Audi 5000 to Volvo 260, with
rays for Price, Mileage (mpg), Repair Record 1978, Headroom (in.), Trunk
space (cu. ft.), Weight (lbs.), Length (in.), Turn Circle (ft.), and
Displacement (cu. in.)]

Common options (common_options)

* Creating the data set
. tabulate hhsize, sum(rlpcex1)

            | Summary of comp.M&Reg price adj.pc
  Household |              tot exp
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   4696.0254   4619.5012         214
          2 |   4131.4892   3677.2297         497
          3 |   3834.8615   2913.8177         731
          4 |   3428.8011   2599.7301        1404
          5 |   2930.5486   2168.0644        1318
          6 |   2626.6848   2277.1893         867
          7 |   2501.0912   2186.1605         480
          8 |   2329.7009   1803.7873         255
          9 |   2207.0166   1380.5607         126
         10 |   2252.3772   1423.7576          58
         11 |   2370.7034   1404.7148          29
         12 |   1747.3691   924.72977           9
         13 |   2114.1337   2109.0077           4
         14 |     1579.78   990.81152           4
         16 |   2994.5771   2061.6804           2
         19 |    4833.936           0           1
------------+------------------------------------
      Total |   3188.6671   2692.5673        5999

. tab hhsize, sum(educyr98)

            |  Summary of schooling year of
  Household |             HH.head
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   3.7897196   4.3956537         214
          2 |   5.7545272   4.7225549         497
          3 |   7.3023256   4.6396425         731
          4 |   8.2578348   4.2659841        1404
          5 |   7.7243298   4.2998488        1318
          6 |   6.8788927   4.0778062         867
          7 |   6.3348958   4.1241759         480
          8 |   5.7333333   3.9623557         255
          9 |   5.7936508   3.4878474         126
         10 |   6.1724138   3.1851516          58
         11 |   4.7931034   3.1665586          29
         12 |   4.4444444   3.6438685           9
         13 |           5   5.0990195           4
         14 |           3   2.1602469           4
         16 |           4   1.4142136           2
         19 |           2           0           1
------------+------------------------------------
      Total |   7.0944185   4.4160917        5999

. rename var71 ahhsize
. rename var72 meanexp
. rename var73 meanedu
. replace meanexp= meanexp/1000
(16 real changes made)
. label var meanexp "Chi tieu binh quan"
. label var meanedu "So nam hoc"
. label var ahhsize "Quy mo ho"

* Title and axis options

Take a two-way graph as an example: the vertical axis shows the mean
expenditure and the mean years of schooling of the household head, and
the horizontal axis shows household size.
. gr meanexp meanedu ahhsize
[Figure: meanexp and meanedu (vertical axis, 1.57978 to 8.25783) plotted
against ahhsize (horizontal axis, 1 to 19)]

* Title options:
title("string") t1title("string") t2title("string")
b1title("string") b2title("string") l1title("string")
l2title("string") r1title("string") r2title("string")
These options place titles at the top, bottom, left, and right of the
graph.
Example:
gr meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van chu ho)
l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title
(Quy mo ho gia dinh)
[Figure: the same graph with the title "Do thi chi tieu va hoc van chu
ho", left titles "Chi tieu binh quan (tr dong)" and "So nam hoc cua chu
ho", bottom title "Quy mo ho gia dinh", and legend entries Chi tieu binh
quan and So nam hoc]

* Displaying values on the axes
xlabel[(numeric values)] ylabel[(numeric values)] rlabel[(numeric values)]
tlabel[(numeric values)]
Example:
gr meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van chu ho)
l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title
(Quy mo ho gia dinh) xlabel ylabel
[Figure: the same titled graph with labeled axes; horizontal axis 0 to 20]

Note: the other options are described in the help, with the command:
help graxes
Line options
xline[(numeric values)] yline[(numeric values)] rline[(numeric values)]
tline[(numeric values)]
connect(c[[p]] ... c[[p]])
Example:
. gr meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van chu ho)
l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title
(Quy mo ho gia dinh) xlabel ylabel xline (5 10 to 20) yline(2 4 to 8)
connect(ll)
[Figure: the same titled graph with reference lines at the xline and yline
values and the points of each series connected by lines]


2. Some commonly used graph types
2.1. Two-way graphs
Syntax:
graph [variable list] [weight] [if condition] [in range], twoway
[common_options rescale]
The rescale option displays two vertical axes with different scales.
. gen meanexp1=meanexp*1000
. label var meanexp1 "Chi tieu binh quan"
. gr meanexp1 meanedu ahhsize, title (Do thi chi tieu va hoc van chu ho)
l1title(Chi tieu binh quan (nghin dong)) b2title (Quy mo ho gia dinh) xlabel
ylabel rlabel(2 4 to 8) connect(ll) rescale
[Figure: "Do thi chi tieu va hoc van chu ho"; left axis Chi tieu binh quan
(nghin dong), 1000 to 5000; right axis So nam hoc, 2 to 8; horizontal axis
Quy mo ho gia dinh, 0 to 20]

2.2. Histograms
Syntax:
graph [variable] [weight] [if condition] [in range], histogram
[common_options bin(#) freq normal[(#,#)] density(#)]

Options:
bin(#)           Number of bins in the histogram; the default is bin(5)
freq             Show frequencies on the vertical axis
normal[(#,#)]    Draw the normal density curve
density(#)       Used with normal; sets the number of points at which the
                 normal density is evaluated

Example:
Histograms of per capita expenditure
. gr rlpcex1, hist bin(20) normal
[Figure: histogram of rlpcex1 (357.318 to 45801.7) with 20 bins and a
normal curve; vertical axis Fraction, 0 to .56026]

. gr rlpcex1, hist bin(50) normal freq
[Figure: histogram of rlpcex1 with 50 bins and a normal curve; vertical
axis Frequency, 0 to 1979]

. gr rlpcex1, hist bin(50) normal freq by(reg7)
[Figure: "Histograms by Code by 7 regions", one panel for each of region1
to region7; vertical axis Frequency, 0 to 415; horizontal axis 357.318 to
45801.7]

2.3. Bar charts
Syntax:
graph [variable list] [weight] [if condition] [in range], bar
[common_options [no]alt means stack]
Example:
Bar chart of the mean schooling of the household head and the mean
household size in the 7 regions
. gr educyr98 hhsize, bar means by(reg7)
[Figure: bar chart of the means of educyr98 (schooling year of HH.head)
and hhsize (Household size) by reg7; vertical axis up to 8.64426]

. label define region 1 "region1" 2 "region2" 3 "region3" 4 "region4" 5 "region5" 6 "region6" 7 "region7"
. label values reg7 region
. tab reg7

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

. gr educyr98 hhsize, bar means by(reg7) ylabel( 2 4 to 10) alt
[Figure: the same bar chart with the vertical axis labeled up to 10 and
the groups labeled region1 to region7]

The stack option
. gen persons=1
. gr persons urban98, bar ylabel by(reg7) stack alt
[Figure: stacked bar chart of persons and urban98 (1:urban 98; 0:rural 98)
for region1 to region7; vertical axis 500 to 1500]

Example:
Draw the following graph:
[Figure: bar chart of foodpoor and poor for region1 to region7; vertical
axis 200 to 600]

2.4. Pie charts

Syntax:
graph [variable list] [weight] [if condition] [in range], pie
[common_options]
This command draws a pie chart. Each variable takes one slice of the pie,
and the size of the slice is determined by the sum of the variable's
values over the observations.
Example:
Draw the share of each region's poor in the total number of poor in the
whole country.
. gr poor1-poor7, pie
[Figure: pie chart with slices 24% poor1, 16% poor2, 16% poor3, 12% poor4,
10% poor5, 4% poor6, 18% poor7]

. gen nonfpood=poor- foodpoor
. label var nonfpood "poor but still above food poverty line"
. gen nonpoor=( rlpcex1>=1790)
. gr foodpoor nonfpood nonpoor, pie
. set textsize 90
[Figure: pie chart with slices 12% foodpoor, 18% poor but still above food
povert, 70% nonpoor]

. set textsize 100
. gr foodpoor nonfpood nonpoor, pie by(reg7) total
[Figure: one pie chart of foodpoor, nonfpood, and nonpoor for each of
region1 to region7, plus a Total panel]

3. Saving and displaying graphs (Saving and graph using)

To save a graph, go to the File menu of the graph window and choose Save
graph, then choose the path and file name for the graph. The default
extension is gph.
A graph can also be saved with the option saving(file name [,replace])
written after the graph command.
Example:
. gr educyr98 hhsize, bar means by(reg7) ylabel( 2 4 to 10) alt saving("c:\do thi 1")
. gr persons urban98, bar ylabel by(reg7) stack alt saving("c:\do thi 2")
To keep graphs from being displayed, turn off graph display with the
command
set graphics { on | off }
. set graphics off
. gr poor1-poor7, pie saving ("c:\do thi 3", replace)
(note: file c:\do thi 3.gph not found)
Stata can display saved graphs with the command:
graph using <graph file 1> [graph file 2 ...] [, margin(#)]
margin(#) sets the margin around each graph as a percentage of the graph
area. The default is 0.
Example:
. set graphics on
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh")
[Figure: "Mot so dac diem cua ho gia dinh", combining the three saved
graphs: the bar chart by region, the stacked bar chart of persons and
urban98, and the pie chart of poor1 to poor7]

Note:
saving can be combined with using to store the combined graph as a new
graph. Example:
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh") saving("c:\do thi tong hop")
. graph using "c:\do thi tong hop"

Chapter V: Programming in Stata

1. Introduction to do-files

1.1. Opening and saving do-files
Stata lets us write files, called do-files, that contain Stata commands.
Instead of running commands one at a time from the command window, a
do-file runs its commands in sequence.
Stata programs are written in the do-file editor window. To open it,
click the Window menu and choose do-file editor. Alternatively, type the
command doedit in the command window.
Example:
A program can be written in the do-file editor window as follows:
----------------
clear
set mem 32m
use "C:\VLSS98\Hhexp98n.dta", clear
tab urban98
sum hhsize
gen new=hhsizet
gen new=hhsize
----------------

After editing, the do-file is saved with the Save as option in the File
menu of the do-file editor window. The name of the do-file can also be
given directly to the doedit command:
doedit (do-file name)
Do-files have the extension do.
In the example above, we can save the program under the name chuong trinh
1 in the directory Vlss98 on drive C.

1.2. Running do-files

To run a do-file, type one of the following two commands in the command
window:
do filename [, nostop]
run filename [, nostop]
The run command executes the commands in the do-file without displaying
the results on screen.
If a command fails while a do-file is running, Stata reports the error
and stops executing the remaining commands. If the nostop option is
specified, Stata skips the failed command and continues with the commands
that follow it.
Example:
. do "c:\vlss98\chuong trinh 1"
. clear
. set mem 32m
(32768k)
. use "C:\VLSS98\Hhexp98n.dta", clear
. tab urban98

1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);

end of do-file
r(111);

With the nostop option
. do "c:\vlss98\chuong trinh 1", nostop
. clear
. set mem 32m
(32768k)
. use "C:\VLSS98\Hhexp98n.dta", clear
. tab urban98

1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);
. gen new=hhsize
. end of do-file

Running with the run command
. run "c:\vlss98\chuong trinh 1", nostop
hhsizet not found

Do-files can also be run with the Do option in the File menu, or directly
from the Do-file editor window with the Do or Run option in the Tools
menu.
1.3. Notes on writing do-files
version #
When writing a do-file, put this command at the top of the program to
declare the version of Stata used to write it. For example, if the
do-file is written with Stata 7.0, the command goes at the top of the
program as follows:
version 7.0
clear
use Hhexp98n.dta
tab reg7
...
Different versions of Stata may differ in the syntax or meaning of
commands. The version command lets the running copy of Stata interpret
correctly a do-file written for another version.
set memory #[k|m]
If a data file needs more memory than Stata is currently using, allocate
more memory to Stata with the command above. Do not allocate more memory
than the computer's RAM.
Example:
. use "C:\Hhexp98n.dta", clear
no room to add more observations
r(901);
. set mem 32m
(32768k)
. use "C:\Hhexp98n.dta", clear
set more off/on
By default, when the output of a command is longer than the results window
(Stata Results), the display pauses and we must press a key (for example
Enter or the Space bar) for the output to continue. The command set more
off lets the output scroll without pausing until the command or do-file
has finished. The command set more on restores the default.
The characters * and /* */
Stata does not execute lines that begin with the character * or text
placed between the character pairs /* and */. These characters are used
to write comments in a do-file.
Example:
--------------------
version 7.0
set mem 32m
use "C:\Hhexp98n.dta", clear
* Create the household income variable
/* This variable equals mean income
multiplied by household size */
gen hhexp = rlpcex1 * hhsize
#delimit ;
When a command in the do-file editor is too long, this command tells
Stata that each command ends with the character (;). By default, a
command ends at the line break produced by the Enter key. To restore the
default, use the command #delimit cr
Example: the graph command from the previous chapter:
graph meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van chu ho)
l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title
(Quy mo ho gia dinh) xlabel ylabel xline (5 10 to 20) yline(2 4 to 8)
connect(ll)
is equivalent to:
#delimit ;
graph meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van chu ho)
l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho)
b2title (Quy mo ho gia dinh) xlabel ylabel xline (5 10 to 20)
yline(2 4 to 8) connect(ll) ;
gen hhexp = rlpcex1 * hhsize ;
...
Afterwards, if the following commands each fit on one line, restore the
default with the command:
#delimit cr
Notes:
- The characters /* */ can also be used to write a long command, as
follows:
graph meanexp meanedu ahhsize, title (Do thi chi tieu va hoc van chu ho) /*
*/ l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) /*
*/ b2title (Quy mo ho gia dinh) xlabel ylabel xline (5 10 to 20) /*
*/ yline(2 4 to 8) connect(ll)
- The #delimit command and the /* */ technique for long commands work only
in do-files, not in the command window.

2. Local and global macros

Macros are the variables used in Stata programs. A macro associates one
string, called the macro name, with another string, called the macro
contents.
There are two kinds of macros: local macros and global macros.
2.1. Local macros
If we type:
. local hogd "age hhsize rlpcex1"
(The double quotes may be omitted, that is, we may type: local hogd age
hhsize rlpcex1)
then `hogd' stands for: age hhsize rlpcex1. hogd is the macro name, and
age hhsize rlpcex1 is the macro contents.
To use the contents of a macro, type its name between a left quote (`),
found at the top left of the keyboard, and a right quote ('), found at
the lower right of the keyboard.
So typing:
. summarize `hogd'
is equivalent to typing:
. summarize age hhsize rlpcex1
If we type:
. local tb summarize
then we can run summarize age hhsize rlpcex1 by typing:

. `tb' `hogd'

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         age |    5999    48.01284    13.7702         16         95
      hhsize |    5999    4.752292   1.954292          1         19
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71

To display the contents of a local macro, type the command
macro list _(local macro name)
Example:
. macro list _hogd
_hogd:          age hhsize rlpcex1

To delete a local macro, use the command
macro drop _(local macro name)
Example:
. macro drop _hogd
. macro list _hogd
local macro `hogd' not found
r(111);
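
A local macro that holds a list can also be processed one word at a time. The sketch below is illustrative and should be run from a do-file; it assumes a Stata version that provides the foreach command and the word count extended macro function, and the macro names n and v are arbitrary. It needs no data in memory because it only displays the names.

```stata
* Store a list of variable names in a local macro
local hogd "age hhsize rlpcex1"

* Count the words held in the macro
local n : word count `hogd'
display "`hogd' contains `n' names"

* Loop over the words; `v' holds one name on each pass
foreach v of local hogd {
    display "next variable: `v'"
}
```

Replacing the display line inside the loop with summarize `v' would summarize each variable in turn, provided the data set is loaded.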
2.2. Global macros
If we type:
. global diaban "reg7 province commune"
(or, omitting the double quotes: global diaban reg7 province commune)
then $diaban stands for: reg7 province commune. diaban is the macro name,
and reg7 province commune is the macro contents.
To use the contents of a global macro, type the symbol $ immediately
before the macro name.
So typing:
. describe $diaban
is equivalent to typing:
. describe reg7 province commune
. describe $diaban

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY
                                              commands

. global mota "describe"
. $mota $diaban

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY
                                              commands

To display the contents of a global macro, type the command
macro list (global macro name)
Example:
. global diaban "reg7 province commune"
. macro list diaban
diaban:         reg7 province commune

To delete a global macro, use the command
macro drop (global macro name)
Example:
. macro drop diaban
. macro list diaban
global macro $diaban not found
r(111);
2.3. The difference between local and global macros
A local macro exists only within one program. A program cannot see the
local macros used in other programs. A global macro, once defined, is
visible to all programs and stays in Stata's memory for the whole session.
Example:
Run a program that defines the local macro a, then ask Stata to display
the contents of that macro. The macro does not exist in other programs or
in Stata's memory.
. do "C:\WINDOWS\TEMP\STD010000.tmp"
. local a "chuong trinh thong ke Stata"
. end of do-file
. macro list _a
local macro `a' not found
r(111);
By contrast, for a global macro:
. do "C:\WINDOWS\TEMP\STD010000.tmp"
. global b "chuong trinh thong ke Stata"
. end of do-file
. macro list b
b:              chuong trinh thong ke Stata

3. Scalars and matrices (scalar and matrix)

3.1. Matrices (matrix)
Stata defines a matrix A[r, c] as a rectangular array with r rows and c columns.
Example:
If matrix A has been created, its contents can be viewed as follows:
. matrix list A

A[3,3]
    c1  c2  c3
r1   1   2   4
r2   3   4   7
r3  10  11  14
Here matrix A has 9 elements: 1, 2, 4, 3, 4, 7, 10, 11, 14. The columns are named c1, c2, and c3, and the rows r1, r2, and r3. The element at the intersection of row 1 and column 2 is written A[1, 2]. In this example A[1, 2] holds the value 2.
3.2. Scalars (scalar)
A scalar holds a single numeric element. A scalar is defined with the command:
scalar scalar_name = expression
Example:
. scalar a = 10
. scalar list a
         a =         10
. scalar b = a* 2
. scalar list b
         b =         20

To some extent, a scalar can be regarded as a special case of a matrix with only one element (one row and one column).
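A common use of scalars is to keep a result for later calculations. The following is a minimal sketch under stated assumptions: the variable x and the scalar name m are made up for illustration, and r(mean) is the mean that summarize leaves behind after it runs.

```stata
* Hypothetical data: x takes the values 1 to 5
clear
set obs 5
generate x = _n

* Save the mean of x in a scalar and reuse it in an expression
quietly summarize x
scalar m = r(mean)
scalar list m
display "Twice the mean is " 2*m
```

The scalar keeps its value after later commands overwrite r(mean), which is the main reason for copying the result.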
3.3. Some commands for working with matrices
Setting the matrix size
By default, a matrix may have at most 40 rows and 40 columns. This maximum can be changed with the command:
. set matsize 500
This command allows matrices of up to 500 rows and 500 columns to be created.
Creating matrices
A matrix can be created by entering its elements directly.
Example:
matrix mymat = (1,2\3,4)      Elements are separated by commas, and rows by backslashes
matrix myvec = (1,5,3,1,3)    Creates a row vector
matrix mycol = (1\5\3\1\3)    Creates a column vector

A matrix can also be created from the data with the command:
mkmat <variable list> [if condition] [in range] [, matrix(matrix name)]
Example (maho is the household code, quymo the household size, and thunhap the income):
. input maho quymo thunhap

          maho      quymo    thunhap
  1. 101 6 1200
  2. 103 5 1400
  3. 105 5 3200
  4. 107 9 1000
  5. 109 4 2500
  6. end

. mkmat maho quymo thunhap, matrix(A)
. matrix list A

A[5,3]
       maho    quymo  thunhap
r1      101        6     1200
r2      103        5     1400
r3      105        5     3200
r4      107        9     1000
r5      109        4     2500
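Once the data are in a matrix, single elements and the dimensions can be read back. The sketch below repeats the example data so that it runs on its own; the scalar name inc2 is made up for illustration, and rowsof() and colsof() are Stata's matrix functions for the number of rows and columns.

```stata
* Re-enter the example data
clear
input maho quymo thunhap
101 6 1200
103 5 1400
105 5 3200
107 9 1000
109 4 2500
end

* Copy the three variables into matrix A
mkmat maho quymo thunhap, matrix(A)

* Dimensions of A and one of its elements (row 2, column 3)
display "Rows: " rowsof(A) ", columns: " colsof(A)
display "Income of the second household: " A[2,3]

* An element can also be stored in a scalar
scalar inc2 = A[2,3]
scalar list inc2
```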

Matrix calculations
matrix D = B             Creates matrix D as a copy of matrix B
matrix C = (C+C')/2      Recomputes matrix C from its own values (the average of C and its transpose)
matrix D = A*A'          Creates matrix D as the product of matrix A and its transpose A'
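To make these operations concrete, here is a small worked sketch. The 2 x 2 matrix B and the result names S and C are made up for illustration; the apostrophe denotes the transpose.

```stata
* A hypothetical 2 x 2 matrix
matrix B = (1,2\3,4)

* Copy of B
matrix D = B

* Product of B and its transpose: rows (5, 11) and (11, 25)
matrix S = B*B'

* Average of B and its transpose: rows (1, 2.5) and (2.5, 4)
matrix C = (B+B')/2

matrix list S
matrix list C
```

Both S and C are symmetric, which is why the form (C+C')/2 is often used to symmetrise a matrix.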

Deleting matrices
Matrices and scalars can be removed from memory with the commands:
matrix drop <matrix>
scalar drop <scalar>
Example:
. matrix drop A
. scalar drop b
4. Conditional statements and loops
4.1. The if...else statement
Syntax:
if (logical condition) {
    Command group 1
}
else Command
Stata evaluates the logical condition (expression). If it is true, the commands in Command group 1 are executed; if it is false, the command following else is executed. If no else is given, Stata continues with the commands after the if {} block.
Example:
local a=invnorm(uniform())
if `a'>=0 {
    display "The random number generated is greater than or equal to 0"
}
else di "The random number generated is less than 0"
macro list _a
Notes:
- Using braces { } allows several commands to follow else:
if (condition) {
    commands 1
}
else {
    commands 2
}
- if...else statements can be nested:
if (condition) {
    Command group 1
}
else if (condition) {
    ...
}
4.2. The while statement
Syntax:
while <logical condition> {
    Command group
}
Stata evaluates the logical condition (expression). As long as the condition is true, the commands in the command group are executed; once the condition is false, they are no longer executed and the loop ends.
Example:
local i=1
while `i'<= 10 {
    if mod(`i',2) {
        display "`i' is odd"
    }
    else {
        display "`i' is even"
    }
    local i=`i'+1
}
Note:
A loop can be interrupted by placing the following command inside the loop:
continue [, break]
When Stata encounters continue, it skips the remaining commands in the loop body and returns to the top of the loop. If the break option is specified, Stata exits the loop instead.
Example: Find the least common multiple of 2, 3, and 5
local i=1
while `i'<= 1000 {
    if mod(`i',2)==0 & mod(`i',3)==0 & mod(`i',5)==0 {
        di "The least common multiple of 2, 3, and 5 is `i'"
        continue, break
    }
    local i=`i'+1
}
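The same search can be written without maintaining the counter by hand. The sketch below uses forvalues, which is not covered in the original text, so it assumes a Stata version that provides this command; the local macro lcm is a made-up name used to keep the result after the loop.

```stata
* Loop over 1, 2, ..., 1000; the counter is incremented automatically
forvalues i = 1/1000 {
    if mod(`i',2)==0 & mod(`i',3)==0 & mod(`i',5)==0 {
        display "The least common multiple of 2, 3, and 5 is `i'"
        * Keep the result and leave the loop at the first match
        local lcm = `i'
        continue, break
    }
}
display "Stored result: `lcm'"
```

Because the counter is managed by forvalues, forgetting the increment, which would make a while loop run forever, is not possible here.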
5. Introduction to ado-files
Creating a program
A program in Stata can be defined with the command:
program define <program name>
    Commands
end
The program is written in the Do-file editor. Once it has been run, the program is stored in Stata's memory and can be called simply by typing its name (progname).
Example:
quietly program define povline
    display as text _col(3) "Poverty line" _col(16) "{c |}" _col(20) "Food" _col(30) "Overall"
    di as text _col(2) "{hline 14}{c +}{hline 26}"
    di as text _col(8) "Value" _col(16) "{c |}" as result _col(20) "1380" _col(33) "1920"
end
After running these commands with run or do, type in the Command window:
. povline
  Poverty line |   Food      Overall
 --------------+--------------------------
       Value   |   1380         1920

Note:
If we run program define povline again, we receive the message:
povline already defined
r(110);
This means the program povline has already been created. To delete it, use the command:
program drop povline
or, to delete all programs:
program drop _all
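A do-file that defines a program can be made safe to re-run by dropping any earlier definition first. The following is a minimal sketch, not part of the original text: povgap is a hypothetical program name, its two arguments (an income and a poverty line) are illustrative, and capture is used so that the drop does not stop the do-file when the program does not exist yet.

```stata
* Drop an earlier definition; capture suppresses the error if there is none
capture program drop povgap

* povgap is a hypothetical example program
program define povgap, rclass
    * Read the two arguments into local macros
    args income line
    * Shortfall of income below the poverty line (0 if not poor)
    local gap = max(0, `line' - `income')
    display as text "Shortfall: " as result `gap'
    return scalar gap = `gap'
end

* Call the program with an income of 1500 and a poverty line of 1920
povgap 1500 1920
```

With these arguments the program reports a shortfall of 420 and leaves the value in r(gap) for later use.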
Ado-file
Ado-files implement Stata commands. Stata has two kinds of commands. The first kind is built into Stata, for example summarize. The second kind is defined by ado-files, for example ci.
To find out which kind a Stata command is, use the which command:
. which sum
built-in command: summarize

. which ci
C:\STATA\ado\base\c\ci.ado
*! version 3.3.4 04sep2000

Ado-files are simply programs defined with program define and saved with the extension .ado. Stata searches for ado-files in the following directories:
. sysdir
   STATA:  C:\STATA\
 UPDATES:  C:\STATA\ado\updates\
    BASE:  C:\STATA\ado\base\
    SITE:  C:\STATA\ado\site\
 STBPLUS:  c:\ado\stbplus\
PERSONAL:  c:\ado\personal\
OLDPLACE:  c:\ado\
Example:
We can save the povline command as an ado-file (povline.ado) in the directory
C:\STATA\ado\base\
The command will then run whenever we type povline, without first having to run the do-file that defines it.
Exercise: Write the povline command with options for the years 1993, 1998, and 2002.

References

Stata 7.0 on-line help (the Contents option in the Help menu).

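Appendix
Basic statistics of a sample following the normal distribution
Mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
Standard deviation:
$$s = \sqrt{s^2}$$
Mean absolute deviation:
$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$
Skewness:
$$\mathrm{Skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3 / n}{s^3}$$
Kurtosis:
$$\mathrm{Kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4 / n}{s^4}$$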
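The formulas above can be checked on a small example. The sketch below is an illustration under stated assumptions: the data (x = 1, ..., 5) and all scalar and variable names are made up, the mean, variance, and standard deviation are taken from the results that summarize leaves in r(), and the mean absolute deviation and skewness are computed directly from the formulas in this appendix. The skewness reported by summarize with the detail option may use a different normalisation, so it is not compared here.

```stata
* Hypothetical data: x takes the values 1 to 5
clear
set obs 5
generate x = _n

* Mean, variance (divisor n-1), and standard deviation
quietly summarize x
scalar xbar = r(mean)
scalar s2 = r(Var)
scalar s = r(sd)

* Mean absolute deviation: average of |x - xbar|
generate absdev = abs(x - xbar)
quietly summarize absdev
scalar mad = r(mean)

* Skewness as defined above: average cubed deviation divided by s^3
generate dev3 = (x - xbar)^3
quietly summarize dev3
scalar skew = r(mean) / s^3

scalar list xbar s2 s mad skew
```

For these data the mean is 3, the variance 2.5, the mean absolute deviation 1.2, and the skewness 0, since the values are symmetric around the mean.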
