Contents
Chapter I: General introduction to Stata
1. How data are organized in Stata (Dataset in Stata)
2. Starting and exiting Stata (Open and exit)
3. The Stata interface (Stata interface)
4. The log file (log file)
5. Entering and saving data (Use, input and save)
Chapter II: Working with data
1. Stata command syntax
2. Operators and functions
3. Data reporting
4. Data manipulation
5. Weights in VHLSS (Weight)
Chapter III: Hypothesis testing and regression analysis
1. Estimation and hypothesis testing
2. Correlation and regression
Chapter IV: Graphs
1. Drawing graphs (graph)
2. Some common graph types
3. Saving and displaying graphs (Saving and graph using)
Chapter V: Programming in Stata
1. Introduction to do-files
2. Local and global macros
3. Scalars and matrices (scalar and matrix)
4. Conditional statements and loops
5. Introduction to ado files
References
Appendix
tenchuho        quymoho   thunhapbq
Nguyen Van A    6         2100
Le Thi B        5         3210
Tran Van C      10        1200
Observation
Each row of the data table is called an observation, or a record, and stores data on one unit of study. The example above has 3 observations storing the household code (maho), name of the household head (tenchuho), household size (quymoho), and average income (thunhapbq) of 3 households.
Variable
Information about the units of study is collected and stored by characteristic. These characteristics are called variables, and they form the columns of the data table. The example above has 4 variables, named maho, tenchuho, quymoho, and thunhapbq. A variable name is 1 to 32 characters long and starts with a letter or an underscore (_). It may contain only letters, digits, and underscores; other special characters cannot be used.
Identifying variables
A dataset usually contains variables used to identify observations, called identifying variables. They make it possible to tell observations apart: each observation has its own value of these variables. In the example above the identifying variable is maho, which takes one value for each observation.
Variable properties
Variables can be given labels (annotations). For example, the variable maho can be labeled "Household code".
A variable is formatted as either numeric or string, with different storage types. Numeric variables can be stored as byte, int, long, float, or double. String variables can be stored as str1 to str80, depending on their length.
Numeric storage type   Size (bytes)   Minimum          Maximum         Kind
byte                   1              -127             126             Integer
int                    2              -32,767          32,766          Integer
long                   4              -2,147,483,647   2,147,483,646   Integer
float                  4              -10^36           10^36           Real
double                 8              -10^308          10^308          Real
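As a rough illustration of storage types, the sketch below rebuilds the three-household example and asks Stata to report how each variable is stored. The explicit types (str12, byte, int) are choices made here for illustration, not something the guide prescribes, and the maho column is left out because its values are not shown above.

```stata
* Minimal sketch: enter the example table with explicit storage types
clear
input str12 tenchuho byte quymoho int thunhapbq
"Nguyen Van A" 6 2100
"Le Thi B" 5 3210
"Tran Van C" 10 1200
end

* Attach a variable label, then show names, storage types, formats and labels
label variable quymoho "Household size"
describe
```

The describe output lists each variable with its storage type, so it is a quick way to check that a small integer such as quymoho is not using more bytes than it needs.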
Numeric variables can be discrete or continuous. Variables such as household size, gender of the household head, geographic region, and education level are discrete variables (also called categorical variables
Bytes   Maximum length
1       1
2       2
80      80
Stata windows
Stata's windows are opened from the Windows menu on the menu bar. They include:
Results           Displays commands and results
Graph             Displays graphs
Viewer
Command           Used to type commands
Review
Variables
Data editor
Do-file editor
Opens a data file
View
Save              Saves the data file
Save as           Saves the data file under a new name
File name
Log
Save graph        Saves the graph file
Print graph       Prints the graph
Print results     Prints the results
Exit
Edit
Copy text
Copy tables
Paste             Pastes
Prefs
Windows
Results           Opens the Results window
Graph             Opens the Graph window
Log               Opens the log file window
Viewer
Command           Opens the Command window
Review
Variables
Help/Search       Opens the help window
Data editor
Do-file editor
Help
Save              Saves the data file to disk
Print results     Prints the contents of the Results window
Begin log
Start viewer      Opens the help window
Brings the Results window to the front
Brings the Graph window to the front
Do-file editor
Data editor       Opens the data editing window
Data browser      Opens the data viewing window
Turns off more
Break
If the data file is large, we must set the amount of memory Stata will use with the command:
set memory #[k|m]
Example:
set mem 32m
set mem 32000k
Entering data
There are several ways to enter data from the keyboard into Stata's memory.
-
Use the Stata editor window to enter data, or type the command edit in the Command window. Then enter the data in spreadsheet style in this window.
Stata can also take data from other database files. First save the data as text (for example with Excel), with one observation per line and values separated by commas or tabs. Then use copy and paste to bring the data into Stata.
Saving data
Data can be saved with the Save and Save as options on the menu bar, or with the Save button on the toolbar.
Stata distinguishes lowercase from uppercase letters in variable names. For example, within one dataset Ho_ten and ho_ten are two different variables.
Some Stata commands can be abbreviated. For example, summarize can be shortened to sum. In this guide, the underlined part of a command's syntax is its abbreviation.
by varlist: Stata runs the command separately for each value of the variables in varlist. The data must be sorted by those variables before the command is run.
Example:
. sort sex
. by sex: sum rlpcex1

-> sex = 1
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
     rlpcex1 |    4375    2980.906    2430.648    357.318   45801.71

-> sex = 2
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
     rlpcex1 |    1624    3748.368    3231.241   376.9805   30624.77
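The same by-group pattern can be tried on any dataset. The sketch below assumes the auto example dataset that ships with Stata (loaded with sysuse) rather than the survey file used in this guide, and also shows bysort, which combines the sort and by steps.

```stata
* Minimal sketch on Stata's bundled auto dataset (an assumption, not the guide's data)
sysuse auto, clear

* Two-step form: sort first, then run the command once per group
sort foreign
by foreign: summarize price

* One-step form: bysort sorts and then applies by
bysort foreign: summarize price
```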
. sum poor if reg7==1

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
        poor |     859    .4982538    .5002882          0          1
Weights (weight)
Allow calculations that use weights. Weight options are covered in detail in section 5 of this chapter.
Options
Many Stata commands accept their own options. Options are given after a comma.
Example:
The sum command has the option detail, which computes several statistics in addition to the mean and standard deviation.
. sum rlpcex1, detail

              comp.M&Reg price adj.pc tot exp
-------------------------------------------------------------
      Percentiles      Smallest
 1%     682.9575        357.318
 5%     1012.433       366.2792
10%     1238.088       376.9805       Obs                5999
25%     1671.054       381.3502       Sum of Wgt.        5999

50%     2397.042                      Mean           3188.667
                        Largest       Std. Dev.      2692.567
75%     3711.917       26944.64
90%     5940.803       30624.77       Variance        7249918
95%      8045.32        31066.5       Skewness       3.791027
99%     14163.04       45801.71       Kurtosis       29.21398
Notes:
-
Stata allows commands and options to be abbreviated. In this guide, an underlined part of a command means the command can be shortened to the underlined characters. For example, use can be shortened to u.
Command syntax in this guide is written in English so that readers can cross-check it against Stata's own documentation.
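Operator    Meaning
Arithmetic
+           Addition
-           Subtraction
*           Multiplication
/           Division
^           Power
Relational
>           Greater than
<           Less than
>=          Greater than or equal to
<=          Less than or equal to
==          Equal to
~= or !=    Not equal to
Logical
~           Not
|           Or
&           And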
Note:
In expressions, == is used to test a condition, for example after if. A single = is used for assignment in commands that create variables.
Example:
gen RRD=0
replace RRD=1 if reg8==1
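To see the two symbols side by side, here is a small sketch on Stata's bundled auto dataset (an assumption; the variable names expensive and expensive2 and the 6000 cutoff are made up for illustration). A single = assigns, while == and >= appear only inside conditions.

```stata
* Minimal sketch: assignment (=) versus comparison (==, >=)
sysuse auto, clear

* Two-step dummy: assign 0 everywhere, then overwrite where the condition holds
generate byte expensive = 0
replace expensive = 1 if price >= 6000

* One-step dummy: a logical expression evaluates to 1 (true) or 0 (false)
generate byte expensive2 = (price >= 6000)

* The two constructions agree; assert stops with an error if they do not
assert expensive == expensive2
```

The one-step form is the same idiom as gen region7=(reg7==7) used later in this guide.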
Functions
Functions are typically used in the expression (exp) of a Stata command. If Y is a function f(X1, X2, ..., Xn), a function in Stata returns the value of Y given the values of Xi. Stata has 8 kinds of functions:
Mathematical functions
Statistical functions
Random numbers
String functions
Special functions
Date functions
Time-series functions
Matrix functions
Example:
gen absx=abs(x)
gen log_exp=log(rlpcex1)
The exact notation for each of these functions is listed under help functions.
3. Data reporting
3.1. Clearing Stata's memory
Syntax:
clear
This command removes the data in Stata's memory so that work on a new file can begin.
3.2. Getting help on Stata commands
Syntax:
help <Stata command>
This command displays the documentation for a Stata command. The command name must be typed in full and exactly.
Example:
. help sum
help for sum not found
try help contents or search sum
. help summarize
-----------------------------------------------------------------------
help for summarize
-----------------------------------------------------------------------
Summary statistics
Notes:
Help can also be searched by keyword with the search command, which is also available through the Search option on the Help menu.
The command can also be run from the menu bar.
3.3. Describing data
Syntax:
describe [varlist]
This command displays general information such as the name, format, and label of the variables listed in varlist in the open dataset. If no variable is given, describe shows this information for all variables.
Example:
. des
                storage  display   value
variable name   type     format    label     variable label
------------------------------------------------------------------------------
househol        long     %12.0g              household code
year            float    %9.0g               Year of interview
month           float    %9.0g               Month of interview
vlssmphs        byte     %8.0g     source    1 if vlss, 2 if mphs
. list
[Listing of the first observations, showing the farm variable with labels "farm" and "non farm"; the table layout is not recoverable]
. sum rlpcex1

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
     rlpcex1 |    5999    3188.667    2692.567    357.318   45801.71
. sum rlpcex1, detail

50%     2397.042                      Mean           3188.667
                        Largest       Std. Dev.      2692.567
75%     3711.917       26944.64
90%     5940.803       30624.77       Variance        7249918
95%      8045.32        31066.5       Skewness       3.791027
99%     14163.04       45801.71       Kurtosis       29.21398
[Table: Number of Observations by sign (Negative, Zero, Positive, Total, Missing), with columns Total, Integers, and Nonintegers; counts shown are 2964 and 3035, summing to 5999]
nolabel
Example:
. tab sex

  Gender of |
    HH.head |
  (1:M;2:F) |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |       4375       72.93       72.93
          2 |       1624       27.07      100.00
------------+-----------------------------------
      Total |       5999      100.00
. tab1 urban98 reg7

-> tabulation of urban98
1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

-> tabulation of reg7
  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00
Options:
chi2
missing
nofreq
cell
column
row
Example:
. tab
. tab
Options:
means
standard
freq       Displays frequencies only
missing
Example:
. replace poor=poor*100
(1777 real changes made)
. format poor %4.2f
. tab reg7 urban98, sum(poor) means

                Means of poor

           | 1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |     61.46       8.02 |     49.83
   region2 |     32.57       5.87 |     23.66
   region3 |     44.83      10.19 |     39.55
   region4 |     37.25      11.51 |     28.65
   region5 |     47.28          . |     47.28
   region6 |     12.45       2.16 |      7.33
   region7 |     35.78      10.28 |     29.32
-----------+----------------------+----------
     Total |     38.86       6.82 |     29.62
tabstat <varlist> [weight] [if] [in] [, statistics(statname [...]) by(varname) missing format[(%fmt)]]
This command computes statistics for the variables in varlist, separately for each value of the categorical variable given in by(varname).
The command can also be run from the menu bar.
Example:
. tabstat
         |  5790.046  11077.51
---------+--------------------
 region5 |  5894.983  11217.05
         |  5380.505  9421.447
---------+--------------------
 region6 |  9746.158  23515.01
         |  8428.743  18514.39
---------+--------------------
 region7 |  6556.616  13068.11
         |  6066.128  11043.99
---------+--------------------
   Total |  6787.898  14010.74
         |  5951.567  10733.19
------------------------------
Options:
statistics(statname [...])
by(varname)
missing
format[(%fmt)]
Stata accepts the following statistics in statistics(statname [...]):
statname    Meaning
mean        Mean
count       Number of observations
sum         Total
max         Maximum
min         Minimum
range       Range (max - min)
sd          Standard deviation
sdmean      Standard deviation of the mean
skewness    Skewness
kurtosis    Kurtosis
median      Median (same as p50)
p1          1st percentile
p5          5th percentile
p10         10th percentile
p25         25th percentile
p50         50th percentile
p75         75th percentile
p90         90th percentile
p95         95th percentile
p99         99th percentile
iqr         Interquartile range (p75 - p25)
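A short sketch of how these statistic names are passed to tabstat, again assuming Stata's bundled auto dataset instead of the survey data; the choice of variables and statistics is arbitrary.

```stata
* Minimal sketch: several statistics for two variables, split by a categorical variable
sysuse auto, clear
tabstat price mpg, statistics(mean sd p50 iqr) by(foreign) format(%9.2f)
```

Each by-group gets one row per requested statistic, which is the stacked layout seen in the partial output above.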
Example:
. tabstat
Options:
contents(clist)
row
col
format(%fmt)
missing
Example:
. table reg7 urban98 farm, contents(mean poor) row col format(%4.2f)

------------------------------------------------------
          | Type of HH (1:farm; 0:nonfarm) and 1:urban
          |              98; 0:rural 98
Code by 7 | ----- non farm -----   ------- farm -------
regions   | Rural  Urban  Total    Rural  Urban  Total
----------+-------------------------------------------
  region1 | 19.35   6.02  10.26    65.74  12.96  61.45
  region2 | 26.67   4.62  11.29    33.97  15.22  32.70
  region3 | 40.98  10.11  27.96    45.82  10.53  44.47
  region4 | 21.60  11.64  15.13    42.44  10.00  40.81
  region5 | 30.77         30.77    49.24         49.24
  region6 | 15.04   2.20   6.43    10.07   0.00   9.78
  region7 | 38.63  10.04  25.39    34.36  11.63  32.72
          |
    Total | 27.91   6.17  14.84    42.30  12.11  40.63
------------------------------------------------------

. table urban98 farm, contents(mean poor sd poor) row col format(%4.2f)

----------------------------------------
1:urban   |
98;       |    Type of HH (1:farm;
0:rural   |        0:nonfarm)
98        | non farm      farm     Total
----------+-----------------------------
    Rural |    27.91     42.30     38.86
          |    44.88     49.41     48.75
          |
    Urban |     6.17     12.11      6.82
          |    24.07     32.71     25.22
          |
    Total |    14.84     40.63     29.62
          |    35.55     49.12     45.66
----------------------------------------
. table urban98 farm, contents(mean rlpcex1 mean rlhhex1) row col

----------------------------------------
1:urban   |
98;       |    Type of HH (1:farm;
0:rural   |        0:nonfarm)
98        | non farm      farm     Total
----------+-----------------------------
    Rural |  2835.83   2212.12   2361.29
          | 13242.03  10120.89  10867.36
          |
    Urban |  5476.86   3232.17   5230.33
          | 22984.44  11903.19  21767.43
          |
    Total |  4423.95   2268.49   3188.67
          | 19100.41  10219.39  14010.74
----------------------------------------
  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

. tab1 region1 region2

-> tabulation of region1
reg7==regio |
         n1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       5140       85.68       85.68
          1 |        859       14.32      100.00
------------+-----------------------------------
      Total |       5999      100.00

-> tabulation of region2
reg7==regio |
         n2 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4824       80.41       80.41
          1 |       1175       19.59      100.00
------------+-----------------------------------
      Total |       5999      100.00
gen region7=(reg7==7)
Creating variables with egen
Syntax:
egen <newvar> = fcn(arguments) [if] [in] [, by(varlist)]
This command creates a new variable from the value of the function fcn. The new variable takes the same value for every observation (within each by() group when by() is specified). The function can be, for example:
count(exp)
mean(exp)
median(exp)
sd(exp)
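As an illustration of how an egen result is repeated across observations, the sketch below computes a group mean and each observation's deviation from it. It assumes Stata's bundled auto dataset; mean_price and price_dev are names invented for this example.

```stata
* Minimal sketch: group mean with egen, then a deviation from that mean
sysuse auto, clear

* Every car in the same foreign group receives the same mean_price
egen mean_price = mean(price), by(foreign)

* Deviation of each car's price from its group mean
generate price_dev = price - mean_price

list foreign price mean_price price_dev in 1/5
```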
Replacing the values of a variable
Syntax:
replace <varname> = exp [if] [in]
This command replaces the values of an existing variable with the new values given by the expression exp.
Example:
replace poor=poor*100
replace pcexp = hhexp/hhsize
Creating a categorical variable with encode
Syntax:
encode <varname> [if] [in], generate(newvar)
This command creates a new numeric categorical variable whose values correspond to the values of the string variable varname, taken in alphabetical order.
Example:
. gen str15(mucsong) = "Kha"
. drop
mucsong
rlpcex1<1790 &
rlpcex1>1290
rlpcex1>=1790
. tab mucsong

        mucsong |      Freq.     Percent        Cum.
----------------+-----------------------------------
    Khong ngheo |       4222       70.38       70.38
          Ngheo |       1087       18.12       88.50
      Rat ngheo |        690       11.50      100.00
----------------+-----------------------------------
          Total |       5999      100.00

. sum mucsong

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
     mucsong |       0

. encode mucsong, gen(ma_ms)
. tab ma_ms

      ma_ms |      Freq.     Percent        Cum.
------------+-----------------------------------
Khong ngheo |       4222       70.38       70.38
      Ngheo |       1087       18.12       88.50
  Rat ngheo |        690       11.50      100.00
------------+-----------------------------------
      Total |       5999      100.00

. sum ma_ms

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
       ma_ms |    5999    1.411235    .6871957          1          3
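The encode step can be reproduced on a tiny hand-typed dataset. The four rows below are made up for illustration; only the category names come from the example above.

```stata
* Minimal sketch: turn a string variable into a labeled numeric variable
clear
input str15 mucsong
"Ngheo"
"Khong ngheo"
"Rat ngheo"
"Ngheo"
end

* Codes are assigned in alphabetical order of the string values
encode mucsong, generate(ma_ms)

* Show the mapping from codes to labels, then summarize the numeric variable
label list ma_ms
summarize ma_ms

* "Ngheo" is second alphabetically, so it is coded 2
assert ma_ms == 2 if mucsong == "Ngheo"
```

This also explains why sum mucsong reports 0 observations above: summarize ignores string variables, while the encoded ma_ms is numeric.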
quinexp

5 quantiles |
 of rlpcex1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |       1200       20.00       20.00
          2 |       1200       20.00       40.01
          3 |       1200       20.00       60.01
          4 |       1200       20.00       80.01
          5 |       1199       19.99      100.00
------------+-----------------------------------
      Total |       5999      100.00
. tab
4.2. Renaming variables
Syntax:
rename <old varname> <new varname>
This command changes the name of a variable from the old name to the new one.
Example:
rename poor nguoingheo
rename rpcexp1 chitieu
4.3. Dropping variables and observations
Syntax:
drop <varlist>
Example:
drop poor urban98
drop if sex==1
drop in 1/20        Drops observations 1 to 20
keep househol
keep in f/50
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       1730       28.84       28.84
      Urban |       4269       71.16      100.00
------------+-----------------------------------
      Total |       5999      100.00
label dir
label list <label name>
label drop {label name [label name ...] | _all}
label values <varname> [label name]
The label define command attaches labels to a set of numeric values. The name of the value label follows the keyword define, # is a numeric value, and the label is the text that corresponds to that value. Two options are available: add appends values and their labels to an existing value label, and modify allows the values and labels of an existing value label to be changed.
The label dir command lists the value labels that exist, and label list shows the contents of the named value label. The label drop command deletes existing value labels.
Example:
Create a value label named nngheo in which 1 means poor and 0 means not poor.
. label define nngheo 0 "Khong ngheo" 1 "Ngheo"
. label dir
nngheo
region
loaiho
diploma
urban
agegroup
. label list nngheo
nngheo:
           0 Khong ngheo
           1 Ngheo
. label drop _all
. label dir
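Defining a value label does not by itself attach it to a variable; label values does that. The sketch below uses a made-up five-row variable ngheo to show the full sequence.

```stata
* Minimal sketch: define a value label and attach it to a 0/1 variable
clear
input byte ngheo
1
0
1
1
0
end

* 0 = not poor, 1 = poor
label define nngheo 0 "Khong ngheo" 1 "Ngheo"
label values ngheo nngheo

* The stored values stay 0/1; only the display changes
label list nngheo
tabulate ngheo
tabulate ngheo, nolabel
```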
      ngheo
 1.       1
 2.       0
 3.       1
 4.       1
 5.       0
4.6. Sorting data
Syntax:
sort <varlist> [in]
gsort [+|-]varname [[+|-]varname [...]]
The sort command arranges observations in ascending order of the values of the variables in varlist.
The gsort command arranges observations in ascending order of a variable if + is given (this is also the default), or in descending order if - is given.
Example:
sort reg7 hhsize
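For comparison, a brief sketch of sort and gsort on Stata's bundled auto dataset (an assumption; the guide's own example uses reg7 and hhsize).

```stata
* Minimal sketch: ascending sort versus mixed-direction gsort
sysuse auto, clear

* Ascending by foreign, then ascending by price within foreign
sort foreign price
list foreign price in 1/3

* Ascending by foreign, but descending by price within foreign
gsort foreign -price
list foreign price in 1/3
```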
where:
The statistics expression is a list of statistics and the variables they apply to. The statistics are named as in section 3.12 of this chapter.
The collapse command creates a new dataset containing the variables in varlist, with values computed by the corresponding statistic. The observations of the original dataset are grouped by the values of the variables given in by(varlist).
Example:
We have a data file on the income and expenditure of household members:
ma_tv   ma_ho   thunhap   chitieu
1       101     200       500
2       101     1200      400
3       101     0         200
4       101     0         200
1       102     3200      500
2       102     1200      320
3       102     200       200
1       103     300       500
2       103     2100      250
3       103     0         300
4       103     0         300
1       104     4300      800
2       104     3500      500
3       104     300       500
4       104     0         300
5       104     0         200
6       104     0         200
We use collapse to create a file of average income and expenditure per household, and add a variable for household size.
. gen quimo=1
. collapse (mean) thunhap (mean) chitieu (sum) quimo, by(ma_ho)
The new dataset looks like this:
ma_ho   thunhap   chitieu   quimo
101     350       325       4
102     1533.33   340       3
103     600       337.5     4
104     1350      416.667   6
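The collapse example can be run end to end by typing the member-level data with input. This is a sketch of the steps described above, using lowercase chitieu throughout because Stata variable names are case-sensitive.

```stata
* Minimal sketch: household means and household size from member-level data
clear
input ma_tv ma_ho thunhap chitieu
1 101 200 500
2 101 1200 400
3 101 0 200
4 101 0 200
1 102 3200 500
2 102 1200 320
3 102 200 200
1 103 300 500
2 103 2100 250
3 103 0 300
4 103 0 300
1 104 4300 800
2 104 3500 500
3 104 300 500
4 104 0 300
5 104 0 200
6 104 0 200
end

* A constant 1 per member sums to the number of members per household
generate quimo = 1
collapse (mean) thunhap chitieu (sum) quimo, by(ma_ho)

* One row per household remains; household 104 has 6 members
assert quimo == 6 if ma_ho == 104
list
```

Note that collapse replaces the data in memory, so the member-level file should be saved first if it is still needed.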
The merge command joins the observations of the dataset currently open in Stata (the master dataset) with the corresponding observations of another dataset named after the keyword using (the using dataset) to form one new dataset. The variables in varlist are called identifying variables, and the data must be sorted on them with sort (or gsort) before merge is run.
Example:
We have 2 datasets as follows:
thunhap.dta
ma_ho   thunhap   chitieu   quimo
101     350       325       4
102     1533.33   340       3
103     600       337.5     4
104     1350      416.667   6

dialy.dta
ma_ho   thanhthi   vung
204     0          1
102     1          4
103     0          3
104     0          6

Result after merge:
ma_ho   thunhap   chitieu   quimo   thanhthi   vung   _merge
101     350       325       4       .          .      1
102     1533.33   340       3       1          4      3
103     600       337.5     4       0          3      3
104     1350      416.667   6       0          6      3
204     .         .         .       0          1      2
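The variable _merge records where each observation came from:
_merge==1    the observation comes only from the master dataset
_merge==2    the observation comes only from the using dataset
_merge==3    the observation comes from both the master and the using dataset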
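A runnable sketch of this merge is given below. It assumes Stata 11 or later, where the merge 1:1 syntax is available and pre-sorting is not required; older releases use the form merge ma_ho using filename on sorted data, as described in this guide. The tempfile name dialy is just a local handle for the using dataset.

```stata
* Minimal sketch (assumes Stata 11+ syntax): merge household income with location data
clear
input ma_ho thanhthi vung
204 0 1
102 1 4
103 0 3
104 0 6
end
tempfile dialy
save `dialy'

clear
input ma_ho thunhap chitieu quimo
101 350 325 4
102 1533.33 340 3
103 600 337.5 4
104 1350 416.667 6
end

* ma_ho is the identifying variable; _merge is created automatically
merge 1:1 ma_ho using `dialy'
sort ma_ho
list

* Household 101 exists only in the master data, 204 only in the using data
assert _merge == 1 if ma_ho == 101
assert _merge == 2 if ma_ho == 204
```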
Options:
When the two datasets have variables in common, the following options control how the data are handled:
update
replace
thunhap   chitieu   gioitinh
1350      425       1
1500      370       0
800       556       0
1500      417       0
2500      540       1

thunhap   chitieu   quimo   gioitinh
350       325       4       .
1533.33   340       3       .
600       337.5     4       .
1350      416.667   6       .
1350      425       .       1
1500      370       .       0
800       556       .       0
1500      417       .       0
2500      540       .       1
Syntax:
reshape wide <varname>, i(varlist) [ j(varname [values]) ... ]
reshape long <varname>, i(varlist) [ j(varname [values]) ... ]
reshape wide
reshape long
This command converts data from wide form to long form (reshape long) and from long form to wide form (reshape wide). i(varlist) names the identifying variables that distinguish observations in the wide form (called level-1 observations). j(varname) names the variable that distinguishes the level-2 observations in the long form.
Example 1:
Wide-form data can be viewed as a matrix like the following:
 -i-            ------------ xij ------------
maho   quimo   thunhap95   thunhap96   thunhap97
101    5       4500        4400        5400
102    4       3400        3300        3700
103    6       5000        5400        5500

The same data in long form:
 -i-            -j-    -xij-
maho   quimo   nam    thunhap
101    5       95     4500
101    5       96     4400
101    5       97     5400
102    4       95     3400
102    4       96     3300
102    4       97     3700
103    6       95     5000
103    6       96     5400
103    6       97     5500

Data                               long   ->   wide
-----------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   4   ->       5
j variable (3 values)               nam   ->   (dropped)
xij variables:
                                thunhap   ->   thunhap95 thunhap96 thunhap97
-----------------------------------------------------------------------
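The conversion reported in the log above can be reproduced with the sketch below, which types the long-form data and reshapes it in both directions. The commands are an illustration of reshape on these numbers; the exact command line used in the guide is not shown in the text.

```stata
* Minimal sketch: long -> wide -> long on the income example
clear
input maho quimo nam thunhap
101 5 95 4500
101 5 96 4400
101 5 97 5400
102 4 95 3400
102 4 96 3300
102 4 97 3700
103 6 95 5000
103 6 96 5400
103 6 97 5500
end

* One row per household; thunhap becomes thunhap95 thunhap96 thunhap97
reshape wide thunhap, i(maho) j(nam)
assert _N == 3
list

* Back to one row per household and year
reshape long thunhap, i(maho) j(nam)
assert _N == 9
list
```

quimo can stay outside the reshape variable list because it is constant within each maho.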
Example 2:
We have data in the following table form:
maho   sotien1   nguon1        sotien2   nguon2
101    1200      Ngan hang A   2000      Ngan hang A
102    1300      Ngan hang B   .         .
103    2500      Ngan hang A   1000      Ngan hang C
104    3000      Ngan hang A   2000      Ngan hang B

maho   lanvay   sotien   nguon
101    1        1200     Ngan hang A
101    2        2000     Ngan hang A
102    1        1300     Ngan hang B
102    2        .        .
103    1        2500     Ngan hang A
103    2        1000     Ngan hang C
104    1        3000     Ngan hang A
104    2        2000     Ngan hang B
elements in the population. Estimates that generalize to the population must take the sampling weights into account; otherwise the results will be biased.
Example:
Suppose the Red River Delta consists of 2 provinces, Ha Noi and Nam Dinh, with populations of 4.5 million and 500 thousand respectively. We want to draw a random sample of 500 observations to study income in the Red River Delta as a whole and in each of the 2 provinces. Sampling in proportion to population would give 450 households in Ha Noi and 50 households in Nam Dinh. However, because the sample is drawn at random over the whole region, we might end up with no observations from Nam Dinh, or only very few. For the sample to be representative of each province, it is better to select 400 observations in Ha Noi and 100 observations in Nam Dinh.
If average income is 900 thousand per month in Ha Noi and 300 thousand per month in Nam Dinh, average income for the Red River Delta cannot be computed as (900 + 300)/2, because the observations in the sample were not selected in proportion to the provinces. Each observation in Ha Noi represents 11,250 households in the region (4,500,000/400). This is the weight of the observation, equal to the inverse of its probability of being selected into the sample. Each observation in Nam Dinh represents 5,000 households in the region (500,000/100).
Income for the Red River Delta is then computed as the weighted mean:
\[ \text{Income} = \frac{400 \times 11250 \times 900 + 100 \times 5000 \times 300}{400 \times 11250 + 100 \times 5000} = 840 \text{ thousand per month} \]
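The effect of the weights can be checked with a small simulation of this two-province sample. The sketch assumes, purely for illustration, that every sampled household has exactly its province's average income; the variable names nmau, danso, and wt are invented here.

```stata
* Minimal sketch: unweighted versus weighted regional mean income
clear
input str8 tinh thunhap nmau danso
"Ha Noi" 900 400 4500000
"Nam Dinh" 300 100 500000
end

* Weight = population represented by each sampled observation
generate wt = danso / nmau

* Build the 500-observation sample: 400 rows for Ha Noi, 100 for Nam Dinh
expand nmau
assert _N == 500

* Unweighted mean: (400*900 + 100*300)/500 = 780, which overstates Nam Dinh's share
summarize thunhap

* Weighted mean: 840, matching the population shares of the two provinces
summarize thunhap [aweight = wt]
```

The unweighted figure is pulled toward Nam Dinh because that province is oversampled; the weights undo the oversampling.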
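VLSS 1998 has 2 weights. The first is the household weight, the variable wt, which is the number of households in Vietnam that each sampled household represents. The second is the household member weight, hhsizewt, which is the number of people in Vietnam that each household member represents. The household member weight equals the household weight multiplied by household size.
Example: Weights in VLSS 1998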
. tab reg7, sum(wt)

  Code by 7 |      Summary of sample weight
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   3218.4296   850.74246         859
    region2 |   3133.7277   849.12325        1175
    region3 |   3185.1794   801.74266         708
    region4 |     2199.37   492.37202         754
    region5 |   1336.3098   269.14747         368
    region6 |   1963.8964   528.69328        1023
    region7 |   2938.2122   547.72125        1112
------------+------------------------------------
      Total |   2688.5003   900.01379        5999

. tab reg7, sum(hhsizewt)

  Code by 7 |        Summary of =hhsize*wt
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   15790.857   7555.7552         859
    region2 |   12656.003   5970.9089        1175
    region3 |   14814.504   7236.7592         708
    region4 |   10794.537    5235.562         754
    region5 |    7564.731   3185.9336         368
    region6 |   9447.7077   4535.0816        1023
    region7 |   14653.702   6639.8297        1112
------------+------------------------------------
      Total |   12636.546   6597.6574        5999

. di 2688.5003*5999
16128313
. di 12636.546*5999
75806639
pweights:
aweights
analytic weights: Stata treats the weight as inversely proportional to the variance of the observation.
iweights
tab reg7 urban98
   region6 |       514        509 |      1023
   region7 |       830        282 |      1112
-----------+----------------------+----------
     Total |      4269       1730 |      5999
. tab
binomial
poisson
exposure(tn bin)
total
Example:
. ci poor

    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    5999     29.6216    .5895501     28.46587    30.77733

. sort reg7
. by reg7: ci poor, total

-> reg7 = region1
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |     859    49.82538    1.706961     46.47507    53.17569

-> reg7 = region2
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    1175    23.65957    1.240357     21.22601    26.09314

-> reg7 = region3
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |     708    39.54802    1.838899     35.93767    43.15838

-> reg7 = region4
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |     754    28.64721     1.64759      25.4128    31.88163

-> reg7 = region5
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |     368    47.28261    2.606121      42.1578    52.40741

-> reg7 = region6
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    1023    7.331378    .8153306     5.731465    8.931292

-> reg7 = region7
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    1112    29.31655    1.365709     26.63689    31.99621

-> Total
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    5999     29.6216    .5895501     28.46587    30.77733
Notes:
The estimation commands can also be used when only the sample statistics are known. These are called commands using immediate arguments. They are very useful when we do not have the original data for the variable.
cii <number of observations> <mean> <standard deviation> [, level(#) ]
cii <number of observations> <number of successes> [, level(#) ]
#obs is the number of observations and #succ is the number of times the variable takes the value that counts as a success (usually the value 1).
cii <exposure> <number of events> , poisson [ level(#) ]
Example:
. cii 5999 1777, level (90)

                                                     -- Binomial Exact --
    Variable |     Obs        Mean    Std. Err.     [90% Conf. Interval]
-------------+----------------------------------------------------------
             |    5999     .296216     .005895     .2865107    .3060676

. cii 12 27, poisson

                                                      -- Poisson Exact --
    Variable | Exposure       Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
             |       12        2.25    .4330127     1.483144    3.273587
1.2.
                                           poor: Number of obs =      859
---------------------------------------------------------------------------
Variable |      Mean   Std. Err.         z    P>|z|    [95% Conf. Interval]
---------+-----------------------------------------------------------------
    poor |  .4982538   .0170597   29.2065   0.0000    .4648174   .5316901
---------------------------------------------------------------------------
Ho: proportion(poor) = .44
Ha: poor < .44
z = 3.440
P < z = 0.9997
prtest <varname1> = <varname2> [if] [in] [, level(#)]
This command tests the hypothesis that the proportions of the two variables are equal (Ho: pX = pY).
                                          poor2: Number of obs =     1175
                                          poor4: Number of obs =      754
------------------------------------------------------------------------------
Variable |      Mean   Std. Err.         z    P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |  .2365957   .0123983   19.0829   0.0000    .2122955   .2608959
   poor4 |  .2864721    .016465   17.3989   0.0000    .2542014   .3187429
---------+--------------------------------------------------------------------
    diff | -.0498764    .020611                      -.0902732  -.0094796
         | under Ho:   .0203666  -2.44893   0.0143
------------------------------------------------------------------------------
Ho: proportion(poor2) - proportion(poor4) = diff = 0
Ha: diff < 0      z = -2.449    P < z = 0.0072
Ha: diff ~= 0     z = -2.449    P > |z| = 0.0143
prtest <varname> [if] [in], by(groupvar) [level(#)]
This command tests the hypothesis that the proportions in the two groups defined by the grouping variable are equal (Ho: pX1 = pX2).
Example:
. prtest poor, by(sex)

Two-sample test of proportion                 1: Number of obs =     4375
                                              2: Number of obs =     1624
------------------------------------------------------------------------------
Variable |      Mean   Std. Err.         z    P>|z|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |     .3248     .00708   45.8755   0.0000    .3109234   .3386766
       2 |  .2192118   .0102661    21.353   0.0000    .1990906    .239333
---------+--------------------------------------------------------------------
    diff |  .1055882   .0124708                       .0811459   .1300304
         | under Ho:   .0132673   7.95855   0.0000
------------------------------------------------------------------------------
Ho: proportion(1) - proportion(2) = diff = 0
Ha: diff < 0      z = 7.959    P < z = 1.0000
Ha: diff ~= 0     z = 7.959    P > |z| = 0.0000
. ttest rlpcex1=3200

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 rlpcex1 |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
------------------------------------------------------------------------------
Degrees of freedom: 5998
Ho: mean(rlpcex1) = 3200
Ha: mean < 3200      t = -0.3260    P < t = 0.3722
ttest <varname1> = <varname2> [if] [in] [, unpaired unequal level(#) ]
This command tests the hypothesis that the two variables have equal means (Ho: X = Y).
Options:
unpaired
unequal
Example:
. ttest poor2=poor4, unpaired unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |    1175    .2365957    .0124036     .425173    .2122601    .2609314
   poor4 |     754    .2864721    .0164759    .4524128     .254128    .3188163
---------+--------------------------------------------------------------------
combined |    1929    .2560912    .0099404     .436586    .2365962    .2755863
---------+--------------------------------------------------------------------
    diff |           -.0498764    .0206229               -.0903285   -.0094243
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom: 1532.64

Ho: mean(poor2) - mean(poor4) = diff = 0

 Ha: diff < 0                         Ha: diff ~= 0
   t = -2.4185                          t = -2.4185
   P < t = 0.0079                       P > |t| = 0.0157
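The unequal-variance result can be retraced without the survey data by feeding the group sizes, means and standard deviations from the table to the immediate command. The sketch assumes ttesti with the unequal option is available in the installed Stata release.

```stata
* Two-sample t test from summary statistics, unequal variances:
* n, mean and sd of poor2, followed by n, mean and sd of poor4
ttesti 1175 .2365957 .425173 754 .2864721 .4524128, unequal
display "t = " %7.4f r(t) "   Satterthwaite df = " %8.2f r(df_t)
```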
ttest <variable> [if condition] [in range], by(grouping variable) [ unequal level(#) ]
This command tests the hypothesis that the mean is the same in the two groups defined by the grouping variable (Ho: X1 = X2).
Example:
. ttest rlpcex1, by(sex)

 Ha: diff ~= 0
   t = -9.8880
   P > |t| = 0.0000
. sum rlpcex1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71

. sdtest rlpcex1=2700

One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 rlpcex1 |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
------------------------------------------------------------------------------
Ho: sd(rlpcex1) = 2700

   chi2(5998) = 5965.022

 Ha: sd(rlpcex1) < 2700
   P < chi2 = 0.3838
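The chi-squared statistic of the variance test equals (n - 1) times the squared ratio of the sample standard deviation to the hypothesised one. A minimal sketch, assuming the immediate command sdtesti is available (the scalar name chand is made up for illustration):

```stata
* One-sample variance test from summary statistics.
* The mean is not needed for the test, so it is given as "."
sdtesti 5999 . 2692.567 2700
display "chi2 = " %9.3f r(chi2)

* The same statistic by hand
scalar chand = (5999 - 1) * (2692.567/2700)^2
display "chi2 by hand = " %9.3f chand
```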
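means        also displays summary statistics (mean, standard deviation, minimum, maximum) for the variables
covariance   displays covariances instead of correlation coefficients
_coef        displays the correlations (or covariances) between the coefficients of the last estimated model
wrap         prevents wide matrices from being split into pieces
Example: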
. corr hhsize poor rlpcex1 sex
(obs=5999)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
        poor |   0.2425   1.0000
     rlpcex1 |  -0.2172  -0.4452   1.0000
         sex |  -0.2570  -0.1028   0.1267   1.0000

Output with the means option:

    Variable |        Mean   Std. Dev.       Min        Max
-------------+----------------------------------------------------
      hhsize |    4.752292   1.954292          1         19
        poor |     .296216   .4566255          0          1
     rlpcex1 |    3188.667   2692.567    357.318   45801.71
         sex |    1.270712   .4443645          1          2

Output with the covariance option:

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |  3.81926
        poor |  .216435  .208507
     rlpcex1 | -1142.93 -547.335  7.2e+06
         sex | -.223195 -.020849  151.543   .19746
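pwcorr [variable list] [weight] [if condition] [in range] [, obs sig print(#) star(#)]
This command computes the correlation coefficient for each pair of variables in the variable list.
Options:
obs         displays the number of observations used for each coefficient
sig         displays the significance level of each coefficient
print(#)    displays only the coefficients that are significant at level #
star(#)     marks with a star the coefficients that are significant at the level given by star
Example: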
. pwcorr hhsize poor rlpcex1 sex, obs sig star(5)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
             |
             |     5999
             |
        poor |   0.2425*  1.0000
             |   0.0000
             |     5999     5999
             |
     rlpcex1 |  -0.2172* -0.4452*  1.0000
             |   0.0000   0.0000
             |     5999     5999     5999
             |
         sex |  -0.2570* -0.1028*  0.1267*  1.0000
             |   0.0000   0.0000   0.0000
             |     5999     5999     5999     5999
             |
pcorr <variable> <variable list> [weight] [if condition] [in range]
This command computes the partial correlation coefficients of the named variable with each of the variables in the variable list.
Example:
. pwcorr poor hhsize rlpcex1 sex

             |     poor   hhsize  rlpcex1      sex
-------------+------------------------------------
        poor |   1.0000
      hhsize |   0.2425   1.0000
     rlpcex1 |  -0.4452  -0.2172   1.0000
         sex |  -0.1028  -0.2570   0.1267   1.0000
. reg rlpcex1 reg7 sex hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F(  3,  5995) =  194.88
       Model |  3.8639e+09     3  1.2880e+09           Prob > F      =  0.0000
    Residual |  3.9621e+10  5995  6609032.15           R-squared     =  0.0889
-------------+------------------------------           Adj R-squared =  0.0884
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2570.8

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   240.9633    15.5905    15.46   0.000     210.4003    271.5263
         sex |   403.2984   77.38324     5.21   0.000     251.5994    554.9974
      hhsize |  -305.6382   17.70692   -17.26   0.000    -340.3501   -270.9263
       _cons |   3160.201   155.6576    20.30   0.000     2855.056    3465.346
------------------------------------------------------------------------------
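Options:
level(#)      sets the confidence level, in percent, for the confidence intervals
noconstant    estimates the model without a constant term
noheader      suppresses the header table above the coefficient table
beta          reports standardized (beta) coefficients instead of confidence intervals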
. probit poor reg7 sex hhsize

Iteration 0:   log likelihood = -3645.1363
Iteration 1:   log likelihood = -3367.2185
Iteration 2:   log likelihood = -3364.8032
Iteration 3:   log likelihood = -3364.8025

Probit estimates                                  Number of obs   =       5999
                                                  LR chi2(3)      =     560.67
                                                  Prob > chi2     =     0.0000
                                                  Pseudo R2       =     0.0769

------------------------------------------------------------------------------
        poor |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   -.116342   .0084551   -13.76   0.000    -.1329136   -.0997703
         sex |  -.1284525   .0422247    -3.04   0.002    -.2112113   -.0456937
      hhsize |   .1808115   .0095806    18.87   0.000     .1620338    .1995892
       _cons |  -.8088731   .0824798    -9.81   0.000    -.9705306   -.6472157
------------------------------------------------------------------------------
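stdp     computes the standard error of the prediction
resid    computes the residuals: $e_i = Y_i - \hat{Y}_i$
Example:
predict exphat, xb
Creates a new variable exphat holding the estimated (fitted) values of the dependent variable, computed from the coefficients of the regression.
predict expres, resid
Creates the variable expres holding the residuals.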
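The relation between fitted values and residuals can be checked on any regression. The sketch below does not use the VLSS data; it assumes Stata 8 or later, where sysuse loads the bundled auto dataset, and the variable names phat, pres and gap are made up for illustration.

```stata
sysuse auto, clear
regress price mpg weight

* Fitted values and residuals, stored in double precision
predict double phat, xb
predict double pres, resid

* By definition, observed value = fitted value + residual
generate double gap = abs(price - (phat + pres))
summarize gap

* Residuals from a regression with a constant average to zero
summarize pres
```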
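Testing the coefficients of the regression
Syntax:
test [expression = value]
test [variable list]
testparm <variable list> [, equal ]
The test command tests hypotheses about the coefficients of the regression that has just been estimated.
Example:
test urban98 =2000
Tests the hypothesis that the coefficient of urban98 equals 2000
test region1 = region2
Tests the hypothesis that the coefficient of region1 equals the coefficient of region2
test region1 = (region2+region3)/2
Tests a hypothesis about the relationship between the coefficients of region1, region2 and region3
test region1 region2 region3
Tests the hypothesis that the coefficients of region1, region2 and region3 are all equal to 0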
... educyr98 hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F( 10,  5988) =  382.87
       Model |  1.6960e+10    10  1.6960e+09           Prob > F      =  0.0000
    Residual |  2.6525e+10  5988  4429712.49           R-squared     =  0.3900
-------------+------------------------------           Adj R-squared =  0.3890
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2104.7

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     urban98 |   1995.163   66.46943    30.02   0.000     1864.859    2125.467
     region1 |  -923.7066   132.8334    -6.95   0.000    -1184.108   -663.3052
     region2 |  -362.6047   130.2254    -2.78   0.005    -617.8934    -107.316
     region3 |  -558.0354   137.1551    -4.07   0.000    -826.9089   -289.1619
     region4 |  -100.7586   135.8372    -0.74   0.458    -367.0486    165.5313
     region5 |  (dropped)
     region6 |   1742.688   131.9928    13.20   0.000     1483.934    2001.441
     region7 |   151.9854   128.0272     1.19   0.235    -98.99396    402.9648
         sex |   270.9142   66.61031     4.07   0.000     140.3339    401.4944
    educyr98 |   153.3281   6.836934    22.43   0.000     139.9253     166.731
      hhsize |   -257.691   14.73741   -17.49   0.000    -286.5816   -228.8004
       _cons |   2362.355   178.3197    13.25   0.000     2012.784    2711.926
------------------------------------------------------------------------------

. test urban98 =2000
 ( 1)  urban98 = 2000.0
       F(  1,  5988) =    0.01
            Prob > F =    0.9420

. test region1 = region2
 ( 1)  region1 - region2 = 0.0
       F(  1,  5988) =   34.57
            Prob > F =    0.0000

. test region1 = (region2+region3)/2
 ( 1)  region1 - .5 region2 - .5 region3 = 0.0
       F(  1,  5988) =   27.80
            Prob > F =    0.0000

. test region1 region2 region3
 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0
       F(  3,  5988) =   20.22
            Prob > F =    0.0000

. testparm region*
 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0
 ( 4)  region4 = 0.0
 ( 5)  region5 = 0.0
 ( 6)  region6 = 0.0
 ( 7)  region7 = 0.0
       Constraint 5 dropped
       F(  6,  5988) =  148.55
            Prob > F =    0.0000
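The same kinds of tests can be tried on a dataset that ships with Stata. The following is only a sketch: it assumes Stata 8 or later (for sysuse and the bundled auto dataset) and uses a regression unrelated to the VLSS example. It also shows that the F statistic for a single coefficient equal to zero is the square of that coefficient's t statistic.

```stata
sysuse auto, clear
regress price mpg weight foreign

* Single coefficient equal to a value
test mpg = 0
display "F = " %8.4f r(F) "   t^2 = " %8.4f (_b[mpg]/_se[mpg])^2

* Two coefficients equal to each other
test mpg = weight

* Joint test that both coefficients are zero
test mpg weight

* testparm takes a variable list (wildcards allowed);
* with equal it tests that the listed coefficients are equal
testparm mpg weight, equal
```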
Chapter IV: Drawing graphs
1. Drawing graphs (graph)
Syntax:
graph [variable list] [weight] [if condition] [in range] [, graph_type specific_options common_options]
where:
graph_type
Specifies the type of graph to draw
specific_options
common_options
[Figure: scatter plot against "Age of household head" (16 to 95); vertical axis from 357.318 to 45801.7]
[Figure: scatter plot matrix of "comp.M&Reg price adj.pc tot exp", "Age of household head", "schooling year of HH.head" and "Household size"]
[Figure: histogram (Fraction) of "comp.M&Reg price adj.pc tot exp", 357.318 to 45801.7]
[Figure: pie chart with slices 24% poor1, 16% poor2, 16% poor3, 12% poor4, 10% poor5, 4% poor6, 18% poor7]
[Figure: star chart of car models (Audi Fox to Volvo 260) on Price, Mileage (mpg), Repair Record 1978, Headroom (in.), Trunk space (cu. ft.), Weight (lbs.) and Length (in.)]
. tab hhsize, sum(rlpcex1)

  Household |
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |         ...         ...         214
          2 |   4131.4892   3677.2297         497
          3 |   3834.8615   2913.8177         731
          4 |   3428.8011   2599.7301        1404
          5 |   2930.5486   2168.0644        1318
          6 |   2626.6848   2277.1893         867
          7 |   2501.0912   2186.1605         480
          8 |   2329.7009   1803.7873         255
          9 |   2207.0166   1380.5607         126
         10 |   2252.3772   1423.7576          58
         11 |   2370.7034   1404.7148          29
         12 |   1747.3691   924.72977           9
         13 |   2114.1337   2109.0077           4
         14 |     1579.78   990.81152           4
         16 |   2994.5771   2061.6804           2
         19 |    4833.936           0           1
------------+------------------------------------
      Total |   3188.6671   2692.5673        5999
. tab hhsize, sum(educyr98)

            |   Summary of schooling year of
  Household |             HH.head
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   3.7897196   4.3956537         214
          2 |   5.7545272   4.7225549         497
          3 |   7.3023256   4.6396425         731
          4 |   8.2578348   4.2659841        1404
          5 |   7.7243298   4.2998488        1318
          6 |   6.8788927   4.0778062         867
          7 |   6.3348958   4.1241759         480
          8 |   5.7333333   3.9623557         255
          9 |   5.7936508   3.4878474         126
         10 |   6.1724138   3.1851516          58
         11 |   4.7931034   3.1665586          29
         12 |   4.4444444   3.6438685           9
         13 |           5   5.0990195           4
         14 |           3   2.1602469           4
         16 |           4   1.4142136           2
         19 |           2           0           1
------------+------------------------------------
      Total |   7.0944185   4.4160917        5999
. rename var71 ahhsize
. rename var72 meanexp
. rename var73 meanedu
. replace meanexp= meanexp/1000
(16 real changes made)
. label var meanexp "Chi tieu binh quan"
. label var meanedu "So nam hoc"
. label var ahhsize "Quy mo ho"
[Figure: meanexp and meanedu plotted against ahhsize (1 to 19)]
* Title options:
title("string") t1title("string") t2title("string") b1title("string") b2title("string") l1title("string") l2title("string") r1title("string") r2title("string")
These options place titles at the top, at the bottom, on the left and on the right of the graph.
Example:
gr meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh)
[Figure: "Chi tieu binh quan" and "So nam hoc" plotted against "Quy mo ho gia dinh" (1 to 19), with the titles set above]
* Displaying values on the axes
xlabel[(numeric values)] ylabel[(numeric values)] rlabel[(numeric values)] tlabel[(numeric values)]
Example:
gr meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho) l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho) b2title(Quy mo ho gia dinh) xlabel ylabel
[Figure: "Chi tieu binh quan" and "So nam hoc" plotted against "Quy mo ho gia dinh", with labelled axis values (x axis 0 to 20; y axes 2 to 8 and 1000 to 5000)]
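2.2. Histograms
Syntax:
graph [variable] [weight] [if condition] [in range], histogram [common_options bin(#) freq normal[(#,#)] density(#)]
Options:
bin(#)           number of bins (bars) in the histogram
freq             labels the vertical axis with frequencies instead of fractions
normal[(#,#)]    overlays a normal density curve with the given mean and standard deviation; when no values are given, the sample mean and standard deviation are used
density(#)       number of points used to draw the normal density curve
Example:
Histogram of per capita expenditure
. gr rlpcex1, hist bin(20) normal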
[Figure: histogram (Fraction) of "comp.M&Reg price adj.pc tot exp", 357.318 to 45801.7]
[Figure: histogram (Frequency) of "comp.M&Reg price adj.pc tot exp", 357.318 to 45801.7]
[Figure: histograms (Frequency) of the same variable by region, region1 to region7]
2.3. Bar charts
Syntax:
graph [variable list] [weight] [if condition] [in range], bar [common_options [no]alt means stack]
Example:
Bar chart of the mean schooling of the household head and the mean household size in the 7 regions
. gr educyr98 hhsize, bar means by(reg7)
[Figure: bar chart of the means of "schooling year of HH.head" and "Household size" by region, region1 to region7]
The stack option
. gen persons=1
. gr persons urban98, bar ylabel by(reg7) stack alt
[Figure: stacked bar chart of persons by region (region1 to region7), vertical axis 500 to 1500]
Example:
Draw the following graph:
[Figure: bar chart of foodpoor and poor by region (region1 to region7), vertical axis 200 to 600]
[Figure: pie chart with slices 24% poor1, 16% poor2, 16% poor3, 12% poor4, 10% poor5, 4% poor6, 18% poor7]
[Figure: pie charts by region (region1 to region7 and Total); legend: 12% foodpoor, 18% poor but still above food povert, 70% nonpoor]
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh")
[Figure: combined graph of the three saved graphs: the pie charts by region, the stacked bar chart of persons by region and the pie chart of poor1 to poor7]
After editing, the do-file is saved with the Save as option in the File menu of the do-file editor window. The name of the do-file can also be given directly on the doedit command:
doedit (do-file name)
Do-files have the extension .do.
In the example above, the program can be saved under the name "chuong trinh 1" in the Vlss98 folder on drive C.
1.2. Running do-files
To run a do-file, type one of the following two commands in the command window:
do filename [, nostop]
run filename [, nostop]
The run command executes the commands in the do-file but does not display the results on screen.
If a command fails while a do-file is running, Stata reports the error and stops executing the commands that follow. If the nostop option is specified, however, Stata skips the failing command and continues with the commands after it.
Example:
            |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);

end of do-file
r(111);
With the nostop option:
. do "c:\vlss98\chuong trinh 1", nostop
. clear
. set mem 32m
(32768k)
. use "C:\VLSS98\Hhexp98n.dta", clear
. tab urban98

1:urban 98; |
 0:rural 98 |      Freq.     Percent        Cum.
------------+-----------------------------------
      Rural |       4269       71.16       71.16
      Urban |       1730       28.84      100.00
------------+-----------------------------------
      Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);
. gen new=hhsize
. end of do-file

Running the do-file with the run command:
. run "c:\vlss98\chuong trinh 1", nostop
hhsizet not found
Do-files can also be run with the Do option in the File menu, or directly from the Do-file editor window with the Do or Run option in the Tool menu.
1.3. Some notes on writing do-files
version #
When writing do-files, this command should be placed at the top of the program to state which version of Stata the do-file was written for. For example, if the do-file is written with Stata 7.0, the command is placed at the top of the program as follows:
version 7.0
clear
use Hhexp98n.dta
tab reg7
...
Different versions of Stata may differ in the syntax or the meaning of commands. The version command lets the running copy of Stata correctly interpret a do-file written for another version.
set memory #[k|m]
This command sets the amount of memory allocated to Stata, in kilobytes (k) or megabytes (m).
is equivalent to:
#delimit ;
graph meanexp meanedu ahhsize, title(Do thi chi tieu va hoc van chu ho)
l1title(Chi tieu binh quan (tr dong)) l2title(So nam hoc cua chu ho)
b2title(Quy mo ho gia dinh) xlabel ylabel xline(5 10 to 20)
yline(2 4 to 8) connect(ll) ;
gen hhexp = rlpcex1 * hhsize ;
...
Afterwards, if the commands that follow each fit on a single line, the default mode should be restored with the command:
#delimit cr
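A minimal sketch of the same pattern on data that ship with Stata is given below. It assumes Stata 8 or later (for sysuse and the bundled auto dataset), and the variable wt_kg is made up for illustration. The lines are meant to be saved in a do-file, because #delimit works only inside do-files and ado-files.

```stata
sysuse auto, clear

#delimit ;
* From here on a command ends at the semicolon, not at the line break ;
regress price mpg
        weight
        foreign ;
generate double wt_kg = weight * 0.4536 ;
#delimit cr

* Back to the default: one command per line
summarize wt_kg
```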
Note:
. macro list _a
local macro `a' not found
r(111);
Whereas for global macros:
. do "C:\WINDOWS\TEMP\STD010000.tmp"
. global b "chuong trinh thong ke Stata"
. end of do-file
. macro list b
b:              chuong trinh thong ke Stata
A[3,3]
    c1  c2  c3
r1   1   2   4
r2   3   4   7
r3  10  11  14
Here the matrix A consists of 9 elements: 1, 2, 4, 3, 4, 7, 10, 11, 14. The columns are named c1, c2 and c3, and the rows r1, r2 and r3. The element at the intersection of row 1 and column 2 is written A[1, 2]. In this example A[1, 2] holds the value 2.
3.2. Scalars (scalar)
A scalar holds a single numeric value. A scalar is defined with the following command:
scalar scalar_name = expression
Example:
. scalar a = 10
. scalar list a
         a =         10
. scalar b = a* 2
. scalar list b
         b =         20
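Scalars can be used in later expressions just like numbers. The short sketch below extends the example above; the scalar c is made up for illustration.

```stata
scalar a = 10
scalar b = a * 2
scalar c = sqrt(b + 5)
scalar list a b c

* Scalars can appear directly inside expressions
display "b/a = " b/a

* scalar() forces the name to be read as a scalar even when a
* variable with the same name exists in the data
display scalar(c) + 1
```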
matrix myvec = (1, 5, 3, 1, 3)
Creates a row vector (elements separated by commas)
Creates a column vector (elements separated by \ instead of commas)
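The difference between the two kinds of vector can be sketched as follows. The matrix names rowvec, colvec, tr, dif0 and ip are made up for illustration; the operators used (comma, backslash, transpose and multiplication) are standard Stata matrix syntax.

```stata
* Row vector: elements separated by commas
matrix rowvec = (1, 5, 3, 1, 3)
* Column vector: elements separated by backslashes
matrix colvec = (1 \ 5 \ 3 \ 1 \ 3)
matrix list rowvec
matrix list colvec

* The transpose of the column vector equals the row vector
matrix tr = colvec'
matrix dif0 = tr - rowvec
matrix list dif0

* Inner product: a 1x5 matrix times a 5x1 matrix gives a 1x1 matrix
matrix ip = rowvec * colvec
display "inner product = " ip[1,1]
```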
. input maho quymo thunhap
          maho      quymo    thunhap
  1. 101 6 1200
  2. 103 5 1400
  3. 105 5 3200
  4. 107 9 1000
  5. 109 4 2500
  6. end
. mkmat maho quymo thunhap, matrix(A)
. matrix list A
A[5,3]
       maho    quymo  thunhap
r1      101        6     1200
r2      103        5     1400
r3      105        5     3200
r4      107        9     1000
r5      109        4     2500
matrix C = (C+C)/2
matrix D = A*A
Dropping matrices
Matrices and scalars can be removed from memory with the commands:
matrix drop <matrix>
scalar drop <scalar>
Example:
. matrix drop A
. scalar drop B
4. Conditional commands and loops
4.1. The if…else command
Syntax:
if <logical condition> {
Command group 1
}
else Command
Stata evaluates the logical condition (expression). If the condition is true, the commands in Command group 1 are executed; if it is false, the command following else is executed. If else is not specified, Stata continues with the commands that follow the if {} block.
Example:
local a=invnorm(uniform())
if `a'>=0 {
display "So ngau nhien tao ra lon hon hoac bang 0"
}
else di "So ngau nhien tao ra nho hon 0"
macro list _a
Note:
The else branch can also hold a group of commands enclosed in braces:
if <logical condition> {
commands 1
}
else {
commands 2
}
if…else commands can be nested:
}
else if <logical condition> {
...
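A complete nested if…else chain might look like the sketch below. The local macros a and sign, and the fixed value -1.5, are made up for illustration so that the outcome does not depend on a random draw.

```stata
local a = -1.5

if `a' > 0 {
    local sign "positive"
}
else if `a' == 0 {
    local sign "zero"
}
else {
    local sign "negative"
}

display "`a' is `sign'"
```

Each closing brace sits on its own line, and else starts the line that follows it.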
4.2. The while command
Syntax:
while <logical condition> {
Command group
}
Stata evaluates the logical condition (expression). As long as the condition is true, the commands in Command group are executed repeatedly; once the condition is false, they are no longer executed.
Example:
local i=1
while `i'<= 10 {
if mod(`i',2) {
display "`i' is odd"
}
else {
display "`i' is even"
}
local i=`i'+1
}
Note:
A loop can be interrupted by placing the following command inside it:
continue [, break]
When Stata meets continue, it skips the remaining commands in the loop body and returns to the first command of the loop. If the break option is specified, Stata leaves the loop.
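Both forms of continue can be seen in one small loop. This is only a sketch with made-up local macros i and total: it adds up the odd numbers below 10, skipping even numbers with continue and leaving the loop with continue, break. The counter is increased before continue is reached, otherwise the loop would never end.

```stata
local i = 0
local total = 0

while `i' < 100 {
    local i = `i' + 1
    * Even number: skip the rest of the body, go back to the top
    if mod(`i', 2) == 0 {
        continue
    }
    * First odd number above 9: leave the loop altogether
    if `i' > 9 {
        continue, break
    }
    local total = `total' + `i'
}

display "sum of odd numbers below 10 = `total'"
```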
Overall
---------------+--------------------------
Value          |     1380     1920
Note:
If we run the command program define povline again and receive the message:
povline already defined
r(110);
this means that the program povline has already been created. To remove the program, we use the command:
program drop povline
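A common way to avoid the "already defined" error is to drop the program before every definition and to ignore the error raised when it does not exist yet. The sketch below uses a hypothetical program named povdemo with a single made-up argument; it is not the povline program of this chapter.

```stata
* capture suppresses the error if povdemo has not been defined yet
capture program drop povdemo

program define povdemo
    args year
    display "poverty line requested for year `year'"
end

povdemo 1998
```

With this pattern a do-file that defines the program can be run repeatedly without stopping at r(110).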
. which ci
C:\STATA\ado\base\c\ci.ado
*! version 3.3.4 04sep2000
Ado-files are programs defined with the program define command and stored with the extension .ado. Stata looks for ado-files in the following directories:
. sysdir
STATA: C:\STATA\
UPDATES: C:\STATA\ado\updates\
BASE: C:\STATA\ado\base\
SITE: C:\STATA\ado\site\
STBPLUS: c:\ado\stbplus\
PERSONAL: c:\ado\personal\
OLDPLACE: c:\ado\
Example:
We can store the povline command as an ado-file in the folder C:\STATA\ado\base\
The command is then executed whenever we type povline, without first having to run the do-file that defines it.
Exercise: Write the povline command with options for the years 1993, 1998 and 2002.
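Appendix
Basic statistics of a sample that follows the normal distribution
Mean:
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
Variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Standard deviation:
\[ s = \sqrt{s^2} \]
\[ MAD = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \bar{x} \right| \]
Skewness:
\[ \text{Skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3 / n}{s^3} \]
Kurtosis:
\[ \text{Kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4 / n}{s^4} \]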