
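Lecture Notes: Applied Informatics

Instructor: Trần Trung Hiếu, Software Engineering Department (Bộ môn CNPM), Faculty of Information Technology (Khoa CNTT). Email: tthieu@hua.edu.vn. Website: http://fita.hua.edu.vn/tthieu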

Chapter III: Basic Statistics, Correlation & Regression


I. Descriptive Statistics
a. Procedure
b. Interpreting the results

II. Histogram
a. Procedure
b. Interpreting the results

III. Correlation and regression
a. Computing the correlation coefficient
b. Linear regression
c. Nonlinear regression

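Introduction to the normal distribution

1. The normal distribution, also called the Gaussian distribution, is a probability distribution of great importance in many fields. It is a family of distributions that share the same general shape and differ only in the location parameter (the mean $\mu$) and the scale parameter (the variance $\sigma^2$).

Definition: a random variable X has a normal distribution with parameters m (expected value) and $\sigma^2$ (variance) if its density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}, \qquad -\infty < x < \infty.$$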

2. [Figure: graph of the density function of the normal distribution]

[Figure: graph of the distribution function of the normal distribution]
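The two curves named above can be tabulated numerically. The following is a minimal sketch, assuming the standard formulas for the normal density and distribution function; the function names `normal_pdf` and `normal_cdf` are illustrative and do not come from the slides.

```python
import math

def normal_pdf(x, m=0.0, sigma=1.0):
    """Density of the normal distribution with mean m and standard deviation sigma."""
    z = (x - m) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, m=0.0, sigma=1.0):
    """Distribution function P(X <= x), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

# Tabulate both functions for the standard normal case (m = 0, sigma = 1).
for x in (-2, -1, 0, 1, 2):
    print(x, round(normal_pdf(x), 4), round(normal_cdf(x), 4))
```

The density is symmetric about m and peaks there, while the distribution function rises from 0 to 1 and equals 0.5 at x = m.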

I. Descriptive Statistics

1. Example 1, page 23
2. How probability and statistics concepts relate to the Excel terms used in descriptive statistics

Descriptive statistics computes the characteristic numbers of a sample, that is, sample statistics such as the mean, standard deviation, standard error, median, mode, and so on. The input data may be laid out in columns or in rows.

- Mean (average, or expected value): characterizes the average value of the random variable.
- Standard Deviation, Sample Variance: characterize how widely the values of the random variable are spread around the mean.
- Standard Error: the error of the mean.
- Median: the middle value of the data series; in probability theory, the value Me of the random variable X such that P(X < Me) = P(X > Me).
- Mode: the value of the random variable with the highest probability, or the value that occurs most frequently in the sample.
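The quantities listed above can be reproduced outside the spreadsheet. Below is a minimal sketch using Python's standard `statistics` module; the sample values are made up for illustration, and `describe` is a hypothetical helper name.

```python
import math
import statistics

def describe(data):
    """Return the basic descriptive statistics of a sample as a dictionary."""
    n = len(data)
    stdev = statistics.stdev(data)            # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(data),
        "Standard Error": stdev / math.sqrt(n),   # standard error of the mean
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": stdev,
        "Sample Variance": statistics.variance(data),
    }

sample = [2, 4, 4, 5, 7, 8]                   # made-up data, not from the textbook
for name, value in describe(sample).items():
    print(f"{name}: {value}")
```

Note that the standard error is the standard deviation divided by the square root of the sample size, so it shrinks as more observations are collected.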

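I. Descriptive Statistics (continued)

2. How probability and statistics concepts relate to the Excel terms used in descriptive statistics (continued)

- Kurtosis: in probability theory it can be shown that if the random variable X is normally distributed, its kurtosis (as reported by Excel, that is, excess kurtosis) equals 0. Here kurtosis measures whether the density curve of the data is more peaked or flatter than the standard normal density curve (positive: more peaked; negative: flatter). If the value lies in [-2, 2], the data can be considered approximately normal.
- Skewness: known in probability theory as the coefficient of asymmetry. It measures whether the values are distributed symmetrically about the mean. If the values of X are symmetric about the expected value, Skewness = 0; otherwise its sign shows whether the distribution curve is skewed left or right (negative: left; positive: right). If the value lies in [-2, 2], the data can be considered as balanced as in a normal distribution.
- Confidence Level: the half-width of the confidence interval.
  Example: Confidence Level = 95%. In probability terms this corresponds to finding the value ε such that P(m - ε <= X <= m + ε) = 95%, that is, the value ε for which the probability that X falls in the interval [m - ε, m + ε] is 95%.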
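The search for ε can be illustrated for a normally distributed X. This is only a sketch under that assumption: it uses the normal quantile from Python's `statistics.NormalDist`, and `half_width` is a hypothetical name. The spreadsheet reports the half-width for the sample mean, so its figure will generally differ from the one computed here.

```python
from statistics import NormalDist

def half_width(sigma, level=0.95):
    """epsilon such that P(m - eps <= X <= m + eps) = level for X ~ N(m, sigma^2)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # two-sided quantile, about 1.96 for 95%
    return z * sigma

eps = half_width(sigma=1.0)
# Check the defining property with m = 0 and sigma = 1.
coverage = NormalDist().cdf(eps) - NormalDist().cdf(-eps)
print(round(eps, 4), round(coverage, 4))
```

Raising the level widens the interval: a 99% level gives a larger ε than a 95% level for the same sigma.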

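[Figure: density curves with different kurtosis] Kurtosis > 0: the red curve; Kurtosis < 0: the lower blue/green curve; Kurtosis = 0: the middle blue/green curve (normal).

If Kurtosis > 0, the larger the kurtosis, the more peaked the curve. If Kurtosis < 0, the smaller the kurtosis, the flatter the curve.

Skewness > 0 means skewed right; Skewness < 0 means skewed left.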
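To see how the signs of skewness and kurtosis behave, the sketch below computes both with the bias-corrected sample formulas that, to the best of my knowledge, match Excel's SKEW and KURT functions; treat that correspondence as an assumption. The function names and the data are illustrative.

```python
import math

def _mean_and_stdev(data):
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))   # sample standard deviation
    return n, mean, s

def sample_skewness(data):
    """Bias-corrected sample skewness (needs at least 3 values)."""
    n, mean, s = _mean_and_stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

def sample_excess_kurtosis(data):
    """Bias-corrected sample excess kurtosis (needs at least 4 values)."""
    n, mean, s = _mean_and_stdev(data)
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

symmetric = [1, 2, 3, 4, 5]        # made-up data, symmetric about 3
right_tail = [1, 1, 1, 2, 10]      # made-up data with a long right tail
print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_tail))
```

The symmetric sample has zero skewness and a negative kurtosis (flatter than normal), while the sample with a long right tail has positive skewness.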

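II. Histogram

1. Example 2, page 25
2. Counting how often the data fall into equally spaced intervals makes it possible to sketch a frequency chart. Drawing the chart takes two steps: preparation and drawing the histogram.

Preparation:

- Place the data in a single column, a single row, or a rectangular range.
- Find the largest value (Max function) and the smallest value (Min function).
- Compute the range R = Max - Min.
- Choose the number of intervals k for the bin range (in practice k is 20-30; the illustrative examples use k from 6 to 10). k can also be taken from the formula 6*log(n), where n is the number of values of the random variable X, rounded to a nearby integer.
- Compute the step of the bin range, h = R/k, using Round(R/k, number of decimal places).
- Create the column of bin boundaries (Edit -> Fill -> Series; see pages 25 and 20).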
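The preparation steps above can be sketched in a few lines. Two assumptions are made here and are not stated in the slides: the logarithm in 6*log(n) is taken as base 10, and the boundary column starts at Min + h. The name `prepare_bins` and the data are illustrative.

```python
import math

def prepare_bins(data, k=None, decimals=2):
    """Return (R, k, h, edges) following the preparation steps for a histogram."""
    lo, hi = min(data), max(data)
    r = hi - lo                                    # range R = Max - Min
    if k is None:
        # ASSUMPTION: log means the base-10 logarithm
        k = max(1, round(6 * math.log10(len(data))))
    h = round(r / k, decimals)                     # step, like Round(R/k, decimals)
    # ASSUMPTION: the boundary column starts at Min + h and has k entries
    edges = [round(lo + i * h, decimals) for i in range(1, k + 1)]
    return r, k, h, edges

data = [10, 12, 15, 18, 22, 25, 27, 31, 36, 40]    # made-up sample, n = 10
print(prepare_bins(data))
```

Because h is rounded, the last boundary may fall slightly below the maximum; any such values would then be counted in the final "More" row of the histogram output.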

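II. Histogram (continued)

Drawing the histogram:

Choose Tools -> Data Analysis -> Histogram and fill in the fields:

- Input Range: the data range
- Bin Range: the range holding the bin boundaries
- Labels: tick if the first row contains labels
- Output Range: the range for the results
- Pareto: sort the frequencies in the histogram in descending order
- Cumulative Percentage: show the cumulative percentage line
- Chart Output: show the chart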
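The output table of this tool can be approximated as follows. This is a sketch, assuming that a value is counted in the first bin whose boundary is greater than or equal to it and that values above the last boundary go to a "More" row; `histogram_table` is a hypothetical helper, and the data are made up.

```python
from bisect import bisect_left

def histogram_table(data, edges, pareto=False):
    """Return rows (bin label, frequency, cumulative %) for sorted bin boundaries."""
    counts = [0] * (len(edges) + 1)               # extra slot for the "More" row
    for x in data:
        counts[bisect_left(edges, x)] += 1        # index of the first boundary >= x
    labels = [str(e) for e in edges] + ["More"]
    pairs = list(zip(labels, counts))
    if pareto:
        pairs.sort(key=lambda p: p[1], reverse=True)   # descending frequency
    rows, running, total = [], 0, len(data)
    for label, count in pairs:
        running += count
        rows.append((label, count, 100.0 * running / total))
    return rows

data = [10, 12, 15, 18, 22, 25, 27, 31, 36, 40]   # made-up sample
for row in histogram_table(data, [15, 20, 25, 30, 35, 40]):
    print(row)
```

With `pareto=True` the rows are reordered by frequency before the cumulative percentage is accumulated, which mirrors what the Pareto option is described as doing.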

Interpreting the chart

- In which interval do the data occur most often?
- Does the shape of the histogram resemble the density curve of the normal distribution (symmetric, with the peak in the middle, that is, a bell-shaped curve)? If so, conclude that the data may follow the normal law.

Picture of a histogram

[Figure: Excel chart titled "Histogram". Horizontal axis: Bin (10, 15, ..., 55, More); left vertical axis: Frequency (0 to 7); right vertical axis: Cumulative % (0.00% to 120.00%).]

Check whether the tops of the rectangles approximate the density curve of the normal distribution.

Consider the following case:

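III. Correlation and regression

a. Computing the correlation coefficient

- Meaning of the correlation coefficient: (review)
- Excel can compute the simple correlation coefficients between variables.
- Procedure: go to Tools -> Data Analysis -> Correlation and fill in the fields:
  - Input Range: the data range, including labels (recommended)
  - Grouped By: whether the data are grouped by columns or by rows
  - Labels in First Row: tick if the first row (or first column) contains labels
  - Output Range: the output range
- Click OK to finish. The result is a table with n rows and n columns.

Example 3: (textbook)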
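The n-by-n table produced by the Correlation tool holds the pairwise correlation coefficients of the n variables. The sketch below builds such a table with the usual Pearson formula; the function names and the three data columns are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """n x n table of pairwise correlations for a list of n columns."""
    return [[pearson(a, b) for b in columns] for a in columns]

x1 = [1, 2, 3, 4]          # made-up columns
x2 = [2, 4, 6, 8]          # rises exactly with x1
x3 = [4, 3, 2, 1]          # falls exactly as x1 rises
for row in correlation_matrix([x1, x2, x3]):
    print([round(v, 3) for v in row])
```

The diagonal is always 1, the table is symmetric, and values near +1 or -1 indicate a strong linear relationship between the two variables.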

III. Correlation and regression (continued)

b. Linear regression

- Finds the simple linear regression equation y = a*x + b and the multiple linear regression equation y = a1*x1 + a2*x2 + ... + an*xn + b.
- Input data: the independent variables are stored in n columns and the dependent variable y in one column; corresponding values of the independent and dependent variables lie on the same row.
- Example (textbook): find the regression of rice yield y on panicle length, 1000-grain weight, and number of panicles.
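For the simple case y = a*x + b, the least-squares coefficients have a closed form. The following is a minimal sketch with made-up data; `fit_line` is an illustrative name, not something taken from the textbook.

```python
def fit_line(x, y):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx              # slope
    b = my - a * mx            # intercept: the line passes through (mean x, mean y)
    return a, b

x = [1, 2, 3, 4]               # made-up data lying exactly on y = 2x + 1
y = [3, 5, 7, 9]
a, b = fit_line(x, y)
print(a, b)
```

Each pair (x[i], y[i]) corresponds to one row of the worksheet, which is why the independent and dependent values must sit on the same row.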

Procedure: Data -> Analysis -> Regression


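The dialog box appears, with the following fields:

- Input Y Range: the data range of the dependent variable Y
- Input X Range: the data range of the X variables
- Labels: does the first row contain labels?
- Confidence Level: equal to 1 - α, where α is the significance level
- Constant is Zero: when ticked, the intercept b = 0
- Residuals: show the residuals, that is, the differences between the observed y and the y given by the regression
- Residual Plots: show the residual plots
- Line Fit Plots: show the fitted (predicted) line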

Results


Interpreting the results

1. If the multiple correlation coefficient (Multiple R) is about 0.75 or higher, the linear regression model is appropriate.
   Example: Multiple R = 0.8589 -> the linear model is considered appropriate.

2. The coefficient of determination (R Square) gives the share of the variation in y explained by x1, x2, x3. If Adjusted R Square is not close to R Square, not all of the variables included are necessary.
   Example: R Square = 0.7377 means that 73.77% of the variation in y is explained by x1, x2, x3. Adjusted R Square = 66.62% is not close to R Square.
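The relation between these three figures can be checked numerically. The sketch below uses the standard adjustment formula with k = 3 predictors. The sample size n = 15 is an assumption: it is not given in the slides, but it is the value for which the reported numbers agree.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R Square for n observations and k independent variables."""
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - k - 1)

multiple_r = 0.8589
r_square = multiple_r ** 2                 # R Square is the square of Multiple R
n, k = 15, 3                               # ASSUMPTION: n = 15 is inferred, not stated in the slides

print(round(r_square, 4))
print(round(adjusted_r_square(0.7377, n, k), 4))
```

The adjustment penalizes every added variable, so a large gap between R Square and Adjusted R Square suggests that some variables contribute little.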

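Interpreting the results (continued)

1. The observed F = 10.31281 has probability (Significance F) 0.00158, which is smaller than the significance level 0.05, so the linear regression equation is accepted.
2. From the coefficients we can write the predictive regression equation:
   y = -4.06364 + 0.1116x1 + 0.075684x2 + 0.02011x3
   The coefficient of x1 is not reliable because its P-value = 0.093621 > 0.05 (the chosen significance level), so x1 should be removed to obtain a regression equation in which all coefficients are significant.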
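The decision rules above, and the use of the fitted equation for prediction, can be sketched as follows. The F statistic is recomputed from R Square = 0.7377 under the assumption of n = 15 observations and k = 3 predictors; n is not stated in the slides. The input values passed to `predict` are made up, and all function names are illustrative.

```python
ALPHA = 0.05                                   # significance level used in the slides

def f_statistic(r_square, n, k):
    """Overall F statistic of a regression with k predictors and n observations."""
    return (r_square / k) / ((1.0 - r_square) / (n - k - 1))

def is_significant(p_value, alpha=ALPHA):
    """A test is significant when its probability is below the chosen level."""
    return p_value < alpha

def predict(x1, x2, x3):
    """Predicted y from the regression equation reported in the slides."""
    return -4.06364 + 0.1116 * x1 + 0.075684 * x2 + 0.02011 * x3

print(round(f_statistic(0.7377, 15, 3), 3))    # ASSUMPTION: n = 15
print(is_significant(0.00158))                 # Significance F: model accepted
print(is_significant(0.093621))                # P-value of x1: not significant
print(predict(10, 20, 100))                    # made-up input values
```

The same comparison against 0.05 is applied twice: once to the whole equation (Significance F) and once to each coefficient (P-value).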

6. Nonlinear regression

1. Forms of nonlinear regression include exponential, logarithmic, polynomial, and square-root functions, among others.
2. There are two approaches:
   - Transform the model into a multiple linear regression.
   - Draw a chart and find the trendline.


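6.1 Approach 1

1. By a transformation, forms such as exponential, logarithmic, polynomial, and square-root functions are reduced to multiple linear regression.
2. Example:
   A population study, where x is the year and y is the population. Find the nonlinear regression curve in the form of a second-degree polynomial, Y = aX^2 + bX + c. To do so, add a column X2 = X^2, then run a multiple linear regression with independent variables X and X2 and dependent variable Y.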
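The idea of adding a column X2 = X^2 and then running an ordinary multiple regression can be sketched in plain Python. This is an illustrative implementation that solves the normal equations by Gaussian elimination; it is not claimed to be how the spreadsheet computes its result. The data are made up (they lie exactly on Y = 2X^2 + 3X + 1), and the function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]       # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def multiple_regression(columns, y):
    """Least-squares coefficients [a1, ..., an, b] for y = a1*x1 + ... + an*xn + b."""
    rows = [list(vals) + [1.0] for vals in zip(*columns)]   # last column is the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

x = [1, 2, 3, 4, 5, 6]                       # made-up data
y = [2 * v * v + 3 * v + 1 for v in x]
x2 = [v * v for v in x]                      # the added column X2 = X^2
coef_x, coef_x2, intercept = multiple_regression([x, x2], y)
print(round(coef_x2, 6), round(coef_x, 6), round(intercept, 6))   # a, b, c
```

The model is nonlinear in X but linear in the coefficients a, b, c, which is why a linear regression on the columns X and X2 recovers the quadratic.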

Proceed as in section 5.


Interpreting the results

1. The analysis yields the equation:

Y = 0.397435 X^2 + 8.228951 X + 12.96242

6.2 Approach 2

1. Draw an XY (Scatter) chart showing the relationship between y and x as points, then find the trendline and display the regression equation.
2. Students should review the section on charts.
3. After drawing the XY chart, go to the Chart menu -> Add Trendline.


Example


The Add Trendline dialog box

Trendline types: Linear, Logarithmic (ln(x)), Polynomial, Power function, Exponential function (e^x), Moving Average.
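As an illustration of one of these trendline types, an exponential curve y = c*e^(b*x) can be fitted by regressing ln(y) on x. This is a sketch of that common linearization; whether the chart's trendline is computed in exactly this way is an assumption here. The function name and the data are made up.

```python
import math

def fit_exponential(x, y):
    """Fit y = c * exp(b * x) by least squares on ln(y); requires every y > 0."""
    ln_y = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(ln_y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, ln_y))
         / sum((xi - mx) ** 2 for xi in x))      # slope of ln(y) against x
    c = math.exp(my - b * mx)                    # intercept transformed back
    return c, b

x = [0, 1, 2, 3]                                 # made-up data on y = 2 * e^(0.5 x)
y = [2 * math.exp(0.5 * v) for v in x]
c, b = fit_exponential(x, y)
print(round(c, 6), round(b, 6))
```

The same trick, taking logarithms so that the model becomes a straight line, is what connects Approach 2 back to the transformation idea of Approach 1.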


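The Options dialog box

- Set a name for the trendline
- Forecast: extend the line forward in x or backward in x
- Display the equation on the chart
- Display the correlation coefficient value (R-squared) on the chart
- Set the intercept: the value at which the line crosses the y-axis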


Results
