Statistics
Employment down 54,000: government laid off 125,000, private sector gained 67,000.
- The stock market goes up when employment goes down.
Look behind the numbers.
They take a sample and use it to represent the population.
(Receipts + Deficits) = Govt. Exp.
- entitlements
- services (highways, education, etc.)
- discretionary -> defense $600 bill.
y = a + b*other exp

Sample: Data
- Qualitative
- Quantitative
  - Discrete: 0, 1, 2, 3, ... 100
  - Continuous: 2.1

Systematic sampling
- Take every third item, or every nth item from a large section
  Example: beer bottles; every thousandth bottle is tested to make sure all have 12 oz.
Random sampling
- Every member of the universe has an equal probability of being included in the sample

Ex. heights: 5'3'' 5'4'' 5'7'' 5'8'' 5'9'' 5'11'' 6'0'' 6'2''
The average does not have to be a number in the sample.
The class boundaries must be mutually exclusive.

Class interval    Frequency    Midpoint    Frequency x midpoint
5'3''-5'7''       3            5'5''       3 x 5'5'' = 16'3''
5'8''-5'11''      3            5'9''       3 x 5'9'' = 17'3''
6'0'' and up      2            6'1''       2 x 6'1'' = 12'2''
Total             8                        45'8''

45'8''/8 = 5'8.5'' = average height in the class

[Figure: frequency histogram of the three height classes, with midpoints 5'5'', 5'9'', and 6'1'' (6' and more) and the average marked]
Presentation is best when there is a story that is simple and tells exactly what's going on.
Logistic growth curve: y = L / (1 + a*e^(-bt)), with y plotted against time and leveling off at L.
Ex. Polls (sample), +/- 4: Obama 52%, Other 48%.
Pay attention to the numbers: within a margin of +/- 4 these are essentially the same.
Sample statistics: size n, mean x̄
p. 60, exercise 1

x     x - x̄
6     0.6
3     -2.4
5     -0.4
7     1.6
6     0.6

n = 5, Σx = 27, x̄ = 27/5 = 5.4
The deviations from the mean sum to zero: -2.8 + 2.8 = 0
W1: 300 shares @ $20 per share
W2: 400 shares @ $25 per share
W3: 400 shares @ $23 per share
[(W1*X1) + (W2*X2) + (W3*X3)] / (W1 + W2 + W3) = 25,200/1,100 = 22.9091
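The weighted-mean arithmetic above can be checked with a short sketch. The names `shares`, `prices`, and `weighted_mean` are illustrative choices, not part of the notes.

```python
# Weighted mean of share prices, weighted by the number of shares bought.
shares = [300, 400, 400]  # W1, W2, W3
prices = [20, 25, 23]     # X1, X2, X3


def weighted_mean(weights, values):
    """Return sum(w * x) / sum(w)."""
    total = sum(w * x for w, x in zip(weights, values))
    return total / sum(weights)


print(round(weighted_mean(shares, prices), 4))  # 25,200 / 1,100
```

A simple mean of the three prices would ignore that more shares were bought at $25 and $23 than at $20.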
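MEAN: simple or weighted (x̄)
Arithmetic mean = (x1 + x2 + x3)/3
Geometric mean = cube root of (x1*x2*x3)
-> For positive values, the arithmetic mean is never lower than the geometric mean, and it is higher unless all the values are equal.
MEDIAN: the middle value in ordered data (the mid-point)
MODE
[Figure: skewed distribution with the mode, median, and mean x̄ marked in that order]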
Annual equilibrium
2008
Class interval    Frequency (f)    Midpoint (xm)
5.5-10.5          1                8
10.5-15.5         2                13
15.5-20.5         3                18
20.5-25.5         5                23
25.5-30.5         4                28
30.5-35.5         3                33
35.5-40.5         2                38
                  n = 20

Mean = Σ(f*xm)/n = 490/20 = 24.5
Group with the highest frequency = modal class: 20.5-25.5 (f = 5)
f = frequency, xm = midpoint
Dispersion
How far is each value from the mean?
Range: it's best to choose the distribution with the shorter range, because the data is more uniform.
[Figure: number line running from 5.5 to 40.4, with the mean x̄ = 24.5 marked]
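Absolute value (every sign is positive): |x - x̄|
Sample variance: s² = Σ(x - x̄)² / (n - 1)
Population variance: σ² = Σ(x - μ)² / N
Standard deviation: s = √s²
z = (x - x̄)/s
Within 1, 2, and 3 standard deviations of the mean: 68%, 95%, 99.7%
[Figure: normal curve marked at -3s, -2s, -s, +s, +2s, +3s]
s² = [Σ(f*xm²) - (Σ(f*xm))²/n] / (n - 1) = [13310 - (490)²/20]/19 = 68.7, so s = √68.7 ≈ 8.3
How much of the distribution lies beneath each part of the curve?
What is the average (the mean)?
How far are the values spread from it? (standard deviation)
Ch. 1-4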
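As a rough check of the grouped-data figures (490, 13310, 24.5, 68.7), here is a minimal sketch. It assumes the class midpoints are 8, 13, ..., 38, inferred from the class intervals; the variable names are illustrative.

```python
# Grouped-data mean and variance using the shortcut formula
# s^2 = [sum(f*xm^2) - (sum(f*xm))^2 / n] / (n - 1).
freqs = [1, 2, 3, 5, 4, 3, 2]
midpoints = [8, 13, 18, 23, 28, 33, 38]  # assumed: midpoints of 5.5-10.5 ... 35.5-40.5

n = sum(freqs)
sum_fx = sum(f * m for f, m in zip(freqs, midpoints))
sum_fx2 = sum(f * m * m for f, m in zip(freqs, midpoints))

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = variance ** 0.5

print(n, sum_fx, sum_fx2)
print(mean, round(variance, 1), round(std_dev, 2))
```

Dividing by n - 1 rather than n treats the 20 observations as a sample, which matches the 19 in the notes' denominator.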
Probability
The chance that something is going to happen.
Experiment: results in an outcome.
Conditional probability
Classical
Probability of a favorable outcome:
P(E) = (# of favorable outcomes) / (total # of possibilities) = 4/52 (to get an ace)
P(E) = 0.08
P(not E) = 0.92 (the alternative)
P(E) + P(not E) = 1
0 ≤ P(E) ≤ 1
Empirical
P(E) = (# of times the event occurred) / (total # of observations)
Subjective
(Knowledge or authority; there is not necessarily data to support the decision)
P(E) = n(E)/n(S)
1 ace or 1 king: 0.08 + 0.08, i.e. 8/52 = 0.1538
1 ace and 1 king: 0.08 * 0.08 = 0.0064
Coin (2 consecutive heads): H, H: 1/2 * 1/2 = .5 * .5 = 0.25
Efficiency, bias
R, R&B, B
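The card and coin figures above can be reproduced with exact fractions. This is a sketch under two assumptions that the notes do not state outright: "ace or king" refers to a single draw (mutually exclusive events, so probabilities add), and "ace and king" refers to two independent draws with replacement (so probabilities multiply). The notes' 0.0064 comes from rounding 4/52 to 0.08 first; the exact product is about 0.0059.

```python
from fractions import Fraction

p_ace = Fraction(4, 52)
p_king = Fraction(4, 52)

# Addition rule for mutually exclusive events (one draw).
p_ace_or_king = p_ace + p_king

# Multiplication rule for independent events (two draws, with replacement).
p_ace_and_king = p_ace * p_king

# Two consecutive heads with a fair coin.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

# Complement rule: P(E) + P(not E) = 1.
p_not_ace = 1 - p_ace

print(float(p_ace_or_king), float(p_ace_and_king), float(p_two_heads))
```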
22 f 50 n 28 50 = A
A and B
Independent
The probability of selecting A does not affect the probability of selecting B.
K -> 4/52
Q -> 4/52
P(A and B) = P(A) * P(B)
P(K and Q) = (4/52) * (4/52)
Not Independent
Combinations
nCr = n! / [(n-r)! * r!]
30C3 = 30! / (27! * 3!) = (30x29x28)/(3x2x1) = 4,060
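Three coin tosses: 2x2x2 = 8 possible outcomes
HHH HHT HTH HTT THH THT TTH TTT

Mean: μ = Σ[x * P(x)]
Variance: σ² = Σ[(x - μ)² * P(x)]

x (heads)       Outcomes       P(x)           (x - μ)    (x - μ)²
3 (3 h)         HHH            1/8 = 0.125    1.5        2.25
2 (2 h, 1 t)    HHT HTH THH    3/8 = 0.375    0.5        0.25
1 (1 h, 2 t)    HTT THT TTH    3/8 = 0.375    -0.5       0.25
0 (0 h, 3 t)    TTT            1/8 = 0.125    -1.5       2.25

μ = 1.5
σ² = 0.75
σ = 0.87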
Binomial
1. Limited # of trials = n
2. Only 2 possibilities
3. P(success) = p, P(failure) = q, p + q = 1
P(x) = nCx * p^x * q^(n-x)
= 3C2 * 0.5^2 * 0.5^1
= [3!/(2! * 1!)] * 0.25 * 0.5
= 3 * 0.25 * 0.5 = .375
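A small sketch ties the three-coin-toss table to the binomial formula: it lists the 8 outcomes, counts heads, and compares with nCx * p^x * q^(n-x). The function name `binomial_pmf` is an illustrative choice.

```python
from itertools import product
from math import comb


def binomial_pmf(x, n, p):
    """P(x successes in n trials), each with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 3, 0.5

# Enumerate every sequence of 3 tosses and count how many have x heads.
outcomes = list(product("HT", repeat=n))
counts = {x: sum(1 for o in outcomes if o.count("H") == x) for x in range(n + 1)}

# The same probabilities from the binomial formula.
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

# Mean and variance of the discrete distribution.
mean = sum(x * px for x, px in pmf.items())
variance = sum((x - mean) ** 2 * px for x, px in pmf.items())

print(counts)
print(pmf)
print(mean, variance, round(variance ** 0.5, 2))
```

The counts 1, 3, 3, 1 out of 8 match the binomial probabilities 0.125, 0.375, 0.375, 0.125.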
=x n
=mean
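[Figure: normal curve with μ = 2 and σ = 1.4, marked at -3σ, -2σ, -1σ, +1σ, +2σ, +3σ; areas between successive marks: .023, .136, .34, .34, .136, .023]
Standard normal: z = (x - μ)/σ
Ex. z = (2 - 2)/1.4 = 0
z*σ = x - μ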
The closer we are to the mean, the more accurate we are.
If the sample size is 30 or more, the sample means are approximately normally distributed, so repeated samples give nearly the same result.
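[Figure: normal curve with μ = 100 and σ = 15; area .3413 on each side of the mean out to 1 standard deviation]
x = μ + σ*z (here μ = 80, σ = 14)
z:   -3    -2    -1    0     1     2     3
x:   38    52    66    80    94    108   122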
0 42%
1.8
Area between z = -2.48 and z = -0.83:
.4934 - .2967 = 0.1967
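The table lookup for the area between z = -2.48 and z = -0.83 can be cross-checked without a z-table. This sketch uses the standard identity between the normal cumulative probability and `math.erf`; the helper names `phi` and `area_between` are illustrative.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    return phi(z_high) - phi(z_low)


area = area_between(-2.48, -0.83)
print(round(area, 4))

# Empirical-rule check: area within 1 standard deviation of the mean.
print(round(area_between(-1, 1), 4))
```

Both z values are on the same side of the mean, so the two table areas are subtracted rather than added.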
Why a sample rather than the entire population? Sometimes it's not possible to measure everyone, so you take a representative sample and draw a conclusion about the population.
μ - x̄ = bias, sampling error
[Figure: sample means x̄1, x̄2, x̄3, x̄4 scattered around μ]
The means of repeated samples cluster around the population mean, and you always end up with a normal distribution. The result is better the higher the number of samples.
z = (x̄ - μ)/(σ/√n)
# samples
(1)pop
(2) Bias V
k>1
Stratified sampling:
100 150
250
600
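n = n1 + n2 + n3
x̄ is an estimator of μ:
1) Unbiased
2) Consistent: as n increases, bias decreases
3) Efficient: smallest s
Max error: E = z(α/2) * (σ/√n)
E*√n = z(α/2)*σ
√n = z(α/2)*σ / E
Interval: x̄ ± E
[Figure: interval around x̄ from -2.6 to +2.6]
z = (x - μ)/σ, so z*σ = x - μ
z = (x̄ - μ)/(σ/√n), so z*(σ/√n) = x̄ - μ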
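x̄ - z(α/2)*(σ/√n) < μ < x̄ + z(α/2)*(σ/√n)
Average age of students: n = 50, x̄ = 21.2, σ = 2 years
Confidence level = .95, so .025 in each tail and z(α/2) = 1.96
E = 1.96 * (2/√50) ≈ .6
21.2 - .6 = 20.6
21.2 + .6 = 21.8
20.6 < μ < 21.8
[Figure: normal curve with .025 in each tail; axis in years from 20.6 to 21.8, corresponding to z = -1.96 and z = +1.96]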
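The interval for the average age of students can be sketched as follows, assuming σ is known and the critical value 1.96 is supplied by hand from the z-table. The function name `mean_confidence_interval` is an illustrative choice.

```python
from math import sqrt


def mean_confidence_interval(x_bar, sigma, n, z):
    """Return (low, high, margin) for x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


low, high, margin = mean_confidence_interval(x_bar=21.2, sigma=2, n=50, z=1.96)
print(round(margin, 2), round(low, 1), round(high, 1))
```

The unrounded margin is a little under 0.6, which is why the limits round to 20.6 and 21.8.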
Average height of UD students at the .99 confidence level, within one year
σ = 3"
α = .01, α/2 = .005
Area between the mean and the critical value: .5 - .005 = .495
x:    7    9    10    11    13    Σx = 50, x̄ = 10
x1:   8    7    10    11    14    Σx1 = 50
Degrees of freedom: df = n - 1 = 5 - 1 = 4
σ not known, population normally distributed:
- n < 30: t
- n > 30: z
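Hypothesis Testing
Hypothesis: some statement about some population parameter
- Could be non-numerical: "UD students drink beer"; Copernicus' heliocentric model
Null hypothesis: H0: μ = k (350)
Alternate hypothesis: H1: μ ≠ k
Level of significance: α
Confidence level: 1 - α
Test value: x̄ -> z
If the test value falls beyond the critical value, in the rejection zone, H0 is rejected; otherwise do not reject.
[Figure: two-tailed test with α/2 = 0.025 in each rejection zone and a "do not reject" region between; critical value +1.96, x̄ = 371]
One-sided alternative: H1: μ < k
Type II error: not rejecting H0 when H0 is false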
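A medical report states that the average cost of rehabilitation is $24,672. The claim is that the mean equals a value, so the test is two-sided.
Researcher: n = 35, x̄ = 25,226, σ = 3,251, α = 0.01
Is the number significantly different?
H0: μ = 24,672
H1: μ ≠ 24,672
CV = ±2.58
TV = (x̄ - μ)/(σ/√n) = (25,226 - 24,672)/(3,251/√35) = 1.01
1.01 lies between -2.58 and +2.58, so do not reject H0.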
The average starting salary for a nurse is $24,000.
μ = $24,000, n = 10, x̄ = 23,450, s = 400, α = 0.05
H0: μ = 24,000
H1: μ ≠ 24,000
CV: t with df = (n - 1) = 9 and .025 in each tail (.5 - .025 = 0.475): ±2.262
TV = (23,450 - 24,000)/(400/√10) = -4.35
-4.35 is below -2.262, so H0 is rejected.
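Both one-sample examples (rehabilitation cost and nurse salary) use the same test-value formula, TV = (x̄ - μ)/(s/√n). The sketch below computes both and compares them with the critical values quoted in the notes (2.58 and 2.262), which are passed in by hand rather than looked up. The function names are illustrative.

```python
from math import sqrt


def compute_test_value(x_bar, mu0, sd, n):
    """Test value for a one-sample z or t test."""
    return (x_bar - mu0) / (sd / sqrt(n))


def reject_two_sided(tv, cv):
    """Reject H0 when the test value lies beyond +/- the critical value."""
    return abs(tv) > cv


# Rehabilitation cost: sigma known, n = 35, z critical value 2.58.
z_rehab = compute_test_value(25226, 24672, 3251, 35)

# Nurse salary: s from the sample, n = 10, t critical value 2.262 (df = 9).
t_nurse = compute_test_value(23450, 24000, 400, 10)

print(round(z_rehab, 2), reject_two_sided(z_rehab, 2.58))
print(round(t_nurse, 2), reject_two_sided(t_nurse, 2.262))
```

Only the critical value changes between the z and t cases; the test-value arithmetic is the same.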
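Claim: the average salary of an assistant professor is more than $42,000 (one-sided example).
n = 30, x̄ = 43,260, σ = 5,230, α = 0.05
H0: μ ≤ 42,000
H1: μ > 42,000
TV = (43,260 - 42,000)/(5,230/√30) = 1.32
1.32 is below the critical value (1.645 for a one-tailed test at α = 0.05), so do not reject H0.
If H0 is in fact false, not rejecting it is a Type II error.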
H0: μ ≥ 80
H1: μ < 80
TV = -1.56
Quiz answers
n     H0         H1         α      df    CV (t)    TV      Decision
28    μ ≤ 23     μ > 23     .05    27    1.703     4.5     Reject H0
29    μ ≤ 24     μ > 24     .05    28    1.701     1.88    Reject H0
27    μ ≤ 25     μ > 25     .1     26    1.315     1.84    Reject H0
When is H0 true?
P-value: P = P(test value at least this extreme | H0 is true)
TV: z = 1.55 -> area = 0.4394
P = 0.5 - 0.4394 = 0.0606
α = 0.05
α > P: reject H0
P > α: do not reject H0 (here 0.0606 > 0.05, so do not reject)
CV 2.33
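The P-value for a test value of z = 1.55 can be computed directly rather than read from a table. A minimal sketch, assuming a right-tailed test; `upper_tail_p` is an illustrative helper name.

```python
from math import erf, sqrt


def upper_tail_p(z):
    """P(Z > z) for a standard normal variable (right-tailed P-value)."""
    return 0.5 * (1 - erf(z / sqrt(2)))


alpha = 0.05
p_value = upper_tail_p(1.55)

# Decision rule: reject H0 only when the P-value is smaller than alpha.
reject_h0 = p_value < alpha
print(round(p_value, 4), reject_h0)
```

For a two-sided test the tail area would be doubled before comparing with α.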
The Project:
Y = a + b*X
Select any two series where the independent variable impacts the dependent variable. Examples:
- Qd = a - b*Price
- Infl = a + b*M1
- Housing starts = a - b*Mortgage rate
Data source: Bureau of Labor Statistics
Data pairs (Y1, X1), (Y2, X2), ... must be stationary (within a 1-year timeframe), e.g. cars sold in 1 yr.
DON'T DO A TIME SERIES
Simple linear regression in Excel
ajadhav@udallas.edu
[Figure: scatter plot with axis ticks 15k, 16k, 17k, 18k, 19k, 20k]
Hypothesis: mortgage rates affect housing starts
Data source:
Find two variables, cause and effect: number of classes missed and grade achieved
Interest rate goes up, borrowing goes down
For one specific year
One line saying what I'm trying to relate
Give the source of the data
Appendix A

Testing how close our sample mean is to the population mean
Compare two sample means: the samples are independent of each other, and the populations are normally distributed
Ex. your IQ before stat class and after
H0: μ1 = μ2 or μ1 - μ2 = 0
H1: μ1 ≠ μ2 or μ1 - μ2 ≠ 0
σ(x̄1 - x̄2) = √(σ1²/n1 + σ2²/n2)
Minimum Pairs
Price of car
The average price of a hotel room:
Dallas = $88.42, n1 = 50, s1 = $5.62
Denton = $80.61, n2 = 50, s2 = $4.83
α = 0.05, CV = ±1.96
Test value = (88.42 - 80.61) / √(5.62²/50 + 4.83²/50) = 7.45
[Figure: two-tailed test with critical values -1.96 and +1.96; the test value 7.45 falls in the rejection zone]

# of sports for boys = 8.6, # of sports for girls = 7.9
n1 = 50, n2 = 50, s1 = 3.3, s2 = 3.3, α = 0.10
H0: μ1 ≤ μ2
H1: μ1 > μ2
z = (8.6 - 7.9) / √(3.3²/50 + 3.3²/50) = 1.06 -> area = 0.3554
P = .5 - .3554 = .1446
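Both two-sample examples follow the same formula, z = (x̄1 - x̄2)/√(s1²/n1 + s2²/n2). The sketch below reproduces the two test values, taking the Denton standard deviation as 4.83, the figure that appears in the test-value line and reproduces 7.45. The function name `two_sample_z` is illustrative.

```python
from math import sqrt


def two_sample_z(x1, x2, s1, s2, n1, n2):
    """Test value for the difference between two independent sample means."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1 - x2) / standard_error


# Hotel rooms: Dallas vs. Denton (two-sided, critical values +/-1.96).
z_hotel = two_sample_z(88.42, 80.61, 5.62, 4.83, 50, 50)

# Sports played: boys vs. girls (right-tailed).
z_sports = two_sample_z(8.6, 7.9, 3.3, 3.3, 50, 50)

print(round(z_hotel, 2), abs(z_hotel) > 1.96)
print(round(z_sports, 2))
```

The standard deviations are squared inside the square root; dividing s by n directly would give a different and incorrect standard error.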
Find the P-value.
In the P-value method you always contest the alternate hypothesis.
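(x̄1 - x̄2) - z(α/2)*√(σ1²/n1 + σ2²/n2) < μ1 - μ2 < (x̄1 - x̄2) + z(α/2)*√(σ1²/n1 + σ2²/n2)
Either s1 ≠ s2 or s1 = s2. When s1 = s2 (pooled variance):
1) sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)
2) t = (x̄1 - x̄2) / [sp * √(1/n1 + 1/n2)]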
[Figure: chi-square distribution curves for df = 1, df = 9, and df = 49]
Confidence interval for the variance (chi-square):
α = 0.1, confidence = 0.9, n = 25, df = 24
Find right: α/2 = 0.1/2 = 0.05 -> 36.415
Find left: 1 - α/2 = 0.95 -> 13.848
(n-1)s²/χ²(right) < σ² < (n-1)s²/χ²(left)
Smaller < 2.56 < larger
Take square roots for the standard deviation.

Test: study through the probabilities, ch. 8, 9, 10, 11
Many samples: central limit theorem
- As n increases, the bias goes down
2 errors:
1- Bias (x̄)
2- Efficiency (s)
z = (x̄ - μ)/(σ/√n)
z = area under half the curve
t = series of distributions; depends on degrees of freedom (n - 1)

σ known:
- n > 30: z
- n < 30, population normally distributed: z
σ not known:
- n > 30: z
- n < 30, population normally distributed: t

How we resolve the hypothesis:
- = (two-sided)
- < or > (one-sided)
Whatever the claim is becomes the alternate hypothesis
Type 1 error
Type 2 error
P-value method:
x̄1 - x̄2 = 0; s1, s2
Mean of (x̄1 - x̄2) = μ1 - μ2
Standard deviation of (x̄1 - x̄2) = √(σ1²/n1 + σ2²/n2)
Left-tailed test: the rejection zone is in the left tail.
H0: μ1 ≥ μ2
H1: μ1 < μ2
71
95% confidence interval for the variance
Standard deviation s = 1.6 mg, α = 0.05, n = 19
(n-1)*s² = 18 * 1.6² = 46.08
46.08/32.852 < σ² < 46.08/8.907
1.40 < σ² < 5.17
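The variance interval can be sketched as below. The chi-square critical values 32.852 and 8.907 are copied from the notes as given and passed in by hand; the function name `variance_interval` is an illustrative choice.

```python
def variance_interval(n, s, chi2_right, chi2_left):
    """Confidence interval for the population variance.

    (n-1)s^2 / chi2_right < sigma^2 < (n-1)s^2 / chi2_left
    """
    numerator = (n - 1) * s ** 2
    return numerator / chi2_right, numerator / chi2_left


var_low, var_high = variance_interval(n=19, s=1.6, chi2_right=32.852, chi2_left=8.907)

# Square roots give the interval for the standard deviation.
sd_low, sd_high = var_low ** 0.5, var_high ** 0.5

print(round(var_low, 2), round(var_high, 2))
print(round(sd_low, 2), round(sd_high, 2))
```

The larger (right-tail) chi-square value goes under the lower limit, because dividing by a larger number gives the smaller bound.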
The F value is used when we are comparing the variances of two different samples. We find a range around each variance.
- Independent: one sample has no effect on the other sample
- Both populations are normal, so assume that they are normally distributed
- The variances of the populations are critical
n = size of sample
x̄ = mean
s = standard deviation
s² in one sample vs. s² in another sample
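F = s1²/s2² ≈ 1 when the variances are equal; the rejection zone is on the right.
As n increases, the distribution around the mean becomes more and more unified.
[Figure: F distribution, positively skewed, starting at 0 with 1 marked on the axis]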
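Critical value of the F distribution
A medical researcher wants to see whether the variances in heart rates of smokers and non-smokers are different.
Smokers: n1 = 26, s1² = 36
Non-smokers: n2 = 18, s2² = 10
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
α = 0.1, α/2 = 0.05
F = 36/10 = 3.6
CV = 2.19
[Figure: F distribution with the "do not reject" region below 2.19 and the test value 3.6 beyond it]
3.6 > 2.19, so reject H0.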
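The F test value for the smokers example is a single ratio of sample variances. A minimal sketch, with the critical value 2.19 taken from the notes rather than computed; `f_ratio` is an illustrative name.

```python
def f_ratio(var_a, var_b):
    """F test value: the larger sample variance over the smaller one."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return larger / smaller


f_value = f_ratio(36, 10)   # variances for smokers and non-smokers
critical_value = 2.19       # quoted in the notes for alpha/2 = 0.05

reject_h0 = f_value > critical_value
print(f_value, reject_h0)
```

Putting the larger variance on top keeps F at or above 1, so only the right-tail critical value is needed.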
Project: plug the data into Excel, then select the scatter function.
Completely inelastic
We want negative correlation or positive correlation:
r = Σ(x - x̄)(y - ȳ) / [(n-1)*sx*sy]
  = [n*Σ(xy) - (Σx)(Σy)] / √{[n*Σ(x²) - (Σx)²] * [n*Σ(y²) - (Σy)²]}
r runs from -1 to +1, with 0 meaning no linear correlation
H0: ρ = 0
H1: ρ ≠ 0
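x      y      x²
6      ...    36
2      ...    4
15     ...    225
9      74     81
12     58     144
5      90     25
8      78     64

Σx = 57, Σy = 511, Σxy = 3745, Σx² = 579
[Figure: scatter plot of y against x; x axis 0 to 15, y axis 0 to 50 and above]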
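The computational formula for r can be sketched as a small function. The data below are hypothetical, chosen only so the result is easy to verify by hand; they are not the values from the notes, whose y column is incomplete. The function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient via
    r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2])."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: y falls by 2 for every unit increase in x.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]

print(pearson_r(xs, ys))  # perfect negative linear relationship
```

Only the five sums and n are needed, which is why the notes tabulate x² and the column totals.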