Poisson Distribution
Fitting the Poisson Distribution to Data
The Poisson distribution is often used for modelling counts. A classic example is the number of deaths by horse kicks in 14 corps of the Prussian Army from 1875 to 1894, which is available in the file:
HorseKick.txt
The probability function for the Poisson distribution with parameter $\lambda$ may be written,

$$p_k = \frac{e^{-\lambda} \lambda^k}{k!}$$

for $k = 0, 1, \ldots$. In general the parameter $\lambda$ can be estimated by the sample mean of the data.
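As a quick check on the formula, the sketch below evaluates $p_k$ directly and compares it with the built-in Poisson density. It is written for R; it should carry over to S-Plus, but that has not been verified here. The count vector is hypothetical and was invented for illustration only.

```r
# Hypothetical counts, invented for illustration (not the horse-kick data)
counts <- c(0, 1, 0, 2, 1, 0, 0, 3, 1, 0)

# Estimate lambda by the sample mean
lambda.hat <- mean(counts)

# Poisson probabilities p_k = exp(-lambda) * lambda^k / k! for k = 0, ..., 5
k <- 0:5
p.formula <- exp(-lambda.hat) * lambda.hat^k / factorial(k)

# The built-in density dpois should agree with the formula
p.builtin <- dpois(k, lambda.hat)
print(round(cbind(k, p.formula, p.builtin), 4))
```

The two probability columns should match, which confirms that the formula above is the one implemented by dpois.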
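A dispersion test can be used to check whether the data are consistent with a Poisson distribution, for which the variance equals the mean. If the data $x_1, \ldots, x_n$ were generated by a Poisson distribution then the statistic

$$D = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\bar{x}},$$

where $\bar{x}$ is the sample mean, has approximately a $\chi^2$-distribution with $n - 1$ degrees of freedom. The p-value under a one-sided upper tail test is given by $p = 1 - F(D)$, where $F(D)$ is the value of the cumulative distribution function of the $\chi^2$-distribution with $n - 1$ degrees of freedom. The cumulative distribution function for the $\chi^2$-distribution is available in S-Plus as the function pchisq.

An upper tail test is often used in practice because, with actual data, the variance is frequently significantly larger than the mean. This phenomenon is known as overdispersion and has been researched for over 150 years.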
DispersionTests.nb
> x <- scan("n:/259b/data/horsekick.txt")
> mu <- mean(x)
> mu
[1] 0.7
> D <- sum((x - mu)^2)/mu
> df <- length(x) - 1
> pval <- 1 - pchisq(D, df)
> ans <- c(D, df, pval)
> names(ans) <- c("D", "df", "p-value")
> ans
   D  df   p-value
 304 279 0.1454152
We conclude that there is no evidence against the null hypothesis that the true distribution is Poisson with a mean of about 0.7 deaths per year per Corps.
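The calculation above can be wrapped in a small reusable function. The following is a minimal sketch in R; the function name poisson.dispersion.test is hypothetical. Since HorseKick.txt is not reproduced here, the data are rebuilt on the assumption that the file tabulates to the widely quoted frequencies for this data set (144, 91, 32, 11 and 2 corps-years with 0, 1, 2, 3 and 4 deaths).

```r
# Poisson dispersion test (hypothetical helper name).
# D = sum((x - mean)^2) / mean, with an upper-tail p-value taken from the
# chi-squared distribution with n - 1 degrees of freedom.
poisson.dispersion.test <- function(x) {
  mu <- mean(x)
  D <- sum((x - mu)^2) / mu
  df <- length(x) - 1
  pval <- pchisq(D, df, lower.tail = FALSE)  # same as 1 - pchisq(D, df)
  c(D = D, df = df, "p-value" = pval)
}

# ASSUMPTION: these frequencies for 0, 1, 2, 3, 4 deaths are the commonly
# quoted tabulation of the horse-kick data, not read from HorseKick.txt.
deaths <- rep(0:4, c(144, 91, 32, 11, 2))

ans <- poisson.dispersion.test(deaths)
print(ans)
```

If the assumed tabulation matches the file, the result should agree with the session output above.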
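A similar dispersion test is available for the binomial distribution. Suppose each observation $x_i$, $i = 1, \ldots, n$, counts the number of successes in $N$ trials. The test statistic is

$$D = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{N \hat{p} (1 - \hat{p})},$$

where $\bar{x}$ denotes the sample mean and $\hat{p} = \bar{x}/N$. If the data were generated by a binomial distribution then $D$ has approximately a $\chi^2$-distribution with $n - 1$ degrees of freedom. Large values of $D$ suggest that the data have a dispersion, or variance, greater than that given by the binomial distribution. In practice, an upper one-tailed test is normally done.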
Number of 5s or 6s    Observed Frequency
 0                      185
 1                     1149
 2                     3265
 3                     5475
 4                     6114
 5                     5194
 6                     3067
 7                     1331
 8                      403
 9                      105
10                       14
11                        4
12                        0
Source: R.A. Fisher (1925), Statistical Methods for Research Workers, p.64.

For convenience, the second column of the above table is available in the file: n:/259b/data/Weldon.txt

Input these observed frequencies. Compute $\hat{p}$ and compute the Fisher Dispersion Test for testing if this data was generated by a binomial distribution. Test if the dice appear to be fair, that is, if the probability of 5 or 6 is equal to $p = 1/3$. A script file that generates the output, WeldonBinomialDispersionTest.ssc, is available on our course homepage.
Solution

> weldon <- scan("n:/259B/data/weldon.txt")
> x <- rep(0:12, weldon)
> p.hat <- mean(x)/12
> p.hat
[1] 0.3376986
> FisherD <- sum((x - mean(x))^2)/(p.hat * (1 - p.hat) * 12)
> df <- length(x) - 1
> pvalue <- 1 - pchisq(FisherD, df)
> ans <- c(FisherD, df, pvalue)
> names(ans) <- c("Fisher D", "df", "p-value")
> ans
 Fisher D    df   p-value
 26445.78 26305 0.2690794
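The exercise also asks whether the dice appear to be fair. One possible approach, not necessarily the one intended in the course script, is a large-sample z-test of $p = 1/3$ that pools all individual dice. The sketch below is written for R and assumes that the dice are independent with a common success probability; the variable names throws, successes, trials and p0 are introduced here for illustration.

```r
# Observed frequencies for 0, 1, ..., 12 successes, copied from the table
weldon <- c(185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 14, 4, 0)

N <- 12                             # dice per throw
throws <- sum(weldon)               # number of throws
successes <- sum((0:N) * weldon)    # total number of 5s and 6s
trials <- N * throws                # total number of individual dice

# ASSUMPTION: independent dice with common success probability,
# tested against the fair-dice value p0 = 1/3.
p0 <- 1/3
p.hat <- successes / trials
z <- (p.hat - p0) / sqrt(p0 * (1 - p0) / trials)
pval <- 2 * pnorm(-abs(z))          # two-sided p-value

print(c("p.hat" = p.hat, "z" = z, "p-value" = pval))
```

Under these assumptions z comes out at about 5.2, so the hypothesis $p = 1/3$ would be rejected even though the dispersion test finds no evidence against a binomial distribution with estimated $\hat{p}$.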
Remarks:
1. Note how using a vector form of the rep function avoids using a for loop.
2. Notice that the table function gives you back the frequency tabulations. That is:
> table(x)
    0    1    2    3    4    5    6    7    8    9   10   11
  185 1149 3265 5475 6114 5194 3067 1331  403  105   14    4
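Both remarks can be illustrated on a small scale. The sketch below, written for R, uses a hypothetical frequency vector to show that the vector form of rep gives the same result as an explicit loop, and that table omits values whose frequency is zero, which would explain why 12 does not appear in the output above.

```r
# Hypothetical frequencies for the values 0, 1, 2 (illustration only)
freq <- c(2, 0, 3)

# Vector form of rep: repeat each value according to its frequency
x.vec <- rep(0:2, freq)

# Equivalent explicit loop, shown only for comparison
x.loop <- numeric(0)
for (i in seq(along = freq)) {
  x.loop <- c(x.loop, rep(i - 1, freq[i]))
}
print(all(x.vec == x.loop))

# table recovers the frequencies but drops the value 1, which has count zero
tab <- table(x.vec)
print(tab)
```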