Dispersion Tests

Dispersion Tests for Poisson and Binomial
Poisson Distribution
Fitting the Poisson Distribution to Data
The Poisson distribution is often used for modelling counts. A classic example is the number of deaths by horse kick in 14 corps of the Prussian Army over the years 1875-1894, which is available in the file:
HorseKick.txt
The probability function for the Poisson distribution with parameter $\lambda$ may be written,
$$p_k = e^{-\lambda} \frac{\lambda^k}{k!}$$
for $k = 0, 1, \ldots$. In general the parameter $\lambda$ can be estimated by the sample mean of the data.
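As a minimal sketch of the fitting step, the following S-Plus/R-style code estimates the Poisson parameter by the sample mean and evaluates the fitted probabilities. The vector x.demo is a small hypothetical sample made up for illustration (it is not the horse kick data); dpois is the built-in Poisson probability function.

```r
# Hypothetical sample of counts, made up for illustration only
x.demo <- c(0, 1, 0, 2, 1, 0, 0, 3, 1, 0)

# Estimate the Poisson parameter by the sample mean
lambda.hat <- mean(x.demo)

# Fitted probabilities p_k for k = 0, ..., 4, written out from the formula
k <- 0:4
p.formula <- exp(-lambda.hat) * lambda.hat^k / factorial(k)

# The built-in Poisson probability function gives the same values
p.builtin <- dpois(k, lambda.hat)

# Expected counts under the fitted model, for comparison with the data
expected <- length(x.demo) * p.builtin
```

Comparing expected with the observed frequencies from table(x.demo) gives an informal first look at the fit before any formal test.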
Testing the Model Adequacy

The Poisson dispersion test provides the most powerful omnibus method for testing the adequacy of the fitted Poisson distribution. The statistic for this test is based on the fact that, under the Poisson distribution, the mean and variance are equal. Furthermore, R.A. Fisher showed that if the data $X_1, \ldots, X_n$ are generated by a Poisson distribution with some parameter $\lambda$, then the statistic,
$$D = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\bar{X}}$$
where $\bar{X}$ is the sample mean, has approximately a $\chi^2$-distribution with $n - 1$ degrees of freedom. The p-value for a one-sided upper-tail test is given by $p = 1 - F(D)$, where $F(D)$ is the value of the cumulative distribution function of the $\chi^2$-distribution with $n - 1$ degrees of freedom. This cumulative distribution function is available in S-Plus as the function pchisq. An upper-tail test is often used in practice because, with actual data, the variance is frequently significantly larger than the mean. This phenomenon is known as overdispersion and has been researched for over 150 years.
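The steps just described can be collected into a small function. This is only a sketch in S-Plus/R-style code: the name poisson.dispersion.test and the sample x.demo are hypothetical and introduced here for illustration, while pchisq is the built-in function mentioned above.

```r
# Poisson dispersion test (hypothetical helper name, not from the handout)
poisson.dispersion.test <- function(x) {
  xbar <- mean(x)                    # estimate of the Poisson mean
  D <- sum((x - xbar)^2) / xbar      # dispersion statistic
  df <- length(x) - 1                # n - 1 degrees of freedom
  pval <- 1 - pchisq(D, df)          # upper-tail p-value
  c(D = D, df = df, "p-value" = pval)
}

# Hypothetical counts, made up for illustration only
x.demo <- c(0, 1, 0, 2, 1, 0, 0, 3, 1, 0)
res <- poisson.dispersion.test(x.demo)
res
```

For this sample the mean is 0.8 and the sum of squared deviations is 9.6, so D = 12 on 9 degrees of freedom.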
Application to Horse Kick Data

A script file to generate the output, HorseKickDispersionTest.ssc, is available on our course homepage.
> x <- scan("n:/259b/data/horsekick.txt")
> mu <- mean(x)
> mu
[1] 0.7
> D <- sum((x - mu)^2)/mu
> df <- length(x) - 1
> pval <- 1 - pchisq(D, df)
> ans <- c(D, df, pval)
> names(ans) <- c("D", "df", "p-value")
> ans
   D  df   p-value
 304 279 0.1454152
We conclude that there is no evidence against the null hypothesis that the true distribution is Poisson with a mean of about 0.7 deaths per year per Corps.
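Readers without access to the data file can rebuild the observations from a frequency table. The sketch below assumes the frequency table commonly quoted for the horse kick data (144, 91, 32, 11 and 2 corps-years with 0, 1, 2, 3 and 4 deaths). That table is an assumption: it is not reproduced in this handout, although it is consistent with the 280 observations and the mean of 0.7 reported above.

```r
# ASSUMPTION: frequency table as commonly quoted for the horse kick data;
# it is not reproduced in this handout.
deaths <- 0:4
freq <- c(144, 91, 32, 11, 2)   # corps-years with 0, 1, 2, 3, 4 deaths

x <- rep(deaths, freq)          # expand to one value per corps-year
mu <- mean(x)                   # sample mean
D <- sum((x - mu)^2) / mu       # Poisson dispersion statistic
df <- length(x) - 1
pval <- 1 - pchisq(D, df)       # upper-tail p-value
c(D = D, df = df, "p-value" = pval)
```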
Binomial Dispersion Test

Let $x_1, \ldots, x_n$ be a random sample. To test whether the data are from a binomial distribution with parameters $N$ and $p$, Fisher recommended the binomial dispersion statistic,
$$D = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{N \hat{p} (1 - \hat{p})}$$
where $\bar{x}$ denotes the sample mean and $\hat{p} = \bar{x}/N$. If the data were generated by a binomial distribution then $D$ has approximately a $\chi^2$-distribution with $n - 1$ degrees of freedom. Large values of $D$ suggest that the data have a dispersion or variance greater than that given by the binomial distribution. In practice, an upper one-tailed test is normally done.
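The binomial version can be sketched in the same way. The function name binomial.dispersion.test and the sample x.demo are hypothetical and used only for illustration; each value in x.demo is taken to be a count of successes out of N = 12 trials.

```r
# Binomial dispersion test (hypothetical helper name, not from the handout)
# x: observed counts of successes; N: number of trials behind each count
binomial.dispersion.test <- function(x, N) {
  xbar <- mean(x)                                     # sample mean
  p.hat <- xbar / N                                   # estimate of p
  D <- sum((x - xbar)^2) / (N * p.hat * (1 - p.hat))  # dispersion statistic
  df <- length(x) - 1                                 # n - 1 degrees of freedom
  pval <- 1 - pchisq(D, df)                           # upper-tail p-value
  c(D = D, df = df, "p-value" = pval)
}

# Hypothetical counts out of N = 12 trials each, made up for illustration
x.demo <- c(3, 5, 4, 6, 2, 4, 5, 3, 4, 4)
res <- binomial.dispersion.test(x.demo, N = 12)
res
```

Here the sample mean is 4, so the estimate of p is 1/3 and D = 12 / (12 * 1/3 * 2/3) = 4.5 on 9 degrees of freedom, which gives no sign of overdispersion.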
Weldon Dice Data

W.F.R. Weldon (1860-1906) reportedly threw 12 dice a total of 26,306 times. The observed frequencies for the number of dice showing a 5 or 6 are shown in the following table:
Number of Dice With 5 or 6    Observed Frequency
             0                        185
             1                       1149
             2                       3265
             3                       5475
             4                       6114
             5                       5194
             6                       3067
             7                       1331
             8                        403
             9                        105
            10                         14
            11                          4
            12                          0
Source: R.A. Fisher (1925), Statistical Methods for Research Workers, p.64.
For convenience, the second column of the above table is available in the file: n:/259b/data/Weldon.txt
Input these observed frequencies. Compute $\hat{p}$ and compute the Fisher Dispersion Test for testing whether these data were generated by a binomial distribution. Test if the dice appear to be fair, that is, if the probability of a 5 or 6 is equal to $p = 1/3$. A script file to generate the output, WeldonBinomialDispersionTest.ssc, is available on our course homepage.
> weldon <- scan("n:/259B/data/weldon.txt")
> x <- rep(0:12, weldon)
> p.hat <- mean(x)/12
> p.hat
[1] 0.3376986
> FisherD <- sum((x - mean(x))^2)/(p.hat * (1 - p.hat) * 12)
> df <- length(x) - 1
> pvalue <- 1 - pchisq(FisherD, df)
> ans <- c(FisherD, df, pvalue)
> names(ans) <- c("Fisher D", "df", "p-value")
> ans
 Fisher D    df   p-value
 26445.78 26305 0.2690794
Solution
Remarks:
1. Note how using a vector form of the rep function avoids using a for loop.
2. Notice that the table function gives you back the frequency tabulations. That is:
> table(x)
   0    1    2    3    4    5    6    7    8    9   10   11
 185 1149 3265 5475 6114 5194 3067 1331  403  105   14    4
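The exercise also asks whether the dice appear to be fair, which the dispersion test does not address. One possible approach, sketched below, is a large-sample z-test of the hypothesis that the probability of a 5 or 6 is 1/3. The choice of a z-test is an assumption made here for illustration (the handout does not say which test is intended); the frequencies are those given in the table of observed frequencies.

```r
# Observed frequencies from the table (0 to 12 dice showing a 5 or 6)
weldon <- c(185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 14, 4, 0)

successes <- sum((0:12) * weldon)   # total number of dice showing a 5 or 6
trials <- 12 * sum(weldon)          # total number of dice thrown
p0 <- 1/3                           # hypothesised probability for fair dice

# ASSUMPTION: large-sample z-test for a single proportion
p.hat <- successes / trials
z <- (p.hat - p0) / sqrt(p0 * (1 - p0) / trials)
pval <- 2 * (1 - pnorm(abs(z)))     # two-sided p-value
c("p.hat" = p.hat, z = z, "p-value" = pval)
```

With 106,602 successes in 315,672 dice the statistic z is a little above 5, so the hypothesis of fair dice is rejected even though the dispersion test finds no evidence against the binomial form. The two tests answer different questions: one checks the value of p, the other checks the shape of the distribution.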
