
Algorithm AS 171: Fisher's Exact Variance Test for the Poisson Distribution

Author(s): E. L. Frome
Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 31, No. 1
(1982), pp. 67-71
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2347079 .


Algorithm AS 171

Fisher's Exact Variance Test for the Poisson Distribution

By E. L. FROME†
Medical and Health Sciences Division, Oak Ridge Associated Universities, Oak Ridge, Tennessee, USA
[Received January 1980. Final revision December 1980]

Keywords: INDEX OF DISPERSION; HOMOGENEITY OF VARIANCE; CHI-SQUARE


LANGUAGE

Fortran 66

DESCRIPTION AND PURPOSE

Let $y_1, y_2, \ldots, y_n$ denote $n$ observations from a Poisson population, and let $f_j$, $j = 0, \ldots, k$, denote the frequency with which the number $j$ occurs in the given sample. (Note that $k = \max\{y_i, i = 1, \ldots, n\}$.) It is well known that the sample total $T = \sum_{i=1}^{n} y_i$ is sufficient for the Poisson parameter and that the index of dispersion

$$D = \sum_{i=1}^{n} (y_i - \bar{y})^2 / \bar{y}$$

is approximately distributed as a $\chi^2$ statistic with $(n - 1)$ degrees of freedom when $n$ is large and $\bar{y} = T/n$ is over 3 (Rao and Chakravarti, 1956). The index of dispersion is widely used to test for homogeneity of the observations and is referred to as the variance test for homogeneity. In many situations of practical interest, the sampling distribution of $D$ is not accurately given by the $\chi^2$ approximation, and Fisher (1950) proposed an exact test which is based on the conditional distribution

$$P(f_0, f_1, \ldots \mid T) = C \{ [f_0!\, f_1!\, f_2! \cdots] [(2!)^{f_2} (3!)^{f_3} \cdots] \}^{-1}, \qquad (1)$$

where $C = T!\, n! / n^T$. The index of dispersion depends on $\sum y^2 = \sum j^2 f_j$, and if $S$ denotes the value obtained for a given value of $T$ and $n$, then the desired probability is obtained by summing (1) over all possible samples with $\sum y^2$ less than $S$ and subtracting the total from 1. That is, conditional on $T$, we compute the probability of obtaining a value of the index of dispersion that is greater than or equal to the observed value, if, in fact, the observations are from the same Poisson population.
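As a small illustration of the two quantities just defined, the following Python sketch evaluates the index of dispersion $D$ and the conditional probability (1) for a single sample. It is not part of the published Fortran routine: the function names `index_of_dispersion` and `conditional_probability` are hypothetical, and working on the log scale with `math.lgamma` is an assumption made here for numerical convenience.

```python
from collections import Counter
from math import exp, lgamma, log


def index_of_dispersion(sample):
    """D = sum((y_i - ybar)^2) / ybar for a sample of non-negative counts."""
    n = len(sample)
    ybar = sum(sample) / n
    return sum((y - ybar) ** 2 for y in sample) / ybar


def conditional_probability(sample):
    """Equation (1): probability of the frequency pattern of `sample`, given T."""
    n, total = len(sample), sum(sample)
    freq = Counter(sample)  # freq[j] is f_j, the number of observations equal to j
    # log C = log T! + log n! - T log n
    log_p = lgamma(total + 1) + lgamma(n + 1) - total * log(n)
    for j, f_j in freq.items():
        log_p -= lgamma(f_j + 1)      # divide by f_j!
        log_p -= f_j * lgamma(j + 1)  # divide by (j!)^(f_j)
    return exp(log_p)


if __name__ == "__main__":
    sample = [2, 0]
    print(index_of_dispersion(sample), conditional_probability(sample))
```

For the sample (2, 0), with $n = 2$ and $T = 2$, this gives $D = 2$ and a conditional probability of about 0.5, which agrees with a direct count: two of the four equally likely ways of allocating two events to two observations place both events in the same observation.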

NUMERICAL METHOD

Partitions of the sample total $T$ are generated using a modified version of an algorithm given by Lehmer (Section 1.8, 1964). A partition of $T$ is defined by $T = a_1 + a_2 + \ldots + a_m$, where the $a_i$'s are positive integers, $a_1 \le a_2 \le \ldots \le a_m$, and $m$ is the number of "parts". We begin with $m = \min(n, T)$, and then generate all possible $m$-part partitions in dictionary order as described by Lehmer. Then, if $\sum a_i^2$ is less than the observed value $S$, the multinomial probability (1) is evaluated. Note that if $a_1^*, a_2^*, \ldots, a_m^*$ is the "last" $m$-part partition and $S^* = \sum_{i=1}^{m} a_i^{*2}$, then $S^*$ is less than $\sum a_i^2$ for every other $m$-part partition, and $S^*$ is less than $\sum_{i=1}^{k} b_i^2$, where $b_1, \ldots, b_k$ is a $k$-part partition with $k < m$. Consequently, if $S < S^*$, then the sum of squares for all partitions with less than $m$ parts will exceed $S$, and no further computation is required. The desired probability is obtained by summing the probabilities for each "admissible" partition and subtracting this total from 1.
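The summation described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published method: it enumerates partitions with a plain recursive generator instead of Lehmer's dictionary-order algorithm, it omits the early stop based on $S^*$, and the names `partitions`, `log_prob` and `fisher_exact_variance_test` are hypothetical.

```python
from collections import Counter
from math import exp, lgamma, log


def partitions(total, max_parts, max_value=None):
    """Yield partitions of `total` into at most `max_parts` positive parts,
    as non-increasing tuples (assumed helper, not Lehmer's algorithm)."""
    if max_value is None:
        max_value = total
    if total == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(total, max_value), 0, -1):
        for rest in partitions(total - first, max_parts - 1, first):
            yield (first,) + rest


def log_prob(parts, n, total):
    """Log of equation (1) for a partition padded with n - len(parts) zeros."""
    freq = Counter(parts)
    freq[0] = n - len(parts)  # number of zero observations
    lp = lgamma(total + 1) + lgamma(n + 1) - total * log(n)
    for j, f in freq.items():
        lp -= lgamma(f + 1) + f * lgamma(j + 1)
    return lp


def fisher_exact_variance_test(sample):
    """P(index of dispersion >= observed | T), by summing (1) over admissible partitions."""
    n, total = len(sample), sum(sample)
    s_obs = sum(y * y for y in sample)
    below = 0.0
    for parts in partitions(total, n):
        if sum(a * a for a in parts) < s_obs:  # "admissible" partition
            below += exp(log_prob(parts, n, total))
    return 1.0 - below
```

For example, `fisher_exact_variance_test([2, 0])` is about 0.5: the only admissible partition is (1, 1), whose probability under (1) is 0.5. Enumerating every partition keeps the sketch short, at the cost of the savings that the $S^*$ stopping rule provides.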

† Present address: Maths and Statistics Research Dept., Union Carbide Corporation, Oak Ridge, Tennessee, U.S.A.

© 1982 Royal Statistical Society 0035-9254/82/31067 $2.00


APPLIED STATISTICS

STRUCTURE

SUBROUTINE TPSHV(N, T, NTMAX, SY2, PROB, KGP, KPR, A, KF, DLF, IFAULT)

Formal parameters
N        Integer              input: sample size (n)
T        Integer              input: sample total
NTMAX    Integer              input: maximum (N, T)
SY2      Integer              input: sample sum of squares
PROB     Real                 output: probability
KGP      Integer              output: number of partitions generated
KPR      Integer              output: number of partitions for which equation (1) was evaluated
A        Integer array (T)    workspace
KF       Integer array (T)    workspace
DLF      Real array (NTMAX)   workspace: used to store log factorials
IFAULT   Integer              output: fault indicator, equal to:
                                1 if PROB is only an upper bound on the exact probability (see RESTRICTIONS);
                                2 if N < 2 or N > NTMAX;
                                3 if T < 2, T > NTMAX or T > MAXT (see RESTRICTIONS);
                                0 otherwise

RESTRICTIONS

The local constant, MAXT, provides an upper limit on the sample total for which partitions can be generated. The local constant, MAXP, is used to limit the maximum number of partitions that will be evaluated. If this limit is reached, IFAULT is set equal to 1, and the value of PROB is an upper bound on the exact probability. With MAXP = 1 000 000, the exact probability will be obtained whenever T is less than 60, provided it is not less than DMIN. The local constant, DMIN, is used to define the smallest probability that can be computed. If the probability falls below this value, then IFAULT is set equal to 1, and PROB is assigned the value of DMIN. It is suggested that DMIN be at least 100 times the precision of the computer.

TIME

The number of partitions of $T$ is approximately equal to $(4T\sqrt{3})^{-1} \exp(\pi\sqrt{2T/3})$, and execution time is proportional to the number of partitions that must be generated. It is not necessary to generate all partitions of $T$ (see NUMERICAL METHOD), and the number of partitions that must be generated depends on the observed value of the index of dispersion, which is determined by $\sum y^2$ for a given value of $n$ and $T$. The situation is complicated by the fact that the execution time increases and then decreases as $n$ increases for a given value of $T$. This is illustrated in Table 1, which contains execution times on a DEC PDP-10 for various values of $T$ and $n$. The values reported in Table 1 are CPU times (and total number of partitions generated) required to reach significance at the 0.01 and the 0.0001 levels.
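To give a feel for how quickly the work grows with $T$, the following Python sketch compares the asymptotic approximation quoted above with exact partition counts. The function names are hypothetical, and the exact count uses a standard dynamic-programming recurrence that is an assumption of this sketch, not something taken from the algorithm itself.

```python
from math import exp, pi, sqrt


def partition_count(total):
    """Exact number of partitions of `total` (parts added one size at a time)."""
    ways = [1] + [0] * total
    for part in range(1, total + 1):
        for s in range(part, total + 1):
            ways[s] += ways[s - part]
    return ways[total]


def approximate_partition_count(total):
    """Asymptotic approximation exp(pi * sqrt(2T/3)) / (4 * T * sqrt(3))."""
    return exp(pi * sqrt(2.0 * total / 3.0)) / (4.0 * total * sqrt(3.0))


if __name__ == "__main__":
    for t in (32, 48):
        print(t, partition_count(t), round(approximate_partition_count(t)))
```

The exact counts for $T = 32$ and $T = 48$ are 8349 and 147273, the totals quoted in the footnote to Table 1; the approximation overshoots both by somewhat less than 10 per cent.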

PRECISION

If the word size of the computer is 36 bits or less, the user should specify double precision arithmetic for the calculation and accumulation of probabilities. The REAL declaration should be changed to DOUBLE PRECISION; the DATA statement modified; ALOG and EXP changed to DLOG and DEXP; and the statement labelled 200 changed to 200 PROB = SNGL(D1 - DPRB).

ADDITIONAL COMMENTS

The efficiency of the algorithm can be improved by allocating arrays A, KF and DLF for private use of the subroutine. This requires additional error checking to ensure that T does not exceed the maximum dimensions of A and KF and that N does not exceed the maximum dimension of DLF. If TPSHV is called repeatedly by another program unit, then the values of the log factorials in DLF can be computed once in the calling program unit and labelled COMMON used.

TABLE 1

Typical computer times†

                           Significance level 0.01     Significance level 0.0001
Sample      Sample         Number of                   Number of
total T     size n         partitions‡     Time        partitions‡     Time

32             8              2 968          37           3 217          58
              16              5 265          64           6 834         104
              32              2 641          33           4 143          53
              48              1 210          15           2 641          32
              64                915          12           1 588          22
             128                 97           4             373           7

48             8             23 666         207          24 482         344
              16             87 841         873          97 590        1268
              32             50 646         574          72 302         869
              48             22 855         278          42 398         489
              64             11 723         149          22 855         277
             128              1 212          23           4 508          65

† All calculations performed on DEC PDP-10 computer. The time shown is CPU time (in hundredths of a second) to find Fisher's exact probability for the index of dispersion for a given value of T and n. The index of dispersion value was determined so that exact probability would just reach the significance level given at the top of the column.
‡ Number of partitions of T that were generated (the total number of partitions of 32 and 48 are 8349 and 147273, respectively).
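The suggestion under ADDITIONAL COMMENTS, that the log factorials be computed once and shared between calls, can be illustrated with a short Python sketch. The function name `log_factorial_table` is hypothetical, and the table here is indexed from 0 (so that entry 0 holds log 0! = 0), whereas the Fortran array DLF starts at index 1; that indexing choice is an assumption of the sketch.

```python
from math import log


def log_factorial_table(size):
    """Return a list whose entry i holds log(i!) for i = 0, ..., size."""
    table = [0.0] * (size + 1)  # log 0! = log 1! = 0
    for i in range(2, size + 1):
        table[i] = table[i - 1] + log(i)  # log i! = log (i-1)! + log i
    return table


# Build the table once, then reuse it for every evaluation of equation (1).
LOG_FACTORIALS = log_factorial_table(80)
```

A caller would then look up `LOG_FACTORIALS[j]` in place of recomputing each factorial, which mirrors the role of the labelled COMMON block in the Fortran setting.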

ACKNOWLEDGEMENT

This report is based on work performed under Contract No. DE-AC05-76OR00033 between the U.S. Department of Energy, Office of Health and Environmental Research, and Oak Ridge Associated Universities.

REFERENCES

FISHER, R. A. (1950). The significance of deviations from expectations in a Poisson series. Biometrics, 6, 17-24.
LEHMER, D. H. (1964). The machine tools of combinatorics. In Applied Combinatorial Mathematics (E. F. Beckenbach, ed.). New York: Wiley.
RAO, C. R. and CHAKRAVARTI, I. M. (1956). Some small sample tests of significance for a Poisson distribution. Biometrics, 12, 264-282.

      SUBROUTINE TPSHV(N, T, NTMAX, SY2, PROB, KGP, KPR, A, KF,
     *  DLF, IFAULT)
C
C        ALGORITHM AS 171  APPL. STATIST. (1982) VOL.31, NO.1
C
C        FISHERS EXACT VARIANCE TEST FOR THE POISSON DISTRIBUTION
C
      INTEGER T, SY2, TS, A(T), KF(T)
      REAL DD, DC, DM, DMIN, DP, D1, DZ, DPRB, DLF(NTMAX)
      DATA MAXP /1000000/, MAXT /80/
      DATA D1 /1.0E0/, DZ /0.0E0/, DMIN /1.0E-5/
      IFAULT = 0
      DPRB = DZ
      DM = D1 - DMIN
      IF (N .LT. 2 .OR. N .GT. NTMAX) IFAULT = 2
      IF (T .LT. 2 .OR. T .GT. NTMAX .OR. T .GT. MAXT) IFAULT = 3
      IF (IFAULT .NE. 0) GOTO 200
      KGP = 0
      KPR = 0
C
C        GENERATE LOG FACTORIALS AND CONSTANT
C
      DLF(1) = DZ
      DO 10 I = 2, NTMAX
   10 DLF(I) = DLF(I - 1) + ALOG(FLOAT(I))
      DC = -FLOAT(T) * ALOG(FLOAT(N)) + DLF(N) + DLF(T)
C
C        GENERATE PARTITIONS OF SAMPLE TOTAL (T) STARTING WITH
C        M THE NUMBER OF NON-ZERO VALUES IN THE SAMPLE
C
      M = MIN0(T, N)
      DO 15 J = 1, M
   15 KF(J) = 0
   20 DO 25 J = 1, M
   25 A(J) = 1
      TS = M - 1
   30 A(M) = T
      DO 35 J = 2, M
   35 A(M) = A(M) - A(J - 1)
      TS = TS + A(M) ** 2
      KGP = KGP + 1
C
C        THE CURRENT M-PART PARTITION OF T
C        A(1)  A(2)  ...  A(M)
C        WITH SUMS OF SQUARES  TS = SUM A(J) ** 2
C        IS ADMISSIBLE IF TS IS LESS THAN SY2
C
      IF (TS .GE. SY2) GOTO 80
C
C        CONVERT SAMPLE INTO FREQUENCY DISTRIBUTION.
C        KZ IS THE NUMBER OF ZERO VALUES IN THE SAMPLE.
C        KF(J) IS THE NUMBER OF A( )S EQUAL TO J.
C
      KZ = N - M
      MX = A(M)
      DO 60 J = 1, M
      IAK = A(J)
      KF(IAK) = KF(IAK) + 1
   60 CONTINUE
C
C        COMPUTE DP = PROBABILITY FOR THIS SAMPLE
C
      KPR = KPR + 1
      DD = DZ
      IF (KZ .GT. 0) DD = DLF(KZ)
      IAK = KF(1)
      IF (KF(1) .GT. 0) DD = DD + DLF(IAK)
      KF(1) = 0
      IF (MX .LT. 2) GOTO 70
      DO 65 J = 2, MX
      IF (KF(J) .EQ. 0) GOTO 65
      IAK = KF(J)
      DD = DD + DLF(IAK) + FLOAT(KF(J)) * DLF(J)
      KF(J) = 0
   65 CONTINUE
   70 DP = EXP(DC - DD)
      DPRB = DPRB + DP
      IF (DPRB .LT. DM) GOTO 80
      IFAULT = 1
      DPRB = DM
      GOTO 200
   80 CONTINUE
      IT = M - 1
   90 IF (IT .EQ. 0) GOTO 120
C
C        DETERMINE NEXT M-PART PARTITION
C        UPDATE SUM OF SQUARES
C
      IF ((A(M) - A(IT)) .GT. 1) GOTO 100
      IT = IT - 1
      GOTO 90
  100 IAT = A(IT) + 1
      M1 = M - 1
      DO 110 J = IT, M1
      TS = TS - A(J) ** 2 + IAT ** 2
      A(J) = IAT
  110 CONTINUE
      TS = TS - A(M) ** 2
C
C        NEW VALUES FOR A(1) ... A(M - 1) ARE DETERMINED
C
      IF (KGP .LT. MAXP) GOTO 30
      IFAULT = 1
      GOTO 200
C
C        END EVALUATION OF M-PART PARTITIONS.
C        DECREASE M BY 1 AND CONTINUE IF SY2
C        IS GREATER THAN THE SUM OF SQUARES
C        FOR THE LAST M-PART PARTITION.
C
  120 M = M - 1
      IF (TS .GE. SY2) GOTO 200
      IF (M .GT. 1) GOTO 20
  200 PROB = D1 - DPRB
      RETURN
      END

Algorithm AS 172

Direct Simulation of Nested Fortran DO-LOOPS

By M. O'FLAHERTY and G. MACKENZIE
Department of Community Medicine, The Queen's University of Belfast, Northern Ireland
[Received August 1979. Final revision July 1981]

Keywords: K-DIMENSIONAL DO-LOOPS; SIMULATED SUBSCRIPTING; MULTI-DIMENSIONAL TABLES

LANGUAGE

Fortran 66

INTRODUCTION

The Fortran language does not provide a facility to vary dynamically the depth of a sequence of nested DO-LOOPS. Even if such a facility were generally available its usefulness might be limited by the depth to which the different dialects of Fortran permit array subscripting. Yet, in statistical applications, there are many contexts where the alternatives to dynamic nesting are awkward and inefficient.

Subroutine SIMDO provides a simple and completely general system for simulating a nest of Fortran DO-LOOPS to depth k. Gentleman (1975) described an algorithm (AS 88) which employed a simulated DO-LOOP technique to generate the nCr combinations. The present algorithm, which in purpose and construction is materially different from AS 88, generates systematically the cell subscripts of a multi-dimensional table, making each subscript pattern available to the calling segment. Apart from obvious savings in the declaration of fixed multidimensional arrays, the authors anticipate that the routine will prove useful in statistical applications where the nested DO-LOOPS are unconditionally traversed.
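The idea of simulating a nest of loops whose depth is only known at run time can be sketched in Python with an "odometer" over the subscripts. This is an illustration of the general technique only: the interface and traversal order of SIMDO are not shown in this excerpt, so the function name `simulated_do_loops`, the 1-based subscripts and the choice of letting the last subscript vary fastest are all assumptions.

```python
def simulated_do_loops(upper_bounds):
    """Yield every subscript tuple of a table whose i-th subscript runs from 1
    to upper_bounds[i], with the last subscript varying fastest."""
    k = len(upper_bounds)
    if any(u < 1 for u in upper_bounds):
        return  # an empty dimension means there are no cells to visit
    subs = [1] * k
    while True:
        yield tuple(subs)
        # Advance like an odometer: reset exhausted inner levels to 1,
        # then increment the first level that still has room.
        level = k - 1
        while level >= 0 and subs[level] == upper_bounds[level]:
            subs[level] = 1
            level -= 1
        if level < 0:
            return  # every level was exhausted, so the nest is complete
        subs[level] += 1


if __name__ == "__main__":
    for cell in simulated_do_loops([2, 3]):
        print(cell)
```

Because the depth is simply the length of `upper_bounds`, the same routine serves a two-way table and a ten-way table without any change to the calling code.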

© 1982 Royal Statistical Society 0035-9254/82/31071 $2.00
