W. C. KRUMBEIN
SUMMARY
INTRODUCTION
stage the general approach may be referred to as descriptive statistics. The second
stage of development uses formal statistical design and statistical inference in sedi-
mentary analysis and interpretation. It is appropriately referred to as analytical
statistics.
The third stage of statistical evolution involves the use of stochastic process
models. These are concerned with the patterns of behavior displayed by sedimentary
processes and deposits through time or as they spread over an area. BARTLETT (1960,
p.1) refers to this approach as the dynamic part of statistical theory (the statistics of
change) as opposed to the static (i.e., conventional) statistical theory used in the first
two evolutionary stages mentioned above.
A stochastic process is defined (BARTLETT, 1960, p. 1) as “some possible actual,
e.g., physical process in the real world, that has some random or stochastic element
involved in its structure”. Stochastic process models represent an exciting develop-
ment in statistical sedimentology. They provide, among other things, mechanisms
for simulation of sedimentological processes and deposits through time or over areas,
as recently described and computerized by HARBAUGH (1966).
The main purposes of this paper are briefly to consider some aspects of descrip-
tive and analytical statistics, and then to develop the new frameworks of thinking
that have been introduced into sedimentology by stochastic process models.
that sedimentologists began to use these methods at all extensively. Allen, of the
University of Reading, was among the first to apply analysis of variance to sedimen-
tary problems. In North America, Griffiths and co-workers at Pennsylvania State
University were leaders in developing and applying these methods. They demonstrated
the advantages of using formal statistical models instead of taking the data “as they
came”, which had characterized much of statistical endeavor in the first stage. The
shift from sample to population brought out the importance of confidence intervals
on means and variances, t-tests and F-tests for comparing suites of samples, and in
general demonstrated the advantages of experimental design as a formal approach
to sedimentary statistics.
One of the main contributions of analytical statistics to sedimentology is its
emphasis on the importance of the variance of the population as well as the mean
value in examining sedimentological data. A fictitious example, useful for class
demonstration, is shown in Fig. 1. The two upper sets of pebbles (A and B) are notice-
ably different in their long dimensions, and simple inspection suggests that these
could easily have come from two different well-sorted gravel beds. In the lower two
sets of pebbles (C and D), however, the long dimensions in each group vary suffi-
ciently among themselves so that these sets could easily have come from a single bed
of poorly-sorted gravel. We can formalize these judgments statistically by comparing
the variability between the pairs to the variability within the pairs. Table I does this,
and shows that for the top pair the variability (mean square) between the means is
very much greater than the variability within the sets. For the lower pair the between-
variability is very much smaller than the within-variability.
It is evident from Fig.1 and Table I that the pebbles in these two examples are
Fig. 1. Pebbles arranged in different ways to bring out the concept of within- and
between-variance. In set A-B the between-variance is much larger than the within-variance;
in set C-D the reverse is true (see also Table I).
Sedimentology, 10 (1968) 7-23
TABLE I
VARIANCE ANALYSIS OF PEBBLE DIAMETERS IN FIG. 1
(Separate analyses for Sets A and B and for Sets C and D. Columns: Source, Sum of squares, d.f., Mean square, F. The numerical entries are not preserved in this copy.)
the same; they have merely been mixed up to illustrate the simple principle of variation
between and within groups, which lies at the heart of all conventional analysis-of-
variance models, and is the basis on which we make our intuitive judgments. An
excellent student exercise is to have various mixtures prepared, checking them each
time with analysis of variance (or with a t-test) until there is a perceptible difference
between them. Students quickly grasp the principle, and improve their ability to judge
similarities and differences by inspection even in the field.
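
The between/within comparison described above can be sketched numerically. The Python fragment below is only an illustration under stated assumptions, not the computation behind Table I: the pebble lengths are invented for the example, and the function name `one_way_anova` is hypothetical.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (between mean square, within mean square, F) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations about their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical long dimensions (cm); NOT the data of Fig. 1 or Table I.
set_a = [2.0, 2.2, 2.1, 1.9, 2.3]
set_b = [4.0, 4.1, 3.9, 4.2, 3.8]
# The same ten pebbles, re-sorted into two mixed sets.
set_c = [2.0, 4.1, 2.1, 4.2, 2.3]
set_d = [4.0, 2.2, 3.9, 1.9, 3.8]

for label, pair in (("A-B", [set_a, set_b]), ("C-D", [set_c, set_d])):
    msb, msw, f = one_way_anova(pair)
    print(f"{label}: between MS = {msb:.3f}, within MS = {msw:.3f}, F = {f:.2f}")
```

With these invented values the F ratio for the sorted pair is far greater than 1, whereas for the mixed pair it falls below 1, which mirrors the contrast drawn between sets A-B and C-D.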
Realization that variability is as important as the average entered some
aspects of sedimentology even in its first stage of statistical development. In size-
frequency distributions of sediments the average grain size and the degree of sorting
(a measure of grain-size variability within a sample) were early recognized as having
importance in sediment description and interpretation. However, the realization
that the variability among sample means can itself be partitioned and tested for
significance of geographic, stratigraphic, or environmental factors, had to wait until
formal models of variance analysis appeared on the scene. These models require con-
siderably more computation than was needed with individual samples, where the
mean grain size and sorting (as well as skewness and kurtosis) can be estimated by
simple graphic procedures.
The fact that sedimentary deposits are composed of myriads of grains or
crystals, each with its own size, shape, and mineral composition, early forced the
attention of sedimentologists on the frequency distributions, as mentioned above.
The study of frequency distributions is by definition a statistical endeavor, and it is
facilitated by use of numerical data. The extension of statistical analysis from initial
study of frequency distributions to consideration of sampling problems, areal varia-
STATISTICAL MODELS IN SEDIMENTOLOGY 11
TABLE II
SUMMARY OF STATISTICAL OBJECTIVES AND METHODS IN SEDIMENTOLOGY
(Columns: Objectives, Methods; first block headed "Stages 1 and 2". The table entries are not preserved in this copy.)
SEDIMENTOLOGICAL MODELS
items that are dynamically related”. The items (elements of a model) may be consider-
ed as points connected by a network of relationships, which can be illustrated by
drawing lines between the points. The pattern of lines may change through time as
the system interacts within itself, in part through feedback mechanisms and other
controls on the system. Systems analysis is essentially new in sedimentology, but
AMOROCHO and HART (1964) give an excellent discussion with diagrams of this
approach in hydrology, which is transferable to sedimentological stream studies.
It is evident from the foregoing that a wide spectrum extends from the statistical
study of a single particle-size frequency curve to the analysis of a complex multi-
variate data matrix. Moreover, there is a complete range from qualitative conceptual
models through statistical predictor models, to increasingly analytical models that
seek to “explain” a given process in terms of physically-meaningful functional
relationships. Although many sedimentological models are expressed in qualitative
form, there normally is a background of numerical data that supports the generali-
zations in the model. Thus, a diagrammatic model showing sediment dispersal paths
by arrows of varying thickness to indicate relative contributions from several source
areas is in effect a generalization that may involve detailed study of particle size
distributions, cross-bedding directions, heavy mineral content of sand or lithologic
types of pebbles, and so on.
The point of emphasis here is that a given sedimentological model tends to
integrate large amounts of information, some of which may be purely qualitative
(color of beds, for example), some may be statistical (mean grain size, etc.), and some
may involve application of physical principles of stream transportation. Hence,
distinction between purely “statistical” models and other kinds of sedimentological
models tends to become blurred.
It would be instructive to compile a bibliography of sedimentological models,
with references cross-indexed according to the structure of the models proposed.
Such an endeavor is beyond the scope of this paper, but there is one aspect of model
building that does bring out an important distinction between ways of structuring
sedimentological data. This brings us to the final stage of our discourse.
As multivariate statistical models come into more general use, sufficient data
become available for experimentation with new kinds of models. Among these, as
mentioned in the introduction, are stochastic process models. In order to indicate
their relation to other kinds of mathematical models, distinction may be made between
deterministic models¹ and probabilistic models as devices for examining
sedimentological phenomena.
¹ I use the term deterministic model for the conventional mathematical expression of a process without
stochastic components, so that, given a set of values for the independent variables, the magnitude of
the dependent variable can be computed exactly. Terms that may be synonymous include functional
models, mechanistic models, analytical models, actualistic models, and perhaps others.
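[Fig. 2. Diagram classifying MODELS into independent-events (statistical) models, stochastic process models, and deterministic models, with the branch labels "memory independent", "memory dependent", and "path-dependent". Caption not preserved in this copy.]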
out the major patterns of behavior. Because the variations are not exactly predictable
in the sense that they cannot be directly related to a deterministic mechanism, these
variations may appropriately be designated as random. But randomness is not to be
thought of in this connection as completely haphazard, without rhyme or reason.
A random variable is a mathematical entity that arises from probabilistic mecha-
nisms, just as conventional nonstochastic (systematic) variables arise from deter-
ministic mechanisms. In a completely deterministic model the outcome of an experi-
ment is exactly predictable, as stated above. In a probabilistic model the outcome
of an experiment is controlled by an underlying set of probabilities, and hence it is
predictable only in terms of relative likelihoods associated with a set of possible
outcomes. Models that contain both deterministic and probabilistic elements define
processes that follow the systematic path of the deterministic core, with superimposed
random fluctuations introduced by the “built-in” probabilistic mechanism.
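
A model with a deterministic core and a superimposed random component can be sketched as follows. This is a minimal illustration only: the linear trend, its coefficients, the normal distribution of the fluctuations, and the names `deterministic_core` and `mixed_model` are all assumptions made for the example, not part of any model discussed in the text.

```python
import random

def deterministic_core(x, intercept=10.0, slope=-0.5):
    """Systematic part: the response is fixed exactly once x is given."""
    return intercept + slope * x

def mixed_model(x, rng, sigma=0.8):
    """Deterministic core plus a random fluctuation drawn from a normal distribution."""
    return deterministic_core(x) + rng.gauss(0.0, sigma)

rng = random.Random(1968)  # fixed seed so the run is repeatable

# One realization of the process along x, beside the deterministic path.
for x in range(10):
    print(x, deterministic_core(x), round(mixed_model(x, rng), 2))

# Averaging many realizations at a single x approaches the deterministic value.
x0 = 4
replicates = [mixed_model(x0, rng) for _ in range(20000)]
print(sum(replicates) / len(replicates), deterministic_core(x0))
```

Each run with a different seed gives a different path, but every path scatters about the same systematic trend.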
The probabilistic mechanism can be built into physical processes in various
ways. At one extreme is the independent-events model, in which each experimental
outcome is completely independent of preceding outcomes. This is followed by a
broad spectrum of models showing partial dependence of each outcome on preceding
outcomes, to the other extreme of the completely path-dependent deterministic model
with no probabilistic element at all (see Fig. 2). It is the broad band in the middle of
this spectrum that is being examined by statistically-minded geologists in various
fields. Out of a variety of stochastic process models the first-order Markov chain has
been more widely used in geology than any other, and it is illustrated here.
MARKOV MODELS
TABLE III
GENERALIZED TRANSITION PROBABILITY MATRIX, p_ij

                   To j →
From i ↓      A       B       C
    A        p_AA    p_AB    p_AC
    B        p_BA    p_BB    p_BC
    C        p_CA    p_CB    p_CC
TABLE IV
TRANSITION PROBABILITY MATRIX OF MAZOMANIE (UPPER CAMBRIAN) FORMATION, SOUTHERN WISCONSIN
(After COATES, 1967)
(Rows and columns are states A, B, C, and D. The matrix entries are not preserved in this copy.)
of Table IV, in which state C is smallest, states A and B are intermediate, and state
D is thickest. Thus the prominence of the green beds when they occur is attributable
to their greater average thickness, despite their smaller proportion in the whole
section.
The off-diagonal elements in Table IV are instructive also. The transitions
AA, AB, BA, and BB form a compact group, such that the probability of moving
from state A to state C or D is very small, and the probability of moving from state
B either to state C or D is not much larger. What this means is that when the system
is in the white to buff sandstone loop it tends to remain there, but once it gets out
(usually by transition BC in the ideal cycle) it tends to hover in states C and D,
although the passage back to A or B is notably greater than the reverse. If the system
moves to D, the relatively large probability (0.53) of staying there tends to produce
the thick “bursts” of green sand in the dominantly white to buff sections.
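
The link between a large diagonal element and thick beds can be made explicit with a short derivation. It is a sketch that assumes the matrix was tallied from observations at equal vertical intervals, so that a bed of state i is a run of consecutive intervals in that state; the symbol T_i for the run length is introduced here for illustration.

```latex
% Probability that a run in state i lasts exactly k intervals:
P(T_i = k) = p_{ii}^{\,k-1}\,(1 - p_{ii}), \qquad k = 1, 2, \ldots
% Expected run length (mean of the geometric distribution):
E[T_i] = \sum_{k=1}^{\infty} k\, p_{ii}^{\,k-1} (1 - p_{ii}) = \frac{1}{1 - p_{ii}}
% With the value quoted in the text for state D:
E[T_D] = \frac{1}{1 - 0.53} \approx 2.13 \ \text{intervals}
```

Under this assumption the expected thickness of a bed is the sampling interval multiplied by 1/(1 - p_ii), so the state with the largest diagonal element has the greatest average thickness.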
What this example demonstrates is that by structuring the data into a transition
probability matrix, relationships observed on the outcrop become much clearer, and
the pattern of succession is more apparent. Naturally, the question arises whether
the process is Markovian or not; fortunately, statistical tests are available (KRUMBEIN,
1967; POTTER and BLAKELY, 1967a) and the data in Table III safely satisfy the criterion.
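
One widely used test of this kind compares the observed transition counts with the counts expected if successive states were independent, by means of a likelihood-ratio statistic of the Anderson and Goodman type. The sketch below assumes that form; whether it matches the procedure in the references cited above is not confirmed here. The transition counts are invented, and the function name `markov_lr_statistic` is hypothetical.

```python
import math

def markov_lr_statistic(counts):
    """Likelihood-ratio statistic for H0: successive states are independent.

    counts[i][j] is the number of observed transitions from state i to state j.
    Returns (statistic, degrees of freedom).
    """
    m = len(counts)
    total = sum(sum(row) for row in counts)
    col_totals = [sum(counts[i][j] for i in range(m)) for j in range(m)]
    stat = 0.0
    for i in range(m):
        row_total = sum(counts[i])
        for j in range(m):
            n_ij = counts[i][j]
            if n_ij == 0:
                continue  # a zero count contributes nothing to the sum
            p_ij = n_ij / row_total      # estimated transition probability
            p_j = col_totals[j] / total  # marginal probability of state j
            stat += 2.0 * n_ij * math.log(p_ij / p_j)
    return stat, (m - 1) ** 2

# Hypothetical tally of transitions among four states A-D (invented counts).
counts = [
    [40, 25, 3, 2],
    [22, 30, 10, 3],
    [5, 8, 6, 9],
    [4, 3, 7, 16],
]
stat, df = markov_lr_statistic(counts)
CHI2_95_DF9 = 16.92  # tabulated 95% point of chi-square with 9 degrees of freedom
print(f"-2 ln(lambda) = {stat:.1f} with {df} d.f.; "
      f"reject independence: {stat > CHI2_95_DF9}")
```

A statistic well above the tabulated chi-square value indicates that each state depends on the one before it, which is the minimum requirement for treating the sequence as a first-order Markov chain.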
When a Markov property is present the transition matrix may be used to
simulate the cycles, and the long-term equilibrium proportions of each sediment
type can be computed. Fig. 3 shows part of the actual section and part of a computer
simulation. The agreement is evident, although such agreement does not necessarily
demonstrate the adequacy of the model. Of equal importance is this question: given
that the process is Markovian, what does this mean geologically, in terms of sedimentary
environment, source area of the quartz and glauconite, and so on?
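
Both uses of the transition matrix mentioned above, simulation of a succession and computation of the long-term proportions, can be sketched in a few lines. The example is not the "Marchain" program and does not reproduce Table IV: only the D-to-D probability of 0.53 comes from the text, every other matrix entry is invented, and the function names `simulate` and `equilibrium` are hypothetical.

```python
import random

STATES = "ABCD"
# Hypothetical transition matrix: rows = from-state, columns = to-state.
# Only the D-to-D value (0.53) is quoted in the text; all other entries are invented.
P = [
    [0.45, 0.50, 0.03, 0.02],
    [0.45, 0.40, 0.12, 0.03],
    [0.15, 0.25, 0.25, 0.35],
    [0.12, 0.10, 0.25, 0.53],
]

def simulate(P, start, n_steps, rng):
    """Generate a state sequence by repeated draws from the current row of P."""
    seq = [start]
    for _ in range(n_steps):
        seq.append(rng.choices(range(len(P)), weights=P[seq[-1]])[0])
    return seq

def equilibrium(P, n_iter=500):
    """Long-run state proportions found by repeatedly applying pi <- pi P."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi

rng = random.Random(3)  # fixed seed so the run is repeatable
section = simulate(P, start=0, n_steps=60, rng=rng)
print("".join(STATES[s] for s in section))            # simulated succession, base to top
print([round(p, 3) for p in equilibrium(P)])          # long-term proportions of A-D
```

Repeated multiplication is used for the equilibrium proportions because it needs no external library; solving the linear system pi = pi P directly would give the same result for a matrix of this kind, in which every entry is positive.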
Fig. 3. Observed and simulated stratigraphic sections of Mazomanie (Upper Cambrian)
Formation. Section on right from matrix of Table IV with computer program “Marchain” (KRUMBEIN,
1967). Length of sections about 10 m.
The question just raised comes immediately to the nub of the problem as to
whether probabilistic models can play a legitimate role in sedimentology as devices
for “explaining” a process rather than merely describing it. There is some difference
of opinion regarding the validity of invoking random processes, as well as of the
geological meaningfulness of simulations obtained from transition probability ma-
trices. To some it seems that reliance on random processes implies surrender of the
basic methods of science; to others it seems equally apparent that for most natural
processes the deterministic model is merely an oversimplified expression of processes
that are basically probabilistic.
COLEMAN’S book on mathematical sociology (1964) presents an interesting
and relatively unbiased discussion of deterministic and probabilistic models in a
field not unlike geology in its complexity of interlocked variables. In chapter 18
Coleman discusses “tactics and strategies in the use of mathematics”, which includes
a section on probabilistic and deterministic mathematics. He emphasizes that in general
the deterministic model is mathematically simpler than its stochastic equivalent. In
some cases at least, the deterministic model contains the mean values (= expected
values) of the probability distributions in the corresponding stochastic process model.
Thus, where interest is mainly in mean values, the deterministic model is to be preferred.
In geology, BRIGGS and POLLACK (1967) conclude that “... when sufficient
knowledge exists concerning the processes and the geology, there is no need to invoke
random, probabilistic processes”. Another viewpoint in the spectrum of opinion is
that of HARBAUGH (1966) and HARBAUGH and WAHLSTEDT (1967), who present
[Fig. 4 diagram, titled "Relations among models": a CONCEPTUAL MODEL (MENTAL IMAGE) leads to a DETERMINISTIC MODEL, to a STATISTICAL PREDICTOR MODEL by way of CONVENTIONAL STATISTICAL ANALYSIS, and to a STOCHASTIC PROCESS MODEL; all are checked against REAL-WORLD DATA (RE-LOOP AS NECESSARY).]
Fig. 4. Generalized diagram of some relations among deterministic, statistical, and stochastic
process models. Although the cross-path from deterministic to stochastic process models is by way of
predictor models, there is also a direct relation between them. See text for discussion.
Fig. 4, based on a lantern slide used in the oral presentation, is inserted here
to illustrate in a broad way how current sedimentological studies are interlocked in
terms of deterministic, conventional statistical, and stochastic process models. If
one starts with a conceptual model, he may follow the left-hand path in the diagram
as a direct approach to the development of a deterministic model. The middle path
represents a conventional statistical approach that leads normally to a statistical
predictor model. The right-hand path represents one kind of probabilistic approach,
leading for example to a Markov chain. In any case, the final model is ultimately
tested with observational data to evaluate its adequacy in the real-life world.
What the figure shows primarily is that all three paths are very closely inter-
locked in methodology, and there is considerable freedom of flow from one path to
another in the diagram. Thus, conventional statistical analysis (some form of the
general linear model, usually) may be used to “sort out” the more important variables
or states in a system. These particular variables may then be used in formulating
deterministic or stochastic process models. The choice of a particular path through
the diagram is controlled by the purposes of the study, by the kinds of data available,
and by the way in which the data are structured for analysis. In particular, the deter-
ministic and stochastic process models endeavor to make explicit the physical meaning
of coefficients and exponents, whereas in the statistical predictor model this is not
necessarily a primary consideration. Predictor models can be very important in the
search for natural resources, for example, even though the physical meaning of each
element in the model may not be fully understood.
In the long run the predictor model represents an intermediate stage in theo-
retical analysis, and the ultimate choice for theoretical sedimentology appears to lie
between deterministic and stochastic process models.
CONCLUDING REMARKS
REFERENCES
ALLÈGRE, C., 1964. Vers une logique mathématique des séries sédimentaires. Bull. Soc. Géol. France, 6: 214-218.
AMOROCHO, J. and HART, W. E., 1964. Critique of current methods in hydrologic systems investigation. Trans. Am. Geophys. Union, 45: 307-321.
BAILEY, N. T. J., 1957. The Mathematical Theory of Epidemics. Griffin, London.
BARTLETT, M. S., 1960. An Introduction to Stochastic Processes with Special Reference to Methods and Applications. Cambridge University Press, London, 312 pp.
BEER, S., 1959. Cybernetics and Management. Wiley, New York, N.Y., 214 pp.
BRIGGS, L. I. and POLLACK, H. N., 1967. Digital model of evaporite sedimentation. Science, 155: 453-456.
CARR, D. D., and others, 1966. Stratigraphic sections, bedding sequences, and random processes. Science, 154: 1162-1164.
COATES, M. S., 1967. Application of a Markov Chain to the Mazomanie-Reno Sandstone in South-Central Wisconsin. Thesis, Northwestern Univ., Evanston, Ill., 97 pp. (unpublished).
COLEMAN, J. S., 1964. Introduction to Mathematical Sociology. Free Press, Glencoe, Ill., 554 pp.
CONOVER, W. J. and MATALAS, N. C., 1967. A statistical model of sediment transport. U.S., Geol. Surv., Profess. Papers, 575-B: B60-B61.
GRIFFITHS, J. C., 1966. Future trends in geomathematics. Mineral Ind., Penna. State Univ., 35: 1-8.
GRIFFITHS, J. C., 1967. Scientific Methods in Analysis of Sediments. McGraw-Hill, New York, N.Y., 508 pp.
HARBAUGH, J. W., 1966. Mathematical simulation of marine sedimentation with IBM 7090/7094 computers. State Geol. Surv., Kans., Computer Contr., 1: 1-52.
HARBAUGH, J. W. and MERRIAM, D. F., 1968. Computer Applications in Stratigraphy. Wiley, New York, N.Y. (in press).
HARBAUGH, J. W. and WAHLSTEDT, W. J., 1967. FORTRAN IV program for mathematical simulation of marine sedimentation with IBM 7040 or 7094 computers. State Geol. Surv., Kans., Computer Contr., 9: 1-40.
KEMENY, J. G. and SNELL, J. L., 1960. Finite Markov Chains. Van Nostrand, Princeton, N.J., 210 pp.
KRUMBEIN, W. C., 1967. FORTRAN IV computer programs for Markov chain experiments in geology. State Geol. Surv., Kans., Computer Contr., 13: 1-38.
KRUMBEIN, W. C. and GRAYBILL, F. A., 1965. An Introduction to Statistical Models in Geology. McGraw-Hill, New York, N.Y., 475 pp.
MILLER, R. L. and KAHN, J. S., 1962. Statistical Analysis in the Geological Sciences. Wiley, New York, N.Y., 483 pp.
PATTISON, A., 1965. Synthesis of hourly rainfall data. Water Resources Res., 1: 489-498.
POTTER, P. E. and BLAKELY, R. F., 1967a. Generation of a synthetic vertical profile of a fluvial sandstone body. J. Soc. Petrol. Eng., 6: 243-251.
POTTER, P. E. and BLAKELY, R. F., 1967b. Random processes and lithologic transitions. J. Geol., in press.
SCHWARZACHER, W., 1964. An application of statistical time-series analysis of a limestone-shale sequence. J. Geol., 72: 195-213.
SMITH, F. G., 1966. Geological Data Processing: Using FORTRAN IV. Harper and Row, New York, N.Y., 284 pp.
VISTELIUS, A. B., 1949. On the question of the mechanism of the formation of strata. Dokl. Akad. Nauk S.S.S.R., 65: 191-194.
VISTELIUS, A. B. and FAAS, A. V., 1965. On the character of the alternation of strata in certain sedimentary rock masses. Dokl. Akad. Nauk S.S.S.R., 164: 629-632.
VISTELIUS, A. B. and FEIGEL’SON, T. S., 1965. On the theory of bed formation. Dokl. Akad. Nauk S.S.S.R., 164: 158-160.