
Sedimentology - Elsevier Publishing Company, Amsterdam - Printed in The Netherlands

STATISTICAL MODELS IN SEDIMENTOLOGY1

W. C. KRUMBEIN

Northwestern University, Evanston, Ill. (U.S.A.)

(Received September 12, 1967)

SUMMARY

Three stages of statistical development can be recognized in sedimentology.
The first is descriptive statistics, in which the sample is the object of interest, and the
second is analytical statistics, in which the population assumes major importance.
A very large variety of statistical techniques is available for estimating mean values,
degrees of variability, tests of differences among population means, linear relations
(correlations) among the variables, and ways of evaluating areal variations (trends)
in sedimentary phenomena.
The third stage of statistical development is the application of stochastic
process models to sedimentology, in which the objective is to discern the proba-
bilistic elements in sedimentary processes, in part by simulation with the high-speed
computer. Stochastic process models thus provide one way of examining sedimentary
processes through time or over an area. In conjunction with deterministic models
they provide a framework for exploring the underlying physical, chemical, and
biological controls on sedimentary processes and deposits, with superimposed random
fluctuations introduced by the “built-in” probabilistic mechanism.

INTRODUCTION

Applications of statistics in sedimentology can be divided into several categories,
ranging from initial use of relatively simple charts, graphs, and tables for summarizing
data, to advanced multivariate models and stochastic process models. I
shall discuss these as three stages of evolution in statistical applications to sedimentary
processes and sedimentary deposits.
These stages overlap one another in time of application and from one aspect
of sedimentology to another, but we may conveniently consider them in historical
sequence. The first stage involves the development of methods for measuring sedi-
mentary attributes (such as grain size and shape), and the development or adaptation
of ways in which to present and interpret the resulting numerical data. In this first

1 Paper presented by invitation at the Seventh International Sedimentological Congress, Reading,
Berks., England, August, 1967. This work was supported by the Office of Naval Research (Geography
Branch) under contract Nonr-1228(36), ONR Task No. 388-078. Reproduction in whole or in part is
permitted for any purpose of the United States Government.
Sedimentology, 10 (1968) 7-23

stage the general approach may be referred to as descriptive statistics. The second
stage of development uses formal statistical design and statistical inference in sedi-
mentary analysis and interpretation. It is appropriately referred to as analytical
statistics.
The third stage of statistical evolution involves the use of stochastic process
models. These are concerned with the patterns of behavior displayed by sedimentary
processes and deposits through time or as they spread over an area. BARTLETT (1960,
p.1) refers to this approach as the dynamic part of statistical theory (the statistics of
change) as opposed to the static (i.e., conventional) statistical theory used in the first
two evolutionary stages mentioned above.
A stochastic process is defined (BARTLETT, 1960, p. 1) as “some possible actual,
e.g., physical process in the real world, that has some random or stochastic element
involved in its structure”. Stochastic process models represent an exciting develop-
ment in statistical sedimentology. They provide, among other things, mechanisms
for simulation of sedimentological processes and deposits through time or over areas,
as recently described and computerized by HARBAUGH (1966).
The main purposes of this paper are briefly to consider some aspects of descrip-
tive and analytical statistics, and then to develop the new frameworks of thinking
that have been introduced into sedimentology by stochastic process models.

DESCRIPTIVE AND ANALYTICAL STATISTICS

In concept descriptive and analytical statistics are distinctly different. In the
descriptive approach one asserts that a “typical” sample yields measurable facts about
a sediment, and that within some given range of measurement error these facts form
the logical basis for inferences about the deposit. Thus, it is argued, typical specimens
may be collected on the basis of substantive judgment, from such localities as may
reasonably be expected to yield maximum information about the problem at hand.
The analytical point of view asserts that in addition to variability introduced
by measurement error, variability may also be introduced by sampling error. That
is, the population from which a sediment sample is taken has some fixed value of
mean grain size, but there is no objective way of judging whether a sample picked
by personal judgment affords a good or poor estimate of the population mean.
Without some randomization procedure in collecting the samples, there is no assu-
rance that personal bias may not strongly influence the particular sample value that
is observed. Thus, one important difference between descriptive and analytical sta-
tistics is that in the former the emphasis is on the sample, whereas in the latter the
population is the target of interest. We may point up this difference by a fundamental
dictum of analytical statistics that a sample is of value only for the insight it gives into
the population of interest.
The tremendous expansion in analytical statistics from the late 1920’s on,
brought about by Sir Ronald Fisher’s introduction of variance analysis, opened
entirely new vistas in virtually all sciences. It was not until the mid-1940’s, however,

that sedimentologists began to use these methods at all extensively. Allen, of the
University of Reading, was among the first to apply analysis of variance to sedimen-
tary problems. In North America, Griffiths and co-workers at Pennsylvania State
University were leaders in developing and applying these methods. They demonstrated
the advantages of using formal statistical models instead of taking the data “as they
came”, which had characterized much of statistical endeavor in the first stage. The
shift from sample to population brought out the importance of confidence intervals
on means and variances, t-tests and F-tests for comparing suites of samples, and in
general demonstrated the advantages of experimental design as a formal approach
to sedimentary statistics.
One of the main contributions of analytical statistics to sedimentology is its
emphasis on the importance of the variance of the population as well as the mean
value in examining sedimentological data. A fictitious example, useful for class
demonstration, is shown in Fig. 1. The two upper sets of pebbles (A and B) are notice-
ably different in their long dimensions, and simple inspection suggests that these
could easily have come from two different well-sorted gravel beds. In the lower two
sets of pebbles (C and D), however, the long dimensions in each group vary suffi-
ciently among themselves so that these sets could easily have come from a single bed
of poorly-sorted gravel. We can formalize these judgments statistically by comparing
the variability between the pairs to the variability within the pairs. Table I does this,
and shows that for the top pair the variability (mean square) between the means is
very much greater than the variability within the sets. For the lower pair the between-
variability is very much smaller than the within-variability.
It is evident from Fig.1 and Table I that the pebbles in these two examples are

Fig.1. Pebbles arranged in different ways to bring out the concept of within- and between-
variance. In set A-B the between-variance is much larger than the within-variance; in set C-D the
reverse is true (see also Table I).

TABLE I
VARIANCE ANALYSIS OF PEBBLE DIAMETERS IN FIG. 1

Sets A and B

Source           Sum of squares    d.f.    Mean square    F1

Between sets          696.6          1        696.60      44.5
Within sets           282.0         18         15.67
Total                 978.6         19

Sets C and D

Source           Sum of squares    d.f.    Mean square    F1

Between sets            7.6          1          7.60      <1
Within sets           971.0         18         53.94
Total                 978.6         19

1 Critical F at 95% = 4.41.

the same; they have merely been mixed up to illustrate the simple principle of variation
between and within groups, which lies at the heart of all conventional analysis-of-
variance models, and is the basis on which we make our intuitive judgments. An
excellent student exercise is to have various mixtures prepared, checking them each
time with analysis of variance (or with a t-test) until there is a perceptible difference
between them. Students quickly grasp the principle, and improve their ability to judge
similarities and differences by inspection even in the field.
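
The between-versus-within comparison of Table I reduces to a few lines of arithmetic. The following Python sketch recomputes the mean squares and F ratios from the sums of squares and degrees of freedom printed in the table. The function name `f_ratio` and the dictionary layout are illustrative choices only; the critical value 4.41 is the one quoted with the table.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F ratio)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

CRITICAL_F = 4.41  # 95% level for 1 and 18 d.f., as quoted with Table I

# Sums of squares and degrees of freedom copied from Table I:
# (SS between, d.f. between, SS within, d.f. within)
table_i = {
    "A and B": (696.6, 1, 282.0, 18),
    "C and D": (7.6, 1, 971.0, 18),
}

results = {}
for name, row in table_i.items():
    ms_b, ms_w, f = f_ratio(*row)
    results[name] = (ms_b, ms_w, f)
    verdict = "exceeds" if f > CRITICAL_F else "does not exceed"
    print(f"Sets {name}: MS between = {ms_b:.2f}, MS within = {ms_w:.2f}, "
          f"F = {f:.2f} ({verdict} the critical F)")
```

Because the total sum of squares (978.6) is the same in both arrangements, the example makes plain that only the partition between and within sets has changed.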
Realization that variability is equally as important as the average entered some
aspects of sedimentology even in its first stage of statistical development. In size-
frequency distributions of sediments the average grain size and the degree of sorting
(a measure of grain-size variability within a sample) were early recognized as having
importance in sediment description and interpretation. However, the realization
that the variability among sample means can itself be partitioned and tested for
significance of geographic, stratigraphic, or environmental factors, had to wait until
formal models of variance analysis appeared on the scene. These models require con-
siderably more computation than was needed with individual samples, where the
mean grain size and sorting (as well as skewness and kurtosis) can be estimated by
simple graphic procedures.
The fact that sedimentary deposits are composed of myriads of grains or
crystals, each with its own size, shape, and mineral composition, early forced the
attention of sedimentologists on the frequency distributions, as mentioned above.
The study of frequency distributions is by definition a statistical endeavor, and it is
facilitated by use of numerical data. The extension of statistical analysis from initial
study of frequency distributions to consideration of sampling problems, areal variations
in sedimentary properties, and to the analysis of complex aggregates of multivariate
data, is a normal and expected growth.

IMPACT OF THE COMPUTER

Some sedimentologists began experimenting with the computer in 1955 and
1956, and by 1960 a number of computer programs were available for geological
problems. Before 1965 the computer had become an essential instrument in sedimen-
tological research among the statistically-minded because of its capability for handling
very complex computations very quickly.
The high-speed computer has unquestionably influenced the structure of sedi-
mentological research by encouraging a multivariate approach. A typical study of a
decade ago might involve three or four variables (such as grain size, sorting, grain
shape, heavy mineral content and a few others), but it is not uncommon now to
measure twenty or more attributes, including not only sedimentary characteristics,
but observations on faunal content, trace elements, vector properties, and (for
present-day environments) various process elements as well.
Statistical methods in sedimentology have kept abreast fairly well with the
increasing needs for advanced statistical models applicable to multivariate studies.
Table II is an attempt to summarize the major statistical objectives in the left column,
with a list of methods on the right. The upper part of the table covers stages 1 and 2
of statistical applications, and the bottom portion refers to stage 3, which is discussed
later in the text. For stages 1 and 2 most methods are concerned with the third block
of the table (i.e., associations among measurable attributes of sedimentary processes
or deposits). These attest to the importance given to the search for relations among
sedimentological variables.
The wide choice of methods available for analysis of multivariate data raises
questions regarding the optimum method to be used for any given problem. At the
time of writing, for example, much discussion centers on factor analysis and the kinds
of problems to which it is particularly suited. Similarly, in trend analysis the choice
of the polynomial or the double Fourier series model involves decisions regarding
the nonperiodic or periodic nature of the observed surfaces that are to be fitted. A
striking feature of Table II is the pervasiveness of the general linear model in its
geological and sedimentological applications. Analysis of variance, multiple corre-
lation and regression, factor analysis, discriminant function analysis, and the poly-
nomial and double Fourier models for map analysis all are variants of this same model.
The literature on statistical applications in virtually all fields of geology is so
large that it is not feasible in this paper to include a comprehensive bibliography of
applications, or of the large number of computer programs now available. It may be
mentioned, however, that the State Geological Survey of Kansas (Lawrence, Kans.,
U. S. A.) published a Computer Contribution Series, which includes program listings
for many of the methods in Table II. Similarly, the Sedimentological Research Laboratory
of the University of Reading (Reading, Berks., U. K.) has copies of basic

TABLE II
SUMMARY OF STATISTICAL OBJECTIVES AND METHODS IN SEDIMENTOLOGY

Objectives                                           Methods

Stages 1 and 2

Estimation of population mean and variance.          frequency distribution analysis
General method is based on computation of            confidence intervals
moments or quartile measures.                        chi-square tests for goodness of fit
                                                     nested analysis of variance

Detection of similarities or differences between     t-tests
population parameters. General method is             analysis of variance
analysis of variance.                                  single factor
                                                       row-column
                                                       multifactor
                                                       nested models
                                                     discriminant analysis

Estimation of associations among measurable          simple linear correlation
properties of populations. General methods are       simple linear regression
correlation and regression to “sort out” important   multiple and partial correlation
variables, or to develop predictor models.           multiple linear regression
Classification is also an objective here.            polynomial regression
                                                     stepwise regression
                                                     R-mode and Q-mode factor analysis
                                                     cluster analysis
                                                     time-series analysis
                                                       autocorrelation
                                                       spectral analysis
                                                     discriminant analysis

Detection and evaluation of patterns of areal        polynomial functions
variation in measurable properties of populations.     gridded map data
General method is trend analysis.                      nongridded data
                                                     double Fourier series
                                                       gridded map data
                                                       nongridded data
                                                     double power spectra
                                                     canonical correlation
                                                     stepwise regression

Stage 3

Description and analysis of probabilistic elements   systems analysis
in sedimentary processes and deposits. General       simulation studies
method uses stochastic processes.                    use of transition probabilities and
                                                       probability trees
                                                     use of conditional probabilities
                                                     independent-events trials


programs and maintains contact with computer-oriented sedimentologists. As for
the statistical methods included in Table II, and the models on which they are based,
these may be found in standard statistical texts and references, and in their geological
context in MILLER and KAHN (1962) and KRUMBEIN and GRAYBILL (1965). Some

aspects of geological data processing are covered in SMITH (1966); and a forthcoming
text by HARBAUGH and MERRIAM (1968) covers a wide variety of computer methods
for stratigraphic and structural analysis. Another recent book by GRIFFITHS (1967)
is more specifically oriented toward methodological and statistical aspects in sedimen-
tology.
The last class of objectives in Table II covers stochastic processes, as applied
to the study of sedimentary processes through time or as they spread over an area
(as in transgression-regression cycles), to discern the patterns of behavior of sediments
successively deposited either vertically or laterally. These methods are new in sedi-
mentology, and they are included in the table for completeness. Definitions and an
example are discussed later in this report.

SEDIMENTOLOGICAL MODELS

In sedimentology one objective of analytical statistics is identification of the
major variables involved in sedimentary processes. Once these are “sorted out”,
the statistical generalizations need to be tied in with underlying physical, chemical,
and biological controls that are operative during weathering, erosion, transportation,
deposition, and lithification.
It was recognized long ago that sets of samples collected along streams or
beaches display the kinds of changes that sediments undergo during transportation.
Such studies are facilitated by quantitative data, and even in the first stage of statistical
study graphs of mean grain size, or of heavy mineral content of sands along lines of
transport, proved to be very instructive. Such diagrams aided in development of
conceptual models (mental images) of the processes taking place. As interest in sedi-
mentary environments grew, it became apparent that environmental factors as well
as the attributes of sediments had to be studied in order to understand why the deposits
show the sorts of changes they display. Out of these considerations emerged the con-
cept of a process-response (“cause and effect”) model, in which relations between
process elements and sedimentary responses can be shown at least qualitatively.
Process-response models may be implemented in various ways. The commonest
approach is to isolate some specific process, such as stream transportation, and to
study it in controlled flume experiments, or purely theoretically from underlying
principles of physics. A field approach might be to find an “ideal stream” (i.e., where
a sediment can be followed from a known source with minimum disturbances due
to tributaries), in which to observe and measure attributes of the channel, of the
flowing water, and of the sedimentary materials in transit or in stream deposits. The
data can be examined in terms of physical theory, or purely statistically in terms of a
predictor model (see Table II) based on multiple regression, that relates a particular
response (such as pebble roundness) to the several most important independent
variables in the process.
Stream transportation and deposition can also be examined in the framework
of systems analysis. BEER (1959, p. 7) defines a system as “any cohesive collection of

items that are dynamically related”. The items (elements of a model) may be consider-
ed as points connected by a network of relationships, which can be illustrated by
drawing lines between the points. The pattern of lines may change through time as
the system interacts within itself, in part through feedback mechanisms and other
controls on the system. Systems analysis is essentially new in sedimentology, but
AMOROCHO and HART (1964) give an excellent discussion with diagrams of this
approach in hydrology, which is transferable to sedimentological stream studies.
It is evident from the foregoing that a wide spectrum extends from the statistical
study of a single particle-size frequency curve to the analysis of a complex multi-
variate data matrix. Moreover, there is a complete range from qualitative conceptual
models through statistical predictor models, to increasingly analytical models that
seek to “explain” a given process in terms of physically-meaningful functional
relationships. Although many sedimentological models are expressed in qualitative
form, there normally is a background of numerical data that supports the generali-
zations in the model. Thus, a diagrammatic model showing sediment dispersal paths
by arrows of varying thickness to indicate relative contributions from several source
areas is in effect a generalization that may involve detailed study of particle size
distributions, cross-bedding directions, heavy mineral content of sand or lithologic
types of pebbles, and so on.
The point of emphasis here is that a given sedimentological model tends to
integrate large amounts of information, some of which may be purely qualitative
(color of beds, for example), some may be statistical (mean grain size, etc.), and some
may involve application of physical principles of stream transportation. Hence,
distinction between purely “statistical” models and other kinds of sedimentological
models tends to become blurred.
It would be instructive to compile a bibliography of sedimentological models,
with references cross-indexed according to the structure of the models proposed.
Such an endeavor is beyond the scope of this paper, but there is one aspect of model
building that does bring out an important distinction between ways of structuring
sedimentological data. This brings us to the final stage of our discourse.

THE THIRD STAGE OF STATISTICAL APPLICATION: PROBABILISTIC MODELS

As multivariate statistical models come into more general use, sufficient data
become available for experimentation with new kinds of models. Among these, as
mentioned in the introduction, are stochastic process models. In order to indicate
their relation to other kinds of mathematical models, distinction may be made between
deterministic models1 and probabilistic models as devices for examining sedimentolo-
gical phenomena.

1 I use the term deterministic model for the conventional mathematical expression of a process without
stochastic components, so that, given a set of values for the independent variables, the magnitude of
the dependent variable can be computed exactly. Terms that may be synonymous include functional
models, mechanistic models, analytical models, actualistic models, and perhaps others.

The goal of most scientific endeavor is to “explain” a phenomenon so thoroughly
that every state of a system in time or space can be exactly predicted. In
conventional language this means relating each effect to its cause in a continuum
of space and time.
The classical approach to this goal is that of mathematical physics, which starts
from basic physical principles and derives a quantitative expression for the phenom-
enon under study. This expression commonly is a differential equation, which when
integrated with suitable boundary conditions, becomes a deterministic model, whose
coefficients, exponents, and other parameters are constants for the conditions stipu-
lated in the model. By inserting selected values of the independent variables into the
equation, the value of the dependent variable can be exactly predicted.
Where the underlying process is thoroughly understood and well documented
by observational or experimental data, it is possible to set up deterministic models
for sedimentological phenomena. In a very large number of problems, however,
it is not possible to evaluate all of the underlying controls on a sedimentary process.
A good example is afforded by cyclical successions of sediments, in which an under-
lying pattern of rock succession can be discerned, but in which the actual sequence
of sedimentary types and thicknesses cannot be exactly predicted. No doubt the place-
ment and thickness of each bed was determined by one or more controls in the complex
of processes that went on, but whether the control was exerted by local uplift in the
source area, by lateral shifts of environment due to remote tectonic events in ocean
basins, or by shifting streams that bring varying supplies and kinds of debris to the
moving strand line may be indeterminate. Such controls in ancient rock successions
are never directly observable, nor may there, in many cases, be sufficient evidence in
the rocks to point unerringly to a specific controlling factor.
Where a system is highly complex, or where the ultimate “cause” of each
variation cannot be traced and evaluated, a probabilistic model provides a device by
which the observed succession of beds can be structured in such a way as to bring

[Fig. 2 diagram: MODELS arranged by MEMORY, from INDEPENDENT-EVENTS MODELS (completely random), through STOCHASTIC PROCESS MODELS (partial dependence), to PATH-DEPENDENT DETERMINISTIC MODELS (full dependence); the label (STATISTICAL) also appears in the diagram.]

Fig.2. Diagram of models in terms of dependence and memory.



out the major patterns of behavior. Because the variations are not exactly predictable
in the sense that they cannot be directly related to a deterministic mechanism, these
variations may appropriately be designated as random. But randomness is not to be
thought of in this connection as completely haphazard, without rhyme or reason.
A random variable is a mathematical entity that arises from probabilistic mecha-
nisms, just as conventional nonstochastic (systematic) variables arise from deter-
ministic mechanisms. In a completely deterministic model the outcome of an experi-
ment is exactly predictable, as stated above. In a probabilistic model the outcome
of an experiment is controlled by an underlying set of probabilities, and hence it is
predictable only in terms of relative likelihoods associated with a set of possible
outcomes. Models that contain both deterministic and probabilistic elements define
processes that follow the systematic path of the deterministic core, with superimposed
random fluctuations introduced by the “built-in” probabilistic mechanism.
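
A model with a deterministic core and superimposed random fluctuations can be sketched in a few lines of Python. The linear form, the coefficients, and the Gaussian fluctuations below are all hypothetical choices made for illustration; they are not taken from any data in this paper.

```python
import random

def deterministic_core(t, intercept=10.0, slope=-0.5):
    """Systematic part: a hypothetical linear change of some property with t."""
    return intercept + slope * t

def simulate(times, sigma, seed=0):
    """Add independent Gaussian fluctuations (std. dev. sigma) to the core."""
    rng = random.Random(seed)
    return [deterministic_core(t) + rng.gauss(0.0, sigma) for t in times]

times = list(range(10))
exact = simulate(times, sigma=0.0)  # purely deterministic: exactly predictable
noisy = simulate(times, sigma=0.3)  # same core, predictable only in probability

for t, e, n in zip(times, exact, noisy):
    print(f"t = {t}: core = {e:.2f}, with fluctuation = {n:.2f}")
```

Setting `sigma` to zero recovers the completely deterministic case, so the two kinds of model differ here only in the presence of the probabilistic term.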
The probabilistic mechanism can be built into physical processes in various
ways. At one extreme is the independent-events model, in which each experimental
outcome is completely independent of preceding outcomes. This is followed by a
broad spectrum of models showing partial dependence of each outcome on preceding
outcomes, to the other extreme of the completely path-dependent deterministic model
with no probabilistic element at all (see Fig.2). It is the broad band in the middle of
this spectrum that is being examined by statistically-minded geologists in various
fields. Out of a variety of stochastic process models the first-order Markov chain has
been more widely used in geology than any other, and it is illustrated here.

MARKOV MODELS

Markov chains occupy a position of partial dependence in the spectrum from
independent-events models to deterministic models. Specifically, for a first-order
Markov chain, the state of the system at time t_r is dependent only on the state of
the system at time t_{r-1}. Hence, this model has a very short “memory” in contrast
to the long memory of the classical deterministic model, where the state at time t_r
depends upon all previous states. In this framework, the independent-events model
has no memory at all.
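
The one-step memory of a first-order chain is easy to see in simulation. The sketch below is illustrative only: the three-state matrix `P` is hypothetical, and `simulate_chain` is a name introduced here. At each step only the current state is consulted when the next state is drawn.

```python
import random

# Hypothetical three-state transition probabilities (each row sums to 1).
P = {
    "A": {"A": 0.6, "B": 0.3, "C": 0.1},
    "B": {"A": 0.2, "B": 0.5, "C": 0.3},
    "C": {"A": 0.3, "B": 0.3, "C": 0.4},
}

def simulate_chain(P, start, n_steps, seed=0):
    """Generate a sequence in which each state depends only on its predecessor."""
    rng = random.Random(seed)
    sequence = [start]
    for _ in range(n_steps):
        row = P[sequence[-1]]  # only the state at t_{r-1} is used
        states = list(row)
        weights = [row[s] for s in states]
        sequence.append(rng.choices(states, weights=weights, k=1)[0])
    return "".join(sequence)

print(simulate_chain(P, start="A", n_steps=30))
```

If all rows of `P` were identical, the next state would not depend on the current one, which corresponds to the independent-events model with no memory at all.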
The geological literature on Markov chains is expanding rapidly. They have
been used in geomorphology, stratigraphy, sedimentology, paleontology, and petro-
logy. The following list covers sedimentology and stratigraphy, the latter because
most papers consider cyclical deposits. ALLÈGRE (1964), SCHWARZACHER (1964), CARR
et al. (1966), GRIFFITHS (1966), HARBAUGH (1966), KRUMBEIN (1967), and POTTER
and BLAKELY (1967b), are concerned with stratigraphic examples, and POTTER and
BLAKELY (1967a) apply the model to synthesis of stream deposits. VISTELIUS (1949)
was the first to use Markov models in stratigraphy, and his recent papers include
VISTELIUS and FEIGEL’SON (1965) and VISTELIUS and FAAS (1965). References to
papers in other geological fields are given in KRUMBEIN (1967). KEMENY and SNELL
(1960) give an elementary introduction to the mathematics of Markov chains.

TABLE III
GENERALIZED TRANSITION PROBABILITY MATRIX, p_ij

                     To j →
                 A       B       C
From i    A    p_11    p_12    p_13
          B    p_21    p_22    p_23
          C    p_31    p_32    p_33

A Markov chain is normally structured as a transition probability matrix,
which controls the transitions from a given state to itself or to another state. The
elements of this matrix are the probabilities p_ij, which represent the probability of
moving to state j at time t_r, given that the process is in state i at time t_{r-1}. These
probabilities are the “driving forces” on the transitions, and the model is appropriately
classified as a statistical model.
A three-state transition probability matrix is shown in Table III. The diagonals,
p_11, p_22, and p_33, represent the probability of a given state being succeeded by itself,
whereas the off-diagonal elements p_12, p_13, etc., represent the probabilities of transitions
from one state to another. In sedimentology the states of a system may be defined
as the kinds of sediments present in a given deposit. Thus in a stream deposit made
up of massive sand, cross-bedded sand, and ripple-bedded sand, these sedimentary
types may be designated as states A, B, and C, respectively.
The transition matrix for such a deposit can be developed in several ways.
One is to record the state of the system at short but fixed vertical intervals. If the
observed sequence is AABBBABBCB..., then the first transition is from A to
itself, followed by a transition from A to B, succeeded by several transitions from
B to itself, leading to a transition to C, and so on. This method is described
in KRUMBEIN (1967).
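
The fixed-interval tally can be sketched in Python as follows. The ten-symbol fragment AABBBABBCB is far too short for a real estimate and is used only to show the bookkeeping; the function name `transition_matrix` is introduced here for illustration.

```python
from collections import Counter

def transition_matrix(sequence, states):
    """Tally i -> j transitions, then convert each row of counts to probabilities."""
    counts = Counter(zip(sequence, sequence[1:]))  # successive pairs (i, j)
    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        matrix[i] = {
            j: (counts[(i, j)] / row_total if row_total else 0.0) for j in states
        }
    return counts, matrix

counts, P = transition_matrix("AABBBABBCB", "ABC")
for i in "ABC":
    print(i, [round(P[i][j], 2) for j in "ABC"])
```

Each row is divided by its own total, so every row of the resulting matrix that has at least one observed transition sums to 1.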
A second way of structuring the transition matrix is to note only the changes
from one sediment type to another, and to keep a separate record of the thicknesses
of each type. Thus, the preceding sequence would be structured simply as ABABCB...,
representing transitions from massive sand (A) to cross-bedded sand (B), back to
massive sand followed by cross-bedded sand, and then by ripple-bedded sand (C),
and so on. In this instance the diagonal probabilities are zero. CARR et al. (1966)
as well as POTTER and BLAKELY (1967a) describe this procedure, including a variant
with “multistory” lithologies that introduce diagonal probabilities greater than zero.
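
This second structuring amounts to collapsing each run of identical states to a single symbol while recording its length. A minimal Python sketch, using the same short fragment and treating run length in sampling intervals as a stand-in for thickness (an assumption made for illustration), might read:

```python
from itertools import groupby

def collapse_runs(sequence):
    """Split a sequence into (state, run length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(sequence)]

runs = collapse_runs("AABBBABBCB")
embedded = "".join(state for state, _ in runs)  # changes of sediment type only
thickness = [length for _, length in runs]      # separate record of thicknesses

print(embedded)
print(thickness)
```

No state follows itself in the collapsed sequence, which is why the diagonal probabilities of a matrix tallied from it are zero.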
These different ways of setting up transition probability matrices indicate,
among other things, that the Markov model is flexible, and may be structured in
various ways. Thus, the chain may be second-order or higher, in which case events
at times t_{r-2}, t_{r-3}, ..., may be examined in addition to t_{r-1} in setting up the transition
matrix. ALLÈGRE (1964) describes first- and second-order chains, and PATTISON
(1965) used a sixth-order chain in a study of hourly rainfall rates. In addition to

flexibility in these respects, Markov chains may be structured as a system operating
at equal or unequal intervals of time, as points along a line, or as unit areas over a
surface. The states may also be defined in various discretized ways, so that even a
continuous variable (particle size for instance) may be arranged as discrete diameter
states.

UPPER CAMBRIAN GLAUCONITIC SANDSTONES IN A MARKOV FRAMEWORK

Transition probability matrices have interesting properties when they are
examined for the patterns of sedimentary successions that they disclose. The Upper
Cambrian Mazomanie Formation of Wisconsin is dominantly white- to buff-colored
quartzose sandstone, with abrupt incursions of dark green glauconitic sandstone
(the Reno Formation) that seem to occur sporadically through the section. Field
evidence suggests a cyclical pattern that starts with clean quartzose sandstone (A),
followed upward by light buff dolomitic quartzose sandstone (B), and succeeded by
glauconitic sandstone (C) which also is followed upward by dolomitized glauconitic
sandstone (D).
The seemingly sporadic positioning of the glauconite beds and the abruptness
of their occurrence suggested that relationships between the white and green sandsto-
nes might be brought out by structuring the data as a first-order Markov chain
(COATES, 1967). A transition matrix, based on 103 observations at 1-foot vertical
intervals through a continuous outcrop, is shown in Table IV, with the four states
arranged in the presumed order of cyclical succession. The matrix clearly discloses
two “loops” involving the white and buff sandstones (states A and B), and the green
sandstone (states C and D). The following points regarding the Mazomanie are
pertinent for interpreting the matrix. In the first place, the white and buff sandstone
represents on the average about 69% of the section, with green glauconitic sand
making up the other 31%. The average bed thickness of the four kinds of sandstone
is 1.8 ft. for clean white sandstone, 1.8 ft. for the buff dolomitic sandstone, 1.4 ft. for
clean glauconitic sandstone, and 2.1 ft. for the dolomitic glauconitic sandstone.
These relative thicknesses are fairly well shown by the probabilities in the diagonal
of Table IV, in which state C is smallest, states A and B are intermediate, and state
D is thickest. Thus the prominence of the green beds when they occur is attributable
to their greater average thickness, despite their smaller proportion in the whole
section.

TABLE IV
TRANSITION PROBABILITY MATRIX OF MAZOMANIE (UPPER CAMBRIAN) FORMATION, SOUTHERN WISCONSIN
(After COATES, 1967)

                                              A      B      C      D
White quartzose sandstone               A   0.47   0.47   0.03   0.03
Buff dolomitic sandstone                B   0.33   0.44   0.17   0.06
Green glauconitic sandstone             C   0.23   0.15   0.31   0.31
Green dolomitic-glauconitic sandstone   D   0.21   0.10   0.16   0.53

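
The link between the diagonal probabilities and bed thickness can be made explicit with a short calculation. The sketch below assumes a first-order chain observed at 1-foot intervals, for which the number of successive observations spent in state i follows a geometric distribution with mean 1/(1 - p_ii). The variable names are illustrative. The values obtained (roughly 1.9, 1.8, 1.4, and 2.1 ft for A through D) may be compared with the measured averages quoted above.

```python
# Diagonal of Table IV: probability of remaining in the same state
# from one 1-foot observation to the next.
diagonal = {"A": 0.47, "B": 0.44, "C": 0.31, "D": 0.53}
interval_ft = 1.0  # vertical sampling interval used for the matrix

# Under a first-order chain the run length in state i is geometric,
# with mean 1 / (1 - p_ii) sampling intervals.
expected_thickness = {
    state: interval_ft / (1.0 - p_stay) for state, p_stay in diagonal.items()
}

for state, thickness in expected_thickness.items():
    print(f"{state}: {thickness:.2f} ft")
```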
The off-diagonal elements in Table IV are instructive also. The transitions
AA, AB, BA, and BB form a compact group, such that the probability of moving
from state A to state C or D is very small, and the probability of moving from state
B either to state C or D is not much larger. What this means is that when the system
is in the white to buff sandstone loop it tends to remain there, but once it gets out
(usually by transition BC in the ideal cycle) it tends to hover in states C and D,
although the passage back to A or B is notably greater than the reverse. If the system
moves to D, the relatively large probability (0.53) of staying there tends to produce
the thick “bursts” of green sand in the dominantly white to buff sections.
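
The asymmetry between the two loops can be checked by summing the off-diagonal entries of Table IV by group. The following sketch is only an arithmetic illustration; the grouping of states into "white" (A, B) and "green" (C, D) follows the text, while the helper name is an assumption introduced here.

```python
# Transition probabilities of Table IV, rows = from state, columns = to state.
P = {
    "A": {"A": 0.47, "B": 0.47, "C": 0.03, "D": 0.03},
    "B": {"A": 0.33, "B": 0.44, "C": 0.17, "D": 0.06},
    "C": {"A": 0.23, "B": 0.15, "C": 0.31, "D": 0.31},
    "D": {"A": 0.21, "B": 0.10, "C": 0.16, "D": 0.53},
}
WHITE = ("A", "B")  # white to buff sandstone loop
GREEN = ("C", "D")  # green glauconitic sandstone loop


def prob_into(state, group):
    """Probability that the next observation falls in the given group."""
    return sum(P[state][j] for j in group)


for state in WHITE:
    print(f"{state} -> green loop: {prob_into(state, GREEN):.2f}")
for state in GREEN:
    print(f"{state} -> white loop: {prob_into(state, WHITE):.2f}")
```

The sums are 0.06 and 0.23 for leaving the white to buff loop from A and B, against 0.38 and 0.31 for returning to it from C and D.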
What this example demonstrates is that by structuring the data into a transition
probability matrix, relationships observed on the outcrop become much clearer, and
the pattern of succession is more apparent. Naturally, the question arises whether
the process is Markovian or not; fortunately, statistical tests are available (KRUMBEIN,
1967; POTTER and BLAKELY, 1967a) and the data in Table III safely satisfy the criterion.
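
One widely used test of this kind is the likelihood-ratio statistic of Anderson and Goodman, which compares the observed transition probabilities with those expected if successive observations were independent. Whether this is exactly the form used in the references cited is an assumption here. The sketch below is a minimal version; the function name and the short alternating sequence are hypothetical and are not the Mazomanie data.

```python
import math
from collections import Counter


def markov_lr_statistic(sequence, states):
    """Likelihood-ratio statistic for first-order dependence vs. independence.

    Returns (statistic, degrees_of_freedom). The statistic is
    2 * sum n_ij * ln(p_ij / p_j), referred to chi-square with (m - 1)^2 d.f.
    """
    counts = Counter(zip(sequence, sequence[1:]))
    total = sum(counts[(i, j)] for i in states for j in states)
    row = {i: sum(counts[(i, j)] for j in states) for i in states}
    col = {j: sum(counts[(i, j)] for i in states) for j in states}
    stat = 0.0
    for i in states:
        for j in states:
            n_ij = counts[(i, j)]
            if n_ij == 0:
                continue  # empty cells contribute nothing to the sum
            p_ij = n_ij / row[i]   # observed transition probability
            p_j = col[j] / total   # marginal probability of entering j
            stat += 2.0 * n_ij * math.log(p_ij / p_j)
    df = (len(states) - 1) ** 2
    return stat, df


# Hypothetical, strictly alternating two-state sequence.
stat, df = markov_lr_statistic("ABABABABAB", "AB")
print(f"statistic = {stat:.2f} with {df} d.f.")
```

A statistic well above the tabulated chi-square value for the stated degrees of freedom (3.84 at the 5% level for 1 d.f.) leads to rejection of independence in favor of a first-order dependence. With sequences as short as this toy example the chi-square approximation is rough at best.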
When a Markov property is present the transition matrix may be used to
simulate the cycles, and the long-term equilibrium proportions of each sediment
type can be computed. Fig.3 shows part of the actual section and part of a computer
simulation. The agreement is evident, although such agreement does not necessarily
demonstrate the adequacy of the model. Of equal importance is this question: given
that the process is Markovian, what does this mean geologically, in terms of sedimentary
environment, source area of the quartz and glauconite, and so on?
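
Both operations mentioned above, simulation of a section and computation of the long-term equilibrium proportions, can be sketched in a few lines. The following is not the "Marchain" program; it is a minimal illustration that draws each successive state from the row of Table IV belonging to the current state, and obtains the equilibrium vector by repeated multiplication. Function names, the starting state, and the random seed are assumptions. For Table IV the combined equilibrium share of states A and B comes out near 0.70, which may be compared with the 69% reported for the section.

```python
import random

STATES = "ABCD"
# Table IV, rows = from state, columns = to state.
P = [
    [0.47, 0.47, 0.03, 0.03],
    [0.33, 0.44, 0.17, 0.06],
    [0.23, 0.15, 0.31, 0.31],
    [0.21, 0.10, 0.16, 0.53],
]


def simulate(n_obs, start=0, seed=0):
    """Generate a synthetic section of n_obs observations as a string."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(n_obs - 1):
        # Draw the next state using the current state's row as weights.
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return "".join(STATES[s] for s in path)


def equilibrium(matrix, n_iter=500):
    """Approximate the equilibrium proportions by repeated multiplication."""
    m = len(matrix)
    pi = [1.0 / m] * m
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(m)) for j in range(m)]
    return pi


section = simulate(103)  # same number of observations as the outcrop
pi = equilibrium(P)
print(section)
print({s: round(p, 3) for s, p in zip(STATES, pi)})
```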

[Fig.3 graphic not reproduced: two columns, OBSERVED and MARCHAIN. Legend: state A, quartzose ss;
state B, dolo-qtz ss; state C, glauc ss; state D, dolo-glauc ss.]

Fig.3. Observed and simulated stratigraphic sections of Mazomanie (Upper Cambrian) Formation.
Section on right from matrix of Table IV with computer program “Marchain” (KRUMBEIN,
1967). Length of sections about 10 m.

DETERMINISTIC AND PROBABILISTIC MODELS IN SEDIMENTOLOGY

The question just raised comes immediately to the nub of the problem as to
whether probabilistic models can play a legitimate role in sedimentology as devices
for “explaining” a process rather than merely describing it. There is some difference
of opinion regarding the validity of invoking random processes, as well as of the
geological meaningfulness of simulations obtained from transition probability ma-
trices. To some it seems that reliance on random processes implies surrender of the
basic methods of science; to others it seems equally apparent that for most natural
processes the deterministic model is merely an oversimplified expression of processes
that are basically probabilistic.
COLEMAN’S book on mathematical sociology (1964) presents an interesting
and relatively unbiased discussion of deterministic and probabilistic models in a
field not unlike geology in its complexity of interlocked variables. In chapter 18
Coleman discusses “tactics and strategies in the use of mathematics”, which includes
a section on probabilistic and deterministic mathematics. He emphasizes that in general
the deterministic model is mathematically simpler than its stochastic equivalent. In
some cases at least, the deterministic model contains the mean values (= expected
values) of the probability distributions in the corresponding stochastic process model.
Thus, where interest is mainly in mean values, the deterministic model is to be
preferred. In geology, BRIGGS and POLLACK (1967) conclude that “...when sufficient
knowledge exists concerning the processes and the geology, there is no need to invoke
random, probabilistic processes”. Another viewpoint in the spectrum of opinion is
that of HARBAUGH (1966) and HARBAUGH and WAHLSTEDT (1967), who present
computer programs for simulating sedimentary processes by use of stochastic process
models. The probabilities used as input can be selected on the basis of their geological
reasonableness for the process under study, without necessarily using real data. By
changing the input, a variety of situations can be simulated including deltaic deposits,
beaches, algal reefs, and other sedimentary features that “...develop progressively...
with startling realism”.

[Fig.4 diagram not reproduced: "Relations among models". A conceptual model (mental image) leads
by three paths to a deterministic model, to a statistical predictor model (through conventional
statistical analysis), and to a stochastic process model; all are tested against real-world
data (re-loop as necessary).]

Fig.4. Generalized diagram of some relations among deterministic, statistical, and stochastic
process models. Although the cross-path from deterministic to stochastic process models is by way of
predictor models, there is also a direct relation between them. See text for discussion.

RELATIONS AMONG CURRENT APPROACHES

Fig.4, based on a lantern slide used in the oral presentation, is inserted here
to illustrate in a broad way how current sedimentological studies are interlocked in
terms of deterministic, conventional statistical, and stochastic process models. If
one starts with a conceptual model, he may follow the left-hand path in the diagram
as a direct approach to the development of a deterministic model. The middle path
represents a conventional statistical approach that leads normally to a statistical
predictor model. The right-hand path represents one kind of probabilistic approach,
leading for example to a Markov chain. In any case, the final model is ultimately
tested with observational data to evaluate its adequacy in the real-life world.
What the figure shows primarily is that all three paths are very closely inter-
locked in methodology, and there is considerable freedom of flow from one path to
another in the diagram. Thus, conventional statistical analysis (some form of the
general linear model, usually) may be used to “sort out” the more important variables
or states in a system. These particular variables may then be used in formulating
deterministic or stochastic process models. The choice of a particular path through
the diagram is controlled by the purposes of the study, by the kinds of data available,
and by the way in which the data are structured for analysis. In particular, the deter-
ministic and stochastic process models endeavor to make explicit the physical meaning
of coefficients and exponents, whereas in the statistical predictor model this is not
necessarily a primary consideration. Predictor models can be very important in the
search for natural resources, for example, even though the physical meaning of each
element in the model may not be fully understood.
In the long run the predictor model represents an intermediate stage in theo-
retical analysis, and the ultimate choice for theoretical sedimentology appears to lie
between deterministic and stochastic process models.

CONCLUDING REMARKS

The divergence of opinion regarding the relative usefulness of deterministic
and probabilistic approaches suggests that the truth lies somewhere between. This
may not be necessarily so, however, as the theory of more complex or more comprehensive
sedimentological processes is investigated, especially with succeeding generations
of computers. No doubt some present “unexplainable” variations, now
assigned to random fluctuations, will be accounted for by more sophisticated deterministic
models. However, with greater computer capability the mathematically
more complicated stochastic models can conveniently be used. It will always be
possible to reduce the complexity of a system by taking the expected values of the
distributions involved (thereby obtaining the corresponding deterministic model),
but this will be at the risk of ignoring important stochastic components in the system.
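
The relation between the two kinds of model can be illustrated with the matrix of Table IV. In the sketch below the expected proportion of green sandstone in a section of 103 observations is obtained deterministically, by propagating the state probabilities through the matrix, and is then compared with the mean and the scatter of many simulated sections. The function names, the starting state, the number of runs, and the seed are assumptions made for illustration. The deterministic figure reproduces the mean of the realizations but carries no information on their spread.

```python
import random
import statistics

# Table IV, rows = from state, columns = to state (order A, B, C, D).
P = [
    [0.47, 0.47, 0.03, 0.03],
    [0.33, 0.44, 0.17, 0.06],
    [0.23, 0.15, 0.31, 0.31],
    [0.21, 0.10, 0.16, 0.53],
]
GREEN = (2, 3)  # indices of states C and D


def expected_green_fraction(n_obs, start=0):
    """Deterministic counterpart: expected share of green observations."""
    dist = [0.0] * 4
    dist[start] = 1.0
    total = 0.0
    for _ in range(n_obs):
        total += dist[2] + dist[3]
        # Propagate the state probabilities one step through the matrix.
        dist = [sum(dist[i] * P[i][j] for i in range(4)) for j in range(4)]
    return total / n_obs


def simulated_green_fractions(n_obs, n_runs, start=0, seed=1):
    """Stochastic model: green share in each of n_runs simulated sections."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_runs):
        state, green = start, 0
        for _ in range(n_obs):
            green += state in GREEN
            state = rng.choices(range(4), weights=P[state])[0]
        fractions.append(green / n_obs)
    return fractions


mean_det = expected_green_fraction(103)
runs = simulated_green_fractions(103, 2000)
print(f"deterministic expectation: {mean_det:.3f}")
print(f"simulated mean: {statistics.mean(runs):.3f}")
print(f"simulated standard deviation: {statistics.stdev(runs):.3f}")
```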
This present paper can hardly be used to support the foregoing judgment
regarding the future of theoretical sedimentology, inasmuch as my own example
of a Markov chain is largely descriptive. However, when the full range of stochastic
process models is examined, it is apparent that they afford a wide choice of ways
for analyzing the same problems as are now treated deterministically. This was
illustrated very recently by CONOVER and MATALAS (1967) with respect to suspended-
sediment concentrations in streams. Such models, developed in a probabilistic frame-
work, have the advantages that accrue from including variability as well as mean
values, which was seen to be basically important even in the first two stages of statisti-
cal applications in sedimentology.

REFERENCES
ALLÈGRE, C., 1964. Vers une logique mathématique des séries sédimentaires. Bull. Soc. Géol. France,
6 : 214-218.
AMOROCHO, J. and HART, W. E., 1964. Critique of current methods in hydrologic systems
investigation. Trans. Am. Geophys. Union, 45 : 307-321.
BAILEY, N. T. J., 1957. The Mathematical Theory of Epidemics. Griffin, London.
BARTLETT, M. S., 1960. An Introduction to Stochastic Processes with Special Reference to Methods and
Applications. Cambridge University Press, London, 312 pp.
BEER, S., 1959. Cybernetics and Management. Wiley, New York, N.Y., 214 pp.
BRIGGS, L. I. and POLLACK, H. N., 1967. Digital model of evaporite sedimentation. Science, 155 :
453-456.
CARR, D. D., and others, 1966. Stratigraphic sections, bedding sequences, and random processes.
Science, 154 : 1162-1164.
COATES, M. S., 1967. Application of a Markov Chain to the Mazomanie-Reno Sandstone in South-
Central Wisconsin. Thesis, Northwestern Univ., Evanston, Ill., 97 pp. (unpublished).
COLEMAN, J. S., 1964. Introduction to Mathematical Sociology. Free Press, Glencoe, Ill., 554 pp.
CONOVER, W. J. and MATALAS, N. C., 1967. A statistical model of sediment transport. U. S., Geol.
Surv., Profess. Papers, 575-B : B60-B61.
GRIFFITHS, J. C., 1966. Future trends in geomathematics. Mineral Ind., Penna. State Univ., 35 : 1-8.
GRIFFITHS, J. C., 1967. Scientific Methods in Analysis of Sediments. McGraw-Hill, New York, N.Y.,
508 pp.
HARBAUGH, J. W., 1966. Mathematical simulation of marine sedimentation with IBM 7090/7094
computers. State Geol. Surv., Kans., Computer Contr., 1 : 1-52.
HARBAUGH, J. W. and MERRIAM, D. F., 1968. Computer Applications in Stratigraphy. Wiley, New
York, N.Y. (in press).
HARBAUGH, J. W. and WAHLSTEDT, W. J., 1967. FORTRAN IV program for mathematical simulation
of marine sedimentation with IBM 7040 or 7094 computers. State Geol. Surv., Kans., Computer
Contr., 9 : 1-40.
KEMENY, J. G. and SNELL,J. L., 1960. Finite Markov Chains. Van Nostrand, Princeton, N. J., 210 pp.
KRUMBEIN, W. C., 1967. FORTRAN IV computer programs for Markov chain experiments in geol-
ogy. State Geol. Surv., Kans., Computer Contr., 13 : 1-38.
KRUMBEIN, W. C. and GRAYBILL, F. A., 1965. An Introduction to Statistical Models in Geology.
McGraw-Hill, New York, N.Y., 475 pp.
MILLER, R. L. and KAHN, J. S., 1962. Statistical Analysis in the Geological Sciences. Wiley, New
York, N.Y., 483 pp.
PATTISON, A., 1965. Synthesis of hourly rainfall data. Water Resources Res., 1 : 489-498.
POTTER, P. E. and BLAKELY, R. F., 1967a. Generation of a synthetic vertical profile of a fluvial
sandstone body. J. Soc. Petrol. Eng., 6 : 243-251.
POTTER, P. E. and BLAKELY, R. F., 1967b. Random processes and lithologic transitions. J. Geol., in
press.
SCHWARZACHER, W., 1964. An application of statistical time-series analysis of a limestone-shale
sequence. J. Geol., 72 : 195-213.
SMITH, F. G., 1966. Geological Data Processing: Using FORTRAN IV. Harper and Row, New York,
N.Y., 284 pp.
VISTELIUS, A. B., 1949. On the question of the mechanism of the formation of strata. Dokl. Akad.
Nauk S.S.S.R., 65 : 191-194.
VISTELIUS, A. B. and FAAS, A. V., 1965. On the character of the alternation of strata in certain
sedimentary rock masses. Dokl. Akad. Nauk S.S.S.R., 164 : 629-632.
VISTELIUS, A. B. and FEIGEL’SON, T. S., 1965. On the theory of bed formation. Dokl. Akad. Nauk
S.S.S.R., 164 : 158-160.
