
Mathematical and Computer Modelling 33 (2001) 707-721
Pergamon
www.elsevier.nl/locate/mcm

Modelling Rainfall-Runoff
using Genetic Programming
P. A. WHIGHAM* AND P. F. CRAPPER
CSIRO Land and Water
P.O. Box 1666, Canberra, A.C.T. 2601, Australia
pwhigham@infoscience.otago.ac.nz
Peter.Crapper@cbr.clw.csiro.au

Abstract. Genetic programming is an inductive form of machine learning that evolves a computer program to perform a task defined by a set of presented (training) examples. It has been successfully applied to problems that are complex and nonlinear, and where the size, shape, and overall form of the solution are not explicitly known in advance. This paper describes the application of a grammatically-based genetic programming system to discover rainfall-runoff relationships for two vastly different catchments. A context-free grammar is used to define the search space for the mathematical language used to express the evolving programs. A daily time series of rainfall-runoff is used to train the evolving population. A deterministic lumped parameter model, based on the unit hydrograph, is compared with the results of the evolved models on an independent data set. The favourable results of the genetic programming approach show that machine learning techniques are potentially a useful tool for developing hydrological models, especially when surface water movement and water losses are poorly understood. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Rainfall-runoff, Genetic programming.

1. INTRODUCTION

The major objective of rainfall-runoff modelling is to predict the runoff of a catchment from the rainfall incident on the catchment. The response of the catchment (especially for Australian catchments) is highly capricious, depending in a highly nonlinear and unpredictable fashion not only on the catchment characteristics (e.g., topography, area), vegetation characteristics, and antecedent conditions, but also on the meteorological conditions (e.g., areal distribution of rainfall). Developing models that describe this relationship may help in understanding the overall behaviour of the catchment and support the development of more process-based models and catchment classification schemes.

1.1. Machine Learning


The field of evolutionary computation has been widely studied since the 1960s [1,2]. This form of machine learning is characterised by the use of a population of objects that compete to perform some specified task. Using biological analogies, the population of possible solutions is modified in two main ways:
• mutation of an individual, which is an asexual operation causing (normally) a small change in the individual, and
• mating between individuals, which mixes the parent representations to create a child.

*Present address: Dept. of Information Science, University of Otago, P.O. Box 53, Dunedin, New Zealand.

0895-7177/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0895-7177(00)00274-0
Individuals of the population are given a fitness measure, which is defined by their performance against a training set of examples relating input and output patterns. Selection of individuals for mutation and mating, which creates the next generation of objects, is based proportionally on their fitness. This selection mechanism (similar in concept to Darwinian natural selection) drives the population towards better individuals, and therefore better solutions.
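Fitness-proportionate (roulette-wheel) selection, as described above, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that larger fitness is better, so an error to be minimised (such as the RMSE used later) is first mapped to a fitness with 1/(1 + error), a transform chosen here only for illustration. The function names are hypothetical.

```python
import random
from typing import Sequence


def error_to_fitness(error: float) -> float:
    """Map a non-negative error (lower is better) to a fitness (higher is better).
    The 1 / (1 + error) transform is an illustrative assumption."""
    return 1.0 / (1.0 + error)


def roulette_select(fitnesses: Sequence[float], rng: random.Random) -> int:
    """Return the index of one individual, chosen with probability
    proportional to its non-negative fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # No fitness information at all: fall back to a uniform choice.
        return rng.randrange(len(fitnesses))
    pick = rng.random() * total  # a point in [0, total)
    running = 0.0
    for index, fitness in enumerate(fitnesses):
        running += fitness
        if pick < running:
            return index
    return len(fitnesses) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(1)
    errors = [4.0, 1.0, 0.0]  # hypothetical RMSE values of three programs
    fits = [error_to_fitness(e) for e in errors]
    counts = [0, 0, 0]
    for _ in range(17000):
        counts[roulette_select(fits, rng)] += 1
    print(counts)  # roughly in the ratio 0.2 : 0.5 : 1.0
```

Individuals with zero fitness are never chosen. Each draw costs time linear in the population size; for large populations a cumulative-sum table with binary search is the usual refinement.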

1.2. Bias and Learning


Bias may be defined as the factors that influence a learning system to favour certain hypotheses
or strategies. The application of a learning system always involves some form of bias. Bias may
be introduced in any of the following areas:
• the problem representation,
• the operators used to search the representation space,
• the structural constraints of the representation,
• the search constraints when manipulating the representation, and
• the criterion used to evaluate proposed solutions.
The use of bias in machine learning has been promoted for many years when knowledge is available that can help narrow the search space of the problem. For example, Lenat [3] stated:
All our experiences in AI research have led us to believe that for automatic programming, the answer lies in knowledge, in adding a collection of expert rules which will guide code synthesis and transformation.
When knowledge about the structure and form of good solutions is known, there should be the opportunity to define this knowledge explicitly. This is described as background knowledge and includes bias of the language and how particular parts of the search space are to be explored. The machine learning technique described in this paper allows language and search bias to be explicitly defined. Although this ability is used only to a limited extent within this paper, it is hoped that the possibilities for expression of bias with time-series modelling will be made clear.

1.3. Rainfall-Runoff Modelling

One of the traditional approaches to hydrograph modelling is to use the concept of the instantaneous unit hydrograph (IUH). The IUH is defined as the hydrograph produced by the instantaneous application of a unit depth of rainfall to a catchment. The shape of the IUH is similar to a single-peak hydrograph with a rapid rise and a slower decay. The fundamental assumption in the IUH model is that the precipitation input is equal to the integrated streamflow output. The nonlinear relationship between rainfall and streamflow has led to the development of effective rainfall, which is determined by applying a nonlinear filter to the raw rainfall data. This effective rainfall is then equated with the integrated streamflow for the specified catchment.
The IHACRES model applied in this paper is based on IUH principles. The model defines a unit hydrograph for total streamflow by defining separate unit hydrographs for the quickflow and the slowflow components. The model is defined by six parameters, four of which are determined directly from the raw rainfall, streamflow, and temperature (or a surrogate) data, while the other two (the nonlinear parameters) are calibrated using a trial-and-error search procedure, optimising the model to fit the observed rainfall-runoff relationship. Additional details about the model are contained in [4,5].
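The unit hydrograph idea can be illustrated as a discrete convolution: streamflow on day t is the sum of past effective rainfall weighted by the hydrograph ordinates. The sketch below is a simplification under stated assumptions, not IHACRES itself: each of the quickflow and slowflow components is taken to be a single exponential (linear) store, effective rainfall is supplied directly rather than produced by a nonlinear loss filter, and the names tau_q, tau_s and v_q and their values are hypothetical.

```python
import math
from typing import List, Sequence


def exponential_uh(tau: float, length: int) -> List[float]:
    """Discrete unit hydrograph of one linear store with time constant tau (days).
    Ordinates are (1 - a) * a**k with a = exp(-1/tau); they sum to 1 - a**length."""
    a = math.exp(-1.0 / tau)
    return [(1.0 - a) * a ** k for k in range(length)]


def two_component_uh(tau_q: float, tau_s: float, v_q: float, length: int) -> List[float]:
    """Weighted sum of a quickflow and a slowflow unit hydrograph (v_q = quickflow share)."""
    quick = exponential_uh(tau_q, length)
    slow = exponential_uh(tau_s, length)
    return [v_q * q + (1.0 - v_q) * s for q, s in zip(quick, slow)]


def route(effective_rain: Sequence[float], uh: Sequence[float]) -> List[float]:
    """Streamflow as the discrete convolution of effective rainfall with the unit hydrograph."""
    flow = [0.0] * len(effective_rain)
    for t in range(len(effective_rain)):
        for k in range(min(t + 1, len(uh))):
            flow[t] += uh[k] * effective_rain[t - k]
    return flow


if __name__ == "__main__":
    # Hypothetical parameters: fast store of 2 days, slow store of 30 days, 70% quickflow.
    uh = two_component_uh(tau_q=2.0, tau_s=30.0, v_q=0.7, length=400)
    pulse = [10.0] + [0.0] * 399  # 10 mm of effective rainfall on day 0
    flow = route(pulse, uh)
    print(round(flow[0], 3), round(sum(flow), 3))
```

Because the ordinates sum to almost one, the routed streamflow conserves the effective rainfall volume, which is the mass-balance assumption stated above. In this simplification the rise is instantaneous rather than merely rapid.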

1.4. Aims
This paper compares two different approaches to predicting rainfall-runoff relationships. One approach uses an evolutionary machine learning algorithm which evolves programs defined in a formal language; the other approach uses a (more traditional) deterministic modelling framework based on the unit hydrograph.
The paper proceeds as follows. Genetic programming (GP) is introduced as a form of program induction using evolution. Formal grammars are then shown to be a useful method for defining language structure, which may be neatly described in the same framework as GP. Grammars can be used as generators for sentences of structured languages and may be manipulated using a crossover operator that maintains the language structure. Additionally, a grammar is shown to be able to express both language and search bias. Finally, a grammatical GP system is used to model two catchments with vastly different rainfall-runoff behaviour, and the results are compared with the deterministic model IHACRES.

2. GENETIC PROGRAMMING
The field of program induction using a tree-structured approach was first clearly defined by Koza [6]. This field, named genetic programming (GP), evolved a solution in the form of a Lisp program using an evolutionary, population-based search algorithm which extended the fixed-length concepts of genetic algorithms defined by Holland [2]. The structures were defined as a combination of functions (arity > 0) and terminals (0-arity functions), which combined to form Lisp programs. The following steps summarise the search procedure used with GP.
1. Create an initial population of programs, randomly generated as compositions of the
function and terminal sets.
2. WHILE termination criterion not reached DO
(a) Execute each program to obtain a performance (fitness) measure representing how
well each program performs the specified task.
(b) Use a fitness proportionate selection method to select programs for reproduction to
the next generation.
(c) Use probabilistic operators (crossover and mutation) to combine and modify components of the selected programs.
3. The fittest program represents a (generalised) solution to the problem.
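The steps above can be made concrete with a toy tree-based GP in Python. It is a sketch under stated assumptions, not the system used in this paper: programs are nested tuples over +, - and * with a single variable x, fitness is RMSE mapped to 1/(1 + RMSE) for proportional selection, crossover is the only variation operator, a depth limit of 6 is imposed, and the search stops after a fixed number of generations. All names, parameter values and the toy target are hypothetical.

```python
import math
import random
from typing import List, Tuple, Union

# A program is the variable "x", a float constant, or (operator, left, right).
Program = Union[str, float, tuple]
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
MAX_DEPTH = 6


def random_program(rng: random.Random, depth: int) -> Program:
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-5.0, 5.0), 2)
    op = rng.choice(sorted(OPS))
    return (op, random_program(rng, depth - 1), random_program(rng, depth - 1))


def run(p: Program, x: float) -> float:
    if isinstance(p, tuple):
        op, left, right = p
        return OPS[op](run(left, x), run(right, x))
    return x if p == "x" else p


def depth(p: Program) -> int:
    return 1 + max(depth(p[1]), depth(p[2])) if isinstance(p, tuple) else 0


def rmse(p: Program, data: List[Tuple[float, float]]) -> float:
    try:
        err = math.sqrt(sum((run(p, x) - y) ** 2 for x, y in data) / len(data))
    except OverflowError:
        return 1e12
    return err if math.isfinite(err) else 1e12  # penalise inf / nan results


def subtrees(p: Program, path: tuple = ()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, p
    if isinstance(p, tuple):
        yield from subtrees(p[1], path + (1,))
        yield from subtrees(p[2], path + (2,))


def replace_at(p: Program, path: tuple, new: Program) -> Program:
    if not path:
        return new
    parts = list(p)
    parts[path[0]] = replace_at(p[path[0]], path[1:], new)
    return tuple(parts)


def crossover(a: Program, b: Program, rng: random.Random):
    """Swap one random subtree of a with one of b; over-deep children revert to the parent."""
    path_a, sub_a = rng.choice(list(subtrees(a)))
    path_b, sub_b = rng.choice(list(subtrees(b)))
    child_a, child_b = replace_at(a, path_a, sub_b), replace_at(b, path_b, sub_a)
    return (child_a if depth(child_a) <= MAX_DEPTH else a,
            child_b if depth(child_b) <= MAX_DEPTH else b)


def evolve(data, pop_size=200, generations=30, p_cross=0.9, seed=0):
    rng = random.Random(seed)
    population = [random_program(rng, 4) for _ in range(pop_size)]  # step 1
    best, best_err, history = None, math.inf, []
    for _ in range(generations):  # step 2: fixed number of generations
        errors = [rmse(p, data) for p in population]  # 2(a) evaluate
        for p, e in zip(population, errors):
            if e < best_err:
                best, best_err = p, e
        history.append(best_err)
        weights = [1.0 / (1.0 + e) for e in errors]  # 2(b) proportional selection
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.choices(population, weights=weights, k=2)
            if rng.random() < p_cross:  # 2(c) variation
                a, b = crossover(a, b, rng)
            offspring.extend([a, b])
        population = offspring[:pop_size]
    return best, best_err, history  # step 3: fittest program seen


if __name__ == "__main__":
    data = [(float(x), float(x * x + 1)) for x in range(-5, 6)]
    best, err, _ = evolve(data)
    print(best, err)
```

The returned history records the best error after each generation; because the best program ever seen is retained, it can only stay the same or fall.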
The termination criterion is based upon the problem that is being solved. Normally, an exact solution cannot be obtained, and so the search for a solution is complete after a certain number of generations have been performed. In the case of the time-series model developed in this paper, an exact solution to the training data is neither desirable (since this would imply that the solution has been overly specialised) nor likely to occur. Hence, the solution that is accepted is the one with the best fitness after a fixed number of generations.
GP has previously been applied to time-series prediction [7]; however, no previous work has applied GP to the important problem of predicting rainfall-runoff patterns.

3. FORMAL GRAMMARS AND LEARNING


This section introduces the use of formal grammars to allow explicit biasing in the framework of genetic programming. The grammar structures are shown to be capable of explicitly stating both the language bias and how the language is searched.
A formal grammar is a production system which defines how nonterminal symbols may be transformed to create terminal sentences of a language. A grammar is represented by a four-tuple (N, Σ, P, S), where N is the alphabet of nonterminal symbols, Σ is the alphabet of terminal symbols, P is the set of productions, and S is the designated start symbol. The following grammar, G_math, defines a language for generating all possible mathematical expressions using the operators +, -, *, /, and a set of random real numbers, represented by the symbol R.
G_math = (N, Σ, P, S), where
    N = {M},
    Σ = {+, -, *, /, R},
    P = {
        S → M
        M → + M M | - M M | * M M | / M M | R
    }

3.1. Derivation Steps and Derivation Trees

A derivation step represents the application of a production to some string which contains a nonterminal. In general, a series of derivation steps may be represented by a syntax tree or derivation tree. As shown in Figure 1, the following derivation steps, which define the mathematical expression (5.2 + (2.12 * 73.89)),

$$S \Rightarrow M \Rightarrow +\,M\,M \Rightarrow +\,R\,M \Rightarrow +\,5.2\,M \Rightarrow +\,5.2 * M\,M \Rightarrow +\,5.2 * R\,M \Rightarrow +\,5.2 * 2.12\,M \Rightarrow +\,5.2 * 2.12\,R \Rightarrow +\,5.2 * 2.12\;73.89, \tag{1}$$

may be represented as a tree. The genetic search operators of crossover and mutation are applied directly to these trees.

Figure 1. A derivation tree for the expression 5.2 + 2.12 * 73.89 using G_math.
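The grammar G_math and the derivation in (1) can be written down directly as data. The Python sketch below uses our own encoding (a dictionary of productions and a small Node class; both names are hypothetical). It builds the derivation tree of Figure 1, checks that every nonterminal node is expanded by a production of G_math, and evaluates the prefix sentence that the tree yields.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# G_math as data: each nonterminal maps to its alternative right-hand sides.
# "R" stands for a random real number; in a tree it is replaced by a constant.
G_MATH: Dict[str, List[List[str]]] = {
    "S": [["M"]],
    "M": [["+", "M", "M"], ["-", "M", "M"], ["*", "M", "M"], ["/", "M", "M"], ["R"]],
}
OPERATORS = {"+", "-", "*", "/"}


@dataclass
class Node:
    symbol: str  # nonterminal, operator, or the text of a constant
    children: List["Node"] = field(default_factory=list)


def m(*children: Node) -> Node:
    return Node("M", list(children))


def leaf(text: str) -> Node:
    return Node(text)


def is_valid(node: Node) -> bool:
    """True if every nonterminal node is expanded by one of its productions."""
    if node.symbol not in G_MATH:
        return not node.children
    # A numeric leaf counts as an instance of the terminal R.
    rhs = [c.symbol if c.symbol in G_MATH or c.symbol in OPERATORS else "R"
           for c in node.children]
    return rhs in G_MATH[node.symbol] and all(is_valid(c) for c in node.children)


def sentence(node: Node) -> List[str]:
    """The terminal string (yield) of a derivation tree, in prefix order."""
    if not node.children:
        return [node.symbol]
    return [tok for c in node.children for tok in sentence(c)]


def eval_prefix(tokens: List[str]) -> float:
    def parse(i: int):
        tok = tokens[i]
        if tok in OPERATORS:
            a, i = parse(i + 1)
            b, i = parse(i)
            if tok == "+":
                return a + b, i
            if tok == "-":
                return a - b, i
            if tok == "*":
                return a * b, i
            return a / b, i
        return float(tok), i + 1

    value, _ = parse(0)
    return value


# The derivation tree of Figure 1, following the steps in (1).
FIGURE_1 = Node("S", [m(leaf("+"), m(leaf("5.2")),
                        m(leaf("*"), m(leaf("2.12")), m(leaf("73.89"))))])

if __name__ == "__main__":
    print(sentence(FIGURE_1))  # ['+', '5.2', '*', '2.12', '73.89']
    print(eval_prefix(sentence(FIGURE_1)))
```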

3.2. How Do We Use the Grammar?

A grammar G may be used as a generator of sentences which are part of the language L(G). A limit on the depth of the derivation tree created from the grammar is necessary to ensure that the generation process halts (there are also the practical issues of implementing the generation on a finite machine and being able to execute the created programs). There are two steps involved in using G to generate random sentences from L(G). First, each production P ∈ G is labelled with the minimum depth of derivation tree that can be created from this production to produce a string of terminal symbols. This min-depth-tree value is then used to guide the selection of productions when randomly creating sentences from L(G), limited by some maximum depth of derivation tree.
Each derivation tree is evaluated against the training data to obtain a fitness measure. Selection of programs (derivation trees) for crossover and mutation is based on their relative fitness. In each generation, the population is transformed using the genetic operators of crossover and mutation to give the next population. This process continues until some maximum number of generations has passed or an acceptable solution has been discovered.
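The two generation steps (labelling productions with their minimum derivation depth, then choosing only productions that fit within the remaining depth) can be sketched as follows. Assumptions of this sketch: trees are (nonterminal, children) tuples, a production whose right-hand side is all terminals has depth 1, productions are chosen uniformly among those that fit, and R is instantiated as a constant in [-10, 10] (the range used later for G_exp; no range is given for G_math). Function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple, Union

Grammar = Dict[str, List[List[str]]]
Tree = Union[str, Tuple[str, list]]  # a terminal string, or (nonterminal, children)

G_MATH: Grammar = {
    "S": [["M"]],
    "M": [["+", "M", "M"], ["-", "M", "M"], ["*", "M", "M"], ["/", "M", "M"], ["R"]],
}


def min_depths(g: Grammar):
    """Label every nonterminal and every production (nonterminal, index) with the
    minimum derivation-tree depth needed to reach an all-terminal string.
    Convention assumed here: an all-terminal right-hand side has depth 1."""
    nt = {a: float("inf") for a in g}
    changed = True
    while changed:  # iterate until no label improves (a fixed point)
        changed = False
        for a, alternatives in g.items():
            for rhs in alternatives:
                d = 1 + max((nt[s] for s in rhs if s in g), default=0)
                if d < nt[a]:
                    nt[a], changed = d, True
    prod = {(a, i): 1 + max((nt[s] for s in rhs if s in g), default=0)
            for a, alternatives in g.items() for i, rhs in enumerate(alternatives)}
    return nt, prod


def generate(g: Grammar, symbol: str, max_depth: int, prod_depth, rng: random.Random) -> Tree:
    """Randomly expand `symbol`, choosing uniformly among the productions whose
    minimum depth fits within the remaining depth budget."""
    if symbol not in g:
        # Terminal symbol; R becomes a random constant (range is an assumption).
        return f"{rng.uniform(-10.0, 10.0):.2f}" if symbol == "R" else symbol
    allowed = [i for i in range(len(g[symbol])) if prod_depth[(symbol, i)] <= max_depth]
    if not allowed:
        raise ValueError(f"cannot derive {symbol!r} within depth {max_depth}")
    rhs = g[symbol][rng.choice(allowed)]
    return (symbol, [generate(g, s, max_depth - 1, prod_depth, rng) for s in rhs])


def tree_depth(t: Tree) -> int:
    return 0 if isinstance(t, str) else 1 + max(tree_depth(c) for c in t[1])


if __name__ == "__main__":
    labels, prod = min_depths(G_MATH)
    print(labels)  # {'S': 2, 'M': 1}
    print(generate(G_MATH, "S", 4, prod, random.Random(3)))
```

Because every child nonterminal is guaranteed a production that fits in the reduced budget, generation never gets stuck once the start symbol itself fits.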

3.3. Genetic Operators

This section describes how a derivation tree is modified from one generation to the next. Two search operators are used to modify the evolving rainfall-runoff model. The crossover operator is used as the standard search operator in each generation, mixing elements of potentially useful partial solutions in an attempt to build a better solution. A hill-climbing mutation is used as a final operator (after a fixed number of generations); it allows the random constants that have been used to seed the population to be modified in an attempt to move towards a more optimal solution. These two operators are now described.

3.3.1. Crossover

Crossover takes two (parent) individuals and creates two (offspring) individuals. Crossover is defined by two parameters: the probability of crossover occurring and some nonterminal B ∈ N where crossover will be attempted. Given two derivation trees d1 and d2, the steps involved are as follows (see Figure 2).
• Randomly select a nonterminal site B from d1 and a site B from d2.
• Swap the derivation trees below B, thereby creating two new derivation trees.
• Insert these trees into the next-generation population.
If d1 or d2 does not contain the nonterminal B, then no crossover is possible and the operation is aborted.

Figure 2. Crossover between derivation trees.

The benefit of using derivation trees to represent the population now becomes clear: because crossover swaps subtrees only at the same nonterminal, the language defined by the grammar is guaranteed to be maintained.
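The crossover steps above translate into a short routine over derivation trees. This is a sketch, not the CFG-GP source: trees are (nonterminal, children) tuples, the crossover nonterminal B is passed as `target`, and `derives()` re-checks that both offspring are still derivations of G_math. All names are hypothetical.

```python
import random
from typing import List, Tuple, Union

Tree = Union[str, Tuple[str, list]]  # a terminal string, or (nonterminal, children)
G_MATH = {
    "S": [["M"]],
    "M": [["+", "M", "M"], ["-", "M", "M"], ["*", "M", "M"], ["/", "M", "M"], ["R"]],
}
OPERATORS = {"+", "-", "*", "/"}


def sites(tree: Tree, target: str, path: tuple = ()) -> List[tuple]:
    """Paths (tuples of child indices) to every node labelled with `target`."""
    if isinstance(tree, str):
        return []
    symbol, children = tree
    found = [path] if symbol == target else []
    for i, child in enumerate(children):
        found.extend(sites(child, target, path + (i,)))
    return found


def get(tree: Tree, path: tuple) -> Tree:
    for i in path:
        tree = tree[1][i]
    return tree


def put(tree: Tree, path: tuple, sub: Tree) -> Tree:
    """Return a copy of `tree` with the node at `path` replaced by `sub`."""
    if not path:
        return sub
    symbol, children = tree
    children = list(children)  # copy, so the parent trees are never mutated
    children[path[0]] = put(children[path[0]], path[1:], sub)
    return (symbol, children)


def crossover(d1: Tree, d2: Tree, target: str, rng: random.Random):
    s1, s2 = sites(d1, target), sites(d2, target)
    if not s1 or not s2:
        return d1, d2  # B is missing from a parent: crossover is aborted
    p1, p2 = rng.choice(s1), rng.choice(s2)
    return put(d1, p1, get(d2, p2)), put(d2, p2, get(d1, p1))


def derives(tree: Tree, g=G_MATH) -> bool:
    """Check that each nonterminal node is expanded by a production of g."""
    if isinstance(tree, str):
        return True
    symbol, children = tree
    rhs = [c[0] if isinstance(c, tuple) else (c if c in OPERATORS else "R") for c in children]
    return rhs in g[symbol] and all(derives(c, g) for c in children)


def size(tree: Tree) -> int:
    return 1 if isinstance(tree, str) else 1 + sum(size(c) for c in tree[1])


T1 = ("S", [("M", ["+", ("M", ["5.2"]), ("M", ["*", ("M", ["2.12"]), ("M", ["73.89"])])])])
T2 = ("S", [("M", ["-", ("M", ["1.0"]), ("M", ["3.5"])])])

if __name__ == "__main__":
    child1, child2 = crossover(T1, T2, "M", random.Random(5))
    print(child1)
    print(child2)
```

Because a subtree rooted at B can always replace another subtree rooted at B, every offspring remains in L(G) without any repair step, which is the closure property described above.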

3.3.2. Hill climbing mutation for real numbers

The rainfall-runoff grammar (see Section 6) defines mathematical expressions which are initially seeded with variables related to the process under study and random real numbers. These real numbers are used as constants throughout the evolution and are combined into the partial solutions that are evolved. A final solution that uses one or more of these constants may be improved (based on the training data) by slightly modifying these constants and then reevaluating the new solution to see if it performs better. The hill-climbing mutation works by applying small random changes to the constants and keeping the new solution only if it improves the performance on the training data. The small change to a constant is produced by creating a random number between -1 and +1, which is added to the constant. This is performed on each constant in the equation that is being mutated. Although it has been shown that different distributions for random mutations perform better on some problems [8], the purpose here was only to show that the approach produces results comparable with the currently accepted approaches for rainfall-runoff modelling. This mutation is applied a fixed number of times and only affects solutions which contain random constants. This operation may be considered a fine tuning of the constants in the evolved solution.
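The hill-climbing mutation described above can be sketched as follows, assuming that an evolved expression is represented as a Python function of a list of constants and an input value. Every constant receives a uniform perturbation in [-1, +1] at once, and the candidate is kept only if the training RMSE falls. The two-constant model and the synthetic data are hypothetical and only exercise the routine.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float], float], float]


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def hill_climb_constants(model: Model, constants: Sequence[float],
                         xs: Sequence[float], ys: Sequence[float],
                         iterations: int, rng: random.Random) -> Tuple[List[float], float]:
    """Perturb every constant by U(-1, +1) and keep the candidate only if the
    training RMSE improves; repeat a fixed number of times."""
    best = list(constants)
    best_err = rmse([model(best, x) for x in xs], ys)
    for _ in range(iterations):
        candidate = [c + rng.uniform(-1.0, 1.0) for c in best]
        err = rmse([model(candidate, x) for x in xs], ys)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


def linear_model(c: Sequence[float], x: float) -> float:
    """Hypothetical evolved expression with two constants: c0 + c1 * x."""
    return c[0] + c[1] * x


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [2.0 + 3.0 * x for x in xs]  # synthetic training data
    print(hill_climb_constants(linear_model, [0.0, 0.0], xs, ys, 500, random.Random(7)))
```

Because a candidate is accepted only when it lowers the training error, the routine can never make a solution worse on the training data, although it may stall in a local optimum.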

4. LANGUAGE AND SEARCH BIAS


The language L(G) is shaped by the grammar G. Since this grammar is declaratively defined,
the language bias for the learning system is clearly stated and easily changed without changing
underlying functions of the system. This promotes the exploration of new language constructions
which become apparent during the evolution of partial solutions. For example, a combination of
terms that consistently appear in partial solutions may be explicitly stated as part of the
grammar. Thus, an incremental approach to developing a solution may be performed in a declarative
manner.
An additional language bias may be introduced when defining the grammar. Each production is labelled with a weighting factor, which biases the probability of that production being selected during the generation of the initial population. This allows background knowledge about which terminal symbols are likely to be most useful in developing a solution to be incorporated.
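Weighted production selection, as described above, amounts to sampling a right-hand side in proportion to its weighting factor. A minimal Python sketch using `random.Random.choices` follows. The alternatives shown are a subset of the EQU productions of G_exp (Section 6), and the weights are hypothetical, since the paper does not list any. In a full generator this choice would also be restricted to productions that fit the depth limit.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical weighted alternatives for EQU; the weights are illustrative only.
EQU_ALTERNATIVES: List[Tuple[Tuple[str, ...], float]] = [
    (("+", "EQU", "EQU"), 1.0),
    (("*", "EQU", "EQU"), 1.0),
    (("r0",), 4.0),
    (("av10",), 2.0),
    (("R",), 2.0),
]


def pick_production(alternatives: Sequence[Tuple[Tuple[str, ...], float]],
                    rng: random.Random) -> Tuple[str, ...]:
    """Choose a right-hand side with probability proportional to its weight."""
    rhs_options = [rhs for rhs, _ in alternatives]
    weights = [w for _, w in alternatives]
    return rng.choices(rhs_options, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(pick_production(EQU_ALTERNATIVES, rng) for _ in range(10_000))
    print(counts.most_common(2))
```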
Search bias is introduced by allowing the crossover and mutation sites (i.e., the nonterminals) to
be individually specified. For a particular setup, a number of crossover and mutation operations
may be defined for different nonterminals with different probabilities of occurrence. The ability
to specify which nonterminals are used for these operators allows a declarative specification of
where the majority of the search for better solutions will be performed. It also promotes the
exploration of the search space in an interactive sense by the user.
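One way to make this search bias explicit is a table of operators, each tied to a single nonterminal and a probability of being applied. The sketch below is hypothetical: the OperatorSpec class, the probabilities, and the rule that any left-over probability means the selected individual is copied unchanged are assumptions for illustration, not the configuration used in this paper.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatorSpec:
    kind: str           # "crossover" or "mutation"
    nonterminal: str    # the only kind of node this operator may act on
    probability: float  # chance that this operator is applied to a selection


# Hypothetical setup: most of the search effort is directed at EQU subtrees.
SETUP = [
    OperatorSpec("crossover", "EQU", 0.80),
    OperatorSpec("crossover", "EXPN", 0.10),
    OperatorSpec("mutation", "EQU", 0.05),
]


def choose_operator(setup: Sequence[OperatorSpec], rng: random.Random) -> Optional[OperatorSpec]:
    """Pick one operator according to its probability; None means the selected
    individual is copied unchanged (the left-over probability mass)."""
    total = sum(spec.probability for spec in setup)
    if total > 1.0 + 1e-12:
        raise ValueError("operator probabilities must not sum to more than 1")
    r = rng.random()
    cumulative = 0.0
    for spec in setup:
        cumulative += spec.probability
        if r < cumulative:
            return spec
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(3):
        print(choose_operator(SETUP, rng))
```

Keeping the table as data means the search bias can be changed by editing the setup rather than the search code, which is the declarative style argued for above.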
This paper will not pursue the goal of achieving the best possible solution. Rather, the paper is intended to demonstrate the appropriateness of this technique for rainfall-runoff modelling, recognising the potential for further discovery and improvements. The system described in this paper has been previously defined by Whigham [9,10] and is entitled CFG-GP.

5. CATCHMENT DESCRIPTIONS
In order to test the modelling approaches, two very different catchments were chosen. The first catchment was the Teifi catchment at Glan Teifi in Wales. The second catchment was located within the Namoi River catchment in northeastern New South Wales, Australia. The Teifi catchment is a rural catchment draining 893.6 km² with an average annual rainfall of 1368 mm. This station was maintained and operated by the UK Environment Agency, and the data can be obtained from the Institute of Hydrology. Compared with the Namoi catchment, the number of rain days per annum is much greater at Teifi, but the maximum daily rainfall is only about half the value for the Namoi. The other major difference is that runoff percentages are very much higher at Teifi. For the calibration period, the runoff percentage was 66.7%, and for the simulation period it was 74.95%.
The Namoi River catchment was chosen to be as different as possible from the Teifi catchment. Using the Department of Land and Water Conservation (the gauging authority) naming convention, the catchment is referred to as 419030 or the Manilla River at Barraba (30°23'24" S and 150°37'08" E). This catchment is approximately 568 km² and drains the southern part of the Nandewar Range. Within the catchment, there are three reliable long-term raingauge stations with average annual rainfalls of 686 mm, 704 mm, and 727 mm. These stations have a reasonable spread of location and altitude, and the average of the three values has been used as the catchment rainfall. For very large rainfall events (greater than 100 mm), there was a strong relationship between rainfall and runoff, but as the size of the event decreased, the relationship between rainfall and runoff became more random. The rainfall in this part of the country is strongly summer dominated, which influenced the selection of the calibration and simulation periods. The calibration run was done from 13 November 1965 to 10 March 1966, and the simulation run was done from 4 November 1966 to 13 March 1967. In spite of this catchment having a comparatively high rainfall (by Australian standards at least) and our selection of the high-rainfall months, the runoff percentage was only 6.12% over the calibration period and 8.24% over the simulation period.
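The runoff percentages quoted above compare the runoff depth leaving a catchment with the rainfall depth falling on it. A sketch of that bookkeeping follows, assuming daily mean flows in cumecs (m³/s) and catchment-average daily rainfall in mm; one day at 1 m³/s over 1 km² corresponds to 86.4 mm. The function names are hypothetical and this is not the authors' processing code. The usage lines average the three Barraba gauge values to show how a catchment rainfall figure is formed.

```python
from statistics import mean
from typing import Sequence

SECONDS_PER_DAY = 86_400


def daily_flow_to_mm(q_cumecs: float, area_km2: float) -> float:
    """Runoff depth (mm) produced by one day at mean flow q (m^3/s) over the catchment."""
    volume_m3 = q_cumecs * SECONDS_PER_DAY
    return volume_m3 / (area_km2 * 1e6) * 1000.0  # metres of depth -> mm


def runoff_percentage(flows_cumecs: Sequence[float], rain_mm: Sequence[float],
                      area_km2: float) -> float:
    """Total runoff depth as a percentage of total catchment rainfall over the same days."""
    runoff_mm = sum(daily_flow_to_mm(q, area_km2) for q in flows_cumecs)
    return 100.0 * runoff_mm / sum(rain_mm)


if __name__ == "__main__":
    # Catchment rainfall as the mean of the three Barraba gauges (annual averages).
    print(round(mean([686, 704, 727]), 1))  # 705.7
```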

6. CFG-GP SETUP
The grammar G_exp used by CFG-GP to develop the rainfall-runoff models allowed general mathematical functions to be evolved, and was defined as follows.

G_exp = (N, Σ, P, S), where
    N = {EQU, NL, EXPN},
    Σ = {+, -, *, /, exp, r0, r1, r2, r3, r4, r5, av5, av10, av15,
         av20, av25, av30, av40, av50, av60, av100, R},
    P = {
        S    → + EQU NL
        NL   → * EQU EXPN
        EXPN → exp EQU
        EQU  → + EQU EQU | - EQU EQU
        EQU  → * EQU EQU | / EQU EQU
        EQU  → exp EQU
        EQU  → r0 | r1 | r2 | r3 | r4 | r5
        EQU  → av5 | av10 | av15 | av20 | av25
        EQU  → av30 | av40 | av50 | av60 | av100
        EQU  → R
    }

The terminal symbols r0, r1, ..., r5 represent the rainfall for the current day and for each of the previous five days. The av5, av10, ..., av100 terminals are the average rainfall for the last 5, 10, ..., 100 days, respectively. The terminal R is a random floating-point number between -10.0 and 10.0, which is generated for each occurrence of R when the initial population is created. These random constants are potentially modified using the hill-climbing mutation once a solution has been evolved. The exponential function is represented by the terminal string exp. The grammar has a structural bias to form equations that are composed of a linear component and a nonlinear (exponential) component. This is shown by the production S → + EQU NL, which forces all programs to have the minimal structure A + B * exp(C), where, in the smallest case, A, B, and C are each a climate variable or a random real number. The production EQU → exp EQU allows the exponential function to be included within any part of the evolved mathematical expression. The language bias only forces the use of the exponential function at least once.
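The climate terminals of G_exp can be computed from a daily rainfall series as below. Two interpretations are assumptions of this sketch rather than statements from the paper: rk is the rainfall k days before the current day, and avN is the mean over the N days ending on, and including, the current day. The helper `minimal_form` simply evaluates the A + B * exp(C) skeleton shared by every program derived from S → + EQU NL. All names are hypothetical.

```python
import math
from typing import Dict, Sequence

AVERAGE_WINDOWS = (5, 10, 15, 20, 25, 30, 40, 50, 60, 100)


def terminal_values(rain: Sequence[float], t: int) -> Dict[str, float]:
    """Values of the G_exp climate terminals on day t of a daily rainfall series.
    Assumed: rk is the rainfall k days before day t, and avN is the mean over
    the N days ending on (and including) day t."""
    if t < max(AVERAGE_WINDOWS) - 1:
        raise ValueError("day t needs at least 100 days of rainfall history")
    values = {f"r{k}": float(rain[t - k]) for k in range(6)}
    for n in AVERAGE_WINDOWS:
        values[f"av{n}"] = sum(rain[t - n + 1: t + 1]) / n
    return values


def minimal_form(a: float, b: float, c: float) -> float:
    """The skeleton every G_exp program starts from: A + B * exp(C)."""
    return a + b * math.exp(c)


if __name__ == "__main__":
    rain = [float(d % 7) for d in range(365)]  # synthetic daily series
    v = terminal_values(rain, 200)
    print(v["r0"], v["av5"], v["av100"])
```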
Table 1 shows the CFG-GP setup parameters which were used to develop both catchment models. The crossover operator is applied only to the nonterminal EXP, with a probability of 90%. Hence, approximately 10% of the population is passed unchanged into each subsequent generation. This ensures that good solutions are not prematurely removed from the population and that useful building blocks are maintained. The fitness measure used to evaluate the performance of each program during calibration was the root mean square error (RMSE). This approach treats each point, irrespective of magnitude, as equally important. Further work of interest would be to study the effects of different fitness metrics based on various criteria. A likely outcome of this work would be a classification of catchments based on which type of metric best evolves an overall rainfall-runoff relationship.

Table 1. CFG-GP parameter settings for evolving a simple rainfall-runoff model.

    CFG-GP Parameter    Setting
    Population Size     1000
    Generations         50
    Fitness Measure     Minimise RMSE

The CFG-GP system used the same calibration data as IHACRES to evolve the rainfall-runoff models. Comparison of results will only refer to the simulation run, which uses previously unseen data of a similar length. The predictive performance of each model is measured by the RMSE for the simulation period and the error in predicted total discharge for the simulation period.
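The two predictive measures can be written as below. This is a sketch with hypothetical function names; the usage lines plug in the total discharges reported in Section 7 and reproduce the quoted percentage errors to within rounding (the Teifi CFG-GP value computes as about 6.5%, against the 6.4% quoted).

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error, weighting every day equally."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def discharge_error_pct(predicted_total: float, observed_total: float) -> float:
    """Absolute error in total discharge as a percentage of the observed total."""
    return 100.0 * abs(predicted_total - observed_total) / observed_total


if __name__ == "__main__":
    # (label, observed total, predicted total) as quoted in Section 7.
    for label, obs, pred in [("Teifi, IHACRES", 35_600, 34_776),
                             ("Teifi, CFG-GP", 35_600, 33_295),
                             ("Barraba, IHACRES", 187, 330),
                             ("Barraba, CFG-GP", 187, 99),
                             ("Barraba, CFG-GP (1000 days)", 187, 157)]:
        print(f"{label}: {discharge_error_pct(pred, obs):.1f}%")
```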

7. RESULTS

7.1. The Glan Teifi Catchment

The measured rainfall and streamflow for the Teifi catchment between 1979 and 1982 are shown in Figure 3 (rainfall events are shown in black). The simulated streamflows determined by IHACRES and CFG-GP are shown in Figures 4 and 6. The actual error for each model is shown in Figures 5 and 7. This error represents the difference between predicted and actual runoff. A visual comparison with the measured streamflow indicates that both approaches have captured the basic response of the catchment; however, IHACRES appears to have better represented the extreme streamflow events. The RMSE for IHACRES was 15.4 and for CFG-GP was 15.7. The total discharge for the simulation period was measured as 35,600 cumecs, with IHACRES predicting 34,776 cumecs (2.3% error) and the evolved CFG-GP solution predicting 33,295 cumecs (6.4% error). Note that the total discharge was not used as part of the fitness measure for CFG-GP. The CFG-GP expression for the Teifi catchment was defined as follows:
    +(+(r1, +(av40, *(av10, av100))),
      *(*(av5, +(-17.121983, *(av5, av40))), exp(-3.739896)))              (2)
This may be simplified to give

$$\mathrm{r1} + \mathrm{av40} + (\mathrm{av10} \times \mathrm{av100}) + \mathrm{av5}\,(-17.121983 + \mathrm{av5} \times \mathrm{av40}) \times 0.0237. \tag{3}$$

Equation (3) shows that the catchment was influenced by antecedent conditions that could extend for several months into the past (the av100 variable represents the average rainfall for the last 100 days). Note also that the exponential expression in equation (2), exp(-3.739896) ≈ 0.0237, is a constant, so the resultant equation for runoff has no exponential dependence on the climate variables: apart from the product terms av10 × av100 and av5² × av40, it is a linear function of r1 (the previous day's rain), av5, av10, av40, and av100.

Figure 3. Measured rainfall and runoff at Glan Teifi.

Figure 4. IHACRES modelled runoff at Glan Teifi.

Figure 5. IHACRES error (predicted - actual) for the simulation period at Glan Teifi.

Figure 6. CFG-GP modelled runoff at Glan Teifi.

Figure 7. CFG-GP error (predicted - actual) for the simulation period at Glan Teifi.

7.2. The Manilla River at Barraba, N.S.W.

The measured rainfall and subsequent streamflow for the simulation period in the Barraba catchment are shown in Figure 8. As can be seen from these data, for large events there is a strong relationship between rainfall and runoff. For smaller events, however, there is not a significant relationship between rainfall and runoff. The simulated streamflows determined by the two approaches are shown in Figures 9 and 11. The associated errors for each model are shown in Figures 10 and 12. The RMSE for the IHACRES approach was 13.1 and for CFG-GP was 5.7. The total discharge for the simulation period was measured as 187 cumecs, with IHACRES predicting 330 cumecs (76% error) and the evolved CFG-GP solution predicting 99 cumecs (47% error). For the purposes of our comparison (based on RMSE), these results are similar. The evolved expression found by CFG-GP was

    +(/(/(/(/(r0, -1.911790), -0.474622), -4.164400),
        -(/(-1.542888, 5.944119), -(av10, r0))),
      *(r0, exp(/(0.251564, av10))))                                   (4)

which may be simplified to

$$\frac{\mathrm{r0}/3.78}{0.26 + \mathrm{av10} - \mathrm{r0}} + \mathrm{r0}\,\exp\!\left(\frac{0.252}{\mathrm{av10}}\right). \tag{5}$$
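The simplification from (4) to (5) can be checked numerically by transcribing (4) operator by operator and comparing it with the simplified form. In this sketch (names hypothetical) the folded constants are kept unrounded: K = 1.911790 × 0.474622 × 4.164400 ≈ 3.78 and C = 1.542888 / 5.944119 ≈ 0.26.

```python
import math

K = 1.911790 * 0.474622 * 4.164400  # about 3.78
C = 1.542888 / 5.944119             # about 0.26


def eq4(r0: float, av10: float) -> float:
    """Equation (4), transcribed operator by operator from its prefix form."""
    a = ((r0 / -1.911790) / -0.474622) / -4.164400
    b = (-1.542888 / 5.944119) - (av10 - r0)
    return a / b + r0 * math.exp(0.251564 / av10)


def eq5(r0: float, av10: float) -> float:
    """Simplified form (5), with the constants left unrounded."""
    return (r0 / K) / (C + av10 - r0) + r0 * math.exp(0.251564 / av10)


if __name__ == "__main__":
    print(round(K, 2), round(C, 2))  # 3.78 0.26
```

The three negative divisors in (4) multiply to a single negative constant, which cancels against the negated denominator; that is why (5) has a positive leading fraction.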


Figure 8. Measured rainfall and runoff at Barraba.

Figure 9. IHACRES modelled runoff at Barraba.

Figure 10. IHACRES error (predicted - actual) for the simulation period at Barraba.

Figure 11. CFG-GP modelled runoff at Barraba.

Figure 12. CFG-GP error (predicted - actual) for the simulation period at Barraba.

It is worth noting that equation (5) uses the current day's rainfall (r0) and the average of the last ten days' rainfall (av10). Additionally, (5) has the nonlinear term exp(/(0.251564, av10)), which is a function of av10. A comparison of equations (3) and (5) shows that the two catchments have been modelled in very different ways. The Welsh catchment has been modelled using long-term averages in a linear combination, whereas the Australian catchment has been modelled using short averaging periods and the current day in a nonlinear fashion. This suggests that the underlying processes driving the water movement through the two catchments are quite different.
When an attempt was made to calibrate over different consecutive seasons for the Barraba data, the IHACRES model was not able to find coefficients to suit all seasons and therefore could not converge. This accounts for the short calibration and simulation periods of only four months that have been used for testing these models. However, the CFG-GP approach, because it makes no assumption about underlying relationships, could be calibrated over successive seasons and could therefore use more information about the catchment response to rainfall. When CFG-GP was calibrated using this longer period (1000 days), the resultant model achieved significantly better results on the original simulation data set (RMSE = 3.1). Additionally, the predicted total discharge changed to 157 cumecs (16% error), which is superior to either previous solution. The response of this modelled streamflow is shown in Figure 13, with the associated error between predicted and actual runoff shown in Figure 14.

Figure 13. Rainfall and CFG-GP modelled runoff (1000 days calibration).

Figure 14. CFG-GP error (predicted - actual) using 1000 days calibration for Barraba.

The evolved expression was

    +(exp(+(/(exp(-4.874963), -0.796608),
            /(/(r0, -1.864706), +(-3.018028, -4.418388)))),
      *(-3.240420, exp(-1.181253)))                                      (6)

which may be simplified to give

$$\exp\left(-0.0096 + \frac{\mathrm{r0}}{13.87}\right) - 0.994. \tag{7}$$

The interesting comparison between equations (5) and (7) is that using the larger data set for calibration (training) resulted in a solution that was a nonlinear function solely of r0, which represents the current day's measured rainfall. No average rainfall value was found to be useful. This implies that the Barraba catchment has a very quick response between rainfall and runoff, and no significant seasonal signal.
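The simplification from (6) to (7) can be checked numerically by transcribing (6) operator by operator and folding its constants. In this sketch (names hypothetical) A = exp(-4.874963) / -0.796608 ≈ -0.0096, K = 1.864706 × (3.018028 + 4.418388) ≈ 13.87 and B = -3.240420 × exp(-1.181253) ≈ -0.994.

```python
import math

A = math.exp(-4.874963) / -0.796608   # about -0.0096
K = 1.864706 * (3.018028 + 4.418388)  # about 13.87
B = -3.240420 * math.exp(-1.181253)   # about -0.994


def eq6(r0: float) -> float:
    """Equation (6), transcribed operator by operator from its prefix form."""
    inner = math.exp(-4.874963) / -0.796608 + (r0 / -1.864706) / (-3.018028 + -4.418388)
    return math.exp(inner) + -3.240420 * math.exp(-1.181253)


def eq7(r0: float) -> float:
    """Simplified form (7), with the constants left unrounded."""
    return math.exp(A + r0 / K) + B


if __name__ == "__main__":
    print(round(A, 4), round(K, 2), round(B, 3))  # -0.0096 13.87 -0.994
```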

8. DISCUSSION
The previous examples of applying CFG-GP to modelling rainfall-runoff have been encouraging. The use of a simple, nonlinear mathematical grammar has allowed the system to produce
equations that capture some measure of the underlying response of the catchment. The linear,
strongly seasonal model evolved for the Teifi catchment has a natural interpretation with the
underlying climatic and topographic characteristics of this catchment. The nonlinear, weakly
seasonal model evolved for the Namoi catchment also corresponds with the perceived behaviour
of this landscape.
The grammar G_exp had only a weak bias towards forming certain types of mathematical
expressions. Future work will involve extending the set of useful mathematical functions (such as
a power function between variables) and exploring other language forms which may have more
direct interpretation with natural processes.

9. CONCLUSION
In the present work, we have compared the results obtained with a deterministic lumped
parameter model, based on the unit hydrograph approach, with those obtained using a stochastic
machine learning model.
For the Welsh catchment, the results of the two models were similar. Since rainfall and runoff were highly correlated, the deterministic assumption underlying the IHACRES model was satisfied. Therefore, IHACRES could achieve a satisfactory correlation between calibration and simulation data. It is also interesting to note that for this catchment the runoff ratio was approximately 70%, which suggests that a relationship does indeed exist between the rainfall and runoff. The CFG-GP approach does not require any assumed causal relationship, yet it achieved similar results.
The behaviour of the studied Australian catchment is very different from that of the Welsh catchment. The runoff ratio was very low (7%), and hence the a priori assumptions of IHACRES (and other deterministic models) were a poor representation of the real world. This was demonstrated by the inability of IHACRES to use more than one season's data for calibration, and by its being able to use data only from a high-rainfall period. Since the CFG-GP approach did not make any assumptions about the underlying physical processes, calibration periods spanning more than one season could be used. This led to significantly improved generalisation of the modelled behaviour of the catchment.
In summary, either approach worked satisfactorily when rainfall and runoff were correlated. However, when this correlation was poor, CFG-GP had some advantages because it did not assume any underlying relationships. This is particularly important when modelling environmental problems, where the relationships are typically nonlinear and are often measured at a scale which does not match conceptual or deterministic modelling assumptions.

REFERENCES
1. L.J. Fogel, A.J. Owens and M.J. Walsh, Artificial Intelligence through Simulated Evolution, John Wiley & Sons, (1966).
2. J.H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, (1975); Second edition, (1992).
3. D. Lenat, The role of heuristics in learning by discovery, In Machine Learning: An Artificial Intelligence Approach, Chapter 9, (Edited by R.S. Michalski, J. Carbonell and T. Mitchell), Springer-Verlag, (1984).
4. A.J. Jakeman, I.G. Littlewood and P.G. Whitehead, Computation of the instantaneous unit hydrograph and identifiable component flows with application to two small upland catchments, Journal of Hydrology, 117, 275-300, (1990).
5. I.G. Littlewood and A.J. Jakeman, A new method of rainfall-runoff modelling and its applications in catchment hydrology, In Environmental Modelling, Volume II, (Edited by P. Zannetti), pp. 143-171, Computational Mechanics, Southampton, UK, (1994).
6. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, A Bradford Book, MIT Press, (1992).

7. B. Mulloy, R. Riolo and R. Savit, Dynamics of genetic programming and chaotic time series prediction, In Genetic Programming: Proceedings of the First Annual Conference, (Edited by J. Koza), MIT, Cambridge, MA, (1996).
8. X. Yao and Y. Liu, An analysis of evolutionary algorithms based on neighbourhood and step size, Lecture Notes in Computer Science, Volume 1213, pp. 297-307, Springer-Verlag, Berlin, (1997).
9. P.A. Whigham, Inductive bias and genetic programming, In First International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications, pp. 461-466, IEE, UK, (1995).
10. P.A. Whigham, Grammatical bias for evolutionary learning, Ph.D. Thesis, School of Computer Science, University College, University of New South Wales, Australia, (1996).
