
VALIDATION OF AIR COMBAT MODELS

Mrs L A Shakir
Project Manager - Air Combat Development
Centre for Defence Analysis (Air)
DRA Farnborough
Hampshire

Abstract


Although validation exercises of a variety of types have been carried out over a number of years,
validation has recently become a more prominent
issue in combat modelling. This paper argues that air
combat models differ in nature from stable physical
models, and that the type of validation tests
applicable to stable physical models will not give
meaningful answers when applied in the same way to
an air combat model. The definition and processes of
validation therefore need to be widened to cope with
the variability in the possible outcome of air combat,
and a discussion of a variety of types of validation,
and their applicability to air combat follows.
CDA(Air) are at an early stage of developing a
structured approach to validation.


Validation of physical models


The most frequent definition of model validation is to show a good match between the output of the model and that part of the real world which it models. For
example, consider a model of a physical system, such
as a missile. If that missile is one which actually
exists, then it is possible to carry out spot checks on
the model by comparing its output with the results of
an actual firing. If there is good agreement between
the performance predicted by the model, and that
which occurred in reality, then this generates
confidence that the model is an accurate
representation of reality. There is little doubt that this
process is a validation exercise. It should be noted,
however, that this exercise does not constitute proof
that the model is accurate. Firstly, it is unlikely to
have represented every small nuance of the behaviour
of the real missile, whose behaviour will be subject to
random noise effects. Even if random noise is
represented in the model, its randomness will have
manifested itself differently in reality. How close,
therefore, is close enough to reality? Secondly, and more importantly, the fact that the accuracy of the missile has been tested under a single set of conditions does not prove that it is accurate under any other condition; it simply generates confidence that it does. There is still an act of faith to be made. Indeed, if it were possible to prove absolutely that a model represents reality in all cases in which it is used, it would not be necessary to write the model at all, since any results it generated would only tell you of an already known reality.

Conversely, however, if it were found that the model did not represent reality in a single case, then the invalidity of the model would have been proven. It is therefore necessary for a model to pass validation tests, but even if it does, validity in all cases cannot be guaranteed. This argument places greater importance on the fact that the validation process takes place, rather than on the statement that the model is valid, since that is an impossible assertion. The best we can do is to state that a model has not been proven to be invalid even though extensive testing has been applied.

Air combat modelling


A missile system, as described in the example above,
is a relatively well behaved system. Firstly, it is rare
for a random event occurring during the course of a
missile's flight to cause a major deviation from the
normal flight path of the missile. For example, a
freak gust of wind might cause the missile to deviate
slightly from its normal path, but the missile's own
control system will tend to bring it back on course.
Secondly, a missile is a physical system, and
physical systems obey rules which are known,
relatively simple, and are invoked in the same way on
every missile firing.
Air combat as a whole (the missile forms a part of it) behaves in a very different way. Firstly, random events can cause radical differences in the outcome of a combat. For example, in 2 v 2 combat, if an early missile is successful, then the combat becomes 2 v 1 and the side with the early success is at a great advantage. However, if that missile fails, then the second shot may well be fired by the other side. If that is successful, then the situation is reversed. If it is not successful either, then there may well be a hard-fought battle. So, even with the simple randomness of the success of a missile warhead, differences in outcome can range from complete success for one side (i.e. both aircraft on own side survive, both opposition die) to complete success for the other side, through any outcome in between, including the death of all participants, or the survival of all participants. Missile probability of kill is only one of many random events which can occur during air combat.
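The breadth of outcomes that a single random event can produce is easy to illustrate with a toy simulation. The sketch below is purely illustrative and is not drawn from any CDA(Air) model: the alternating shot order, the cap on shots fired and the assumed kill probability are all invented for the example. Even with warhead success as the only random element, replications of the same 2 v 2 start conditions span everything from complete success for one side to complete success for the other.

    import random
    from collections import Counter

    # Purely illustrative toy model: the only random event is whether each
    # missile warhead kills its target (P_KILL is an invented figure).
    P_KILL = 0.6
    REPLICATIONS = 10_000

    def run_engagement(rng):
        """Return (blue survivors, red survivors) for one 2 v 2 engagement."""
        blue, red = 2, 2
        blue_is_shooting = True          # sides alternate shots
        for _ in range(8):               # cap on the number of shots fired
            if blue == 0 or red == 0:
                break
            if rng.random() < P_KILL:    # random warhead success
                if blue_is_shooting:
                    red -= 1
                else:
                    blue -= 1
            blue_is_shooting = not blue_is_shooting
        return blue, red

    rng = random.Random(1)
    outcomes = Counter(run_engagement(rng) for _ in range(REPLICATIONS))
    for (blue, red), count in sorted(outcomes.items()):
        print(f"blue survivors {blue}, red survivors {red}: {count / REPLICATIONS:.1%}")

Any single real engagement is one draw from a distribution of this kind, which is why a one-off mismatch between model output and reality proves little on its own.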



Secondly, there is the variability due to the presence of the human. Air combat, unlike the model of the
physical system, is run by humans. The decisions
made by those humans are widely variable. Men vary
in their levels of training and skill, and the decisions
made by two different human beings under the same
circumstances may be very different. Indeed, the
decisions made by the same human under the same
conditions at different times may differ. This is a very
considerable effect in combat: good tactics can win
battles despite inferior equipment. One particular
issue pertinent to validation is that a human may
behave very differently under conditions of war than
the way in which he would in a trial.


These two sources of variability mean that, for a single set of combat start conditions, there are many
possible outcomes which vary widely from each
other. Any real-world data is therefore only one of
the many possible outcomes. If the results of the
model differ from the real-world data, it is not clear
from the results alone whether the difference is due to
an error in the modelling or whether it is simply a
different possible outcome taken from the wide range
of possible outcomes, caused either by differences in
the outcomes of random events, or by differences in
the human decisions made.
Air combat models are run with many replications (as are missile models) to try to ensure that the variability in outcome due to random events is represented with statistical veracity. Where the human decisions are made by computer rather than by the man, attempts are made to balance the tactics such that their quality is the same on each side of the combat. Where the human decisions are made by men, some bias can be removed by exchanging the sides of the participating pilots half-way through the set of runs. In order properly to validate the complete model against ground truth it would be necessary to collect data from a large number (typically hundreds) of real battles, all occurring under the same start conditions, and all happening for the first time to ensure that no learning effects blur the issue. Clearly impossible!

What is validation?

There is difficulty, then, in establishing a ground truth for air combat in the way that a missile test firing supplies a ground truth (albeit in limited circumstances) for a missile model. Given the wide range of possible outcomes for a combat, it is actually unlikely that a single run of an air combat model matches a real-world combat. But the lack of matching does not necessarily mean that the model is wrong.

This does not mean, however, that validation of an air combat model should not be attempted. Indeed, it is useful to consider the question more widely, extending it to: what tests can be applied to this model that can generate confidence that the model is giving the information required of it?

To do that, it is necessary to consider not simply the requirement for the model to represent air combat, but also the use to which that model is put. The normal type of usage for an air combat model is to assess and evaluate the outcome of air combat under differing conditions of equipment and scenario. Most frequently, this is carried out with the aim of discovering which of a number of designs for a future aircraft system or sub-system would be most suitable for meeting the aims of that system or sub-system, i.e. comparing the effectiveness of different weapon systems in combat. In order to do this effectively, it is not sufficient simply to carry out a single batch of runs for each piece of equipment being compared. It is necessary also to know how the results vary as changes in the assumptions are made, e.g. are the results sensitive to small differences in the start conditions? If some of the data are uncertain, how do the results change as those data move across their spectrum of possibilities? The investigation of such issues is called sensitivity analysis, and is an important part of the service provided by a model.
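A sensitivity analysis of the kind described above follows a simple pattern: repeat the batch of runs while one uncertain input is swept across its plausible range, and watch how the output measure responds. The sketch below is illustrative only; the toy engagement function and the assumed kill probabilities stand in for a real combat model and its uncertain data.

    import random

    def blue_win_rate(p_kill, runs=2000, seed=1):
        """Fraction of replications in which blue ends with more survivors than red
        (toy engagement invented for illustration)."""
        rng = random.Random(seed)
        wins = 0
        for _ in range(runs):
            blue, red = 2, 2
            blue_is_shooting = True
            for _ in range(8):
                if blue == 0 or red == 0:
                    break
                if rng.random() < p_kill:
                    if blue_is_shooting:
                        red -= 1
                    else:
                        blue -= 1
                blue_is_shooting = not blue_is_shooting
            wins += blue > red
        return wins / runs

    # Sweep the uncertain input across its plausible range.
    for p in (0.3, 0.4, 0.5, 0.6, 0.7):
        print(f"assumed kill probability {p:.1f}: blue win rate {blue_win_rate(p):.2f}")

If the ranking of the options being compared changes as the assumption moves, the uncertainty matters; if it does not, the comparison is robust to it.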

With the usage of the model in mind, it is reasonable to reconsider validation issues anew for each study,
asking whether the model is able to deliver the type
of information required, with the required degree of
accuracy, for that study.
So, how can the efficacy of a model for such a
purpose be judged? The Centre for Defence Analysis
(CDA) owns a large number of models which represent a wide variety of types of battle and war. Many of these models are used in effectiveness
studies. When the CDA was set up, a review was
made of these models. Some thought was given to the
question of validation and the methods which are
currently thought to contribute to the validation of
models were listed. MASS consultants also did some
work to describe the validation status of a particular
air combat model. The work stressed that it is not
always reasonable or practical to apply every method
to every model, and that the approach should be to
apply the most appropriate validation tests to that
model with its current usage in mind, rather than to
try to apply all the tests and declare the model valid
for any future purpose. Validation is therefore an
ongoing process.

Types of validation

Validation methods can be divided into two major types: those which validate the components of a model, and those which validate the whole model. Air combat models consist of a number of subsystems: aircraft, sensors, missiles, ECM, etc. It is possible to validate all these components by methods discussed below. The tactics, being the human element, are rather more difficult to validate. However, the real test is the validation of the whole model. Whilst the whole model will not be valid unless its subsystems are valid, it cannot be assumed that, because every subsystem has been validated, the total model is valid. Methods of holistic validation are, therefore, very important.

Validation methods have been classified as follows:

1. Model component validation:
   a. Validation against historical data
   b. Validation against trials data
   c. Validation against the manufacturer's design specification, following checking by an independent expert, and
   d. Comparison with other models.

2. Holistic validation:
   a. Validation against historical data
   b. Validation against exercise data
   c. Comparison with other models, and
   d. Expert scrutiny.

These methods, their application to air combat, and the extent to which they have already been applied to air combat models, are discussed in the following sections.

Validation of components
The validation of the physical subsystems (aircraft,
missiles, sensors, etc.) of the combat models tends
not to be a contentious issue where the equipment is
current, because it can be carried out against real-world data. More difficulties are presented where the
equipment has not yet been built, since real-world
data, by definition, does not exist. This is where
validation against the best data possible (normally the
manufacturer's detailed model representing the
design, following checking by an independent expert)
becomes appropriate.
The biggest difficulty is the validation of the
contribution of the man: the skills and tactics. There
is no shortage of real-world men; however, there is
enormous variability in the decisions made by
different men, or even by the same man at different
times. A wide variety of possible tactics fall within
the bounds of realism, therefore, and the problems of
this variability, applied to the whole model in section
2, also apply to the validation of the component
man. Also, where a physical system behaves the same way in real combat as it would in a trial, a man may behave very differently. Therefore, data deriving from real combat are the most valuable record of real, valid men against which to compare. Whilst the difficulties raised in section 2 apply because of this variability, it is still possible to compare particular aspects of the behaviour of the man in the model with
that of the man in combat. For example, what
manoeuvres are taken by a target who has a missile
locked to it? At what range does an aircraft use its
jammers? etc. Such aspects could be compared, and,
if appropriate, made to agree. This is an approach to
validation, whereby a mismatch is not necessarily
taken to be an error, but a reason to investigate more
deeply into the source of the mismatch, leading either
to the discovery of errors, or a deeper understanding
of the problem.
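One way to make such comparisons concrete is to compare the distribution of a single behavioural quantity in the model with the same quantity taken from combat records. The sketch below is illustrative: the jammer-activation ranges are invented placeholder samples, and the two-sample Kolmogorov-Smirnov test is just one convenient way of asking whether the two samples could plausibly share a distribution.

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(1)
    # Invented placeholder samples of the range (km) at which jammers are switched on.
    model_ranges = rng.normal(loc=28.0, scale=4.0, size=500)    # from model runs
    combat_ranges = rng.normal(loc=25.0, scale=6.0, size=60)    # from combat records

    statistic, p_value = ks_2samp(model_ranges, combat_ranges)
    print(f"KS statistic {statistic:.2f}, p-value {p_value:.3f}")
    if p_value < 0.05:
        print("Distributions differ: investigate the source of the mismatch.")
    else:
        print("No evidence of a mismatch for this aspect of behaviour.")

Consistent with the approach described above, a significant difference is treated as a prompt to dig into its cause, not as immediate proof that the model is wrong.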


Validation against historical data


Whilst the more obvious application for historical
data is in the holistic validation of a model, its
inclusion under the heading validation of
components derives from consideration of some
historical data which has recently become available.
Initially, thought was given to the use of the data for
holistic validation of models, but for reasons
described under holistic validation, (mainly the
incomplete nature of the data), it did not prove
possible. However, there are areas where the data can
be used for the validation of components.




Validation of the equipment models is one possible area. While it could be argued that the performance of aircraft, radar, missiles, etc. is better validated by examining data from trials and exercises, where telemetry is available, it is quite possible that the equipment is used in different ways in actual combat, and that there may be real data, for example, on missile firings which take place near the edge of the firing zone, an area not normally investigated in firing trials.

Another area is the validation of the behaviour of the man in combat. This is an important consideration: although real men are often used in combat modelling, both in exercises (which are themselves a form of combat model) and in man-in-the-loop trials, both these aspects of modelling are free from the element of fear, which one would suppose to be an important factor in guiding the behaviour of a man in battle. Only in real battle can the behaviour of men be observed in the presence of genuine fear. It may be possible to measure such parameters as the point in the zone where missiles are fired, or the manoeuvres carried out following a missile firing, or the reaction of a pilot when he finds that an enemy radar is locked to him. This could then be compared with both the models which represent pilots digitally and man-in-the-loop models. In the case of man-in-the-loop models, it might prove difficult, if the test failed, to change the behaviour of the men such that they become more like men in real combat, but there are two possible methods of dealing with this. Firstly, it may be possible to change the trial conditions such that the behaviour of the men becomes more like that of men in real combat; or secondly, it may be possible to find out what distinguishes the behaviour of the men in a trial from the behaviour of men in combat, and factor this into the results.

Validation against trials data

It is rare that subsystem trials, such as missile flight trials or radar trials, are used directly to validate subsystem models in an air combat model. This is more frequently carried out indirectly, by validating a detailed subsystem model against the trial, and then validating the simpler subsystem model within the air combat model against the detailed model. In general, validation of the detailed subsystem model against reality involves the comparison of many facets of the behaviour of the subsystem, e.g. accelerations, turn rates, etc., whereas validation of the simpler model against the more detailed model involves comparisons of larger scale outputs, such as launch success zones in the case of a missile model.

Validation against the manufacturer's design specification

This method more frequently applies to the input data
to a model than the model itself, but it is reasonable
to think of the model as being the algorithms together
with the data that drive it. Often, the only data
available from which to model a future system are
either the specification data, or those supplied by the
manufacturer to meet the specification. If the
specification data are used, there is no guarantee that
the system is actually feasible in practice. However,
the data supplied by the manufacturer may also be
overly optimistic in its predictions.


Since the data supplied by the manufacturer is often the only data available for a future system, it is often
necessary to use it in some form. Rather than use the
raw data, it is preferable to first subject it to scrutiny
by independent experts. This will not guarantee that
the data are good, but it will increase the probability
of their being so.
Such an issue arose in air combat modelling with
reference to data describing a particular sensor. Data
had been supplied by a manufacturer to the DRA
department who study that sensor, so the DRA
department were, in that instance, able to act as the
independent scrutineer of the data. The DRA
department felt that the data were overly optimistic in
that they were only applicable under ideal conditions.
Since the conditions under which the sensor would be
used would normally be far from ideal, this was not
felt to be realistic data. However, the DRA
department did not feel able to predict what the
performance would be under normal conditions.
Since the raw data were so good that their effect would
swamp other differences between the concepts whose
effectiveness were to be evaluated, this presented a
considerable dilemma: should the data be used, given that we knew that the equipment would carry that sensor, or should we refrain from using it, given that we knew that the data were likely to be in error, and that the effect of that error would be large? In neither case would reality be represented, since if the data were present, then dubious data would be affecting the results severely, and if the data were absent, then the model would not be representing equipment which would be present in the concept when it was made. The study did not focus upon sensors in
particular: if it had, the study could not have
continued in the face of such uncertainties about the
data. Ultimately, it was decided that the data should
be replaced by data on an alternative sensor, about
which there was more certainty. This, whilst not the
expected reality, at least represented a possible
reality, and since it applied equally to all the concepts
being compared, at least allowed the relative, if not the absolute, performance of the concepts to be compared. An excursion was carried out with the doubtful data, to give an indication of the improvement that could be gained from it, under ideal conditions.





Comparison with other models


In the case of validation of the components of a model, comparison is normally made with models
which have themselves been validated against trials
data. There have been a number of such instances of
validation in air combat modelling.
For example, a study was carried out in which it was
necessary to compare the effect of varying the sensor
carried by the aircraft on combat outcome. This made
it necessary for concepts to be compared which were
closer to each other than in most studies. Fine
discrimination was required from the model, and it
was therefore necessary to conduct rigorous checks to
ensure that the model represented the sensors in
question accurately. The sensors in question were in
existence, though only one was in service. They were
compared with the detailed design models owned by
the manufacturers, which had themselves been
compared with test trials. It was found that the air
combat model, in general, detected later than the
detailed model. The first reason found for this was
that the air combat model formed a track if 3
detections of the target had been made in 5 radar
scans. The detailed model formed a track if 2
detections had been made in 4 scans, leading to
earlier track formation than the air combat model in
most instances. However, there was still some
difference when this had been taken into account. It
was found that the air combat model was affected by
track association problems due to there being more
than one target. This was not present in the detailed
model, but would be present in reality. Also, the
detailed model was able to make more than one
detection in one radar scan, and associate them to
make a track, which the air combat model could not
do. Thus the exercise gave confidence in the radar
model within the air combat model, and at the same time gave useful insights into air combat.
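The effect of the track-formation criterion is easy to reproduce in isolation. The sketch below treats detection on each radar scan as an independent Bernoulli trial with an invented per-scan detection probability, and records the scan at which an 'M detections in N scans' rule first forms a track; it is an illustration of the mechanism only, not a model of either radar discussed above.

    import random

    def scan_of_first_track(p_detect, m, n, rng, max_scans=60):
        """First scan at which at least m of the last n scans held a detection."""
        history = []
        for scan in range(1, max_scans + 1):
            history.append(rng.random() < p_detect)
            if sum(history[-n:]) >= m:
                return scan
        return None

    rng = random.Random(2)
    runs = 5000
    for m, n in ((2, 4), (3, 5)):
        first = [scan_of_first_track(0.5, m, n, rng) for _ in range(runs)]
        formed = [s for s in first if s is not None]
        mean_scan = sum(formed) / len(formed)
        print(f"{m} of {n}: mean scan of first track {mean_scan:.1f} "
              f"({len(formed) / runs:.0%} of runs formed a track)")

As expected, the 2-of-4 rule forms tracks earlier on average than the 3-of-5 rule, which is the behaviour that had to be allowed for before the residual differences between the models could be examined.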


Holistic validation

Validation against historical data

This is the form of validation which immediately comes to mind when considering validation, and has, therefore, been covered, in philosophy, in section 2. However, notwithstanding the difficulties involved in carrying it out, attempts to validate against real-world data are of high value, since a real battle is ultimately what is being represented. This section concentrates upon a recent opportunity to do so.

Historical data has recently become available from a variety of real air combats and it was suggested that this data should be used to validate air combat models. The immediate thought was of holistic validation, which raised two problems before the data was even examined: firstly, as already noted in section 2, the battles actually fought are only one outcome out of the realm of possible outcomes, and a lack of consistency between the model's output and the outcome of the real battle does not guarantee the invalidity of the model. Secondly, the data is historical, and therefore the equipment used is out of date. DERA's air combat models are normally used for predicting outcomes for future systems, and whilst validation of the model against past systems demonstrates that the structure of the model is sound, it does not validate the whole model in its normal usage.

A possible approach is to run a batch of model runs in the historical scenario, and if the results of the real combat fall within them, this would give some confidence that the model is able to model past scenarios accurately. Also, the differences between the real-world data and the model output could be investigated to try to find out whether the differences are due to different random chance, different human decisions, or an error in the model. This exercise increases the probability of detecting an error in the model, should there be one, and therefore increases confidence in the lack of errors if one is not found.

Then the data itself was inspected, and it was found that records were sparse, and in particular, one-sided. There is little data on the opposition, their disposition, or what the effect of the battle was upon them. Since the scenario set-up is unknown, it is impossible to set up a batch of runs in the same scenario, to see if the actual results fall within it. Therefore, holistic validation of the model, using this data in particular, became impossible.

It is possible, however, to use the data for validation of the model components, where equipment may be used in unusual circumstances. It is also a rare opportunity to validate the representation of the man in combat, be that representation a man in a trial, or a digital simulation of a man. This cannot be the absolute type of validation applicable to physical models, but the type applicable to models with variable outputs, where a mismatch provokes further investigation until either a match is found or the mismatch is explained.

Validation against exercise data


This is a promising source of holistic validation, overcoming some of the problems of validation
against historical data. The equipment tends to be
current rather than past, and the data is far more
complete than that gained in real combat. The
problem that the outcome of the exercise is a single
outcome out of all the possible outcomes remains,
however. Nevertheless, it is still possible to conduct a large sample of runs to see if the actual outcome falls within their spread. This has rarely been done in practice, so it
is a promising method for improving the validation
status of the models.
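A simple way to carry out that check is to ask where the single observed outcome sits within the spread of outcomes from a large batch of model runs. The sketch below is illustrative only: the batch of model outcomes and the observed exercise outcome are invented placeholders, and the 95% band is an arbitrary choice of interval.

    import numpy as np

    rng = np.random.default_rng(3)
    # Invented placeholders: blue aircraft lost per run in 2000 model replications,
    # and the losses observed in the single real exercise.
    model_outcomes = rng.binomial(n=4, p=0.45, size=2000)
    observed = 3

    low, high = np.percentile(model_outcomes, [2.5, 97.5])
    percentile = (model_outcomes <= observed).mean() * 100
    print(f"model 95% band [{low:.0f}, {high:.0f}]; observed {observed} "
          f"sits at the {percentile:.0f}th percentile of model runs")
    if not (low <= observed <= high):
        print("Observed outcome lies outside the model spread: investigate further.")

Falling inside the band does not prove the model right, and falling outside it does not prove it wrong; either way the comparison directs attention to where investigation is most useful.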


Comparison with other models


It was mentioned in the discussion of component-level validation that comparisons are normally made
against models which have themselves been validated
against trials data. For holistic validation, this is
rarely the case, since the difficulties of holistic
validation will also apply to the model against which
the model in question is being compared.
Nevertheless, it is still possible to compare the
outcomes generated by two models, and if the
outcomes agree, then there is no reason to disbelieve
the models. If the two models disagree, then there
may be an error in either model, or they may simply
be representing different possible truths. For
example, a set of runs carried out using a man-in-the-loop model will tend to yield a wider distribution of results in a batch than a model where the man is
digital. This is because the man-in-the-loop model
will more closely represent the possible outcomes
that could occur in reality, whereas a digital model
will tend to operate with a standard man in order to
isolate the effects due to the equipment. This is an
instance where the purpose of the study should be
noted, and the correct model chosen appropriately: is
the study to predict the outcome of a combat, or to
compare equipment? Such a comparison is being
carried out at the time of writing, in support of a
study to compare equipment.


However, if the two models disagree, the user is forced to examine the models closely. If this causes
errors to be detected which would not otherwise have
been detected, this is good. If, instead, it yields
greater understanding of the processes of combat and
the nature of the models, then this is also good.
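In practice such a model-to-model comparison reduces to putting the two batches of outcomes side by side and looking at both their location and their spread. The sketch below uses invented placeholder batches standing in for a man-in-the-loop run set and a digital-pilot run set; agreement in the middle of the distributions with a wider spread from the man-in-the-loop batch is the pattern described above, whereas a shift in location would be the cue to examine both models.

    import numpy as np

    rng = np.random.default_rng(4)
    # Invented placeholder batches of a single outcome measure (e.g. exchange ratio).
    man_in_loop = rng.normal(loc=1.1, scale=0.6, size=80)      # man-in-the-loop runs
    digital_pilot = rng.normal(loc=1.0, scale=0.25, size=400)  # digital-pilot runs

    for name, batch in (("man-in-the-loop", man_in_loop), ("digital pilot", digital_pilot)):
        q25, q75 = np.percentile(batch, [25, 75])
        print(f"{name:16s} median {np.median(batch):.2f}, inter-quartile range {q75 - q25:.2f}")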
Expert scrutiny

Expert scrutiny of model outcome invariably occurs, since the results are normally supplied to experts. The treatment of the scrutiny needs careful thought, however. On the one hand, if the expert is always believed to be correct, then there is no need for modelling at all, since it would be sufficient to use the opinion of the expert. On the other hand, blind acceptance of the model output is clearly a flawed approach.

The preferred solution is for the experts to make their estimates before the model runs take place. The estimate should apply to the results of a set of runs rather than to individual runs, i.e. do the sensitivities to changes in the input to the model accord with intuition? If the model agrees with the experts, this does not prove accuracy (no validation tests do), but it generates confidence in the model. If the model and the expert disagree, then it is necessary to ask whether the model is wrong or the preconception of the expert. Often, an error in the program is found, which would not have been found had the test not been applied. This can be fixed and the program re-run. However, the interesting cases are those where the intuition of the expert was found to be in error. For example:

It was found that a more accurate radar tracker was less able to follow the track of a weaving target than a less accurate one (experts believing that a more accurate tracker should always give better information). This was found to be because the more accurate tracker followed every weave and had to re-converge after every target manoeuvre. The less accurate tracker failed to notice the weave, and tracked the general direction of the target's travel better.

Another example is the issue of missiles with and without mid-course updates. Missiles which do not acquire their targets until some time after launch either have to guide themselves inertially during the time prior to acquisition, or they can be updated by the launch aircraft during this phase. Experts expected that the provision of updated information on the target from the launch aircraft would give the missile an advantage. The first set of runs was carried out with straight-flying targets. Since the inertial guidance of the missiles assumed a straight-flying target, it is clear (with hindsight) that the missiles which are not misled by the sensor errors of the launch aircraft, but simply fly straight to the target inertially, do better. Then, targets which were free to manoeuvre were introduced. In this case, indeed, the missiles which could be updated did do better, in themselves. However, the need for the missile to be updated constrained the flight path of the launch aircraft (provided the missile updates were being supplied by the launch aircraft), leading more often to the launch aircraft being killed. So, the capability to update the missile in flight is not so clear an advantage as it might first appear.

This type of instance confirms the value of modelling, and the attempt to validate.

Conclusions

Two types of validation have been identified: one where the reality is predictable in its behaviour, and
the other where there could be wide variability in
possible real outcomes.
Validation of the physical subsystems within air
combat models falls into the first category.
Comparisons of model output with data deriving
from history, trials, specifications or other models
will result in either a match between the data and the
model output, or a mismatch, in which case it will
have been proven that the model, as it stands, is
invalid. If invalidity is proven, then it will usually be
possible to modify the model to remedy this.
The holistic validation of the model, and the validation of the man component of the model, fall
into the second category. There is such a wide range
of possible valid outcomes that invalidity is not
proven by a mismatch between the output of the
model and the validation data, since they could be
different versions within the scope of the possible.
The important point, in this case, is that the mismatch
should provoke further investigation, leading either to
correction of the model, or to greater understanding
of the processes, both of air combat and of its
modelling.

