1

RISK-BASED RELIABILITY ANALYSIS:
A POWERFUL ALTERNATIVE TO THE
TRADITIONAL RELIABILITY ANALYSIS

Since removing a failure mode at the design stage is considerably cheaper
than removing it at the manufacturing stage or during service, it
is important that reliability is integrated early into the design of complex
systems. Accordingly, for a long time, the conventional reliability analysis
has been oriented towards maximising the reliability of a system.
In order to achieve their goal, however, designers must also be able to
reveal the losses from failures. This creates the possibility to quickly filter
out inappropriate design solutions associated with large losses from failures
and select solutions associated with minimum losses.
More importantly, on the basis of a simple counterexample, we can
demonstrate that selecting the more reliable system does not necessarily
mean selecting the system with the smaller losses from failures.
Indeed, consider two very simple systems consisting of only two
components, logically arranged in series (Fig. 1.1).

[Figure 1.1: (a) components A1 and B1 in series, with $f_{A1} = 1$,
$C_{A1} = 2000$, $f_{B1} = 9$, $C_{B1} = 100$; (b) components A2 and B2 in
series, with $f_{A2} = 3$, $C_{A2} = 2000$, $f_{B2} = 2$, $C_{B2} = 100$.]

Figure 1.1 Systems composed of two components only, demonstrating that the more
reliable system is not necessarily associated with the smaller losses from failures.

For the first system (Fig. 1.1(a)), suppose that component A1 fails on
average once a year ($f_{A1} = 1$) and its failure is associated with
$C_{A1} = 2000$ units of losses, while component B1 fails on average
$f_{B1} = 9$ times a year and its failure is associated with $C_{B1} = 100$
units of losses. Suppose now that for an alternative system consisting of
the same type of components A and B (Fig. 1.1(b)), the losses associated
with failure of the separate components are the same but the failure
frequencies are different. Component A2 is now characterised by an average
of $f_{A2} = 3$ failures per year and component B2 by an average of
$f_{B2} = 2$ failures per year. Clearly, the second system is more reliable
than the first system because it is characterised by 5 failures per year on
average, as opposed to the first system, which is characterised by 10
failures per year on average. Since the first system fails whenever either
component A1 or component B1 fails, the expected (average) losses from
failures $L_1$ for the system are

$$L_1 = f_{A1} C_{A1} + f_{B1} C_{B1} = 1 \times 2000 + 9 \times 100 = 2900 \qquad (1.1)$$

while for the second system, the expected losses from failures $L_2$ are

$$L_2 = f_{A2} C_{A2} + f_{B2} C_{B2} = 3 \times 2000 + 2 \times 100 = 6200 \qquad (1.2)$$

As can be verified, the more reliable system (the second system) is
associated with the larger losses from failures!
This simple example shows that a selection of a system solely based
on its reliability can be misleading, even if all components in the system
are characterised by constant failure rates and are arranged in series. If
all system failures are associated with the same cost, a system with larger
reliability does mean a system with smaller losses from failures. In the
common case of system failures associated with different costs, however,
a system with larger reliability does not necessarily mean a system with
smaller losses from failures.
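
The counterexample can be checked numerically. The following is a minimal
Python sketch, not code from the book: the function names and the
representation of a system as a list of (failures per year, loss per failure)
pairs are assumptions made for illustration; the numbers are those of
Fig. 1.1 and equations (1.1) and (1.2).

```python
# Each series system is a list of (failures per year, loss per failure) pairs.
# This data layout is an illustrative assumption, not the book's notation.

def failures_per_year(components):
    """Average number of system failures per year for a series arrangement."""
    return sum(freq for freq, _ in components)


def expected_annual_loss(components):
    """Expected losses per year: sum of frequency times loss per failure."""
    return sum(freq * loss for freq, loss in components)


system_1 = [(1, 2000), (9, 100)]  # components A1 and B1, Fig. 1.1(a)
system_2 = [(3, 2000), (2, 100)]  # components A2 and B2, Fig. 1.1(b)

for name, system in (("system 1", system_1), ("system 2", system_2)):
    print(name, failures_per_year(system), expected_annual_loss(system))
# system 1 10 2900
# system 2 5 6200
```

The second system fails half as often, yet its expected losses are more than
twice as large, because its failures are concentrated in the component with
the costly failure.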
For many production systems, the losses from failures are extremely
high. They can be expressed in number of fatalities, lost production time,
volume of lost production, mass of released harmful chemicals into the
environment, lost customers, warranty payments, costs of mobilisation of
emergency resources, insurance costs, etc. For oil and gas production
systems, for example, major components of the losses from failures are the
amount of lost production, which is directly related to the amount of lost
production time, the cost of mobilisation of resources and intervention, and
the cost of repair/replacement. A critical failure in a deep-water oil and
gas production system, in particular, entails long downtimes and extremely
high costs of lost production and intervention for repair. Furthermore, such
failures can have disastrous environmental and health consequences. The
need to measure all types of operational risk is crucial to revealing the
magnitude of the existing risk and implementing appropriate risk management
procedures. Even if the individual critical failures are associated with
relatively small losses, in the long run, particularly if such failures occur
with high frequency, the amount of the total accumulated loss can be very
large. As a result, reliability analyses related to production systems must
necessarily be based on the risk of failure.
Traditionally, the losses from failures have been accounted for by the
average production availability (the ratio of the actual production to the
maximum production capacity). As we shall demonstrate, two systems
with the same production availability can be characterised by very different
losses from failures.
The next counterexample clarifies this point. Figure 1.2 features two
identical systems with the same operating life cycle of $T$ years, which
experience exactly one failure each, associated with the same downtime for
repair $t_d$. For the first system, however, the failure occurs towards the
end of life, in year $k_1$ (Fig. 1.2(a)), while for the second system the
failure occurs at the beginning of life, in year $k_2$ (Fig. 1.2(b)).
Assuming a uniform production profile which does not vary during the life
cycle of the systems, both systems will be characterised by the same
availability $A_1 = A_2 = (T - t_d)/T$.
The component of the cost of lost production $C$ which is proportional to
the downtime will also be the same for both systems.

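[Figure 1.2: uptime/downtime timelines of two systems, each with a single
downtime of duration $t_d$; (a) system with availability $A_1$ and present
value of losses $PV_1$; (b) system with availability $A_2$ and present value
of losses $PV_2$; $A_1 = A_2$, $PV_1 < PV_2$.]

Figure 1.2 Two systems with the same availability and different present values of the losses
from failure ($PV_1 < PV_2$).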

Because the failures occur at different times, however, the
financial impacts of the lost production will be different for the two systems.
Indeed, $PV_1 = C/(1 + r)^{k_1}$ is the present value of the lost production for
the first system and $PV_2 = C/(1 + r)^{k_2}$ is the present value of the lost
production for the second system. For example, substituting in these formulae
a discount rate $r = 7.5\%$, $k_1 = 25$ and $k_2 = 2$ yields

$$\frac{PV_2}{PV_1} = \frac{(1 + r)^{k_1}}{(1 + r)^{k_2}} \approx 5.28$$

Because of the different times of failure occurrence, although the
availabilities $A_1$ and $A_2$ are the same for the two systems, the second system
(Fig. 1.2(b)) is characterised by losses more than 5 times bigger than those
of the first system (Fig. 1.2(a)).
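
The ratio above can be reproduced with a few lines of Python. This is a
minimal sketch under stated assumptions: the function name `present_value`
and the unit cost `C = 1.0` are illustrative choices (the ratio does not
depend on $C$); the discount rate and the failure years are those of the
example.

```python
def present_value(cost, rate, year):
    """Discount a cost incurred in the given year back to the present."""
    return cost / (1.0 + rate) ** year


r = 0.075        # discount rate of 7.5%
k1, k2 = 25, 2   # years in which the first and second system fail
C = 1.0          # hypothetical cost of lost production; cancels in the ratio

pv1 = present_value(C, r, k1)  # failure late in life, Fig. 1.2(a)
pv2 = present_value(C, r, k2)  # failure early in life, Fig. 1.2(b)
ratio = pv2 / pv1              # equals (1 + r) ** (k1 - k2)

print(round(ratio, 2))  # 5.28
```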
Thus, although the availability does reflect the cost of lost production
which is proportional to the system downtime, it does not account for the
dependence on the time of failure occurrence. Furthermore, availability
does not account for the cost of intervention, which for deep-water oil and
gas production, for example, can be significant in relation to the cost of lost
production. Relying solely on the availability level does not reveal the real
losses from failures.
Apart from estimating the losses from failures, engineers also need to
specify reliability requirements regarding components and blocks in the
designed systems. None of the popular reliability allocation strategies,
however, such as the ARINC or the AGREE methods (Ebeling, 1997), is
capable of allocating reliability requirements for the components which
deliver the minimum losses from failures for the system. Most of the existing
methods focus on manipulating the hazard rates of the components
so that a particular target system hazard rate is attained. Thus, for a system
with $M$ components arranged in series, with a required system reliability
level $R_{sys}$, reliabilities $R_i = (R_{sys})^{1/M}$ are determined for each
component. In this way, the requirement
$R_{sys} = R_1 \times R_2 \times \cdots \times R_M$ is indeed fulfilled
but an important circumstance is neglected. Component failures are usually
associated with different losses. Consequently, components associated with
large losses from failures should be designed to higher reliability levels.
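
The equal allocation $R_i = (R_{sys})^{1/M}$ can be illustrated with a short
Python sketch. The function name `equal_allocation` and the target values
(a required system reliability of 0.90 and three components) are
hypothetical, chosen only to show the arithmetic.

```python
import math


def equal_allocation(r_sys, m):
    """Reliability assigned to each of m series components: r_sys ** (1/m)."""
    return r_sys ** (1.0 / m)


r_sys = 0.90  # hypothetical required system reliability
m = 3         # hypothetical number of components in series

r_i = equal_allocation(r_sys, m)

# The product of the m equal component reliabilities recovers the target.
print(math.isclose(r_i ** m, r_sys))  # True
```

Every component receives the same target regardless of the losses associated
with its failure, which is exactly the circumstance that this allocation
neglects.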
Since 1977, there has also been a significant number of articles and
books (Tillman et al., 1985; Xu et al., 1990; Kuo and Prasad, 2000;
Elegbede et al., 2003; Wattanapongsakorn and Levitan, 2004) related to
reliability optimisation involving costs. Most of the methods described in
these sources, however, are either related to maximising the reliability of a
system given an overall budget (a maximum total cost of resources towards
the reliability maximisation) or minimising the total cost of resources
necessary to achieve a specified level of system reliability. For embedded
systems, Wattanapongsakorn and Levitan (2004), for example, presented
models for maximising reliability while meeting cost constraints and also for
minimising system cost under various reliability constraints. The reliability
optimisation involving cost minimisation in Elegbede et al. (2003), for
example, was also restricted to maximising the reliability at a minimum cost
of the components building the system. These models are not models for
risk-based reliability allocation because they do not incorporate the losses
associated with system failure. Instead, it is expected that once reliability
is maximised, the losses from failures will automatically be minimised,
which, as we demonstrated earlier, is not necessarily true. Maximising the
reliability of a system does not necessarily guarantee smaller losses from
failures. This conclusion, which does not conform to the current
understanding and practice, shows that the risk-based reliability analysis
requires a new generation of models and algorithms based on the losses from
system failures.
There is also work related to reliability optimisation based on fuzzy
techniques, dealing with the cost of the system and the costs of the
separate components (Ravi et al., 2000). The optimal redundancy allocation,
however, is again oriented towards maximising the system reliability by
minimising the system cost, not minimising the losses associated with
system failures.
In Pham (2003), for a parallel system consisting of n components, the
optimal sub-system size was determined that minimises the average system
cost. The average system cost included the cost of the components and the
cost of system failure. For parallel-series systems, the optimal sub-system
size was determined that maximises the average system profit.
Optimum reliability minimising the sum of the cost of failure and the cost
of reliability has been discussed by Hecht (2004) who acknowledged that
the total user cost has a minimum and the failure probability at which the
minimum is reached represents the optimum reliability in economic terms.
Often, alternative design solutions are compared and one of them
selected. As we demonstrate later, a sound scientific basis for such a
selection is the distribution of the potential losses from failures associated
with the competing solutions, which requires reliability analysis based on the
losses from failures.
A fundamental, scientifically sound criterion for assessing competing
production architectures is their net present value, after estimating the
income stream (inflow) and expenditure stream (outflow) (Wright, 1973;
Mepham, 1980; Vose, 2000; Arnold, 2005). The correct estimation of
the losses from failures is at the heart of a correct determination of the
expenditure stream. All critical failures associated with losses, such as lost
production, intervention and repair costs, must be tracked throughout the
design life of the system and their financial impact assessed. The interaction
between the different components of the losses from failures, however, is
not well understood.
Currently, sound theoretical models for risk-based reliability analysis
involving the losses from failures are difficult to find and this was the major
reason which prompted writing this book.
Until recently, one of the main obstacles to developing the theoretical
basis of the risk-based reliability analysis, as an alternative to the
traditional reliability analysis, was the absence of appropriate models related
to the losses from failures from multiple failure modes and the uncertainties
associated with the probabilities with which these failure modes are
activated. In order to fill this gap, models based on potential losses from
failures from multiple mutually exclusive failure modes have been developed by
the author (Todinov, 2003, 2006b). The losses were modelled as distribution
mixtures and equations related to their cumulative distribution, their
variance and its exact upper bound were derived. For systems characterised by
a constant hazard rate, a model for determining the optimum hazard rate
of the system at which the minimum of the total cost is attained was
proposed in Todinov (2004a). Recently, models and algorithms have been
developed for determining the expected losses from failures for
non-repairable and repairable systems whose components are logically arranged
in series and for systems with complex topology (Todinov, 2004c, 2006b, c).
These developments form the core of the book.
