
Proceedings of the 2014 Industrial and Systems Engineering Research Conference

Y. Guan and H. Liao, eds.


Optimum Reliability and Maintainability Allocations for
Load-Sharing Continuous Flow System with Buffers
Khaled A. Farouk, Mohammad Younes, M. Nashat Fors
Production Engineering Department
Alexandria University
Alexandria, Egypt
Abstract
Reliability and availability of systems are traditionally provided by different forms of redundancy, such as multiple units in parallel or standby units. However, in oil and gas plants and other continuous-flow, large-scale industrial systems, redundancy is neither structurally nor economically feasible. Instead, multiple units sharing the load, supported by buffers, are commonly used. In these plants, buffers are essential to guarantee a steady output flow when units fail or are under maintenance and repair. In such load-sharing multi-unit systems, the total capacity of the units forming the system should exceed the design capacity of the plant by a specified margin, with the aim of maximizing system availability within the allocated budget. In the proposed work, the problem of optimally allocating reliability and maintainability to such load-sharing, continuous-flow multi-unit systems is considered. The optimum number of units, the capacity distribution among them, and the buffer capacity are treated as decision variables, in addition to other variables describing the performance of the system.
Keywords
System availability and maintainability, imperfect maintenance, buffer allocation.
1. Introduction
Process plants are comprehensive systems that combine several sources of system complexity (load sharing, multi-state behavior, and in-process buffers). Designing such systems leads to a very challenging problem, which can be formulated as: "What is the optimal apportionment of component reliability and maintainability that maximizes availability at the minimum possible cost?"
During the conceptual design phase of a system, a number of choices need to be made, such as the number of units, the required system reliability, unit maintainability, buffers, the performance specifications of the chosen units, and the system attributes required to achieve the necessary performance level [1].
A process plant is a continuous-production, multi-state system consisting of multiple processes. These processes are arranged to produce a specified product according to the production sequence. The failure of any process has an immediate effect on the final production rate; thus, the reliability and maintainability of the intermediate equipment (intermediate processes) affect the resulting production rate. To compensate for such production losses, redundant parallel or standby equipment has traditionally been introduced, irrespective of the cost involved. It is not economically sound to install double-sized equipment capacity to improve production availability, especially for high production capacities. Introducing buffers at various stages of the production line, in place of standby redundant machines, to improve overall production-line reliability has been investigated by Sethia et al. [2]. Moreover, in continuous production systems, halts due to failure or maintenance create imbalances within the production system. Hence, to contain the imbalances caused by repair and maintenance halts, buffers should be introduced between processes [3].
A comprehensive survey of the buffer allocation problem was conducted by Demir et al. [4]. They concluded that buffering allows the equipment (processes) to operate almost independently of each other and helps increase the throughput rate of the system. In reality, however, there are usually floor-space and budget constraints.
Farouk, Younes, and Fors
Generally, the eventual aim is to improve system performance at minimum cost, as in all manufacturing system problems. Therefore, treating the buffer allocation problem as a multi-objective problem is an important research issue [5].
The use of in-process buffers to mitigate the effect of maintenance and repair downtime has been highlighted by many researchers [3–6]. Indeed, allocating an appropriate amount of buffer to meet the continuous-operation constraint, or to reduce the loss caused by preventive maintenance and occasional failures, is necessary for a production system in prolonged operation [6].
The process plant is a repairable system, which makes maintenance a paramount factor affecting equipment availability, not only by introducing downtime but also by affecting the equipment failure rate when maintenance is done imperfectly. As early as 1979, Nakagawa highlighted that the assumption that preventive maintenance returns the equipment to an as-good-as-new condition is often not true [7]. Under this assumption of perfect maintenance, equipment availability is constant with time, implying that the equipment will work forever as long as it is periodically maintained, which is not true. Any repairable equipment in real life needs to be replaced by new equipment after a while, even if it has been periodically maintained. This fact is supported by Cassady et al. [8] in their availability simulation of a single repairable unit under imperfect repair using the Kijima model [9]. The simulation shows that the availability function degrades exponentially with time.
The main contribution of this study is to define a generic formulation of optimal availability allocation that considers the equipment capacity margin and the buffer size under imperfect maintenance. The formulation reflects the real design requirements of the oil and gas industry.
2. Problem Statement
Consider a system of $K$ units (pumps) sharing the supply needed to satisfy a rated demand of $Q$ cubic meters (or barrels) per unit time. The units share the load in proportions $r_k$ such that $\sum_{k=1}^{K} r_k = 1$. A capacity margin $\delta$, where $0 < \delta \le 1$, is provided for each unit of the system in order to guarantee the continuity of system supply even when some or all units are down for maintenance or repair. Buffer storage must be available to receive the excess capacity ($\delta Q$) during the uptime of all or part of the system. In practice, system maintenance and repair are imperfect. Kijima's first virtual-age model [9] is used to take imperfect repair and maintenance into account in the formulation. In the Kijima model, maintenance and repair do not necessarily restore the units to an as-good-as-new condition; they remedy only part of the deterioration. The remaining part is carried as a virtual age with which the unit starts its next operating period. As proposed by Kijima, a factor $\alpha$, where $0 \le \alpha \le 1$, accounts for the imperfection of maintenance and repair: $\alpha = 0$ for perfect repair and maintenance, $\alpha = 1$ for minimal repair, and $0 < \alpha < 1$ for imperfect repair.
The lifetime of any unit can be described as a series of uptime periods $T_{kj}$ ($k = 1, 2, \ldots, K$ units; $j = 1, 2, \ldots, N$ working periods), during which it provides useful work. Each uptime period is followed by a downtime period during which the equipment is inoperative and undergoes corrective maintenance (CM) or preventive maintenance (PM). The time to failure of a unit in any period is a random variable, measured from zero at the start of the uptime period, that follows a Weibull distribution with scale parameter $\eta_k$ and shape parameter $\beta_k$ for unit $k$.
The problem is to determine the optimum values of:
TBM: time between PM actions;
$\delta$: capacity margin to be provided as an increase over the rated capacity of each unit of the system in order to minimize system downtime, i.e., to maximize system availability;
BV: buffer storage volume necessary and sufficient to receive the excess capacity ($\delta Q$) during the system's uptime periods.
Moreover, the following elaborations are provided:
1. An algorithm for evaluating the replacement age of a unit of the system as a function of its maintainability and reliability parameters, subject to a stated minimum unit reliability at the end of its operating life.
2. An algorithm for evaluating the downtime of single-unit and multi-unit systems, i.e., the time during which the system is incapable of satisfying the demand.
Acronyms
BV Buffer Volume
CM Corrective Maintenance
DWNT Downtime
FC Failure Cost
MC Maintenance Cost
PM Preventive Maintenance
TC Total Cost
Notations
BVC Buffer volume cost
C Acquisition cost of one unit
$\mathrm{DWNT}(\tau_i)$ Downtime at the $i$th period of time
DTC Cost incurred by the system downtime
ICC Cost due to the increase in capacity
L Total operation time
MBV Maximum buffer size required
$n_M$ Number of maintenance actions
$n_F$ Number of failure events
Q Rated demand per unit time
QD Production volume per day
q Exponent for capacity increase cost
$R_{\min}$ Minimum allowable reliability by the end of unit operating life
$r_k$ Proportion of total capacity taken by unit $k$
$S_k(j)$ State of the $k$th unit at the $j$th period of time
TBM Time between PM actions
$t_{kj}$ Time elapsed for unit $k$ from the start of period $j$
$\mathrm{TTF}_{kj}$ Time to failure of unit $k$ in operating period $j$
$T_{kj}$ Length of the $j$th uptime period of unit $k$
TM Time necessary to perform PM
TR Time necessary to perform CM
u Uniformly distributed random number on [0, 1]
$Y_{km}$ Cumulative virtual age of unit $k$
UBVC Buffer volume cost per unit volume
UDTC Downtime cost per unit time
UFC CM cost per unit time
UMC PM cost per unit time
$\alpha$ Virtual-age factor
$\beta$ Shape factor of the Weibull distribution
$\delta$ Capacity margin provided to each unit of the system
$\eta$ Scale factor of the Weibull distribution
3. System Description
In the following subsections, we derive the failure probability, the corresponding reliability, and the replacement age. Moreover, an algorithm is given in Section 3.3 to determine the maximum number of working periods dictated by the reliability limit.
3.1 Unit History Timeline
As mentioned above, the lifetime of any unit can be described as a series of uptime periods $T_{kj}$ ($k = 1, 2, \ldots, K$ units; $j = 1, 2, \ldots, N$ working periods), during which it provides useful work. Each uptime period is followed by a downtime period of length TM or TR, depending on whether it is a PM or a CM period. After PM or CM, the unit is not restored to an as-good-as-new condition; instead, it retains an accumulated deterioration known as its virtual age. According to Kijima's first model, the virtual age $Y_{km}$ of the $k$th unit after $(m-1)$ periods of operation, i.e., at the start of the $m$th period, is given by:
$$
Y_{km} =
\begin{cases}
0, & m = 1,\\
\alpha \displaystyle\sum_{j=1}^{m-1} T_{kj}, & m > 1.
\end{cases}
\tag{1}
$$
Here, $T_{kj}$ is the length of the $j$th uptime period, and $Y_{km}$ is the virtual age accumulated over $(m-1)$ operating periods as a result of $(m-1)$ imperfect maintenance and repair actions.
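A minimal Python sketch of Equation (1), assuming the uptime lengths of the completed periods are already known. The function name `virtual_age` is hypothetical and is not part of the original work.

```python
from typing import Sequence


def virtual_age(alpha: float, completed_uptimes: Sequence[float]) -> float:
    """Kijima type-I virtual age Y_km at the start of period m (Equation (1)).

    completed_uptimes holds T_k1, ..., T_k(m-1). For m = 1 it is empty,
    so the sum is zero and the unit starts as new.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sum(completed_uptimes)


# alpha = 0.3 after three uptime periods of 100, 80 and 120 days
print(virtual_age(0.3, [100.0, 80.0, 120.0]))  # -> 90.0
```

With $\alpha = 0$ the virtual age stays at zero (perfect maintenance), and with $\alpha = 1$ it equals the total operating time (minimal repair).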
The Weibull distribution, with shape factor $\beta_k$ and characteristic life (scale) $\eta_k$, is the most common distribution for modeling the time to failure $t_{kj}$ because of its versatility. The length of the uptime period $T_{kj}$ depends on whether it ends with PM or CM and is calculated by Equation (2):
$$
T_{kj} =
\begin{cases}
\mathrm{TTF}_{kj}, & t_{kj} < \mathrm{TBM},\\
\mathrm{TBM}, & t_{kj} \ge \mathrm{TBM}.
\end{cases}
\tag{2}
$$
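Equation (2) reduces to a single comparison. The sketch below (hypothetical helper name `uptime_length`) also returns which maintenance action ends the period, since that label is what the event bookkeeping later needs.

```python
from typing import Tuple


def uptime_length(ttf: float, tbm: float) -> Tuple[float, str]:
    """Length T_kj of an uptime period and the action that ends it (Equation (2)).

    A failure before the scheduled PM time ends the period with corrective
    maintenance; otherwise the period is cut at TBM by preventive maintenance.
    """
    if tbm <= 0.0:
        raise ValueError("TBM must be positive")
    if ttf < tbm:
        return ttf, "CM"
    return tbm, "PM"


print(uptime_length(42.0, 90.0))   # -> (42.0, 'CM')
print(uptime_length(150.0, 90.0))  # -> (90.0, 'PM')
```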
3.2 Failure Probability Density Function
Accounting for virtual age in the Weibull failure probability density function can be expressed as limiting the reliability at the start of the $j$th period, after imperfect maintenance or repair in the previous $(j-1)$ periods, to an upper value of $e^{-(Y_{kj}/\eta_k)^{\beta_k}}$. Therefore, the probability density function of the time to failure $t_{kj}$ over the $j$th period, accounting for the virtual age at the start of the period, can be expressed as follows:
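$$
f_{kj}(t_{kj}) = \frac{\beta_k}{\eta_k}\left(\frac{t_{kj}+Y_{kj}}{\eta_k}\right)^{\beta_k-1}\exp\!\left[-\left(\frac{t_{kj}+Y_{kj}}{\eta_k}\right)^{\beta_k}\right]
\tag{3}
$$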
The reliability in this case can be obtained as follows:
$$
R_{kj}(t_{kj}) = \int_{t_{kj}}^{\infty} f_{kj}(x)\,dx = \exp\!\left[-\left(\frac{t_{kj}+Y_{kj}}{\eta_k}\right)^{\beta_k}\right]
\tag{4}
$$
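The sketch below codes Equations (3) and (4) in Python and checks numerically that integrating the density from $t_{kj}$ onward reproduces the closed-form reliability, and that $R_{kj}(0)$ equals the upper limit $e^{-(Y_{kj}/\eta_k)^{\beta_k}}$. The composite Simpson rule and its truncation of the infinite tail at $t + 10\eta_k$ are our own choices for the check, and the parameter values are illustrative only.

```python
import math


def shifted_weibull_pdf(t: float, y: float, beta: float, eta: float) -> float:
    """Equation (3): Weibull density evaluated at t + Y_kj (needs t + y > 0 if beta < 1)."""
    z = (t + y) / eta
    return (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))


def shifted_weibull_reliability(t: float, y: float, beta: float, eta: float) -> float:
    """Equation (4): R_kj(t) = exp(-((t + Y_kj) / eta) ** beta)."""
    return math.exp(-(((t + y) / eta) ** beta))


def tail_integral(t: float, y: float, beta: float, eta: float,
                  span_factor: float = 10.0, n: int = 20_000) -> float:
    """Composite Simpson approximation of the density integral from t to t + span_factor*eta."""
    upper = t + span_factor * eta
    h = (upper - t) / n  # n must be even for Simpson's rule
    total = shifted_weibull_pdf(t, y, beta, eta) + shifted_weibull_pdf(upper, y, beta, eta)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0  # odd nodes 4, even interior nodes 2
        total += weight * shifted_weibull_pdf(t + i * h, y, beta, eta)
    return total * h / 3.0


# beta = 2, eta = 1000 days, virtual age Y = 200 days
print(shifted_weibull_reliability(0.0, 200.0, 2.0, 1000.0))  # upper limit exp(-0.04)
print(tail_integral(0.0, 200.0, 2.0, 1000.0))                # should agree closely
```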
Equation (4) shows that at the start of the $j$th period ($t_{kj} = 0$), the reliability is at its upper limit, $e^{-(Y_{kj}/\eta_k)^{\beta_k}}$.
When simulating the time to failure of a unit in any period of its operating life by Monte Carlo simulation, this upper limit of the reliability must be taken into account: the sampled time must be conditioned on the unit having survived to its virtual age $Y_{kj}$. If $u$ is a random number uniformly distributed on $[0, 1]$, setting $u = R_{kj}(t_{kj})/R_{kj}(0)$ and solving for $t_{kj}$ gives the corresponding realization of the time to failure over the $j$th period:
$$
t_{kj} = -Y_{kj} + \eta_k\left[\left(\frac{Y_{kj}}{\eta_k}\right)^{\beta_k} + \ln\frac{1}{u}\right]^{1/\beta_k}
\tag{5}
$$
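A minimal inverse-transform sampler for Equation (5), written as a Python sketch with hypothetical names. It draws $u$ from $(0, 1]$ so that $\ln(1/u)$ stays finite, and it checks the draws against the conditional survival $R_{kj}(t)/R_{kj}(0)$ implied by Equation (4). The seed and parameter values are arbitrary.

```python
import math
import random


def sample_ttf(y: float, beta: float, eta: float, rng: random.Random) -> float:
    """Equation (5): time to failure in period j for a unit with virtual age y.

    The draw is conditioned on survival to y, so the sampled time is measured
    from the start of the current period, not from the unit's installation.
    """
    u = 1.0 - rng.random()  # u in (0, 1], keeps ln(1/u) finite
    t = -y + eta * ((y / eta) ** beta + math.log(1.0 / u)) ** (1.0 / beta)
    return max(0.0, t)  # guard against a tiny negative value from rounding when u == 1


def conditional_survival(t: float, y: float, beta: float, eta: float) -> float:
    """R_kj(t) / R_kj(0): probability that a unit with virtual age y survives t more."""
    return math.exp((y / eta) ** beta - ((t + y) / eta) ** beta)


rng = random.Random(2014)
y, beta, eta, t0 = 300.0, 2.5, 1000.0, 400.0
draws = [sample_ttf(y, beta, eta, rng) for _ in range(100_000)]
empirical = sum(d > t0 for d in draws) / len(draws)
print(empirical, conditional_survival(t0, y, beta, eta))  # the two should agree closely
```

With $Y_{kj} = 0$ the sampler reduces to the ordinary Weibull inverse transform $t = \eta_k[\ln(1/u)]^{1/\beta_k}$.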
3.3 Replacement-Age Threshold
The equipment must be replaced by a new unit at the end of period $N$, because its deterioration increases from period to period under imperfect maintenance and repair. $N$ can be determined by stating a minimum allowable equipment reliability at the start of the last period $N$ of its operating life. If this minimum allowable reliability is $R_{\min}$, the allowable cumulative length of the working periods is obtained by substituting the virtual age from Equation (1) into Equation (4) with $t_{kN} = 0$ and requiring $R_{kN}(0) \ge R_{\min}$:

$$
\sum_{j=1}^{N-1} T_{kj} \le \frac{\eta_k}{\alpha}\left[\ln\frac{1}{R_{\min}}\right]^{1/\beta_k}
\tag{6}
$$
Equation (6) shows that as $\alpha \to 0$ (perfect maintenance and repair), the right-hand side grows without bound, i.e., there is no limit on the lifetime of the component.
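Equation (6) can be evaluated directly. The Python sketch below (hypothetical function names) returns the bound on cumulative uptime, treats $\alpha = 0$ as an unbounded lifetime, and expresses the same condition as a test on the current virtual age, $Y_{kj} > \eta_k[\ln(1/R_{\min})]^{1/\beta_k}$, which is the form used in the replacement check of the Figure 1 flowchart.

```python
import math


def virtual_age_limit(beta: float, eta: float, r_min: float) -> float:
    """Largest virtual age Y with exp(-(Y / eta) ** beta) >= r_min."""
    if not 0.0 < r_min < 1.0:
        raise ValueError("r_min must lie strictly between 0 and 1")
    return eta * math.log(1.0 / r_min) ** (1.0 / beta)


def max_cumulative_uptime(alpha: float, beta: float, eta: float, r_min: float) -> float:
    """Right-hand side of Equation (6): bound on sum_{j=1}^{N-1} T_kj."""
    if alpha == 0.0:
        return math.inf  # perfect maintenance: the virtual age never grows
    return virtual_age_limit(beta, eta, r_min) / alpha


def needs_replacement(y: float, beta: float, eta: float, r_min: float) -> bool:
    """True when a period starting at virtual age y would begin below r_min."""
    return y > virtual_age_limit(beta, eta, r_min)


limit = max_cumulative_uptime(alpha=0.5, beta=2.0, eta=1000.0, r_min=0.5)
print(round(limit, 1))  # -> 1665.1
```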
[Flowchart: for each unit k, sample $t_{kj}$ by Equation (5); if $t_{kj} \ge \mathrm{TBM}$, record a PM event, otherwise a CM event; update the virtual age $Y_{kj}$ and replace the unit when $Y_{kj} > \eta_k[\ln(1/R_{\min})]^{1/\beta_k}$; count PM, CM, and replacement events over the total operation time L; then update the buffer level, downtime, and total cost]
Figure 1: Identification of preventive and corrective events along system operating parameters
The algorithm given in Figure 1 is proposed to evaluate the maximum number of working periods dictated by the reliability limit at the start of the last period of a unit's lifetime. The total number $n_M$ of PM actions and the total number $n_F$ of CM actions are also obtained from this algorithm.
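The per-unit part of the algorithm can be sketched in Python as follows. This is a simplified reading of the flowchart under stated assumptions, not the authors' code: the virtual age accumulates uptime only, as in Equation (1); a replaced unit restarts with $Y = 0$; each event is recorded at the moment the unit goes down; and the loop stops at the first event beyond the total operation time L. All names are hypothetical, and the buffer and cost updates of the flowchart are not covered here.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass, field


@dataclass
class UnitHistory:
    events: list[tuple[float, str]] = field(default_factory=list)  # (event time, "PM" or "CM")
    n_pm: int = 0          # this unit's contribution to n_M
    n_cm: int = 0          # this unit's contribution to n_F
    replacements: int = 0


def simulate_unit(alpha, beta, eta, tbm, tm, tr, r_min, horizon, rng):
    """Event history of one unit over the total operation time L (= horizon)."""
    if min(tbm, tm, tr) <= 0.0:
        raise ValueError("TBM, TM and TR must be positive")
    age_limit = eta * math.log(1.0 / r_min) ** (1.0 / beta)  # virtual-age form of Eq. (6)
    hist = UnitHistory()
    clock, y = 0.0, 0.0  # system time and current virtual age Y_kj
    while True:
        u = 1.0 - rng.random()
        ttf = max(0.0, -y + eta * ((y / eta) ** beta + math.log(1.0 / u)) ** (1.0 / beta))  # Eq. (5)
        uptime, kind, stop = (ttf, "CM", tr) if ttf < tbm else (tbm, "PM", tm)  # Eq. (2)
        event_time = clock + uptime
        if event_time >= horizon:
            break
        hist.events.append((event_time, kind))
        if kind == "PM":
            hist.n_pm += 1
        else:
            hist.n_cm += 1
        clock = event_time + stop  # the unit is down for TM or TR
        y += alpha * uptime        # Kijima type-I update, Eq. (1)
        if y > age_limit:          # next period would start below R_min
            y = 0.0
            hist.replacements += 1
    return hist


rng = random.Random(7)
h = simulate_unit(alpha=0.3, beta=2.0, eta=400.0, tbm=180.0, tm=2.0, tr=10.0,
                  r_min=0.6, horizon=5000.0, rng=rng)
print(h.n_pm, h.n_cm, h.replacements)
```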
3.4 Buffer and Capacity Margin
Research in [3–6] discusses the use of in-process buffers to mitigate the effect of maintenance and repair downtime. The buffer cannot be filled without a capacity margin ($\delta$) above the required production rate, as shown in Figure 2.
[Plot: two panels, "Unit production rate with time" and "Buffer volume change with time"; x-axis: Time (days), 0 to 2000; y-axis: QD ($10^3$ gallon), 0 to 20]
Figure 2: Single-unit production rate and buffer volume with time
Figure 2 shows that production availability depends not only on unit reliability and maintainability but also on the available buffer volume and the rate at which the buffer is refilled.
3.4.1 Multi-Unit System (Load-Sharing System)
When one of the units fails in a load-sharing system, the system does not fail, but it no longer operates at full capacity, as shown in Figure 3. The probability that the whole system falls into downtime decreases with the use of multiple units; however, the probability that at least one unit fails increases.
[Plot: production rate and buffer volume versus time; x-axis: Time (days), 0 to 2000; y-axis: QD ($10^3$ gallon), 0 to 20; series: single-unit system production rate, 4-unit system production rate, buffer volume]
Figure 3: Multiple-unit system production rate and buffer volume with time
3.4.2 Evaluation of Downtime and Buffer Level in Load-Sharing Systems
In this subsection, we describe the algorithm that calculates the downtime and the buffer storage size of a system composed of multiple units. A flowchart of the algorithm is given in Figure 1. Note that a second time scale must be introduced here. It starts at zero at the beginning of the system's operating life and continues up to the system's end of life, which is determined by stating the minimum allowable reliability of the system's components. Discrete PM and failure events of the different components are located on this new (system) time scale as discrete points $\tau_i$, where $\tau_i$ denotes the time of occurrence of the $i$th event, whether PM or CM.
Step 1 Run the procedure given by the algorithm in Figure 1. The result is a two-dimensional array: one dimension represents the event type, either PM or CM on a unit, and the other represents the time of occurrence of the event ($\tau_i$).
Step 2 Combine the results of the $K$ units into one array, then sort the combined array in ascending order of event occurrence time $\tau_i$. If $G$ is the total number of events recorded in the combined array, then $0 \le i \le G$.
Step 3 At each moment $\tau_i$, evaluate the quantity supplied by each unit, taking into account its capacity margin and its state $S_k(\tau_{i-1})$ at the preceding moment, i.e., whether it was in PM or CM, using Equation (7):
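$$
\mathrm{Supply}_k(\tau_i) = r_k\,Q\,(1+\delta)\left[(\tau_i - \tau_{i-1}) - \phi_k(\tau_{i-1})\,\mathrm{TM} - \psi_k(\tau_{i-1})\,\mathrm{TR}\right]
\tag{7}
$$
where $G$ is the total number of events occurring up to the end of the system's lifetime, $\tau_{-1} = 0$, and
$$
\phi_k(\tau_{i-1}) =
\begin{cases}
1 & \text{iff } S_k(\tau_{i-1}) = \mathrm{PM},\\
0 & \text{otherwise,}
\end{cases}
\qquad
\psi_k(\tau_{i-1}) =
\begin{cases}
1 & \text{iff } S_k(\tau_{i-1}) = \mathrm{CM},\\
0 & \text{otherwise.}
\end{cases}
$$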
Step 4 Evaluate the quantity that should be delivered according to the demand, $Q(\tau_i - \tau_{i-1})$, in the time interval from $\tau_{i-1}$ to $\tau_i$.
Step 5 The buffer storage receives the excess supply $\delta Q$ provided by the capacity margin of each unit, in order to guarantee continuous satisfaction of the demand. The quantity of fluid in the buffer storage is updated at each time $\tau_i$ by the following formula:
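$$
\mathrm{BV}(\tau_i) =
\begin{cases}
0, & Q(\tau_i - \tau_{i-1}) > \mathrm{BV}(\tau_{i-1}) + \displaystyle\sum_{k=1}^{K} \mathrm{Supply}_k(\tau_i),\\
\mathrm{BV}(\tau_{i-1}) - Q(\tau_i - \tau_{i-1}) + \displaystyle\sum_{k=1}^{K} \mathrm{Supply}_k(\tau_i), & \text{otherwise.}
\end{cases}
\tag{8}
$$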
Step 6 When the quantity stored in the buffer reaches zero during the interval $\tau_i - \tau_{i-1}$ and the buffer content $\mathrm{BV}(\tau_i)$ is not sufficient to satisfy the demand in this interval, the system is considered down. The downtime of the system is evaluated as follows:
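$$
\mathrm{DWNT}(\tau_i) = \mathrm{DWNT}(\tau_{i-1}) + \left(\mathrm{TM} - \frac{\mathrm{BV}(\tau_{i-1})}{Q}\right)\sum_{k=1}^{K}\phi_k(\tau_{i-1}) + \left(\mathrm{TR} - \frac{\mathrm{BV}(\tau_{i-1})}{Q}\right)\sum_{k=1}^{K}\psi_k(\tau_{i-1})
\tag{9}
$$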
Step 7 The system availability, or system probability of success, is evaluated at $\tau_i$ as follows:
$$
A(\tau_i) = 1 - \frac{\mathrm{DWNT}(\tau_i)}{\tau_i}
\tag{10}
$$
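Steps 2 to 7 can be sketched as a single pass over the merged event list. This Python sketch rests on assumptions worth stating: the buffer starts empty; the downtime of each event (TM or TR) falls entirely in the interval that follows it, as in Equation (7); the buffer level is carried from one event to the next (the update in Step 5); supply and per-event downtime are clamped at zero, a case the formulas do not address; and `buffer_capacity` is a hypothetical optional cap. All names are illustrative.

```python
from __future__ import annotations

import math


def system_availability(unit_events, shares, q_demand, delta, tm, tr,
                        buffer_capacity=math.inf):
    """Steps 2-7 of Section 3.4.2 for a load-sharing system.

    unit_events[k] is the list of (tau, "PM" | "CM") events of unit k.
    Returns (A(tau_G), DWNT(tau_G), maximum buffer level reached).
    """
    # Step 2: merge the K event lists and sort them by occurrence time tau_i.
    merged = sorted((tau, k, kind) for k, events in enumerate(unit_events)
                    for tau, kind in events)
    bv = max_bv = downtime = 0.0   # buffer assumed empty at tau_{-1} = 0
    availability = 1.0
    prev_tau, prev_unit, prev_kind = 0.0, None, None  # all units up initially
    for tau, unit, kind in merged:
        dt = tau - prev_tau
        # Step 3, Eq. (7): only the unit whose event occurred at tau_{i-1} is down.
        supply = 0.0
        for k, r_k in enumerate(shares):
            lost = 0.0
            if k == prev_unit:
                lost = tm if prev_kind == "PM" else tr
            supply += r_k * q_demand * (1.0 + delta) * max(0.0, dt - lost)
        demand = q_demand * dt                      # Step 4
        level = bv + supply - demand                # Step 5, Eq. (8)
        if level < 0.0:                             # Step 6, Eq. (9)
            if prev_unit is not None:
                stop = tm if prev_kind == "PM" else tr
                downtime += max(0.0, stop - bv / q_demand)  # bv is still BV(tau_{i-1})
            level = 0.0
        bv = min(level, buffer_capacity)
        max_bv = max(max_bv, bv)
        if tau > 0.0:
            availability = 1.0 - downtime / tau     # Step 7, Eq. (10)
        prev_tau, prev_unit, prev_kind = tau, unit, kind
    return availability, downtime, max_bv


# Single unit, 50 % capacity margin: the buffer absorbs both stoppages.
print(system_availability([[(100.0, "PM"), (200.0, "CM"), (300.0, "PM")]],
                          shares=[1.0], q_demand=10.0, delta=0.5, tm=5.0, tr=20.0))
# -> (1.0, 0.0, 1125.0)
```

With no capacity margin the same unit has nothing to store, so the 20-day CM stoppage becomes system downtime.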
4. Optimization Model
The optimization was conducted using a hybrid model that combines a genetic algorithm (GA) with discrete-event simulation. The hybrid model is used to search for a near-optimal solution. The genes of the GA chromosome represent the decision variables: the number of units in the load-sharing system, the capacity margin per unit, the buffer volume, and the manufacturer. The unit parameters, such as cost, reliability, and maintainability, are defined by the manufacturer; hence, the manufacturer decision variable takes values from a predefined set of manufacturers.
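One possible encoding of the four genes is sketched below in Python, with uniform crossover and bounded mutation. The paper does not specify its genetic operators or search bounds, so the bounds, the placeholder manufacturer catalogue, and the operator choices here are all hypothetical; in the hybrid model, the fitness of each chromosome would come from the discrete-event simulation.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical search bounds; the paper does not report them.
MAX_UNITS = 10
DELTA_BOUNDS = (0.0, 1.0)
BUFFER_BOUNDS = (0.0, 3.0e5)                        # gallons
MANUFACTURERS = ("maker_A", "maker_B", "maker_C")   # placeholder catalogue


@dataclass(frozen=True)
class Chromosome:
    n_units: int          # number of load-sharing units K
    delta: float          # capacity margin per unit
    buffer_volume: float  # BV
    manufacturer: int     # index into MANUFACTURERS (fixes cost, beta, eta, ...)


def clip(x: float, bounds: tuple[float, float]) -> float:
    return min(max(x, bounds[0]), bounds[1])


def random_chromosome(rng: random.Random) -> Chromosome:
    return Chromosome(rng.randint(1, MAX_UNITS), rng.uniform(*DELTA_BOUNDS),
                      rng.uniform(*BUFFER_BOUNDS), rng.randrange(len(MANUFACTURERS)))


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Uniform crossover: each gene is inherited from either parent with probability 1/2."""
    def pick(x, y):
        return x if rng.random() < 0.5 else y
    return Chromosome(pick(a.n_units, b.n_units), pick(a.delta, b.delta),
                      pick(a.buffer_volume, b.buffer_volume),
                      pick(a.manufacturer, b.manufacturer))


def mutate(c: Chromosome, rng: random.Random, rate: float = 0.2) -> Chromosome:
    """Perturb each gene with probability `rate`, keeping it inside its bounds."""
    n, d, bv, m = c.n_units, c.delta, c.buffer_volume, c.manufacturer
    if rng.random() < rate:
        n = min(MAX_UNITS, max(1, n + rng.choice((-1, 1))))
    if rng.random() < rate:
        d = clip(d + rng.gauss(0.0, 0.05), DELTA_BOUNDS)
    if rng.random() < rate:
        bv = clip(bv + rng.gauss(0.0, 0.05 * BUFFER_BOUNDS[1]), BUFFER_BOUNDS)
    if rng.random() < rate:
        m = rng.randrange(len(MANUFACTURERS))
    return Chromosome(n, d, bv, m)
```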
4.1 Assumptions
The system is a load-sharing system that consists of $K$ units. Maintenance is imperfect. The system has an extra capacity margin and a finite buffer volume.
4.2 Cost Analysis
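The total cost is affected by the following factors:
1. The cost of the extra capacity margin is a function of $\delta$ and $q$, where the exponent $q$ is determined numerically from price statistics. The cost is evaluated as follows:
$$
\mathrm{ICC} = \left[(1+\delta)^{q} - 1\right] C
\tag{11}
$$
2. Expected value of the preventive maintenance cost:
$$
E[\mathrm{MC}] = \mathrm{UMC}\cdot\mathrm{TM}\sum_{k=1}^{K}\sum_{j=1}^{N} P\left(t_{kj} \ge \mathrm{TBM}\right) = \mathrm{UMC}\cdot\mathrm{TM}\sum_{k=1}^{K}\sum_{j=1}^{N} \exp\!\left[-\left(\frac{\mathrm{TBM}+Y_{kj}}{\eta_k}\right)^{\beta_k}\right]
\tag{12}
$$
3. Expected value of the failure cost:
$$
E[\mathrm{FC}] = \mathrm{UFC}\cdot\mathrm{TR}\sum_{k=1}^{K}\sum_{j=1}^{N}\left(1 - \exp\!\left[-\left(\frac{\mathrm{TBM}+Y_{kj}}{\eta_k}\right)^{\beta_k}\right]\right)
\tag{13}
$$
4. Expected value of the downtime cost:
$$
E[\mathrm{DTC}] = \mathrm{UDTC}\cdot E\left[\mathrm{DWNT}(\tau_G)\right]
\tag{14}
$$
5. Expected cost of the buffer storage:
$$
E[\mathrm{BVC}] = \mathrm{UBVC}\cdot E[\mathrm{MBV}]
\tag{15}
$$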
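The closed-form cost terms (11) to (13) translate directly into Python. In this sketch the virtual ages $Y_{kj}$ are assumed to be available already (for example, from the simulation), and the argument layout, one list of virtual ages per unit plus per-unit `betas` and `etas`, is our own choice; the function names are hypothetical.

```python
import math


def icc(delta: float, q: float, unit_cost: float) -> float:
    """Equation (11): cost of the extra capacity margin."""
    return ((1.0 + delta) ** q - 1.0) * unit_cost


def _pm_probabilities(tbm, virtual_ages, betas, etas):
    """Yield exp(-((TBM + Y_kj) / eta_k) ** beta_k) for every unit k and period j."""
    for k, ages in enumerate(virtual_ages):
        for y in ages:
            yield math.exp(-(((tbm + y) / etas[k]) ** betas[k]))


def expected_pm_cost(umc, tm, tbm, virtual_ages, betas, etas):
    """Equation (12): expected preventive maintenance cost."""
    return umc * tm * sum(_pm_probabilities(tbm, virtual_ages, betas, etas))


def expected_failure_cost(ufc, tr, tbm, virtual_ages, betas, etas):
    """Equation (13): expected corrective maintenance (failure) cost."""
    return ufc * tr * sum(1.0 - p for p in _pm_probabilities(tbm, virtual_ages, betas, etas))
```

Because each period ends with either PM or CM, the two probability sums add up to the total number of periods, which gives a handy sanity check.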
4.3 Objective Function
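$$
\begin{aligned}
\text{Minimize } & Z = \mathrm{ICC} + E[\mathrm{MC}] + E[\mathrm{FC}] + E[\mathrm{DTC}] + E[\mathrm{BVC}]\\
\text{subject to } & e^{-(Y_{kj}/\eta_k)^{\beta_k}} \ge R_{\min}, \quad 1 \le k \le K
\end{aligned}
\tag{16}
$$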
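A minimal Python sketch of evaluating Equation (16) for one candidate design. It assumes that $E[\mathrm{DWNT}(\tau_G)]$ and $E[\mathrm{MBV}]$ are estimated as sample means over independent simulation replications, and it adds a hypothetical penalty-based fitness for the GA so that designs violating the reliability constraint are ranked last; the paper does not state how constraints are handled.

```python
import math
from statistics import fmean


def objective(icc, e_mc, e_fc, udtc, ubvc, downtime_samples, max_buffer_samples):
    """Objective Z of Equation (16), with E[DWNT(tau_G)] and E[MBV]
    estimated as sample means over simulation replications (assumption)."""
    e_dtc = udtc * fmean(downtime_samples)    # Eq. (14)
    e_bvc = ubvc * fmean(max_buffer_samples)  # Eq. (15)
    return icc + e_mc + e_fc + e_dtc + e_bvc


def is_feasible(virtual_ages, betas, etas, r_min):
    """Constraint of Equation (16), checked for every unit k and period start j."""
    return all(math.exp(-((y / etas[k]) ** betas[k])) >= r_min
               for k, ages in enumerate(virtual_ages) for y in ages)


def ga_fitness(z, feasible, penalty=1.0e12):
    """Hypothetical penalty handling: infeasible designs receive a very high cost."""
    return z if feasible else z + penalty
```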
5. Computation Results
Based on the above algorithm, a simulation model was built and the interactions between the system parameters were investigated.
5.1 Buffer Volume versus Capacity Margin for Load-Sharing Systems
The capacity margin was investigated at two levels, low ($\delta = 0.2$) and high ($\delta = 0.5$), against five buffer-volume levels: very low (BV = 4QD), low (BV = 5QD), moderate (BV = 6.7QD), high (BV = 10QD), and extra high (BV = 20QD). The resulting system availability as a function of the number of load-sharing units is depicted in Figure 4. All other system parameters were fixed during this experiment.

[Plot: system availability (0.86 to 1) versus number of units (1 to 10); one curve per combination of $\delta \in \{0.2, 0.5\}$ and BV $\in \{4, 5, 6.7, 10, 20\}$ QD]
Figure 4: System availability for different numbers of units, capacity margins, and buffer volumes
Figure 4 shows how buffer volume affects system availability: as the buffer volume increases, system availability increases. The capacity margin is a critical design factor: increasing it raises system availability, and it also determines how availability changes with the number of units in a load-sharing system.
6. Conclusion and Future Work
When determining the optimal system configuration, the system parameters are chosen on the basis of the minimum total cost (initial cost + operation cost), as shown in Figure 5.
[Plot: total cost ($\times 10^7$, 0.4 to 2.2) versus number of units (1 to 10); one curve per combination of $\delta \in \{0.2, 0.5\}$ and BV $\in \{5, 10, 15, 20, 25\} \times 10^4$ gallon]
Figure 5: Total cost for different numbers of units, capacity margins, and buffer volumes
As the buffer size increases, the cost decreases because system availability increases. The capacity margin improves the utilization of buffers, which in turn increases availability and decreases cost. The number of units in a multiple-unit system affects both availability and initial cost: availability increases with the number of units only if it is combined with a proper capacity margin, whereas the initial cost increases with the number of units. All of these relations are depicted in Figure 5, which also shows that the minimum total cost occurs for a 4-unit system with a capacity margin $\delta = 0.5$ and a buffer volume of $10^5$ gallons.
The optimal availability and maintainability allocation, considering both buffer volume and capacity margin, has been investigated. The results show that the buffer volume and the capacity margin are critical variables when deciding on the optimal system parameters. Without considering imperfect maintenance, the time to unit replacement would be infinite, which is unrealistic. In oil and gas plants and other continuous-flow, large-scale industrial systems, the parameters of load-sharing systems must be defined carefully to ensure system consistency. For future work, different imperfect maintenance models should be investigated within the above formulation to better understand the effect of imperfect maintenance. The proposed model can also be extended to multiple-process systems with in-process buffer allocation, to maximize system availability and minimize total cost.
References
[1] S. G. Gedam and D. Ph, "Optimizing R & M Performance of a System Using Monte Carlo Simulation," IEEE, pp. 0–5, 2012.
[2] P. C. Sethia, "Enhancing reliability of a continuous manufacturing system using WIP buffers," International Journal of Simulation Modelling, vol. 7, pp. 61–70, June 2008.
[3] T. Murino, E. Romano, and P. Zoppoli, "Maintenance policies and buffer sizing: an optimization model," WSEAS Transactions on Business and Economics, vol. 6, no. 1, 2009.
[4] L. Demir, S. Tunali, and D. T. Eliiyi, "The state of the art on buffer allocation problem: a comprehensive survey," Journal of Intelligent Manufacturing, Sept. 2012.
[5] M. Amiri and A. Mohtashami, "Buffer allocation in unreliable production lines based on design of experiments, simulation, and genetic algorithm," The International Journal of Advanced Manufacturing Technology, vol. 62, pp. 371–383, Dec. 2011.
[6] Y. Zhang, S.-Y. Gong, and J.-Y. Sheng, "Optimal buffer inventory for maintenance action under random production capacity availability," 2012 International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering, pp. 400–404, June 2012.
[7] T. Nakagawa, "Imperfect Preventive-Maintenance," IEEE Transactions on Reliability, vol. R-28, p. 402, Dec. 1979.
[8] C. R. Cassady, I. M. Iyoob, K. Schneider, and E. A. Pohl, "A Generic Model of Equipment Availability Under Imperfect Maintenance," IEEE Transactions on Reliability, vol. 54, no. 4, pp. 564–571, 2005.
[9] M. Kijima, H. Morimura, and Y. Suzuki, "Periodic replacement problem without assuming minimal repair," European Journal of Operational Research, vol. 37, no. 2, pp. 194–203, 1988.
