Keywords: Autoregressive moving average (ARMA); Nonlinear autoregressive with exogenous input (NARX); Long-term prediction; Machine state forecasting

Abstract

This paper presents an improved hybrid of the nonlinear autoregressive with exogenous input (NARX) model and the autoregressive moving average (ARMA) model for long-term machine state forecasting based on vibration data. In this study, vibration data is considered as a combination of two components: deterministic data and error. The deterministic component may describe the degradation index of the machine, whilst the error component can depict the appearance of uncertain parts. An improved hybrid forecasting model, namely the NARX–ARMA model, is carried out to obtain the forecasting results, in which the NARX network model, which is suited to nonlinear problems, is used to forecast the deterministic component, and the ARMA model is used to predict the error component owing to its appropriate capability in linear prediction. The final forecasting results are the sum of the results obtained from these single models. The performance of the NARX–ARMA model is then evaluated using data of a low methane compressor acquired from a condition monitoring routine. In order to corroborate the advantages of the proposed method, a comparative study of the forecasting results obtained from the NARX–ARMA model and traditional models is also carried out. The comparative results show that the NARX–ARMA model is outstanding and could be used as a potential tool for machine state forecasting.

© 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2009.10.020
H.T. Pham et al. / Expert Systems with Applications 37 (2010) 3310–3317 3311
techniques can be adequate in modeling and forecasting. Model-based techniques can adequately capture the linear component of a time series, while data-driven techniques are highly flexible in modeling the nonlinear components. Accordingly, numerous hybrid models have been proposed to provide more precise predictions. For instance, Zhang (2003) combined the autoregressive integrated moving average (ARIMA) model and a neural network model to forecast three well-known time series sets: the sunspot data, the Canadian lynx data and the British pound/US dollar exchange rate data. Ince and Trafalis (2006) proposed a hybrid model comprising parametric techniques (e.g. ARIMA, vector autoregression) and nonparametric techniques (e.g. support vector regression, artificial neural networks) for forecasting the exchange market. A hybrid of ARIMA and support vector machines was successfully presented by Pai and Lin (2005) for predicting stock prices. Other notable hybrid approaches can be found in Rojas et al. (2008), Tseng, Yu, and Tzeng (2002), and Valenzuela et al. (2008). Most of these hybrid models were implemented as the following process: first, the model-based technique was used to predict the linear relation; then, the data-driven technique was utilized to forecast the residuals between the actual values and the results predicted in the previous step. The final results were the sum of the results gained from each model. Furthermore, these hybrid approaches were merely regarded as short-term prediction methodologies.

In this study, an improved hybrid forecasting model is proposed for long-term prediction of the operating states of a machine. The prediction strategy used here is recursive, which is one of the strategies discussed in Sorjamaa, Hao, Reyhani, Ji, and Lendasse (2007). This forecasting model, involving the nonlinear autoregressive model with exogenous input (NARX) (Leontaritis & Billings, 1985) and the autoregressive moving average (ARMA) model (Box & Jenkins, 1970), is novel in the following aspects: (1) the vibration data indicating the state of the machine is divided into a deterministic component and an error component, the latter being the residual between the actual data and the deterministic component; NARX and ARMA are simultaneously employed to forecast the former and the latter, respectively, and the final forecasting results are the sum of the results obtained from the single models; (2) long-term forecasting, which is still a difficult and challenging task in the time series prediction domain, is applied.

Additionally, the number of observations used as the input of the forecasting model, the so-called embedding dimension, is a problem often encountered in time series forecasting techniques. The embedding dimension can be estimated using either Cao's method (1997) or the false nearest neighbor (FNN) method (Kennel, Brown, & Abarbanel, 1992). However, the FNN method depends on the chosen parameters, where different values lead to different results. Furthermore, the FNN method also depends on the number of available observations and is sensitive to additive noise. Cao's method overcomes these shortcomings of the FNN approach and is therefore chosen in this study.

2. Background knowledge

2.1. Nonlinear autoregressive model with exogenous inputs (NARX)

[…] multilayer perceptron. The structure of an NARX network is depicted in Fig. 1.

[Fig. 1. The structure of an NARX network: the input u(t) passes through tapped delay lines (z⁻¹) to form u(t−1), u(t−2), …, u(t−n_u), from which the network produces y(t).]

Basically, the NARX network is trained in one of two modes.

Parallel (P) mode: the output is fed back to the input of the feedforward neural network as part of the standard NARX architecture:

ŷ(t+1) = f̂[y_p(t), u(t); W]
       = f̂[ŷ(t), ŷ(t−1), …, ŷ(t−n_y+1), u(t), u(t−1), …, u(t−n_u+1); W]   (2)

Series-parallel (SP) mode: the output regressor is formed only by actual values of the system's output:

ŷ(t+1) = f̂[y_sp(t), u(t); W]
       = f̂[y(t), y(t−1), …, y(t−n_y+1), u(t), u(t−1), …, u(t−n_u+1); W]   (3)

As mentioned above, the inputs of the NARX network include the regressors of both the inputs and the outputs of the system, while a time series consists of one or more measured output channels with no measured input. Hence, the forecasting abilities of the NARX network may be limited when it is applied to time series data without an input regressor. In this kind of application, the tapped delay line over the input signal is eliminated, and the NARX network reduces to the plain focused time-delay neural network architecture (Lin, Horne, Tino, & Giles, 1997):

ŷ(t+1) = f̂[y(t), y(t−1), …, y(t−n_y+1); W]   (4)

According to Menezes and Barreto (2006), a simple strategy based on Takens' embedding theorem can solve this problem. This strategy allows the computational abilities of the original NARX network to be fully exploited in nonlinear time series prediction tasks, and is described as follows.

Firstly, the input signal regressor, denoted by u(t), is defined by the delay embedding coordinates:

u(t) = [y(t), y(t−τ), …, y(t−(d_E−1)τ)]   (5)

where d_E = n_u is the embedding dimension and τ is the embedding delay.

Secondly, since the NARX network can be trained in two different modes, the output signal regressor y(t) can be written as follows:

y_sp(t) = [y(t), y(t−1), …, y(t−n_y+1)]   (6)
y_p(t) = [ŷ(t), ŷ(t−1), …, ŷ(t−n_y+1)]   (7)

where the output regressor y_sp(t) for the SP mode in Eq. (6) contains n_y past values of the actual time series, while the output regressor y_p(t) for the P mode in Eq. (7) contains n_y past values of the estimated time series.
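To make the SP/P distinction concrete, the sketch below trains a stand-in one-step predictor and applies it in both modes. This is an illustrative assumption, not the paper's implementation: a linear least-squares map replaces the trained network f̂, no exogenous input u(t) is used (the time-series-only case of Eq. (4)), and the helper names `make_regressors`, `predict_sp` and `predict_p` are hypothetical.

```python
import numpy as np

def make_regressors(y, n_y):
    """Stack rows [y(t), y(t-1), ..., y(t-n_y+1)] with target y(t+1)."""
    X = np.array([y[t - n_y + 1:t + 1][::-1] for t in range(n_y - 1, len(y) - 1)])
    return X, y[n_y:]

def predict_sp(w, y, n_y):
    """Series-parallel mode: regressor built from actual past values (Eq. (6))."""
    X, _ = make_regressors(y, n_y)
    return X @ w

def predict_p(w, y_init, n_steps):
    """Parallel mode: each prediction is fed back into the regressor (Eq. (7))."""
    hist = list(y_init)                   # last n_y actual values, newest last
    out = []
    for _ in range(n_steps):
        y_next = np.array(hist[::-1]) @ w  # regressor [y(t), y(t-1), ...]
        out.append(y_next)
        hist = hist[1:] + [y_next]         # slide the window with the estimate
    return np.array(out)

# Toy data: a noiseless AR(2) process, exactly recoverable by the linear map.
y = np.zeros(50); y[0], y[1] = 1.0, 0.5
for t in range(2, 50):
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2]

n_y = 2
X, target = make_regressors(y, n_y)
w, *_ = np.linalg.lstsq(X, target, rcond=None)  # stand-in for training f-hat
```

The SP mode is the natural choice for training (every regressor is exact), while the P mode is what long-term recursive forecasting actually requires, which is why errors can accumulate over the horizon.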
For a suitably trained network, these outputs are estimates of previous values of y(t+1) and should obey the following predictive relationships implemented by the NARX network:

ŷ(t+1) = f̂[y_sp(t), u(t); W]   (8)
ŷ(t+1) = f̂[y_p(t), u(t); W]   (9)

The NARX networks trained according to Eqs. (8) and (9) are denoted onwards as NARX-SP and NARX-P networks, respectively.

2.2. Autoregressive moving average (ARMA)

The ARMA(p, q) prediction model for a time series y_t is given as follows:

y_t = c + Σ_{i=1..p} φ_i y_{t−i} + Σ_{j=1..q} θ_j e_{t−j} + e_t   (10)

where c is a constant, p is the number of autoregressive orders, q is the number of moving average orders, φ_i are the autoregressive coefficients, θ_j are the moving average coefficients, and e_t is a normal white noise process with zero mean and variance σ².

Box and Jenkins (1970) proposed three iterative steps to build ARMA models for time series: model identification, parameter estimation and diagnostic checking. Elaborate information on each step can be found in Zhang (2003). In order to determine the orders of the ARMA model, the autocorrelation function (ACF) and the partial autocorrelation function (PACF) are used in conjunction with the Akaike information criterion. Another selection technique associated with the ACF and PACF for estimating the orders of the ARMA model is maximum likelihood estimation (MLE) (Ljung & Box, 1987), which is used in this study.

For a weakly stationary stochastic process, the first and second moments exist and do not depend on time:

E(y_1) = E(y_2) = ⋯ = E(y_t) = μ
V(y_1) = V(y_2) = ⋯ = V(y_t) = σ²   (11)
Cov(y_t, y_{t−k}) = Cov(y_{t+1}, y_{t−k+1}) = γ_k

From the conditions in Eq. (11), the covariances are functions only of the lag k; they are usually called autocovariances. The autocorrelations, denoted by ρ_k, likewise depend only on the lag:

ρ_k = Cov(Y_t, Y_{t+k}) / √(Var(Y_t) Var(Y_{t+k})) = E[(Y_t − μ)(Y_{t+k} − μ)] / σ²_Y = γ_k / γ_0   (12)

The autocorrelations considered as a function of k are referred to as the ACF. Note that since

γ_k = Cov(Y_t, Y_{t−k}) = Cov(Y_{t−k}, Y_t) = Cov(Y_t, Y_{t+k}) = γ_{−k}   (13)

it follows that γ_k = γ_{−k}, and only the positive half of the ACF is usually given.

In practice, with a finite time series of N observations, only an estimate of the autocorrelation can be obtained. If r_k denotes the estimated autocorrelation coefficient, the formula to obtain these parameters is

r_k = [ Σ_{t=1..n−k} (Y_t − μ)(Y_{t+k} − μ) ] / [ Σ_{t=1..n} (Y_t − μ)² ]   (14)

Then the partial ACF can be attained from

w_t = (Y_t − μ) = Φ_{1k} w_{t−1} + Φ_{2k} w_{t−2} + ⋯ + Φ_{kk} w_{t−k} + e_t   (15)

3. Improved hybrid model for long-term forecasting

Vibration data, which is used to indicate the state of a machine, is not easy to capture due to its complexity. Hence, neither ARMA nor NARX alone is a suitable model for forecasting this kind of data. With the NARX network, the high noise level of this data leads to difficult convergence if the number of neurons is small, or to over-fitting if the number of neurons is large. On the other hand, the ARMA model cannot be applied well to data which includes nonlinear components and does not satisfy the stationarity condition. In this paper, an improved hybrid model is proposed in which the vibration data is divided into two components: deterministic and error. The deterministic component x = [x_1, x_2, …, x_k, …, x_{t−1}] is obtained from a time series y = [y_1, y_2, …, y_k, …, y_t] by using a filtering technique, where x_k is described as:

x_k = (y_{k−1} + y_k + y_{k+1}) / 3,   k = 1, 2, 3, …, t−1   (16)

The error component e = [e_1, e_2, …, e_k, …, e_{t−1}] is the residual between y and x, where e_k = y_k − x_k, k = 1, 2, 3, …, t−1. The deterministic component is a degradation indicator which clearly describes the machine's health; it is suitably captured by the NARX network. The ARMA model is used to analyze the error component, which describes the appearance of uncertain parts. The process of m-step-ahead prediction using this proposal is shown in Fig. 2.

4. Proposed forecasting system

In order to forecast the future states of a machine, the proposed system comprises four sequential procedures, as shown in Fig. 3: data acquisition, building the model, validating the model, and forecasting. The role of each procedure is explained as follows:

Step 1 Data acquisition: this procedure obtains the vibration data from machine condition monitoring. The data is then split into two parts, a training set and a testing set, which serve different purposes in the prognosis system: the training set is used to create the forecasting models, whilst the testing set is utilized to test the trained models.
Step 2 Building model: the training data is separated into two components, a deterministic component and an error component, which are used to build the NARX–ARMA model by the process described in the previous section.
Step 3 Validating model: this procedure measures the performance capability of the trained model.
Step 4 Forecasting: the long-term forecasting method is used to forecast the future states of the machine. The forecasted results are measured by the error between the forecasted values and the actual values in the testing set.
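The decomposition of Eq. (16) can be sketched in a few lines. This is a minimal illustration under one stated assumption: the paper indexes k = 1, …, t−1, while the sketch computes only interior points of the series (endpoint handling is not specified in the text); the name `decompose` is hypothetical.

```python
import numpy as np

def decompose(y):
    """Split a series into deterministic and error components per Eq. (16).

    x_k = (y_{k-1} + y_k + y_{k+1}) / 3  (three-point moving average)
    e_k = y_k - x_k                      (residual)
    """
    y = np.asarray(y, dtype=float)
    x = (y[:-2] + y[1:-1] + y[2:]) / 3.0  # deterministic component
    e = y[1:-1] - x                        # error component
    return x, e
```

By construction x + e reproduces the original series at every interior point, so forecasting x with NARX and e with ARMA and summing the results is consistent with the original data.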
5.1. Experiments
Table 1
Description of the system.

Bearing: NDE #6216, DE #6216; thrust: 7321 BDB; radial: sleeve type
RPM: 3565 rpm

[Figures: acceleration (g) versus time plots of the measured vibration data.]
5.2. Results
In order to build the forecasting model, 719 points of peak acceleration data and 749 points of envelope acceleration data are used. The remaining points of each data set are then utilized to test the forecasting model. Additionally, the root mean square error (RMSE) given in Eq. (17) is employed to evaluate the forecasting capability:

RMSE = √( Σ_{i=1..N} (y_i − ŷ_i)² / N )   (17)

where N represents the number of data points, y_i is the actual value and ŷ_i is the forecasted value.

[Fig. 7. The filtered envelope acceleration data of low methane compressor.]
[Fig. 9. The values E1 and E2 of the deterministic component of envelope data.]
[Fig. 10. The values E1 and E2 of the deterministic component of peak acceleration data.]
[Fig. 11. ACF of the error component of envelope acceleration data.]
[Fig. 12. ACF of the error component of peak acceleration data.]
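The evaluation metric of Eq. (17) is straightforward to implement; a minimal sketch (the function name `rmse` is our own):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error of Eq. (17): sqrt(mean((y_i - y_hat_i)^2))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))
```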
[…] stationary if its autocorrelation structure is constant over time or the lag-1 autocorrelation is zero or negative. Figs. 11 and 12, which depict the ACF of the error components of the envelope and peak acceleration data, show that the error component satisfies the stationarity condition. Thus, it can be directly used to generate the ARMA model without necessitating a higher order of differencing. Based on the ACF, the PACF and experimental results, an ARMA(3, 4) model for the envelope acceleration data and an ARMA(3, 3) model for the peak acceleration data are chosen in this study. Furthermore, MLE is used to estimate the model parameters φ_i and θ_j.

Finally, the final forecasting values of the hybrid model are the sum of the results obtained from the NARX-SP and ARMA models. Table 3 shows a summary of the RMSE values of the three models applied to the peak and envelope acceleration data. In this table, all the RMSEs of the NARX–ARMA model are superior to those of the other traditional models, in that it is more accurate in both cases.

[Table 3. The RMSE values of the forecasting results of ARMA, NARX and NARX–ARMA (training and testing).]

[Fig. 14. The forecasting results of the NARX–ARMA model for peak acceleration data using one-step-ahead prediction.]

References

Box, G. E. P., & Jenkins, G. (1970). Time series analysis: Forecasting and control. San Francisco, CA: Holden-Day.
Cao, L. Y. (1997). Practical method for determining the minimum embedding dimension of a scalar time series. Physica D, 110, 43–50.
Ince, H., & Trafalis, T. B. (2006). A hybrid model for exchange rate prediction. Decision Support Systems, 42, 1054–1062.
Kennel, M. B., Brown, R., & Abarbanel, H. D. I. (1992). Determining embedding dimension for phase-space reconstruction using a geometrical construction. Physical Review A, 45, 3403–3411.
Leontaritis, I. J., & Billings, S. A. (1985). Input–output parametric models for non-linear systems. International Journal of Control, 41, 303–344.
Lin, T., Horne, B. G., Tino, P., & Giles, C. L. (1997). A delay damage model selection algorithm for NARX neural networks. IEEE Transactions on Signal Processing, 45(11), 2719–2730.
Liu, J., Wang, W., & Golnaraghi, F. (2009). A multi-step predictor with a variable input pattern for system state forecasting. Mechanical Systems and Signal Processing, 23(5), 1586–1599.
Ljung, L., & Box, G. E. P. (1987). System identification: Theory for the user. Englewood Cliffs, NJ: Prentice-Hall.
Menezes, J. M. P., & Barreto, G. A. (2006). A new look at nonlinear time series prediction with NARX recurrent neural network. In Proceedings of the ninth Brazilian symposium on neural networks (pp. 160–165).
Pai, P. F., & Lin, C. S. (2005). A hybrid ARIMA and support vector machines model in stock price forecasting. Omega, 33, 497–505.
Rojas, I., Valenzuela, O., Rojas, F., Guillen, A., Herrera, L. J., Pomares, H., et al. (2008). Soft-computing techniques and ARMA model for time series prediction. Neurocomputing, 71, 519–537.
Sorjamaa, A., Hao, J., Reyhani, N., Ji, Y., & Lendasse, A. (2007). Methodology for long-term prediction of time series. Neurocomputing, 70, 2861–2869.
Tran, V. T., Yang, B. S., Oh, M. S., & Tan, A. C. C. (2008). Machine condition prognosis based on regression trees and one-step-ahead prediction. Mechanical Systems and Signal Processing, 22(5), 1179–1193.
Tseng, F. M., Yu, H. C., & Tzeng, G. H. (2002). Combining neural network model with seasonal time series ARIMA model. Technological Forecasting & Social Change, 69, 71–87.
Vachtsevanos, G., & Wang, P. (2001). Fault prognosis using dynamic wavelet neural networks. In AUTOTESTCON proceedings, IEEE systems readiness technology conference (pp. 857–870).
Valenzuela, O., Rojas, I., Rojas, F., Pomares, H., Herrera, L. J., Guillen, A., et al. (2008). Hybridization of intelligent techniques and ARIMA models for time series prediction. Fuzzy Sets and Systems, 159, 821–845.
Wang, W. (2007). An adaptive predictor for dynamic system forecasting. Mechanical Systems and Signal Processing, 21, 809–823.
Wang, W. Q., Golnaraghi, M. F., & Ismail, F. (2004). Prognosis of machine health condition using neuro-fuzzy systems. Mechanical Systems and Signal Processing, 18, 813–831.
Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159–175.