M. Eduarda Silva and Isabel Silva: Detection of outliers in integer-valued AR models

Detection of outliers in integer-valued AR models

M. Eduarda Silva (1, 3) and Isabel Silva (2, 3)

1 Departamento de Matemática Aplicada, Faculdade de Ciências, Universidade do Porto, Porto, Portugal

2 Departamento de Engenharia Civil, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal

3 Centro de Investigação e Desenvolvimento em Matemática e Aplicações (CIDMA), Universidade de Aveiro, Aveiro, Portugal

ERCIM 2010


Content

Motivation
Outliers in INAR(1) models
Wavelet analysis
Algorithm for outlier time point detection
Simulation study
Application
Remarks and future work


Motivation
Problem in time series modeling: detecting outliers. Undetected outliers can cause

biased estimates of the model parameters,
invalid inferences.

Outliers in time series [Fox, 1972]


Additive outliers (AO): external error or exogenous change at a certain time
point.
Innovational outliers (IO): internal change or endogenous effect in the noise
process that affects all subsequent observations.

Focus of the literature


Estimation and inference of time series parameters in the presence of outliers.
For ARMA processes, these problems are well documented in the literature.


Outliers in INAR(1) models

INAR(1) processes [McKenzie, 1985, 1988; Al-Osh and Alzaid, 1987; Du and Li, 1991]

$$X_t = \alpha \circ X_{t-1} + e_t = \sum_{j=1}^{X_{t-1}} Y_{t,j} + e_t,$$

where (Y_{k,j}) is a sequence of i.i.d. Bernoulli r.v. (offspring) with mean α ∈ [0, 1],

(e_t) is a sequence of i.i.d. non-negative integer-valued r.v. (arrival process).

Poisson INAR(1) process: e_t ~ Po(λ).
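A minimal NumPy sketch of the Poisson INAR(1) recursion above, assuming 0 ≤ α < 1; the function name `simulate_poisson_inar1` is ours, not from the slides. The thinning α ∘ X_{t−1} is drawn as a single Binomial(X_{t−1}, α) variate, which has the same distribution as the sum of X_{t−1} independent Bernoulli(α) offspring, and the first value comes from the stationary marginal Po(λ/(1 − α)).

```python
import numpy as np

def simulate_poisson_inar1(n, alpha, lam, rng=None):
    """Simulate n values of X_t = alpha o X_{t-1} + e_t, e_t ~ Po(lam).

    Assumes 0 <= alpha < 1 so that the stationary marginal exists.
    `rng` may be None, an integer seed or a numpy Generator.
    """
    rng = np.random.default_rng(rng)
    x = np.empty(n, dtype=np.int64)
    # Start in the stationary distribution Po(lam / (1 - alpha)).
    x[0] = rng.poisson(lam / (1.0 - alpha))
    for t in range(1, n):
        # Binomial thinning: number of surviving Bernoulli(alpha) offspring.
        survivors = rng.binomial(x[t - 1], alpha)
        # Add the Poisson arrivals e_t.
        x[t] = survivors + rng.poisson(lam)
    return x

x = simulate_poisson_inar1(257, alpha=0.3, lam=1.0, rng=1)
print(len(x), x.mean())
```

For long series the sample mean should approach λ/(1 − α) and the lag-1 autocorrelation should approach α.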


Outliers in INAR(1) models


INAR(1) with finitely many AO

$$Y_t = X_t + \sum_{i=1}^{I} \delta_{t,s_i}\,\omega_i,$$

where (X_t)_{t∈Z} is an INAR(1) process, I ∈ N, s_i ∈ N with s_i ≠ s_j for i ≠ j, ω_i ∈ N, and δ_{k,m} = 1 if k = m, δ_{k,m} = 0 if k ≠ m.

[Figure: INAR(1) and INAR(1) with AO; 257 observations, α = 0.3, λ = 1, s = 100, ω = 5σ = 6]
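To make the AO definition concrete, here is a small sketch using NumPy and a hypothetical helper `add_additive_outliers`, with 0-based positions in place of s_i ∈ N. It adds ω_i to the observation at s_i only, so the latent INAR(1) path X_t and all other observations are left unchanged.

```python
import numpy as np

def add_additive_outliers(x, times, sizes):
    """Return Y_t = X_t + sum_i delta_{t, s_i} * omega_i.

    `times` holds the (0-based) outlier positions s_i and `sizes` the
    corresponding omega_i; both are illustrative inputs.
    """
    times = np.asarray(times, dtype=np.int64)
    sizes = np.asarray(sizes, dtype=np.int64)
    if np.unique(times).size != times.size:
        # The model requires s_i != s_j; fancy-index "+=" would also
        # silently keep only one of the repeated positions.
        raise ValueError("outlier time points must be distinct")
    y = np.array(x, dtype=np.int64)  # a copy, so X itself is untouched
    y[times] += sizes
    return y

x = np.array([1, 0, 2, 1, 1, 0, 3, 1])
y = add_additive_outliers(x, times=[3], sizes=[6])
print(y)  # only position 3 changes
```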


Outliers in INAR(1) models

INAR(1) with finitely many IO

$$Y_t = \alpha \circ Y_{t-1} + \varepsilon_t, \qquad \varepsilon_t = e_t + \sum_{i=1}^{I} \delta_{t,s_i}\,\omega_i,$$

where (e_t)_{t∈Z} are i.i.d. non-negative integer-valued r.v., I ∈ N, s_i ∈ N with s_i ≠ s_j for i ≠ j, ω_i ∈ N, and δ_{k,m} = 1 if k = m, δ_{k,m} = 0 if k ≠ m.

[Figure: INAR(1) and INAR(1) with IO; 257 observations, α = 0.3, λ = 1, s = 100, ω = 5σ = 6]
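By contrast with an AO, an IO enters through the innovation and is propagated by the thinning operator. Since E[α ∘ Y] = α E[Y], the expected effect on Y_{s+k} of an IO of size ω at time s is α^k ω. The sketch below (hypothetical function `simulate_inar1_with_io`, NumPy, Poisson arrivals, 0-based positions) simulates the contaminated recursion; averaging many paths should show this geometric decay.

```python
import numpy as np

def simulate_inar1_with_io(n, alpha, lam, times, sizes, rng=None):
    """Simulate Y_t = alpha o Y_{t-1} + eps_t with
    eps_t = e_t + sum_i delta_{t, s_i} * omega_i and e_t ~ Po(lam).

    Assumes 0 <= alpha < 1 and distinct 0-based positions in `times`.
    """
    rng = np.random.default_rng(rng)
    extra = np.zeros(n, dtype=np.int64)
    extra[np.asarray(times, dtype=np.int64)] = np.asarray(sizes, dtype=np.int64)
    y = np.empty(n, dtype=np.int64)
    y[0] = rng.poisson(lam / (1.0 - alpha)) + extra[0]
    for t in range(1, n):
        # The contaminated innovation at t is thinned again at t + 1,
        # so part of the outlier survives into later observations.
        y[t] = rng.binomial(y[t - 1], alpha) + rng.poisson(lam) + extra[t]
    return y

rng = np.random.default_rng(0)
paths = np.array([simulate_inar1_with_io(10, 0.3, 1.0, [5], [6], rng)
                  for _ in range(2000)])
# Means at t = 5, 6, 7 should be near 1/0.7 + 6, 1/0.7 + 1.8, 1/0.7 + 0.54.
print(paths.mean(axis=0)[5:8])
```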


Outliers in INAR(1) models

Outliers in integer-valued models in the literature


Fokianos and Fried (2010): detection and estimation of intervention effects in
INGARCH(1, 1) models.

Barczy et al. (2010): conditional least squares (CLS) estimation of the parameters of INAR(1) models
contaminated with outliers, assuming that the time points of the outliers are
known but their sizes are unknown.

Our objective
We propose a wavelet-based method to identify the time points at which outliers
(AO and IO) occur in Poisson INAR(1) processes.


Wavelet analysis

Wavelets are a family of basis functions that are used to express and approximate other
functions.

They combine properties such as orthonormality, different degrees of smoothness,
localization in time and scale, compact support and fast implementation.

Wavelet analysis consists of decomposing a signal into shifted and scaled versions of the
mother wavelet.

Wavelet analysis is able to look at the data at different scales or resolutions, revealing
aspects of the data such as changes in variance, level changes and other discontinuities.

Thus, wavelets are useful for outlier detection [Bilen and Huzurbazar, 2002; Grané and Veiga, 2010].


Wavelet analysis
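Consider a real-valued function ψ(·) defined on R satisfying

$$\int_{-\infty}^{\infty} \psi(u)\,du = 0, \qquad \int_{-\infty}^{\infty} \psi^2(u)\,du = 1, \qquad 0 < \int_{0}^{\infty} \frac{|\Psi(f)|^2}{f}\,df < \infty,$$

where Ψ(f) is the Fourier transform of ψ(·).
Wavelets consist of dilations and translations of ψ(·), called the mother wavelet:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right),$$

with b ∈ R the translation parameter and a ∈ R^+ the dilation parameter.

Continuous Wavelet Transform (CWT)

Let x(t) be a real-valued function. Then the CWT of x(t) is given by

$$W(a,b) = \int_{-\infty}^{\infty} \psi_{a,b}(t)\,x(t)\,dt = \int_{-\infty}^{\infty} \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) x(t)\,dt.$$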
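As a numerical illustration of the CWT definition, the sketch below approximates W(a, b) by a Riemann sum on a uniform grid, using the Haar wavelet (introduced later in these slides) as ψ, with the sign convention −1/√2 on (−1, 0] and 1/√2 on (0, 1]; the opposite sign only flips the sign of W. For a unit jump x(t) = 1{t > 0}, the substitution u = (t − b)/a gives W(a, b) = √(a/2)(1 − |b|/a) for |b| < a and 0 otherwise, so the transform peaks at the jump. The grid, function names and test signal are our own choices.

```python
import numpy as np

def haar_psi(u):
    """Haar mother wavelet: -1/sqrt(2) on (-1, 0], 1/sqrt(2) on (0, 1], 0 otherwise."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    out[(u > -1) & (u <= 0)] = -1.0 / np.sqrt(2.0)
    out[(u > 0) & (u <= 1)] = 1.0 / np.sqrt(2.0)
    return out

def cwt(x, t, a, b, psi=haar_psi):
    """Riemann-sum approximation of W(a, b) = int a^(-1/2) psi((t - b)/a) x(t) dt.

    `x` holds samples of the signal on the uniform grid `t`.
    """
    dt = t[1] - t[0]
    return np.sum(psi((t - b) / a) * x) * dt / np.sqrt(a)

t = np.linspace(-10.0, 10.0, 200_001)   # grid spacing 1e-4
step = (t > 0).astype(float)            # unit jump at t = 0
for b in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(b, cwt(step, t, a=2.0, b=b))
```

The same grid can be used to check the first two defining conditions numerically: the sum of ψ(t)·dt is close to 0 and the sum of ψ²(t)·dt is close to 1.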

Wavelet analysis
Discrete Wavelet Transform (DWT): Multiresolution Analysis
DWT: parameters take only discrete values.
If a = 2^m and b = n 2^m, with m, n ∈ Z, then there exist ψ such that

$$\psi_{m,n}(t) = \frac{1}{\sqrt{2^m}}\,\psi\!\left(\frac{t}{2^m} - n\right)$$

constitute an orthonormal basis for L^2(R).

In practice
Pyramid algorithm or two-channel subband coder [Mallat, 1989]: filters with different
cutoff frequencies are used to analyze the signal at different scales:

the signal is passed through a series of high-pass filters to analyze the high frequencies,
giving the detail coefficients (low-scale, high-frequency components);

the signal is passed through a series of low-pass filters to analyze the low frequencies,
giving the approximation coefficients (high-scale, low-frequency components).
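A sketch of the pyramid algorithm with the Haar filters, assuming NumPy and a series whose length is divisible by 2^J for J levels; the function names are hypothetical. At each stage the current approximation is split into a scaled moving difference (detail, high-pass) and a scaled moving average (approximation, low-pass), and only the approximation is passed on to the next stage.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_step(v):
    """One stage of the pyramid algorithm with Haar filters."""
    v = np.asarray(v, dtype=float)
    if v.size % 2:
        raise ValueError("length must be even at every stage")
    first, second = v[0::2], v[1::2]   # consecutive pairs (v_{2i-1}, v_{2i})
    detail = (second - first) / SQRT2  # high-pass: scaled moving difference
    approx = (second + first) / SQRT2  # low-pass: scaled moving average
    return detail, approx

def haar_inverse_step(detail, approx):
    """Undo one stage: rebuild the pairs from (detail, approx)."""
    out = np.empty(2 * len(detail))
    out[0::2] = (approx - detail) / SQRT2
    out[1::2] = (approx + detail) / SQRT2
    return out

def pyramid(x, levels):
    """Return the detail vectors [D1, ..., D_levels] and the last approximation."""
    details, approx = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        d, approx = haar_step(approx)
        details.append(d)
    return details, approx

details, approx = pyramid(np.arange(8.0), levels=3)
print([d.round(3) for d in details], approx.round(3))
```

Because the Haar filters define an orthonormal transform, the total energy of all detail and approximation coefficients equals that of the input, and `haar_inverse_step` recovers each stage exactly.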


Wavelet analysis

Which wavelets for outlier detection?


Haar wavelet

$$\psi(t) = \begin{cases} -1/\sqrt{2}, & -1 < t \le 0,\\ 1/\sqrt{2}, & 0 < t \le 1,\\ 0, & \text{otherwise.} \end{cases}$$

[Figure: plot of the Haar wavelet ψ(t) for −1 ≤ t ≤ 1]

The high- and low-pass filters for Haar correspond to moving differences and
moving averages.
Large wavelet (detail) coefficients are therefore expected in the presence of outliers.
In practice, when there is a single outlier, the first-level decomposition is enough.
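A small check of this claim on a hypothetical residual-like series, assuming NumPy: a constant level is cancelled by the moving difference, while a single spike changes exactly one first-level Haar detail coefficient, namely the one whose pair of observations contains the spike.

```python
import numpy as np

def haar_level1_detail(z):
    """First-level Haar detail coefficients d_i = (z_{2i} - z_{2i-1}) / sqrt(2)."""
    z = np.asarray(z, dtype=float)
    if z.size % 2:
        raise ValueError("series length must be even")
    return (z[1::2] - z[0::2]) / np.sqrt(2.0)

# Hypothetical series: constant level 2 plus one spike of size 5 at 0-based index 9.
z = np.full(16, 2.0)
z[9] += 5.0
d = haar_level1_detail(z)
S = int(np.argmax(np.abs(d)))  # 0-based coefficient index; it covers z[2S] and z[2S + 1]
print(S, np.round(d, 2))
```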


Algorithm for outlier time point detection


1. Apply the DWT to the series of standardized residuals, z_i, and obtain the first-level detail
coefficients, D1 = (d_1, d_2, ..., d_{N/2}).
2. Set the threshold k1 (discussed next).
3. Compute S = {S_1, ..., S_I}, the ordered set of indices of the coefficients in D1 whose
absolute values exceed the threshold, using one of two approaches:
HB (based on Bilen and Huzurbazar, 2002)
(i) Find S = {S_1, ..., S_I}, the set of indices such that |d_i| > k1.
GV (based on Grané and Veiga, 2010)
(i) Find d_max = max_{1≤j≤N/2} |d_j|; if |d_max| ≤ k1, stop.
(ii) Add the position S of d_max to the set of indices.
(iii) Set d_max = 0, i.e. let D1^1 = (d_1, d_2, ..., d_{S−1}, 0, d_{S+1}, ..., d_{N/2}).
(iv) Recompose the series of residuals by applying the inverse DWT with D1^1.
(v) Repeat step 1 and (i) to (iv) until |d_i| ≤ k1 for all i. Let S = {S_1, ..., S_I}.
4. For each S_i, the exact time point of the outlier in the series is t = 2S_i or
t = 2S_i − 1, whichever maximizes |X_t − X̄|, where X̄ is the sample mean.
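A compact sketch of steps 1 to 4 for first-level Haar coefficients, assuming NumPy, an even-length vector `z` of standardized residuals and a given threshold `k1`. All function names are hypothetical and indices are 0-based, so the slide's t = 2S_i − 1 or 2S_i becomes 2s or 2s + 1. The series passed to `outlier_times` is assumed to be aligned with the residuals (same length); how the lag between X_t and z_t is handled is not stated on the slide.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt1(z):
    """First-level Haar DWT: (detail, approximation) coefficients."""
    z = np.asarray(z, dtype=float)
    if z.size % 2:
        raise ValueError("series length must be even")
    return (z[1::2] - z[0::2]) / SQRT2, (z[1::2] + z[0::2]) / SQRT2

def haar_idwt1(d, a):
    """Inverse of haar_dwt1."""
    z = np.empty(2 * len(d))
    z[0::2] = (a - d) / SQRT2
    z[1::2] = (a + d) / SQRT2
    return z

def positions_hb(z, k1):
    """HB approach: all coefficient indices with |d_i| > k1, in increasing order."""
    d, _ = haar_dwt1(z)
    return np.flatnonzero(np.abs(d) > k1).tolist()

def positions_gv(z, k1):
    """GV approach: repeatedly zero the largest |d_j| above k1 and recompose."""
    z = np.asarray(z, dtype=float)
    found = []
    for _ in range(z.size // 2):        # each pass zeroes one new coefficient
        d, a = haar_dwt1(z)             # step 1
        j = int(np.argmax(np.abs(d)))   # (i) position of d_max
        if abs(d[j]) <= k1:             # (v) stop once no |d_i| exceeds k1
            break
        found.append(j)                 # (ii)
        d[j] = 0.0                      # (iii)
        z = haar_idwt1(d, a)            # (iv) recompose the residuals
    return sorted(found)

def outlier_times(S, x):
    """Step 4: in each flagged pair, pick the value farthest from the sample mean."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    return [max((2 * s, 2 * s + 1), key=lambda t: abs(x[t] - m)) for s in S]

# Hypothetical residuals with two outliers; here z also stands in for the aligned series.
z = np.zeros(16)
z[9], z[3] = 5.0, -6.0
print(positions_hb(z, 3.0), positions_gv(z, 3.0), outlier_times(positions_gv(z, 3.0), z))
```

Since the first-level Haar transform is orthonormal, the inverse DWT followed by a new DWT leaves the other coefficients unchanged (up to rounding), so in this simplified sketch GV flags essentially the same positions as HB.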

Algorithm for outlier time point detection


Choosing the threshold, k1
Universal threshold: derived for i.i.d. Gaussian data, and conservative.
Our proposal:
Obtain, through Monte Carlo, the distribution of the maximum absolute value of the
first-level detail coefficients of the standardized residuals of a Poisson INAR(1) process.
Choose k1 as an upper percentile of that distribution (k1^0.05 below is the 95th percentile).

What we did:
Simulated 20000 replications of the Poisson INAR(1) process for various α, λ and N.
For each replication, fitted the model and computed the standardized residuals, z_i.
Obtained the detail coefficients of z_i and kept the maximum in absolute value.

N          64     128    256    512    1024   2048
k1^0.05    3.02   3.28   3.51   3.73   3.93   4.09
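A vectorized sketch of the Monte Carlo threshold described above, assuming NumPy. Several ingredients are our assumptions, not taken from the slides: N is read as the number of residuals (so N + 1 observations are simulated), the model is fitted by conditional least squares (OLS of X_t on X_{t−1}), and the standardized residuals are the Pearson-type z_t = (X_t − α̂X_{t−1} − λ̂)/√(α̂(1 − α̂)X_{t−1} + λ̂), built from the Poisson INAR(1) conditional mean and variance. The default `reps` is much smaller than the 20000 replications behind the table.

```python
import numpy as np

def mc_threshold(n_resid, alpha, lam, reps=2000, level=0.05, seed=0):
    """Monte Carlo (1 - level) quantile of max_i |d_i| over first-level Haar
    detail coefficients of standardized Poisson INAR(1) residuals."""
    rng = np.random.default_rng(seed)
    n_obs = n_resid + 1                            # assumption: N counts residuals
    x = np.empty((n_obs, reps), dtype=np.int64)    # one column per replication
    x[0] = rng.poisson(lam / (1.0 - alpha), size=reps)
    for t in range(1, n_obs):
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam, size=reps)
    x = x.astype(float)
    prev, curr = x[:-1], x[1:]
    # CLS fit = OLS of X_t on X_{t-1}, per replication (assumed estimator);
    # a constant replication would give a zero denominator (negligible here).
    pm, cm = prev.mean(axis=0), curr.mean(axis=0)
    a_hat = ((prev - pm) * (curr - cm)).sum(axis=0) / ((prev - pm) ** 2).sum(axis=0)
    a_hat = np.clip(a_hat, 1e-6, 1 - 1e-6)         # keep estimates admissible
    l_hat = np.maximum(cm - a_hat * pm, 1e-6)
    # Pearson-type residuals from the conditional mean and variance.
    z = (curr - (a_hat * prev + l_hat)) / np.sqrt(a_hat * (1 - a_hat) * prev + l_hat)
    m = z.shape[0] - z.shape[0] % 2                # use an even number of residuals
    d = (z[1:m:2] - z[0:m:2]) / np.sqrt(2.0)       # first-level Haar details
    return float(np.quantile(np.abs(d).max(axis=0), 1.0 - level))

print(mc_threshold(128, alpha=0.3, lam=1.0))
```

The printed value can be compared with the tabulated k1^0.05 = 3.28 for N = 128; given the assumptions above and Monte Carlo error, it need not match exactly.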


Algorithm for outlier time point detection


[Figure: INAR(1) with AO (top), standardized residuals of the INAR(1) with AO (middle), and absolute values of the first-level Haar detail coefficients (bottom)]


Algorithm for outlier time point detection


[Figure: INAR(1) with IO (top), standardized residuals of the INAR(1) with IO (middle), and absolute values of the first-level Haar detail coefficients (bottom)]


Simulation study

N = 64, 128, 512, 1024, 2048
α = 0.1, 0.3, 0.5, 0.7, 0.9
λ = 1, 3, 5, 7, 9
1 or 3 outliers (AO and IO) placed at random, with size ω = 3σ, 5σ or 10σ

Performance measures, over 1000 repetitions:
percentage of correct detections,
average number of false outliers detected.
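The slides do not state how the integer sizes ω are obtained. The values in the tables that follow are consistent with ω = ⌈kσ⌉, where σ = √(λ/(1 − α)) is the marginal standard deviation of the Poisson INAR(1) process (whose marginal is Po(λ/(1 − α))). A quick check of that reading, which is our inference rather than a rule from the slides:

```python
import math

def outlier_size(k, alpha, lam):
    """Integer outlier size ceil(k * sigma) with sigma = sqrt(lam / (1 - alpha)).

    This rule is inferred from the tabulated sizes, not stated by the authors.
    """
    sigma = math.sqrt(lam / (1.0 - alpha))  # marginal s.d. of a Poisson INAR(1)
    return math.ceil(k * sigma)

for alpha, lam in [(0.3, 1), (0.3, 5), (0.7, 1), (0.7, 5)]:
    print((alpha, lam), [outlier_size(k, alpha, lam) for k in (3, 5, 10)])
```

Note that rounding to the nearest integer instead of rounding up would give 8 rather than 9 for 3σ with (α, λ) = (.3, 5).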


Simulation study
Illustration of the simulation result for one outlier

                           AO                         IO
(α, λ)    N     ω          % correct   avg. false     % correct   avg. false
(.3, 1)   128   3σ = 4     13.9        0.081          16.5        0.091
          128   5σ = 6     86.6        0.044          53.9        0.049
          128   10σ = 12   99.6        0.002          99.5        0.012
          512   3σ = 4     25.7        0.187          11.4        0.206
          512   5σ = 6     84.6        0.133          46.0        0.179
          512   10σ = 12   99.8        0.070          99.9        0.077
(.3, 5)   128   3σ = 9     33.8        0.045          15.4        0.041
          128   5σ = 14    58.7        0.016          62.4        0.023
          128   10σ = 27   100         0              99.8        0
          512   3σ = 9     28.2        0.065          11.3        0.071
          512   5σ = 14    88.2        0.050          48.2        0.056
          512   10σ = 27   99.9        0.024          99.9        0.019


Simulation study
Illustration of the simulation result for one outlier

                           AO                         IO
(α, λ)    N     ω          % correct   avg. false     % correct   avg. false
(.7, 1)   128   3σ = 6     36.6        0.068          38.0        0.051
          128   5σ = 10    92.6        0.067          94.1        0.021
          128   10σ = 19   100         0              100         0.002
          512   3σ = 6     28.7        0.183          28.4        0.137
          512   5σ = 10    98.2        0.098          88.8        0.106
          512   10σ = 19   100         0.909          100         0.035
(.7, 5)   128   3σ = 13    85.0        0.120          32.9        0.025
          128   5σ = 21    98.1        0.019          84.8        0.004
          128   10σ = 41   100         0              100         0
          512   3σ = 13    80.2        0.192          27.3        0.057
          512   5σ = 21    85.8        0.281          92.2        0.039
          512   10σ = 41   100         0              100         0.006


Simulation study
Illustration of the simulation result for three outliers

                           AO                         IO
(α, λ)    N     ω          % correct   avg. false     % correct   avg. false
(.3, 1)   128   3σ = 4     21.2        0.034          10.6        0.047
          128   5σ = 6     37.8        0.006          31.1        0.007
          128   10σ = 12   93.8        0              87.4        0.000
          512   3σ = 4     22.3        0.116          10.6        0.131
          512   5σ = 6     51.4        0.077          37.5        0.108
          512   10σ = 12   99.4        0.004          98.9        0.013
(.3, 5)   128   3σ = 9     16.6        0.018          9.6         0.013
          128   5σ = 14    58.6        0              31.3        0.001
          128   10σ = 27   94.1        0              89.1        0
          512   3σ = 9     18.2        0.061          9.1         0.042
          512   5σ = 14    81.7        0.021          43.0        0.027
          512   10σ = 27   99.7        0.002          99.3        0.003


Simulation study
Illustration of the simulation result for three outliers

                           AO                         IO
(α, λ)    N     ω          % correct   avg. false     % correct   avg. false
(.7, 1)   128   3σ = 6     54.5        0.151          20.0        0.013
          128   5σ = 10    67.7        0.002          62.9        0.004
          128   10σ = 19   99.2        0              96.2        0
          512   3σ = 6     58.9        0.344          23.2        0.112
          512   5σ = 10    99.2        0.035          79.1        0.032
          512   10σ = 19   100         0.047          100         0.002
(.7, 5)   128   3σ = 13    31.7        0.090          17.8        0.006
          128   5σ = 21    80.3        0.015          53.9        0
          128   10σ = 41   99.1        0              97.3        0
          512   3σ = 13    37.6        0.178          20.3        0.033
          512   5σ = 21    82.6        0.180          80.3        0.014
          512   10σ = 41   100         0.084          100         0.001


Application
Dataset: number of different IP addresses (in periods of 2 min length) at the server
of the Department of Statistics of the University of Würzburg on November 29th,
2005, between 10 a.m. and 6 p.m. [Weiß, 2007]

N = 241, sample mean x̄ = 1.32, sample variance 1.39
Threshold: k1^0.05 = 3.52
Result from the application of the algorithm: t = 224
Barczy et al. (2010): estimated outlier size 6.79

[Figure: number of different IP addresses, November 29th, 2005, 10 a.m. to 6 p.m.; x-axis: time (periods of 2 minutes), 0 to 241]


Remarks and future work

A parametric procedure that detects the time points of outliers in Poisson
INAR(1) models.
The choice of the threshold:
why does the threshold depend on ?
what if the arrivals are not Poisson?
using corrected thresholds for integer-valued data based on the universal threshold
[Kolaczyk, 1999]?

Detecting patches of outliers: detail coefficients for lower levels


How to discriminate between the two types of outliers?
How can we prove that the proposed method converges to the solution of the problem?


References

Al-Osh, M.A. and Alzaid, A.A. (1987). First-order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis, Vol. 8, pp. 261-275.

Barczy, M., Ispány, M., Pap, G., Scotto, M. and Silva, M.E. (2010). Innovational Outliers in INAR(1) Models. Communications in Statistics - Theory and Methods, Vol. 39, pp. 3343-3362.

Bilen, C. and Huzurbazar, S. (2002). Wavelet-based detection of outliers in time series. Journal of Computational and Graphical Statistics, Vol. 11, pp. 311-327.

Daubechies, I. (1992). Ten Lectures on Wavelets. Philadelphia, PA: SIAM.

Du, J.-G. and Li, Y. (1991). The integer-valued autoregressive (INAR(p)) model. Journal of Time Series Analysis, Vol. 12, pp. 129-142.

Fokianos, K. and Fried, R. (2010). Interventions in INGARCH processes. Journal of Time Series Analysis, Vol. 31, pp. 210-225.

Fox, A.J. (1972). Outliers in Time Series. Journal of the Royal Statistical Society, Series B, Vol. 34, pp. 350-363.

Grané, A. and Veiga, H. (2010). Wavelet-based detection of outliers in volatility models. Computational Statistics and Data Analysis, Vol. 54, pp. 2580-2593.

Kolaczyk, E.D. (1999). Wavelet shrinkage estimation of certain Poisson intensity signals using corrected thresholds. Statistica Sinica, Vol. 9, pp. 119-135.

Mallat, S. (1989). Multiresolution approximations and wavelet orthonormal bases of L^2(R). Transactions of the American Mathematical Society, Vol. 315, pp. 69-87.

McKenzie, E. (1985). Some simple models for discrete variate time series. Water Resources Bulletin, Vol. 21, pp. 645-650.

McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability, Vol. 20, pp. 822-835.

Percival, D. and Walden, A. (2000). Wavelet Methods for Time Series Analysis. Cambridge Series in Statistical and Probabilistic Mathematics.

Weiß, C.H. (2007). Controlling correlated processes of Poisson counts. Quality and Reliability Engineering International, Vol. 23, pp. 741-754.
