
Renewable Energy 99 (2016) 1287–1298


On the use of robust regression methods in wind speed assessment


Takvor H. Soukissian a,*, Flora E. Karathanasi a,b

a Institute of Oceanography, Hellenic Centre for Marine Research, Anavyssos, Greece
b Department of Naval Architecture and Marine Engineering, National Technical University of Athens, Zografos, Athens, Greece

Article info

Article history:
Received 29 August 2015
Received in revised form 16 May 2016
Accepted 5 August 2016

Abstract

Wind climate analysis and modelling is of utmost importance during site selection for offshore wind farm
development. In this regard, reliable long-term wind data are required. Buoy measurements are
considered as a reference source in relevant applications including evaluation and calibration of wind
data obtained from less reliable sources, combined assessment, blending and homogenization of
multi-source wind data, etc. Most of these applications are based on regression techniques elaborated by using
the principle of ordinary least squares (OLS). However, wind data usually contain several outliers, which
may call into question the validity of the regression analysis, if not properly considered. This study is focused on
the implementation of the most important robust regression methods, which can identify and reveal
outliers, and retain at the same time their efficiency. Long-term reference wind data series obtained from
buoys at six locations in the Mediterranean Sea are used to calibrate hindcast (model) wind data by
applying robust methods and OLS. The obtained results are compared according to several statistical
measures. The effects of the calibration methods are also assessed with respect to the available wind
power potential. The results clearly suggest that least trimmed squares and L1-estimator perform in all
respects better than OLS.
© 2016 Elsevier Ltd. All rights reserved.

Keywords:
Offshore wind energy
Robust methods
Calibration
Outliers
Mediterranean sea

1. Introduction
1.1. General
The accurate modelling and description of the wind climate in
coastal or offshore areas is highly required either for research
purposes (e.g. modelling of atmospheric circulation patterns) or for
coastal and offshore applications, such as design and construction
of marine structures, coastal environmental management and
protection, site selection and feasibility analysis for offshore wind
farm development, estimation of the available offshore wind
resource, etc. Regarding offshore wind energy aspects, the most
essential requirement for the annual and seasonal climate modelling and the estimation of the inter-annual variability is the availability of long-term time series of wind characteristics. In this
connection, long-term wind data are often acquired by appropriately combining wind data from multiple sources; see e.g. Ref. [43].
Nowadays, long-term offshore wind data are available not only
through oceanographic buoy measurements, satellite sensors and
numerical weather prediction (NWP) modelling but also through
accurate remote sensing measurement techniques such as SoDAR
and LiDAR; see e.g. Refs. [2,4,39]. Although the amount of wind data
sounds overwhelming and satisfactory, the corresponding temporal
extent and spatial coverage are not always and everywhere sufficient
for the accurate and reliable assessment of the offshore wind
regime and resource. Therefore, in practical applications, there is a
frequent need to utilize combined wind data originating from
different sources. Since each wind data source is characterized by
its shortcomings and limitations (including also inherent
uncertainties), it is suggested to implement first validation and
calibration procedures based on the most reliable (and available) wind
data source and then proceed with the assessment.

* Corresponding author.
E-mail address: tsouki@hcmr.gr (T.H. Soukissian).
http://dx.doi.org/10.1016/j.renene.2016.08.009
0960-1481/© 2016 Elsevier Ltd. All rights reserved.

The statistical tool for predicting and/or calibrating dependent
variables through linear relationships with associated independent
variables is, in principle, regression analysis. The statistical
efficiency, completeness and ease of computation have rendered
ordinary least squares (OLS) and its variants the most common
technique for process modelling; see Ref. [43]. However, in practice,
a linear regression model describes only a portion of the
observations; the occurrence of observations that exhibit inconsistency
with the bulk of the data is a very common and delicate issue in real
data analysis encountered in a variety of different applications,
such as clinical trials, data cleansing, network intrusion, severe
weather prediction, geographic information systems, etc. As
noticed by Barnett and Lewis [3], the inconsistency is very crucial,
since it is a matter of subjective judgment on the part of the
observer whether or not he picks out some observations for
scrutiny. In the relevant literature, such atypical data are named
outliers and may have moderate to severe impact on the regression
model parameter estimates. As regards wind data, outliers are
often not taken into consideration despite their frequent occurrence,
since they can be either unnoticed or hidden in the usual OLS
residual plots.¹ As a consequence, the resulting regression model
could be highly misleading.

1.2. Outliers
Taking into consideration the position of an observation in a
scatter plot and the variable to which it corresponds, the
conspicuous observations can be grouped in three classes; see also [37].
The outliers in the explanatory variables (x-outliers or leverage
points) can tilt the least-squares line due to their large effect on the
corresponding estimator. Leverage points do not necessarily have
to be outliers. The outliers in the response variable (y-outliers) can
be considered as such, because they have large standardized
residuals and rather large influence on the least-squares line, since
they increase the magnitude of residuals. Finally, an outlier both in
the response and explanatory variables may be either a point with a
large standardized residual or a point that deviates from the linear
relationship set by the majority of the data or both. In this case, the
relationship between the two variables must be taken into account
for the detection of the outlier in question. Another significant class
of atypical observations are the so-called influential points, which
individually or jointly excessively influence the regression equation
and are characterized by very high or very low values of x; see
Ref. [6].
Since outliers may seriously affect regression analysis outputs
and estimation of the relevant parameters, numerous procedures
have been developed for the detection and investigation of a single
or few outliers in linear regression. A well-known and
straightforward statistical approach to deal with outliers is through diagnostic
methods, i.e. statistics generally based on classical estimates aiming
at the detection of outliers from the assumed model. This family of
techniques is often implemented by the same procedure: first
delete each observation one at a time and, in turn, examine if there
is any impact on the various calculated values. Furthermore, scatter
plots may sometimes contain as much information as diagnostics
about residuals and hat elements.² However, these techniques
perform poorly in the presence of multiple outliers because of
masking and swamping effects.³
In contrast with the role of diagnostic measures, there is another
approach that takes into account deviations from idealized
assumptions and attempts to temper the influence of outliers and
influential observations on the estimator. This approach is known
as robust regression and aims to produce estimators that are
insensitive in the presence of these points; see
Refs. [11,12,15,16,18,34,37,44,46,47]. Compared with diagnostic
methods, Huber [18] mentions that robust methods are more
reliable and perform better, since they provide a middle ground
between rejecting and accepting a suspicious observation without
the analyst's subjective decision (to keep or remove a suspicious
outlier from the data sample).

¹ A more general case that can be found in any wind data source is the one of
exceptional phenomena, where there are legitimately outlying data points. In this
situation, it is constructive to determine effectively the extreme values from a set of
outliers through an extreme value analysis, forming in this way the cornerstone of
all outlier detection algorithms as a final step; see Ref. [1].
² Hat elements are the diagonal elements of the least-squares projection matrix
or hat matrix.
³ Masking is the inefficiency of the procedure to identify a set of outliers because
of the presence of another set, usually neighboring, while swamping is the
misidentification of clean observations as outliers because of the presence of another
group of observations, usually distant.

Particularly, in wind energy applications, robust estimation
methods have been implemented in wind data filtering for the
optimization (operation and maintenance) of a wind farm with
promising results; see Refs. [26,38]. Moreover, a robust approach
was applied in order to estimate the two parameters of a Weibull
distribution on wind speed data with and without outliers; see
Ref. [21]. This analysis showed that the robust estimator performs
better compared to maximum likelihood estimator in both cases. In
Soukissian et al. [41], robust methods were applied and compared
with the traditional OLS approach through a rational evaluation
methodology for four different scenarios in order to examine the
effects of outliers in wind speed assessment.
In this work, robust regression methods are applied in detail for
the assessment of concurrent wind speed data obtained from buoys
and a high resolution NWP model at four offshore locations of the
Aegean Sea (eastern Mediterranean Sea). In order to assess the
proposed methods in diverse wind climates, the same methods
have been also applied at two locations of the Spanish waters
(western Mediterranean Sea). Specifically, the main objective is to
evaluate, validate and correct wind data from a less accurate source,
namely NWP model data, by taking as reference source measured
data from oceanographic buoys located in the eastern and western
Mediterranean Sea. Apart from the classical OLS technique, several
robust regression methods are examined in order to evaluate the
performance of each method. As an additional strict test, the effects
that the different calibration relations have on the estimation of the
mean offshore wind power density compared to the values provided directly by the buoy measurements are also assessed.
This work is structured as follows: In the next section, the
theoretical background governing the most known estimators of
robust regression techniques is presented in detail. In Section 3, an
overview of the wind data sources used herewith is made along
with a primary statistical analysis of the concurrent wind data sets.
In Section 4, the evaluation methodology is described analytically,
and in Section 5, the correction relations of the different methods
for the six examined locations in the Mediterranean Sea are
compared in terms of some statistical measures. Furthermore, the
corresponding results are analytically discussed for each step of the
evaluation methodology. Finally, the concluding remarks are presented in Section 6.
2. Robust regression
2.1. Robust vs linear regression
Simple linear regression is based on the principle of the OLS
method, i.e. the minimization of the sum of squared residuals. The
OLS estimators have some important properties: they are linear,
unbiased, efficient (they have the minimum variance) and
consistent. For a detailed review of OLS and simple linear regression
methods see Refs. [30,31]. However, all these properties hold inter
alia under the assumptions that the random error terms are
statistically independent and normally distributed with zero mean. In
practice, these conditions are rarely satisfied or even examined and
normality is usually considered just as a convenient approximation.
Despite the elegant properties of the OLS method, it should be borne
in mind that it is clearly not robust to violations of its assumptions
and especially to deviations from the normality assumption.
Moreover, OLS estimates are very sensitive to outliers, even in large
samples, leading to inefficient and biased results. For instance,
Rousseeuw and Leroy [34] presented many real data samples in
which the identification of large outliers failed when using OLS
residual techniques. In Zaman et al. [51], it is noted that even a small
percentage of bad or deviant observations in a very large sample can
drastically change the OLS coefficients and result in systematic
distortions of OLS estimates.
Robust methods have been introduced to provide relatively
insensitive, consistent and highly efficient estimators, when there are
slight violations of the standard assumptions in the assumed
statistical model, and for the rational consideration of outliers in
regression analysis. In this regard, the use of robust methods is
essential in wind energy applications, since outliers are present in
the available data samples (see subsections 3.1 and 3.2) while some
of the main assumptions of OLS are suspicious or unrealistic, e.g.
the homoscedasticity assumption.
A reasonable question that may be asked at this point is why
robust regression techniques are not widely used. There are a
number of potential answers to this question. As is noted in Zaman
et al. [51], the main reasons are the following:

1) There is a rather naive trust that large sample sizes make robust
techniques unnecessary.
2) There is a certainty that outliers can be detected either by visual
inspection or by identifying unusual OLS residuals.
3) There is a lack of expertise as regards the interpretation of results
from a robust analysis and a lack of knowledge of the gains
available from such an analysis.

Before analysing the robust estimators, it is important to refer to
the theoretical basis of modern robust statistics. Hence, in the
following subsections, the most widely known measures used to
determine the degree of robustness of any statistical procedure
with respect to model misspecification are defined.

2.2.3. Asymptotic efficiency
In robust regression, the relative efficiency of an estimator is
expressed as the ratio of the smallest possible variance obtained
using a robust regression technique divided by the one obtained
from OLS. The OLS estimators are considered to be the most
efficient estimators, if all conditions are fulfilled, since this method
possesses the minimum variance. Obviously, the ratio of the ideal
estimator is equal to or close to unity. In the relevant literature, the
emphasis is on the asymptotic efficiency. In general, the precision of
an asymptotically efficient estimator tends to the theoretical limit,
as the sample size increases. For an unbiased estimator, asymptotic
efficiency is the limit of its efficiency as the sample size $n \to \infty$ and
depends on the population (distribution).

2.2.4. Equivariance
Attaining the equivariance properties assures that the results of
the regression analysis will not alter in case of particular
transformations of the data. There are two main types of equivariance⁴
for regression estimators. According to Rousseeuw and Leroy [34],
the specific properties are as follows:
1. An estimator T is called regression equivariant, if

\[ T(x_i,\, y_i + v x_i) = T(x_i,\, y_i) + v, \qquad i = 1, 2, \ldots, n, \]

where v is any vector. This condition allows the selection of any
arbitrary values for the vector without any consequences in the
validation of the results.
2. An estimator T is called scale equivariant, if

\[ T(x_i,\, c y_i) = c\, T(x_i,\, y_i), \qquad i = 1, 2, \ldots, n, \]

for any constant c. The above condition practically suggests that the
measurement units of the dependent variable y (with respect to the
measurement units of x) are irrelevant for the regression analysis.
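The two equivariance properties can be checked numerically for a given estimator. The following is a minimal sketch for OLS, which is known to satisfy both; the synthetic data, the seed, the helper name `ols_fit` and the values of `v` and `c` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def ols_fit(x, y):
    """Return the OLS estimates (intercept, slope) of y = b0 + b1*x."""
    X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)                     # arbitrary seed (assumption)
x = rng.uniform(0.0, 20.0, size=50)                # synthetic "model" wind speeds
y = 1.0 + 0.9 * x + rng.normal(0.0, 1.0, size=50)  # synthetic "buoy" wind speeds

base = ols_fit(x, y)

# Regression equivariance: adding X @ v to y shifts the estimate by v.
v = np.array([2.0, -0.5])
X = np.column_stack([np.ones_like(x), x])
shifted = ols_fit(x, y + X @ v)

# Scale equivariance: multiplying y by c multiplies the estimate by c.
c = 3.6                                            # e.g. a change of units of y
scaled = ols_fit(x, c * y)

print(np.allclose(shifted, base + v), np.allclose(scaled, c * base))
```

The same two checks can be applied to any robust estimator by swapping `ols_fit` for the corresponding fitting routine.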
2.2. Measures of robustness
Each measure of robustness describes different characteristics of
the procedure; therefore, the measures are complementary. In the next
subsections, a matrix notation is adopted, denoted by boldface
letters.

2.2.1. Breakdown point
The breakdown point (called hereafter BDP) of an estimate $\hat{\beta}$ of
the regression slope parameter $\beta$ is the maximal amount of
contamination (proportion of atypical points) an estimator can
withstand before its bias breaks down (i.e. becomes too large). For
more detailed descriptions of the BDP see Refs. [8,13,14]. Practically,
the highest value of the BDP of an estimator one can hope for is 50%,
because it is not possible to discriminate good observations from
outliers for higher values. A rather surprising result is that for OLS,
even one bad observation may have significant influence on the
results.

2.2.2. Influence function
Whereas the BDP is a robustness measure in a global sense, the
importance of the influence function (called hereafter IF) is also
emphasized. IF measures the effect of infinitesimal perturbations
on the estimator [34]. Roughly speaking, IF reflects the limiting
influence of adding one more observation, x, to a very large sample;
see Ref. [49]. The analytical mathematical description of the IF is
presented in subsection 2.1.2 of [49].

⁴ There is an additional type of equivariance, called affine equivariance; see
Ref. [34], p. 116.

2.3. Robust estimators
We proceed to the description of the robust estimators that are
used in the context of this analysis, starting with the less robust but
simply computed estimators, namely the L-estimators, and
concluding with the highly robust and well-known MM-estimators.
For this, let us denote by $\hat{\varepsilon}_i$, i.e. $\hat{\varepsilon}_1, \hat{\varepsilon}_2, \ldots, \hat{\varepsilon}_n$, the
estimated residuals of the regression line.

2.3.1. L-estimators
Any estimator that is a linear combination of the order statistics,
$L_n = \sum_{i=1}^{n} c_{ni} X_{n:i}$, where $c_{ni}$ are real constants and
$X_{n:1} \le \cdots \le X_{n:n}$ denote the ordered values of the sample observations
$X_1, \ldots, X_n$, $i = 1, 2, \ldots, n$, is called an L-estimator. In this regard, an
alternative approach for the estimation of the regression parameters
is to minimize the sum of the absolute values of the residuals,
i.e.

\[ \min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} \left| \hat{\varepsilon}_i \right|, \qquad i = 1, 2, \ldots, n, \tag{1} \]

instead of the root mean square error of the OLS method.
The above form is also referred to as the L1-estimator, while OLS
is sometimes called L2-estimator. It is proved that the L1-estimator
can deal with y-outliers of a sample, but remains weak against
x-outliers, which have greater influence on the fitting. Because of
that effect, the BDP of this estimator will tend to 0% (BDP = 1/n). A
more generalized method of the L1-estimator was proposed by
Koenker and Bassett [22] and is called τ-regression quantile
(0 < τ < 1).
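The sensitivity of OLS to a single bad observation and the resistance of the L1-estimator of Eq. (1) to a y-outlier can be illustrated with a small sketch. The data below are synthetic, and the helper `l1_fit` is a hypothetical brute-force routine (not the algorithm used in the paper): it relies on the known property that, in simple regression, at least one minimizer of the sum of absolute residuals is a line through two observations.

```python
from itertools import combinations

import numpy as np

def l1_fit(x, y):
    """L1 (least absolute deviations) fit of y = b0 + b1*x.

    Brute force over all lines through two observations; cost is O(n^3),
    so this is only meant for small illustrative samples.
    """
    best_cost, best_b0, best_b1 = np.inf, 0.0, 0.0
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # no finite slope through these two points
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        cost = np.abs(y - b0 - b1 * x).sum()
        if cost < best_cost:
            best_cost, best_b0, best_b1 = cost, b0, b1
    return best_b0, best_b1

# Synthetic data (assumption): 20 points exactly on y = 2 + 0.5 x ...
x = np.arange(1.0, 21.0)
y = 2.0 + 0.5 * x
y[9] = 50.0          # ... with a single gross y-outlier (5% contamination)

ols_b1, ols_b0 = np.polyfit(x, y, 1)   # OLS; polyfit returns the slope first
l1_b0, l1_b1 = l1_fit(x, y)

print(f"OLS: b0={ols_b0:.3f}, b1={ols_b1:.3f}")
print(f"L1 : b0={l1_b0:.3f}, b1={l1_b1:.3f}")
```

In this example the L1 line coincides with the underlying relation, whereas the OLS intercept is pulled towards the outlier. Moving the corrupted point to an extreme x value (a leverage point) would be expected to degrade the L1 fit as well, in line with the remark on x-outliers above.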
Another known L-estimator that has higher BDP is the least
trimmed squares (LTS) regression estimator developed by
Rousseeuw [33]. This estimator is based on the function

\[ \min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{h} \left( \hat{\varepsilon}^2 \right)_{i:n}, \qquad i = 1, 2, \ldots, n, \tag{2} \]

where $(\hat{\varepsilon}^2)_{1:n} \le \cdots \le (\hat{\varepsilon}^2)_{n:n}$ are the ordered squared residuals and h
is the number of the remaining observations after the trimming. In
general, h may depend on some trimming proportion α, for instance
$h = [n(1 - \alpha)] + 1$; see Ref. [33]. When h is approximately n/2, then
LTS achieves the best robustness properties. Moreover, LTS satisfies
the properties of regression and scale equivariance. Although LTS
has a relatively low asymptotic efficiency, as stated by Croux et al.
[7], it still plays a role in the estimation of the parameters of other
more robust methods.
Another known L-estimator is the least median of squares (LMS),
which minimizes the median of the squared residuals. Some authors
refer to the τ-regression quantile, LMS and LTS as
resistant regression methods. In fact, resistant methods use
estimates that are not influenced by unusual points, such as the
median, instead of diminishing their influence.

2.3.2. M-estimators
In Huber [17], another approach to robust regression
was introduced, the M-estimators, which are a trade-off between
the efficiency of OLS and the resistance of L1-estimators. This class
of estimators can also be regarded as a generalization of maximum
likelihood estimators. An M-estimator is defined by minimizing the
following function of the residuals:

\[ \sum_{i=1}^{n} \rho\left( \hat{\varepsilon}_i \right), \qquad i = 1, 2, \ldots, n, \tag{3} \]

where ρ(·) is a continuous, non-negative, symmetric function with
a unique minimum at zero and is called objective function. The
M-estimator is simply a general robust case that results in the OLS
estimator by appropriately defining function ρ(·). The particular
objective function that was implemented in this work is the Huber
estimator, expressed by the following relation:

\[ \rho\left( \hat{\varepsilon}_i \right) =
\begin{cases}
\hat{\varepsilon}_i^2 / 2, & |\hat{\varepsilon}_i| \le k, \\
k |\hat{\varepsilon}_i| - k^2 / 2, & |\hat{\varepsilon}_i| > k,
\end{cases} \tag{4} \]

where k is called tuning constant and is generally picked to provide
high efficiency at the normal distribution. For the Huber estimator
and 95% efficiency, k = 1.345. Other known objective functions for
M-estimators are: a) the OLS estimator, b) the Andrews estimator, c)
the Welsch estimator, and d) the bi-square or Tukey estimator.
Differentiating (3) with respect to the parameters β and setting
to zero, the minimum value can be found by:

\[ \sum_{i=1}^{n} \psi\left( \hat{\varepsilon}_i \right) x_i = 0, \qquad i = 1, 2, \ldots, n, \tag{5} \]

where ψ(·) is the derivative of ρ(·) and is called score function.
Solutions of equation (5) are called classical M-estimators or Huber
estimators [44]. Because they are not scale equivariant, the
residuals should be standardized by means of some estimate of scale
σ, i.e.:

\[ \sum_{i=1}^{n} \psi\left( \hat{\varepsilon}_i / \sigma \right) x_i = 0, \qquad i = 1, 2, \ldots, n, \tag{6} \]

where σ is a scale factor which should be estimated simultaneously.
A popular estimator for σ is the (rescaled) median absolute
deviation (MAD), which is defined as:

\[ \sigma = \frac{\mathrm{MAD}}{0.6745}, \tag{7} \]

where the MAD (of the residuals) is
$\mathrm{MAD} = \mathrm{median}\{ |\hat{\varepsilon}_i - \mathrm{median}\{\hat{\varepsilon}_i\}| \}$, $i = 1, 2, \ldots, n$. Since this
estimator is based on the median, it is highly resistant to outlying
observations, with BDP = 50%. Generally, M-estimators are
statistically better than OLS with regard to resistance and robustness to
y-outliers. In some cases, their performance is poor compared to
OLS, since they do not consider leverage points.
In order to facilitate the procedure, ψ is replaced by appropriate
weights that decrease as the size of the residual increases. To this
end, the weight function is $w_i = w(\hat{\varepsilon}_i) = \psi(\hat{\varepsilon}_i) / \hat{\varepsilon}_i$, $i = 1, 2, \ldots, n$, and
Equation (6) may then be written as:

\[ \sum_{i=1}^{n} w_i \left( \hat{\varepsilon}_i / \sigma \right) x_i = 0, \qquad i = 1, 2, \ldots, n. \tag{8} \]

In practice, the M-estimates cannot be computed directly from
the data, because the weights depend upon the residuals, which in
turn, depend upon the estimates. As a result, they are computed
using an iteratively reweighted least squares (IRLS) algorithm.
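The IRLS idea can be sketched as follows for the Huber objective of Eq. (4), with the scale re-estimated at each iteration by the rescaled MAD of Eq. (7). This is only an illustrative sketch under stated assumptions: the OLS starting value, the stopping rule, the function name `huber_irls` and the synthetic data are choices made here, not details reported in the paper.

```python
import numpy as np

def huber_irls(x, y, k=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of (b0, b1) for y = b0 + b1*x by IRLS."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS start (an assumption)
    for _ in range(max_iter):
        resid = y - X @ beta
        mad = np.median(np.abs(resid - np.median(resid)))
        sigma = mad / 0.6745                        # rescaled MAD, Eq. (7)
        if sigma == 0.0:
            break                                   # degenerate scale; stop
        u = np.abs(resid) / sigma                   # standardized residuals
        w = k / np.maximum(u, k)                    # psi(u)/u: 1 if u <= k, else k/u
        sw = np.sqrt(w)
        # Weighted least squares step, i.e. the solution of Eq. (8) for fixed weights.
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Synthetic data (assumption): a clean linear relation plus four gross y-outliers.
i = np.arange(41)
x = 0.5 * i
y = 1.0 + 0.9 * x + 0.5 * np.sin(i)
y[[5, 15, 25, 35]] += 20.0

X = np.column_stack([np.ones_like(x), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_huber = huber_irls(x, y)
print("OLS  :", beta_ols)
print("Huber:", beta_huber)
```

Because the outliers here lie in the response only, the Huber intercept is expected to stay much closer to the underlying value of 1.0 than the OLS intercept. The sketch does nothing special for leverage points, consistent with the limitation noted above.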

2.3.3. MM-estimators
Before proceeding to the MM-estimators, the S-estimators are
described in brief, since they provide an initial estimate in more
complex robust regression methods.
The S-estimators were introduced by Rousseeuw and Yohai [36]
and were developed to improve the efficiency of both LMS and LTS,
based on the estimates of scale. Specifically, their objective
functions are replaced by a more efficient scale estimator that is applied
to the residuals $\hat{\varepsilon}_i$ in order to minimize their dispersion, and their
mathematical expression is

\[ \min\, \sigma\left( \hat{\varepsilon}_1(\beta), \ldots, \hat{\varepsilon}_n(\beta) \right). \tag{9} \]

The dispersion $\sigma\left( \hat{\varepsilon}_1(\beta), \ldots, \hat{\varepsilon}_n(\beta) \right)$ should satisfy the following
constraint:

\[ \frac{1}{n} \sum_{i=1}^{n} \rho\left( \frac{\hat{\varepsilon}_i}{\sigma} \right) = K, \qquad i = 1, 2, \ldots, n, \tag{10} \]

where K is a constant often taken equal to $E_{\Phi}[\rho]$ and Φ(·) represents
the standard normal distribution. S-estimators are regression and
scale equivariant, and can achieve high BDP with the appropriate
selection of the tuning constants. However, they cannot
simultaneously attain high asymptotic efficiency (approximately 30%).
The class of MM-estimators was first introduced by Yohai [50] in
the linear regression setting. Their purpose is twofold: i) high BDP,
and ii) high efficiency when the errors are normally distributed.
According to Yohai [50], the estimate is defined in three steps:
1. An initial regression estimate β* with high BDP (preferably 0.5)
is computed, such as LMS, LTS or S-estimators.

2. After computing the residuals $\hat{\varepsilon}_i(\beta^*)$ from step 1, an M-estimate
of scale is computed, namely $\sigma_n$. In addition, the objective
function ρ should be strictly increasing on [0, c], constant on
[c, ∞) and K/ρ(c) = 0.5.
3. The MM-estimate, $\hat{\beta}$, is defined as any solution of

\[ \sum_{i=1}^{n} \psi\left[ \hat{\varepsilon}_i(\beta) / \sigma_n \right] x_i = 0, \tag{11} \]

which satisfies

\[ \sum_{i=1}^{n} \rho\left( \hat{\varepsilon}_i(\beta) / \sigma_n \right) \le \sum_{i=1}^{n} \rho\left( \hat{\varepsilon}_i(\beta^*) / \sigma_n \right), \qquad i = 1, 2, \ldots, n. \tag{12} \]

It is obvious that at the final stage an M-estimation is carried out
with only one extra condition. Thus, the IRLS can be applied to
compute the potential solution of (11) by keeping the scale
estimate $\sigma_n$ fixed in each iteration. Moreover, Yohai [50] proved
that the final estimator will obtain the highest BDP (i.e. 0.5), if an
estimator with equal BDP is used in the first stage. The objective
functions of stages 1, 2 and 3 can vary, since the first two stages are
responsible for BDP and the third one for asymptotic efficiency.
Other known robust estimators found in the relevant literature
are: the rank-based estimators (R-estimators) proposed by Jaeckel
[20]; the generalized M-estimators (GM-estimators) rst proposed
by Mallows [27,28] and the generalized S-estimators (GS-estimators) proposed by Croux et al. [7], which are based on the minimization of the GM-estimator of residual scale.

3. Wind data sources


3.1. In situ measurements
The highest quality and most accurate in situ measurements for
offshore wind energy projections are obtained from meteorological
masts. These masts are sited mainly on large platforms at heights of
approximately 50 m or higher and, in some cases, on coastal regions
(e.g. small islands). Buoys with shorter masts (less than 10 m,
usually 3 m) provide an alternative and less expensive, but also less
reliable, solution for the assessment of the offshore wind power
potential. However, in situ measurements suffer from data incorrectly recorded because of malfunction of the measuring device
(wind sensor), defects in the power supply, errors in the data entry
or during the measurement analysis process, etc. Moreover, wind
measurements are affected by external conditions, since measuring
devices operate in a dynamically changing environment; for
example, there can be deviations in the rotation movements (i.e.
roll, pitch and yaw) of the buoy due to the presence of sea waves or
currents of high intensity. Therefore, such wind measurements are
somewhat biased towards low values during intense sea-states;
nevertheless, Gilhousen [9] and Taylor et al. [45] suggested that these
errors may be small compared to other sources of wind data. In any
case, the impact of waves on buoy wind measurements is an open
issue to be further examined and quantified. Despite the
abovementioned deficiencies of buoy measurements, which can reinforce
the presence of outliers, they have historically been considered as
the primary reference data source for the validation and
calibration of gridded wind data as a result of the increased
measurement accuracy; see e.g. Refs. [10,43].
In the Greek Seas, a network of eleven oceanographic buoys,
deployed in deep water locations, has operated since 2000 within the
framework of the POSEIDON marine monitoring and forecasting
system under the Hellenic Centre for Marine Research
(HCMR); see Ref. [40]. Each buoy is equipped with meteorological
sensors for the measurement of temperature, atmospheric
pressure, wind speed, direction and gust. The wind measurements are
performed at 3 m height above sea surface with recording period
600 s and frequency 1 Hz, and the measurements are performed
every 3 h. In this work, the buoy wind data consist of long-term
time series of the mean values of wind speed
measured during the 600 s and are obtained from four buoys
located in the Aegean Sea.
In the Spanish Seas, twelve oceanographic buoys located in deep
waters provide measurements of parameters similar to those of
the Greek buoys. The monitoring system operates under the
responsibility of the Spanish Port Authority (Puertos del Estado). The
measurements are made at 3 m height above sea surface with a
recording interval of 1 h. The corresponding wind speed time series
cover time periods varying between 5 and 18 years. The buoy wind
data used in this work consist of long-term wind speed time series
obtained from two buoys located in the Mediterranean part of the
Spanish waters.
3.2. NWP model data
The physical and dynamical properties of the atmosphere can be
described by numerical models that are implemented at global or
regional scale. One of the most relevant reanalysis procedures,
producing long time series of consistent analyses, is from the
European Centre for Medium-Range Weather Forecasts (ECMWF).
ECMWF reanalysis data sets include ERA-40 (1957–2002) and the
more recent ERA-Interim global atmospheric reanalysis, producing
wind data since 1979 on a primary spatial resolution of
0.75° × 0.75°. Since the resolution of global atmospheric models is,
in general, too coarse for wind speed fields to be used directly in the
offshore wind resource assessment, they should be downscaled
using either regional models or statistical approaches. In this study,
the results of a version of the Eta-SKIRON mesoscale meteorological
model have been analysed. Specifically, the Eta-SKIRON model is used
for the dynamical downscaling approach in the Mediterranean Sea,
with a horizontal grid increment of 0.1° × 0.1° (about
10 km × 10 km) and initial and boundary conditions obtained from
the ECMWF analysis fields. The wind data sets from the
atmospheric model refer to 10 m above sea surface. A detailed
description regarding the dynamical downscaling procedure of the
Eta-SKIRON model can be found in Ref. [29].
Let us note that data from NWP models contain inherent uncertainties derived from: i) model errors such as parameterization
errors, discretization error of the grid, inadequate physical processes of the model, and ii) boundary and initial errors (e.g. inaccurate wind input), which can be sources of outliers.
3.3. Statistical analysis of wind data
Wind measurements from four buoys deployed in the Aegean
Sea and two buoys in the Spanish waters are used as reference data
source. Their geographic coordinates along with the examined
measurement periods are presented in Table 1.
Table 1
Names, geographical coordinates and concurrent measurement time periods for the
examined buoys.

Buoy name       Location                Recording period
Athos           39°58′ N, 24°43′ E      5/2000–12/2004
Lesvos          39°10′ N, 25°49′ E      1/2000–12/2004
Mykonos         37°31′ N, 25°28′ E      1/2000–12/2004
Santorini       36°16′ N, 25°30′ E      1/2000–12/2004
Cabo Begur      41°55′ N, 3°39′ E       3/2001–12/2004
Cabo de Gata    36°34′ N, 2°19′ E       1/2000–10/2004

Before the regression/calibration analysis, the buoy wind data
were first checked qualitatively and any missing or clearly
erroneous values (such as spikes) were discarded. Then, for comparison
purposes, wind speeds were adjusted to the reference level of 10 m
above sea level using the log-law wind profile. After this
adjustment, the collocation in space and time procedure was carried out.
More specifically, the four nearest wind data series of the NWP
model were downscaled to the exact location of each buoy by
applying the following weighting interpolation scheme; see e.g.
Ref. [43]:

\[ u = \frac{\sum_{i=1}^{4} u_i / r_i^2}{\sum_{i=1}^{4} 1 / r_i^2}, \qquad i = 1, 2, 3, 4, \tag{13} \]

where u1, u2, u3 and u4 denote the wind speeds at the four grid
points surrounding the buoy location, r1, r2, r3 and r4 denote the
corresponding distances from that location, and u is the sought-for
(model) wind speed. Regarding the temporal collocation, the
common time frame is 3 h (00:00, 03:00, 06:00, etc. UTC).
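The spatial weighting of Eq. (13) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `idw_wind_speed` and the sample speeds and distances are hypothetical and are not taken from the data sets of this work.

```python
import numpy as np


def idw_wind_speed(u_grid, r_grid):
    """Inverse-distance-squared weighted wind speed, following Eq. (13).

    u_grid: wind speeds at the surrounding grid points.
    r_grid: distances of those grid points from the buoy location.
    """
    u = np.asarray(u_grid, dtype=float)
    r = np.asarray(r_grid, dtype=float)
    if np.any(r <= 0.0):
        raise ValueError("distances must be strictly positive")
    w = 1.0 / r**2  # weights 1 / r_i^2
    return float(np.sum(w * u) / np.sum(w))


# Hypothetical values: four grid-point speeds (m/s) and distances (km).
u_example = idw_wind_speed([6.0, 7.0, 8.0, 9.0], [1.0, 1.0, 2.0, 2.0])
```

The two closer grid points dominate the result, which is the intended effect of the 1/r² weights; four equidistant points reduce to the plain average.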
In Table 2, the results of a primary statistical analysis of the concurrent wind data sets (buoy measurements and NWP model results) are presented for the examined locations. The statistical parameters reported in this table are the following: sample size N, mean value μ, standard deviation σ, minimum value min, maximum value max, coefficient of variation CV and coefficient of determination r² between buoy measurements and NWP model results. It should be noted that, despite the collocation procedure, the sample size remains adequate for performing a statistically reliable analysis.
From the results of Table 2 the following remarks can be made:
1) Regarding the Greek Seas, the highest values of mean wind speed and standard deviation (for both data sources) correspond to Mykonos (8.135 m s⁻¹ and 4.176 m s⁻¹, respectively, for the buoy measurements) and the maximum wind speed corresponds to Lesvos (34.419 m s⁻¹). The highest value of the coefficient of variation is observed for Athos (74.347% for buoy measurements and 64.309% for the NWP model) and the smallest one for Mykonos (51.328% for buoy measurements and 48.462% for the NWP model). The highest coefficient of determination corresponds to Mykonos (0.707) and the smallest one to Santorini (0.577).
2) For the Spanish Seas, the highest values of the mean wind speed and standard deviation correspond to Cabo Begur (8.216 m s⁻¹ and 5.847 m s⁻¹, respectively), which are also the highest for all the examined locations and wind data sources in this work.
3) For all examined locations, the mean value, standard deviation, maximum value and coefficient of variation of wind speed are systematically larger for buoy wind data, except for the coefficient of variation at Cabo de Gata. Evidently, there is a tendency of the NWP model to underestimate the values of these statistics.
4. Methodology
The main aim of the proposed methodology is to calibrate
concurrent wind speed data from the less accurate source (NWP
model results) using buoy data (which is the reference data source),
through the implementation of a linear calibration procedure.
Furthermore, the performance of the examined regression (calibration) methods is also assessed.
Henceforth, let $u_M$ denote the wind speeds obtained from the NWP model and $u_B$ the wind speeds obtained from buoy measurements; $\hat{u}_M$ denotes the corrected (calibrated) wind speed from the NWP model. Rearranging the regression equation, $u_M = \beta_0 + \beta_1 u_B$, and assuming that $u_B = \hat{u}_M$, the correction (calibration) relation for the wind speed from the NWP model is:

$$\hat{u}_M = \hat{\beta}'_0 + \hat{\beta}'_1 u_M, \tag{14}$$

where

$$\hat{\beta}'_0 = -\hat{\beta}_0 / \hat{\beta}_1 \quad \text{and} \quad \hat{\beta}'_1 = 1 / \hat{\beta}_1 \tag{15}$$

are the estimates of the calibration parameters and $\hat{\beta}_0$, $\hat{\beta}_1$ are the estimates of $\beta_0$ (intercept) and $\beta_1$ (slope of the regression line), respectively. This methodology has been adopted by the authors in a series of publications [41–43]. As mentioned before, due to the convenient properties of the OLS estimators and the simplicity of this method, OLS is widely used in relevant applications for the estimation of $\beta_0$ and $\beta_1$.
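A minimal sketch of how relations (14) and (15) turn fitted regression coefficients into calibration parameters is given below. The synthetic, noise-free data and the helper name `calibration_from_regression` are assumptions made purely for illustration; OLS via `numpy.polyfit` stands in for any of the estimators considered.

```python
import numpy as np


def calibration_from_regression(b0, b1):
    """Map regression estimates (intercept b0, slope b1) of u_M on u_B
    to the calibration parameters of Eq. (15)."""
    if b1 == 0.0:
        raise ValueError("slope must be non-zero to invert the regression")
    return -b0 / b1, 1.0 / b1


# Synthetic example (not data from the paper): the model underestimates the buoy.
rng = np.random.default_rng(0)
u_b = rng.uniform(0.0, 20.0, 500)   # "buoy" wind speeds
u_m = 1.2 + 0.65 * u_b              # "model" wind speeds, noise-free

b1, b0 = np.polyfit(u_b, u_m, 1)    # OLS fit of u_M on u_B (slope first)
c0, c1 = calibration_from_regression(b0, b1)
u_cal = c0 + c1 * u_m               # Eq. (14): calibrated model wind speed
```

With a regression slope below one and a positive intercept, the calibration intercept is negative and the calibration slope exceeds one, so low model speeds are reduced slightly while high model speeds are stretched upwards.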
Along with the OLS method, the robust methods that have been applied and examined in this work are the following:
• MM-estimation, which is computed as described in Refs. [24,50]. It uses a bi-square redescending score function and an S-estimator as the initial estimator; see Rousseeuw and Yohai [36];
• Huber's M-estimation, using IRLS for the fitting;
• LTS, using the Fast LTS algorithm proposed by Rousseeuw and Van Driessen [35], with α = 0.75;
• L1-estimation, using the Frisch–Newton interior point method by Koenker and Ng [23].
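To make the IRLS idea behind Huber's M-estimation concrete, the following Python sketch fits a straight line by iteratively reweighted least squares. It is an illustration only, not the implementation used in this work: the tuning constant k = 1.345, the MAD-based scale, the function name `huber_irls` and the synthetic data are all assumptions.

```python
import numpy as np


def huber_irls(x, y, k=1.345, tol=1e-8, max_iter=100):
    """Fit y = b0 + b1 * x with Huber's M-estimator via IRLS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    for _ in range(max_iter):
        resid = y - X @ beta
        # Robust residual scale: normalised median absolute deviation (MAD).
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0.0:
            break  # degenerate case: the MAD is zero
        z = np.abs(resid) / scale
        # Huber weights: 1 inside the threshold k, k / |z| outside it.
        w = np.minimum(1.0, k / np.maximum(z, 1e-12))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Synthetic data (not from the paper): a line plus noise and 5% gross outliers.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 20.0, 200)
y = 1.0 + 0.7 * x + rng.normal(0.0, 0.5, x.size)
y[::20] += 15.0

beta_ols = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)[0]
beta_huber = huber_irls(x, y)
```

In this sketch the outliers pull the OLS intercept well away from its true value of 1, whereas the Huber fit downweights them and stays close to the underlying line.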
The R statistical software (https://www.R-project.org) was selected for the estimation of the robust regression parameters (packages robustbase, http://CRAN.R-project.org/package=robustbase, MASS, Ref. [48], and quantreg). Specifically, the regression parameters were estimated according to OLS and the aforementioned robust regression methods, and then, by applying Equation (14), the calibration of the data from the less reliable source was performed. Hereinafter, we refer only to calibration parameters and procedures, implying that the corresponding regression parameters have first been estimated.

Table 2
Basic statistics of wind speed for all the examined sites in the Mediterranean Sea for the concurrent recording periods.

Location       Data source   N       μ (m s⁻¹)   σ (m s⁻¹)   min (m s⁻¹)   max (m s⁻¹)   CV (%)    r²
Athos          Buoy          9044    5.379       3.999       0.118         25.472        74.347    0.670
               Model                 4.908       3.156       0.035         20.759        64.309
Lesvos         Buoy          11453   7.136       4.072       0.117         34.419        57.068    0.588
               Model                 5.749       3.053       0.098         22.922        53.111
Mykonos        Buoy          8606    8.135       4.176       0.117         21.480        51.328    0.707
               Model                 6.622       3.209       0.164         17.967        48.462
Santorini      Buoy          11220   6.779       3.650       0.118         21.342        53.841    0.577
               Model                 4.754       2.410       0.074         18.820        50.696
Cabo Begur     Buoy          2257    8.216       5.847       0.232         27.718        71.171    0.738
               Model                 6.704       3.969       0.175         22.501        59.198
Cabo de Gata   Buoy          8270    6.204       3.832       0.232         20.992        61.760    0.704
               Model                 5.403       3.377       0.100         18.377        62.502

Fig. 1. Schematic representation of the applied methodology corresponding to case C.2.
The numerical results, presented in the next section, are acquired by using two different data sets: the first data set (called the estimation set) is the one from which the calibration parameters are estimated, and the second data set (called the evaluation set) is the one on which the calibration is applied and the evaluation of the methods is performed. See also Fig. 1 for a schematic representation of the applied methodology.
As shown in Fig. 1, the estimation data set consists of collocated data from buoy measurements (black solid line) and NWP model results (black dashed line). From this data set, the parameters $\hat{\beta}'_0$ and $\hat{\beta}'_1$ for the calibration of the NWP model results are estimated. These parameters are used for acquiring the calibrated NWP model results (red solid line) of the evaluation data set, and then the calibrated NWP model data are compared with the buoy data of the evaluation set.
The following special cases are examined in detail:
1) Full data sample analysis: This is the most fundamental case (called hereafter case C.1), where the estimation and the evaluation data sets are the same and refer to the entire available time period. Specifically, OLS and robust methods are applied using the full concurrent available wind data sets in order to calibrate (correct) wind speed data from the less accurate source. The evaluation of the performance of OLS and robust methods is made using the same data set.
2) Partial data sample analysis: The estimation and the evaluation data sets are different and non-overlapping, i.e. the estimation of the calibration parameters is made using part of the available data sample, and the calibration and evaluation of the methods are based on the remaining part of the available data sample. Specifically, the estimation data set refers to wind data corresponding to the first year of the available time period and the evaluation data set refers to data corresponding to the remaining time period (case C.2).⁵ Since the evaluation sample is not used in the estimation, it can be considered to contain new (fresh) wind data, which makes the assessment of the calibration relations more realistic and more stringent.
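The estimation/evaluation split of case C.2 can be sketched as follows. This is an illustrative sketch only: taking "first year" to mean the first 365 days after the earliest record is an assumption, and the function name `split_first_year` and the synthetic 3-hourly series are hypothetical.

```python
import numpy as np


def split_first_year(times, u_model, u_buoy):
    """Split collocated series into an estimation set (first 365 days)
    and a non-overlapping evaluation set (all remaining records)."""
    times = np.asarray(times, dtype="datetime64[h]")
    u_model = np.asarray(u_model, dtype=float)
    u_buoy = np.asarray(u_buoy, dtype=float)
    cutoff = times.min() + np.timedelta64(365, "D")  # assumed definition of "first year"
    est = times < cutoff
    estimation = (u_model[est], u_buoy[est])
    evaluation = (u_model[~est], u_buoy[~est])
    return estimation, evaluation


# Synthetic 3-hourly series covering two years (not data from the paper).
times = np.arange(np.datetime64("2000-01-01T00", "h"),
                  np.datetime64("2002-01-01T00", "h"),
                  np.timedelta64(3, "h"))
rng = np.random.default_rng(1)
u_buoy = rng.uniform(0.0, 20.0, times.size)
u_model = 1.0 + 0.65 * u_buoy

(est_m, est_b), (eval_m, eval_b) = split_first_year(times, u_model, u_buoy)
```

The calibration parameters would then be estimated from `est_m` and `est_b` only, and every performance measure computed on `eval_m` and `eval_b`.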
In order to evaluate the performance of the examined regression/calibration methods, the calibrated NWP model wind speeds from robust methods and OLS are compared to the corresponding measured wind speeds obtained from the buoys through the following statistical measures:

the bias (BIAS, m s⁻¹):

$$\mathrm{BIAS} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{u}_{M,i} - u_{B,i} \right), \tag{16}$$

the root mean square error (RMSE, m s⁻¹):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left( \hat{u}_{M,i} - u_{B,i} \right)^2}{N}}, \tag{17}$$

the mean absolute error (MAE, m s⁻¹):

$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} \left| \hat{u}_{M,i} - u_{B,i} \right|}{N}, \tag{18}$$

and the scatter index (SI):

$$\mathrm{SI} = \frac{\mathrm{RMSE}}{\bar{u}_B}, \tag{19}$$

where $u_{B,i}$ is the ith buoy wind speed, $\hat{u}_{M,i}$ is the ith calibrated wind speed from the NWP model and $\bar{u}_B$ is the mean wind speed from the buoy. Clearly, the above-defined statistical measures are functions (in various forms) of the error (difference) between the corrected wind speed from the NWP model and the measured wind speed from the buoys. Specifically, MAE takes into consideration the sum of the absolute errors, RMSE and SI the sum of the squared errors, and BIAS the sum of the signed errors. It should be noted, though, that RMSE (and SI) is more sensitive to the presence of outliers than MAE, since squaring gives large errors a disproportionate weight; see also [19]. Let us also note that MAE, RMSE and SI comprise stricter and more realistic control criteria than BIAS, since in the latter positive differences can be offset by negative ones. Finally, the quality of a calibration is characterized as good if the values of the applied statistics are as close as possible to zero.

⁵ The analysis in case C.2 resembles the concept of measure–correlate–predict (MCP) methods in wind speed forecasting; see Refs. [5,25,32]. In MCP methods, the available short-term wind data series in a candidate area are extrapolated and extended in time from the closest meteorological stations that are used as reference stations. The usual time period for the available reference data is one year.
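The four measures of relations (16)–(19) can be computed together as in the sketch below, written under the assumption that BIAS, like RMSE and MAE, is averaged over the N collocated pairs. The function name `calibration_metrics` and the four-point sample are hypothetical.

```python
import numpy as np


def calibration_metrics(u_cal, u_buoy):
    """BIAS, RMSE, MAE and SI of calibrated model speeds against buoy speeds."""
    u_cal = np.asarray(u_cal, dtype=float)
    u_buoy = np.asarray(u_buoy, dtype=float)
    err = u_cal - u_buoy                 # calibrated minus measured
    rmse = float(np.sqrt(np.mean(err**2)))
    return {
        "BIAS": float(np.mean(err)),
        "RMSE": rmse,
        "MAE": float(np.mean(np.abs(err))),
        "SI": rmse / float(np.mean(u_buoy)),  # RMSE scaled by the mean buoy speed
    }


# Hypothetical sample in which positive and negative errors cancel exactly.
metrics = calibration_metrics([5.0, 7.0, 9.0, 11.0], [6.0, 6.0, 10.0, 10.0])
```

In this sample every error has magnitude 1 m s⁻¹, yet BIAS is zero because the signs cancel, while RMSE and MAE both equal 1 and SI equals 0.125. This is the offsetting effect that makes BIAS the weakest of the four criteria.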
5. Numerical results
5.1. Estimation of regression/calibration parameters
The regression lines obtained from all examined methods (OLS, MM-estimation, Huber's M-estimation, LTS and L1-estimation) for the entire samples (i.e. case C.1), along with the corresponding density scatter plots, are shown in Fig. 2 for the Greek and Spanish Seas. The colour gradation indicates the density (percentage) of the data points falling within each square, where red and blue tones denote the maximum and minimum frequency of appearance, respectively.

Fig. 2. Scatter plot (with colour-scale indicating density) and regression lines of wind speeds obtained for case C.1 for all the applied methods at: (a) Athos, (b) Lesvos, (c) Mykonos, (d) Santorini, (e) Cabo Begur and (f) Cabo de Gata. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

It is noticed that, in general, the OLS line lies further from the diagonal than the lines obtained from robust methods. In addition, the slope of the regression line $\hat{\beta}_1$ is systematically lower for OLS compared to robust methods in all examined cases, indicating the lower variance of the predicted values of model data. It is also easily observed that the robust methods provide values of slope and intercept that are close to one another, compared to the values provided by OLS.
Using relation (15), the corresponding calibration parameters $\hat{\beta}'_0$ and $\hat{\beta}'_1$ for the NWP model wind data are estimated. The values of $\hat{\beta}'_0$ and $\hat{\beta}'_1$, for each method and both cases, are presented in Table 3 for the examined Greek and Spanish locations.
From this table the following conclusions can be drawn:
i) Parameter $\hat{\beta}'_0$ is always negative and parameter $\hat{\beta}'_1$ is always positive for all methods, locations and cases examined. This behaviour is due to the fact that the NWP model tends to underestimate the wind speed measured from buoys, as was mentioned in subsection 3.3;
ii) the estimated parameters from the pairs of MM and M-H methods, and LTS and L1 methods are fairly close to each other in the majority of the locations and cases examined;
iii) the values of parameters $\hat{\beta}'_0$ and $\hat{\beta}'_1$ of the OLS method are consistently lower and higher, respectively, compared to the corresponding parameters of the robust methods. Exceptions to this behaviour are $\hat{\beta}'_0$ in Cabo Begur and $\hat{\beta}'_1$ in Santorini (both for case C.1);
iv) in general, LTS and L1 methods provide the smallest values of the $\hat{\beta}'_1$ parameter for the majority of the examined locations.
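As a hedged worked example of relations (14) and (15), take the OLS calibration parameters listed for Athos in case C.1 of Table 3, reading the intercept as negative in line with remark i). The model wind speed of 6 m s⁻¹ is an arbitrary illustrative value, not a value from the data sets:

$$
\begin{aligned}
\hat{u}_M &= -2.220 + 1.548 \times 6 = 7.068\ \mathrm{m\,s^{-1}}, \\
\hat{\beta}_1 &= 1 / \hat{\beta}'_1 = 1 / 1.548 \approx 0.646, \\
\hat{\beta}_0 &= -\hat{\beta}'_0 \, \hat{\beta}_1 \approx 2.220 \times 0.646 \approx 1.434 .
\end{aligned}
$$

The recovered regression slope is below one and the intercept is positive, which is consistent with an NWP model that underestimates the higher buoy wind speeds.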

Table 3
Estimated calibration parameters obtained from each applied method for cases C.1 and C.2 for all examined locations.

                          β̂′0                    β̂′1
Location       Method     C.1        C.2         C.1       C.2
Athos          OLS        −2.220     −2.160      1.548     1.631
               MM         −1.980     −1.871      1.499     1.556
               M-H        −2.035     −1.949      1.512     1.577
               LTS        −1.857     −1.692      1.471     1.508
               L1         −1.913     −1.840      1.491     1.550
Lesvos         OLS        −2.865     −2.087      1.740     1.657
               MM         −2.059     −1.771      1.622     1.604
               M-H        −2.241     −1.804      1.649     1.610
               LTS        −1.806     −1.634      1.587     1.581
               L1         −1.849     −1.551      1.589     1.559
Mykonos        OLS        −2.113     −2.753      1.548     1.666
               MM         −1.867     −2.497      1.511     1.627
               M-H        −1.900     −2.539      1.517     1.635
               LTS        −1.750     −2.349      1.494     1.608
               L1         −1.888     −2.473      1.516     1.631
Santorini      OLS        −2.702     −2.667      1.994     2.037
               MM         −2.610     −2.593      1.993     2.032
               M-H        −2.644     −2.611      1.997     2.035
               LTS        −2.660     −2.572      2.015     2.035
               L1         −2.555     −2.430      1.991     1.997
Cabo Begur     OLS        −3.285     −3.338      1.715     1.776
               MM         −3.273     −3.217      1.703     1.731
               M-H        −3.279     −3.262      1.705     1.743
               LTS        −3.300     −3.177      1.697     1.712
               L1         −3.231     −3.289      1.701     1.743
Cabo de Gata   OLS        −1.104     −0.971      1.353     1.307
               MM         −0.837     −0.738      1.304     1.265
               M-H        −0.883     −0.774      1.313     1.272
               LTS        −0.700     −0.646      1.280     1.248
               L1         −0.676     −0.579      1.278     1.242

5.2. Evaluation of the calibration methods

The evaluation of the performance of the examined calibration methods is made by estimating the statistical measures provided in relations (16)–(19).
In case C.1, the estimation of the calibration parameters is made by utilizing the entire concurrent data samples of buoy measurements and NWP model results. In Table 4, the obtained values of the applied statistics are presented at the examined locations for all regression/calibration methods. The optimum value of the statistics for each examined case is shown in boldface letters.

Table 4
Statistics of calibration equations based on ordinary least squares (OLS), MM-estimate (MM), M-estimate by Huber (M-H), least trimmed squares (LTS) and L1-estimate for all the examined locations for case C.1.

Location       Method     BIAS      RMSE      MAE       SI
Athos          OLS        0.297     2.744     2.081     0.480
               MM         0.231     2.668     2.022     0.472
               M-H        0.256     2.687     2.037     0.474
               LTS        0.191     2.630     1.992     0.467
               L1         0.243     2.660     2.016     0.472
Lesvos         OLS        0.347     3.317     2.501     0.441
               MM         0.334     3.152     2.361     0.427
               M-H        0.331     3.188     2.393     0.431
               LTS        0.338     3.107     2.323     0.424
               L1         0.315     3.108     2.325     0.423
Mykonos        OLS        0.078     2.664     2.064     0.322
               MM         0.067     2.607     2.019     0.316
               M-H        0.073     2.615     2.025     0.317
               LTS        0.062     2.583     2.000     0.313
               L1         0.083     2.615     2.025     0.317
Santorini      OLS        0.190     3.046     2.300     0.437
               MM         0.260     3.056     2.305     0.440
               M-H        0.247     3.058     2.307     0.440
               LTS        0.321     3.091     2.329     0.445
               L1         0.298     3.057     2.305     0.441
Cabo Begur     OLS        0.292     3.389     2.576     0.394
               MM         0.214     3.362     2.554     0.390
               M-H        0.225     3.367     2.558     0.391
               LTS        0.150     3.350     2.545     0.389
               L1         0.246     3.362     2.555     0.390
Cabo de Gata   OLS        0.135     2.455     1.872     0.383
               MM         0.089     2.381     1.811     0.376
               M-H        0.100     2.394     1.822     0.378
               LTS        0.071     2.347     1.783     0.373
               L1         0.082     2.345     1.781     0.373

The most important conclusions that can be drawn from the obtained results are the following:
• MAE and RMSE (along with SI), which are statistical measures quantifying the absolute and squared difference between corrected and measured wind speeds, respectively, were systematically lower for LTS and L1 methods for all examined locations except for Santorini.
• From the overall combination of 6 locations with 4 statistical criteria (i.e. in total 24 outcomes), LTS performed best for 15 out of the 24 outcomes and the L1-estimator for 5 outcomes. OLS performed best 4 times out of 24.

In case C.2, the estimation of the calibration parameters was made by utilizing only the first-year data from the concurrent data samples of buoy measurements and NWP model results. After applying the calibration, the evaluation of the obtained results was made on the data of the remaining time period. In Table 5, the obtained values of the applied statistics are presented for all examined locations and regression methods. Again, the optimum value of the statistics for each examined case is shown in boldface letters.

Table 5
Statistics of calibration equations based on ordinary least squares (OLS), MM-estimate (MM), M-estimate by Huber (M-H), least trimmed squares (LTS) and L1-estimate for all the examined locations for case C.2.

Location       Method     BIAS      RMSE      MAE       SI
Athos          OLS        0.975     3.089     2.358     0.551
               MM         0.818     2.918     2.229     0.526
               M-H        0.868     2.966     2.266     0.533
               LTS        0.719     2.816     2.151     0.510
               L1         0.814     2.909     2.223     0.525
Lesvos         OLS        0.587     3.305     2.451     0.449
               MM         0.550     3.222     2.382     0.441
               M-H        0.559     3.234     2.392     0.442
               LTS        0.530     3.187     2.353     0.437
               L1         0.475     3.146     2.320     0.433
Mykonos        OLS        0.347     2.865     2.226     0.350
               MM         0.325     2.793     2.169     0.343
               M-H        0.338     2.807     2.180     0.344
               LTS        0.325     2.758     2.141     0.339
               L1         0.370     2.806     2.178     0.345
Santorini      OLS        0.510     3.158     2.366     0.461
               MM         0.556     3.162     2.368     0.462
               M-H        0.552     3.166     2.370     0.462
               LTS        0.588     3.172     2.374     0.463
               L1         0.533     3.107     2.329     0.455
Cabo Begur     OLS        0.812     3.542     2.715     0.441
               MM         0.620     3.414     2.617     0.425
               M-H        0.661     3.445     2.640     0.429
               LTS        0.529     3.363     2.577     0.419
               L1         0.636     3.441     2.637     0.428
Cabo de Gata   OLS        0.061     2.358     1.786     0.379
               MM         0.094     2.304     1.743     0.374
               M-H        0.086     2.311     1.749     0.374
               LTS        0.099     2.283     1.726     0.372
               L1         0.083     2.279     1.723     0.373

The most important conclusions that can be drawn from the obtained results are the following:
• The best values of RMSE, MAE and SI are obtained from the LTS and L1-estimator robust methods at all the examined locations.
• LTS and the L1-estimator performed best 12 and 9 times, respectively, while in one further outcome LTS and the MM-estimator performed equally best. In total, LTS and the L1-estimator together performed best in 22 out of the 24 outcomes, whereas OLS performed best only 2 times.
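The counting of best-performing methods can be reproduced mechanically. The sketch below does so for a single criterion, the RMSE of case C.2, with the values transcribed from Table 5; the variable names are illustrative, and ties (which do not occur for this criterion) would need separate handling.

```python
from collections import Counter

methods = ["OLS", "MM", "M-H", "LTS", "L1"]

# RMSE values for case C.2, transcribed from Table 5 (same method order as above).
rmse_c2 = {
    "Athos":        [3.089, 2.918, 2.966, 2.816, 2.909],
    "Lesvos":       [3.305, 3.222, 3.234, 3.187, 3.146],
    "Mykonos":      [2.865, 2.793, 2.807, 2.758, 2.806],
    "Santorini":    [3.158, 3.162, 3.166, 3.172, 3.107],
    "Cabo Begur":   [3.542, 3.414, 3.445, 3.363, 3.441],
    "Cabo de Gata": [2.358, 2.304, 2.311, 2.283, 2.279],
}

# The best method at each location is the one with the smallest RMSE.
best = {loc: methods[vals.index(min(vals))] for loc, vals in rmse_c2.items()}
counts = Counter(best.values())
```

For this criterion LTS and the L1-estimator each give the lowest RMSE at three of the six locations. Repeating the tally with the absolute BIAS, MAE and SI columns gives the remaining 18 outcomes.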

5.3. Evaluation of the calibration methods on wind energy estimation

In this subsection, we assess the effects that the different regression (calibration) methods have on the estimation of the mean wind power density P. The mean (long-term) wind power density P in a specific sea area can be directly obtained, if a sufficiently long time series of observed wind speeds is available, through the following relation:

$$P = \frac{1}{2N} \sum_{i=1}^{N} \rho u_i^3, \tag{20}$$

where N is the sample size and $u_i$, i = 1, 2, …, N, is the observed wind speed time series.
P can be directly estimated by utilizing the wind speed time series $u_{B,i}$, i = 1, 2, …, N, obtained from buoys. This estimate is denoted by $P_B$ and is considered as the reference value. Moreover, P can also be estimated by using the calibrated results of the NWP model, i.e. the wind speed time series $\hat{u}_{M,i}$, i = 1, 2, …, N; this estimate is denoted by $P_{\hat{M}}$. The quality of the calibration procedure described in the foregoing sections can be additionally cross-examined by evaluating the (absolute) relative error between $P_B$ and $P_{\hat{M}}$.
The obtained results for relative errors with respect to all locations and regression (calibration) methods examined are shown in Table 6. The minimum values of this quantity for each examined case and method are shown in boldface letters.
It is clear from the above results that LTS and L1-estimator methods perform systematically better than OLS (apart from case C.1 at Santorini). It is also worth mentioning that the OLS method provides the largest relative error for all cases and locations apart from Santorini. Let us note that, for case C.1, the relative error obtained for the non-calibrated data is significantly reduced by all calibration methods (results are not shown here). However, even after the application of the calibration procedures, the relative error remains relatively large. Moreover, the relative error obtained for case C.2 is large and may be considered as unacceptable; this suggests that one-year data may be inappropriate for forecasting purposes as regards the available offshore wind power potential. All these issues reveal the need for an in-depth assessment of the less reliable data sources in wind energy related applications. The same conclusion was obtained in Soukissian and Papadopoulos [43]. See also the interesting discussion of subsection 2.3 in Ref. [5].

Table 6
Relative errors (%) of mean wind power density based on ordinary least squares (OLS), MM-estimate (MM), M-estimate by Huber (M-H), least trimmed squares (LTS) and L1-estimate for all the examined locations for cases C.1 and C.2.

Location       Method     Case C.1   Case C.2
Athos          OLS        40.18      82.01
               MM         32.21      64.61
               M-H        34.57      69.74
               LTS        27.76      54.06
               L1         31.84      63.78
Lesvos         OLS        38.37      40.74
               MM         29.57      35.20
               M-H        31.53      36.09
               LTS        27.29      32.86
               L1         26.71      29.25
Mykonos        OLS        21.87      45.39
               MM         18.80      41.51
               M-H        19.36      42.51
               LTS        17.50      39.98
               L1         19.63      43.38
Santorini      OLS        45.71      65.40
               MM         48.41      66.86
               M-H        48.18      67.01
               LTS        52.60      68.41
               L1         49.67      62.58
Cabo Begur     OLS        28.54      57.84
               MM         25.43      47.13
               M-H        25.95      49.64
               LTS        23.45      42.41
               L1         25.99      48.90
Cabo de Gata   OLS        25.84      13.21
               MM         19.76      8.39
               M-H        21.03      9.23
               LTS        16.91      6.66
               L1         17.12      6.91

6. Conclusions
Wind data from various sources quite often include erroneous observations that either can remain unnoticed or hidden in classical regression analysis or can be excluded from further assessment on the basis of some diagnostic tools. In either case, the results of the analysis may be highly misleading, since the presence of outliers or their false rejection can seriously affect the regression procedure (parameter estimation) and, consequently, the calibration results.

Therefore, before proceeding to any analysis, the identification of outliers is an important, but rather delicate, procedure.
In this work, robust estimators were described in detail and applied for correcting wind speed data from less reliable data sources with reference to in situ measurements. We discussed that such data are prone to the presence of outliers and influential observations and that, as a consequence, the obtained results can lead to erroneous decisions in wind speed assessment. Two different types of concurrent wind data sets referring to six offshore locations across the Mediterranean Sea were used: wind speed time series obtained from buoys and wind speed time series obtained from a high-resolution NWP model, covering various recording periods. The primary statistical analysis showed that wind speed is, on average, underestimated by the examined atmospheric model compared to buoys.
The evaluation of each robust method, along with the traditional OLS approach, was made by applying the regression and calibration procedure for different time periods (and consequently, different data samples). Using the entire available data sample (i.e. case C.1 above), the regression (and calibration) coefficients were estimated. In the other examined case (i.e. case C.2 above), an alternative methodology was applied for a more realistic and practical evaluation, where the regression (and calibration) relations were assessed on unused wind measurements. Since there is no unique statistical criterion for evaluating the performance of the examined calibration methods, several different statistical criteria were applied.
The obtained results from the evaluation procedures revealed that the least trimmed squares method (and secondarily, the L1-estimator) performed systematically better than the other methods for each examined case and for almost all locations. The OLS method rarely gave better results. Furthermore, the validity and performance of the regression/calibration methods was tested in the estimation of the mean wind power density. This assessment confirmed the results already obtained from the evaluation of wind speed. Specifically, it was found that least trimmed squares and L1-estimator methods performed systematically better than OLS, while the latter provided the greatest relative estimation error at all locations except Santorini.
Overall, robust statistics can provide a means for dealing efficiently with outliers in wind data samples in a theoretically justified way, while their implementation in wind energy assessment proved to give better results than the classical OLS method. Least trimmed squares and L1-estimator methods are characterized by their appealing definition and computability and can provide reasonable results, even if the outliers in the examined sample are numerous. Furthermore, another straightforward and efficient approach is to detect unusual wind speed data, with an emphasis on bad leverage points that have large residuals, through least trimmed squares analysis and then perform OLS regression without (all or a part of) these observations. Therefore, it is suggested that the use of robust methods should be seriously considered in wind energy related applications, given their demonstrated effectiveness with samples containing outliers.
Acknowledgements
1) The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 287844 for the project "Towards COast to Coast NETworks of marine protected areas (from the shore to the high and deep sea), coupled with sea-based wind energy potential" (COCONET).
2) This research has also been funded by the Greek General Secretariat for Research and Technology and the European Regional Development Fund under Grant Agreement No. 09SYN-32-598 for the project "National programme for the utilization of offshore wind potential in the Aegean Sea: preparatory actions" (AVRA).
References
[1] C.C. Aggarwal, Outlier Analysis, Springer, New York, 2013.
[2] I. Antoniou, H.E. Jørgensen, T. Mikkelsen, S. Frandsen, R.J. Barthelmie, C. Perstrup, M. Hurtig, Offshore wind profile measurements from remote sensing instruments, in: Proceedings of the EWEA European Wind Energy Conference & Exhibition, Athens, GR, 27 February–2 March 2006.
[3] V. Barnett, T. Lewis, Outliers in Statistical Data, Wiley, Chichester, 1980.
[4] R.J. Barthelmie, L. Folkerts, F. Ormel, P. Sanderhoff, P. Eecen, O. Stobbe, N.M. Nielsen, Offshore wind turbine wakes measured by SODAR, J. Atmos. Ocean. Technol. 20 (4) (2003) 466–477, http://dx.doi.org/10.1175/1520-0426(2003)20<466:OWTWMB>2.0.CO;2.
[5] J.A. Carta, S. Velázquez, P. Cabrera, A review of measure-correlate-predict (MCP) methods used to estimate long-term wind characteristics at a target site, Renew. Sustain Energy Rev. 27 (2013) 362–400, http://dx.doi.org/10.1016/j.rser.2013.07.004.
[6] S. Chatterjee, A.S. Hadi, Influential observations, high leverage points, and outliers in linear regression, Stat. Sci. 1 (3) (1986) 379–393, http://dx.doi.org/10.1214/ss/1177013622.
[7] C. Croux, P.J. Rousseeuw, O. Hossjer, Generalized S-Estimators, J. Amer. Stat. Assoc. 89 (428) (1994) 1271–1281, http://dx.doi.org/10.1080/01621459.1994.10476867.
[8] D.L. Donoho, P.J. Huber, The Notion of Breakdown Point in a Festschrift for Erich L. Lehmann, Wadsworth, Belmont (CA), 1983.
[9] D.B. Gilhousen, A field evaluation of NDBC moored buoy winds, J. Atmos. Ocean. Technol. 4 (1) (1987) 94–104, http://dx.doi.org/10.1175/1520-0426(1987)004<0094:AFEONM>2.0.CO;2.
[10] J.F.R. Gower, Intercalibration of wave and wind data from TOPEX/Poseidon and moored buoys of the west coast of Canada, J. Geophys Res. 101 (C2) (1996) 3817–3829, http://dx.doi.org/10.1029/95JC03281.
[11] F.R. Hampel, A general qualitative definition of robustness, Ann. Math. Stat. 42 (6) (1971) 1887–1896, http://dx.doi.org/10.1214/aoms/1177693054.
[12] F.R. Hampel, The influence curve and its role in robust estimation, J. Amer Stat. Assoc. 69 (346) (1974) 383–393, http://dx.doi.org/10.1080/01621459.1974.10482962.
[13] F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, W.A. Stahel, Robust Statistics, the Approach Based on Influence Functions, Wiley, New York, 1986.
[14] S. Heritier, E. Cantoni, S. Copt, M.P. Victoria-Feser, Robust Methods in Biostatistics, Wiley, Chichester, 2009.
[15] P.J. Huber, Robust estimation of a location parameter, Ann. Math. Stat. 35 (1964) 73–101, http://dx.doi.org/10.1214/aoms/1177703732.
[16] P.J. Huber, The behavior of maximum likelihood estimates under nonstandard conditions, in: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA, 1967.
[17] P.J. Huber, Robust regression: asymptotics, conjectures and monte carlo, Ann. Stat. 1 (1973) 799–821, http://dx.doi.org/10.1214/aos/1176342503.
[18] P.J. Huber, Robust Statistics, Wiley, New York, 1981.
[19] R.J. Hyndman, A.B. Koehler, Another look at measures of forecast accuracy, Int. J. Forecast 22 (4) (2006) 679–688, http://dx.doi.org/10.1016/j.ijforecast.2006.03.001.
[20] L.A. Jaeckel, Estimating regression coefficients by minimizing the dispersion of the residuals, Ann. Math. Stat. 43 (5) (1972) 1449–1458, http://dx.doi.org/10.1214/aoms/1177692377.
[21] Y.M. Kantar, I. Usta, I. Arik, The method of medians to estimate Weibull parameters for wind, in: Proceeding Science and Engineering Symposium 4th International Science, Social Science, Engineering and Energy Conference ISEEC, 2012.
[22] R. Koenker, G. Bassett, Regression quantiles, Econometrica 46 (1) (1978) 33–50, http://dx.doi.org/10.2307/1913643.
[23] R. Koenker, P. Ng, A frisch-newton algorithm for sparse quantile regression, J. Acta Math. Appl. Sin. 21 (2) (2005) 225–236, http://dx.doi.org/10.1007/s10255-005-0231-1.
[24] M. Koller, W.A. Stahel, Sharpening Wald-type inference in robust regression for small samples, Comput. Statist Data Anal 55 (8) (2011) 2504–2515, http://dx.doi.org/10.1016/j.csda.2011.02.014.
[25] M.A. Lackner, A.L. Rogers, J.F. Manwell, Uncertainty analysis in MCP-based wind resource assessment and energy production estimation, J. Sol. Energy Eng. 130 (3) (2008) 031006–031006-10, http://dx.doi.org/10.1115/1.2931499.
[26] A. Llombart, C. Pueyo, J.M. Fandos, J.J. Guerrero, Robust data filtering in wind power systems, in: Proceedings of the European Wind Energy Conference & Exhibition, Athens, GR, 27 February–2 March 2006.
[27] C.L. Mallows, Influence functions, in: Conference on Robust Regression, National Bureau for Economic Research, Cambridge, MA, 1973.
[28] C.L. Mallows, On Some Topics in Robustness, Technical Memorandum, Bell Laboratories, Murray Hill, NJ, 1975.
[29] A. Papadopoulos, P. Katsafados, G. Kallos, S. Nickovic, The weather forecasting system for poseidon – an overview, J. Atmos. Ocean. Sci. 8 (2–3) (2002) 219–237, http://dx.doi.org/10.1080/1023673029000003543.
[30] J.O. Rawlings, S.G. Pantula, D.D. Dickey, Applied Regression Analysis, a

1298

T.H. Soukissian, F.E. Karathanasi / Renewable Energy 99 (2016) 1287e1298

Research Tool, second ed., Springer, 1998.


[31] N.R. Draper, H. Smith, Applied Regression Analysis, third ed., Wiley Series in
Probability and Statistics, 1998.
[32] A.L. Rogers, J.W. Rogers, J.F. Manwell, Comparison of the performance of four
measureecorrelateepredict algorithms, J. Wind Eng. Ind. Aerodyn. 93 (2005)
243e264, http://dx.doi.org/10.1016/j.jweia.2004.12.002.
[33] P.J. Rousseeuw, Least median of squares regression, J. Amer Stat. Assoc. 79
(388) (1984) 871e880, http://dx.doi.org/10.1080/01621459.1984.10477105.
[34] P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, Wiley,
New York, 1987.
[35] P.J. Rousseeuw, K. Van Driessen, A fast algorithm for the minimum covariance
determinant estimator, Technometrics 41 (3) (1999) 212e223, http://
dx.doi.org/10.1080/00401706.1999.10485670.
[36] P.J. Rousseeuw, V. Yohai, Robust Regression by Means of S-estimators. Robust
and Nonlinear Time Series Analysis, Springer, New York, 1984. Lecture Notes
in Statistics, 26.
[37] T.P. Ryan, Modern Regression Methods, Wiley, New York, 1997.
[38] E. Sainz, A. Llombart, J.J. Guerrero, Robust filtering for the characterization of
wind turbines: improving its operation and maintenance, Energy Convers.
Manag. 50 (9) (2009) 2136–2147, http://dx.doi.org/10.1016/j.enconman.2009.04.036.
[39] D.A. Smith, M. Harris, A.S. Coffey, T. Mikkelsen, H.E. Jørgensen, J. Mann,
R. Danielian, Wind LIDAR evaluation at the Danish wind test site in Høvsøre,
Wind Energy 9 (2006) 87–93, http://dx.doi.org/10.1002/we.193.
[40] T.H. Soukissian, G.Th. Chronis, POSEIDON: marine environmental monitoring,
forecasting and information system for the Greek Seas, Mediterr. Mar. Sci. 1 (1)
(2000) 71–78, http://dx.doi.org/10.12681/mms.12.
[41] T.H. Soukissian, F.E. Karathanasi, E.G. Voukouvalas, Effect of outliers in wind speed assessment, in: Proceedings of the Twenty-fourth International Ocean and Polar Engineering Conference, Busan, Korea, June 15–20, 2014. ISBN 978-1 880653 91-3 (Set); ISSN 1098-6189 (Set).
[42] T. Soukissian, C. Kechris, About applying linear structural method on ocean data: adjustment of satellite wave data, Ocean Eng. 34 (3–4) (2007) 371–389, http://dx.doi.org/10.1016/j.oceaneng.2006.04.002.
[43] T.H. Soukissian, A. Papadopoulos, Effects of different wind data sources in offshore wind power assessment, Renew. Energy 77 (2015) 101–114, http://dx.doi.org/10.1016/j.renene.2014.12.009.
[44] R.G. Staudte, S.J. Sheather, Robust Estimation and Testing, Wiley, New York, 1990.
[45] P.K. Taylor, E. Dunlap, F.W. Dobson, R.J. Anderson, V.R. Swail, On the accuracy of wind and wave measurements from buoys, in: Data Buoy Cooperation Panel (DBCP) Technical Documents No. 21, 2002.
[46] J.W. Tukey, A Survey of Sampling from Contaminated Distributions, Contributions to Probability and Statistics, Stanford University Press, Stanford (CA), 1960.
[47] J.W. Tukey, The future of data analysis, Ann. Math. Stat. 33 (1) (1962) 1–67, http://dx.doi.org/10.1214/aoms/1177704711.
[48] W.N. Venables, B.D. Ripley, Modern Applied Statistics with S, Springer, New York, 2002.
[49] R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, Elsevier, 2012.
[50] V.J. Yohai, High breakdown-point and high efficiency robust estimates for regression, Ann. Stat. 15 (2) (1987) 642–656, http://dx.doi.org/10.1214/aos/1176350366.
[51] A. Zaman, P.J. Rousseeuw, M. Orhan, Econometric applications of high-breakdown robust regression techniques, Econ. Lett. 71 (1) (2001) 1–8, http://dx.doi.org/10.1016/S0165-1765(00)00404-3.
