
2010 10th International Conference on Quality Software

Software Quality Prediction Models Compared


Rüdiger Lincke, Tobias Gutzmann and Welf Löwe
School of Computer Science, Physics and Mathematics
Linnaeus University
35195 Växjö, Sweden
Email: {rudiger.lincke|tobias.gutzmann|welf.lowe}@lnu.se
Abstract: Numerous empirical studies confirm that many software metrics aggregated in software quality prediction models are valid predictors for qualities of general interest like maintainability and correctness. Even these general quality models differ quite a bit, which raises the question: Do the differences matter? The goal of our study is to answer this question for a selection of quality models that have previously been published in empirical studies. We compare these quality models statistically by applying them to the same set of software systems, i.e., to altogether 328 versions of 11 open-source software systems. Finally, we draw conclusions from quality assessment using the different quality models, i.e., we calculate a quality trend and compare these conclusions statistically. We identify significant differences among the quality models. Hence, the selection of the quality model has influence on the quality assessment of software based on software metrics.
I. INTRODUCTION

Software analysis and quality assessment as supported by product metrics¹ receive quite some attention from the research community. Efforts are directed both at the development of (object-oriented) product metrics [3]–[5] and at their validation [6]–[10]. The latter is particularly important since metrics are of little value by themselves unless there is empirical evidence that they are correlated with important external (quality) attributes [11]. Such a correlation allows using the metrics in the assessment and prediction of software quality, which is input to quality control and management, and to general planning activities.

As discussed by Briand [12], Fenton [13], and Kitchenham [14], we distinguish two types of validation: theoretical and empirical validation. Theoretical validation ensures that a product metric is a proper numerical characterization of the property it claims to measure. Empirical validation demonstrates that a product metric is associated with some important quality attributes, e.g., correctness or maintainability.

Theoretical validations have shown that certain program constructs have a causal relationship with some qualities [15]–[17]. The current theoretical framework for explaining the effect of the structural properties of object-oriented programs on quality attributes has been justified empirically [18]. Most studies agree that highly cohesive, sparsely coupled, and low inheritance programs are less likely to contain faults and are easier to maintain. The empirical validation of object-oriented product metrics [18]–[20] shows evidence for the predictive validity of many product metrics. As a matter of fact, the metrics proposed in the Chidamber and Kemerer metrics suite [3] are even integrated in industrial strength software development tools like Rational Rose² and Together³.

¹ Literature distinguishes between the notions "metric" and "measure" [1]. We use "metric" to be consistent with ISO 9126 [2], which defines a "software quality metric" as a "quantitative scale and method which can be used to determine the value a feature takes for a specific software product."

1550-6002/10 $26.00 © 2010 IEEE
DOI 10.1109/QSIC.2010.9
Existing empirical validation studies collect product metrics as independent variables and aim at predicting either software faults or maintenance costs as dependent variables by applying a variety of so-called software prediction models. A software quality prediction model, in short a quality model, maps metrics values to a quality attribute. Depending on the product, company, branch, customer, etc., different quality attributes might be interesting, but some of them are generally positive, e.g., few faults or low maintenance costs. The validations aim at predicting such positive quality attributes with the help of product metrics and quality models. For statistical validations, the quality attributes need to be assessed quantitatively as well, independently of the metrics and the quality models. Therefore, the number of software faults is derived from tests or bug databases [8]–[10], [18], [21].

The maintenance costs are more difficult to determine and are usually approximated through maintenance effort. This effort is measured by means of the time spent on performing a maintenance task [6], [22], the changes performed [21], [23], [24], or the maintainability index (MI) [25]. The changes made in the code are in most cases approximated by the number of lines of code changed. MI is a statistically validated quality model itself that is based on various product metrics⁴. Hence it is considered trustworthy.
As a consequence of the varying experimental setups of the validation studies, literature suggests quite a large number of different quality models based on different product metrics. All of them are validated to assess and predict the one or the other general notion of software quality, but they are validated using different sample data and dependent and independent variables. This makes it difficult for both researchers and practitioners to decide which quality model is trustworthy. It is not even known if these models differ in their assessments and the resulting conclusions.

This study answers these questions for a number of selected quality models. The remainder of this paper has the following structure: Section II discusses the background of our validation study. Section III summarizes the design of the experiment. Section IV discusses data collection and measurement, evaluation, and analysis. Section V concludes this paper and presents future work. The appendix provides complementary information regarding the selection of one of the investigated quality models.

² http://www-01.ibm.com/software/awdtools/developer/rose/
³ http://www.borland.com/us/products/together/index.html
⁴ MI combines several metrics including: the avg. Halstead volume per module, the avg. extended cyclomatic complexity per module, the avg. lines of code per module, and the avg. percent of lines of comments per module. However, a correlation of this metric and the actual "maintainability" has been shown in several studies [25].
II. BACKGROUND

Eurocontrol developed, together with its partners, a high level design of an integrated Air Traffic Management (ATM) system across all ECAC States⁵. It was planned to supersede the current collection of individual national systems. The system architecture, called Overall ATM/CNS Target Architecture (OATA), is a UML specification. As external consultants, we supported the structural assessment of the architecture with a metrics-based approach using our software metrics tool VizzAnalyzer⁶. The pilot validation focused only on a subsystem of the complete architecture, consisting of 8 modules and 70 classes.

We jointly defined the set of product metrics which quantify the architecture quality, computed on a subset of the UML specification (basically class and sequence diagrams). Since no best practice existed, we defined our own software quality prediction model based on ISO 9126. We used the Factor-Criteria-Metric approach of McCall [26] but defined our own quantitative relationship between metrics, quality attributes, and quality factors, mainly based on our intuition. During this definition process, we had to choose from several equally intuitive variants and we had neither the time nor the resources to evaluate all of them. During evaluation of our assessment, Eurocontrol raised two questions which we could not answer in a satisfactory way:

Q1 [validity of the quality model] Do the suggested and different alternative software quality prediction models calculate comparable results for the same input?

Q2 [validity of the conclusions] If not, does this matter? More specifically, do these differences lead to different conclusions?

In the following, we describe the experiment and conclusions that aim at answering these questions. We use a larger statistical basis than the OATA project could provide.
III. EXPERIMENT DESIGN

A. Experiment Definition

We define our experiment to analyze different software quality prediction models (Section III-C). The purpose is to find out differences in the prediction functions and in the conclusions drawn. We take the point of view of the practitioner in software quality management assessing several versions of the same project.

⁵ European Civil Aviation Conference; an intergovernmental organization with more than 40 European states.
⁶ http://www.arisa.se

B. Planning the Experiment

1) Context Selection: The environment in which the experiment is executed is open-source Java projects. Generalization to other projects will be discussed in Section III-D as a threat to the experiment. Two of the authors of this paper conduct the experiment. The experiment addresses a problem observed in practice.
2) Hypothesis Formulation: We want to know if different software quality prediction models $QM$ provide different assessments of software quality attributes $Q'$ when applying them to the same system(s). Furthermore, to compare conclusions regarding a quality trend, we assess several versions of the same test systems. The typical approach for evaluating software quality in object-oriented systems is to assess quality on class level with the help of a number of software quality metrics, and then to aggregate the different metrics values of different classes to one value on system level.

First, we apply software metrics to a software system and calculate a specific metrics value for each class in the system and each metric of the quality model. Let $C_{i,j,k}$ denote a class $k$ in a version $j$ of a software system $i$. $M_l(C_{i,j,k})$ is the value for metric $l$ of class $C_{i,j,k}$. Considering different metrics, $I$ software systems each in $J_i$ different versions each in turn containing $K_{i,j}$ classes, results in a huge amount of information. To reduce it, different software quality prediction models $QM_n$ aggregate some or all values $M_l(C_{i,j,k})$ to a quality $Q_n(C_{i,j,k})$ per class $C_{i,j,k}$, and summarize $Q_n(C_{i,j,k})$ for all classes of one version to a quality $Q'_n(S_{i,j})$ on system level. While we use different quality models $QM_n$ to integrate the metrics values per class $Q_n(C_{i,j,k})$, we use the same aggregation $QM'$ to integrate these per class quality values to system level quality $Q'_n(S_{i,j})$, see Figure 1.
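The two-stage aggregation just described (metrics per class, then a class-level quality model $QM_n$, then a shared system-level aggregation $QM'$) can be sketched as follows. This is only an illustration: the metric values and the two class-level models `qm_a` and `qm_b` are hypothetical and are not the models studied in the paper. The use of the arithmetic mean for $QM'$ follows the paper's later statement that the system-level quality is the average over all classes.

```python
from statistics import mean

# One dict of metric values M_l(C) per class of a single version S_{i,j}.
# Metric names and numbers are invented for illustration.
version = [
    {"LOC": 120, "WMC": 14},
    {"LOC": 40, "WMC": 3},
    {"LOC": 900, "WMC": 61},
]


def qm_a(metrics):
    """Hypothetical class-level model: flag very large classes (1 = bad)."""
    return 1.0 if metrics["LOC"] > 500 else 0.0


def qm_b(metrics):
    """Hypothetical class-level model: scaled complexity, capped at 1."""
    return min(metrics["WMC"] / 100.0, 1.0)


def system_quality(classes, quality_model):
    """QM': average the per-class qualities Q_n(C) to Q'_n(S)."""
    return mean(quality_model(m) for m in classes)


if __name__ == "__main__":
    for name, model in (("qm_a", qm_a), ("qm_b", qm_b)):
        print(name, system_quality(version, model))
```

Because only the class-level model changes while `system_quality` stays fixed, any difference between the two printed values is attributable to the quality model, which is the comparison the experiment sets up.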
Secondly, we aim at drawing conclusions from the system level quality values. For our experiment, we draw conclusions based on the quality trend. In other words, we ask if for a project $i$ the quality $Q'_n(S_{i,j})$ is improving over the versions $j$, or if it is constant or even deteriorating. Therefore, we normalize the values $Q'_n(S_{i,j})$ such that 1 is the worst quality and 0 is the best quality for each of the quality models $QM_{1..N}$, and aggregate the values $Q'_n(S_{i,j})$ for different versions of a system to a common trend value $T_{n,i}$. To this end, we use the slope of the linear regression function $reg_{n,i} = a_{n,i} \, j + b_{n,i}$ of each project $i$ over its versions $j$. Our trend conclusion $T_{n,i}$ is "improving" iff $a_{n,i}$ is negative, "deteriorating" iff $a_{n,i}$ is positive, and "constant" iff it is (close to) zero.
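The trend conclusion above can be sketched as an ordinary least-squares slope over the version index. The tolerance `eps` that decides what counts as "close to zero" is an assumption made for this sketch; the paper does not state a numeric value. The function names are likewise illustrative.

```python
def trend_slope(qualities):
    """Least-squares slope a of reg(j) = a*j + b over versions j = 1..J.

    `qualities` holds the normalized system-level values Q'(S_{i,j}),
    ordered by version, where 0 is best and 1 is worst.
    """
    n = len(qualities)
    if n < 2:
        raise ValueError("need at least two versions to fit a trend")
    xs = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = sum(qualities) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, qualities))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def trend_conclusion(qualities, eps=1e-3):
    """Map the slope to a trend label; eps is an assumed tolerance."""
    a = trend_slope(qualities)
    if abs(a) <= eps:
        return "constant"
    # Lower values mean better quality, so a negative slope is an improvement.
    return "improving" if a < 0 else "deteriorating"


if __name__ == "__main__":
    print(trend_conclusion([0.5, 0.4, 0.3]))
    print(trend_conclusion([0.1, 0.2, 0.4]))
```

Note the sign convention: since 0 is the best and 1 the worst quality, a falling regression line means the system is getting better.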
As already discussed in the introduction, it is possible to choose from several software quality models. These define the aggregation of individual metrics to the quality $Q$ of a class. $Q$ is then aggregated to the quality $Q'$ of a system and, hence, allows us to come to different trend conclusions $T$ about the quality of the system. We formalize the resulting research questions with the following hypotheses:

Q1 Null hypothesis: There is no principal difference in the software quality $Q'$ measured by the same metrics $M_{1..L}$ applied to the same test systems $S_{i,j}$ and aggregated with different quality models $QM' \circ QM_{1..N}$, i.e.,

$$H_0: Q'_1(S_{i,j}) = f_2(Q'_2(S_{i,j})) = \dots = f_N(Q'_N(S_{i,j}))$$

with linear functions $f$.
Alternative hypothesis: There is no such set of linear functions $f$, i.e.,

$$H_1: Q'_1(S_{i,j}) \neq f_2(Q'_2(S_{i,j})) \neq \dots \neq f_N(Q'_N(S_{i,j}))$$

for linear functions $f$.
Measures needed: metrics values per class $M_{1..L}(C_{i,j,k})$, software quality values per class $Q_{1..N}(C_{i,j,k})$, and software quality values per system $Q'_{1..N}(S_{i,j})$.

Q2 Null hypothesis: There is no difference in the conclusions $T_{n,i}$ based on different qualities $Q'_{1..N}$ from different quality models $QM' \circ QM_{1..N}$ for the versions $j \in [1 \dots J_i]$ of the same test project $i$, i.e.,

$$H_0: T_{1,i} = \dots = T_{N,i}$$

Alternative hypothesis: There is a difference, i.e.,

$$H_1: T_{1,i} \neq \dots \neq T_{N,i}$$

Measures needed: trend conclusions per system $T_{1..N,i}$ based on software quality values per system $Q'_{1..N}(S_{i,j})$.

[Fig. 1: Aggregating metrics values for classes to system level using different quality models. The metric values $M_1(C_{i,j,k}), M_2(C_{i,j,k}), \dots, M_L(C_{i,j,k})$ feed each quality model $QM_n$, which yields $Q_n(C_{i,j,k})$; $QM'$ then aggregates these to $Q'_n(S_{i,j})$, for $n = 1, \dots, N$.]

3) Variable Selection: The independent variable is the quality model $QM$. The dependent variable is the system level software quality $Q'$ and resulting trend conclusions $T$.

4) Selection of Subjects/Objects: First, we consider variants of an ISO 9126-based quality model. Additionally, we consider quality models from literature, but we limit ourselves to evaluating only those based on similar evaluation approaches and on the same input metrics. Using models with a too diverse set of input metrics would increase the effort for collecting these metrics beyond our resources. Thus, we omit approaches involving Neural Networks etc. for integrating and aggregating the individual class level metrics values. The selected quality models are a sample of models discussed in literature, but not a random sample.

Second, we limit ourselves to a single software metric tool, VizzAnalyzer, since repeating the experiment with several tools would require a much higher effort on measurement, data collection, evaluation, and analysis. In fact, we have shown earlier that different metrics tools lead to different class level quality values for the same system(s) and the same quality model and even to different conclusions regarding the quality ranking of the classes [27].

Third, the quality models and the metrics tool limit, in turn, the selection of software quality metrics. The selected metrics are a sample of metrics described in literature, but not a random sample.

Finally, further limitations apply to the software systems analyzed. Since the selected metrics tool (as most alternative tools) works on source code, legal restrictions limit the suitable systems. Thus, we restrict ourselves to open-source software as available on SourceForge.NET⁷. The test systems selected are a random sample.

5) Experiment Design: The dependent variable software quality $Q'(S_{i,j})$ is measured on a ratio scale, and the resulting trend conclusion $T_{n,i}$ is measured on an ordinal scale (improving is better than constant is better than deteriorating). We use Pearson correlation and ANOVA or their non-parametric alternatives to compare the correlation between the system level qualities (trend conclusions) when applying the different quality models to the same system (project).

C. Instrumentation

Our experiment is performed on the available working equipment, i.e., a standard PC satisfying the minimum requirements of the software measurement, data collection, and evaluation tools.

1) Software Metrics Selection: We consider the union of the sets of metrics required as input by the different software quality models (discussed below). Most metrics originate from well-known metrics suites:
Chidamber & Kemerer [3], namely Coupling Between Objects (CBO), Depth of Inheritance Tree (DIT), Lack of Cohesion in Methods (LCOM), Number Of Children (NOC), Response For a Class (RFC), and Weighted Method Count (WMC) using McCabe Cyclomatic Complexity as weight for the methods;
Li & Henry [28], namely Data Abstraction Coupling (DAC), Message Passing Coupling (MPC), Number Of local Methods (NOM), and Number of Attributes and Methods (NAM/SIZE2);
Bieman & Kang [29], namely Tight Class Cohesion (TCC);
Hitz & Montazeri [30], namely Locality of Data (LD) and Improvement of LCOM (ILCOM).
Additionally, we added commonly known metrics like Length of class names (LEN), Lines Of Code (LOC), and Lack Of Documentation (LOD). Finally, the Cyclicity (CYC) of a class measures the size of the largest cycle of this and other classes over call, access, and inheritance relations.

A detailed discussion of the above (and other) software metrics can be found in [31]. An overview including exact definitions is provided in the "Compendium of software quality standards and metrics" [32]. The definitions given in the compendium are used as the basis for the metrics implementations in VizzAnalyzer.

2) Quality Model Selection: Many quality models discussed in literature predict the maintainability of classes based on static metrics. This also holds for the different variants of the ISO 9126-based quality model and two regression-based models from literature we selected.

⁷ http://sourceforge.net, from now on referred to as SourceForge.
a) ISO 9126-based Variants: Each quality model $QM_{1..N}$ in this category maps and aggregates the individual metrics values to the quality factor maintainability via the criteria analyzability, changeability, stability, and testability, as seen in Figure 2. All $N$ variants count classes that are outliers, that is, classes having values outside their desired value range. The quality models differ in how they determine the outliers. We introduce in the following the baseline approach (System values 15%) in detail. Then, we briefly present how the other quality model variants (All values 15% and All rank 15%) differ.

Remark: We select a threshold of 15% since this was the threshold we used in the Eurocontrol project. We therefore expect it to be a good example of industrial practice. Additionally, an empirical evaluation of alternative threshold values in a range of 5% – 50% showed that using different thresholds has only little effect on the conclusions as presented later on in Table IV. We present a summary of the alternative conclusions for the System values 5%, 10%, 20%, 30%, 40%, and 50% thresholds in the appendix, Table V, for reference.

Main property: Maintainability. Sub-properties: Analyzability, Changeability, Stability, Testability, Maintainability Compliance.

| Category | Sub-Category | Metric | Analyzability | Changeability | Stability | Testability |
|---|---|---|---|---|---|---|
| Complexity | size | LOC | -- | -- | - | -- |
| | interface C. | NAM | -- | -- | - | -- |
| | | NOM | -- | -- | - | -- |
| | structural C. | WMC | -- | -- | - | -- |
| | | RFC | -- | -- | - | -- |
| Architecture & Structure | Inheritance | DIT | -- | -- | - | -- |
| | | NOC | - | -- | - | - |
| | Coupling | CBO | -- | -- | -- | -- |
| | | DAC | -- | -- | -- | -- |
| | | LD | ++ | ++ | ++ | ++ |
| | | MPC | -- | -- | -- | -- |
| | Cohesion | LCOM | -- | -- | -- | -- |
| | | ILCOM | -- | -- | -- | -- |
| | | TCC | ++ | ++ | ++ | ++ |
| Design guidelines & Coding conventions | Documentation | LOD | -- | -- | - | -- |
| | Guidelines | LEN | -- | -- | -- | -- |
| | | CYC | -- | -- | -- | -- |

Fig. 2: Software Quality Matrix, showing only maintainability related quality factors, criteria and metrics and their relationship. (+)+ (strong) direct, and (-)- (strong) indirect correlation. The mapping $QM_1$ from the individual metrics values to the criteria and, finally, to maintainability,

I. System values 15%. A class $C$ is an outlier with respect to metric $M_l$ and criterion $c$ if and only if the metrics value $M_l(C)$ is within the highest (lowest) 15% of the value range measured for any class in the system if low (high) values are desired for $M_l$ to satisfy a criterion. We denote this with an indirect auxiliary metric $M^{criterion}_l(C)$. We define $M_{l,out} = (M_{l,max} - M_{l,min}) \cdot 15\%$, where $M_{l,max}$ ($M_{l,min}$) is the maximum (minimum) value of $M_l$ for any class in the system. Furthermore, we define

$$M^{low}_l(C) = \begin{cases} 1 & \text{if } M_l(C) \in [M_{l,min} \dots M_{l,min} + M_{l,out}); \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

and

$$M^{up}_l(C) = \begin{cases} 1 & \text{if } M_l(C) \in (M_{l,max} - M_{l,out} \dots M_{l,max}]; \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

For a metric $M_l$ with a direct correlation with a criterion we define: $M^{criterion}_l(C) = M^{low}_l(C)$. For a metric with an indirect correlation to a criterion we define: $M^{criterion}_l(C) = M^{up}_l(C)$, cf. Figure 2 for the correlation of the metrics (rows) to the criteria (columns). Note that $M_{l,max} = M_{l,min}$ implies that $M_{l,out} = 0$ and $M^{criterion}_l = 0$ for all classes $C$, i.e., no extreme value and, hence, no outliers for metric $M_l$.

The individual metrics values $M^{criterion}_l(C)$ are then aggregated to a single value $M^{criterion}(C)$ per criterion and class according to the weights as defined in the quality model. Let $w^{criterion}_l$ be the weight connecting metric $M_l$ with a criterion. It is two for a strong direct or strong indirect connection and one for a direct or indirect connection. We define:

$$M^{criterion}(C) = \frac{\sum_{l=1}^{L} M^{criterion}_l(C) \, w^{criterion}_l}{\sum_{l=1}^{L} w^{criterion}_l} \tag{3}$$

The maintainability $Q(C)$ of a class $C$ is now defined as the average of $M^{Analyzability}(C)$, $M^{Changeability}(C)$, $M^{Stability}(C)$, and $M^{Testability}(C)$. The values range from 0 to 1, with 0 being the best possible maintainability, since $C$ is not an outlier with respect to any metric and criterion. Value 1 indicates the worst possible maintainability, since all metrics values for $C$ exceed their thresholds.

II. All values 15%. This variant differs only in that we define $M_{l,out} = (M_{l,max,157} - M_{l,min,157}) \cdot 15\%$, where $M_{l,max,157}$ ($M_{l,min,157}$) is the maximum (minimum) value of $M_l$ for any class in 157 systems, with 245,320 classes overall, analyzed by Barkmann et al. [33].

III. All rank 15%. This variant differs only in how we define $M^{low}_l(C)$ and $M^{up}_l(C)$: for each metric, the values of the classes in the 157 systems (from the Barkmann study) are sorted increasingly (decreasingly, resp.), and we define the outlier interval borders for $M^{low}_l(C)$ ($M^{up}_l(C)$, resp.) as the value of the class at rank $245{,}320 \cdot 15\%$. $M^{low}_l(C)$ ($M^{up}_l(C)$, resp.) $= 1$ iff $M_l(C)$ is smaller (larger, resp.) than this outlier interval border, and 0 otherwise.

A fourth alternative, System rank 15%, does not make sense when aggregating the class to the system level values ($= 0.15$ for all systems) and is therefore omitted.
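A minimal sketch of the baseline variant (System values 15%) may help make the outlier definition concrete. It computes the indicators of equations (1) and (2) for one metric over all classes of a system and then combines indicators into one criterion value. Two points are assumptions of this sketch rather than statements from the paper: the criterion value is taken to be the weighted mean of the indicators (which matches the stated 0 to 1 range), and the function names and sample values are invented.

```python
def outlier_flags(values, share=0.15):
    """Return (low, up) indicator lists for one metric over all classes.

    low[k] = 1 if values[k] lies in [min, min + out)   (equation 1)
    up[k]  = 1 if values[k] lies in (max - out, max]   (equation 2)
    where out = (max - min) * share.
    """
    v_min, v_max = min(values), max(values)
    out = (v_max - v_min) * share
    low = [1 if v_min <= v < v_min + out else 0 for v in values]
    up = [1 if v_max - out < v <= v_max else 0 for v in values]
    return low, up


def criterion_value(indicators, weights):
    """Weighted mean of per-metric indicators for one class (assumed form).

    weights: 2 for a strong (++ or --) connection, 1 for a plain (+ or -) one.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(i * w for i, w in zip(indicators, weights)) / total


if __name__ == "__main__":
    loc = [10, 20, 30, 40, 100]  # hypothetical LOC values of five classes
    low, up = outlier_flags(loc)
    # LOC has an indirect correlation (high values are bad), so the "up"
    # indicator is the one that would feed the criterion.
    print(up)
    print(criterion_value([1, 0, 1], [2, 1, 1]))
```

When all classes share the same value, `out` is 0 and both intervals are empty, so no class is flagged; this reproduces the remark that $M_{l,max} = M_{l,min}$ implies no outliers.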
b) Regression Model-based Variants: These variants are taken from two studies. They directly calculate the quality on class level $Q(C)$.

IV. Regression A. Subramanyam and Krishnan [9] provide empirical evidence supporting the validity of a subset of the Chidamber and Kemerer suite [3] in determining software defects. They collected the product metrics manually from industry data (B2C e-commerce applications) involving programs written in Java and C++. They applied linear regression to predict software defects from metrics. The dependent variable is the defect count, which includes defects reported from customers and defects found during customer acceptance testing, leading to the following quality model for Java programs:

$$\begin{aligned}
1/Q(C) = {} & 0.6570 - 0.00003 \cdot M_{SIZE}(C) \, - && (4) \\
& 0.0032 \cdot M_{WMC}(C) \, + && (5) \\
& 0.0011 \cdot M_{CBO}(C) \, + && (6) \\
& 0.1180 \cdot M_{DIT}(C) \, - && (7) \\
& 0.0210 \cdot M_{CBO}(C) \cdot M_{DIT}(C) && (8)
\end{aligned}$$

V. Regression B. Yu et al. [10] empirically validated ten object-oriented metrics, among them the Chidamber and Kemerer suite [3], with respect to their usefulness in predicting fault-proneness as an important software quality indicator. The test system was written in Java and had 123 classes. They collected defects found during testing (together with their severity and type) as stored in a problem tracking system. This information was used as the dependent variable. Using regression, they statistically derived a quality model:

[Equations (9)–(14): regression model for $Q(C)$ with the coefficients 0.520, 0.462, 0.190, 0.241, 0.097, 0.175, and 0.256; the metric terms and signs are not legible in the source.]

For all five quality models, the mapping $QM'$ from quality $Q(C)$ of classes to a system level $Q'(S)$ is simply the average of $Q(C)$ of all classes of the system.

3) Measurement Process and Software Measurement Tool Selection: Figure 3 provides an overview of our measurement processes and tools. It is built around an IDE for extracting the basic information about the projects, Eclipse (a), the metrics tool for computing metrics values, VizzAnalyzer (b), a local database for storing the data, MS Access (e), tools for statistical analysis, MS Excel and SPSS⁸ (f), and a tool for evaluation and abstraction of the data stored in the local database, the SQM Tool (g).

[Fig. 3: Tools and Processes. Components shown: Software Systems, Eclipse Workspace (a), VizzAnalyzer (b) with Batch (c) and Export (d), Data Extraction, Local DB (e), Meta-Data and Repository URL, MS Excel/SPSS (f) for Analysis & Visualization, SQM Tool (g).]

The projects are located in an Eclipse workspace (a) as Java projects. They are complete and compilable, which is a prerequisite for data extraction. The VizzAnalyzer metric tool (b) is fed with low-level information from the Eclipse projects (syntax, cross references, etc.) and computes the metrics. We use VizzAnalyzer for the metrics extraction, but other software metrics tools (cf. Lincke et al. for an overview [27]) could have been used as well. The VizzAnalyzer was our choice since it supports automated processes: an interface (c) allows for batch processing of a list of projects and an export engine (d) stores the computed metrics in a database for later processing. The SQM Tool (g) implements the different software quality prediction models and allows for flexible calculation of the quality values.

The presented tools and processes were already proven functional in a previous study conducted by Barkmann et al. [33].

4) Test System Selection: We compute the metrics for testing our hypotheses from a number of test systems. For computing a trend, we need a sufficient number of versions of one and the same project to be statistically significant. We selected Java software projects from SourceForge and the Apache Software Foundation and, to ensure that the projects and systems are not trivial, we applied two selection criteria: (i) each version of a project must have at least 40 classes, and (ii) each project must have at least 10 different versions over time.

Below, we briefly introduce the 11 software projects selected. We list them alphabetically.
Avalon provides Java software for component and container programming (http://avalon.apache.org).
Checkstyle is a development tool to help programmers write Java code by automating the process of checking the compliance to style guidelines (http://checkstyle.sourceforge.net).
JasperReports is a business intelligence and reporting engine written in Java. It is a library that can be embedded in other applications (http://jasperforge.org).
jEdit is a programmer's text editor written in Java which uses the Swing toolkit and is released as free software (http://www.jedit.org).
log4j is a logging tool. It is written in Java and logs statements in a file (http://logging.apache.org/log4j).
Lucene is an information retrieval library, originally created in Java, ported to other programming languages including Delphi, Perl, C#, and C++ (http://lucene.apache.org).
Oro includes a set of text processing Java classes that provide Perl5 compatible regular expressions, AWK-like regular expressions, glob expressions, and utility classes for performing substitutions, splits, filtering, etc. (http://jakarta.apache.org/oro).
PMD is a Java source code analyzer written in Java. It scans Java source code and looks for potential problems (http://pmd.sourceforge.net).
Struts is a Web application framework for developing Java Web applications (http://struts.apache.org).
Tomcat 6.x is a Servlet container written in Java. It includes tools for configuration and management (http://tomcat.apache.org).
Xerces is an XML parser (http://xerces.apache.org/xerces2-j).

In total, during the course of this experiment, we have analyzed 11 projects in 343 versions (approx. 29.73 versions per project). Some standard descriptive statistics about this project group are provided in Table I.

TABLE I: Descriptive statistics of projects over all considered versions (before removal of outliers).

| | Files | Classes & Interfaces | Fields | Methods | LOC in Files | LOC in Classes | LOC in Methods | Versions |
|---|---|---|---|---|---|---|---|---|
| Mean | 409.63 | 566.87 | 2,112.50 | 4,134.99 | 94,695.76 | 88,023.39 | 59,277.37 | 31.18 |
| Std. Error | 14.67 | 20.72 | 95.92 | 185.31 | 4,243.66 | 4,077.23 | 2,719.82 | 5.04 |
| Median | 367 | 514 | 1,594 | 3,668 | 71,282 | 63,720 | 41,853 | 24 |
| Mode | 197 | 292 | 830 | 1,490 | 12,709 | 29,257 | 18,580 | 19 |
| Std. Dev. | 271.68 | 383.67 | 1,776.44 | 3,432.00 | 78,593.73 | 75,511.31 | 50,371.86 | 16.71 |
| Kurtosis | 0.11 | 0.06 | 1.36 | 1.55 | 1.04 | 1.44 | 1.08 | -1.62 |
| Skewness | 0.86 | 0.77 | 1.30 | 1.33 | 1.20 | 1.30 | 1.20 | 0.39 |
| Range | 1,111 | 1,641 | 7,416 | 14,405 | 324,204 | 316,496 | 207,508 | 44 |
| Minimum | 13 | 13 | 28 | 118 | 2,135 | 1,824 | 1,248 | 11 |
| Maximum | 1,124 | 1,654 | 7,444 | 14,523 | 326,339 | 318,320 | 208,756 | 55 |
| Sum | 140,502 | 194,435 | 724,587 | 1,418,302 | 32,480,645 | 30,192,022 | 20,332,137 | 343 |
| Count | 343 | 343 | 343 | 343 | 343 | 343 | 343 | 11 |

⁸ http://www.spss.com
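The two selection criteria for test systems (at least 40 classes per version, at least 10 versions per project) can be sketched as a small filter. The order of application is an assumption of this sketch: versions that are too small are dropped first, and a project is kept only if enough versions remain. The project names and class counts below are invented.

```python
def select_test_systems(projects, min_classes=40, min_versions=10):
    """Keep versions with enough classes, then projects with enough versions.

    projects: dict mapping project name -> dict of version label -> class count.
    Returns a dict of the same shape containing only the retained data.
    """
    selected = {}
    for name, versions in projects.items():
        kept = {v: n for v, n in versions.items() if n >= min_classes}
        if len(kept) >= min_versions:
            selected[name] = kept
    return selected


if __name__ == "__main__":
    # Hypothetical input, not data from the study.
    projects = {
        "alpha": {f"1.{k}": 50 + k for k in range(12)},
        "beta": {f"0.{k}": 80 for k in range(8)},          # too few versions
        "gamma": {f"2.{k}": 30 + 2 * k for k in range(12)},  # early versions too small
    }
    for name, versions in select_test_systems(projects).items():
        print(name, len(versions))
```

Filtering versions before counting them means a project with many small early releases can still qualify on the strength of its later releases, provided at least ten of them pass the size threshold.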
92 -alidity 0)aluation
1@ Conclusion )alidity1 assures a statistical relation with
sufficient significance $etween the different 8uality models
QM applied and the 8uality )alues Q
0
and trend conclusions #
o$ser)ed2 'e are confident that the applied statistical methods
are appropriateO their assumptions are fulfilled" e)en though we
do not include a detailed discussion for the sa!e of $re)ity2
#he data set for Q1 ?*2; )ersions@ is suita$ly large to get
significant results2 <or Q2" we still o$tain significant results
despite the rather limited amount of data ?11 pro/ects@2
2@ Internal )alidity1 of the actual e.periment assures that
only the )arying 8uality models QM may cause the effects on
the o$ser)ed )alues Q
0
and trend conclusions # 2 4s we ha)e
a straightforward e.periment design5no humans in)ol)ed" no
time dependency" only two independent )aria$les5we ha)e
full control o)er the e.periment2
*@ Construct and e.ternal )alidity1 are a$out generali&ing
the e.perimental design to the theory $ehind the e.periment
and to industrial practice2 Software metrics and 8uality models
are indeed used for 8uality assessment and their conclusions
are input for 8uality management acti)itiesO $oth are rele)ant
in industry2 >ur metrics" 8uality models" trend conclusions"
and systems o$ser)ed are good representati)es of industrial
practice2
As discussed in the introduction, the available metrics and
related theory have been validated in empirical studies. Selected
metrics are integrated in state-of-the-art development tools. We
did not vary the metrics tool used, as our previous study [27]
showed deviations in measured values for the same metrics and
software. Using an alternative but correct metrics tool should
not have an impact on the results.
Our baseline quality model (I) is even based on an ISO
standard and has already been used in several industrial
projects. The models (II, III) are minor variations thereof.
The two regression-based quality models (IV, V) are taken
from literature. To avoid taking them out of context, we
carefully checked and guaranteed all preconditions, e.g., the
programming language. However, a threat to validity is the
assumption that other quality models deliver similar results.
Concluding trends out of a series of assessments has
been documented in the FAMOOS Handbook of Reengineering [34].
The selected test systems are non-trivial, even though they
are open-source. A threat to construct validity is the
assumption that the programming language has no impact and that
our findings are transferable to non-Java programs.
IV. ASSESSMENT OF HYPOTHESES
A. Measurement and Data Collection
We use the data collection process and tools discussed in
Section III-C to collect the data from the different versions of
the test systems. The database contains the different metrics
values M_l, Q_n, and Q′, available for further analysis
with MS Excel and SPSS. We collected data from all 11
projects with 343 versions.
B. Analysis and Interpretation
1) Descriptive Statistics: We collected 17 metrics values
(M_l, l = 1...17) for each of the 194,435 classes and
interfaces of the test systems. Based on this data, we
calculated five different quality values per class/interface using
our quality models (Q_n, n = I...V). This data was then
aggregated to five different quality values Q′_n, n = I...V,
on system level for the 11 systems in all 343 versions. We
summarize the collected data using descriptive statistics,
scatter plots, etc., but exclude them here for brevity.
2) Data Reduction and Transformation: We removed versions
which do not fulfill the requirement regarding the minimum
number of classes, i.e., in project Checkstyle all versions prior
to 3.0. Further, we removed multiple copies of version 1.2.14
(1.2.14-maven and 1.2.14-updatesite) in project log4j. In
JasperReports, we removed versions 1.3-alpha1 and 1.3-alpha5,
which have, compared to the versions coming before and after
them, twice as many classes. Here, the developers seemed to
have reorganized their workspace. In project Struts, we removed
versions 1.3 and 1.3.1 because they did not compile. The
versions have fewer than half of the classes of the versions
before and after them. Also, we removed version 0.5 since we
think that it is experimental, and version 2.1.2 since it is not
part of the 1.x project line, and we could so far not analyze the
remaining versions of the 2.x project line. For project Xerces2,
we removed versions 1.0.0 and 1.0.1, since they have three times
the number of the classes of the versions thereafter. After
reduction, data from 328 versions remained for evaluation and
analysis.
Since the number of versions per project P_i depends on
the project i, we need to normalize the version numbers of the
different projects between 0 and 1 in order to make them
comparable. The first (oldest) version of a project gets the
number 0; the latest version gets the value 1. For project i,
the version number j is normalized by

\[ \hat{j} = \frac{j - \min(P_i)}{\max(P_i) - \min(P_i)}. \qquad (15) \]

Even though the quality values Q′ are already between
0 and 1 by definition, they are project relative and thus on
a project specific scale with a project specific mean. We use
standardization for calculating the standardized values
from the project specific distribution of Q′ to make them
comparable. The project specific distribution is characterized by
the project specific mean (arithmetic mean/average,
\bar{Q}'_{n,i}) and the project specific standard deviation
(\sigma_{n,i}). Then the standardized quality Q′ computed by
quality model n for a version j in a project i is

\[ \hat{Q}'_{n,i,j} = \frac{Q'_{n,i,j} - \bar{Q}'_{n,i}}{\sigma_{n,i}}. \qquad (16) \]

3) Hypothesis Testing: To answer our first research question
Q1, we assess the associated hypothesis and calculate
for all projects the correlation (Pearson correlation r) between
the System Values 15% quality (I) and the other four
qualities (II–V). We assume direct correlation for r ≥ 0.5, and
indirect correlation for r ≤ -0.5. The results are summarized
in Table II. In 15 cases, the quality models (II–V) correlate
directly with System Values 15% (I). In 29 cases, we cannot
confirm a direct correlation, and 10 of these cases even show an
indirect correlation. This allows us to reject H_0 and answer the
research question Q1 by: There are principal differences in
the software quality Q′ measured by the same metrics
applied to the same test systems S_{i,j} and aggregated
with different quality models QM ∈ QM_{I...V}.

TABLE II: Correlation between quality Q′ of systems S_{i,j}
computed with different quality models QM ∈ QM_{I...V}.
Pearson's correlation, all correlations significant at the
0.01-level.

Project             r(I,II)  r(I,III)  r(I,IV)  r(I,V)   Σ
Avalon               0.9      0.5      -0.88    -0.88
Checkstyle           0.87     0.94     -0.68     0.87
JasperReports        0.69     0.3       0.55     0.76
jEdit                0.81    -0.53      0.05    -0.62
log4j                0.43    -0.28     -0.19    -0.41
Lucene               0.77    -0.73      0.09    -0.63
Oro                  0.41     0.1      -0.86    -0.75
PMD                  0.29     0.45     -0.93     0.98
Struts               0.72     0.71     -0.19     0.32
Tomcat6              0.44     0.34      0.45    [...]
Xerces2              0.85     0.89      0.09    [...]
Correlated           7        4         1        3       15
Not correlated       4        5         6        4       19
Indirectly corr.     0        2         4        4       10

The qualities of the individual versions are meaningless
by themselves; they are further abstracted to (trend)
interpretations. Hence, we want to answer question Q2, i.e., do
different quality values lead to different trend conclusions,
i.e., do the differences in the quality model actually matter.
Figures 4a, 4b, and 4c show three examples of the trends we
observed in the 11 analyzed projects. Each diagram displays the
normalized versions on the x-axis, and the standardized quality
values on the y-axis. For project Avalon, the quality
models System Values 15%, All Values 15%, and All Rank 15%
show a generally improving trend, while Regression A and B
show a deteriorating trend. But for project JasperReports, all
models show an improving trend, while for project Struts,
all show a deteriorating trend.
In order to answer Q2, we calculate linear regression
functions for all quality models in each of the test systems.
The results are summarized in Table III. We provide a_{n,i}
and b_{n,i}, the coefficients for the linear regression
functions, as well as the coefficient of determination (r²) and
the observed F-value (F-test, p = 0.01) for significance testing.
We see that most of the calculated regression lines are
significant (observed F-value larger than the critical F-value).
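The normalization of Eq. (15), the standardization of Eq. (16), and the per-project linear regression can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in the study (which relied on MS Excel and SPSS): the function names and the sample quality values are hypothetical, versions are assumed to be numbered consecutively with at least two versions per project, and the population standard deviation is assumed for the project specific standard deviation.

```python
from statistics import mean, pstdev


def normalize_versions(versions):
    """Map version numbers of one project to [0, 1] as in Eq. (15)."""
    lo, hi = min(versions), max(versions)
    return [(j - lo) / (hi - lo) for j in versions]


def standardize(values):
    """Standardize quality values of one project and model as in Eq. (16)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(q - mu) / sigma for q in values]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a * x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# Hypothetical quality values Q' of one project under one model, oldest first.
q_values = [0.40, 0.35, 0.30, 0.20, 0.15]
x = normalize_versions(range(1, len(q_values) + 1))
y = standardize(q_values)
a, b = fit_line(x, y)
```

Standardizing before fitting puts the slopes of different projects and quality models on a common scale, which is what makes a single slope threshold for trend conclusions meaningful.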
Regardless of the significance, we may formalize the trend
T_{n,i} using the slope a_{n,i} of the linear regression
models: T_{n,i} is improving (deteriorating) iff
a_{n,i} ≤ -0.5 (≥ 0.5), and constant otherwise.
Table IV gives all conclusions drawn from the quality values
over project versions and the quality models. As already
indicated by the diagrams in Figure 4, the conclusions largely
depend on the quality models. Some quality models seem to
agree, at least for certain projects, e.g., System Values 15%,
All Values 15%, and All Rank 15% for Avalon, JasperReports,
and Xerces2; Regression A and B for Avalon, Checkstyle,
JasperReports, etc. Yet some conclusions [...] log4j.

[Figure 4: panels (a) Avalon, (b) JasperReports, and (c) Struts
plot the standardized quality values (y-axis) over the normalized
versions (x-axis, 0.00 to 1.00) for the quality models I–V; panel
(d), "Conclusions per Prediction Model", shows the number of
improving, constant, and deteriorating conclusions per model.]
Fig. 4: (a-c) Selected projects showing different trends
for the quality prediction models. (d) Frequency of
different trend conclusions of the quality models I–V.

TABLE IV: Conclusions T_{n,i} for the projects i = 1...11
according to the quality models I–V. Quality is
improving (+), constant ( ), or deteriorating (-).

Project: conclusions (models I–V)
Avalon: + + + - -
Checkstyle: + - - + +
JasperReports: + + + + +
jEdit: + + - -
log4j: + + - - -
Lucene: + + - +
Oro: + + - -
PMD: + - - + -
Struts: - - - + -
Tomcat6: + - + -
Xerces2: + + + +
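The trend rule can be sketched as a small classification function. This is a minimal sketch under an assumption that should be stated loudly: the sign convention (a negative slope of the standardized quality value means improving quality) is inferred from the Avalon slopes in Table III together with the discussion of Figure 4, and the function name and the non-strict comparison are illustrative choices.

```python
def classify_trend(slope, threshold=0.5):
    """Trend conclusion from a regression slope.

    Assumption: lower standardized quality values are better, so a
    clearly negative slope is read as an improving trend.
    """
    if slope <= -threshold:
        return "improving"
    if slope >= threshold:
        return "deteriorating"
    return "constant"


# Slopes a of project Avalon for the quality models I-V (Table III).
avalon_slopes = [-2.8, -2.52, -1.38, 2.79, 2.8]
avalon_trends = [classify_trend(a) for a in avalon_slopes]
```

For Avalon this yields an improving trend for models I to III and a deteriorating trend for models IV and V, which matches the description of Figure 4a.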
We compare the trend due to the different quality
models (I–V) using a one-way repeated measures ANOVA and
observe effective differences for the models: Wilks' Λ = 0.334,
F(4, 7) = 3.483, p < 0.1, multivariate partial η² = 0.67. The
Friedman test confirms statistically significant differences of
the quality models: χ²(4, n = 11) = 15.05, p < 0.01.
This lets us reject H_0 for the second research question Q2
and conclude: The differences among the quality models even
lead to different conclusions.
Finally, we computed the correlation coefficients of the
trends for all pairs of quality models. It showed that the
models I and II (III and V) are positively correlated,
r = 0.625 (r = 0.637), significant at the 0.05-level, 2-tailed.
The correlation of the models I and II supports that outlier
thresholds can be defined on value ranges relative to a
(sufficiently large) system and by using global value ranges.
The correlation of III and V and, even more so, the lack of a
correlation of the models IV and V, both validated regression
models for defects, come as some surprise and need further
studies, especially in the light of the relatively small
statistical basis of only 11 samples (projects) for the
statistics answering Q2.
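The reported multivariate statistics can be cross-checked against each other. The sketch below assumes a design with a single within-subject factor, for which the F-value follows directly from Wilks' Λ and the degrees of freedom, and partial η² equals 1 - Λ; the function names are hypothetical.

```python
def wilks_to_f(wilks_lambda, df_hypothesis, df_error):
    """F-value implied by Wilks' Lambda for a single within-subject factor."""
    return (1.0 - wilks_lambda) / wilks_lambda * df_error / df_hypothesis


def partial_eta_squared(wilks_lambda):
    """Multivariate partial eta squared for a single within-subject factor."""
    return 1.0 - wilks_lambda


f_value = wilks_to_f(0.334, 4, 7)      # close to the reported F(4, 7) = 3.483
eta_sq = partial_eta_squared(0.334)    # close to the reported 0.67
```

The small difference between the recomputed F-value and the reported one is what one would expect from Λ being rounded to three digits.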
V. CONCLUSIONS AND FUTURE WORK
Do different alternative software quality prediction
models calculate comparable results for the same project? No,
we could not show a significant correlation between the quality
values of different quality models regardless of the analyzed
projects. Does this matter? Yes, we could show that the
different quality models applied to the same project lead to
different quality trend judgments, hence, to different
conclusions. The obtained results are interesting for researchers
and practitioners alike. Researchers obviously have to be careful
when generalizing the validity of software quality prediction
models. Practitioners need a very good understanding of
the software quality prediction models they apply (instead of
using [...]
[...] question. It would also be interesting to analyze more
thoroughly why the quality models lead to the observed
differences in assessments and conclusions.
TABLE III: Regression equations (reg_{n,i} = a_{n,i} j + b_{n,i})
per project and quality model. Coefficient of determination of
correlation (r²) and observed F-values (F). Critical F-values
(F-crit.) for one degree of freedom and p < 0.01. Values marked
with *: observed F-values are significant.

Project (n, F-crit.)        Model   a_{n,i}  b_{n,i}  r²     F
Avalon (11, 10.56)          I        -2.8     1.4     0.86    56.7*
                            II       -2.52    1.26    0.7     20.8*
                            III      -1.38    0.69    0.21     2.4
                            IV        2.79   -1.4     0.86    53.7*
                            V         2.8    -1.4     0.86    57.1*
Checkstyle (17, 8.68)       I        -0.8     0.17    0.1      1.69
                            II       -1.33    1.29    0.38     9.3*
                            III      -0.7     0.91    0.05     0.8
                            IV        0.08   -0.64    0.01     0.1
                            V        -0.15   -0.45    0.08     1.2
JasperReports (21, 8.18)    I        -3.03    1.51    0.88   141.5*
                            II       -2.46    1.23    0.58    26.8*
                            III      -1.44    0.72    0.2      4.7
                            IV       -1.9     0.95    0.35    10.1*
                            V        -2.79    1.4     0.75    57.3*
jEdit (53, 7.16)            I        -3.13    1.57    0.87   330*
                            II       -2.92    1.46    0.75   154.6*
                            III       1.78   -0.89    0.28    19.7*
                            IV        0.54   -0.27    0.03     1.4
                            V         2.51   -1.25    0.55    63.3*
log4j (50, 7.19)            I        -1.42    0.73    0.18    10.5*
                            II       -2.44    1.24    0.59    68.7*
                            III       2.77   -1.39    0.71   116.2*
                            IV        1.14   -0.5     0.15     8.4*
                            V         2.65   -1.3     0.61    73.9*
Lucene (19, 8.4)            I        -2.4     1.2     0.56    21.8*
                            II       -3.04    1.52    0.9    157.6*
                            III       1.36   -0.68    0.18     3.8
                            IV       -0.93    0.47    0.08     1.6
                            V         3      -1.5     0.88   127.7*
Oro (13, 9.65)              I        -2.61    1.31    0.72    28.1*
                            II        0.08   -0.04    0        0
                            III      -1.34    0.67    0.19     2.6
                            IV        2.92   -1.46    0.9     99.7*
                            V         2.82   -1.41    0.84    57.2*
PMD (19, 8.4)               I       -10.31    2.59    0.89   134.3*
                            II       -3.68    0.4     0.18     3.7
                            III      -1.98    0.21    0.27     6.4
                            IV       10.43   -1.15    0.8     68.4*
                            V        -1.81   -0.57    0.84    87.8*
Struts (66, 7.05)           I         1.78   -1.33    0.34    32.7*
                            II        1.79   -1.04    0.22    18.3*
                            III       2.97   -1.72    0.5     63.3*
                            IV       -1.02    0.47    0.12     8.7*
                            V         2.19   -0.97    0.35    35.1*
Tomcat6 (19, 8.4)           I        -0.71    0.35    0.05     0.87
                            II        2.15   -1.07    0.45    14*
                            III      -2.96    1.48    0.85   100.2*
                            IV        1.94   -0.97    0.37     9.9*
                            V        -2.87    1.44    0.81    70.7*
Xerces2 (40, 7.35)          I        -3.04    1.61    0.72    96.1*
                            II       -3.32    1.73    0.92   411.1*
                            III      -3.19    1.68    0.84   204.4*
                            IV        0.41   -0.24    0.01     0.5
                            V        -0.24    0.06    0.01     0.2
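For a simple linear regression with one predictor, the observed F-value is tied to the coefficient of determination and the number of data points, so the tabulated values can be sanity-checked. The sketch below uses the standard relation F = r²(n - 2) / (1 - r²); the function names are hypothetical, and the input values are those of project Avalon under quality model I.

```python
def f_from_r_squared(r_squared, n):
    """Observed F-value (1 and n - 2 degrees of freedom) of a simple linear regression."""
    return r_squared * (n - 2) / (1.0 - r_squared)


def is_significant(f_observed, f_critical):
    """A regression line counts as significant if F exceeds the critical F-value."""
    return f_observed > f_critical


# Avalon, quality model I: r^2 = 0.86, n = 11, F-crit. = 10.56.
f_avalon = f_from_r_squared(0.86, 11)
avalon_significant = is_significant(f_avalon, 10.56)
```

The recomputed value lies slightly below the tabulated 56.7, which is plausible because r² is only given to two digits.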
APPENDIX
Table V shows alternative conclusions T_{n,i} for projects
i = 1...11 according to the quality model I. We used
System Values 5%, 10%, 20%, 30%, 40%, and 50%
thresholds in comparison to the 15% threshold
discussed in Section III.

TABLE V: Conclusions T_{n,i} for the projects i = 1...11
according to the quality model I with different threshold
values. Quality is improving (+), constant ( ), or deteriorating
(-).

Thresholds: 5%, 10%, 15%, 20%, 30%, 40%, 50%
Avalon: + + + + + + +
Checkstyle: + + + + -
JasperReports: + + + + + + +
jEdit: + + + + + + +
log4j: + + + + + + +
Lucene: + + + + + + +
Oro: + + + + + +
PMD: + + + + + + +
Struts: - - - - - - -
Tomcat6: + + + + + +
Xerces2: + + + + + + +
Improving: 10 10 10 9 9 9 8
Constant: 0 0 0 1 0 1 2
Deteriorating: 1 1 1 1 2 1 1

Figure 5 shows for project Struts different trends for
quality prediction model I using System Values 5%, 10%,
20%, 25%, 30%, 35%, 40%, 45% and 50% thresholds (gray)
in comparison to the 15% threshold (black), as discussed in
Section III. It is visible that despite the different thresholds the
trend lines follow a common trend. This observation could
also be made in the other projects being part of this study.

[Figure 5: plot titled "Struts (5% - 50%, stepsize 5%)"; y-axis
from -4.00 to 4.00.]
Fig. 5: Different trends for project Struts using quality model
I with different threshold values.

ACKNOWLEDGMENT
The authors would like to thank the Knowledge foundation
for financing our research with the project, "Validation of
metric-based quality control", 2005/0218, and Applied
Research in System Analysis AB (ARiSA AB, http://www.arisa.se)
for providing us with the VizzAnalyzer tool.
REFERENCES
[1] A. L. Baker, J. M. Bieman, N. Fenton, D. A. Gustafson,
A. Melton, and R. Whitty, "A philosophy for software
measurement," Journal of Systems and Software, vol. 12, no. 3,
pp. 277–281, 1990. [Online]. Available:
http://www.sciencedirect.com/science/article/B6V0N-49991S7-6W/2/257f632fe4f8878b69026f4ba1bbfce8
[2] ISO, "ISO/IEC 9126-1 "Software engineering - Product Quality - Part
1: Quality model"," 2001.
[3] S. R. Chidamber and C. F. Kemerer, "A Metrics Suite for Object-
Oriented Design," IEEE Transactions on Software Engineering, vol.
20, no. 6, pp. 476–493, 1994.
[4] B. Henderson-Sellers, Object-oriented metrics: measures of complexity.
Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1996.
[5] W. Li and S. Henry, "Maintenance metrics for the object oriented
paradigm," in IEEE Proceedings of the First International Software
Metrics Symposium, May 1993, pp. 52–60.
[6] R. Bandi, V. Vaishnavi, and D. Turk, "Predicting maintenance
performance using object-oriented design complexity metrics," Software
Engineering, IEEE Transactions on, vol. 29, no. 1, pp. 77–87, Jan. 2003.
[7] V. R. Basili, L. C. Briand, and W. L. Melo, "A Validation of Object-
Oriented Design Metrics as Quality Indicators," IEEE Trans. Softw. Eng.,
vol. 22, no. 10, pp. 751–761, 1996.
[8] T. Gyimóthy, R. Ferenc, and I. Siket, "Empirical validation of
object-oriented metrics on open source software for fault
prediction," IEEE Trans. on Software Engineering, vol. 31, no. 10, pp.
897–910, 2005.
[9] R. Subramanyam and M. S. Krishnan, "Empirical Analysis of CK
Metrics for Object-Oriented Design Complexity: Implications for Software
Defects," IEEE Trans. Softw. Eng., vol. 29, no. 4, pp. 297–310, 2003.
[10] P. Yu, T. Systä, and H. Müller, "Predicting fault-proneness using OO
metrics. An industrial case study," Software Maintenance and
Reengineering, 2002. Proceedings. Sixth European Conference on,
pp. 99–107, 2002.
[11] International Organization for Standardization and the International
Electrotechnical Commission, "ISO/IEC 14598-1, Information
Technology–Software Product Evaluation; Part 1: Overview," 1996.
[12] L. Briand, K. El Emam, and S. Morasca, "Theoretical and empirical
validation of software product measures," ISERN-95-03, International
Software Engineering Research Network, Tech. Rep., 1995.
[13] N. Fenton, "Software metrics: theory, tools and validation," Software
Engineering Journal, vol. 5, no. 1, pp. 65–78, Jan 1990.
[14] B. Kitchenham, S. L. Pfleeger, and N. Fenton, "Towards a
framework for software measurement validation," IEEE Transactions
on Software Engineering, vol. 21, no. 12, pp. 929–944, 1995.
[15] J. Daly, J. Miller, A. Brooks, M. Roper, and M. Wood, "Structured
interviews on the object-oriented paradigm," CiteSeerX - Scientific
Literature Digital Library and Search Engine
[http://citeseerx.ist.psu.edu/oai2] (United States), Tech.
Rep., 1995. [Online]. Available: http://citeseer.ist.psu.edu/157124.html
[16] L. Prechelt, B. Unger, M. Philippsen, and W. Tichy, "A controlled
experiment on inheritance depth as a cost factor for code maintenance,"
Journal of Systems and Software, vol. 65, no. 2, pp. 115–126,
2003. [Online]. Available:
http://www.sciencedirect.com/science/article/B6V0N-48FCJ09-C/2/0dfd965549a6eea3af4d6296d529ba42
[17] N. Wilde, P. Matthews, and R. Huitt, "Maintaining object-oriented
software," Software, IEEE, vol. 10, no. 1, pp. 75–80, Jan 1993.
[18] K. E. Emam, S. Benlarbi, N. Goel, and S. N. Rai, "The Confounding
Effect of Class Size on the Validity of Object-Oriented Metrics," IEEE
Trans. Softw. Eng., vol. 27, no. 7, pp. 630–650, 2001.
[19] S. C. Misra, "Modeling design/coding factors that drive
maintainability of software systems," Software Quality Control, vol. 13,
no. 3, pp. 297–320, 2005.
[20] I. Samoladas, I. Stamelos, L. Angelis, and A. Oikonomou, "Open source
software development should strive for even greater code
maintainability," Commun. ACM, vol. 47, no. 10, pp. 83–87, 2004.
[21] M. M. T. Thwin and T.-S. Quah, "Application of neural networks for
software quality prediction using object-oriented metrics," J. Syst.
Softw., vol. 76, no. 2, pp. 147–156, 2005.
[22] L. Briand, C. Bunse, and J. Daly, "A controlled experiment for evaluating
quality guidelines on the maintainability of object-oriented designs,"
Software Engineering, IEEE Transactions on, vol. 27, no. 6, pp. 513–
530, Jun 2001.
[23] C. van Koten and A. Gray, "An application of bayesian
network for predicting object-oriented software maintainability,"
Information and Software Technology, vol. 48, no. 1, pp. 59–67,
2006. [Online]. Available:
http://www.sciencedirect.com/science/article/B6V0B-4FXHJ92-3/2/6d12fbaec9e3916907f8d2fff0427502
[24] Y. Zhou and H. Leung, "Predicting object-oriented software
maintainability using multivariate adaptive regression splines," J. Syst.
Softw., vol. 80, no. 8, pp. 1349–1361, 2007.
[25] K. D. Welker, P. W. Oman, and G. G. Atkinson, "Development
and application of an automated source code maintainability
index," Journal of Software Maintenance, vol. 9, no. 3, pp. 127–159,
1997.
[26] J. A. McCall, P. G. Richards, and G. F. Walters, "Factors in Software
Quality," NTIS, NTIS Springfield, VA, Tech. Rep. Volume I, 1977,
NTIS AD/A-049 014.
[27] R. Lincke, J. Lundberg, and W. Löwe, "Comparing software
metrics tools," in ISSTA '08: Proceedings of the 2008 international
symposium on Software testing and analysis. New York, NY, USA:
ACM, 2008, pp. 131–142.
[28] W. Li and S. Henry, "Object-Oriented Metrics that Predict
Maintainability," Journal of Systems and Software, vol. 23, no. 2,
pp. 111–122, 1993.
[29] J. M. Bieman and B. Kang, "Cohesion and Reuse in an Object-Oriented
System," in SSR '95: Proceedings of the 1995 Symposium on
Software reusability. New York, NY, USA: ACM Press, 1995, pp.
259–262.
[30] M. Hitz and B. Montazeri, "Measure Coupling and Cohesion in Object-
Oriented Systems," in Proceedings of International Symposium on
Applied Corporate Computing (ISAAC'95), October 1995, pp. 24, 25,
274, 279.
[31] R. Lincke, "Validation of a standard- and metric-based software quality
model – creating the prerequisites for experimentation," Licentiate
Thesis, MSI, Växjö University, Sweden, Apr 2007.
[32] R. Lincke and W. Löwe, "Compendium of Software Quality
Standards and Metrics," http://www.arisa.se/compendium/, 2005.
[33] H. Barkmann, R. Lincke, and W. Löwe, "Quantitative Evaluation
of Software Quality Metrics in Open-Source Projects," in Accepted
for publication at QuEST '09: Proceedings of The 2009 IEEE
International Workshop on Quantitative Evaluation of large-scale
Systems and Technologies, Bradford, UK. IEEE, 2009.
[34] H. Bär, M. Bauer, O. Ciupke, S. Demeyer, S. Ducasse, M.
Lanza, R. Marinescu, R. Nebbe, O. Nierstrasz, M. Przybilski, T.
Richner, M. Rieger, C. Riva, A. Sassen, B. Schulz, P. Steyaert, S.
Tichelaar, and J. Weisbrod, "The FAMOOS Object-Oriented
Reengineering Handbook,"
http://www.iam.unibe.ch/~famoos/handbook/, Oct. 1999.
