
Multi-Perspective Change Impact Analysis Using Linked Data of Software Engineering


Chengcheng Wan, Zece Zhu, Yuchen Zhang, Yuting Chen
Dept. of Computer Science and Technology
Shanghai Jiao Tong University
Shanghai 200240, China
{wancc1995, chenyt}@sjtu.edu.cn

ABSTRACT
Change impact analysis plays an important role in software maintenance and evolution. However, existing research mostly focuses on a single type of artifact. Software development is usually accompanied by various types of software artifacts, such as requirement documents, software architectures, test cases, and source code, which calls for a much more comprehensive change impact analysis. This paper presents a novel approach to multi-perspective change impact analysis that is able to address heterogeneous software artifacts. The essential idea of the approach is (1) to adopt the semantic web to automatically construct ontology-based software engineering linked data, which links requirements, classes, code, bug reports, commits, developers, test cases and others, (2) to build a weighted change impact matrix/graph using the dependency features extracted from the linked data, and (3) to follow a change impact propagation algorithm to analyze the overall change impacts. We have conducted experiments on two open source projects (HtmlUnit and OpenRocket) to evaluate our approach. The experimental results show that our approach achieves better F-measure and stability than existing multi-perspective change impact analysis approaches.

CCS Concepts
• Software and its engineering → Software evolution; Maintaining software.

Keywords
Software engineering linked data; multi-perspective change impact analysis; propagation assessment.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
Internetware '16, September 18 2016, Beijing, China
© 2016 ACM. ISBN 978-1-4503-4829-4/16/09 $15.00
DOI: http://dx.doi.org/10.1145/2993717.2993729

1. INTRODUCTION
During software development and maintenance, software artifacts are often changed to adapt to improved solutions or to changes in requirements, environments, and resources. A change to one software artifact usually impacts others, directly or indirectly, producing ripple effects (i.e., a sequence of follow-up changes).

Researchers have proposed many approaches to software change impact analysis and to identifying the potential impacts of partial changes on the whole system [1]. However, most previous impact analysis techniques focus on a single type of software artifact, such as source code or requirement models [2].

Meanwhile, changes spread across all perspectives and affect different types of software artifacts [3]. Different kinds of changes also have different effects on the artifacts. Multi-perspective change impact analysis aims to assist in maintaining the consistency of the heterogeneous artifacts by enabling developers to understand and retrace the propagation of changes and their impacts across the entire software system [4]. However, it faces several challenges caused by interactions among software artifacts and the use of different perspectives [2][4]: varying degrees of formalization, information inconsistency, and data incompleteness.

To tackle these challenges, researchers have proposed several solutions for multi-perspective change impact analysis [5-8]. To address the data challenges, they analyze different software artifacts with different techniques to extract information and convert it into a unified data format. They can then perform change impact analysis with rule-based methods, relationship tracing, and other methods. However, these approaches have several disadvantages. First, the types of artifacts are limited. These approaches usually select two types of artifacts from requirement models, design models, and source code, but exclude the others, such as code commit records and bug reports. Second, their change impact analysis methods are specific, lacking a unified framework for a variety of artifacts. They only consider direct dependencies among requirements, design and source code, omitting hidden and implicit dependencies. They also ignore the fact that different dependencies have different change impacts.

In this paper, we propose a general approach to multi-perspective change impact analysis using linked data in software engineering. It analyzes internal and external associations of requirements, classes, program files, bug reports, code commit history, developer information and other artifacts, and constructs linked data based on the software engineering ontology. It then extracts dependency features from these linked data, and calculates the change impact degree using a random walk algorithm to achieve high accuracy.

2. RELATED WORK
In the past years, many impact analysis approaches have been put forward, using dependency analysis, mining of software repositories, information retrieval, probabilistic approaches or rule-based approaches. Most of this research still focuses solely on single artifacts [2][4], with source code accounting for 65%, architecture models for 11%, and requirement models for 7%. The major limitation of current research is the lack of impact analysis support for the interaction of heterogeneous software artifacts [3]. Hence, these approaches fail to analyze the consequences of changes on different types of software artifacts, which constrains their usability in practice.

Multi-perspective change impact analysis copes with frequent software changes. However, most of the existing multi-perspective
approaches are focused on two or three perspectives. Hammad et al. propose an approach to analyzing the change propagation between source code and UML classes [5]. Khan et al. analyze the change propagation between requirements and architectural components [6]. Xiao et al. propose an approach to analyzing the change impact between business process specifications and source code, to estimate the cost of a business process change in a service-oriented business application [7]. Lehnert et al. present an approach to change impact analysis among requirement models, architectural models and source code, using a set of predefined impact propagation rules [8].

In summary, there are few studies on change impact analysis across a variety of artifacts. The existing studies address only limited artifact types and consider only the direct dependencies among requirements, design and source code. Their change impact analysis methods are specific, and a unified framework for a variety of artifacts is lacking. Additionally, most impact analysis approaches do not distinguish different types of dependencies and the different effects caused by them. This results in either too many false positives being detected or too many actual impacts being missed.

3. CONSTRUCTION OF SOFTWARE ENGINEERING LINKED DATA
In order to analyze change impact among heterogeneous software artifacts, we introduce the semantic web into software engineering to build fine-grained semantic links among multiple artifacts, following three steps.

Step 1: Build Software Engineering Ontology.
We propose a rule-based mapping method to build the corresponding ontology from the metadata of a software repository, to allow for a multi-perspective analysis across different views on software. It extracts metadata from the relational database, including table names, column names, primary keys, foreign keys, and integrity constraints, and then analyzes primary keys, foreign keys, and other information. Relational mapping rules are adopted to create new concepts, concept levels, concept properties, and concept relationships. Then multiple ontologies from different data sources are merged into a whole, using an existing ontology alignment method [9].

Step 2: Extract Linked Data.
Guided by the software engineering ontology, linked data is extracted from structural data stored in software repositories. Then, as concept instances, they are mapped to the concepts in the software engineering ontology. Linked data of software engineering consists of software elements and their relations. Fig. 1 illustrates an example of linked data of software engineering.

Step 3: Recover Missing Linked Data.
Unfortunately, many organizations have ineffective traceability practices in place, largely because of poor communication and time-to-market pressure. Therefore some important links, such as requirement-code links, are missing or lost in the software repository. Using natural language processing and information retrieval technologies, we propose a generic method to recover these links between two software elements [10], combining three features: synonyms, verb-object phrases, and structural information. As a result, recovered links complement the software engineering linked data. As illustrated in Fig. 1, there are two recovered links between a bug instance and a commit instance, indicated by dashed lines.

[Figure: a graph of Person, Commit, and Bug instances (developer e-mail addresses, commit hashes, and bug IDs such as #Bug_41508) connected by links.]
Figure 1. A sample of linked data of software engineering.

4. CHANGE IMPACT ANALYSIS
Any intended change may have an impact, which can be propagated across heterogeneous software artifacts through different kinds of relations. We propose a novel multi-perspective change impact analysis approach to assessing the change impact on the whole software system, using software engineering linked data. First, we predict change impact propagation in one step, i.e., the direct impact between two elements. For this, we extract dependency features between these elements from the linked data to calculate the impact degree. Then, a random walk algorithm is designed to direct change impact propagation in multiple steps, i.e., across multiple heterogeneous elements.

4.1 Change Impact Propagation in One Step

Table 1. Dependency features for change impact analysis.

Element-Element | Relations                               | Features
----------------|-----------------------------------------|---------
Class-Class     | Direct relations                        | Call, Inheritance, Brotherhood (inherit from the same super class), CodeSimilarity (with similar source code), InterfaceImplementation
Class-Class     | Indirect relations through developers   | Co-author (developed by the same person)
Class-Class     | Indirect relations through requirements | Co-req (implement the same requirement)
Class-Class     | Indirect relations through commit log   | Co-add, Co-change, Co-delete
Class-Class     | Indirect relations through bug report   | SimilarError (in the same bug report or with similar bug)
Req-Req         | Direct relations                        | Include, Extend, Generalization
Req-Req         | Indirect relations through developers   | Co-author
Req-Req         | Indirect relations through class        | Co-design (implemented by the same classes)
Req-Req         | Indirect relations through bug report   | SimilarError
Class-Req       | Direct relations                        | Design (the class is designed for the requirement)
Class-Req       | Indirect relations through developers   | Co-author
Class-Req       | Indirect relations through bug report   | SimilarError
Class-Code      | Direct relations                        | Implementation (the program file implements the class)
Class-Code      | Indirect relations through developers   | Co-author
Class-Code      | Indirect relations through bug report   | SimilarError
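
As a rough illustration of how a few of the class-class features in Table 1 might be derived from linked data and combined into a one-step impact degree, the following Python sketch works on a handful of hypothetical triples. The relation names (`calls`, `authoredBy`, `changes`, `mentions`), the element names, and the weight values are all assumptions made for illustration; they are not taken from the paper's ontology, and the paper selects its weights from experiments on real data. The score is a plain weighted sum of 0/1 features.

```python
from collections import defaultdict

# Hypothetical linked-data triples (subject, relation, object); names are illustrative.
TRIPLES = [
    ("ClassA", "calls", "ClassB"),
    ("ClassA", "authoredBy", "dev1"),
    ("ClassB", "authoredBy", "dev1"),
    ("ClassC", "authoredBy", "dev2"),
    ("commit1", "changes", "ClassA"),
    ("commit1", "changes", "ClassC"),
    ("bug7", "mentions", "ClassB"),
    ("bug7", "mentions", "ClassC"),
]

# Assumed feature weights in (0, 1]; illustrative values, not fitted ones.
WEIGHTS = {"Call": 0.9, "Co-author": 0.3, "Co-change": 0.6, "SimilarError": 0.4}


def index(triples):
    """Group triples as relation -> subject -> set of objects."""
    idx = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        idx[relation][subject].add(obj)
    return idx


def features(a, b, idx):
    """Return 0/1 dependency features for the class pair (a, b)."""

    def shared_by(relation):
        # True if some subject (a commit or a bug) points at both classes.
        return any(a in objs and b in objs for objs in idx[relation].values())

    authors = idx["authoredBy"]
    return {
        "Call": int(b in idx["calls"][a] or a in idx["calls"][b]),
        "Co-author": int(bool(authors[a] & authors[b])),
        "Co-change": int(shared_by("changes")),
        "SimilarError": int(shared_by("mentions")),
    }


def one_step_impact(a, b, idx, weights=WEIGHTS):
    """Weighted sum of the binary features for one pair of elements."""
    x = features(a, b, idx)
    return sum(weights[name] * value for name, value in x.items())


idx = index(TRIPLES)
for pair in [("ClassA", "ClassB"), ("ClassA", "ClassC"), ("ClassB", "ClassC")]:
    print(pair, features(*pair, idx), round(one_step_impact(*pair, idx), 2))
```

With these made-up weights the sum can exceed 1 (ClassA and ClassB share both a call and an author), so a real implementation would likely normalize or squash the score; how the paper does so is not reproduced here.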

In order to assess the direct change impact between two software elements, we identify proper dependency features from the constructed linked data model. Taking requirement, class and code as an example, their dependency features are listed in Table 1. Based on these features, we use a logistic regression model to calculate the change impact degree between two software elements. Different features contribute differently, and therefore their weights vary. Thus the change impact matrix FS is defined as

$$ FS = \sum_{i} w_i X_i \tag{1} $$

where $X_i$ denotes the value of a dependency feature, 0 or 1, indicating whether the dependency exists, and $w_i$, whose range is (0, 1], denotes the weight of the feature. Through experiments with real data, we can select the optimal weight for each feature.

Note that FS also represents a weighted change impact graph, with software elements as nodes and change impacts as edges.

4.2 Change Impact Propagation in Multiple Steps
The change will propagate further among heterogeneous software artifacts. Based on the theory of random walks on graphs [11], we design a change impact propagation algorithm in multiple steps.

A random walk on a graph is a special case of a Markov chain. In a Markov chain, the future state at time n+1 is based solely on the current state at time n and its respective outgoing edges. Similarly, the change impact degree of the (k+1)th step is based solely on the impact degree of the kth step and the structural context of each node in the FS graph.

Let MS(A, B) denote the change impact of element A on element B. We write a recursive equation for MS(A, B), where $MS_k(A, B)$ is its value at the kth step.

$$ MS_k(A, B) = \begin{cases} FS(A, B), & k = 0 \\ \dfrac{d}{|N(A)|} \sum_{i=1}^{|N(A)|} MS_{k-1}(N_i(A), B), & k > 0 \end{cases} \tag{2} $$

where $|N(A)|$ is the number of A's neighbors, $N_i(A)$ the ith neighbor of A, and d a decay factor during propagation. d is between 0 and 1, and here we set d = 0.8 [12].

This process is iterated using the above equation until $|MS_{k+1}(A, B) - MS_k(A, B)| < \varepsilon$, where $\varepsilon$ is a tolerance factor defined for convergence judgment. We take $MS(A, B) = MS_{k+1}(A, B)$ when $|MS_{k+1}(A, B) - MS_k(A, B)| < \varepsilon$. Here we set $\varepsilon = 0.001$ [12].

The multi-step change impact propagation algorithm is given as follows.

Algorithm 1. Multi-Step Change Impact Propagation
Input: A one-step change impact graph FS(V, E).
Output: A change impact matrix MS.
Method:
    obtain the transition matrix T;
    initialize MS_0 to be the FS matrix;
    k ← 0;
    do
        MS_{k+1} ← T · MS_k;
        diagonal elements of MS_{k+1} ← 1;
        k ← k + 1;
    while (max(|MS_k − MS_{k−1}|) ≥ ε);
    return MS_k

5. EXPERIMENTS

5.1 Setting
We designed four experiments on open source project data: a class-class change impact experiment, a requirement-requirement change impact experiment, a requirement-class change impact experiment, and a requirement-class-code change impact experiment.

1) Dataset
To evaluate our approach, we used software development data from two open source projects, HtmlUnit and OpenRocket, as shown in Table 2.

Table 2. Dataset.

Data                           | HtmlUnit          | OpenRocket
-------------------------------|-------------------|-------------------
Number of requirement elements | 223               | 19
Number of code comments        | 2911              | 3164
Number of bug reports          | 1685              | 18
Size of source code            | 111M, 852 classes | 17.5M, 828 classes
Number of developers           | 80                | 13

From these data, a small software engineering ontology was built, including 17 concepts, e.g., Person, Bug, Commit, and Requirement. Then 41762 data instances and about 1 million attribute links and instance links were extracted, allowing a large set of software engineering linked data to be constructed for the four experiments.

2) Comparison Method
There are only a few studies on multi-perspective change impact analysis. We selected Lehnert, a state-of-the-art change impact analysis method, for comparison. Lehnert outperforms many other existing methods in [8].

3) Evaluation Metrics
To evaluate the experimental results, we randomly selected 20% of the change-impacted element sets. Ten skilled developers with more than 5 years of development experience were then asked to judge the correctness of the results. We used precision, recall, and F-measure to measure the performance of our approach.

5.2 Results and Analysis
The experimental results are listed in Tables 3 to 5.

Table 3. Results of class-class change impact analysis on HtmlUnit.

Size of Set | Number of Sets
------------|---------------
1~5         | 1237
6~10        | 643
11~15       | 192
15~20       | 135
>20         | 114

In the class-class change impact experiment on the HtmlUnit project, there are 2321 change impact sets, as shown in Table 3. Of these, 2197 contain more than 2 classes, which is the large majority (94.66%). In the sets with 1 or 2 classes, a deeper analysis of the class code shows that most classes
have sole functions and their call graphs are very small, indicating that they rarely depend on other classes. Most of the other classes are utility classes or entry point classes. The F-measure of this experiment is thus 94.2%. The experiment on the OpenRocket project achieves similar results.

Table 4. Results of requirement-requirement change impact analysis on HtmlUnit.

Size of Set | Number of Sets
------------|---------------
1~2         | 153
3~4         | 119
5~6         | 27
>6          | 8

In the requirement-requirement change impact analysis on the HtmlUnit project, there are 307 change impact sets, as shown in Table 4. Since requirement specifications are preprocessed using NLP techniques before analysis, and some specifications are incomplete or too simple, the F-measure of change impact analysis is somewhat low (72.1%), and the change impact sets are small. Similar results are obtained when the analysis is conducted on the OpenRocket project.

In the requirement-class change impact analysis on the HtmlUnit project, there are 297 change impact sets and the F-measure is 88.7%, a value between those obtained from the class-class and requirement-requirement analyses. Similar results are obtained in the requirement-class-code change impact experiment, because HtmlUnit is developed in Java and thus there is an almost one-to-one mapping between classes and program files. Similar results are also obtained when the analysis is conducted on the OpenRocket project.

Table 5. Method comparison.

Method       | HtmlUnit Precision | HtmlUnit Recall | HtmlUnit F-measure | OpenRocket Precision | OpenRocket Recall | OpenRocket F-measure
-------------|--------------------|-----------------|--------------------|----------------------|-------------------|---------------------
Lehnert      | 87.9%              | 87.4%           | 87.6%              | 81.2%                | 80.1%             | 80.6%
Our approach | 89.2%              | 88.2%           | 88.7%              | 88.5%                | 87.9%             | 88.2%

Lehnert is the best multi-perspective change impact analysis method that is publicly available. We compare it with our approach in the requirement-class-code change impact experiment. The comparison is shown in Table 5.

Comparatively, Lehnert adopts a rule-based change impact approach and assigns different dependencies the same impact degree, while our approach extracts features from software engineering linked data and assigns each feature a different weight through an experiment on historical data. Our approach also considers both the direct dependencies between two software elements and indirect relations through persons, commits, bugs and others. Therefore, our approach improves the precision of prediction. Furthermore, the random walk algorithm for propagating the impacts can reduce false negatives, allowing our approach to outperform the Lehnert method in accuracy and stability.

6. CONCLUSION
In this paper we propose a general approach to multi-perspective change impact analysis using linked data in software engineering. Guided by software engineering ontologies, semantic links are established between elements in multiple artifacts, including requirements, classes, program files, bug reports, code commit history, developer information and others. Then, a weighted change impact matrix/graph is calculated from the dependency features extracted from the software engineering linked data, and a random walk algorithm is designed to propagate the change impact. Experimental results show that our approach outperforms the existing multi-perspective change impact analysis approaches, and that it can propagate change impacts across heterogeneous artifacts stably.

7. ACKNOWLEDGMENTS
This research is supported by the 973 Program in China (Grant No. 2015CB352203) and the National Natural Science Foundation of China (Grant No. 61472242 and 61572312).

8. REFERENCES
[1] Arnold, Robert S. 2010. Software change impact analysis. Los Alamitos, CA: IEEE Computer Society Press.
[2] Sun X, Li B, Li B, et al. 2012. A comparative study of static CIA techniques. In Proceedings of the Fourth Asia-Pacific Symposium on Internetware. 23-30.
[3] Fradet P, Le Métayer D, Périn M. 1999. Consistency checking for multiple view software architectures. In Proceedings of ESEC/FSE. 410-428.
[4] Lehnert S. 2011. A review of software change impact analysis. Tech. Report, Ilmenau University of Technology.
[5] Maen Hammad, Michael L. Collard, and Jonathan I. Maletic. 2009. Automatically identifying changes that impact code-to-design traceability. In Proceedings of the IEEE 17th International Conference on Program Comprehension (May 2009). 20-29.
[6] Safoora Shakil Khan and Simon Lock. 2009. Concern tracing and change impact analysis: An exploratory study. In Proceedings of the ICSE Workshop on Aspect-Oriented Requirements Engineering and Architecture Design (May 2009). 44-48.
[7] Xiao H, Quo J, Zou Y. 2007. Supporting change impact analysis for service oriented business applications. In Proceedings of International Workshop on Systems Development in SOA Environments, in conjunction with ICSE, 6-6.
[8] Lehnert S. 2015. Multiperspective change impact analysis to support software maintenance and reengineering. Doctoral Thesis. University of Hamburg.
[9] Jain P, Hitzler P, Sheth A, et al. 2010. Ontology alignment for linked open data. In Proceedings of the Semantic Web - ISWC, Springer Berlin Heidelberg, 402-417.
[10] Yuchen Zhang, Chengcheng Wan, Bo Jin. 2016. An empirical study on recovering requirement-to-code links. In Proceedings of 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. 121-126.
[11] Lovász, L. 1993. Random walks on graphs: a survey. Lecture Notes in Mathematics, 8(4):285-303.
[12] Li P, Li Z, He J, et al. 2009. Assessing the influence probability between objects: A random walker approach. In Proceedings of IEEE Symposium on Computational Intelligence and Data Mining, 25-3
