Assessment of software and data
Tudor Gîrba
www.tudorgirba.com
www.humane-assessment.com
Assessment is a human activity
Assessment is an act of reasoning. It is what we should do before taking a decision.

In most cases, we do. It’s just that when we have to deal with large amounts of data, assessment suddenly gets difficult. It is difficult because we, as human beings, are just not equipped to handle more than a handful of details at a time.

Data is here to stay. Whether in the form of software systems or otherwise, we will only get more data, so we had better get good at dealing with it.

We do need tools to deal with the vastness of data, but rather than forcing humans to conform to them, these tools should bend around humans, and around the particularities of the problem at hand.

For too long, professionals were asked to conform to tools. Tools are important, but they are just tools. In the end, it is the human who needs to understand the situation and decide on the future course of action.

Assessment is a human activity, and we have to tackle it as such. Analysis tools should merely assist analysts by crunching data only when and how it is requested.

I call this approach humane (or agile) assessment.
Assessment is all about feedback
Assessment is defined as the evaluation or estimation of the nature, quality, or ability of someone or something. In other words, assessment is what we do when we want to know the state of facts, and it is what we should do before taking a decision.

To assess is to obtain feedback from reality.

If we want to know, we have to get to know. We have to gather the facts, understand their structure and relationships, and interpret them from the point of view of the problem at hand.

When we have to deal with large amounts of data, we cannot handle all the details, and we resort to tools to process these details. As a consequence, we tend to perceive the problem to be primarily a tool issue: all will be solved if we buy the perfect reporting tool.

This is a false perception.

We can indeed externalize parts of the number crunching and learn indirectly from the summary of the analysis. If we know what the problem at hand is, and how to solve it, we can have tools that automate the tedious parts.

However, we cannot just treat a tool as a magic black box that analyzes the data and tells us whether to go left or right. We have to go beyond the surface and interpret the situation ourselves, because it is we who have to act on it.

Any indirect learning involves risks, because a summary by definition presents a reduced set of details that are typically obtained through an interpretation. The risks can come from the set of details being incomplete or from the interpretation being incorrect.

To properly grasp the output of a tool, we need to know what details it looks at and what interpretations are involved in the analysis process.
For example, let us suppose we get a tool that offers a simple metric such as the number of methods in a class. If the tool says that a certain class has 7 methods, we still have to know what exactly was measured: Were private methods considered? What about constructors? How about generated methods? In general, we have to know what parts of the system were analyzed and how they were analyzed. Only afterwards can we interpret the result.

It is not always feasible to control and know everything. For practical reasons, such as time constraints, we sometimes do have to rely on summaries provided either by tools or by other people. We just have to make sure that we do not pretend we know when we do not.

For example, suppose we need to find the best resource for assessment. One solution is to open a search engine and search for assessment. We can be happy with the first result we get (probably the Wikipedia article that talks about educational assessment).

Even though we get a result, we cannot claim that this is the best article, because we have little idea about what decisions the search algorithm took, and certainly little idea of the quality of the input data. We can say that we trust the result, but we cannot claim that we know.

The difference might look small, but it is critical. Knowledge implies control; trust implies acceptance of risks. We should strive to get as much control as possible.

The goal of assessment is to exhibit the reality of data so that the situation becomes explicit, and so that we can take informed decisions.
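
The number-of-methods example can be made concrete with a minimal sketch. It is written in Python purely for illustration; the `Account` class and the `count_methods` helper are hypothetical, and Python's leading-underscore naming convention stands in for private methods. The only point is that the same class yields different counts depending on what the tool decides to count.

```python
import inspect


class Account:
    """Hypothetical class used only to illustrate the counting question."""

    def __init__(self, owner):      # constructor
        self.owner = owner

    def deposit(self, amount):      # public method
        pass

    def withdraw(self, amount):     # public method
        pass

    def _audit(self):               # non-public by naming convention
        pass

    def __repr__(self):             # special method
        return f"Account({self.owner!r})"


def count_methods(cls, include_private=False, include_special=False):
    """Count functions defined directly on cls under explicit counting rules."""
    count = 0
    for name, member in vars(cls).items():
        if not inspect.isfunction(member):
            continue
        is_special = name.startswith("__") and name.endswith("__")
        is_private = name.startswith("_") and not is_special
        if is_special and not include_special:
            continue
        if is_private and not include_private:
            continue
        count += 1
    return count


# Report the metric under every combination of counting rules.
for private in (False, True):
    for special in (False, True):
        n = count_methods(Account, include_private=private, include_special=special)
        print(f"private={private}, special={special}: {n} methods")
```

Under these assumptions the same class reports 2, 3, 4 or 5 methods depending on the flags, which is why a bare number cannot be interpreted without knowing the counting rules.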
Software assessment needs rethinking
Software assessment is a critical activity that should be addressed explicitly in the development process.

Why? Because software systems are so large and complex that teams spend up to 50% of their overall effort on understanding the system.

Given that 50% is a considerable share, you would expect a significant number of tools to be geared towards this side. Yet, practice is dominated by one technique: code reading. However, software systems are so large that relying on reading them just does not scale when we want to understand them as a whole. To put it in perspective, a person who reads one line every two seconds would require approximately one month of work to read a quarter of a million lines of code ... and this is just for reading the code.

So, does anybody read the entire code base? Not really. The reading is typically limited to a part of the system, while the global view is left for the drawing board in the architect’s office. Thus, most decisions tend to be local, while those that are global are mostly based on inaccurate information.

To rectify the situation we need tools that help us crunch the vastness of data. However, not just any tool does the job. Many tools exist to deal with various aspects of data and software analysis, but most of them take the oracle way: they offer some predefined analyses that provide answers to standard questions.

Receiving a standard answer is great when you have a standard question. However, it turns out that most of the time our questions are not quite standard. Software systems tend to present many contextual details due to various factors: different technologies, different third-party libraries, different architecture decisions and so on. In these situations, regardless of how smart the analysis is, it is of limited use if it does not allow us to contextualize it.

We need customized tools that provide continuous and contextual feedback.
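
The reading-time estimate in this section can be checked with a few lines of arithmetic. The line count and reading speed come from the text; the 8-hour workday and the 21 working days per month are assumptions added here only to turn seconds into a rough "month of work".

```python
LINES_OF_CODE = 250_000      # "a quarter of a million lines of code"
SECONDS_PER_LINE = 2         # reading speed from the text
HOURS_PER_WORKDAY = 8        # assumption, not stated in the text
WORKDAYS_PER_MONTH = 21      # assumption, not stated in the text

total_seconds = LINES_OF_CODE * SECONDS_PER_LINE
total_hours = total_seconds / 3600
workdays = total_hours / HOURS_PER_WORKDAY
work_months = workdays / WORKDAYS_PER_MONTH

print(f"{total_hours:.0f} hours, {workdays:.1f} workdays, {work_months:.2f} work months")
```

With these assumptions the result is about 139 hours of uninterrupted reading, a little over 17 working days, which is consistent with the "approximately one month" figure.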
One reason regression testing is so useful is that it provides feedback that is continuous and highly contextual, and thus it has great potential to lead to corrective actions.

Integrating regression testing in the development process was not always so welcome. It took more than a decade of agility to make tests gain their rightful place. And it’s not even a closed chapter, as we can still find skeptics pointing their finger at the costs of writing tests, and at how this takes away from the overall effort that could otherwise be spent on active programming.

While testing is important, it concerns but one aspect of a software system, namely the expected functionality. Ideally, we would like to create assessment tools just like we now create tests. The only trouble is that the cost of building such tools is too high. At least it is perceived to be, especially if you have to write them from scratch. A middle ground can be found by having a platform upon which these tools can be built. Drawing from the testing parallel, we would need the equivalent of an xUnit-like platform.

With such a platform, the process of assessment would be turned around. Instead of conforming to generic tools that are often prohibitively expensive, development teams would start crafting their own tools, tools that make sense for their project, for their process, for their practices.

Instead of relying primarily on manual reviews, quality assurance teams would build automatic reporting tools with dedicated analyses. Instead of specifying architecture rules on paper, architects would encode them in automatic checkers.

It would be a new way of approaching software development.

A way in which the possibility of making unintentional mistakes is embraced and actively guarded against by putting in place tools that provide continuous and contextual checking.
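
As an illustration of what encoding an architecture rule in an automatic checker could look like, here is a minimal sketch in Python. Everything in it is hypothetical: the layer names, the module names, the hard-coded dependency list (which a real tool would extract from the code base) and the rule itself. It is not the API of Moose or of any other platform.

```python
# Hypothetical layering: a lower number means a lower layer.
LAYER_OF = {"ui": 2, "service": 1, "persistence": 0}

# Hypothetical dependencies, written as (from_module, to_module).
DEPENDENCIES = [
    ("ui.OrderView", "service.OrderService"),
    ("service.OrderService", "persistence.OrderStore"),
    ("persistence.OrderStore", "ui.OrderView"),   # points upward
]


def layer_of(module):
    """Return the layer index based on the module's top-level package."""
    return LAYER_OF[module.split(".")[0]]


def upward_dependencies(dependencies):
    """Rule: a module may depend only on its own layer or on a lower one."""
    return [(src, dst) for src, dst in dependencies
            if layer_of(dst) > layer_of(src)]


violations = upward_dependencies(DEPENDENCIES)
for src, dst in violations:
    print(f"violation: {src} -> {dst}")
```

A check of this kind can run on every build in the same way a regression test does, so the rule stops living only on paper.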
Moose makes assessment practical
Moose is an extensive platform for software and data analysis that makes software assessment practical.

It is a free and open-source project started in 1996. Since then it has spread to several research groups and it is increasingly being applied in industrial contexts. In total, the effort spent on Moose amounts to more than 150 person-years of research and development.

The design of Moose is rooted in the core idea of placing the analyst at the center and of empowering him to build and to control the analysis every step of the way.

Data from various sources and in various formats is imported and stored in a model. For example, Moose can handle various languages out of the box via multiple meta-models: Java, JEE, Smalltalk, C++, C, XML.

On top of the created model, the analyst performs analyses, and combines and compares them to find answers to specific questions. Moose enables this in two distinct ways.

On the one hand, Moose comes with multiple predefined services such as parsing, modeling, metrics computation, visualization, data mining, duplication detection, or querying. These basic services are aggregated into more complex analyses like computation of dependency cycles, detection of high-level design problems, identification of exceptional entities and so on.

On the other hand, Moose is more than a tool. Moose is a platform that offers an infrastructure through which new analyses can be quickly built and can be embodied in new interactive tools.

Using Moose we can easily define new analyses, create new visualizations, or build complete browsers and reporting tools altogether.

More information about Moose can be found at:

http://moosetechnology.org
http://themoosebook.org
[Figure: overview of the Moose platform, with the labels Importers, Model Repository, Meta-models, Generic analysis services, Aggregated analysis services, Tool building infrastructure, and Dedicated interactive tools]
[Figures: System Complexity; Class Blueprint (visualizing the internal structure of classes); visualizing dependencies between modules]
[Book cover text: "... that shows the outside, the inside and the philosophy of the Moose platform", by Tudor Gîrba]
Success story: Continuous software assessment
Our client was responsible for the development and maintenance of several software projects. The teams were following an agile development process, but as the size of the projects increased, the need for a continuous assessment of the systems became apparent to ensure the quality and compliance of the work. We were mandated to improve this state.

We approached these issues in two ways: (1) we introduced custom reporting tools, and (2) we coached the teams to make their decisions explicit, and to check them against the reality of the system.

To get stakeholders to define their concerns explicitly, we held several interactive workshops in which we identified problems and showed how these could be answered quickly. We then encoded these concerns into rules that addressed several areas including quality assurance and documentation.

We used Moose to craft the reporting tools. These tools were integrated into the continuous integration process and provided continuous and contextual feedback about how the concerns were reflected in the system. An example of such a report is shown in the figure below.

The most important component of our work was to affect the development process to get the team to take the actual state of the system into account and to correct the situation continuously.

First, the concerns were made explicit and captured in the form of rules that were integrated into the continuous reporting. The detected problems were then discussed on a daily basis. If the rule was considered valid, the problem was either resolved immediately or planned for a later iteration. If it was determined that the concern was not valid or had exceptions, the rule was adjusted.

The quality of the projects improved in a short amount of time with only little overall overhead, and the confidence of the team and management increased.
Example of a Moose report integrated into the Hudson continuous integration server.
Success story: Architecture conformance
Our client’s IT-Architecture department published a set of architectural guidelines and coding conventions for the internal and external software systems. They needed a way to verify the compliance of one of their software systems.

Our role was to review the application and verify its conformance to the guidelines.

The first step was to obtain an overview of the system. We based our analysis on reading both technical and business documentation and on analyzing the actual source code.

Using the architecture descriptions, we focused on checking the architectural layers and interface boundaries. We validated the coarse-grained architectural rules. We used visualization techniques and applied several custom detection strategies to highlight points of interest and irregularities in the code.

On closer inspection through queries and selective code reading, we identified a number of guideline violations. For example, one of the detected shortcomings of the application was the poor exception handling that violated the architectural constraints.

Once the non-conforming parts were identified, we proposed concrete recommendations for how to improve the structure of the system to conform to the desired guidelines.

We also provided a summary of the overall code quality. For this purpose we used several techniques such as metrics, queries and visualizations. Among other things, we pinpointed how the logic is distributed over the system and how the code duplication should be refactored.

The architectural violations were scheduled for refactoring before the application was to go into production.
A visualization highlighting violations over the structure of the system
Success story: Custom analysis tool for custom configurations
The client had a large software system composed of several hundred subsystems, each written in Java. The overall system was based on a custom engine that was put together using a multitude of custom configuration files specified in XML and in another custom language. Given the particularities of the project, a dedicated tool was needed to analyze it.

We received the task of crafting a dedicated tool that would be able to provide an overview of these configurations and expose their various relationships.

The prerequisite for any data analysis is the specification of the meta-model for the data. Thus, the first step was the analysis of the various configuration files with the aim of capturing the class diagram. The diagram below shows the anonymized result of the configuration analysis. Once the meta-model was specified, we encoded it in the tool.

The next step was to create the importer to take all configuration files as input and then to create a coherent model.

An important challenge was posed by the unification of data: because the system was developed over a decade, there were multiple ways of expressing the same information. Thus, our importing solution needed to deal with all these differences.

The last step was to create several analyses and visualizations on top of this model. These views were then integrated into an interactive tool. When used by developers, the tool revealed several unexpected dependencies and incomplete specifications in the system.

For putting together this solution, we used the tool building capabilities of Moose.

All in all, the development effort for the prototype totaled a mere 10 person-days. Placed in the overall context of the multiple hundred person-years of development effort spent in the project, it was an insignificant investment with a high return.
A schematic view of the complex meta-model
A part of the custom tool for dependency overview
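
The unification challenge described in this success story can be illustrated with a minimal sketch. The two XML notations below are invented for the example and are not the client's actual formats; the sketch only shows an importer mapping two ways of writing the same fact onto one model, here a set of (source, target) pairs.

```python
import xml.etree.ElementTree as ET

# Two hypothetical notations for the same fact: "orders" uses "billing".
ATTRIBUTE_STYLE = '<component name="orders" uses="billing"/>'
NESTED_STYLE = '<component name="orders"><uses ref="billing"/></component>'


def import_dependencies(xml_text):
    """Return a set of (source, target) pairs, whichever notation was used."""
    root = ET.fromstring(xml_text)
    source = root.get("name")
    targets = set()
    if root.get("uses"):                    # attribute notation
        targets.add(root.get("uses"))
    for child in root.findall("uses"):      # nested-element notation
        targets.add(child.get("ref"))
    return {(source, target) for target in targets}


print(import_dependencies(ATTRIBUTE_STYLE))
print(import_dependencies(NESTED_STYLE))
```

Once every notation ends up in the same model, the analyses and visualizations built on top no longer need to know which variant a given file used.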
Assessment is a discipline
The amount of data we have to take into account for decision making will only grow, so we had better get good at dealing with it.

Successful data and software assessment requires both a driving set of questions and access to data.

Metrics make for great analysis tools, but they only show numbers. Data is more than that. Data, and software systems in particular, presents various parts and complex dependencies. These must be taken into account to obtain a global picture.

Figuring out what data is important and how to extract the relevant information requires dedicated training.

Specific questions and custom data require specific answers provided by custom assessment tools.

Building an assessment tool is similar to developing any software project. Thus, appropriate attention and investment should be allocated to it.

Using a proper infrastructure can significantly decrease the overall cost of building a tool. Moose provides such an infrastructure.

A good tool does not guarantee magic solutions, but it can provide accurate insights for better decisions.

Tools are important, but we have to remember that ultimately, assessment is a human activity. It is the human who has to interpret the results and take decisions.

Assessment is all about feedback, and the best feedback is continuous and contextual.

To make effective use of this feedback, assessment must be integrated into the process of software development.

Assessment is a discipline.
[Figure: diagram of the assessment process with the steps "Craft analysis", "Interpret results", "Confident?" and "Act"]
Tudor Gîrba attained his PhD in 2005,
and he now works as a consultant. His
main expertise lies in the area of
software engineering with a focus on
software and data assessment.