
humane assessment
of software and data

tudor gîrba
www.tudorgirba.com
www.humane-assessment.com
Assessment is a human activity

Assessment is an act of reasoning. It is what we should do before taking a decision.

In most cases, we do. It's just that when we have to deal with large amounts of data, assessment suddenly gets difficult. It is difficult because we, as human beings, are just not equipped to handle more than a handful of details at a time.

Data is here to stay. Whether in the form of software systems or otherwise, we will only get more data, so we better get good at dealing with it.

We do need tools to deal with the vastness of data, but rather than forcing humans to conform to them, these tools should bend around humans, and around the particularities of the problem at hand.

For too long, professionals were asked to conform to tools. Tools are important, but they are just tools. In the end, it is the human who needs to understand the situation and decide on the future course of action.

Assessment is a human activity, and we have to tackle it as such. Analysis tools should merely assist analysts by crunching data only when and how it is requested.

I call this approach humane (or agile) assessment.
Assessment is all about feedback

Assessment is defined as the evaluation or estimation of the nature, quality, or ability of someone or something. In other words, assessment is what we do when we want to know the state of facts, and it is what we should do before taking a decision.

To assess is to obtain feedback from reality.

If we want to know, we have to get to know. We have to gather the facts, understand their structure and relationships, and interpret them from the point of view of the problem at hand.

When we have to deal with large amounts of data, we cannot handle all the details, and we resort to tools to process these details. As a consequence, we tend to perceive the problem to be primarily a tool issue: all will be solved if we buy the perfect reporting tool.

This is a false perception.

We can indeed externalize parts of the number crunching and learn indirectly from the summary of the analysis. If we know what the problem at hand is, and how to solve it, we can have tools that automate the tedious parts.

However, we cannot just treat a tool as a magic black box that analyzes the data and tells us whether to go left or right. We have to go beyond the surface and interpret the situations ourselves, because it is we who have to act on them.

Any indirect learning involves risks, because a summary presents by definition a reduced set of details that are typically obtained through an interpretation. The risks can come from the possibility of the set of details being incomplete, or from the possibility of the interpretation being incorrect.

To properly grasp the output of a tool, we need to know what details it looks at and what interpretations are involved in the analysis process.
For example, let us suppose we get a tool that offers a simple metric such as the number of methods in a class. If the tool says that a certain class has 7 methods, we still have to know what exactly was measured: Were private methods considered? What about constructors? How about generated methods? In general, we have to know what parts of the system were analyzed and how they were analyzed. Only afterwards can we interpret the result.

It is not always feasible to control and know everything. For practical reasons, such as time constraints, we sometimes do have to rely on summaries provided either by tools or by other people. We just have to make sure that we do not pretend to know when we do not.

For example, suppose we need to find the best resource for assessment. One solution is to open a search engine and search for assessment. We can be happy with the first result we get (probably the Wikipedia article that talks about educational assessment).

Even though we get a result, we cannot claim that this is the best article, because we have little idea about the decisions taken by the search algorithm, and we certainly have little idea about the quality of the input data. We can say that we trust the result, but we cannot claim that we know.

The difference might look small, but it is critical. Knowledge implies control; trust implies acceptance of risks. We should strive to get as much control as possible.

The goal of assessment is to exhibit the reality of data so that the situation becomes explicit, and so that we can take informed decisions.
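The method-count example above can be made concrete. The sketch below is hypothetical (the class and function names are invented, not taken from any real tool); it shows how the same class yields different values for the "same" metric depending on the counting policy behind it:

```python
# Hypothetical sketch: the "number of methods" metric depends entirely
# on what the measuring tool decides to count.

class MethodInfo:
    def __init__(self, name, is_private=False, is_constructor=False,
                 is_generated=False):
        self.name = name
        self.is_private = is_private
        self.is_constructor = is_constructor
        self.is_generated = is_generated

def number_of_methods(methods, include_private=True,
                      include_constructors=True, include_generated=True):
    """Count methods under an explicit counting policy."""
    count = 0
    for m in methods:
        if m.is_private and not include_private:
            continue
        if m.is_constructor and not include_constructors:
            continue
        if m.is_generated and not include_generated:
            continue
        count += 1
    return count

methods = [
    MethodInfo("__init__", is_constructor=True),
    MethodInfo("run"),
    MethodInfo("_helper", is_private=True),
    MethodInfo("toString", is_generated=True),
]

# The same class yields different counts depending on the policy:
print(number_of_methods(methods))                 # 4
print(number_of_methods(methods,
                        include_private=False,
                        include_constructors=False,
                        include_generated=False))  # 1
```

The point is not the particular numbers, but that the metric cannot be interpreted until the policy behind it is known.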
Software assessment needs rethinking

Software assessment is a critical activity that should be addressed explicitly in the development process.

Why? Because software systems are so large and complex that teams spend up to 50% of their overall effort on understanding the system.

Given that 50% is considerable, you would expect a significant number of tools to be geared towards this need. Yet, practice is dominated by one technique: code reading. However, software systems are so large that relying on reading them just does not scale when we want to understand them as a whole. To put it in perspective, a person who reads one line every two seconds would require approximately one month of work to read a quarter of a million lines of code ... and this is just for reading the code.

So, does anybody read the entire code base? Not really. The reading is typically limited to a part of the system, while the global view is left for the drawing board in the architect's office. Thus, most decisions tend to be local, while those that are global are mostly based on inaccurate information.

To rectify the situation we need tools that help us crunch the vastness of data. However, not just any tool does the job. Many tools exist to deal with various aspects of data and software analysis, but most of them take the oracle way: they offer some predefined analyses that provide answers to standard questions.

Receiving a standard answer is great when you have a standard question. However, it turns out that most of the time our questions are not quite standard. Software systems tend to present many contextual details due to various factors: different technologies, different third-party libraries, different architecture decisions, and so on. In these situations, regardless of how smart the analysis is, it is of limited use if it does not allow us to contextualize it.

We need customized tools that provide continuous and contextual feedback.

One reason regression testing is so useful is that it provides feedback that is continuous and highly contextual, and thus has great potential to lead to corrective actions.

Integrating regression testing into the development process was not always so welcome. It took more than a decade of agility for tests to gain their rightful place. And it is not even a closed chapter, as we can still find skeptics pointing their finger at the costs of writing tests, and at how this takes away from the overall effort that could otherwise be spent on active programming.

While testing is important, it concerns but one aspect of a software system, namely the expected functionality. Ideally, we would like to create assessment tools just like we now create tests. The only trouble is that the cost of building such tools is too high. At least it is perceived to be, especially if you have to write them from scratch. A middle ground can be found by having a platform upon which these tools can be built. Drawing from the testing parallel, we would need the correspondent of an xUnit-like platform.

With such a platform, the process of assessment would be turned around. Instead of conforming to generic tools, which are often prohibitively expensive, development teams would start crafting their own tools: tools that make sense for their project, for their process, for their practices. Instead of relying primarily on manual reviews, quality assurance teams would build automatic reporting tools with dedicated analyses. Instead of specifying architecture rules on paper, architects would encode them in automatic checkers.

It would be a new way of approaching software development. A way in which the possibility of making unintentional mistakes is embraced and actively guarded against, by putting in place tools that provide continuous and contextual checking.
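To illustrate the idea of encoding architecture rules in automatic checkers, here is a minimal sketch. The layer names, module graph, and rule are invented for illustration; the point is only how a rule written on paper becomes an executable check that either passes silently or points at concrete violations, much like a test:

```python
# A toy dependency model: module -> set of modules it depends on.
# The modules and the layering rule are hypothetical.
DEPENDENCIES = {
    "ui": {"domain"},
    "domain": {"persistence"},
    "persistence": set(),
}

def violations_of_layering(deps, forbidden):
    """Return (source, target) pairs that break a forbidden-edge rule."""
    found = []
    for source, targets in deps.items():
        for target in targets:
            if (source, target) in forbidden:
                found.append((source, target))
    return found

# The rule on paper: "lower layers must never depend on upper layers."
forbidden_edges = {("domain", "ui"), ("persistence", "ui"),
                   ("persistence", "domain")}

# Like a test, the encoded rule either passes or lists exact offenders.
assert violations_of_layering(DEPENDENCIES, forbidden_edges) == []
```

Run on every build, such a check gives exactly the continuous, contextual feedback that regression tests give for functionality.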
Moose makes assessment practical

Moose is an extensive platform for software and data analysis that makes software assessment practical.

It is a free and open-source project started in 1996. Since then it has spread to several research groups and it is increasingly being applied in industrial contexts. In total, the effort spent on Moose amounts to more than 150 person-years of research and development.

The design of Moose is rooted in the core idea of placing the analyst at the center and empowering him to build and to control the analysis every step of the way.

Data from various sources and in various formats is imported and stored in a model. For example, Moose can handle various languages out of the box via multiple meta-models: Java, JEE, Smalltalk, C++, C, XML.

On top of the created model, the analyst performs analyses, and combines and compares these analyses to find answers to specific questions. Moose enables this in two distinct ways.

On the one hand, Moose comes with multiple predefined services such as parsing, modeling, metrics computation, visualization, data mining, duplication detection, and querying. These basic services are aggregated into more complex analyses like the computation of dependency cycles, the detection of high-level design problems, the identification of exceptional entities, and so on.

On the other hand, Moose is more than a tool. Moose is a platform that offers an infrastructure through which new analyses can be quickly built and embodied in new interactive tools.

Using Moose we can easily define new analyses, create new visualizations, or build complete browsers and reporting tools altogether.

More information about Moose can be found at:

http://moosetechnology.org
http://themoosebook.org
[Figure: The Moose workflow — importers feed a model repository built on meta-models; generic analysis services are aggregated into higher-level analysis services; a tool-building infrastructure supports dedicated interactive tools.]


Visualizing class hierarchies

The System Complexity is a polymetric view that shows classes as nodes and inheritance as edges. Furthermore, each node is visually enhanced with three metrics: the height is given by the number of methods, the width by the number of attributes, and the color by the number of lines of code.

System Complexity
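The polymetric mapping can be sketched in a few lines. The scaling below is illustrative, not Moose's actual rendering code; it only shows how three metrics of a class map onto the visual properties of its node:

```python
# Hedged sketch of the polymetric mapping behind System Complexity.
# The shading scale (max_loc) is an invented normalization.

def system_complexity_node(num_methods, num_attributes, lines_of_code,
                           max_loc=1000):
    """Map a class's metrics onto the visual properties of its node."""
    height = num_methods          # taller node = more methods
    width = num_attributes        # wider node = more attributes
    # Shade: closer to 0.0 (darker) means more lines of code.
    shade = 1.0 - min(lines_of_code, max_loc) / max_loc
    return {"height": height, "width": width, "shade": shade}

node = system_complexity_node(num_methods=7, num_attributes=3,
                              lines_of_code=250)
print(node)  # {'height': 7, 'width': 3, 'shade': 0.75}
```

Because the mapping is so direct, outliers (god classes, data classes) stand out at a glance without reading a single line of code.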
Visualizing the internal structure of classes

While the System Complexity visualization shows the overall class hierarchies, the Class Blueprint shows the internals of classes.

The figure to the right shows an example of a class with multiple methods. The figure at the bottom shows a larger hierarchy.

Class Blueprint
Visualizing dependencies between modules

The Dependency Structure Matrix is a visualization that shows the dependencies between modules at a coarse-grained level.

The list of modules is shown on both rows and columns, and each dot represents a dependency. The algorithm tries to place all dots below the diagonal. If a dot appears above the diagonal, it signifies that a cycle was detected in the system. The visualization highlights these cyclic dependencies with a shaded background.

In the example to the right, we have a system with more than 100 packages. In total there are four cycles, of which one is formed by multiple modules.

Dependency Structure Matrix
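The cycles that the Dependency Structure Matrix surfaces can be found with a plain depth-first search. The sketch below is a simplified stand-in using an invented module graph; the actual DSM ordering algorithm is more involved:

```python
# Detect one cyclic dependency among modules via depth-first search.
# Gray nodes are on the current path; hitting one means a back edge.

def find_cycle(deps):
    """Return one dependency cycle as a list of modules, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {m: WHITE for m in deps}
    stack = []

    def visit(m):
        color[m] = GRAY
        stack.append(m)
        for n in deps.get(m, ()):
            if color.get(n) == GRAY:            # back edge: cycle found
                return stack[stack.index(n):] + [n]
            if color.get(n) == WHITE:
                cycle = visit(n)
                if cycle:
                    return cycle
        stack.pop()
        color[m] = BLACK
        return None

    for m in deps:
        if color[m] == WHITE:
            cycle = visit(m)
            if cycle:
                return cycle
    return None

deps = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": set()}
print(find_cycle(deps))  # ['a', 'b', 'c', 'a']
```

A DSM renders such a cycle as a dot above the diagonal; the search above recovers the concrete modules involved.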
Other software visualizations

The previous pages show a couple of classic visualizations. While they provide valuable views over a software system, they are only some of the visualizations offered. Two more are shown to the right.

The first reveals the complexity introduced by source code annotations. In the example, we see the types of annotations used in a system and how they are related.

The second shows how source code duplications appear between parts of the system. In the example, we see the same packages on each column, and a connection between them denotes a duplication: internal if the line is horizontal, or external if the line is oblique.

Annotation constellation. Side-by-side duplications.
Querying models

Visualizations provide maps of what is otherwise intangible information. Using these maps we can get a sense of the structure of large amounts of data. Another mechanism for understanding data is querying. Queries are instructions given to the computer to retrieve the part of the data that conforms to a given predicate. Thus, they help the analyst focus on a smaller set of data.

Moose offers a rich scripting API. Furthermore, it also offers the possibility to relate multiple views, such as query results and visualizations. For example, the picture to the right highlights, on an overall System Complexity visualization, a set of classes detected as being flawed.
Browsing models

Moose offers multiple tools and mechanisms for exploration. The basic exploration tool offers a generic user interface that can navigate through any model, be it of source code or otherwise.

The tool employs a Finder-like design: selecting any entity spawns the details of that entity to the right. The details can be presented in various forms (such as simple lists, tables, or visualizations), each of these forms offering a high degree of interaction.

For example, the screenshot at the top shows a list of classes to the left, with the result of a query spawned to the right. The screenshot at the bottom shows various visualizations.
Browsing source code

When we deal with source code, we want specialized views. Moose offers multiple browsers that focus on various aspects. For example, the browser at the top highlights annotations used in code.
Painting custom visualizations

Often, we need dedicated visualizations. Moose allows the analyst to build such specific visualizations through a domain-specific scripting interface.

The screenshot to the right shows the Mondrian Easel, a dedicated editor for such visualizations. The presented example shows how concise it is to draw a System Complexity view of a set of classes.

Using this infrastructure, we can build visualizations dedicated to the data and the problem of interest. Most visualizations in Moose are drawn using this engine.
Crafting custom tools

Large models hold many details. To tame these models, we need interactive browsers that help us understand their inner structure and relationships.

Moose offers Glamour, an engine for building custom interactive tools. For example, the script to the right builds a complete code browser that displays packages, classes, methods and source code.

Using this engine, Moose enables the analyst to quickly craft custom tools that target custom data sets or custom navigation use cases.
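The Finder-like pane chaining that such a browser implements can be modeled roughly as follows. This is not Glamour's API (Glamour describes browsers declaratively in Smalltalk); it is only a toy model, with invented package and class names, of how a selection in one pane feeds the next:

```python
# Toy navigation model: package -> classes -> methods -> source.
MODEL = {
    "packages": {"core": ["Parser", "Token"]},
    "classes": {"Parser": ["parse", "reset"], "Token": ["value"]},
}

def browse(package, cls=None, method=None):
    """Walk the pane chain; each selection spawns the next pane."""
    panes = [sorted(MODEL["packages"][package])]
    if cls is not None:
        panes.append(sorted(MODEL["classes"][cls]))
    if method is not None:
        panes.append([f"source of {cls}.{method}"])
    return panes

print(browse("core", "Parser", "parse"))
# [['Parser', 'Token'], ['parse', 'reset'], ['source of Parser.parse']]
```

A browser-building engine lets the analyst swap what each pane shows (lists, tables, visualizations) without rewriting the navigation logic.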
Parsing new languages

To analyze software systems, we first need to be able to parse them. Moose comes with PetitParser, a generic framework for defining dedicated parsers.

For example, the pictures to the right show the PetitParser development tool open on a part of the grammar of the MSE file format, the default format for model import/export. The development tool offers multiple perspectives, including a graphical representation, grammar cycle detection, and a debugger.

This infrastructure enables the analyst to analyze various data sources, spanning from domain-specific languages to custom configuration files.
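PetitParser belongs to the family of parser combinators, in which small parsers compose into larger ones. The following is a toy Python rendition of that style, not PetitParser's actual API (real PetitParser grammars are Smalltalk classes):

```python
# Minimal parser combinators: each parser takes (text, position) and
# returns (value, new_position) on success, or None on failure.

def char(c):
    def parse(s, i):
        return (c, i + 1) if i < len(s) and s[i] == c else None
    return parse

def choice(*parsers):
    def parse(s, i):
        for p in parsers:
            r = p(s, i)
            if r:
                return r
        return None
    return parse

def many1(p):
    def parse(s, i):
        out = []
        r = p(s, i)
        while r:
            out.append(r[0])
            i = r[1]
            r = p(s, i)
        return ("".join(out), i) if out else None
    return parse

# A number parser composed from single-character parsers.
digit = choice(*[char(d) for d in "0123456789"])
number = many1(digit)

print(number("123abc", 0))  # ('123', 3)
```

Because grammars are ordinary objects built from such pieces, a dedicated parser for a custom configuration format can be grown incrementally and tested piece by piece.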
Building live reports

While a significant part of Moose is focused on providing interactive capabilities to build and perform analyses, it is often desirable to carve in stone a set of concerns that we want to watch for, and afterwards have a tool check them and produce a report. Moose offers a report-building engine that enables the analyst to define custom concerns.

In the example to the right, we have a report on a source code model checking several concerns. The concerns are marked in red because violations were found in the system. Thus, these concerns can act like tests against the model. Furthermore, concerns can also be visualized in various ways.
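The concern-checking idea can be sketched as rules evaluated against a model, each concern turning red exactly when it reports violations. The concern name and model fields below are invented; this is not Moose's report engine API:

```python
# A report as a set of named concerns checked against a model.
# Each concern maps the model to a list of violating entities.

def run_report(model, concerns):
    """Evaluate each concern; return {concern name: violations}."""
    return {name: check(model) for name, check in concerns.items()}

model = {"classes": [{"name": "God", "methods": 120},
                     {"name": "Tidy", "methods": 8}]}

concerns = {
    "No class with more than 50 methods":
        lambda m: [c["name"] for c in m["classes"] if c["methods"] > 50],
}

report = run_report(model, concerns)
# A concern with a non-empty list would be shown in red, like a
# failing test; an empty list is green.
print(report)  # {'No class with more than 50 methods': ['God']}
```

Run on every build, such a report gives the model-level equivalent of a regression test suite.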
the book
www.themoosebook.org

A book that shows the outside, the inside, and the philosophy of the Moose platform, by Tudor Gîrba.
Success story: Continuous software assessment

Our client was responsible for the development and maintenance of several software projects. The teams were following an agile development process, but as the size of the projects increased, the need for continuous assessment of the systems became apparent, to ensure the quality and compliance of the work. We were mandated to improve this state.

We approached these issues in two ways: (1) we introduced custom reporting tools, and (2) we coached the teams to make their decisions explicit, and to check them against the reality of the system.

To get stakeholders to define their concerns explicitly, we held several interactive workshops in which we identified problems and showed how these can be answered fast. We then encoded these concerns into rules that addressed several areas, including quality assurance and documentation.

We used Moose to craft the reporting tools. These tools were integrated into the continuous integration process and provided continuous and contextual feedback about how the concerns were reflected in the system. An example of such a report can be seen on the next page.

The most important component of our work was to affect the development process so that the team would take the actual state of the system into account and correct the situation continuously. First, the concerns were made explicit and captured in the form of rules that were integrated into the continuous reporting. The detected problems were then discussed on a daily basis. If a rule was considered valid, the problem was either resolved immediately or planned for a later iteration. If it was determined that the concern was not valid or had exceptions, the rule was adjusted.

The quality of the projects improved in a short amount of time with only little overall overhead, and the confidence of the team and management increased.
Example of a Moose report integrated into the Hudson continuous integration server.
Success story: Architecture conformance

Our client's IT-architecture department published a set of architectural guidelines and coding conventions for internal and external software systems. They needed a way to verify the compliance of one of their software systems.

Our role was to review the application and verify its conformance to the guidelines.

The first step was to obtain an overview of the system. We based our analysis on reading both technical and business documentation and on analyzing the actual source code.

Using the architecture descriptions, we focused on checking the architectural layers and interface boundaries. We validated the coarse-grained architectural rules. We used visualization techniques and applied several custom detection strategies to highlight points of interest and irregularities in the code.

On closer inspection through queries and selective code reading, we identified a number of guideline violations. For example, one of the detected shortcomings of the application was the poor exception handling, which violated the architectural constraints.

Once the non-conforming parts were identified, we proposed concrete recommendations for how to improve the structure of the system to conform to the desired guidelines.

We also provided a summary of the overall code quality. For this purpose we used several techniques such as metrics, queries and visualizations. Among others, we pinpointed how the logic is distributed over the system and how the code duplication should be refactored.

The architectural violations were scheduled for refactoring before the application was to go into production.
A visualization highlighting violations over the structure of the system
Success story: Custom analysis tool for custom configurations

The client had a large software system composed of several hundred subsystems, each written in Java. The overall system was based on a custom engine that was put together using a multitude of custom configuration files specified in XML and in another custom language. Given the particularities of the project, a dedicated tool was needed to analyze it.

We received the task of crafting a dedicated tool that would be able to provide an overview of these configurations and expose their various relationships.

The prerequisite for any data analysis is the specification of the meta-model for the data. Thus, the first step was the analysis of the various configuration files with the aim of capturing the class diagram. The diagram on the next page shows the anonymized result of the configuration analysis. Once the meta-model was specified, we encoded it in the tool.

The next step was to create the importer to take all configuration files as input and then create a coherent model. An important challenge was posed by the unification of data: because the system was developed over a decade, there were multiple ways of expressing the same information. Thus, our importing solution needed to deal with all these differences.

The last step was to create several analyses and visualizations on top of this model. These views were then integrated into an interactive tool. When used by developers, the tool revealed several unexpected dependencies and incomplete specifications in the system.

For putting together this solution, we used the tool-building capabilities of Moose. All in all, the development effort for the prototype totaled a mere 10 person-days. Placed in the overall context of the multiple hundred person-years of development effort spent on the project, it was an insignificant investment with a high return.
[Figures: a schematic view of the complex meta-model; a part of the custom tool for dependency overview.]
Assessment is a discipline

The amount of data we have to take into account for decision making will only grow, so we better get good at dealing with it.

Successful data and software assessment requires both a driving set of questions and access to data.

Metrics make for great analysis tools, but they only show numbers. Data is more than that. Data, and software systems in particular, present various parts and complex dependencies. These must be taken into account to obtain a global picture.

Figuring out what data is important and how to extract the relevant information requires dedicated training. Specific questions and custom data require specific answers provided by custom assessment tools.

Building an assessment tool is similar to developing any software project. Thus, appropriate attention and investment should be allocated to it. Using a proper infrastructure can significantly decrease the overall cost of building a tool. Moose provides such an infrastructure.

A good tool does not guarantee magic solutions, but it can provide accurate insights for better decisions.

Tools are important, but we have to remember that ultimately, assessment is a human activity. It is the human who has to interpret the results and to take decisions.

Assessment is all about feedback, and the best feedback is continuous and contextual. To make effective use of this feedback, assessment must be integrated into the process of software development.

Assessment is a discipline.
[Figure: The assessment workflow — hypothesize; if an existing analysis applies, apply it, otherwise craft a new analysis; interpret the results; if not yet confident, iterate; then act.]
Tudor Gîrba attained his PhD in 2005, and he now works as a consultant. His main expertise lies in the area of software engineering, with a focus on software and data assessment.

Since 2003, he has been the lead architect and one of the main developers of the Moose analysis platform. He has published more than 50 peer-reviewed articles, served on program committees for several dozen international conferences and workshops, and is regularly invited to give talks and lectures.

He coined the term "humane assessment," and he is currently helping companies assess and manage large software systems and data sets.

© 2010 Tudor Gîrba
www.tudorgirba.com
tudor@tudorgirba.com
