
CONNECTING THE DOTS

- Why Danish IT architecture does not result in interoperability -

BRIAN KANE
brian@kane.dk

TUTORED BY PROFESSOR MOGENS KÜHN PEDERSEN


Censored public version

Master of Science in Electronic Business, IT University of Copenhagen, 2004


EXECUTIVE SUMMARY
The primary goal of this thesis is to consider how recommendations and standards for Danish IT architecture, in general, assure successful systems integration, leading to interoperability in the Danish public sector. The secondary goal is to assess the specific case of integrating ECM (ESDH) systems from the FESD project with existing systems from KMD in the context of Danish municipalities. Danish IT policy states interoperability as the main objective of its IT architecture work. After examining the concepts of architecture, interoperability and integration, and studying the case, we conclude on the current situation in terms of how mature the Danish work on IT architecture is.

Core findings include:
- Both the general standards and recommendations and the specific case of the FESD project reflect an understanding of interoperability as the exchange of business documents.
- Guidance on how to expose systems as services following the concept of service oriented architecture is vague at best.
- Specifications are too broad and unspecific to be implemented in a consistent manner.
- There is no coherent way of resolving the physical or semantic problems that arise when two domains of control meet.
- Most relevant standards are authorized for use, some overlapping and conflicting, but no guidelines are in place for when to use which standards, and how.
- Towards the goal of service oriented architecture, a sound underlying, perhaps publicly controlled, integration infrastructure is needed.
- There is a need for a long-term roadmap covering IT architectural efforts in the decades to come. The roadmap should clearly describe the current IT architectural situation and include an explicit statement of strategic goals and operationalized milestones.

While important work is being done at data model level, the task of moving data from application to application is only vaguely described. In conclusion, Danish IT architectural work can currently best be described as initial, informal and ad-hoc.


CONTENTS
1. INTRODUCTION ...............................................................................1
1.1. PROBLEM ................................................................ 1
1.2. CHRONOLOGY OF THE THESIS ............................................... 4
1.3. THE APPROACH ........................................................... 5

2. WHAT IS IT ARCHITECTURE? ...........................................................7


2.1. WHY IS IT ARCHITECTURE IMPORTANT? ...................................... 8
2.2. IT ARCHITECTURAL INITIATIVES .......................................... 10
2.2.1. Service oriented architecture ....................................... 10
2.2.2. Ideal SOA illustrated ............................................... 13
2.2.3. Standardization ..................................................... 15
2.3. SUMMARY ............................................................... 18

3. WHAT IS INTEGRATION? ................................................................19


3.1. WHY IS INTEGRATION IMPORTANT? ......................................... 20
3.2. TECHNICAL INTEGRATION ................................................. 22
3.3. INTEGRATION MODELS .................................................... 24
3.3.1. Presentation integration model ...................................... 24
3.3.2. Functional integration model ........................................ 25
3.3.3. Data integration model .............................................. 27
3.3.4. Which model? ........................................................ 27
3.4. COMMUNICATION MODELS .................................................. 28
3.5. INTEGRATION METHODS ................................................... 30
3.5.1. Message-oriented concept ............................................ 30
3.5.2. Interface-oriented concept .......................................... 31
3.5.3. Middleware types .................................................... 32
3.6. ONTOLOGIES ............................................................ 32
3.7. LEVELS OF INTEGRATION ................................................. 36
3.8. SUMMARY ............................................................... 37


4. CASE: ECM & KMD ..........................................38


4.1. THE ACTORS ............................................................ 38
4.2. FUNCTIONAL GOALS ...................................................... 40
4.2.1. One way integration ................................................. 41
4.2.2. Two way integration ................................................. 42
4.3. ECM SYSTEM ............................................................ 43
4.3.1. FESD project ........................................................ 43
4.3.2. Role of ECM ......................................................... 45
4.3.3. Software Innovation ................................................. 47
4.4. KMD SYSTEMS ........................................................... 49
4.5. SYSTEM PORTFOLIO ...................................................... 51
4.6. PROBLEMS AND SOLUTIONS ................................................ 53
4.7. PHYSICAL .............................................................. 54
4.8. SEMANTICS ............................................................. 57
4.8.1. Primary key ......................................................... 57
4.8.2. Case concept ........................................................ 58
4.8.3. Solutions ........................................................... 59
4.9. SCENARIO: OPENING UP .................................................. 62
4.10. SUMMARY .............................................................. 64

5. ASSESSING THE STATUS QUO .........................................................65


5.1. E-GIF AND OIO XML ..................................................... 66
5.2. FESD INTEROPERABILITY GUIDANCE ........................................ 69
5.3. CASE: ECM SYSTEM ...................................................... 72
5.4. INTEROPERABILITY ...................................................... 73
5.5. ENTERPRISE ARCHITECTURE MATURITY ...................................... 74

6. CONCLUSION .............................................................. 78
7. APPENDICES .............................................................. 80


APPENDIX 1: INTERVIEWEES ................................................... 80
APPENDIX 2: WHAT IS WEB SERVICES? .......................................... 81
APPENDIX 3: THE STACK(S) - GAINING A VOCABULARY ............................ 83
APPENDIX 4: STANDARDS MAP .................................................. 90

8. BIBLIOGRAPHY ...........................................................................92


Boxes
Box 1: Business document exchange ......................................................................................... 31

Figures
Figure 1: Problem .......................................................... 2
Figure 2: Problem, and feed-back ........................................... 4
Figure 3: Past, present and future ......................................... 6
Figure 4: Ideal SOA, logical 1 ............................................ 13
Figure 5: Ideal SOA, logical 2 ............................................ 14
Figure 6: Ideal SOA, physical ............................................. 15
Figure 7: Presentation integration model .................................. 25
Figure 8: Functional integration model .................................... 26
Figure 9: Data integration model .......................................... 27
Figure 10: Request, reply ................................................. 28
Figure 11: Star topology .................................................. 48
Figure 12: Systems grouped by vendor ...................................... 52
Figure 13: Basic integration scenario ..................................... 54
Figure 14: Integration model possibility .................................. 55
Figure 15: Case concepts .................................................. 59
Figure 16: Data set filtration ............................................ 63
Figure 17: Problem, again ................................................. 65
Figure 18: IT architecture to product ..................................... 65
Figure 19: Conflicting logics ............................................. 68
Figure 20: Formal, de facto and proprietary ............................... 70
Figure 21: Application level standardization .............................. 71
Figure 22: Integration model standardization .............................. 71
Figure 23: Sample WS network .............................................. 82
Figure 24: The W3C WS Stack ............................................... 83
Figure 25: SOAP message ................................................... 85
Figure 26: Publish, find & bind ........................................... 88
Figure 27: Engaging a WS .................................................. 89
Figure 28: Standards Map .................................................. 91

Tables
Table 1: Service oriented architecture contrasted ......................... 11
Table 2: Levels of integration ............................................ 36
Table 3: Goals for ECM systems ............................................ 40
Table 4: Level of compliance (physical) ................................... 56
Table 5: Integration model pros and cons .................................. 56
Table 6: Level of compliance (semantic) ................................... 61
Table 7: Conflicting inter-process communication standards ................ 66
Table 8: Assessing enterprise architecture; integration ................... 75
Table 9: EDI vs. Distributed Object XML Paradigms ......................... 87


1. INTRODUCTION
Work on IT architecture in the Danish public sector has intensified dramatically in the past few years. A focus on a more holistic view has emerged, along with a trend of looking at how we can break down the barriers between IT systems, and reap the benefits of thinking across traditional divides. Last summer, in June 2003, the Ministry of Science, Technology and Innovation published a milestone white paper on enterprise architecture, thus putting these issues firmly on the agenda. The first two paragraphs of the white paper sum up the goals for the work:
E-government is largely a matter of getting public sector IT systems geared to interoperability. The authorities must have the capability to use each other's data so that citizens, companies and case officers do not have to provide and check the same information over and over again. This requires, for example, common data definitions and coherence in the handling of security and users. And it means dispensing with 'technological islands' if we are to create a platform for new work practices. In this regard, a coherent enterprise architecture framework in the public sector is an important factor. Like a number of other countries, Denmark has now placed enterprise architecture high on its agenda because through enterprise architecture it is possible to govern the organisation and interoperability of IT systems. This is the background to this White Paper on the principles for a common public sector enterprise architecture. (Ministry of Science, Technology and Innovation 2003)

Since the publication of this paper a year ago, there has been increasing interest in the subject. Is this much ado about nothing? Definitely not. This is the core strategy for the inner workings of the nervous system for the whole public sector in Denmark, and will influence the way we work and live, and not least the competitiveness of Denmark and the individuals and firms that are located here. Although the White Paper does try to give practical advice on some points, it is mainly a policy paper that sets out the strategic focus on IT architecture. In this thesis, I will examine how the goal of IT systems geared to interoperability is implemented in an actual project, and by doing that, perhaps learn where we can intensify efforts to ensure that we reach that goal. We want to learn whether the strategic goals can actually be operationalized.

1.1. Problem
The Danish public sector is intensifying its efforts in e-government. While a rather small country, Denmark has practised a policy of relatively decentralized power, and has therefore not been very eager to dictate things to the lower levels of public administration. This is changing as awareness grows that some form of central planning of the IT architecture is needed. The classic example is to draw a parallel to city planning, which is a widely used metaphor for describing IT architecture and the processes surrounding it. It is a very effective way of describing the need for central control or coordination, as nobody in Denmark would dream of allowing city planning to be uncontrolled. At any rate, there is now political motivation behind the rhetoric, and things are moving.

Standardization is the basic building block for many of the goals that (almost) everyone agrees on, like being able to heighten efficiency by allowing systems to communicate, and expanding offerings and services to the customers: Danish citizens and firms, and other public organizations. The main goal of this paper is to present an analysis of the relationship between IT architecture recommendations and standards on the one side and, on the other side, actual integration resulting in interoperability. This is plainly illustrated in Figure 1.

Figure 1: Problem

The central question that this thesis will seek to answer is therefore:

How do Danish IT architecture recommendations and standards ensure integration and interoperability in public sector organizations?

One of the most highly profiled projects at present, second only to the Virk project, is the project called Fællesoffentlig Sags- og Dokumenthåndtering [1] (FESD). FESD sets out to become the standardized way of handling enterprise content management (ECM, in Danish called ESDH). When developed, vendors will be able to certify that their products comply with the standard, and will be able to label their products as such. The vision is that these systems should be at the core of much of public sector work, coordinating case work and decision-making processes across organizational boundaries. The FESD project is supposed to be a prime example of the practical implementation of the vision for e-government. A major criterion for the success of the FESD standard is the handling of existing systems in place in organizations, which have to be integrated with the ECM systems from the FESD work.

[1] See www.e.gov.dk/fesd


Obviously, IT systems are found at all levels of the public sector, from local authorities in the Danish municipalities to state level. As we move towards a more coordinated perspective, we face the inevitable problems that stem from that view: we have a myriad of heterogeneous systems that have not considered each other's existence before now, and that therefore have different ways of defining things and employ different concepts. The situation today could, in one word, be described as inconsistent; there is a great need for coordinating IT architecture.

The existing systems will not be replaced in the short term. This means that any new public IT project has to carefully consider how to handle its co-existence with systems already in place. As one of the main arguments for introducing ECM systems is to improve efficiency, it would not be true to the cause to ignore the issues of integrating new systems with existing ones. And I dare say it has not been ignored; this problem has spurred a lively debate in the trade press lately, but no one way forward seems obvious.

One of the most pressing problems in terms of sheer volume is the integration with the existing systems in the Danish municipalities. The municipalities are important because they handle a lot of case work, so having two systems that overlap and are not integrated would not be appreciated. The current situation is that a Danish vendor, KMD, has the lion's share of the municipal market and a huge portfolio of systems. KMD's products are widely used by the municipalities, and there is heavy vendor lock-in to KMD. The needed capabilities of the municipalities are shifting and evolving, moving towards needs not currently provided by KMD; needs that KMD is working towards providing, but where it is playing catch-up to the smaller vendors in the FESD project. Earlier, KMD's position in the market, essentially one of monopoly, allowed the firm to partially dictate trends and offerings to the municipalities. This situation is changing as we now see clearer demands on the part of the municipalities, and we see a clear interest among municipalities in the ECM systems coming out of the FESD project.

This poses a problem for the municipalities as to which path to pursue. There is a need for the municipalities to have a clear strategy for their IT architecture, specifically for the integration of ECM systems from the FESD project into their existing portfolio of systems from KMD. But since the FESD project should adhere to central IT architecture guidelines, a lot of the strategy should already be mapped out. We will look in detail at the FESD project and the challenges posed by integrating an ECM system from the FESD project with other systems, and conclude on the appropriateness and completeness of the Danish IT architecture work.


I have interviewed representatives of all the central actors [2] in my case: principals from KMD, the FESD project, the municipalities and the FESD vendors who are developing the actual FESD systems. They have all spoken about the situation, what the issues are, what goals they see, and so on. They unanimously view the issues of IT architecture and integration to be among the most important in moving the public sector forward. In this sense, they all agree. There is some apprehension to be detected; representatives of the municipalities and KMD are especially worried about the actual feasibility of current IT architecture recommendations in making concrete integration possible. This thesis will examine the specific case of integration in the Danish municipalities between ECM systems from the FESD project and existing systems from KMD. This specific case will show us how IT architecture standards and recommendations trickle down to the actual implementation of an IT project. Figure 2 simply shows how I hope to draw conclusions as to the state of IT architecture on the basis of the specific case. It will later become evident why there is not a problem of induction in this argument, although it may appear so now.

Figure 2: Problem, and feed-back

During my analysis of the case I will make concrete recommendations on integration, specific to the case. So the product of this thesis will be twofold. Firstly, it will result in an evaluation of IT architecture initiatives, and, secondly, it will result in a proposal for a solution to the actual integration between the FESD systems and systems from KMD in Danish municipalities.

1.2. Chronology of the thesis


Let me present the layout of this thesis, and how the individual parts fit together. The central question for this thesis, as I have defined it above, has two main concepts: IT architecture and integration.

[2] Interviewees represent the case and not the general findings of the thesis directly. I have listed the interviewees in Appendix 1, page 80. I generally do not quote interviewees directly, since an appreciation of the situation cannot be captured in a few sentences. When I do, rarely, make a direct quote, I will provide a time reference to the sound recording associated with the interview. Recordings of interviews have been made available at http://xxx.xxxxxx.xxx (user: xxxxxx, password: xxxxxx), but please appreciate that some of the material is quite sensitive, strictly for personal listening, and not for distribution in any form.


In chapter 2 I will unfold the first of those concepts, looking at what architecture is and what the point of IT architecture is, and take a closer look at one of the most central concepts of IT architecture strategy, which also has a prominent place in the White Paper: Service Oriented Architecture (SOA). I will also look at other IT architecture standards and current initiatives. In chapter 3 I will present the other main concept: integration. We will see why we are concerned with integration, look slightly more in-depth at the technical issues that have to be addressed, and also at models for understanding integration scenarios. Moving on, in chapter 4, I will look at the specific case of integrating a system from the FESD project with existing systems from KMD in the context of Danish municipalities. We will draw on the concepts outlined in chapter 3 to understand what problems exist and how we can tackle them. Ideally, IT architecture standards and guidelines should influence the FESD project, which should in turn influence the ECM system coming out of the FESD project and make sure it is able to handle integrations. In chapter 5, I will examine lessons learned from the case in chapter 4, and see whether there is a relation between the three elements: IT architecture standards, the FESD project and actual ECM systems. Lastly, I will conclude on my findings.

1.3. The approach


I would like to make it very clear that I am not trying to explain why the situation is as it is, merely what the situation is and what consequences it has. As seen in Figure 3, I divide the problem into past, present and future. Understanding the why is about understanding the dynamics of the complex context: how the players influence the arena, what capabilities they have, how public opinion is shaped, and so on. These are very messy questions, which do not lend themselves easily to structured analysis, and I will not be answering them.

The past is complex in the sense that it is a black box of intricate relationships between the involved parties and their surroundings. I have used arrows in the figure, but they do not indicate causal relationships. The past cannot be analyzed clinically and subjected to formal logical reasoning, since we do not have access to it, and, because of its nature, it is a mixture of political goals, business strategy, scarce resources and honest ambitions for making the Danish public sector work better. To try to give a meaningful presentation of an alleged causal relationship between these influences would be meaningless.


Figure 3: Past, present and future

The present, on the other hand, is relatively plainly visible. We can see what the situation is with regard to integration, available documented standards, actual initiatives, what capabilities the parties have, and so on. Here we have an actual situation that can be observed to some extent. Here I lean towards realism: there is a world and it may be observed and examined; or, in other words, I feel that we can look semi-rationally at the situation and, again semi-rationally, map out a course of action stating pros and cons. This does not mean that the present is simple; it just means that it is not as meaningless to look at, and in fact we may be able to say something reasonable about it.

The future is complex and therefore unknown to us. It might even be more complex than the past, but we might be able to speculate on some possible scenarios, look at a few what-ifs and contemplate possible routes. I do not consider there to be one truth or reality when talking about the problems that occur in the cross-fire of politics and strategy. Understanding the complex parts is nonetheless a very important dimension, since this is the basis for understanding the motivations of the players and perhaps understanding how to influence the scene, once we have diagnosed the problem, if in fact there is one. But again, I will not be working with that issue. Naturally there are conflicting interests in the snapshot of the present, as well as in the more complex past and future, and these must be weighed along with the influence assigned to issues such as overlapping or incomplete standards, to name but one. But without further ado, let us begin with IT architecture.


2. WHAT IS IT ARCHITECTURE?
IT architecture is not much different from ordinary architecture at the superficial level. Merriam-Webster defines architecture as "a: formation or construction as or as if as the result of conscious act. b: a unifying or coherent form or structure." Architecture is nothing new, and it is not new to the public sector. We have had architectural plans at application level for a long time; the difference is a change in scope. When we say IT architecture, we think of a broader plan for how the individual parts should fit together. We are taking the bird's-eye perspective, and want to create a coherent form or structure for all the bits we see in that perspective.

Perhaps the best picture of the ideal scenario would be if all players agreed on architectural goals and basic foundations. In a private organization it is natural to have one master plan for architectural work. If everyone acted as one concerted whole, we would move towards the ideal situation. I do not want to make the point that the Danish public sector could in general be regarded as a private organization, but in this case it seems appropriate to draw parallels. We do not want to build cathedrals and dictate decentral doings, but we do want strong coordination of what goes on.

Often IT architecture is compared with city planning. The reason is that in IT architecture, as in city planning, you need some elements that would not be developed decentrally if not coordinated. Like waterworks, roads and telecommunications, there is a need for an infrastructure which would not simply emerge out of the organization's normal operation. The individual standards might not suit everybody; to some, the train tracks might initially be too narrow or too wide, but in time we agree on a certain width, and all trains can go on all tracks, which is a good thing when we want to achieve maximum flexibility and efficiency.

Not all things have to be standardized. Usually you operate with different levels of standardization. Some standards might be global, meaning valid in the whole sphere of control or influence of the standardizing body. Some standards might be regional, either in a geographical sense or, more often, in a logical sense: for certain types of objects one standard may apply, and for another group of objects a different standard applies. And lastly you have local standards, which could simply be conventions within one organization or department. These, you could argue, are not really standards at all, since only one party uses them, but we won't quibble over words. The reason for differentiating between global, regional and local standards is that it is simply the most efficient approach. We constantly have to keep in mind what the purpose of standardization is: to make life easier for us. When certain standards are irrelevant for others to know about, there is no reason for standardizing. This is all very general, and to get a more detailed picture we should dive into some examples, which we will do after discussing why IT architecture is important.


2.1. Why is IT architecture important?


The world we live in is changing. Organizations are forced to adapt to this. Weick presents the notion of "loosely coupled" organizations, using educational organizations as examples. He argues that although certain actions or events may seem "irrational," it is often because the rules of the game are not defined.
Imagine that you're the referee, coach, player or spectator at an unconventional soccer match: the field for the game is round; there are several goals scattered haphazardly around the circular field; people can enter and leave the game whenever they want to; they can throw balls in whenever they want; they can say "that's my goal" whenever they want to, as many times as they want to, and for as many goals as they want to; the entire game takes place on a sloped field; and the game is played as if it makes sense. (Weick 1976)

We do not fully understand what is going on, but this is the world we are in, and we must learn to deal with it. IT architecture is about being consistent in the way you deal with sloped playing fields.

Some years ago I read a book by Marshall Berman called All That Is Solid Melts into Air. I really like that title, and I think of it often when considering development trends. The book is basically about how modernity is much more chaotic than we think. Modernity has been made out to be an exponent of the rational, but here it is presented as a process that cannot be finished. I think it also applies to the development in the way we want information systems to behave, and to some extent it also applies to what we want from an organization. Let me explain.

Historically, our information systems have been designed in the broadest sense of the word. By that I mean that someone has had a vision of what they wanted to achieve. They have formulated goals, designed specs and proceeded to develop and implement these. This process assumes that it is possible to know what capabilities you will need in the future, and that it is possible to intellectualize those assumptions about the future into some sort of model of what we want to achieve. Being able to know what capabilities we, as an organization, will need in the future is becoming more and more difficult. This motivates a move from the formulation of capabilities as something concrete towards the abstract, or what you might call meta-capabilities. This is less relevant on the strategic level, since goals are already on a meta level, but more true towards the operational side of things. All our solid and concrete beliefs about what we need in the future melt into the unknowable, chaotic and creative. But this is just my opinion. Luckily it maps quite conveniently onto the very practical types of functionality that organizations have been looking for and which vendors are happy to provide. At a cost, naturally. Therefore we see formulations of necessary capabilities like flexibility, which is very abstract.

Managing competitive advantage today means to some degree adapting faster than your competition rather than seeking a stable and long-lasting niche. When speed grows in strategic importance, the IT infrastructure should grow accordingly, to match it. This is where we need technology to help us. Providing the hope for more flexible, faster-adapting, and more distributed and integrated organizations, technology might be one of the tools used to keep abreast of the competition. This is not to argue that speed and adaptability alone will give you an edge, but simply that they are a necessary, not sufficient, premise for obtaining competitive advantage. The markets seem to be moving at an ever increasing speed, and speed has therefore become a very important capability in its own right. As we all know all too well, there was a trend of talking about core competencies some years ago, and the resulting need for decoupling and partnering, as it is sometimes called.

IT architecture is important for many reasons. As the world around us changes, so must our priorities; we are seeing a shift from prioritizing decentral control and decision making towards prioritizing adaptability, flexibility and efficiency. Because we want to be able to keep up with the speed of the world around us, IT architecture becomes relatively more important than it has been earlier.
The economic environment in which businesses find themselves today is perhaps the most turbulent in history. It is dominated by three powerful influences: globalisation, a knowledge and information revolution, and structural change. (Scott, Comer 1999:130)

Only those enterprises that realize the necessity of being able to access internal and external data quickly, to integrate and manage this data effectively, and to make it available both within the company and externally over the Web will be able to maintain and extend their lead over their rivals. (Shi, Murthy 2003)

The quote from Scott sounds a little dated, but is nevertheless still relevant. Shi concludes on the consequences of the current situation for organizations and the need for free-flowing information. The public sector, especially in social welfare systems such as the Danish, is influenced greatly by these developments. These are the circumstances the Danish public sector has to come to terms with. Back in 1985, Porter (Porter, Millar 1985) already said that information and the movement of information are important for the competitive landscape. The information revolution is affecting competition in three vital ways:

- It changes industry structure and, in so doing, alters the rules of competition.
- It creates competitive advantage by giving companies new ways to outperform their rivals.
- It spawns whole new businesses, often from within a company's existing operations.

The point Porter is making is that connecting information gives a whole new dimension to the market. Bringing things down closer to actual IT architecture, an important article in Harvard Business Review (Hagel, Brown 2001:109) a few years ago posed the five central questions CIOs and CEOs must ask themselves:


- Does our management team have a shared vision of the long-term (five to ten years out) business implications of the new IT architecture?
- Do we have a transition plan that balances the state of the architecture's development with a clear understanding of the areas of highest business impact?
- Are we moving fast enough today to build our expertise and exploit immediate opportunities for streamlining intercompany processes, outsourcing activities in which we don't have distinctive capabilities and designing Web services that we can market to other companies?
- Do we have a clear understanding of the obstacles within our organization that may hinder us from exploiting the full value of the new IT architecture, and do we have initiatives under way to overcome these obstacles?
- Are we exerting sufficient leadership in shaping both the functionality offered by providers of Web services (defining, for example, the performance levels required for mission-critical applications) and the standards needed to collaborate with our partners?

This thesis will take questions such as these as its backdrop for looking at Danish public sector practices in IT architecture.

2.2. IT architectural initiatives


In the following we will explore the main government initiatives on IT standardization and recommendations. This will give a picture of which issues are being addressed and which priorities are evident. Before we dive into actual initiatives like the OIO XML work and the Danish version of an e-Government Interoperability Framework (e-GIF), we will look at service oriented architecture, which is a central concept defining much IT architectural work these days, in both the public and private sector.
2.2.1. Service oriented architecture

One of the most central concepts laid down is what is called Service Oriented Architecture (SOA). It is Danish public policy for systems to be developed to adhere to service oriented architectural principles.
The core of this common public sector architecture work is the choice of the service-oriented architecture model, which defines the interoperability between IT systems as services offered by one system component and used by another. (Ministry of Science, Technology and Innovation 2003:31)

Because this is such a central theme, we will spend some time examining what SOA means, to understand how it should influence the practical work with IT architecture. It is important from the start to stress that SOA is not a standard, and is not tied to a specific technology. SOA is a concept that can be implemented with various technologies (but they naturally have to be suited for the task). SOA relies on the term services:


A service is a function that is well-defined, self-contained, and does not depend on the context or state of other services. (Barry 2003:19)
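To make the definition concrete, here is a minimal sketch in Python of what such a service could look like. The address-lookup example and all names are my own invention, not taken from Barry or the White Paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AddressLookupRequest:
    street: str
    house_number: str

@dataclass(frozen=True)
class AddressLookupResponse:
    postal_code: str
    municipality: str

def lookup_address(request: AddressLookupRequest) -> AddressLookupResponse:
    # Well-defined: the whole contract is visible in the two types above.
    # Self-contained and stateless: the response depends only on the request,
    # never on session state or on the outcome of earlier calls.
    # (Hard-coded here; a real service would consult an address registry.)
    return AddressLookupResponse(postal_code="2300", municipality="Copenhagen")

print(lookup_address(AddressLookupRequest("Amagerbrogade", "7")))
```

The point is the contract: any consumer that can formulate the request can use the service, without knowing anything about its insides.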

The following Table 1 is taken from the White Paper (Ministry of Science, Technology and Innovation 2003), which contrasts SOA with what are seen as other paradigms.

|                  | Mainframe architecture                    | Client/server architecture   | Service oriented architecture                            |
|------------------|-------------------------------------------|------------------------------|----------------------------------------------------------|
| Platforms        | Monolithic and centralised                | Homogeneous and controlled   | Diverse and unpredictable                                |
| Networks         | Restricted and closed                     | LANs widespread but isolated | Internet, omnipresent and linked                         |
| Data formats     | Non-transparent and inaccessible          | Binary and proprietary       | Semantic and divided                                     |
| Technology focus | Operating system                          | Database                     | Interface                                                |
| Users            | IT operators                              | Case officers                | Suppliers, employees, customers/users                    |
| Business value   | Digitalisation of data-centric operations | Provides data to users       | Promotes business agility, adaptability and interaction  |

Table 1: Service oriented architecture contrasted

A real-world system is likely to have elements of all three archetypical architectures we see in the top row of the table (as well as other, less crisply defined architectures), so naturally things are not as clear-cut as we see here. However, for the sake of simplicity, I will briefly take you through the right-most column and interpret what is meant. The importance of this lies in SOA being the main definition of the ideal architecture, so it is crucial to understand what we mean by it.

On the platform level, the word "unpredictable" is used. As a consequence of a fast-moving world where enterprise adaptability is a sought-after competency, organizations must change their fundamental outlook on the world. We touched upon this in previous sections. One might even call the unpredictability an adoption of a new ontology (in the classic sense: assumptions of how the world works, etc.). Earlier IT architecture showed a heavy consensus on planning and control. This approach, as I understand it, rests on the essential premises that 1) you can foresee the future and what needs there will be, and 2) you can embed the assumptions regarding the future into a system you have control of. I imagine these ways of thinking stem from a mechanical image of the world, but they are proving less and less useful. The reality of today is that you cannot as clearly or easily predict what the future will bring. The horizon is moving ever closer. This is increasingly true as the viewer and the viewed are further apart; in other words, it becomes more difficult to make predictions in decentralized organizations. The second assumption is about control. It is not viable to assume that a central power has supremacy in a specific interaction. This is quite obvious when we think of how cross-organizational business processes are becoming the norm, and how the on-demand movement requires real-time interaction between different systems.

So what do we do to address this changed landscape? SOA to the rescue. The platforms in SOA thinking have to be thought of as a distributed application. In the coming sections we will be illustrating this in more depth, so do not despair if it does not make perfect sense now. Web services is a specific technology that can be used to implement the concept of SOA. With SOA you distribute your application across physical and logical divides. There is a tricky question of definition here, because the point of SOA is also that you can use functionality (services) that is outside your realm of control or ownership. So the borders of an application become very fuzzy. One service could be viewed as a little application in its own right, or as a subpart of all the applications which that service is a part of. In theory you could imagine building an enormous number of applications using a static number of services that are aggregated in different ways. As opposed to earlier, SOA is a much more organic platform, where elements can be added and used continuously, with no central power having a final say or absolute control. It is ideas like SOA that are the basis for the need for integration technology, because when we decide to split up the elements of an application, we need to tie the little bits back together again. This is where middleware, brokers, etc. come in, which we will learn about in the next chapter.

Moving on to the network level in the table, we find much more familiar concepts. We have long been conditioned to see the advantages of one big internetwork, which has been designed to be resilient to faults. The physical infrastructure of the public internet that underlies SOA thinking at the platform level has been around for some time, and has long since adopted the distributed thinking that made it so attractive to the American defense.

As for the next dimension of SOA from the table, technology focus, the White Paper uses the term interface. In this context, interface must mean a machine-to-machine interface and not a user-oriented interface, as you might think. Obviously, when applications have to interact more and more, there is a need to focus on the interface they expose to other systems. This does not mean that you can pay less attention to the operating system or the database. The database design also becomes more critical, since you have to take the integration issues into consideration when you design it (provided you can know anything about the integration needs at design time).

With regard to users, I would not say that SOA thinking places less emphasis on internal employees; it simply broadens the field of vision to include other parties in the (information) supply chain, who previously were not as interesting to consider.

The business value box is what it's all about, and resembles what I started out with in my presentation of SOA. The consequences of implementing SOA should, ideally, be faster-moving organizations. This could mean a faster-moving supply chain with lower switching costs, faster prototyping of new products, greater adaptability to market trends or greater accountability, to name just a few possible positive consequences.
2.2.2. Ideal SOA illustrated

Clearly we do not live in a perfect world. Nevertheless, I will now try to describe what an ideal SOA means. In an ideal world we move towards true SOA, in which all parties adhere to the same conventions. To show this, I have developed some illustrations of what adopting SOA means: two logical illustrations and one physical illustration. The logical illustrations show how the architecture can be thought of conceptually, while the physical illustration is closer to infrastructure level.

Figure 4: Ideal SOA, logical 1

In Figure 4 we see a snapshot of a scenario: a logical representation of high-granularity web services (the smaller circles) being parts of a larger-granularity web service. The inside part of the rings, both the smaller and the large, symbolizes a specific, perhaps legacy, platform, while the outside band symbolizes a standardized interface towards the outside. The picture illustrates the hierarchical relationship between applications published as web services that will always exist at any time. So we have a set of applications of different sizes that are aggregated into more and more complete applications. The groups of circles that do not connect symbolize systems which are not integrated in any way. The smallest applications could be simple services providing banal functionality like basic arithmetic functions, while the largest circle could be the entire network of applications. Using this method, applications used in multiple instances would simply be depicted multiple times. This could be described as a static-state view. In relation to our ideal scenario, this figure describes one way the system should be able to be described when SOA has been implemented. Notice that even systems of the same type should in principle communicate using the shared standards (the outer ring should be in between).

The next illustration of an ideal scenario is more of a map, trying to give an overview of the application chunks and their relationships. Here each web service is only depicted once.

Figure 5: Ideal SOA, logical 2

Notice again how applications that have different insides externally provide a consistent way of accessing those insides. It is clear that the concept of an application begins to crumble here. What we call an application is simply a matter of choosing to draw a circle around a group of circles/applications. Being able to draw this map of our systems would indicate something close to the ideal.
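To make the aggregation idea tangible, here is a minimal sketch in Python of fine-grained services being composed into a coarser-grained one. The services and the invoice example are invented for illustration; nothing here is prescribed by the White Paper:

```python
def add(a: float, b: float) -> float:
    """A banal, fine-grained service: basic arithmetic."""
    return a + b

def vat(amount: float, rate: float = 0.25) -> float:
    """Another small service, computing VAT (Danish 25% assumed here)."""
    return amount * rate

def invoice_total(net_amounts: list[float]) -> float:
    """A coarser-grained service aggregated from the two above. To its
    callers it is simply another service with a well-defined interface."""
    net = 0.0
    for amount in net_amounts:
        net = add(net, amount)
    return add(net, vat(net))

# The 'application' boundary is a matter of where we draw the circle:
# invoice_total is an application in its own right and, at the same time,
# a part of any larger service that chooses to call it.
print(invoice_total([100.0, 250.0]))  # 437.5
```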


Figure 6: Ideal SOA, physical

In the physical depiction of an ideal scenario, we can see that the many-to-many connections of the logical illustration have been replaced by a star topology with central hubs. In practice it would not be efficient to physically make point-to-point connections between all endpoints; in theory this is possible, but it would be too expensive to administer and would not allow for much central coordination. In this illustration the satellites are the applications. The different patterns indicate different platforms, standards, etc. The hubs act as brokers between the different native ways of communicating and the internal lingua franca. Communication between hubs happens via an open shared network in the common and standardized fashion.

To understand the way the different illustrations supplement each other, you could relate this to the difference between the physical internet and the distributed peer-to-peer applications that run on it. The analogy only partially holds water, but it might help to give an impression of the difference. You should now have an idea of the distributedness that is central to the idea of SOA. In my presentation of SOA, I have assumed that we were dealing with different platforms, protocols, etc. (hence the different patterns in the figures), but this may not always be the case. Standardization work could help iron out the differences between systems, if we should choose to take that path. We turn now to standardization.
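The following Python fragment is a rough sketch of the hub's broker role; the system names, message formats and the canonical dictionary format are all hypothetical. The point is that each endpoint needs only one adapter pair to and from the lingua franca, instead of one translation per counterpart:

```python
from typing import Callable

# Adapters between each native format and the canonical format (a dict here).
native_to_canonical: dict[str, Callable[[str], dict]] = {
    "kmd_legacy": lambda raw: dict(zip(("case_id", "title"), raw.split("|")[1:])),
}
canonical_to_native: dict[str, Callable[[dict], str]] = {
    "kmd_legacy": lambda msg: f"CASE|{msg['case_id']}|{msg['title']}",
    "ecm_fesd": lambda msg: f"<case id='{msg['case_id']}'>{msg['title']}</case>",
}

def route(source: str, raw_message: str, target: str) -> str:
    """The hub's job: native -> canonical -> native, never native -> native."""
    canonical = native_to_canonical[source](raw_message)
    return canonical_to_native[target](canonical)

print(route("kmd_legacy", "CASE|2004-17|Building permit", "ecm_fesd"))
# <case id='2004-17'>Building permit</case>
```

With n endpoints, this means maintaining on the order of n adapters at the hubs rather than n*(n-1) point-to-point translations.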
2.2.3. Standardization

Standardization as an IT architectural effort is supposed to be an essential part of making integration possible. We earlier distinguished between global, regional and local architecture work, and we must remember that many standards are global in the literal sense: they are supposed to be used consistently everywhere in the world. These standards are usually quite generic in that they can be used for all sorts of things (e.g. the standard XHTML 1.0 does not dictate what the web page presents). These standards can be influenced via committee work in the standards organizations, and we do have Danish representation on several committees, but it would not make much sense to invent a Danish standard if there is a global one covering the same area.

Standards organizations are naturally influenced heavily by industry, so there is a lot of politics and market power involved. This means that several standards organizations come up with rivalling standards that do basically the same thing, but usually have a bias towards a specific vendor, normally Microsoft, Sun or IBM, to name a few. Standardization work, on the generic level, is thus primarily about choosing between rivalling standards.

In the context of this thesis, we are talking about integration leading to interoperability, so let us narrow our scope a little. With respect to standardization of integration technologies, we must make an important distinction. I differentiate between two basic strategies: converge and connect. By converge I mean to make heterogeneous systems homogeneous or, put differently, to standardize in detail on the actual technologies that are the systems. Connect means to standardize on the meta-technologies that leave the heterogeneous systems as they are, but dictate how to link the systems together, how to approach the problem and how to overcome the differences in a structured and consistent manner.

Standardization of actual technologies is a long-haul strategy, because it means altering systems already in place. That translates into an expensive strategy in the initial phases, but when it is completed, many things will become easier. Standardization of meta-technologies and approaches, which does not dictate actual standards but helps us to address the differences, lets us tolerate existing systems, and to some degree encourages the differences. An argument for a connect strategy would emphasize that there is a reason for systems being different; this is not accidental. Systems express and reflect the context they are in, and are therefore appropriate.

Now we will turn to a very brief presentation of the two main initiatives for IT architecture that are relevant to the question of interoperability. There are other initiatives, such as standard contracts and benchmarking work, which are also part of good IT architecture practice, but not relevant to our questions of interoperability. We will return to an actual evaluation of the initiatives after we have looked at our case.

e-Government Interoperability Framework

In order to achieve a shared platform of standards, a working group has set up the e-Government Interoperability Framework (Det Koordinerende Informationsudvalg 2004), which makes recommendations on the global standards. An e-GIF defines the standards used in the public sector. Many countries have an e-GIF, although these vary in terms of the role they play and how detailed they are. This will not be an extensive investigation into the (at present, 107 different) actual standards listed in the Danish e-GIF, simply a statement of its purpose. The recommendations in the e-GIF are meant to help projects and organizations make choices that are not aimed at the specific problem in the project but are meant to serve a larger good; it might be a disadvantage for the individual project to use a specific standard, but they should use it because it has been chosen as the common standard. The e-GIF recommends on a whole range of issues [3], some of which are more relevant to our context than others. A lot of the standards are quite banal, since you would not consider using anything else (like TCP/IP and JPG), so these are obviously not of much use. We will make an assessment of the e-GIF at a later stage, when we have more insight into what we should expect to find there.

OIO XML

OIO XML is perhaps the biggest and most important architectural work done yet in Denmark.
The main purpose and value is to support exchange and reuse of data related to public and private service delivery, including cooperation, business reengineering and alignment of related services. (National IT and Telecom Agency 2004)

The important part of the OIO XML work is that it aims to document what information exists in what systems in the public sector. The way this has been done has basically been to convert paper-based forms and data models from databases to XML Schema [4]. XML has made this task significantly easier than before. With XML so widely adopted as a structured messaging format, we suddenly have a consistent way of describing our data and the constraints on the data: the XML Schema. This means that we do not have to agree on all the details of the structure of a message bilaterally between each pair of communicating points; we can all simply say we adhere to the same specific XML Schema (not quite that simple, but nonetheless). With OIO XML we ideally know exactly what information exists in the various public offices, and this work can therefore form the basis for a unified view of the public sector. Obviously we still have to agree on the semantics: we have to agree that when we wrap an element in the tag "street", the receiving party agrees that what we call street is just that, and not street name or something else. This issue is quite an important one in integration problems, and we will examine it more closely in section 3.6.
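A small sketch can make this concrete. Assuming the Python lxml library, structural conformance to a shared schema can be checked mechanically; the schema and message below are invented stand-ins, not actual OIO XML artefacts:

```python
from lxml import etree

# A shared, agreed-upon schema describing the message structure.
schema_doc = etree.XML(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="postalCode" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""")
schema = etree.XMLSchema(schema_doc)

message = etree.XML(b"<address><street>Langelinie</street>"
                    b"<postalCode>2100</postalCode></address>")

# Structural agreement can be verified mechanically ...
print(schema.validate(message))  # True
# ... but the schema cannot verify that both parties mean the same thing
# by <street> (name only, or name plus number?). That semantic agreement
# still has to be made explicitly.
```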

[3] If you are not familiar with this work, I suggest you go to http://egovernments.org/referenceprofilen/ now for a quick look.
[4] See http://www.xml.com/pub/a/2000/11/29/schemas/part1.html


As a message format, XML provides the flexibility we have long been looking for, and it is the basis for countless standards, most notably the whole family of web services related standards, but in our case those for describing data in particular. There are related committees that work on issues that use XML. These include, for example, the DokForm[5] work, which was initiated some years ago. This work is still in its initial phases, but is focused on metadata for ECM systems, which essentially means choosing the right descriptors for a case in an ECM system, the argument being that this would allow for easier exchange of case data. Again, we will return to OIO XML, and an evaluation of it in particular, when we have learnt how it fits into the bigger picture.

2.3. Summary
In this chapter we have looked at a central concept which is supposed to define Danish IT architecture strategy: service oriented architecture. We have seen how this means a much more distributed way of thinking and involves abandoning the silo mentality so often observed. We have then looked at two essential and more concrete initiatives, OIO XML and the e-GIF, where OIO XML maps the information in existing systems using XML Schema, and where the purpose of the e-GIF is to aid us in making choices leading to standardization, covering both a connect and a converge strategy.

[5] See http://www.oio.dk/XML/standardisering/dokform


3. WHAT IS INTEGRATION?
Every time a closed system opens, it begins to interact more directly with other existing systems, and therefore acquires all the value of those systems. (Kelly 1999)

In the following sections we will take a look at the business goals of integration efforts, which cover the broad arguments for integrating systems. After that we will look at technical models that show some of the questions we must answer to be able to describe an actual integration. First of all, we should decide on what integration means, or what we are trying to achieve. Here is one attempt at a definition:
Integration: Two systems are integrated if an event in one system (system A) that might potentially affect decisions being made in another system (system B) is always reflected in system B in system real time. (McComb 2003:225)

What the quote states is that, if two systems overlap in terms of what objects or entities they consume or manipulate, integration means that a manipulation of an entity is visible wherever the entity is used; in other words, the manipulation is replicated between the systems. McComb uses the term decision, which to me suggests human involvement, which, just to be clear, is of course not necessarily the case. He also seems to think that the replication has to happen in real time, which suggests that non-real-time integrations are lesser integrations. So time is a factor.

On systems and applications: when I say we want to integrate two or more systems, we have to have a notion of what a system is in the first place. In this context I use the term system synonymously with an application, with the added dimension of the application's surroundings, where decisions have been made about frameworks, operating systems, programming languages, lower level infrastructure and so on; both terms have become fuzzier than they once were. This is certainly the situation in the specific case we will be looking at in a little while. So when considering system or application integration, it is essential to bear in mind the context and surroundings of the application.

My understanding of truly integrated systems is a situation where all the individual systems act as one. What we have to be clear about is the importance of choosing our scope; large enterprise systems are architecturally made of smaller parts, and these smaller parts could again be divided into subparts. If we choose the scope of the whole enterprise, the whole system might not be very integrated, but zooming in on specific parts may reveal very tight integration. This seems obvious, but is nevertheless important. I prefer an understanding that says that systems are integrated when the border between the parts and the whole becomes blurry. This is very debatable, but I found it to be the least poor description. The reason why it is debatable is that a cardinal point in


SOA is to split systems up into well defined pieces, but let them work together seamlessly. So it is a question of what technological context you are in. There are many problems that a strict definition cannot handle. For example, an application could have multiple presentation layers exposing different parts of the business logic or, put differently, an application could serve very different types of users that perform actions on the application with little or no influence on each other. The functionality that lies in the application layer might be so generic that it does not determine what it is used for. We would normally consider this two separate applications, since there is no causal relationship between them, no event in one determining decisions being made in the other. On the other hand, they are clearly using the same physical application layer code to perform tasks, but would under this definition not qualify as integrated. A clear-cut and absolute definition might not be within reach, but I think we have the basic concept in place, and we will continuously be adding to our understanding of the term.

Interoperability has also been understood in many ways. I will use the term to denote the result of integration: integrated systems result in interoperability between them. This understanding is not the most technically elaborate, but it is sufficient for our purposes.

3.1. Why is integration important?


Now we will review the broader context in which integration has to be seen. What is this integration supposed to achieve in the first place? What are the business needs that drive the actual low level needs? The definition of what is needed is of course dependent on whom you ask. The web services technology, for example, which has been widely marketed as an integration technology, has largely been brought forward by industry. This obviously leads to a somewhat biased view of what we need. How the standards have come to look the way they do today would make a very interesting discourse analysis. However, there does seem to be a surprising consensus on the need for effective integration technology, so let us look at those perspectives.

This is really a continuation of the arguments for IT architecture. IT architecture means a more holistic view, and as a consequence of a holistic view we need to link the pieces; otherwise the macro perspective would not be new. Porter uses the concept of linkages to describe the interdependencies in and between organizations, and how they can be the source of competitive advantage:
Linkages exist when the way in which one activity is performed affects the cost or effectiveness of other activities. Linkages often create trade-offs in performing different activities that should be optimized. [...] Careful management of linkages is often a powerful source of competitive advantage because of the difficulty rivals have in perceiving them and in resolving trade-offs across organizational lines. [...] Linkages not only connect value activities inside a company but also create interdependencies between its value chain and those of its suppliers and channels. A company can create competitive advantage by optimizing or coordinating these links to the outside. (Porter, Millar 1985)


Integrations are a technological embodiment of linkages, and can thus be the basis for improving the organization's performance. But what does this mean for the strategic process? For one, it becomes more important than ever to integrate the general strategic process with IT strategy and enterprise architecture. A few of the potential impacts of successfully leveraging integration technology might include (Marks, Werrell 2003: 89):

Identification of target markets. The scope of products and services could shift, either as a consequence of the organization's own strategy change or in response to the competition's initiatives.

Corporate Performance Management (CPM). As systems become more integrated and exchanging information becomes easier, cheaper and less error prone, we will see a move to closer-to-real-time reporting. This will enable management to keep abreast of up-to-the-minute metrics and act accordingly.

Value chain visibility. Web services will enable the organization to see straight down the supply chain, and in some cases also up the demand chain. Information from disparate sources will be aggregated to create more knowledge and insight from the available information. This will happen first internally in the organization and increasingly between organizations.

Automation of business processes. Business process definition management will become cheaper and more viable across organizational boundaries.

Agility. Organizations will feel less constrained by the IT infrastructure as flexibility increases. Errors due to unforeseen consequences of changes to systems will be less likely, as the infrastructure becomes more service oriented.

Organization structure. As the IT infrastructure becomes more open and less of a constraint, it will define the organizational structure less and less. This will mean that we will see fuzzier hierarchies and fuzzier organisational borders.

You might think that this sounds very business minded, and not very relevant for the public sector, but if we consider for example the way public sector organizations are now called public firms and have to submit financial reports much in the same way private firms do, we will realize that there has been a shift in the way public sector organizations are thought of. Obviously, there are important differences between public and private organizations, for example the level of accountability we could expect. This means that some considerations may weigh heavier than others, but there are no fundamental differences that drastically alter our IT architectural goals.

The technological infrastructure to achieve these ambitions in an efficient way has not been put firmly in place, so there has been a technological vacuum with regard to fulfilling the identified business need, which is also a need for more dimensions to compete in. If, for example, the technological capability of integrating seamlessly with partners is achieved, the ability to manage your business network will be an even more important competency in the future, since technology lock-ins will have relatively less


force. The future might give us the opportunity to blur the difference between a traditional value chain internal to an organization and the external supply and demand chains. This means that one piece of the value chain can more easily be put in an independent organization, but still keep the close information-based ties that are so important. This relates to the core competency trend of some time ago and, more recently, the move towards on-demand thinking, which many corporations are embracing. This is, of course, one of the main strategic directions IBM is pursuing. The reason for this is that organizations need to lower their fixed costs to a minimum. At a time when the thoughts of Ulrich Beck are more relevant than ever, it has become more attractive to pay someone else to take some of your risk, or to share it with others. Marginal costs become relatively more attractive, even if you do pay a little more. Again, the technology to do this is something we need.

But let us move slightly away from the strategic to the operational. As always, IBM (Nix 2004) is ready to give advice on what we need. We want and need:

to integrate systems regardless of their implementation

to move from monolithic, custom-coded apps to choreographed, scripted components

agility and flexibility to reconfigure business functions to try new process models

to move from tightly coupled systems to loosely coupled ones to deal with inevitable change

a well-understood programming model for connecting businesses via the Internet

This means that we might see a fragmentation and integration/collaboration scenario, where lower transaction costs make outsourcing a function more of a viable solution. This will give rise to new markets for specialized functions traditionally thought of as strictly in-house.
Transaction cost theory asserts that the price of a product is comprised of three elements: production costs, co-ordination costs and profit margin. Co-ordination costs include the transaction costs of all the information processing necessary to co-ordinate the application of the resources employed in primary activities. (Scott, Comer 1999:135)

As coordination costs come down, transaction costs as a whole come down, and the market becomes more plastic.

3.2. Technical integration


The overall issue we are trying to grasp is how to make sense of applications that have to interact in some way. Earlier, applications were developed with a stove-pipe mentality,


meaning that each application was thought of as an independent silo, which was not supposed to communicate or interact with other applications. This is changing. Now we want applications to come together as a larger whole and become coherent. What were once multiple separate applications should now become one distributed application. Distributed means that we may view the individual applications as parts of a greater application, and so on one level it is merely a concept; we can choose to look at a group of applications as one larger distributed application if they are integrated. But there is also a technological level that must be considered. We cannot simply choose to view multiple applications as one whole if the technological infrastructure is not in place.

Integrating systems can have many purposes, be it providing a web interface to your old mainframe applications, consolidating multiple legacy systems into one and adding additional application logic, or, probably one of the most common, trying to open up your systems to be able to interact more freely with systems in other organizations. If you know exactly what you want, and you will need the same thing for a long time, and the marketplace does not change, then you are fine. But this is simply not the case. Examples of the types of things organizations need to be able to do include:

On the developer level: use functionality from different physical and logical applications (thus breaking up our understanding of what an application is and where its boundaries are).

On the business level: aggregate output from multiple systems to achieve a more transparent view of the organization, common uses being business intelligence and facilitating cross-application business processes.

We have been able to use functionality from other applications before, but because the technology was proprietary and no one was powerful enough to dictate the whole market, these technologies have been unsuccessful. There have been initiatives that can do what we need, e.g. CORBA or Microsoft's COM, but partly due to the lack of widespread adoption, these technologies have not been able to deliver on the need for distributed computing in any real way. The reasons for needing this capability are many, but let me provide just one example. In today's business environment, businesses are bought and sold at an enormous rate. The costs involved in mergers and acquisitions are huge. Most of the difficulties in M&As are usually attributed to organizational problems, but considerable resources go to the integration of systems. Having a technology that could facilitate this process and help to design systems that allow for rapid take-over would increase the market value of a firm and the value of the money spent on the buyer's side. An attractive prospect. M&As make me think of the upcoming restructuring of the Danish municipalities, so it is clear that M&A thinking is very relevant in the public sector as well.


In the following sections I will present the most fundamental building blocks and concepts for system and application integration. We will need this terminology when we have to talk about the actual issue at hand in our case, the questions surrounding the integration between Enterprise Content Management systems and systems from KMD. I will not draw on all of these concepts in later chapters, but the following should also give an impression of the field in a broader sense and what types of problems it deals with. For the purpose of this thesis, I have been forced to caricature and simplify things a little. Instead of giving an overdose of technical details, I have tried to keep things as concise as possible, and only present what we need in order to understand the issues in systems integration.

It is probably easiest to start out with the good old 3-tier understanding of an application. As we know, a generic application consists of a data layer, an application layer where program logic resides, and a presentation layer where the interface is built. An example of a simplification is the presumption of a three-layer architecture in the following integration models, which is not at all a given. Many applications have additional layers, some applications have layers that are mixed together, and some applications are missing a layer, e.g. a user interface if the application is only used in machine-to-machine situations, or perhaps a database if the application only manipulates data from another application.

3.3. Integration models


This is what it is all about: integration. But what is integration exactly? What types of integration exist, and how do they differ? In the following we will take a look at integration models, which indicate different entry points to applications and are thus different ways of integrating two applications. Following the standard three-tier architecture, there are three basic integration models. You may access an application by way of the presentation layer, the functional (application) layer, or directly at the data layer via the database. The different types of access are appropriate in different situations, depending on goals and constraints.
3.3.1. Presentation integration model

Sometimes the only access you have to an application is via the presentation layer. Not all applications have a presentation layer in a graphical sense, but most do.
This form of integration is useful only when the integration can be accomplished using the user interface or presentations level of the legacy applications. Integration of this type is typically oriented to textual user interfaces such as IBM 3270 or VT 100 interfaces. (Ruh 2001:22)

Figure 7 demonstrates an integration where two presentation layers are aggregated into a third. This is slightly simplified, as you would obviously need some application to control the aggregation and produce the third presentation, but the figure shows the goal of the integration. A great advantage of this integration model in our context is that you do not need to have control over the applications' inner workings. This means that


you do not have to have access to the lower level application or data layer to use this model.

Figure 7: Presentation integration model

The downside is of course that you are limited to the functionality offered by the original presentation. This means that you cannot create new functions or logic without them having to go via one or more presentation layer screens. For example, if you want to create a new function called CreateCaseAndSendToOrg(orgid), and this process would ordinarily take the user through multiple screens in a certain order, some special software called middleware would have to do exactly the same as the user, and expose that process as a function that could be invoked by another application, thus simulating the creation of a new function. Simulating users also creates performance issues, as you add an additional layer to the integration, and the presentation layer is probably the slowest with the most overhead, although if it is a text-based application that is in question, this is less of an issue.
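To make the simulation idea concrete, here is a minimal sketch of such middleware in Python, using only the standard library. The URL, form fields and confirmation text are hypothetical placeholders, not an actual legacy system:

import re
import urllib.parse
import urllib.request

def create_case_and_send_to_org(org_id: str) -> str:
    """Walk through the same screens a user would, exposing the
    multi-screen process as a single callable function."""
    # Screen 1: fetch the case-creation page the way a browser would.
    urllib.request.urlopen("http://legacy.example/cases/new").read()
    # Screen 2: post the form, as if the user had filled it in and clicked send.
    data = urllib.parse.urlencode({"action": "create", "org": org_id}).encode()
    result = urllib.request.urlopen(
        "http://legacy.example/cases/submit", data).read().decode()
    # Parse ("scrape") the confirmation screen for the case number.
    match = re.search(r"Case no\. (\d+)", result)
    if match is None:
        raise RuntimeError("screen layout changed; scraping is brittle")
    return match.group(1)

The sketch also illustrates the weaknesses discussed above: every call pays the overhead of rendering whole screens, and a cosmetic change to the confirmation page breaks the integration.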
3.3.2. Functional integration model

In functional integration we use the existing business logic in the underlying applications to access and manipulate data. Figure 8 shows how middleware can aggregate the two application layers, perhaps transform them, and publish them to a third, new application on top, essentially creating a new application. This ensures consistency in the database (to the extent that the underlying application business logic ensures consistency) and can help to prevent problems with relational integrity, if securing relational integrity has been placed in the business logic layer. Functional integration is by far the most robust method and is the basis of the bulk of integration patterns.


Figure 8: Functional integration model

Functional integration can be grouped into three approaches that reflect the goal of the integration:

Data consistency is quite self-explanatory, the point being to maintain consistency across multiple applications that have overlaps in data. For example, if name and address information is in multiple applications, there is a need to push an update out to all the relevant applications. This approach promotes loose couplings, where the applications (especially the receiving ones) have little or no awareness of each other. The sending application uses a fire-and-forget strategy where there might not even be a response from the receiver: a typical asynchronous approach. (A minimal sketch of this approach follows after the list.)

Multistep process involves an orchestrated series of interactions between systems. There is often a need for multiple applications to support a business process, where the individual applications have different roles, but the process needs to follow a specific sequence and perhaps a specific schedule. The role of middleware is to couple these applications and perhaps use a workflow engine to control the actual process steps. A multistep process usually promotes tight couplings among applications, since all applications are typically needed for the process as a whole. If one application is down, the whole chain is down. Since the individual communications between the applications and the middleware are part of a larger process, there is a greater need for ensuring that the communication has gone as planned. There will therefore be more two-way communication, which can be synchronous or asynchronous.


Component integration is about modularizing applications and unifying your application portfolio into one coherent whole. This plug-and-play idea could be thought of as Lego blocks: clusters of functionality that can easily be interchanged with other blocks. Component integration is more strategic, as it requires considerable effort to implement. It requires an enterprise-wide plan and coordination of efforts, and detailed knowledge of the existing applications is needed. It encourages having one consistent semantic model of data and functions across all applications, which is a tall order in many real world scenarios. Component integration favours tight coupling.
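As promised, here is a minimal sketch of the data consistency approach, in Python with the standard library only; the queue stands in for real middleware and all field names are invented:

import json
import queue

# One outbound queue per receiving application (the role of the middleware).
address_updates: "queue.Queue[str]" = queue.Queue()

def publish_address_change(citizen_id: str, street: str) -> None:
    """Sending side: emit the event and continue; no reply is awaited."""
    event = json.dumps({"type": "AddressChanged",
                        "citizen": citizen_id, "street": street})
    address_updates.put(event)  # fire and forget

def receiving_application_poll() -> None:
    """Receiving side: apply updates whenever convenient, with no
    awareness of which application sent them."""
    while not address_updates.empty():
        event = json.loads(address_updates.get())
        print("updating local copy:", event["citizen"], "->", event["street"])

publish_address_change("010170-1234", "Bredgade 4")
receiving_application_poll()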
3.3.3. Data integration model

Data integration simply exposes the databases directly to the other application, completely circumventing the business logic of the application. Figure 9 illustrates this. The classic scenario for this type of integration is wanting to aggregate multiple sources of data into one more unified view, allowing for data mining work to be done, for instance. In other words, it is suitable for one-way integration where you do not want to manipulate data, but simply pull it out. It might also be suitable if you want to manipulate data in a single field or a few fields, for example an address or a last name. The danger of this approach is that if you do not understand the data model completely, you will not be able to manipulate data correctly, and you may render the application inconsistent.
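A minimal sketch of the safe, read-only variant, assuming Python with its bundled sqlite3 module; the database files, table and column names are hypothetical:

import sqlite3

def unified_address_view():
    """Aggregate read-only data from two applications' databases,
    bypassing both applications' business logic entirely."""
    rows = []
    for db_path, table in [("app_a.db", "citizens"), ("app_b.db", "persons")]:
        conn = sqlite3.connect(db_path)
        try:
            rows += conn.execute(f"SELECT name, street FROM {table}").fetchall()
        finally:
            conn.close()
    return rows  # one view across sources; safe, because nothing was written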

Figure 9: Data integration model

3.3.4. Which model?

Deciding which integration model to use is a significant question. In general, functional integrations are preferred if at all possible. In some cases it is not feasible to


access the application or business logic layer, where the code is, and which is most tightly controlled by the application vendor if it has not been developed in-house. If the application layer is not accessible, you should consider which of the two alternatives has the best pro/con ratio: screen-scraping (parsing the presentation level data and structuring it) at the presentation layer is inflexible and has performance issues, while data integration duplicates business logic from the application layer and could lead to inconsistency. If the reason for choosing a data integration model is a lack of access to the application layer, so that you cannot pursue a functional approach, you should be careful when manipulating the data directly. Not knowing exactly what the business logic is puts you at a disadvantage when manipulating the data directly and building your own business logic again. If you wanted to manipulate the data in a more complex way, you would have to build new business logic in the other applications. This is fine if the business logic you want does not exist in App A, but if the logic already exists, which it often would, you would have to duplicate this logic in App B, which is not efficient, and could lead to inconsistencies if it is not done correctly.

3.4. Communication models


When two systems communicate there is a fundamental distinction between synchronous and asynchronous communication. Choosing one or the other depends on what we want to achieve and the dynamics of the context the applications are in. First, we should agree on a little convention on wording, as seen in Figure 10.

Figure 10: Request, reply

Synchronous communication basically goes like this:

1. The sender sends a request to the receiver. The sender waits.

2. The receiver receives the request from the sender, processes it, and sends a reply to the sender. During processing, the receiver is not open for other incoming communications.

3. The sender receives a reply from the receiver. The receiver continues processing and the sender continues with further processing. This point might be counterintuitive until you think about it for a while.

The point with synchronous communication is that, while waiting for a response from the receiver, the sender is on hold and not free to continue other tasks. In other words


there is a tight coupling between the two parties. This type of communication needs an open and stable means of communication to be successful. If the underlying infrastructure is not operational, or if either party is not ready to communicate, the transaction will not be successful. Also, latency issues are bound to influence the performance of this type of communication. If the lower level infrastructure, i.e. the network, has high latency, this can drastically influence performance. There may be situations where you would not want the transaction to be completed unless you have an instant receipt from the receiver, for example when paying by credit card. Any system where the sender expects a response or the result of the transaction on the spot would (as a rule of thumb) have to use synchronous communication. Normally, when a physical user is initiating the communication, that user would expect to see a result immediately. This is the case with most web applications, for example.

There is no clear borderline between synchronous and asynchronous communication. All communication takes time to process, and what qualifies as synchronous, or real-time, communication in one context might not qualify as such in another, so this is not a well defined concept. Note also that it depends entirely on our focus whether a communication is synchronous or asynchronous. Very often the lower level protocols, like the transport layer protocols, will be synchronous, while the higher level protocols might be more asynchronous. If nothing specific is stated, we are talking about the communication that immediately carries the data we want to integrate, for example a SOAP message (refer to Messages & SOAP, page 84, for further explanation). The universal text-book example of synchronous communication is the phone call. The communication can only take place if both parties are home and ready to speak (not in the shower, etc.). The communication is real-time, and when the phone is hung up, the communication is concluded. Variants of synchronous communication exist depending on what type of reply the receiver sends, which in some cases only amounts to a receipt of having received the request. This then becomes borderline asynchronous communication, which we will now briefly consider.

The asynchronous communication model is looser. Here time is not considered to be critical, and the availability of the receiver is not a relevant concern. One of many very widely-used protocols that use asynchronous communication is SMTP, the protocol for sending and routing e-mail between servers. When you initiate an SMTP communication the receiver does not get it instantaneously, and is not required to reply instantaneously either. More often than not, the mail is routed through a number of different servers, depending on how far the communicating parties are from each other. What are the essential characteristics of an SMTP communication? For one, the sender is indifferent to whether the receiver (not the person, the system) receives the message, in the sense that the communication is not part of a tightly bound chain of events. The sender is indifferent to what consequences the communication has at the receiver's end. Asynchronous communication more closely resembles a fire-and-forget type of thinking.
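The contrast can be captured in a few lines of Python; a minimal sketch using only the standard library, with placeholder host names:

import queue
import urllib.request

# Synchronous: the sender blocks until the receiver has processed the
# request and replied (or a timeout expires); a tight coupling.
def pay_by_credit_card(amount: int) -> bytes:
    url = f"http://payments.example/charge?amount={amount}"
    return urllib.request.urlopen(url, timeout=5).read()  # sender waits here

# Asynchronous: the sender drops the message in a queue and continues at
# once; delivery and processing happen later, elsewhere; a loose coupling.
outbox: "queue.Queue[str]" = queue.Queue()

def send_notification(text: str) -> None:
    outbox.put(text)  # fire and forget: no reply is awaited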


In applications, asynchronous models may result in the need for more plumbing, because you have to handle all the what-ifs higher up in the system. When the messaging platform is asynchronous, what happens if the message fails further upstream on the way to the final receiver? Then whatever consequences sending the request had must be rolled back at the sender. So the choice of communication model heavily influences, and perhaps broadens, the responsibility of the application.

3.5. Integration methods


Moving quickly along, we will now look at some of the more physical methods of transferring data from sender to receiver. Until now we have had brief looks at where in the application you can enter (integration models) and how the communication takes place between the two interacting parties (communication models). Now we will look at what the content of the communication looks like. We will be looking at the message-oriented concept and the interface-oriented concept. Again, the following two basic concepts are not supposed to be a binary choice, but should naturally be mixed and matched for the specific purpose.

Sometimes you also read of a differentiation between white box and black box integration. White box integration exposes the internals of the application or dataset, requires deep understanding, and leads to tight coupling, while black box integration hides the internals of the application; integration is done through an API, connector, or some other form of interface. The black box concept usually leads to more loosely coupled systems. While the white box/black box distinction does not map completely onto the two following concepts, it might be interesting to ponder why it does not.
3.5.1. Message-oriented concept

With the message-oriented model, commands and parameters are sent in one structured whole. The context of the transaction must be provided in the message, so the receiver knows what the commands relate to. Both the data and labels for what the data means, and thus what to do with it, are transferred simultaneously. When sending a message you must know that the receiver can interpret the message, but beyond that you need not presuppose knowledge of the capabilities of the receiving party. This type of integration favours loosely coupled communication, but this is not a must. In message-oriented integration, life is a lot easier if we have a consistent way of describing structure and at the same time assigning semantic mark-up. One of the first spins on the role of XML messaging was as the successor to EDI. Let us take a quick look at that, before we continue with the other main integration method, based on interfaces.
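A minimal sketch of a self-describing message in Python (JSON is used here purely for brevity; an XML message would carry the same structure, and all field names are invented):

import json

# Command, context and data travel together in one structured whole.
message = json.dumps({
    "command": "UpdateAddress",                       # what the receiver should do
    "case": "2004-117",                               # the context it applies to
    "payload": {"street": "Bredgade", "houseNumber": 4},
})

def receive(raw: str) -> None:
    """The receiver knows nothing about the sender; it interprets the
    message purely from its own content."""
    msg = json.loads(raw)
    handlers = {"UpdateAddress":
                lambda m: print("case", m["case"], "->", m["payload"])}
    handlers[msg["command"]](msg)

receive(message)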


Business document exchange

Businesses have long had a need for automated document exchange. This, among other reasons, is why EDI came about. The motivation is still very much valid, but the problem with EDI is of course that it is expensive to set up, and expensive to maintain in the sense that changes require coordination and configuration on the part of the sender, the receiver and probably also a 3rd party VAN provider. So we still want the EDI benefits, but we want them to be far more flexible and we want them at a lower cost. Staying in tune with the need for flexibility, we also want to avoid heavy non-standard lock-in, which is predominant with EDI. This is why we are moving to XML as the message language for business documents.

As collaborations become more complex, involving more parties that have to play certain roles in a planned business process, it would greatly increase the value of document exchanges if there were a standardized way of defining a business process. Today the bulk of an inter-organizational business process is conducted in a non-standard way, and many are documented inconsistently, if they even use the same modelling language to describe them. This means that we have only a limited number of what we could call collaboration points, where we can set up an outbound document extract. Essentially, this just means that if we want an extra collaboration point, we can of course have it, but it is expensive. But if all internal business processes are made up of standards-based building blocks, the boundaries of the firm dissolve in a practical or technological sense. We need the capability, if possible, to allow any business process element, embodied by a business document, to be unbound from the context it originated in. This would mean that we could mix and match collaboration points across organizational boundaries. A business document approach also reflects one extreme of how you could use web services, the other extreme being very fine-grained web services with high volume interaction. This discussion is expanded on page 84. Business document exchange is not integration as such, in that the applications are not integrated; it simply allows for efficient exchange of information.
Box 1: Business document exchange

3.5.2. Interface-oriented concept

Where a message-oriented concept hides the application that will consume the message, an interface-oriented concept does just the opposite: it exposes it. What this approach to integration does is to create an interface to a remote object in the local application. This means that when we connect to an interface we create an invocation directly, and are able to see what resources the receiving application is exposing. This also means, for example, that the context for the operation is already given. If we call a specific method and provide some parameters, it is obvious that the context is that specific method on that specific object, and we will be notified at once if the operation is not successful.
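A minimal sketch using XML-RPC from the Python standard library; the server URL and method name are hypothetical:

import xmlrpc.client

# The remote object is exposed as if it were local: the method, its
# parameters and its context are explicit.
proxy = xmlrpc.client.ServerProxy("http://appserver.example/rpc")

try:
    case_id = proxy.create_case("org-42")
    print("created case", case_id)
except xmlrpc.client.Fault as fault:
    # Failure of the remote operation itself is reported at once.
    print("remote fault:", fault.faultString)
except OSError as err:
    # No server behind the placeholder URL: a transport-level failure.
    print("could not reach the interface:", err)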


3.5.3. Middleware types

Very briefly, commercial products implement these integration methods in different ways. Message oriented middleware (MOM), not surprisingly, uses the message-oriented concept. Historically this gained its foothold as an integration method for exposing IBM's thirty or more year old mainframe systems, for which IBM developed MQSeries, which employs the MOM concept in a queuing pattern. Message-oriented concepts can also be used for synchronous communication if done correctly. Naturally you will see the vendors of MOM touting the benefits of this pattern, and claiming that it can basically be used for everything. Nobody but loose coupling purists would agree entirely with that, so again, this is not an either/or choice.

Remote procedure call and distributed object technology are two generations of basically the same type of thinking, leaning towards the interface-oriented concept. The difference between the two lies in the programming paradigm the applications have been developed with, the newer being of course object oriented thinking. Object oriented middleware, not surprisingly, uses more object oriented concepts, where you instantiate objects, reference them in a certain way, call methods, and so on, while RPC deals with more traditional procedures. Interface-oriented integration does not require synchronous communication, as you might otherwise think. You could well have a sender initiating a communication asynchronously using the interface-oriented concept, and continuing its processing while the receiver administers the request and composes the reply, before sending the reply back to the sender.

These are just a couple of the most central types; there are many, many more. The market is naturally fast moving and new concepts are being developed all the time. We have now considered a few of the concepts used to describe different types of integration, but there is a whole dimension we have not touched upon yet. A large part of the problem with integrations concerns the meaning of different concepts in the different systems. When unifying systems into a more coherent whole, we need to address the issues of meaning. More about that now.

3.6. Ontologies
In the previous couple of pages we have looked at concepts and models for integrating two or more systems. By integration we have meant the physical communication between two systems: sending data between the two. This is clearly an essential component, since there would not be much point in talking about integration if the data could not be moved


between the systems. Another extremely important component in application integration is the whole semantic area.
When we said that a relational database schema defines an ontology, we were at the root of the problem. There are as many ontologies as there are database schemata in the world and consequently ontologies are a hot topic. In a time of company mergers, enterprise application integration, Internet portals, and supply chain integration, ontologies clash fairly often and must be reconciled. In the past, the traditional tactic used with EDI was to negotiate bilateral agreements. However, this is hardly sufficient to meet the demands of a networked and globalized business community. (Daum, Merten 2003: 177-8)

When human beings communicate verbally, we use the atmospheric air as a medium for sound waves. We send vibrations of air between sender and receiver that carry the message we want to communicate. Many sub-processes are involved in an everyday communication, one of which is the interpretation at the receiver's end. When the message comes in, it needs to be assigned some meaning. The communication will not be successful if there is no agreement on what the meaning of the communicated elements is. If a Dane tries to speak Danish to a Chinese person speaking Mandarin, they are not likely to be successful in terms of verbal communication. In this case there is no semantic overlap; they use completely different words, and they might not even be able to speak of the same things with an interpreter, if they come from vastly different domains with collections of concepts that simply do not overlap. Another problem could be if two Chinese people come from two different contexts; they use the same words but maybe assign them different meanings. This could potentially be a more difficult situation, because one might assume agreement on meaning where there actually is none. In short, there are the following four situations:

1. Different words for different things
2. Different words for the same things
3. Same words for different things
4. Same words for the same things

Exactly the same problems occur when we want applications or systems to communicate, and all four situations are problematic if we want to achieve transparent communication, which is what we usually want when talking of application integration. Even the first and the last can be troublesome, because you have to establish in some ordered way that they are in fact the situation at hand. There is, therefore, a challenge that extends beyond physical communication to one of semantic integration.
Clearly, semantic heterogeneity and divergence hinder the notion of generalization, and as commonalities of two entities are represented in semantically different ways, the differences are more difficult to see. Thus, ontological analysis clears the ground for generalizations, making the properties of the entities much more clear. (Linthicum 2004:395)


Traditionally, in the philosophy of science and related fields, an ontology could be defined as a set of assumptions about what the world is and how it works. In the world of IT and applications we often map the world to a data model and have certain ideas of what associations exist between the objects in that data model, and how the objects can be manipulated. In the context of applications this is an ontology. An ontology is therefore a semantic model that defines the assumptions about the world that the application is trying to cope with. When applications have different ontologies, problems arise, just as when people have different ontologies, or when scientific traditions have different ontologies. There is of course a noticeable dissimilarity between humans and machines, so this discussion is not about what happens in the actual interpretation, and the point is not whether an application can or cannot reason (although application integration technologies that deal with semantics do have roots in artificial intelligence, so the issue is not far off).

But let us take a practical example: what is the single largest body of unstructured and messy knowledge? Of course, the web. Within the past few years, initiatives have sprung up to address this issue. If the amount of information on the web continues to grow, and it will, we will most certainly need to create some order, so we don't drown in oceans of information when we are just looking for that single drop of relevant information. The public World Wide Web has several huge problems, one of which is data format, and SOA thinking also tries to remedy that. HTML is characterized by mixing content and structure with formatting and presentation. This way of formatting information is bad news for application integration. Applications are not usually concerned with what data looks like (font, size, etc.) but much more concerned with what the information means.

The actual technical implementation of the need for some sort of semantic order will hopefully at some stage become a full-blown codified network of correlations between concepts. These initiatives are about creating a meta-layer of structure that assigns meaning to concepts by relating them to other concepts. Such a network will be able to document, for instance, that 1) mother is a sub-concept of parent and 2) X is mother of Y, and then autonomously reason that 3) X is parent of Y. Work on this has been labelled the semantic web, and is one initiative that could promote the true vision of SOA. These technologies are useful for application integration, since the problems are much the same, but on a smaller scale. For the time being we will simply agree manually on naming conventions and use schemas to define the structure, much like designing relational tables, with the important enhancement that you can have a complete structure (spanning multiple tables in relational thinking) in one XML-based message.
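The mother/parent inference above can be written out in a few lines of Python; a toy sketch of subsumption, not a real reasoner:

# 1) Concept hierarchy: each concept maps to its broader super-concept.
SUBCONCEPT_OF = {"mother": "parent", "father": "parent"}
# 2) Asserted relations between individuals.
FACTS = [("X", "mother", "Y")]

def infer(facts):
    """3) Derive the broader relations implied by the hierarchy."""
    derived = list(facts)
    for subject, relation, obj in facts:
        if relation in SUBCONCEPT_OF:
            derived.append((subject, SUBCONCEPT_OF[relation], obj))
    return derived

print(infer(FACTS))  # [('X', 'mother', 'Y'), ('X', 'parent', 'Y')]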
Using these Web-based standards as the jumping-off point for ontology and application integration, it's possible to define and automate the use of ontologies in both intra- and intercompany application integration domains, domains made up of thousands of systems, all with their own semantic meanings, bound together in a common semantic meaning of data. (Linthicum 2004:401)

This is the long-term vision, and not a reality today, but the technologies are promising. When you have established an ontology framework, you have defined what concepts exist in the different domains or, stated in a different way, what classes and methods


exist in your applications and how they interrelate. But just to recap, the two situations that pose the biggest problems are different words for the same things and same words for different things. Now there are two basic strategies you can pursue:

1. You can try to achieve same words for same things or different words for different things. This is the ideal situation, where there is no need for semantic translation; the applications use the same language, so no further processing is needed.

2. If you are for some reason stuck in one of the problematic situations, you will need to translate concepts during the communication. This process is known as mapping.

To achieve the ideal situation you will need to build a metadata model[6] for your organization or domain of control. In this model you give all your data labels and define constraints and relationships, using for example XML Schema as part of the puzzle. This approach has the important prerequisite of access to all the implicated systems. Access means that you can alter the internal workings of the applications to use the shared meta model, or at least expose the systems using a distributed concept, e.g. web services, in a way that adheres to the meta model. In some situations you do not have this access to the applications. This could be due to a lack of resources or willingness to invest, or simply because it is a third party application and the vendor is not willing to open up. These are the same reasons as those for using middleware on the physical level. In this situation we must use semantic middleware to do the translation. This is called a mapping server:
Mapping servers store the mappings between ontologies (stored in the ontology server). The mapping server also stores conversion functions, which account for the differences between schemas native to remote source and target systems. (Linthicum 2004:398)

A mapping server gives the integration effort the missing piece, and is the translation hub of the communication. The problem can be more complicated, though. If there is no possibility of a 1:1 mapping of concepts, the problem takes on new dimensions. For example, if one system has an object called mother but the other system only has a more general term, e.g. parent, you need to relate the concepts. This is fairly simple, as you can construct a simple relation between the two, but the program logic of the two applications has to support the change of concept; it has to make sense to use one variety of a concept in one context and another variety of the concept in another context. This is often a hurdle of some magnitude. When two applications have been modelled for different purposes, they more often than not employ different concepts. If your semantics are far apart, it makes everything quite a bit more difficult. When we have different words, we
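A minimal sketch of what such a mapping server stores, in Python; the two vocabularies and the conversion function are invented for illustration:

# Mappings between the two ontologies (source field -> target field) ...
FIELD_MAP = {"street_name": "street", "zip": "postalCode"}
# ... plus conversion functions accounting for differences in native formats.
CONVERSIONS = {"postalCode": lambda v: f"DK-{v}"}

def translate(record: dict) -> dict:
    """Rewrite a record from the source system's ontology to the target's."""
    out = {}
    for field, value in record.items():
        target_field = FIELD_MAP.get(field, field)
        convert = CONVERSIONS.get(target_field, lambda v: v)
        out[target_field] = convert(value)
    return out

print(translate({"street_name": "Bredgade", "zip": "1260"}))
# {'street': 'Bredgade', 'postalCode': 'DK-1260'}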

[6] See http://www.eaijournal.com/PDF/MetadataLinthicum.pdf


consequently have different worlds. Now, these worlds definitely have overlaps (otherwise there would be no point in the integration in the first place), but when the actors' orientation in the field is determined by specific words, it causes organizational turbulence to change them. The problem increases in severity as the concepts become more and more complex and composite, as we will notice when we look at our case.

From an integrational point of view, the ultimate ideal would be if all parties agreed on meaning and were able to make an ontology map, thus making the relationships between concepts explicit. With only the physical integration left, the problem would be half solved, requiring much less insight into the minds of the people in the different domains. This translates into less process work and less change management. This is utopia. In a slightly less ideal world we have a coordinated metadata model which all parties agree on, which is continuously being phased in by everyone, and which is naturally steadily developed as requirements shift. This metadata model is dictated by a central body which has the mandate to decide on standards across the full application portfolio.

3.7. Levels of integration


But let us recap the different dimensions of systems, and deduce from them the possible levels of integration. It has to be stressed that the following table is a simplification; the complexity of reality cannot be reduced to a simple table, and it does not always make sense to speak of levels of integration in an abstract way.

                          Low integration       High integration
Integration model         Presentation layer    Application layer, Data layer
Integration capability    One-way, read         Two-way, manipulate, update
Communication model       Asynchronous          Synchronous
Integration method        Message oriented      Interface oriented
Time                      Non-real-time         Real-time
Platform                  Different             Same
Standards                 Different             Same
Ontologies/semantics      Different             Same

Table 2: Levels of integration

This is the simple presentation of integration. What would really be more realistic would be to talk of appropriateness. For example, high latency networks make message oriented integration methods more feasible and asynchronous communication more suitable. This is just to stress that more integration, or tighter integration, is not necessarily better, even though it might be closer to the ideal definition of what integration is. Actually, SOA and newer thinking look to looser couplings between systems, and view this as the ideal.


A type of integration that falls outside the categories we have touched upon is invocations. Think of invocations as shortcuts between applications. If you are working on an entity in one application that has a parallel entity in another application, an invocation would allow you to instruct App1 to open App2 with the current entity open. For this to work, App1 would have to be aware of App2 in some way. This might just mean having one-way access to the database underlying App2 to extract the data needed to make the shortcut. Notice that no manipulation of any entities in App2 takes place, so there is no integration as such, simply an optimization of the user interface; the user could manually open the other application, find the relevant entity and continue processing there, but this is just faster. Note also that this is not presentation layer integration, since no transfer of data happens programmatically.
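A minimal sketch of such a shortcut, in Python with a hypothetical URL scheme for App2:

import webbrowser

def open_in_app2(entity_id: str) -> None:
    """Jump from App1 to App2 at the parallel entity. Only a lookup and a
    jump: nothing in App2 is manipulated."""
    webbrowser.open(f"http://app2.example/entities/{entity_id}")

open_in_app2("case-2004-117")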

3.8. Summary
In this chapter we have glanced at some of the questions that need to be addressed when talking of integration. We have looked at integration models, communication models and integration methods, all as parts of the physical integration. We have seen how even scratching the surface reveals quite a few important questions that will influence the dynamics of a possible SOA. When should we use an interface oriented concept asynchronously, for example, and when should we use the more loosely coupled message oriented concept? Lastly, we have looked at the other half of the integration problem, the semantics, which might be easier to define as a problem than the physical dimension, but is not easier to solve. We turn now to our case, to see if we can find possible courses of action for the integration of an ECM system with systems from KMD.


4. CASE: ECM ↔ KMD


In this chapter I will present an examination of an actual integration scenario, namely the problem of integrating the FESD systems with systems from KMD. The point of this exercise is to shed light on the completeness of the standards and recommendations from the Danish architecture work, to see how they hold up in a real life situation, and also to structure the problem the municipalities are facing with regard to integrating the ECM systems of the FESD standard with existing KMD systems, and to learn from that.

We will start by looking at the context of this integration scenario, including a brief look at the actors, to see how they influence the situation. After that we will explore what the business case is: the functional goals for the integration between an ECM system from the FESD work and the systems from KMD. Continuing on, an analysis of the FESD standard will show what the standard covers. In conjunction with examining the FESD standard, we will study an ECM system that is an implementation of the FESD standard. Moving along, we will consider the KMD systems that are on the other side of the table in our integration problem, before we look at a snapshot of the current situation for a typical municipality with regard to other existing systems. This snapshot should show the systems currently employed, but should also give an indication of the complexity of the context that a municipality constitutes. This will provide us with a basis for describing the issues that exist in the case, and we will lastly look at possible strategies for addressing those issues and solving the integration problem. At the conclusion of this chapter we should understand how the FESD work is influenced by central standardization work, and how this affects the products from the FESD project, which in turn should ensure reliable and healthy integration.

4.1. The actors


First, let us look briefly at the handful of parties that are involved in this problem, as I assess it after having spoken to them all. We will not be engaging in an analysis of how the actors influence the integration problem, but this section should give a brief impression of the surroundings of the problem and underscore the obvious: that the actual technical integration is merely one of several dimensions to the problem.

The municipalities' interests and goals are presumed primarily, plainly, to be working systems. But naturally the municipalities seen as a whole, the individual municipalities and the responsible management would like to be viewed as successful. Prestige is consequently an influential factor. I presume the interests and goals of KMD to be mainly


profit maximising, this being the general logic that governs a private enterprise. Although KMD is in principle owned by the municipalities, the power balance seems to be in favour of KMD. There is a slightly odd relationship between KMD and the municipalities, which are its owners and clients at the same time. This does lead to a conflict of interests, with the seemingly dominating party viewing the municipalities as clients. Then we have the vendors of ECM systems, like Software Innovation, CSC and Accenture. They are clearly also profit maximising, and this of course is perfectly legitimate.

The FESD group that is chartered to provide a consistent FESD standard is relatively close to the political sphere and operates on a strategic level. It will provide something that might be called a specification for a standardized ECM system (the FESD standard), which is an actual document and in that sense very operational, but is heavily influenced by the strategic visions, which in turn are influenced by the political surroundings. This definitely influences the rhetoric used in that context to describe problems, solutions and systems. KL is assumed to want to underscore its raison d'être and thus sees an opportunity for advancing and strengthening the municipalities. This is greatly encouraged by the municipalities.

When dealing with these parties, it should at least be mentioned that the problem has several facets that influence its dynamics and the actions of the parties. There is no question that the overall position of the municipalities and KMD is influenced by politics. Both parties live in a domain which is heavily politically loaded. The field of IT integration is of great importance to the inner workings of a municipality, but it seems that there has been somewhat of a vacuum of controlling powers other than KMD in the field of IT in general. This is gradually changing, and pressure is mounting on the municipalities to organize themselves and to become a more equal counterweight to KMD. The pressure felt by KMD is in all likelihood also closely knit to its business objectives. KMD, which is owned by KL, is widely considered to be the more powerful of the two in terms of dictating the agenda and the market. These political and business objectives clearly obstruct a view of the naked truth of the problem and a rational analysis leading to a sound solution. These are simply the terms on which we must view the problem.

The wider context that both the municipalities and KMD are part of is the legislative prescriptions and constraints. The municipalities have to abide by the laws that govern how fast they have to complete certain tasks, what information can be passed between systems, who is allowed to see certain information, when certain information has to be destroyed, et cetera. These constraints also have to be considered in the proposed architecture and plumbing of systems for the public sector in general. Last but not least, there is the practical dimension. Here we have all the questions of what can be done and how it should be done to achieve maximum efficiency, quality and flexibility.


4.2. Functional goals


The determination of functional goals is obscured by the uncertainty of external factors and by conflicting views of the role of the ECM/FESD system. It is clear that any problem of integrating systems depends on the actual context and organization. As we have surely established by now, application integration is not a clear-cut concept. It covers a whole family of technologies on different levels, with different approaches, and so on. To be able to choose between them, or at least to prioritize between approaches, we have to know what goals we have for our integration efforts. This poses a challenge because, in interviews with key stakeholders, there does not seem to be consensus on what has to be achieved, and in several cases stakeholders explained that the competencies needed to analyse the situation and make informed decisions are simply not present.

But who should decide what the goals should be? The municipalities are still very fragmented, so individually they do not have much power, but the municipalities' organization, KL, would be an obvious body to put this challenge to. According to Tom Bøgeskov, initiatives are on the drawing board to address precisely the problem of defining exactly what functional goals are needed. For the purposes of this work, we will have to make more general recommendations, building on more general functional goals.

The municipalities do have some goals for their integration efforts. In the broadest of terms these are, not surprisingly, not that different from the goals they have for the IT systems themselves. Basically the goals of integration efforts are to enhance the positive effects of their IT systems, or at the very least not to interfere with or degrade the positive effects of the existing IT systems. In a survey conducted two years ago (Hougaard 2002), Lars Hougaard asked all Danish municipalities to respond to a series of questions regarding the use of ECM systems. One question concerned the overall reasons for introducing ECM systems. The responses can be seen in Table 3 (my translation).

What would you consider the primary goal of introducing ECM systems to be?
                                                     1st priority        2nd priority
                                                     N       %           N       %
Because it can provide better service
for our citizens                                     18      16%         27      41%
Because it can provide higher efficiency             45      39%         20      30%
Because it can provide higher quality                41      35%         16      24%
Other                                                12      10%         3       5%
Total                                                116     100%        66      100%

Table 3: Goals for ECM systems

Perhaps surprisingly, higher quality is stated as being almost as important as efficiency. These figures should be seen in a political context that does not promote an emphasis on hard, verifiable goals. Municipalities and their management are risk averse, and are more likely to point to non-efficiency explanations that cannot easily be verified, so as not to expose themselves to failure.


Nevertheless, higher efficiency and higher quality together account for 74% of the respondents' first priorities, so let us have a look at what that means for the way IT systems should work. During my interviews with representatives from the municipalities and KL, I identified the following functional goals of integration efforts. They are grouped into categories but naturally have overlaps.
4.2.1. One-way integration

To a large extent, quality of work means making the right decisions when processing requests from the public. This, again, translates into having relevant information presented to you in an effective manner. The individual employee is not, and should not be, concerned about the origin of the information, but simply that the right information is provided at the right time. He or she should not be required to access multiple systems if the work at hand requires an overview of information. In other words, the necessary aggregation of information should be possible for specific business processes.

There is a need for access to and viewing of relevant information aggregated across applications.

According to Danish law, a citizen can request information about what information a public sector organization holds on him or her. The nature of these requests might force the employees in the organization to retrieve information from multiple cases. An ideal system would allow for searches based on a range of parameters that cut across the actual logical organization of cases. This is not entirely an integration issue, as it can be a problem within a single system. Ironically, application integration technologies can be used to integrate an application with itself, extracting data into an altered data model that is more flexible and, for example, allows for advanced searches.

There is a need to be able to structure or view data in multiple ways.

Documentation

There is a clear need for strong documentation in the ECM system. Being good public sector workers also means being accountable for decisions made. Being accountable necessitates the right tools for actually documenting the processes and their products, and then making this information readily available. For these tools to be used they have to be easy to use and have to record the basis on which a decision was made. This requires the possibility of taking snapshots of relevant documentation that might change over time. One system might not be designed to support the concept of freezing states so that you can go back historically and see the changes made to an object. If this is the case, one possible solution is to integrate with an external system that allows you to pull information out of the system and record it, in timeline fashion, in the external system.


So there is clearly a need for strong documentation, and it is highly likely that this is an application integration problem.

Greater transparency

In a more administrative context, there is a need to manage the ever increasing amounts of data and information, and to be able to transform them into valuable knowledge about how the organization is functioning. Danish public sector organizations have to transfer vital data about the information they have in their systems to a central archive; this normally happens once every five years. There are obviously legal obligations that the organizations have to abide by, and this is a challenge in our context. When the legal requirements for information are not met by one system alone, but are distributed among multiple systems with different functions, an eventual consequence must be the need for some kind of aggregation. The need for greater transparency for management and analytical tasks, as well as the need for a structured aggregation of data, could be met with a data warehousing type of concept, which is not the basis for day-to-day synchronous work.

There is a need for systems serving the legal requirements, as well as systems serving management's need for business intelligence, to be able to consolidate data into one off-line, asynchronous structure for further analysis and manipulation.
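To make the consolidation idea concrete, here is a minimal sketch in Python of a nightly batch that pulls snapshots from several source systems into one off-line store. All names and fields (case_facts, days_open, the source systems' extract functions) are hypothetical illustrations, not part of any actual FESD or KMD interface:

import sqlite3
from datetime import date

def nightly_consolidation(sources, warehouse_path="warehouse.db"):
    """Append today's snapshot from every source system to one off-line store."""
    db = sqlite3.connect(warehouse_path)
    db.execute("""CREATE TABLE IF NOT EXISTS case_facts (
                      snapshot_date TEXT, source_system TEXT,
                      case_id TEXT, status TEXT, days_open INTEGER)""")
    for system_name, extract in sources.items():
        # Each source system is assumed to expose an extract function
        # returning simple records; this is the asynchronous pull.
        for record in extract():
            db.execute("INSERT INTO case_facts VALUES (?, ?, ?, ?, ?)",
                       (date.today().isoformat(), system_name,
                        record["case_id"], record["status"], record["days_open"]))
    db.commit()
    return db

The point of the sketch is the design choice: analysis and legal reporting run against the consolidated store, never against the operational systems themselves.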
4.2.2. Two-way integration

Having multiple systems does, however, beg the question of how to handle functional overlaps. In other words, we need to clarify what to do if an employee can perform the same tasks in two different systems. Embedded in IT systems in general is a certain logic that governs the way business is done, which concepts are adopted, etc. Having multiple IT systems with functional overlap would be less than ideal. In principle it would be best if no collection of systems had more than one place to do a specific task, for a certain employee. The reality is that some information can be viewed in multiple applications, and the same information naturally occurs in multiple contexts. As a consequence, changes should be visible in all the different contexts the information occurs in. In other words:

There is a need for tighter, two-way, or high-granularity integration, not simply exchange of business documents, where data can be manipulated from multiple applications.

Shorten business processes

The overall goals are concerned with removing from the business process the hard work that does not add value to the product, the product being a resolution to a query from a citizen or another public office. All business process steps that do not add value should in principle, if possible, be automated.


This is an efficiency perspective, but it is naturally closely knit to quality, as the premise would be that jobs are not cut, and consequently there would be more time to improve the quality of the work. I am not entirely sure that there is not a hidden agenda there, but cutting overheads is a clear goal, for whatever reason. There is no point in keeping steps in processes if they can be automated or greatly reduced.

There is a general need to shorten business processes and to cut non-value-adding activities.

Minimize manual data entry

A central driver for the use of IT systems in the first place is efficiency. The introduction of new IT must therefore not interfere with existing systems achieving that goal. A practical example is that the introduction of a new IT system must not mean that manual entering of information has to be done twice. I mention this separately from shortening business processes in general because it is viewed as vital, and eliminating manual data entry is just one of the ways business processes can be shortened. This means that data entered in one system should be automatically available in other systems. This is a simple and clear objective: the need to tightly integrate systems and propagate changes throughout the application portfolio, plus no double manual entry of data. Data entered once in one system should be available to all other relevant systems.

There is a need for replication of information between applications, so information resident in one application can be viewed in another, provided there is a clear consensus on which application owns the information. A sketch of this ownership-based replication follows below.

These are the core goals for an integration, and any integration scenario should be evaluated against them.
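As a minimal sketch of the replication goal, the following Python fragment illustrates replication with a single owning application per piece of information. The ownership map and the update interface are invented for illustration and do not describe any actual system:

OWNER = {"address": "FolkeregisterSystem", "case_title": "ECM"}  # hypothetical ownership map

class ReplicationHub:
    """Propagates changes from the owning application to all subscribers."""

    def __init__(self):
        self.subscribers = {}            # field name -> list of receiving applications

    def subscribe(self, field, application):
        self.subscribers.setdefault(field, []).append(application)

    def publish(self, field, value, source):
        # Only the owning application may originate changes to a field,
        # which is the consensus on ownership the text calls for.
        if OWNER.get(field) != source:
            raise PermissionError(f"{source} does not own '{field}'")
        for app in self.subscribers.get(field, []):
            app.update(field, value)     # each subscriber exposes update(field, value)

Data is thus entered once, in the owning application, and every other interested application sees the change.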

4.3. ECM system


Let us now turn to one party in the integration scenario. We will first look at the main objectives and methods of the FESD project, and then at a product based on the FESD standard.
4.3.1. FESD project

In broad terms, the goal of the FESD project (Den Digitale Taskforce 2004) is to become the basis for a standard specification for ECM systems in the public sector, and to lower the barriers for especially smaller public organizations to purchasing these large systems. Purchasing an ECM system is certainly not a small task, since it will be the central working environment for many of the staff. This is a daunting prospect for smaller organizations, e.g. municipalities, and might deter them from buying altogether, or push them towards products aimed at the lower end of the market.


The other main objective of what is to become the FESD standard is interoperability between the different public sector organizations. The idea is that, when business processes operate across organizational boundaries, systems will have a consistent way of describing cases and metadata (and eventually also business process logic), so that the goal of obtaining seamless integration becomes more realistic. An essential part of integration is therefore the concept of standardization. In previous chapters we have looked a little at the models, methods, etc. that are the building blocks of every integration. The problem is of course that we have multiple systems, which are exactly that: multiple systems that do not work as a whole. The converge strategy of standardization tries to move the systems closer together conceptually, so that they become easier to integrate. The connect strategy tries to map out what to do when we are dealing with different systems. The FESD standard would like to do both: it is to be a standard for ECM systems from different vendors (converge), and it should also cover how these systems integrate with each other and with other systems (connect). These are the two standardization goals.

The FESD group has formulated a few objectives for the standardization efforts (Software Innovation 2004)⁷. They include:

- A shared concept of a case.
- Integration formats between ECM systems and external sources.
- Exchange of case and document information between public sector organizations, including shared workflow interoperability.
- Transfer of data for central, long-term storage.

The way the FESD project group has defined the specification is by starting at a strategic level. This is what you always hear: IT systems should be coupled with strategic business objectives. A consequence of this is that the FESD specification consists of high-level functional definitions (the user/system should be able to do a, b and c) with only general descriptions of how these should be implemented at the lower technological level.

In practical terms, the FESD standard has built upon the work done by the Norwegian government, which resulted in the NOARK standard (Riksarkivet og Statsarkivene 2003). The NOARK standard has been around for quite a while, and is currently at version 4.1. Some of the main reasons for initiating the NOARK work were to ease the transfer of data to a central repository in a specific format for long-term storage, and to define consistently what was to be kept and what (generally personal information) had to be destroyed for legal reasons.

⁷ I am quoting Software Innovation, since all bids submitted by the vendors contain a standard text written by the FESD group.


As the standard has evolved, it has tried to keep up with movements in IT and IT architecture. The latest revision specifies, among other things, the use of XML as message format. Although the Danish specifications for the FESD standard are only just beginning to trickle out to the public, it is widely acknowledged that the data model of the Danish standard will closely resemble the NOARK data model. This means that the ECM efforts in Denmark will be based on a data model initially developed in the mid-eighties, when application integration and interoperability were not prime considerations and not at all a mature field. Naturally the standard has evolved in the nearly two decades since, and it is therefore a very robust, albeit somewhat conservative, approach. Nevertheless, it was the one standard with a proven track record in the ECM field, so it is very understandable that the Danish efforts have begun there.

The NOARK standard basically defines the following:

- A description of the information the ECM system should be able to record and export.
- A high-level specification of functionality seen from a user perspective (i.e. what the system should be able to do, not functionality in an application logic sense).
- A core data model that all implementations and vendors are forced to adopt. The data model defines the detailed specification of the information in the system.

Vendors develop platforms that may be submitted for certification with the Norwegian authorities and, if successful, are granted permission to label their product Noark-4.1 Certified. This means that Norwegian public (and private) sector organizations can rest assured that certified products adhere to the core requirements in the standard. The situation with regard to the FESD project is very similar; it also includes a core data model and a high-level specification of functionality.
4.3.2. Role of ECM

The practical result of the FESD project is three products from three vendors or consortia. When looking at the case of integrating an ECM system with systems from KMD, we will pick one of the three as a representative of the FESD project. We do not want to treat all three systems separately, because what we want is to make general observations. Examining one system and its integration scenario should indicate what strategy the municipalities should pursue in tackling the issues of integrating their ECM systems with their KMD systems. Of the three, the most complete technical descriptions were made available to me by Software Innovation, so we will have a closer look at the Software Innovation system Public 360 as one party in the integration scenario. Let us first have a short look at what role the FESD/ECM system should have, and then at how flexible the actual ECM system is.


Different views are held on the role of the ECM system. One view is that it should be the backbone of the daily work of all employees, and therefore the presentational hub where all information is accessed and updated. A different perspective is that the ECM systems should be used mainly by administrative staff and management, while the actual case work is done in the native KMD systems. The FESD group sees the role of the ECM systems as the central hub of information where the entire audit trail is recorded (Tom Bøgeskov 19:50). This requires that all cases be initiated and terminated in the ECM system, while the user jumps to other systems between start and finish to do additional work. A practical scenario is described in the following. When a case is initiated, be it when a letter is received by mail or a form is filled out on the municipality's website, it must be documented in the ECM system. It cannot be integrated straight into the KMD systems, even though this would be easiest, unless there is a way of freezing some image of the transaction into the ECM system. After you have created a case in the ECM system, jumped to one of the KMD systems and done some calculations there, the case might result in a financial transaction such as a payment to the citizen. This transaction has to be connected to the case in the ECM system in order to maintain the audit trail.

The audit trail, including transactions, should be accessible via the ECM system.

This means that the ECM system is the primary point of documentation, including static snapshots of other systems (e.g. the KMD systems), while a great deal of the actual work is done in the respective KMD systems. According to Jacob Kanto, the day-to-day work should be done in KMD's systems, since this is where the employee has access to systems relevant to the specific case, for instance granting planning permission for a new building. Consequently, the ECM system is where document routing and workflow happen. This is also where the audit trail is formally placed: who does what, on what grounds, with whose authorization, etc.

This view of the ECM system as the backbone of information and business processes is backed by the FESD project group and the municipalities I have spoken to. KMD and certain voices in KL consider the ECM system to be another functional system alongside the KMD systems, but aimed at administrative staff. I will consider the role of the ECM systems to be the backbone of the information flow, since the most authoritative voices, including the vendors producing the systems, agree on that. On a side note, it is worth mentioning that many of the smaller KMD systems are actually very simple systems that mainly do some straightforward calculations. These could just as easily be done in the ECM system in conjunction with a workflow engine, which is most often already present. This is another pressure point for KMD. Having established the role of the ECM system, let us turn to an actual ECM system in the FESD project, and how it lives up to the expectations of flexibility.


4.3.3. Software Innovation

The vendors' core technical specifications are not public, given that they are considered trade secrets, but I have been lucky enough to be allowed to inspect the material. It is ironic that, arguably, one of the most important aspects of the FESD standard, namely the technical architecture and the ideas for integration scenarios, is not publicly available and open to public scrutiny. In the introduction to the technical specification, it says (Software Innovation 2004:21, my translation):
Integration and interoperability are some of the most important standardization efforts and will also result in important improvements in efficiency in those organizations that introduce the system. Public 360 [the Software Innovation product] is in its architecture built for interoperability, with a technical architecture based on web services. The system is therefore able to support integration on a number of levels, including: between internal applications (system integration), between public sector organizations, to/from citizens, to shared data repositories

This does sound good. Let us dive a little deeper and see how it can be done. Software Innovation later states that it supports integration at the presentation layer, the application layer and the data layer. Presentation layer integration can be achieved by using an emerging technology called WSRP portlets (OASIS 2004b), which OASIS describes like this:
Web Services for Remote Portlets (WSRP) has defined a standard for interactive, presentation-oriented web services. (OASIS 2004a)

This is not like the more primitive screen-scraping methods, which basically parse unstructured, presentation-formatted data. This is, as the OASIS quote says, a consistent way of exposing parts of the system as little pieces of presentation that can communicate with the back-end system. Essentially, it allows you to take a piece of the user interface out and present it in a different context, perhaps in a different application. It is a purely visually oriented technology that allows you to make a collage of portlets, but without actually integrating the different portlet sources at the application or data layer. We could call this visual integration, which could help meet the need for having the big picture in case work. In principle you could use this method for accessing the application and data layers, but there is no real reason to, since this system also exposes those layers directly.

The application layer can be accessed in a number of ways. There are web services available that can be invoked directly. This requires that the other party is specifically set up to communicate with this product's proprietary web services. Direct web-service-to-web-service communication is thus only relevant when two Software Innovation systems are communicating. It is also possible to create objects that proxy remote objects directly, without using web services in the message-oriented way. This is generally not to be recommended, since it would again involve hard-coding at the other end, and be inflexible if changes are made; a general goal is to be resilient to changes at either end of an integration, in other words allowing for expansion.


It could be necessary to code against the legacy web services or directly against the objects if the goal is true tight integration (see section 4.2.2), which is the case if you are pursuing the data consistency approach (see section 3.3.2). Where tight integration is not the goal, e.g. if you want to exchange business documents (whole cases, individual documents, etc.), the strategy of tight integration is not beneficial, since the overhead of transaction costs will simply be too high. In this case we need to employ a standardized format, and Software Innovation points to the DokForm standard. What this means is that we abandon the point-to-point concept of using the system's own web services and instead use a standardized document format. This can be described as a logical star topology (Figure 11), where the individual ECM systems do not communicate directly, but via a broker. Software Innovation points to a BizTalk⁸ server, since the product runs on a strictly Microsoft platform.

Figure 11: Star topology (ECM systems communicating via a central BizTalk broker)

In practice, the events would follow these steps when an ECM system wants to send a message to another system from a different vendor (a code sketch follows at the end of this subsection):

1. The sending system uses its own message format and protocols to communicate with the broker, in this example a BizTalk server.
2. The broker translates the incoming message to a shared format, for example DokForm, depending on the specific content.
3. The broker routes the message to the receiving party's queue.
4. The message is delivered in DokForm format to the receiving party, or converted to the receiving party's own format if applicable. This depends on the receiving party's capability of reading the DokForm format natively.

⁸ See http://www.microsoft.com/biztalk/

Using the DokForm format means that we are able to route cases, and descriptions of cases, between ECM systems and other systems that are capable of reading the DokForm standard. We will return to a discussion of the use of business document routing as a strategy for systems integration in the next chapter. Moving on to data layer integration: if you want to access the data layer directly, this is also supported via standard ODBC adapters.

To sum up, it can be said that the system uses quite robust standards and allows for a good degree of flexibility. It must be stressed that none of this has actually been demonstrated, as very little practical experience with ECM system integration exists. The problem that the ECM vendors state again and again is that the interfaces they must develop their systems to use are not defined yet; they do not have sufficient information to actually do the integrations. The way the vendors deal with this is simply to say, "We can do anything you want", but this answer is not very operational, since we all know that everything is possible if your pocket is deep enough. The ECM systems are, generally speaking, recently developed and employ modern models, which allow the goals of the integration efforts to be met with relatively little effort. In the light of the ECM systems' role as the central repository for all decision-influencing information, an important point is their ability to accept inward data, i.e. to receive data from other applications. When viewing the whole network of information sources, I do not consider the ECM systems to be the bottleneck on a technical level.
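To illustrate the four routing steps, here is a minimal Python sketch with an in-memory queue standing in for the BizTalk server. The element names in the shared format are invented; the actual DokForm schema is not reproduced here:

import xml.etree.ElementTree as ET

def to_canonical(vendor_message):
    """Step 2: translate an incoming vendor message to a shared XML format."""
    root = ET.Element("Case")                        # hypothetical DokForm-like element
    ET.SubElement(root, "Title").text = vendor_message["title"]
    ET.SubElement(root, "CaseId").text = vendor_message["id"]
    return ET.tostring(root, encoding="unicode")

queues = {}                                          # one queue per receiving party

def broker_route(vendor_message, recipient):
    # Step 1 is the sending system calling this function with its own format.
    canonical = to_canonical(vendor_message)         # step 2: translate
    queues.setdefault(recipient, []).append(canonical)  # step 3: route to queue
    # Step 4 is the recipient draining its queue, converting the message
    # further if it cannot read the shared format natively.

broker_route({"title": "Building permit", "id": "2004-117"}, "ECM-B")

The design point is that neither ECM system needs to know the other's internal format; each only needs to speak to the broker.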

4.4. KMD systems


Describing the KMD system portfolio in an adequate way is not easily done if one includes the systems' internal workings and architecture. First of all, the sheer volume of different systems does not allow for a simple overview. With 2,500 employees and a history of thirty years, KMD has managed to produce a great many systems. Many of those were conceived at a time when strict division of labour between functional areas was the norm, and there has thus not been any real focus on cross-system views. Many of the systems in the KMD portfolio were developed in the late seventies or early eighties, so the existing systems naturally do not follow the best-practice architecture of today. The organizational structure at KMD still reflects this, with departmental silos (bluntly put) dividing the organization according to field of expertise. The KMD systems seem to have undergone an almost organic development during the past 25 years, without stringent modelling of data and metadata. Recently, KMD initiated a metadata model project, mainly to facilitate code reuse for efficiency reasons.


Seen from the viewpoint of the municipalities, there are over 200 different systems from KMD that have very well-defined (narrow) functions, but are not thought of in the context of each other. In other words, the systems are not integrated in any way. KMD is moving towards a more modern architecture by putting web services in front of the older platforms. The older platforms, by the way, are not obsolete. They offer a level of security and stability not matched by newer systems, so it is not worthless antiques that are being used. Security in general, and stability in particular, are essential qualities of systems running the public administration, but this is not the fastest way to a SOA, which in general does not offer the same level of resilience.

KMD has begun to think across all these different systems. Some time ago KMD developed Sagshenvisnings- og Advissystemet (S&A), which in essence allowed the case worker to be notified of events from all the underlying functional systems. This means that S&A has some integration functionality across existing functional products from KMD. KMD is developing S&A into KMD Sag, which is supposed to be its bid for the ECM market. KMD clearly has the advantage of unhindered access to its own application and data layers, but it is lagging behind the newer, more nimble ECM vendors. As an example, the ECM solution offered by KMD Sag today is only able to handle two document types, Word documents and scanned documents, so if you want to file an incoming e-mail, you have to print it out, send it to the scanning department, and have it attached to the case! This is hardly functionality anybody takes seriously. On the other hand, case workers tend to be conservative and may not be the people most open to change. "Stick to what you know" is the message from KMD, and it is not going entirely unheard.

Seen from the perspective of the municipalities, taking a snapshot of the KMD systems today is really quite simple. Right now we just want to know what the municipalities have in their domain of control. The only exposure the municipality has to the systems is text-based terminal data travelling over a secure network. The municipalities use a terminal client to access the systems, and all processing therefore happens at KMD. There is no decentralized processing outside KMD, and therefore no access to either application or database.

Just for the record: as far as I have been able to find out, KMD primarily uses IBM MVS mainframe systems as its back end, running DB2 databases. The bulk of the application logic runs on the mainframe and is written in COBOL code generated by Cool:Gen. KMD also has a number of systems written in PL/1 (an old IBM-developed language), and for back-end integration it uses a message-oriented concept based on IBM's MQSeries. On top of this sit lighter .NET application servers publishing application logic as SOAP-based web services. The application logic published as web services is quite fine-grained and lends itself to interface-type integration methods. The presentation is usually kept in a 3270 terminal-type interface, but there have been indications of a coming transition to a graphical, web-based interface.


Using the web services requires deep knowledge of the system and a fair amount of insight into the legal background; the applications are almost always implementations of legal requirements imposed on the municipalities, and using them would be difficult or impossible without some knowledge of these requirements. Towards the lighter front-end application servers, KMD normally does not use standardized integrations, but develops the integrations itself. Today only KMD's own applications use the web services provided, and in KMD's perspective this is a trusted domain. If we were to envision an opening up from KMD, it would require building an infrastructure that can authenticate other parties and control access to the web services. But right now, all this is of little importance to the municipalities, unless KMD decides to change its strategy of not allowing access to the application, or opens up in some other way.

4.5. System portfolio


To understand the context this problem has to be seen in, we have to chart the current situation for a municipality with regard to existing systems, platforms, languages, interfaces, protocols, etc. This is an extremely difficult task, as no municipality I have been able to identify actually has this knowledge. Decisions on what systems are needed, and decisions on purchasing a system, are left to the departments paying for them. The process of buying a system for a specific purpose in a municipality only involves IT staff when an agreement is made on how to host and maintain the application. In other words, the IT department is chiefly involved in very operational considerations. The absence of an enterprise-wide plan that each system must be qualified against leads to the disorder that seems to be the norm.

To get an impression of the world the municipalities inhabit, in terms of IT systems, I will present a typical municipality's application portfolio (Hougaard 2004). This will give us an impression of the context in which the issues of integration have to be seen. The following data includes very different systems, from office automation applications, like the Office suite, to infrastructure systems such as Active Directory and Citrix Metaframe. These systems do not serve a specific task, but are general in nature. This does not make them less interesting, but it does generally mean that integration issues have been thought of earlier by the producers or vendors, since the systems are widely used across many industries. Most of the applications, though, do not fall into that category. Most are quite narrow systems that perform a specialized task and are often developed for that task alone.


Figure 12: Systems grouped by vendor

Each of the small slices of the pie in Figure 12 represents one of the 159 identified systems across Vejle Municipality. This is a work in progress, so more systems are likely to be added in the future. What immediately becomes obvious are the two large blocks and the many very small blocks. The biggest block, representing 40% (64 systems), is KMD. The second largest block consists of identified systems without a positive identification of the vendor, though they are definitely not all from the same vendor; the rest are identified individual vendors. In summary, in terms of the number of systems, KMD represents 40%, and the remaining 60% are very fragmented, with no single vendor having more than a small percentage. No studies have examined the volume of transactions or the working time spent in each application. There is little doubt, however, that the KMD applications account for the bulk of the work done, and probably make up more than 90% when calculated by volume. This quote summarizes the situation beautifully:
Silo applications were paid for and built on behalf of departments in the organization and were tuned to their requirements. Typically, those requirements were to computerize paper-based systems designed to make that single departments life easier. […] new technology by itself does not solve the major underlying problem, which is managing complexity. The major source of complexity is that most organizations simply have too many standalone applications, each with its own presentation layer, business processing logic, and database. (Britton 2001:8)


So what we are seeing is that the municipalities are apparently drowning in the complexity of a huge number of very different systems, spread out in every department, without central planning. The typical scenario in a municipality is this: when a department wants to purchase a specific system that fits a specific need for that department, it first approaches the vendor, and then goes through the normal purchasing process, which varies according to the size and cost of the system. Naturally, if we are talking millions, there will have to be a more formal process, but it is still initiated decentrally. When the system is purchased, or when the purchase is being completed, the internal IT department is contacted for an agreement on the hosting and servicing of the application. The IT department then takes over the day-to-day management of the application. As we can see, the IT department is basically doing what everybody asks, which in a sense is good, but it reduces the IT department to strictly operational tasks, basically a data centre with staff for servicing the applications.

A separate precondition for integration projects really to take off is to ensure that strategic IT gets the attention it needs at management level. This is a hard nut to crack. Even though state-level policy is trying to dictate that more attention be paid to IT in the municipalities, IT still seems to be the least sexy field, not least in the municipalities.

Trying to undertake the task of integrating all of these systems is a daunting prospect. As several municipalities say, it is quite intimidating to begin to tackle this issue, and any attempt is easily discouraged by the overwhelming complexity of the task. This is not to say that we would want to integrate all systems, but apart from a handful of systems that clearly either do or do not have a reason for being integrated, there is a grey zone of systems that are uncertain. The first task in application integration is to consider which applications need to be integrated and why. To answer those questions you need to know what applications you have in the first place. This is clearly where the municipalities should start if they want to put their integration efforts into a wider perspective. We can conclude that integrating the ECM system with the KMD systems is the main issue, but there are definitely other related applications which should be taken into consideration. In many cases the volume of transactions an application processes will not make application integration worthwhile. The most efficient way to go might be plain manual integration, where employees simply enter the information twice.

4.6. Problems and solutions


Until now, we have spent a good deal of space setting the scene. We know a little integration terminology, we know in broad terms what we want to achieve by integrating the two systems, and we know where we are today. What we will do now is look a little closer at how the present situation restricts us from going where we want to go. We would like to know what is stopping us from reaching our integration goals.


Naturally we must begin with how the current situation confines us, and from there move on to possible futures. The two big groups of constraints that specifically concern this particular problem are the physical internetworking problems and the more semantic problems. Apart from those, there are important challenges concerning non-IT issues, which are also critical. We will first look at the physical problems in relation to our goals, and then at the semantic problems.

4.7. Physical
Let us focus on how the current situation prevents us from reaching our goals, in terms of solutions to the physical problems. Following the earlier discussion of the role of the ECM system, the high-level situation is shown in Figure 13. The user can access both the ECM system and the individual functional systems from KMD (in Danish: fagsystemer). Remember that the product KMD S&A, which is being transformed into KMD Sag, is aware of the individual functional systems. Due to the existing infrastructure between the functional systems and KMD Sag, the most obvious entry point into the KMD systems would be KMD Sag. Utilizing the business logic in KMD Sag should provide us with the most convenient access to the functional systems.

Figure 13: Basic integration scenario

The vision of the ECM system as the central hub for communication and coordination of business processes requires some sort of interaction between the two. The most significant barrier to achieving this in the ideal way is the very limited access to KMD's application and data layers. The only systems that have access to the KMD application layer are KMD's own presentation layers. As noted earlier, the only existing entry point to the KMD systems from the outside is via the presentation layer, so unfortunately a presentation integration model is the only approach immediately available to the municipalities. One possible route around this problem is illustrated in Figure 14 below. The work-around would make use of a broker that simulates an actual user doing work on the KMD system. The incoming 3270 streams would be screen-scraped, which means parsed into a message format, probably some kind of XML, or presented to the ECM application layer as either a procedural or an object-oriented interface, depending on the system.


The incoming stream is not structured, which is a problem, but presentation layer data almost never is. The upside is that 3270 parsing is relatively easy, and the ratio between actual data and overhead, like text formatting, is good. When screen-scraping a web interface, there is a lot of HTML and other noise obscuring the data. Similar techniques are used elsewhere in the public sector (the Virk.dk project) with some success. A minimal sketch of the screen-scraping step is given below.
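As a minimal sketch, assuming fixed field positions on a given screen, the broker's parsing step could look like the following Python fragment. The field map is hypothetical; a real broker would need a configuration for each KMD screen:

import xml.etree.ElementTree as ET

# (row, column, length) of each field on one hypothetical 80-column screen.
FIELD_MAP = {"cpr": (2, 10, 11), "name": (3, 10, 30), "benefit": (5, 10, 12)}

def scrape_screen(screen_rows):
    """Extract known fields from one terminal screen and emit XML."""
    root = ET.Element("KmdScreen")
    for field, (row, col, length) in FIELD_MAP.items():
        # 3270 screens are fixed-position, so a simple slice per field suffices.
        value = screen_rows[row][col:col + length].strip()
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")

The simplicity of the slicing is exactly why 3270 parsing is relatively easy, and the hard-coded positions are exactly why the approach is brittle if KMD alters the screen format.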
Figure 14: Integration model possibility (a broker between the ECM/FESD system and KMD)

The result would be a two-way integration allowing for everything the user is able to do directly in the KMD applications. In the ECM system you could choose to present the KMD system as web forms, essentially putting a new presentation layer on KMD's systems, if the goal was consistency in form and layout and you wanted to avoid switching between many applications. I would not recommend this tactic as the long-term fundamental strategy, since it would be very maintenance-heavy. Also, if KMD decided it would like to tease the other party, it could easily do so by slightly altering the screen format. So there ought to be some kind of consensus between KMD and the municipalities about the strategy, even though this particular strategy of using a presentation integration model would not require much cooperation from KMD, merely that it does not actively try to obstruct the efforts.


This is the most viable solution I can envisage, with things as they are with regard to KMD's openness. On a physical level this would allow almost all the functional goals to be realized, including one-way pull of information and two-way synchronization of information. Obviously this would also mean minimizing manual data entry, as this could be done programmatically. It would not be a good solution for data warehousing and business intelligence work, because of the limited access to the KMD database, but one might argue that most of the interesting metrics (whether the efficiency of case work on average meets legal requirements, for example) would be documented in the ECM system, so that analysis could be done there. In summary, the following tables show the connection between this scenario and our goals for the integration, and the overall pros and cons.

Goals                             Level of compliance
Adaptability                      Low
Lower coordination cost           Medium / Uncertain
Service oriented architecture     Medium / Potential
Shorten business processes        High
Documentation                     High
Minimize manual data entry        High
Two-way integration               Low-Medium
Greater transparency              Low
Table 4: Level of compliance (physical)

Pros:
- Fast: A relatively simple method that does not require detailed knowledge of the inner workings of the KMD applications.
- Cheap: Does not require major recoding of either application.
- Secure: Leverages existing security models, and is as secure as they are.
- Independent: Can be done largely without the involvement of KMD.

Cons:
- Brittle: If the incoming 3270 stream changes, reconfiguration will be needed at the broker.
- Limited: Business process logic would have to follow the available user interface. Functionality not provided by the UI would not be possible, even though the underlying functionality allows it.
- Limited: The integration would be limited to text.
- Slow: Performance deteriorates when going via the presentation layer, instead of going straight to the application.

Table 5: Integration model pros and cons

Later we will look at the scenario of KMD opening up its web services, and how to tackle that situation. For now, we are looking at strategies that reflect the current situation. Let us now proceed to the equally interesting and challenging issue of semantics.


4.8. Semantics
As you may recall from section 3.6 on ontologies, integration issues are not solved by physical integration alone. When you integrate systems, you are trying to integrate two worlds which have been conceptualized by different people, probably for different reasons. This is certainly the case in our situation. The overarching difficulty is that the ECM systems and the KMD systems have been made to solve different problems. ECM systems are made to deal with relatively short decision-making processes, where a case is initiated, normally in response to a request from the public or from another public sector office, is then processed, and is concluded with an assessment or ruling. Intuitively one might think that this type of work is common to all public sector organizations, but actually it most closely resembles the work done at state level in ministries, and at executive level in municipalities. Case work done on the floor in municipalities normally leads to a decision, but the decision has consequences (or actions) that have to be executed even after the decision has been made. For example, if the municipality receives an application for housing benefit (boligstøtte), the initial case work requires an assessment of whether the applicant is eligible to receive this benefit, and, if so, the case will require that monthly payments are made to the applicant, so the case continues to generate activity. We will look into this and related issues in the following, and try to give indications of possible routes forward.
4.8.1. Primary key

The primary id in the KMD systems is based on social security numbers (CPR). In the ECM systems, the primary id is the case or document itself. This means that from the start we have two different outlooks. The reason for this difference has to do with the different roles of the ECM system and the KMD systems: the KMD systems are mainly used for administrating people and transactions bound to people, and for archiving documents, while the ESDH systems are focused on processes. This creates a fundamental gap between the concepts they use, a gap which trickles down to the underlying data models and thus presents problems when trying to integrate the different systems.

In the KMD systems, there is one record per person per functional type of system (e.g. KMD Boligstøtte and KMD Dagpenge each have one record per person). This record allows the case worker to see all activity for a given person of a given type; e.g. the Dagpenge system will allow the case worker to see all Dagpenge activity for a person, but not much else. One person can have many related transactions and many related documents. In the ECM systems, there is one record for every event. The primary object is the case, and the case can have multiple people related to it. People can naturally be related to many cases. You can of course search across cases to find all cases for a certain person, but the cases will still be individual cases. This difference stems from the concept of a case, which is fundamentally different in the two worlds. A small sketch of the two keying schemes, and the bridge between them, is given below.
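A minimal sketch of the two keying schemes, with invented record contents, might look like this in Python. The point is that the only sturdy bridge between the person-keyed KMD records and the case-keyed ECM records is the CPR number:

# One KMD record per (functional system, person); person is the key.
kmd_records = {
    ("Boligstøtte", "010101-1111"): {"transactions": [740, 740, 755]},
    ("Dagpenge",    "010101-1111"): {"transactions": [2210]},
}

# One ECM record per case; people are merely related, not the key.
ecm_cases = {
    "2004-117": {"title": "Housing benefit application",
                 "related_persons": ["010101-1111"]},
}

def kmd_records_for_case(case_id, system):
    """Bridge the two models through the person: the only sturdy semantic overlap."""
    case = ecm_cases[case_id]
    return [kmd_records[(system, cpr)]
            for cpr in case["related_persons"]
            if (system, cpr) in kmd_records]

Calling kmd_records_for_case("2004-117", "Boligstøtte") returns the person's housing benefit record, found without any shared case concept between the two worlds.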


4.8.2. Case concept

Naturally the concept of a case is central to the possibilities of consolidating the two systems into one, or even just letting them co-exist. There are three basic concepts of cases in the context of the FESD-defined ECM systems:

- Event based cases, which are organized around just that: an event. A case is started, typically in response to a query from the public or another public office. The case is processed and the workflow surrounding it is executed. The case typically closes with a response to the query. A case becomes a decision.
- Dossier cases, which do not have a clear ending date, as event based cases do. Dossier cases can be opened and closed for a certain object, e.g. a person, and a certain type. When a decision has been made, the case changes state to idle, but is not closed, because for example transactions that are a consequence of the decision will also be bound to the case, even if the actual work on the case has been concluded.
- Folder cases, which can contain other event based cases or dossier cases.

For example, a citizen requests a specific type of benefit, gets it for a while, then has an interval where the benefit is not received, and then gets it again. This situation will be considered multiple cases under the event based concept, while under the dossier concept it would be regarded as one case with the states open/closed or active/passive.

The ECM systems that adhere to the FESD standard do not have to implement one or the other of the first two concepts, but there is an established convention of leaning towards the first, the event based concept. This is partly because the event based concept lends itself more easily to analysis of business metrics, e.g. making sure that the organization is giving citizens an answer to a query within the time frame set by law. Using dossier cases, you would have to process the case object further to obtain the necessary metrics. This is an issue because the KMD systems lean towards the dossier type of case. The KMD systems have a narrower scope in this context, as their main function is the administration of citizens' cases, whereas the ECM systems must not be confined to the thinking in the municipalities, and have roots in general business content management. This, Jacob Kanto says, is one of the most important problems when thinking about integration and interoperability between the systems [Jacob Kanto 58:30]. The problem of differing concepts of a case is illustrated in Figure 15.


Figure 15: Case concepts

A dossier case will not be flexible enough to handle access rights to different parts of the case. The nature of a dossier case is that it is one case, so if you have access to it, you have access to the whole thing. This is not so much of a problem in the stove-pipe systems of the KMD world, because the systems are already functionally separated: if you are allowed to view a system, you are also allowed to see all the objects in that system. In a sense, a dossier case is a non-case, since it does not hold multiple cases for one person, but just one. An integration of the case concepts will require some kind of mapping between the two. In deciding between aggregating several records (seen from the ECM perspective) and splitting up one record (seen from the KMD perspective), the former is probably the more viable. The problems of primary keys and case concepts are the biggest issues jeopardizing a potentially successful integration.
4.8.3. Solutions

As part of the work of building a complete metadata model for all systems, we obviously have to resolve the issue of conflicting concepts. A successful physical integration will not be of much use if the semantics are not resolved. If we remember our account of ontologies, one point made was that integration is not interesting if there is no overlap in the entities the different systems handle. In this case there is such a link, and that is, of course, people. People, or person objects, are the semantic overlap that seems most sturdy and straightforward. I will therefore base all my recommendations on this entity as the primary connector. I will not try to resolve the issue of case concepts directly since, as I have noted, the case concept in the KMD systems is very different, and perhaps not really a case at all. The cases problem comes under the category of same words for different things, and the two should continue to be thought of as different things.


Let us explore what that means. In the user interface, jumps must be made at person level. It will not, and should not, be possible to jump directly from an ECM case to an object in the KMD systems. An ECM case has people objects related to it, and these will provide the exit/entry point between the two systems. The semantic integration is naturally limited by the underlying physical layer; we cannot manipulate data from the KMD data layer if we do not have access to it. The semantic integration must take into account that it rests on a presentation layer physical integration on the KMD side of the broker. The physical presentation layer integration will allow the broker to simulate a user and present the information retrieved, as that simulated user, in the ECM system. We have to consider what types of information we want to transfer over the physical link and simultaneously consider the semantic integration. Let me give an example: if we decide that the semantic model will have a one-to-many relation that returns huge datasets, and we have to access this data via the presentation layer, it will not be easy. So the physical and semantic integration setups place mutual constraints on each other. Within the physical integration model based on the presentation layer, I would not recommend building an integration scenario that requires the broker to flick through many pages, simulating a user, and then return the results to the ECM system. Latency would be too high. So, taking into account the current situation, with presentation layer access being the only way in, the most realistic scenario is integration based on shortcuts (as described in section 3.7). Seen from a user's point of view, I see the scenario like this (a code sketch of the mechanism follows after Table 6):

1. Expand the ECM data model to include ForeignSystems (or something similar), where information about all the different KMD systems is stored. This table must be kept up to date at all times, but a real-time synch is not necessary, since KMD systems are rarely added or removed.

2. For every case in the ECM system, the case worker should manually set one or more foreign systems that are relevant for the specific case. These settings can be made by preference, so that for example a group of case workers has a certain foreign system (KMD system) set by default. This process cannot be automated, since the ECM system cannot automatically know what the case is about, unless there is some business process logic that can define this (e.g. all cases initiated by e-mail to address xxx should be flagged as related to system yyy).

3. When working on a case in the ECM system, we would like to access relevant data; all cases have people related to them, and people have CPR numbers. This information is sufficient for the broker to dynamically create a script that will enter the relevant system and find the correct entity. To the user it appears as if there is a connection; the user hits a button, and the correct person and relevant details are opened in another window.

The physical presentation layer integration model does allow us to transport data between the applications, but it also results in constraints that make this channel unviable for widespread use and high-volume data. Table 6 recaps some of the architectural and functional goals. As the table shows, this solution is primarily beneficial for the day-to-day case workers, who will see their work become smoother and potentially more efficient.

Goals                             Level of compliance
Adaptability                      Low
Lower coordination cost           Medium / Uncertain
Service oriented architecture     Low
Shorten business processes        High
Documentation                     Low
Minimize manual data entry        Low-Medium
Two-way integration               Low
Greater transparency              Low
Table 6: Level of compliance (semantic)
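A minimal sketch of the shortcut mechanism in step 3 above might look like the following Python fragment. The ForeignSystems table contents and the screen commands are invented placeholders, not actual KMD transaction codes:

# A hypothetical ForeignSystems table: how to enter each KMD system.
foreign_systems = {
    "Dagpenge": {"entry_command": "DAGP", "lookup_field": "CPR"},
}

def build_shortcut_script(system, cpr):
    """Return the keystroke sequence a simulated user would send to the terminal."""
    cfg = foreign_systems[system]
    return [cfg["entry_command"],             # open the functional system
            f"{cfg['lookup_field']}={cpr}",   # fill in the person key
            "ENTER"]                          # submit and land on the entity

# The user hits one button in the ECM case; the broker replays this script
# against the terminal session and opens the resulting screen in a new window.
print(build_shortcut_script("Dagpenge", "010101-1111"))

Note that the script is generated dynamically from just two pieces of information the ECM case already holds: the flagged foreign system and the related person's CPR number.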

From a user-oriented perspective, the shortcuts solution is not at all bad. The downside is, of course, well, pretty much everything else. But if we look at the list, these are predominantly things management is interested in. That is not to say that management is not interested in efficiency; it most definitely is. But some of the issues, like documentation, which mainly management cares about, are not strong points of this solution. In terms of a realistic implementation and change management strategy, introducing a shortcuts-based solution first would not be a bad way to go. It could condition users to think across applications, and prepare them for thinking of multiple applications as a whole (not explicitly in those terms, of course). As things stand with regard to the accessibility of KMD's applications, this scenario is probably the most realistic, all things considered. It expresses, unfortunately, the lowest common denominator, but it does give the case workers on the floor a powerful impression of integration. Among the most important conclusions from this section is the finding that the case concepts in the ECM systems from the FESD project and the case concepts from KMD are so different that it does not make sense, nor is there any reason, to try to understand them as one. They are two complementary ways of handling actions on people, and should be dealt with as such.

4.9. Scenario: Opening up


The other trend we might be seeing (and many may be hoping for) is an opening up at KMD. In this section I will very briefly consider how this changes the premises for integration, and what consequences it would have. But first of all we have to agree on what opening up actually means. As mentioned earlier, KMD has developed quite a few web services, which are used internally between application servers, and these web services might be the basis for a new strategy.

Firstly, the prospects for public consumption of the web services would be improved if the web services were aggregated towards a business document paradigm, with fewer chatty web services and larger, business-document-sized web services that more closely resemble the actual processes the users go through. This would make the integration a lot simpler on the receiving end, and move the scenario from being very unrealistic to slightly less unrealistic. Neither I nor KMD consider public consumption of the existing web services to be very useful in practical terms, so an opening up would involve substantial effort from KMD in reshaping the web services to be suited for external use. This is a major barrier and means that KMD cannot simply flip a switch and expose the internal application layer.

If we assume that the work of streamlining the web services for external use has been done, our scenario changes. The physical constraints are loosened, and we are left with the semantic problems. Let us recap: when working in the ECM system on a case, we would like to access relevant data; all cases have people related, and people have CPR-numbers. The query against the KMD systems should therefore look like this in plain language: "Select all transactions for CPR xxx in system yyy, between the ECM case-create-date and the current date." The returned dataset will include all transactions for a certain person and type within the case time frame, but might in theory also include transactions for other cases if there are multiple open cases for the same person and same type. In practice this should not be a big problem due to the nature of the KMD systems, and is at any rate an impracticality and not the basis for relational non-integrity or other nasty things. A query from the ECM system using a combination of the identification of the person with the type of case will give a list of transactions. Combining the individual activities of the case in KMD Sag with timing information should result in a unique mapping between activities in KMD Sag and the specific case in the ECM system. In other words it should be possible to combine existing information to bridge the two concepts. This is shown in Figure 16.

Figure 16: Data set filtration
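
As a concrete illustration, the plain-language query above might translate into a request like the following, assuming KMD exposed a business-document-style web service as sketched earlier; the operation and element names are purely hypothetical:

    <!-- Hypothetical request against an opened-up KMD system -->
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <GetTransactions xmlns="urn:example:kmd">
          <CPR>xxx</CPR>            <!-- identification of the person -->
          <System>yyy</System>      <!-- which KMD system to query -->
          <FromDate>[case-create-date]</FromDate>
          <ToDate>[current-date]</ToDate>
        </GetTransactions>
      </soap:Body>
    </soap:Envelope>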

This is naturally not as simple as it might look; it is only the core of the problem. But if completed successfully, it would tackle almost all of the problems and would deliver on many of the goals. All this is purely speculation, and a concrete recommendation relies heavily on how KMD chooses to open up. It does not make sense to speculate on the details of this, as there is no evidence that anybody within KMD has done serious, focused work on this, although they have started initiatives. It is clear that this is a serious issue for KMD. Speaking of KMD's willingness to open their systems for possible integration, Jakob Kanto says:
My experience with KMD testifies that management communicates: "We want to do this, we can do this, we have to do this." But as soon as you speak to middle management or the technical staff they say: "We can't do this, we don't want to do this and we won't do this." [Jakob Kanto 14:20]

I am quoting Mr. Kanto out of context here, and the quote does exaggerate his sentiments slightly, but it is nevertheless illustrative of the reputation KMD is getting in the public domain. I have found that while KMD naturally does have strategic interests at stake here, there are considerable technical issues that have to be resolved before an opening up can commence. KMD has expressed willingness to start this work, but there is naturally the question of funding; nobody seems to want to pay for it, and KMD is pursuing the prudent tactic of waiting for someone to put their money where their mouth is, thus forcing the market to define precisely what is needed. As mentioned earlier, the technical difficulties are one aspect of the problem along with the strategic. The positive aspect of the business and political obstructions is that they can change relatively easily if the strategic considerations at KMD change. KMD might simply decide that pursuing short-term goals of trying to stall integration projects does not support the long-term strategic sustainability of the firm, and might even give KMD a considerable strategic disadvantage and let other players enter a market it previously dominated. This, of course, is purely speculation on my part! But if KMD were to change the direction of its policy, it could relatively quickly change the context of the problem, and stagnation in certain efforts could be turned into proactive engagement. We used the case as an illustration of Danish IT architecture standards and recommendations, in an attempt to describe the current situation.

4.10. Summary
Owing to limited space, this is a somewhat superficial but realistic example of a genuine integration. We have looked at the core goals for integrating the ECM system with systems from KMD. We have looked at the actors and examined firstly an ECM system, and secondly the KMD systems as seen from the municipalities. We then looked at the most important problems in reaching our goals, and mapped out one approach to overcoming the constraints of the physical integration and the single largest semantic issue: the concept of what a case is.

In conclusion, there are definitely options open to the municipalities. In the short term they would have to adopt a presentation integration model. The semantic issues regrettably constrain even the presentation integration model, leading to a solution using shortcuts. The shortcuts-based solution would primarily address the needs of the everyday case worker, and not so much the goals of management. An opening up by KMD is the only long-term viable solution, but is not easily done. It will require substantial (and expensive) reworking at KMD before a sound solution can be arrived at.


5. ASSESSING THE STATUS QUO


After spending considerable time surveying the actual situation, we now turn to evaluating it. As we remember, the basic question was how Danish IT architecture recommendations and standards ensured integration and interoperability.

Figure 17: Problem, again

First let us sum up what should ideally be the chain of influencing factors. As can be seen in Figure 18, the basis for ensuring interoperability is a sound IT architectural framework. This base should provide a solid foundation that concrete projects can build on. The most high-profile project currently active, the FESD project, should therefore implement the concepts given in the recommendations and standards; this is the arrow in the figure between the first and second tiers from the bottom. Next, the FESD standard should constitute a solid base for developing an actual implementation of the FESD standard; this is illustrated by the arrow between the second and third tiers. The last step is, of course, that an ECM system backed by a solid standard should result in a system capable of interoperability.

Figure 18: IT architecture to product


All the individual pieces, including the arrows, have to fulfil their roles for the process as a whole to be successful. We will now scrutinize each of them and conclude on their robustness. Until now we have 1) glimpsed Danish IT architecture work, 2) discovered some dimensions of systems integration, and 3) examined an actual problem involving an ECM product from the FESD project. It would be interesting to see what link there is between no. 1 and no. 2; I would expect to be guided when choosing between different integration methods, for instance, or on how to handle semantic clashes. I would have expected to find content like that of chapter 3 in the public recommendations. The reason for the apparent missing link between no. 1 and no. 2 is that in fact there is no link. We will expand on this in the following section on the e-GIF and OIO XML before we move on to examine other relations from Figure 18.

5.1. e-GIF and OIO XML


In general, the situation seems to be that multiple competing standards are authorized for use. It is ironic to see how many standards overlap in what they seek to standardize; we often have several different standards covering the same functionality. The reason for overlapping standards in the first place is normally that different vendors have existing competencies, have already built a technology framework, and are lobbying standards organizations to use their specific approach. But when it comes to recommendations, choices must be made between existing standards. One very significant example can be seen in the category called inter-process communication, where the following standards are listed:

Standard/protocol                Status
Remote Procedure Call Protocol   Leave
Java Remote Method Invocation    De Facto
CORBA/IIOP/IDL                   Certified
Component Object Model           De Facto
Web services                     Certified
Table 7: Conflicting inter-process communication standards

The table consists basically of the main alternative ways of calling remote code, or remote procedure calling (in the generic sense, not to be confused with the Remote Procedure Call Protocol, which is a specific instance of the general concept). This is just one example, and I could have listed more. Naturally this reflects the situation that there are many ways of doing things, and that systems already in place use different standards. But an inevitable implication has to be that choices do not seem to have been made, and thus the Danish e-GIF does not have the robustness to guide actual projects. Most of the standards listed are de facto standards in the real world, and while it is nice that de facto standards get formal authorization, what is needed is guidance in the non-obvious choices. Due to the lack of choice between overlapping or conflicting standards, the e-GIF loses its ability to be an authority.

One might argue that what I described in the chapter on integration is not standards, but techniques or approaches, and therefore should not be in the e-GIF, and one might be right (depending on the specifics). But the point remains: I have not been able to find material covering anything like the content of chapter 3 in any public document on Danish IT architecture, in the e-GIF or elsewhere. In other words, we are not given specific guidance on which way we are supposed to move. True, for some types of protocols it does not make sense to dictate a specific standard; for example, it would not make sense to make a definitive decision between J2EE and .NET, because the two can coexist relatively easily, and they both have specific strengths and weaknesses. But when it comes to the backbone of interconnectivity, which is supposed to be the basic plumbing in public sector infrastructure, it does not make sense to authorize all standards, effectively not saying much. Another example is ebXML vs. UBL; both are certified for use, although UBL is recommended and ebXML is just certified (you might be interested in page 90 (appendix), which briefly considers the issue of conflicting standards specifically in the web service space). In short, we cannot look to this resource for guidance on how to design our integration strategy; all technologies we might have considered seem to be included, so our toolbox has not become better defined.

Now, although having multiple standards for the same functionality defeats the purpose in the first place, this is to be expected to some degree. What is more worrying is when we have multiple standards that overlap and are mutually exclusive in the logic they use. This means that we have a technology which is supposed to interconnect different systems and allow for easy integration and collaboration, but what happens is that we move the incompatibility up one level. An example is web services. The e-GIF includes the basic web services trio (SOAP/WSDL/UDDI). Web services is often thought of as the integration technology that will solve all problems. But there are no recommendations on how to use the standard, one simple question being how fine-grained the web services should be. As illustrated in Figure 19, the basic problem is systems without interoperability. The web services solution is to construct a framework on top of the legacy systems, and let them integrate through that, using web services. The current situation is that some specific, but important, cases (e.g. UBL vs. ebXML) have in fact ended up with several incompatible paradigms (sometimes formally standardized, sometimes not). Although they are based on the same very generic technology, e.g. SOAP, the more detailed logic that they follow is different. This naturally undermines the whole purpose of standardization, and leads to the same problems we had to begin with: expensive and inconvenient cross-systems integration.


[Figure 19 contains three panels. Problem: System A and System B built on incompatible technologies A and B. Ideal solution: both systems wrapped in a shared meta framework following a single logic. Current situation (in some areas): System A and System B wrapped in meta framework A and meta framework B, following conflicting logics A and B.]

Figure 19: Conflicting logics

As to the other big architectural initiative, OIO XML, this does not help us much either in terms of integration. It is absolutely clear that OIO XML is a huge step, make no mistake. But what the OIO XML work does is essentially to define data models, helping us see what information is in system A and system B. This is expressed using XML, due to XML's positive traits as a message format able to carry both content and structure. But it does not tell us how to get the information from system A to system B. OIO XML is often touted as an interface (in Danish: grænseflade). Using our newly acquired vocabulary, we could say that OIO XML helps us to map the semantics of individual systems, thus defining ontologies. But it does not instruct us on how to connect two systems physically, and certainly does not say anything about what to do when we have two conflicting ontologies.

There is an emphasis on the converge strategy at the semantic or data level, which means that in situations where we cannot edit the data model, we lack advice. This is the case in many situations. Whenever you want a system to communicate with an external system outside your domain of control (for example in the case of KMD), you face the issue of resolving semantic differences. The same is true when you are faced with systems which would be very expensive, and therefore not realistic, to alter. This can be resolved in many ways, and without guidance one might fear it being done inconsistently. Before we move on to evaluating the FESD standard, let us conclude that the extent of the IT architectural groundwork, in terms of ensuring interoperability, is sparse, and extends to loose formulations of adherence to SOA using SOAP-based web services.

5.2. FESD interoperability guidance


There is no uncertainty about what is to be achieved. In the everyday terms of senior decision makers, we want our systems to be able to talk together. It is interesting to see how the FESD project solves the interoperability issue with such limited guidance. When reading the specification material from the FESD project, it is clear that the language is almost identical to that of the White Paper. As we will see, this has not been a solid enough foundation to achieve standardized ways of integrating with ECM systems from the FESD project. Let us explore that issue a little more.

As stated earlier, the FESD standard is in essence two things. The first is a data model, complete with classes, relations, etc. The second is a list of functionality described in natural language, not unlike use cases. In the FESD project, the data model is considered to define core requirements only. In other words, vendors are allowed and encouraged to extend the data model to include information not specified in the standard, and to allow for functionality not specified in the standard. This extends the vendors' possibilities for differentiation but simultaneously extends the possibilities for the systems to become incompatible in terms of the concepts they employ. Two systems must use the same core elements, but extending the core elements will most likely make them less compatible. As illustrated in Figure 20, the likely outcome is systems that do adhere to the core specifications in the FESD standard but extend the standard. Some of the extensions will become de facto standard as the systems use certain conventions not defined in the actual standard, and some extensions will become part of the formal standard as the standard evolves. Most of the extensions are likely to remain proprietary.


Figure 20: Formal, de facto and proprietary

The situation we see in Figure 20 will occur in several dimensions; we could, for example, draw a separate illustration covering every one of the following:

- Platforms, such as the choice between:
  o Databases (e.g. MSSQL, Oracle, DB2). Here the bubble of the FESD standard would be non-existent, since the standard does not dictate this.
  o Programming languages/frameworks: J2EE, .NET or the older languages. Again, the FESD standard does not define this.
  o OSs (Microsoft based, Unix based), application servers, etc.

- Protocols, including the whole web services portfolio of protocols such as SOAP, UDDI, WSDL (see Appendix 2 on page 81 for a more complete account of web services). The protocol question includes which protocols the ECM system is able to expose externally towards other systems, but also which protocols the application uses to communicate internally between its different modules. This influences the versatility of the application and how easily (read: cheaply) the application can be transformed and adapted. Basically it impacts how high maintenance costs will climb. Again, there is very little in the FESD standard about this, apart from general references to the use of web services and XML, which are vague descriptions at best.

- Functions, methods, procedures (which term you use depends on which platform you are on) are not defined in the standard and would therefore be expected to be quite proprietary. Functionality is defined in the FESD standard in the sense that the certified systems must be able to do certain things as defined in the spec. But this functionality is termed quite broadly, leaving plenty of room for interpretation. In addition, the application logic is not specified at all, so there is no consistent way of describing functions and defining what application logic must be present and how it is to be accessed. As we know, the same functionality can be implemented in an almost infinite number of ways; it would not be unrealistic to expect that the illustration of overlaps between vendors at the application logic layer will resemble Figure 21.

- Integration and communication models, as described earlier in this paper, or other ways of depicting how an ECM system within the FESD standard could interface with other systems, would be an important piece in the FESD standard and in the integration puzzle. If we are very lucky we will see a scenario as in Figure 22, where the vendors, to some extent, choose to develop their integration models with some common denominators. Apart from very core integrations (to the CPR database for example), functioning integrations have not been demonstrated, and the FESD standard lacks information on how integration should be done, simply saying that it should be possible. Again it is up to the individual vendor to describe how to fulfil this need.


Figure 21: Application level standardization

It should in all fairness be stressed that the FESD project has placed a clear emphasis on the goal of making a system available to public sector organizations, a system which is guaranteed to have the functionality they require without going through the laborious task of writing contracts and inviting vendors to bid for them. I do not concern myself with this dimension, and the FESD project might be very successful in that sense. Nevertheless, proprietary solutions are a shortcoming which will complicate integration considerably.

Figure 22: Integration model standardization


In the application integration context we are now in, it poses considerable problems that many aspects of a complete ECM system are not specified in the standard. From a technical point of view, essential pieces are missing which would ease the integration efforts. This means that the vendors are free to compete on the flexibility of the technological implementation of their solution. One vendor might implement functionality with a completely different underlying architecture than the next. The other aspect of this is therefore that the buyers are not guaranteed certain IT-architectural standards, apart from the use of XML and web services, and therefore need considerable technical insight to make an informed decision. One might say that this was part of the problem that the FESD project wanted to solve, but that would be oversimplifying it a bit. Nonetheless, FESD-certified ECM systems do not necessarily use the same technological standards to implement the functionality, and this could be a cause for concern when looking ahead to the need for inter-organizational integrations. I am not promoting one-size-fits-all thinking; popularly expressed, standards are just as much about using a consistent language to define our differences. Notice how the questions we discussed in chapter 3 are not supported by standards or recommendations.

In conclusion, the FESD project is understandably not connected to central IT strategy in a significant way; the first arrow in Figure 18 on page 65 is weak. We turn now to the link between the FESD standard and the products coming out of the FESD project, the actual ECM systems.

5.3. CASE: ECM system


We have examined an ECM system sufficiently to draw conclusions in chapter 4 and will not repeat those here. In our brief review of an example of an ECM system, we found that the application architecture seemed robust, and allowed for integration using different integration models and methods. Although not emphasized in the case in chapter 4, I have looked at the technical implementations of several of the vendors, and although they have some overlaps, they are quite different. The technical descriptions are not readily available because they are considered sensitive material. But this just accentuates the fact that we are not working towards standardization on all levels. The extent of descriptions in the vendors' specifications is usually limited to a choice of platform, such as "we will use COM-based communication here", but does not say how; this results in proprietary APIs, meaning you would need a broker and middleware even between FESD-certified products. The vendors, rightly, point to a lack of input in their work with defining interfaces. The specifications have wording such as "this will be resolved at a later time", "to be specified" and "pending further specification from the FESD group". It is clearly in the vendors' interest to demonstrate standardized integration scenarios, but they seem to lack contributions to those efforts both from the IT standards and recommendations and from the FESD work. As a result they invent their own application architecture.


Seen from an integration perspective this is bad news, because it does not allow us to propose general solutions, and it forces us to treat each system individually. We know what we call (some of the) data in all three systems, which is clearly good, but we do not know how to access that data, how to transfer it, and how to enter it into the receiving system. As a consequence, my case, which looks at integration between the Software Innovation product and systems from KMD, might or might not be relevant for a situation with another system within the FESD project. In other words, the findings in the case of Software Innovation vs. KMD would not automatically be relevant if we looked at other ECM systems [9]. The fact that findings from a specific integration scenario with a FESD product cannot be generalized to other FESD products means that the link between the FESD standard and the ECM systems is broken. This is the second arrow in Figure 18.

5.4. Interoperability
In our case we found that the ECM system was capable of integrating in various ways. We have now seen that this capability is quite proprietary, not influenced by the underlying factors, i.e. the standards and recommendations and the FESD work. Just as we cannot conclude from observing a wet street that it has been raining, we cannot conclude that the ability to integrate in the ECM system is due to an effective foundation from the FESD project. In all probability it is not. The specific ECM system we are looking at is not the weak link in the integration scenario. It is quite flexible and the architecture seems sound. It is all the good things we want the ECM systems to be, but for all the wrong reasons. There are no specific guidelines in the FESD standard that dictate what internal architecture the ECM vendors must adopt. This means that within the FESD standard we have three different vendors, with three different systems, with three different application layers, and with three different data models (although sharing core elements), which are not interchangeable. In this specific case we are left with individual systems that, as individual systems, have the capabilities we desire, but in terms of physical integration the situation does not leave us better off than if we had simply gone to market and picked three systems influenced by ordinary trends in technology. Again, I am concluding only on integration capability, and not on other goals for the FESD project.

[9] In the specific case, the bottleneck was KMD in terms of functionality, since we only have presentation layer access; but had KMD allowed for more intricate interoperability, the differences between the ECM implementations of the FESD standard would be even more obvious.


In conclusion, the last arrow between the ECM system and interoperability in Figure 18 stands and is valid, but does not contain the important dimension of standardization that we wish for. To put these findings into perspective, we now turn to a more structured way of describing the status quo in the next section on architecture maturity.

5.5. Enterprise Architecture Maturity


There does not seem to be a Danish framework for assessing enterprise architecture maturity, so I have borrowed an American model (Federal Enterprise Architecture Program Management Office 2004), which provides a good starting point. I have chosen to use some space on this model, seen in Table 8, because I think it gives a good impression of the current situation for Danish public sector organizations. Spend some time looking at the table; it gives a nice idea of the scope of integration, from one extreme to the other.

Level 0: No evidence presented
- Interoperability: No evidence presented.
- Data: No evidence presented.
- Business Logic: No evidence presented.
- Interface: No evidence presented.

Level 1: EA is initial, informal and ad-hoc
- Interoperability: Interoperability standards are defined at a conceptual basis (list of standards that are non-proprietary, i.e. patterns, web services).
- Data: Data architecture is broadly defined and not linked to other portions of the architecture.
- Business Logic: Standard business rules (logic) are broadly defined and conceptual in nature.
- Interface: Interface components and requirements are broadly (conceptually) defined.

Level 2: Formal but basic, follows some best practices
- Interoperability: Interoperability standards are defined at the business function level, and are aligned to the TRM and SRM [10].
- Data: Data relationships, interdependencies, and definitions are defined at a conceptual level.
- Business Logic: Standard business rules are integrated and described for portions of the architecture.
- Interface: Detailed external interface descriptions are contained within the EA.

Level 3: EA is beginning to be operationalized across the enterprise (i.e. part of transition, budget)
- Interoperability: Interoperability standards are defined through patterns and are related to business functions. Business functions are aligned to components and services at the enterprise level. Interoperability and sharing of information is one of the backbones of the target architecture.
- Data: A common and defined approach to integrating data with business processes and mission priorities is defined and used throughout the EA.
- Business Logic: Business rules are integrated and described throughout all portions of the architecture.
- Interface: Some form of a node diagram depicts interrelationships between interfaces and business functions.

Level 4: EA is operationalized and provides performance impact to business operations
- Interoperability: The EA demonstrates the establishment of common components that are integrated through well-defined interface requirements.
- Data: The target architecture reflects a transition plan and judgment on the data required for the future state.
- Business Logic: The transition strategy describes the changes required to business rules.
- Interface: Interface descriptions and node diagrams are integrated with performance measures. Interfaces are represented at the enterprise and function levels.

Level 5: IT planning is optimized through the EA
- Interoperability: Using common interoperability standards, the EA demonstrates the ability to link and integrate common technologies and business processes.
- Data: The EA demonstrates its ability to increase integration and promote the reuse of data within the enterprise and across other agencies (linkage of data to common components, business functions).
- Business Logic: The EA demonstrates the results of viewing common business rules across the enterprise and across other agencies.

Table 8: Assessing enterprise architecture; integration.

[10] TRM: Technical Reference Model; SRM: Service Component Reference Model. These are reference models aimed at guiding the choices of the federal agencies, not unlike our own Danish reference model but more detailed.

When I examine the table and relate it to my findings, the picture is not difficult to uncover. Let me take the dimensions one by one:

Interoperability: Here we are on level 1. Concepts have been vaguely defined by initial standardization efforts at state level, which serve to give indications of the directions we are moving in, but as we saw in the section on the e-Government Interoperability Framework, this is too broad to be effective or practical. Notice that this does not mean that the municipalities do not employ standards that fall outside the range of standards in the Interoperability Framework, simply that a broadly defined goal is present. Moving to higher levels would require assessing which standards are relevant in which scenarios, and being far more specific.

Data: Here I would say we are at level 2; this is our strong point. Major work is under way in defining which data exist in what systems (the OIO XML work), but apparently the activity is mainly at state level. In the municipalities, for example, there is no knowledge of exactly what data exist, which systems they reside in and what relationships they have, exposing overlaps, etc. A lot of day-to-day work takes place decentrally, so there is a gap there.

Business logic: I would say level 0. This is our weak point. We might know what data we have, but what and how applications access and manipulate that data is not documented. Good and sound knowledge of the business logic does exist among the decentralized employees, but this knowledge is very tacit and practical, and does not allow for easy translation into hard specifications leading to integration.

Interface: Sadly, we are again on level 0. To my knowledge, very little work has been done on mapping interfaces, apart from very specific work on very specific tasks, which has been aimed at solving very specific problems and not at making better enterprise architecture.

So in terms of enterprise architecture maturity, things look somewhat discouraging. In the words of the framework, we are probably overall on level 1 at best: EA is initial, informal and ad-hoc. This is the context or backdrop in which we have to view the problem. The table only presents one of four dimensions in the readiness assessment; the other dimensions are change management, convergence and business alignment. Convergence is also quite interesting as it indicates the efforts to standardize, but I have left it out since we have already looked at standardization. Business alignment deals with the connection between strategic goals and the implementation of that strategy in the enterprise architecture. As I mentioned before, there are political goals for IT in the public sector, but this whole exercise testifies that there is little connection between strategy and implementation, let alone alignment. With regard to change readiness, no structured approaches have been put forward. This issue is dealt with on a project-to-project basis, so there is little macro perspective on the issue of moving the organization in a specific direction.

These are the circumstances in which the problem of integrating two systems has to be seen. Individually, organizations simply do not have the maturity to fathom the task of placing the specific problem at hand into a larger context, and it is therefore difficult for them to manage a specific problem like the integration of two systems. As we move closer to the more practical problem, i.e. integrating the KMD systems with the ECM systems, it becomes a little less demanding; it is easier to handle a narrow and relatively well-defined problem than to put the problem in a broader framework.


As with many problems related to IT, integrations can be solved in numerous ways. If those numerous ways are not overseen and coordinated, a sceptical person might fear that the situation would not lead to interoperability.


6. CONCLUSION
How do Danish IT architecture recommendations and standards ensure integration and interoperability in public sector organizations?

Both in the recommendations and in our examination of the FESD project we found that the data model is the core consideration. The OIO XML work is essentially developing XML Schemas, which are basically data models in XML format. The FESD project also has the data model as its primary focus. This heavy emphasis on data models testifies to document thinking: having good knowledge of the data models means that we have precise information on what a given system looks like at data layer level. So with an XML Schema for a given system, we are certain that we send the relevant data, formatted correctly for easy entry into the system.

This means that we are essentially sending electronically formatted forms around: XML documents with constraints on each field ("this field must be a 10-character integer", etc.). These forms could be sent between systems using a simple idea of web services, perhaps just having one web service called ReceiveForm for each organization. This would allow other organizations to push a document to what is essentially a mailbox. Being a simple file, these XML documents could just as easily be sent via SMTP or FTP. Using web services would just be an efficient way of sending forms.

Not to say that an efficient way of sending forms is not a very attractive prospect. It most certainly is. But one might ask whether this constitutes interoperability or integration. SOA prescribes loosely coupled, low-granularity and distributed applications. The concept of routing XML documents is definitely very loosely coupled and extremely low-granularity, but is most definitely far from being a distributed application. The applications are so loosely coupled that I would even say that they are not coupled at all. The concept of interoperability in the Danish work does not seem to include any notion of integrating applications. In SOA, a service provides some kind of functionality to the rest of the architecture, thus the name service. This cannot be the case if we are merely sending documents around. Many integration scenarios cannot be solved using document thinking. Our case of integrating an ECM system with systems from KMD is a prime example of this.

It could be argued that this is a case of same words for different things. If an EDI-on-speed idea is the goal, then the initiatives are relevant. If, on the other hand, the initiatives of the Ministry of Science, Technology and Innovation (in cooperation with other organizations) are meant to reflect interoperability understood as SOA and systems integration, then we are missing important and necessary pieces of the puzzle.

Earlier, I presented certain core questions posed by Hagel and Brown (2001) in Harvard Business Review. The first of these questions stressed the importance of a shared vision for IT architecture five to ten years into the future. After having examined Danish initiatives there appears to be a significant gap between the strategic direction set out and the actual initiatives meant to underpin it. On the surface a shared vision is not obvious; work is therefore needed to close the gap between the strategic and the operational. There is heavy reliance on SOA thinking at policy level, which theoretically wraps heterogeneous systems in a common interface. However, the recommendations are seriously lacking in guidance on how to complete a SOA without doing it inconsistently. In the absence of detailed guidance, it is fair to assume that we will not achieve our goals of universal interoperability.

The Danish public efforts in IT architecture must describe exactly what the current situation is and lay out a roadmap for the decades to come, while making sure we can adapt to emerging technology. The process of (continuously) re-aligning the business to technology trends is probably one of the most important competencies. Of course customers need to agree that the basic value proposition is sound, but the strategic advantage might be found in the flexibility to adapt to new technologies, not merely on a technological level but on a strategic one.
Attention […] has shifted away from the technology itself to the interrelationships between the technology and the firm's ability to manage concordant changes in internal structure and work processes […] the interrelationships between interorganizational systems and internal innovation and redesign (Short, Venkatraman 1992:8)

As a nation, Denmark will be facing stiff competition in the future. Having a highly developed, superior IT architecture in the public sector would give us tremendous competitive advantages. Being a smaller country and having less momentum than larger nations should give Denmark the advantage of adaptability. A vision of European entrepreneurs choosing to register their firms in Denmark due to the efficient and effortless communication with the public administration is quite attractive. But we are not quite there yet.

One of the necessary pieces of the puzzle in making sure we are moving forward is the establishment and central coordination of an integration infrastructure. Today, when two organizations want to interact, the interaction happens directly between them. This means that responsibility for the actual transport and manipulation of messages has to be placed at one of the two interacting parties. If we want to move towards SOA, there is a need for an infrastructure that carries the application logic between systems. Compare this to the idea that everyone could in theory deliver their physical post in person, but having a national distribution network with postmen delivering the post on your behalf makes everything more efficient.

Unless efforts are focused on the bigger picture, we will find ourselves trapped in a point-to-point world with bilateral agreements, when we would much rather have been in a service oriented world with a coordinated, distributed perspective. The Danish situation can presently best be described as initial, informal and ad-hoc. Important and essential work is being done, but the larger picture appears hazy.


7. APPENDICES
Appendix 1: Interviewees
Primary Sources

Name                   Title                                             Organization
Jan Danielsen          Chefkonsulent                                     FESD / ITST
Tom Bøgeskov           Konsulent                                         FESD / KL
Jakob Kanto            ESDH-projektleder                                 Helsingør Kommune
Lars Hougaard          Projektkonsulent / ESDH-ansvarlig                 Vejle Kommune
Michael Biermann       Forretningschef, KMD Sag                          KMD
Finn Lütken Grimstad   Vice President, Enterprise Technical Development  KMD
Morten Koefod          Produktansvarlig, 360                             Software Innovation
Bjarne Hallender       Salgschef                                         Software Innovation


Appendix 2: What are web services?


In the following I will provide a bird's-eye view of web services as a technology. My goal is to present the standard reasons for using web services, for readers who are completely new to the concept. As a technology, web services aim to be an implementation of SOA, which of course we know is a critical issue for the Danish IT architecture work. One of the major drivers of the need for a technology such as web services is the reuse of code, not only locally but across the network, thus building distributed applications. DCOM is Microsoft specific and CORBA is extremely complicated, so while cross-platform integration is possible with these technologies, they do not succeed in being the cost-effective solution they would optimally be. The W3C defines a web service as:
Definition: A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards. (Booth, Haas & McCabe 2004:7)

We probably have a good understanding of what HTTP and XML are, but unless we have a clear understanding of what the rest of the alphabet soup means, e.g. WSDL and SOAP, this might not clarify things very much. So perhaps we should start somewhere else. In the most general terms, web services technology allows you to utilize code across a network, e.g. the internet, in a fashion that is lightweight and consistent. This means that an application can use code resources physically residing in a different box. If, for example, you have a remote web service that multiplies two numbers, you are able to call that web service method or function with two parameters, and the WS will return the multiplied number. This is a poor example, since it would be far easier to calculate this locally. But it is naturally possible to expose any function, the point being that this becomes the basis for exposing your business services in a more efficient way, and indeed also the basis for new services and products. Programs become loosely coupled systems. The concept of a program thus changes from client-server or n-tier concepts to a much more distributed concept.
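
To give a feel for what such a call looks like on the wire, here is a sketch of the request and response for the multiplication example above; the service, namespace and element names are invented for illustration:

    <!-- Request: call the hypothetical Multiply operation with two parameters -->
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <Multiply xmlns="urn:example:calculator">
          <a>6</a>
          <b>7</b>
        </Multiply>
      </soap:Body>
    </soap:Envelope>

    <!-- Response: the WS returns the multiplied number -->
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <MultiplyResponse xmlns="urn:example:calculator">
          <result>42</result>
        </MultiplyResponse>
      </soap:Body>
    </soap:Envelope>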


Figure 23: Sample WS network

Figure 23 illustrates some of the concepts. Several points can be made. Firstly, web services may use and expand on other WS. Secondly, applications can overlap, as multiple applications can use the same WS as the basis for their respective functionality. The applications in the figure might themselves be WS, or they might expose the functionality as a browser or non-browser based application. Or they might provide a small corner of functionality in a larger application, such as an ERP system, or something completely different. Even though I have shown the applications with clear borders, the borders of a program may become fuzzy and not well defined, since the actual functionality may be coming from farther away in the WS network.

Many questions arise when trying to grasp the consequences of such an architecture. Obviously, running code over the internet is not the same thing as running it locally, either on the same physical machine or behind the corporate firewall. There are security issues - the traditional concepts of confidentiality, integrity and availability are important, but non-repudiation is also an essential feature in business class applications. Latency issues become critical. Running code over the wire to a server which might be anywhere in the world means waiting perhaps thousands of times longer for a response. This dynamic pushes for a new way of coding, to try to minimize the number of roundtrips to the remote WS. This results in a trend towards more high-level WS rather than WS at the individual function level. So granularity is an issue. We will expand and structure this discussion in the following sections.

There are many reasons for the emergence of Web services, but in essence it is about liberating information, and making it available where and when it is needed. As always it is naturally also about the vendors looking for new business, but broad industry backing testifies that WS are here to stay. The technology is still very much in its infancy, but will, as far as we can see, grow up to be a powerful technology that can shape businesses.


Appendix 3: The Stack(s) - Gaining a vocabulary


The WS Stack gives a good overview of the complete set of sub-technologies in a single view.

Figure 24: The W3C WS Stack

The web services "stack" is a semimythical creature, composed in equal parts of technical specification, running code, and marketing campaign (Clark 2003)

Please note that the individual vendors operate with different variants of the WS Stack, incorporating their extra offerings, for example security or tracking, to name just a couple.
Microsoft, for one, has a rather wild and woolly welter of high-level elements in the web services stack: coordination, inspection, referral, routing, policy, transaction. (Clark 2003)

WebServices.Org, IBM, the W3C, Microsoft of course, Sun, The Stencil Group, Oracle, Borland, BEA, and Hewlett-Packard all have their own web service stacks. Figure 24 shows the web service stack as defined/viewed by the W3C (Booth, Haas & McCabe 2004). This could arguably be the most authoritative source, so this is where we will start. We will zoom in on a few of the most central elements of the WS stack. Obviously there is no point in accounting for all the different protocols in the WS stack, since many of them are not WS specific. I will focus on the ones you could call core WS specific protocols, and pay less attention to the ones that are of a more supporting nature.
Communications

At a transport level the W3C have been decidedly non-specific as to what protocol should be used. HTTP will definitely become the de facto standard, in part because of the readiness of network infrastructure; firewalls are ready to allow HTTP traffic. But it should be stressed that the choice of transport protocol has consequences for the rest of the infrastructure. For example, HTTP has relatively strong and more reliable sender identification, because HTTP is normally used for two-way traffic (request/response) directly between sender and receiver. SMTP, on the other hand, does not have the same mechanisms, or only more indirectly: an SMTP message may be routed through the network without the sender and receiver actually having direct contact. If SMTP had stronger sender identification, spam might not be as big a problem as it is. Choosing SMTP will mean that, for example, sender identity will have to be handled somewhere else in the architecture. This is naturally only at a communications level; of course there would always be a need for verification of identity on higher levels, using e.g. signing.
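
For concreteness, a SOAP message bound to HTTP is simply the XML envelope carried in the body of an HTTP POST, roughly like this (the host, path, namespace and action are invented for illustration):

    POST /services/mailbox HTTP/1.1
    Host: example.org
    Content-Type: text/xml; charset=utf-8
    SOAPAction: "urn:example:orders#SubmitOrder"

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <!-- payload goes here -->
      </soap:Body>
    </soap:Envelope>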
Messages & SOAP

SOAP is the W3C recommendation that is the de facto standard for request and response messages for consuming and providing XML Web Services. What does that really mean? Well, let's have a look at how far we have come, and start with the original purpose of SOAP, regardless of which concept of SOAP we wish to adopt. The XML Protocol (XMLP) Working Group was chartered to design:

- an envelope encapsulating XML data
- a convention for RPC (Remote Procedure Call) applications
- a data serialization mechanism for non-syntactic data models
- a mechanism to use HTTP as a transport mechanism (Haas 2001)
SOAP […] combines the data capabilities of XML with the transport capability of HTTP, thereby overcoming the drawbacks of both EDI and tightly coupled distributed object systems such as CORBA, RMI, and DCOM. It does this by breaking the dependence between data and transport and in doing so opens up a new era of loosely coupled distributed data exchange. (Coyle 2002:112)

With SOAP, we find both meanings of the word coupling applicable. SOAP is decoupled from any underlying transport software and, through its message structure, enables the creation of loosely coupled networks that don't require tight binding between processes in order to communicate. (Coyle 2002:126)

So far, we have our XML document defined, but what we need now is a way for it to get from A to B. We know that there is momentum behind using HTTP, specifically HTTP POSTs, to push an XML document to a server. So why do we need anything extra? When examining the motivation for a technology such as XML WS, it is clear that there is an obvious need for supporting standards beyond the minimalist transport of an XML document.


Here in Figure 25 we have the basic SOAP message structure.

Figure 25: SOAP message

The SOAP Envelope element defines the SOAP message and contains the header information and body tags. The envelope is evidently a required tag. The Header, on the other hand, is not required, but is a very important part of SOAP. Without the header, there would be no point in using SOAP at all. In the header you can put any kind of extra information associated with tracking, distribution, workflow or anything else you can think of that provides extra value to your SOAP infrastructure. This means that SOAP, like XML, allows for extensibility to accommodate things like orchestration or vendor-specific, proprietary functionality. The flip side of the coin is naturally that this simultaneously promotes standards battles. But SOAP does not solve all problems; in fact many aspects need to be extended to accommodate quite necessary functionality.
The good news is that all major Web Services initiatives have decided to use SOAP as their low-level XML messaging protocol. […] The bad news is that SOAP made itself attractive for a wide variety of uses by deferring many important issues. SOAP provides no extended features. It does provide a mechanism for adding extended features, but such additions aren't necessarily interoperable. (Dick 2003:115)

So naturally, SOAP has been designed to be extendable, but such extensions are proprietary functionality that might not (and probably will not) be compatible with other implementations. What vendors, for instance, choose to put in the SOAP headers is influenced by their product offerings and general architecture, so we are anything but finished with the standards battle. The Body contains the actual XML code that we want to transport. This could be procedure call type code or lower-granularity business document type code, or anything in between. This is pretty straightforward.
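
Putting the elements together, a minimal SOAP message could look as follows; the tracking element in the header and the payload element are invented for illustration:

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Header>
        <!-- Illustrative header entry: infrastructure-specific tracking information -->
        <TrackingId xmlns="urn:example:infrastructure">msg-00042</TrackingId>
      </soap:Header>
      <soap:Body>
        <!-- The actual payload: a procedure call or a business document -->
        <SubmitOrder xmlns="urn:example:orders">
          <OrderId>2004-117</OrderId>
        </SubmitOrder>
      </soap:Body>
    </soap:Envelope>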


Originally SOAP was an abbreviation of Simple Object Access Protocol. After some people noticed that SOAP was hardly object oriented and certainly not simple, SOAP lost its connection to that expansion and now has at least two meanings:
Service Oriented Architecture Protocol: In the general case, a SOAP message represents the information needed to invoke a service or reflect the results of a service invocation, and contains the information specified in the service interface definition.

Simple Object Access Protocol: When using the optional SOAP RPC Representation, a SOAP message represents a method invocation on a remote object, and the serialization in the argument list of that method that must be moved from the local environment to the remote environment. (Booth, Haas & McCabe 2004)

Although it is not entirely clear from the W3C definitions above, these two understandings of SOAP seem to map to two different paradigms of WSs altogether.
Broadly speaking, Web Services can be classified into two major categories:
- RPC-oriented Web Services (synchronous)
- Document-oriented Web Services (asynchronous) (Nghiem 2003:61)

The first concept is based on a high-granularity view, where the atom of the process is the procedure call [11]. This views the remote machine as if it were local. Following this concept would lead to very chatty Web Services with many very low-level calls back and forth. Several problems are associated with this approach, including the need for maintaining state between local and remote machines, and all the stability problems linked with using the public internet. On the other hand this concept of WS will resemble existing programming practice, and might therefore require less effort on the part of the developers.

[11] I will use procedure, function and method as interchangeable terms. The choice of one in favour of the other has mainly historical roots, and has no real consequence for this discussion.

The document-oriented concept of WS addresses some of the shortcomings of the RPC view. In most scenarios, business processes take longer than a few seconds to complete. Many would take several days or even more. This means that one would have to maintain state for all this time, and perhaps lock resources, e.g. database records and memory. There is also considerable motivation for minimizing the amount of traffic over the wire, to minimize both the traffic that has to be secured and the susceptibility to latency. Taking these factors into account, in some cases one would prefer a concept of WS where the WS transactions more closely resemble the business process they represent. The second concept considers the individual WS to be loosely coupled, which for example means that the two (or more) communicating parties do not need to be available at the same time. This might not always be an important factor, and considering that it takes more of an effort in terms of plumbing to build document-oriented WS, RPC WS are frequently the most relevant. The classic example is credit card verification. In this situation one would prefer an RPC-like model, because unless the service is available the transaction should fail, and therefore we have no need for asynchronous support. The document-based approach more closely resembles the way EDI worked:
The EDI paradigm is about connecting similar systems across organizational boundaries. The distributed objects paradigm is about connecting different systems in the same organization. When EDI software sends information over the network, it sends large self-contained documents like Request for Quote, Quote, and Purchase Order. The receiving software accepts a document and processes it over minutes, hours, or even days and sends back another large, self-contained document whenever it is ready. When distributed objects software sends information over the network, it sends small, detailed instructions like Create New Order, Add Line Item to Order, and Set Payment Terms of Order. The receiving software processes the instruction in milliseconds and immediately returns the results to the waiting sender. (Dick 2003:102)

These two concepts (Dick 2003:103) should not be seen as a binary choice. In most cases you would place yourself somewhere in the middle, or use a combination of approaches.

XML Feature          EDI                  Distributed Objects
Messaging Protocol   SOAP                 SOAP
Framework            BizTalk or ebXML     WSDL + UDDI
Flow Language        ebXML or BPML        XLANG or WSFL

Table 9: EDI vs. Distributed Object XML Paradigms
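To make the contrast between the two paradigms concrete, the minimal sketch below shows an RPC-style and a document-style SOAP body side by side. The element names, namespaces and values are invented for illustration and do not come from any real service; the point is only that the difference lies in granularity and intent, not in syntax.

```python
# Sketch contrasting the two SOAP paradigms; all names are illustrative.
import xml.etree.ElementTree as ET

# RPC style: the body is a method invocation with serialized arguments,
# e.g. the credit card verification example above.
rpc_envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:VerifyCard xmlns:m="urn:example:payments">
      <m:cardNumber>4111111111111111</m:cardNumber>
      <m:amount>125.00</m:amount>
    </m:VerifyCard>
  </soap:Body>
</soap:Envelope>"""

# Document style: the body is a self-contained business document,
# closer to the coarse-grained EDI paradigm.
document_envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <po:PurchaseOrder xmlns:po="urn:example:procurement">
      <po:OrderDate>2004-05-01</po:OrderDate>
      <po:LineItem sku="A-100" quantity="12"/>
      <po:PaymentTerms>Net 30</po:PaymentTerms>
    </po:PurchaseOrder>
  </soap:Body>
</soap:Envelope>"""

# Both are ordinary SOAP messages; only the payload's nature differs.
for name, doc in (("rpc", rpc_envelope), ("document", document_envelope)):
    root = ET.fromstring(doc)
    body = root.find("{http://schemas.xmlsoap.org/soap/envelope/}Body")
    print(name, "->", body[0].tag)
```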

Description

WSDL

When you have set up your WS you will want to interact with other parties. For that you will need a consistent way of describing what resources you are making available, how to access them, and what the other party should expect you to return. In essence, WSDL is an XML document abstracting the actual service provided. If at a later time you wish to publish your WS with a central repository, you will also need to be able to describe your service. WSDL can also describe which protocols the provider expects for the transaction (e.g. HTTP, SMTP, and FTP), and an interesting detail is the binding style element, which can take the parameters rpc and document. So this is also where you define which SOAP paradigm you choose to base your transaction on. We will not go into detail about WSDL, not because it is not an important part, but because there is relative consensus about how to use WSDL, and it therefore does not present the same potential for proprietary systems and vendor/product lock-in.
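The binding style element can be shown in miniature. The sketch below parses a small, invented WSDL fragment and reads out whether the binding declares rpc or document style; the binding name and fragment are hypothetical, not taken from any real service description.

```python
# Minimal sketch: reading the soap:binding style attribute from a WSDL
# fragment. The fragment is invented for illustration.
import xml.etree.ElementTree as ET

wsdl = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/">
  <binding name="ConvertBinding" type="tns:ConvertPortType">
    <!-- style may be "rpc" or "document"; this is where the
         paradigm choice discussed above is declared -->
    <soap:binding style="document"
        transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
</definitions>"""

root = ET.fromstring(wsdl)
ns = {"wsdl": "http://schemas.xmlsoap.org/wsdl/",
      "soap": "http://schemas.xmlsoap.org/wsdl/soap/"}
for soap_binding in root.findall("wsdl:binding/soap:binding", ns):
    print("binding style:", soap_binding.get("style"))  # -> document
```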


Processes

Discovery

One of the most important elements of a truly distributed WS architecture is an effective discovery mechanism. How does one (manually or automatically) actually locate and interface with a WS?


Figure 26: Publish, find & bind

The standard text-book explanation is usually something like the high-level abstraction of the process seen in Figure 26. The service provider publishes the service to a 3rd party. This implies some kind of active interaction, but that is not necessarily so. W3C (Booth, Haas & McCabe 2004) presents at least three different types of 3rd-party brokers: registries, indexes and peer-to-peer (P2P) type discoveries.

Registries are like yellow pages, where the service provider has to give explicit consent (and perhaps pay a fee) to be listed. Here we can usually expect some kind of validation service from the broker, so service clients can view the offered services with a high degree of reliability. Indexes are more like a Google service: a centralized system, which could be based on robots or other technology, discovers and lists available services. Providers do not give consent in the index scenario. The P2P broker scenario is based on concepts found today mainly in the very popular file-sharing networks. One might argue that the P2P model suits the nature of the internet and the distributed architecture, but there is a clear trade-off between distributedness and reliability.

The choice of discovery model will be based on considerations including security, reliability and anonymity. Down the road we will probably also see some kind of federation services that provide gateways between the different types of infrastructures, e.g. an index listing services discovered via P2P and later validated for availability.

Today the situation is not this advanced; the technology and market have yet to mature. While there are hopes of automating the process of discovering and agreeing on WS usage in WS networks, the typical flow of events today when using business-level WS is shown in Figure 27. As we can see, we need some human interference, normally on the requester side, for the interaction to be successful. This is due in part to security issues that have not been standardized completely, and due also to the absence of an infrastructure for describing generic WS. If, for example, you are looking for a WS that can convert gif to jpeg, you have to find a specific service and hardcode the service into your application. It is not possible to discover a service based on a generic description of the service provided (e.g. image manipulation) and then choose the one most suited on the basis of certain criteria (price, location, jurisdiction, etc.).

Figure 27: Engaging a WS
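A sketch of what this hardcoding looks like in practice is given below, using only the standard library. The endpoint URL, the SOAPAction value and the element names are purely hypothetical; the point is that the requester is bound to one specific, humanly agreed-upon service and cannot fall back to "any image manipulation service".

```python
# Sketch of engaging a WS without generic discovery: one specific,
# hypothetical endpoint is hardcoded into the application.
import urllib.request

ENDPOINT = "http://example.org/services/ImageConverter"  # hardcoded

request_body = b"""<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:ConvertImage xmlns:m="urn:example:images">
      <m:sourceFormat>gif</m:sourceFormat>
      <m:targetFormat>jpeg</m:targetFormat>
    </m:ConvertImage>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    ENDPOINT,
    data=request_body,
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": '"urn:example:images#ConvertImage"'},
)
# If this one service is unavailable, the interaction simply fails;
# there is no infrastructure for substituting an equivalent service.
with urllib.request.urlopen(req) as response:
    print(response.read().decode("utf-8"))
```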

The Web Service Description (WSD) provides the technical definitions of what resources are available in the WS, normally conveyed as a Web Services Description Language (WSDL) document, but it does not convey the meaning of the individual elements. What is a person, for example? The semantics of the interaction also have to be defined:
While the service description represents a contract governing the mechanics of interacting with a particular service, the semantics represents a contract governing the meaning and purpose of that interaction. […] It may be explicit or implicit, oral or written, machine processable or human oriented, and it may be a legal agreement or an informal (non-legal) agreement. (Booth, Haas & McCabe 2004:8)

As we know, the semantics are often a big problem, and one that tends to be somewhat neglected. We now conclude by showing how the WS arena is not an easy, clear-cut market, but is really made up of many dozens of sub-standards, formal and de facto, that often compete for the limelight.
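The "what is a person" question can be shown in miniature. In the sketch below, invented for illustration, two documents are structurally interchangeable and would pass the same mechanical checks, yet "person" denotes a citizen in one domain of control and a case officer in the other; nothing in the messages themselves resolves this.

```python
# Tiny illustration of the semantics problem: identical structure,
# different meaning. All identifiers are invented.
import xml.etree.ElementTree as ET

municipality_a = "<person><id>0101701234</id></person>"  # a citizen (CPR-style number)
municipality_b = "<person><id>emp-4711</id></person>"    # a case officer (employee id)

for doc in (municipality_a, municipality_b):
    person = ET.fromstring(doc)
    # The mechanics interoperate, but no schema check conveys
    # which meaning of "person" applies.
    print(person.find("id").text)
```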


Appendix 4: Standards map


Now we have a picture of what the lowest common denominator is, in terms of what we think a web service is. By this I mean that in reality there is a great big mishmash of standards organizations and standards that overlap and compete for recognition. In the following I will illustrate the confusion that exists about the different components of the full standards spectrum. It is the distilled product of reading about standards from a long list of sources, and crediting any single one would not make sense.

The map can be seen in Figure 28. I have organized the standards around the WS-stack as presented earlier. This forces the different standards into boxes where they might not entirely belong; the point is not to give a conclusive categorization but simply an overview. The map does not convey how mature each standard is, but I have indicated the relevant organizations backing each standard using three-letter abbreviations: OAS for OASIS, ORA for Oracle, etc. This is of some value, but does not entirely give an impression of maturity. Do not think of the presentation as absolute fact: it is a very dynamic field, prone first of all to fluctuations over time, and also very likely to be influenced by political rhetoric from standards organizations and vendors.

As you look at the map, it is immediately clear that there are many standards, and that the way they build on each other is quite complex and not very well defined. From the figure we can conclude two important points:

- There is an apparent lack of standards that fit in the management box on the right.
- There are several key areas with conflicting standards, where one group of vendors and standards bodies has formulated one standard, and another group has formulated another. This is very noticeable in the area of business processes. I have indicated with a "rivals" arrow where there is active rivalry in the market; there are many more standards that overlap but have not engaged in active rivalry.

Note that I have not included standards referencing XML. The WS standards make extensive use of XML, and XML naturally has a whole range of standards of its own. You might argue that when using WS, you could place, for example, security capabilities somewhere other than tightly knit to the web services: you could put some security in the network, or on the XML document. But this does not mean that we can ignore the need for security or other capabilities at the WS level.


[Figure 28 maps the WS standards landscape onto the WS-stack, with each standard tagged with its backing organizations and "rivals" arrows marking areas of active rivalry: Security (XML Signature, XML Encryption, XKMS, WS-Security, SAML, XACML/XACL, WS-Trust, WS-Federation, WS-Policy, ID-WSF/ID-FF, ebXML Security); Business Process (ebXML BPSS, BPEL4WS/BPEL, WS-Choreography, XLANG, Web Services Flow Language); Management (WS-CAF, WS-CTX, WS-CF, WS-TXM); Transaction (WS-Coordination, WS-Transaction, WS-Atomic Transaction, WS-Business Activity); Discovery (UDDI 1.0/2.0/(3.0), ebXML Registry Services, ID-WSF Discovery Services); Description (WSDL 1.1/1.2); SOAP extensions (ebMS 1.0/2.0, WSRP, WS-Reliability, WS-ReliableMessaging, WS-Addressing); all resting on SOAP 1.1/1.2, with the WS-I Basic Profile 1.0/1.0a cutting across the stack.]

Figure 28: Standards Map

The figure is meant to illustrate that WS is not well defined unless it is put into a much more detailed context. Leaving the choices to decentralized actors will probably result in individually justified but poorly coordinated choices, and that is not the fastest route to a consistent and coherent IT architecture.


8. BIBLIOGRAPHY
Accenture 2004, FESD-projektet, kontrakt, Accenture, Copenhagen.
Accenture 2004, FESD-projektet, kontrakt, Bilag 3, Accenture, Copenhagen.
Barry, D.K. 2003, Web services and service-oriented architecture: the savvy manager's guide, Morgan Kaufmann; Elsevier Science, San Francisco, Calif.; Oxford.
Booth, D., Haas, H. & McCabe, F. 2004, 2004-02-11-last update, Web Services Architecture, W3C Working Group Note 11 February 2004 [Homepage of World Wide Web Consortium], [Online]. Available: http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/wsa.pdf [2004, 02/12].
Britton, C. 2001, IT architectures and middleware: strategies for building large, integrated systems, Addison-Wesley, Boston, Mass.; London.
Chaudhri, A.B., Rashid, A. & Zicari, R. 2003, XML data management: native XML and XML-enabled database systems, Addison-Wesley, Boston, Mass.
Clark, K.G. 2003, Is There a Consensus Web Services Stack? [Homepage of O'Reilly], [Online]. Available: http://www.xml.com/pub/a/2003/02/12/deviant.html [2004, 02/19].
Computer Sciences Corporation 2004, FESD-projektet, kontrakt, Computer Sciences Corporation, Copenhagen.
Coyle, F. 2002, XML, Web Services and the Data Revolution, Addison-Wesley, Indianapolis.
Cutler, R. & Denning, P. 2004, 2004/02/02-last update, Annotated List of Web Services Specs [Homepage of W3C], [Online]. Available: http://lists.w3.org/Archives/Public/www-ws-arch/2004Feb/0022.html [2004, 03/09].
Daconta, M.C., Obrst, L.J. & Smith, K.T. 2003, The Semantic Web: a guide to the future of XML, Web services, and knowledge management, Wiley Pub., Indianapolis, Ind.
Daum, B. & Merten, U. 2003, System Architecture With XML, Morgan Kaufmann Publishers, San Francisco.
Daum, B. & Horak, C. 2001, The XML Shockwave, Software AG Corporate Marketing, Lützelbach, Germany.


Den Digitale Taskforce 2004, ESDH [Homepage of Projekt Digital Forvaltning], [Online]. Available: http://www.e.gov.dk/fesd [2004, 04/24].
Det Koordinerende Informationsudvalg 2004, REFERENCEPROFILEN [Homepage of Det Koordinerende Informationsudvalg], [Online]. Available: http://egovernments.org/referenceprofilen/ [2004, 02/01].
Dick, K. 2003, XML: A manager's guide, 2nd edn, Pearson Professional Education, Harlow.
Douma, S. & Schreuder, H. 2002, Economic approaches to organizations, 3rd edn, Pearson Higher Education, London.
Evans, P. & Wurster, T.S. 2000, Blown to bits: how the new economics of information transforms strategy, Harvard Business School Press, Boston, Mass.
Federal Enterprise Architecture Program Management Office 2004, OMB Enterprise Architecture Assessment v1.0, Federal Enterprise Architecture Program Management Office, Washington, USA.
Fielding, R.T. 2000, Architectural Styles and the Design of Network-based Software Architectures, University of California.
Goldfarb, C.F. & Prescod, P. 2004, Charles F. Goldfarb's XML handbook, 5th edn, Prentice Hall, Upper Saddle River, N.J.
Grand Central Communications 2004, Products Page [Homepage of Grand Central Communications], [Online]. Available: http://www.grandcentral.com/products/gcmanage.html [2004, 04/01].
Haas, H. 2001, XML Protocol Activity: foundation of Web services [Homepage of W3C], [Online]. Available: http://www.w3.org/2001/Talks/0710-hh-grid/slide11-0.html [2004, 03/10].
Hagel, J. & Brown, J.S. 2001, "Your Next IT Strategy", Harvard Business Review, vol. 79, no. 9, pp. 105-113.
Hohpe, G., Woolf, B. & Brown, K. 2004, Enterprise integration patterns: designing, building and deploying messaging solutions, Addison-Wesley, Boston.
Hougaard, L. 2004, Indledende kortlægning, Vejle Kommune, not published, Vejle.
Hougaard, L. 2002, Undersøgelse af danske kommuners erfaringer med ESDH, Institut for Statskundskab, Aarhus Universitet, Aarhus, Denmark.


Kaye, D. 2003, Loosely coupled: the missing pieces of Web services, RDS Press, Marin County, Calif.
Kelly, K. 1999, New rules for the new economy: 10 ways the network economy is changing everything, Fourth Estate, London.
Lim, B. & Wen, J. 2002, "The impact of next generation XML", Information Management & Computer Security, vol. 10, no. 1, pp. 33-40.
Linthicum, D.S. 2004, Next Generation Application Integration, Addison Wesley Longman Higher Education, Reading, Mass.
Linthicum, D.S. 2001, B2B application integration: e-business-enable your enterprise, Addison-Wesley, Boston, Mass.
Marks, E.A. & Werrell, M.J. 2003, Executive's Guide to Web Services, John Wiley & Sons, New Jersey.
McComb, D. 2003, Semantics in business systems: The savvy manager's guide, Elsevier, London.
Menard, C. 1997, Transaction cost economics: recent developments, Edward Elgar, Cheltenham.
Ministry of Science, Technology and Innovation 2003, White Paper on Enterprise Architecture, Ministry of Science, Technology and Innovation, Copenhagen.
National IT and Telecom Agency 2004, InfoStructureBase [Homepage of National IT and Telecom Agency], [Online]. Available: http://isb.oio.dk/info/ [2004, 02/03].
Nghiem, A. 2003, IT Web services: a roadmap for the enterprise, Pearson Professional Education, New Jersey.
Nix, M. 2004, IBM Web Services Strategy [Homepage of IBM], [Online]. Available: http://www.interop.dk/articles/Nix10-29.pdf [2004, 02/01].
OASIS 2004, OASIS Web Services for Remote Portlets (WSRP) Overview [Homepage of OASIS], [Online]. Available: http://www.oasis-open.org/committees/download.php/3488/wsrp-overview-rev2.ppt [2004, 05/03].
OASIS 2004, OASIS Web Services for Remote Portlets TC [Homepage of OASIS], [Online]. Available: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsrp [2004, 05/03].


Porter, M.E. & Millar, V.E. 1985, "How information gives you competitive advantage", Harvard Business Review, vol. 63, no. 4, pp. 149.
Riksarkivet og Statsarkivene 2003, 2003-06-26-last update, Noark-4 [Homepage of Riksarkivet og Statsarkivene], [Online]. Available: http://www.riksarkivet.no/arkivverket/lover/elarkiv/noark-4.html [2004, 02/24].
Ruh, W.A. 2001, Enterprise application integration: a Wiley tech brief, John Wiley, New York; Chichester.
Scott, M. & Comer, P. 1999, "The strategic importance of XML applications", International Journal of e-Business Strategy Management, vol. 1, no. 2, pp. 130-137.
Shi, N.S. & Murthy, V.K. 2003, Architectural issues of Web-enabled electronic business, Idea Group Pub., Hershey, PA.
Short, J.E. & Venkatraman, N. 1992, "Beyond Business Process Redesign: Redefining Baxter's Business Network", Sloan Management Review, vol. 34, no. 1, pp. 7.
Simon, S.H. 2001, XML, McGraw-Hill Publishing Company, New York.
Software Innovation 2004, FESD-projektet, kontrakt, Software Innovation, Nærum.
Software Innovation 2004, FESD-projektet, kontrakt, Bilag 3, Software Innovation, Nærum.
Sullivan, L. 2004, 01-03-2004-last update, Driving Standards [Homepage of CMP Media LLC], [Online]. Available: http://www.informationweek.com/shared/printableArticle.jhtml?articleID=18201098 [2004, 03/09].
Tannenbaum, A. 2001, Metadata solutions: using metamodels, repositories, XML, and enterprise portals to generate information on demand, Addison-Wesley, Boston, Mass.; London.
Weick, K.E. 1976, "Educational Organizations as Loosely Coupled Systems", Administrative Science Quarterly, vol. 21, no. 1, pp. 1.
Williamson, O.E. 1997, "Hierarchies, markets and power in the economy: an economic perspective" in Transaction cost economics, ed. C. Menard.
