
Evaluating a specific approach to better re-use of public sector information

The Government Data Mashing Lab

CHRIS MARSDEN
JONATHAN CAVE
STIJN HOORENS

Draft – Not cleared for publication

PM-2169-DFT

11th October 2006

Prepared for the Cabinet Office Data Grand Challenge

Preface

This Project Memorandum is a ‘think piece’ contribution to government policy making. It
aims to isolate and examine the socio-institutional barriers to better re-use of Public Sector
Information (PSI). It assesses the draft proposal for a Data Mashing Lab (DML) (see
Appendix 3) and is intended for a specialised audience of government policy makers. At a
later stage the report may be made public if a request is made under the Freedom of
Information Act to the Department for Transport.
The findings are informed by key informant interviews and workshops. Interviews were
conducted on a non-attribution basis in the period 14 September to 4 October 2006 and
are supported by the literature review outlined in the opening two chapters, and the
insights of four workshops also attended in May-September 2006. We thank the workshop
and interview participants for their full and frank contribution to this research project. The
range of stakeholders interviewed was broad for such a short research project and
included experts from: academia, Non-Departmental Public Bodies (NDPB), trading
funds, private entrepreneurs, corporations, standards bodies, and government policy
departments with both domestic and international responsibilities. The schedule of
interviews and the workshop programmes are contained in Appendix 2.
This draft Project Memorandum is subject to RAND Europe’s quality assurance process.
RAND Europe’s work is objective, multidisciplinary and based upon the core value of
quality. All its products are peer-reviewed before final dissemination as part of our quality
assurance procedures. For more information on RAND’s quality standards please see
http://www.rand.org/standards. RAND Europe is an independent, not-for-profit, research
institution that helps improve policy and decision-making through research and analysis.i
For more information about RAND Europe or this document, please contact:
Chris Marsden
Senior Analyst
RAND Europe
Westbrook Centre, Milton Road
Cambridge CB4 1YG
E-mail: marsden@rand.org
Tel.: +44 1223 358 845

Contents

Preface........................................................................................................................ iii
Executive summary.................................................................................................... vii
List of abbreviations.....................................................................................................ix

CHAPTER 1 Government data sharing analysed...................................................1


1.1 Data sharing outlined ........................................................................................ 1
1.2 PSI Re-use......................................................................................................... 3
1.3 Barriers to re-use of PSI..................................................................................... 3
1.3.1 Technical barriers ................................................................................. 4
1.3.2 Socio-Institutional Barriers ................................................................... 5
1.3.3 Economic Barriers ................................................................................ 5
1.3.4 Legal Issues........................................................................................... 6
1.3.5 Addressing Data Federation Barriers in Stages ...................................... 6

CHAPTER 2 Assessing a staged approach to barriers ............................................8


2.1 Four stages in PSI re-use identified.................................................................... 8
2.1.1 Stage 0: Base case.................................................................................. 8
2.1.2 Stage 1: Experiment within existing policy initiatives............................ 9
2.1.3 Stage 2: Experiment with a new cross-government policy initiative ............ 9
2.1.4 Stage 3: Experiment with a large-scale PPP ........................................... 9
2.1.5 Stage 4: Redraw the legal and economic environment to encourage data federation ..... 10
2.2 Concluding remarks ........................................................................................ 10

CHAPTER 3 Scoping the case for a DML ...........................................................11


3.1 Placing the DML within government policy towards PSI re-use ...................... 11
3.2 DML Form and Function ............................................................................... 13
3.2.1 Definition of DML roles, functions and responsibilities...................... 13
3.2.2 Stakeholder outreach (trust and communication) ............................... 14
3.2.3 Staffing and Retention........................................................................ 14
3.2.4 PPP issues........................................................................................... 15
3.2.5 Inputs, activities and outputs. ............................................................. 15
3.2.6 Funding.............................................................................................. 16

Evaluating a specific approach to better re-use of public sector information RAND Europe

3.3 Measuring the Impact of the DML.................................................................. 16


3.3.1 A Logical Framework for Impact Assessment ...................................... 16
3.3.2 Further specific activities and associated performance metrics ............. 17
3.4 DML and developing legal-economic issues..................................................... 18

CHAPTER 4 Next steps in data sharing .............................................................. 20


4.1.1 Legal issues ......................................................................................... 20
4.1.2 Economic issues.................................................................................. 20
4.1.3 Solving further data federation problems............................................. 20

REFERENCES.......................................................................................................... 21
Reference List ............................................................................................................ 23

APPENDICES .......................................................................................................... 27
Appendix 1: Examples of Existing Data Federation Initiatives.................................... 29
1. Department for Transport: Transport Direct............................................... 29
2. Google Laboratory, example of beta ‘data mashing’ community .................. 30
4. Office for National Statistics (ONS) ............................................................. 32
5. DfES: Data linking Children in Care and National Pupil Database ............. 32
6. Her Majesty’s Revenues and Customs proposed data lab ............................. 32
Appendix 2: Interview and Workshop Schedule......................................................... 33
Appendix 3: July 2006 Data Mash Lab Proposal........................................................ 43

ENDNOTES............................................................................................................. 51

Executive summary

Government departments and agencies collect a wide range of data in the course of their
duties. In Chapter 1, we explain that the re-use of such datasets, collectively described as
Public Sector Information (PSI), produces new forms, services or applications. The
possibilities offered by “data mashing”, a particular type of re-use based on certain
published and accepted data standards, have recently received a lot of attention. There is
however a tension between the possibilities offered by data re-use and the barriers to
implementing or even conceiving suitable data transfers and combinations. The Cabinet
Office’s Data Grand Challenge has proposed a Data Mashing Laboratory (DML) to
function as a catalyst to test these new ways of data sharing in a confined ‘sandpit’ setting.
In Chapter 2, we explain that the evolution of PSI re-use faces four specific barriers:
technological, socio-institutional, economic and legal hurdles. The barriers identified may
be addressed via a staged approach: 1) Experiment within existing policy initiatives; 2)
Experiment with a new cross-government policy initiative; 3) Experiment with a large-scale
Public-Private Partnership (PPP); and 4) Redraw the legal and economic environment to
encourage data federation. The DML is an example of an experiment with a new cross-
government policy initiative, and analysis of the barriers towards new ways of PSI sharing
suggests that Stage 2 should be explored before launching into initiatives requiring more
overt and irreversible commitment.
Following interviews with key informants and a review of relevant literature, we offer in
Chapter 3 the following findings and recommendations for the DML:
• Position the DML to take advantage of established PSI re-use policy initiatives.
• Essential DML outreach activities require pre-defined budgeted resources.
• Plan the appropriate mix of retention, return and attrition to staff the DML with skilled
and experienced people.
• Provide an open discussion forum at the outset of the DML to establish
Intellectual Property Rights (IPR) rules and ‘terms of engagement’ with PPP.
• Rapid DML development of PSI re-use prototypes will be highly dependent on access
to good quality and relevant data.
• A range of metrics is suggested to evaluate the impact of the DML.
In Chapter 4, we recommend further research into the legal-economic implications of
greater PSI re-use, to be conducted in parallel with the detailed development of the DML
proposal.

List of abbreviations

AJAX Asynchronous JavaScript and XML
AKT Advanced Knowledge Technologies
APPSI Advisory Panel on Public Sector Information
BDL Business Data Linking
BRE Better Regulation Executive
CIO Chief Information Officer
CRAG Coordination of Research and Analysis Group
CSR Comprehensive Spending Review
DCA Department for Constitutional Affairs
DEFRA Department for the Environment, Food and Rural Affairs
DfT Department for Transport
DM Data Mashing
DML Data Mashing Laboratory
DPA Data Protection Act
EGU Electronic Government Unit
EPSRC Engineering and Physical Sciences Research Council
ESRC Economic and Social Research Council
FOIA Freedom of Information Act
GIS Geographic Information System
HMSO Her Majesty’s Stationery Office
ICO Information Commissioner's Office
IPR Intellectual Property Rights
IM Information Management
IT Information Technology
KC Knowledge Council

KM Knowledge Management
MISC31 Ministerial Committee on Data Sharing
MoD Ministry of Defence
NDPB Non-Departmental Public Body
NGO Non-Governmental Organisation
OECD Organisation for Economic Cooperation and Development
OFCOM Office of Communications
OFT Office of Fair Trading
OFTEL Office of Telecommunications
ONS Office for National Statistics
OPSI Office of Public Sector Information
PPP Public-Private Partnership
PSI Public Sector Information
R&D Research and Development
RDF Resource Description Framework
S&T Science and Technology
SPIRE Spatial Information Repository
VML Virtual MicroData Laboratory
XML Extensible Mark-up Language

CHAPTER 1 Government data sharing analysed

1.1 Data sharing outlined


As the “Information Society” develops, so too do the quantity and range of available
data, including Public Sector Information (PSI), which is the focus of this report. These
data are often collected and codified by organisations in pursuing specific missions, but are
of interest and potential utility for re-use in many other areas, and to many other
organisations. The benefits of re-use are increasing, both as a result of increased coverage
and as an indirect consequence of the increasing complexity and pace of the decisions
facing government bodies, businesses and citizens. On the other hand, there are significant
real and perceived barriers, including the costs of obtaining, assessing, exchanging and
combining PSI.
The re-use and recombination of PSI are related to the extensive fields of Information
Management (IM) and its more recent descendant, Knowledge Management (KM).
However, while IM and KM practices and concerns constitute both a legacy and, where
incompatible, an obstacle to the sort of data sharing considered here, they are primarily
designed for a different context and different issues. In particular, they are often employed
to clarify and improve the use, combination and sharing of data within departments or
government as a whole and to facilitate coordinated activity. By contrast, the data sharing
considered here is more externally focussed. Insights from Information and Knowledge
Management certainly contribute to resolving technical data sharing problems and to
understanding the potential for transforming government, but these issues go beyond our
current focus.
While ‘data mashing’ is the concept that receives most attention, we use two other terms in
this report to explain new ways for exploiting PSI: data sharing and data federation. The
three terms are related but have different meanings. We define data sharing and re-use as:
“The active cooperation of two or more bodies to exchange or compare data.”
We define data federation as:
“The merging of that data to produce new forms, services or applications of data, whether for
private (controlled) or public (open) use.”
Clearly, this is more limited than PSI re-use, referring to the operational ‘merging’ or
mixing of data. Finally, we define ‘data mashing’ (whether for PSI or other data) as:

“A particular type of data sharing based on common use of published and accepted Asynchronous
JavaScript and XML (AJAX) software family data standards.”
Figure 1 shows schematically that ‘data mashing’ is a type of data federation, which is in turn
a type of data re-use and of data sharing, under the definitions used in this report.
Figure 1: Data sharing, re-use, federation and mashing – a schematic representation
[Nested diagram, outermost to innermost: Data sharing; Data re-use; Data federation; Data mashing]

Data mashing has become associated with overblown claims as to its potential and current
value through its use by proponents of ‘Web 2.0’ services and applications. O’Reilly states:
“The potential of the web to deliver full scale applications didn't hit the mainstream till
Google introduced Gmail, quickly followed by Google Maps, web based applications with
rich user interfaces and PC-equivalent interactivity.”2

This report uses the term ‘data mashing’ to describe any Internet-based federation of two
or more data types using existing tools to remove technical standardisation as a barrier to
service delivery. Several examples of data mashing can be found in Appendix 1. The public
are important re-users as well as consumers of PSI in data mashing. The user is able to
‘pull’ content3 and even adapt and mix content into a user’s own ‘mash-up’. A mash-up is
a combination of existing media reworked into a new and innovative type4. We caution
that this phenomenon is already generating hype that may prove illusory.
There is a tension between the possibilities offered by PSI re-use and the barriers to
implementing (or even conceiving) this. The evolution of new PSI re-uses faces specific
technological, socio-institutional, economic and legal hurdles. The barriers are gradually
being tested and – in some cases – overcome through public and private initiatives, as seen
in Appendix 1. However, because the barriers take the form of potential dangers and a
perceived imbalance between costs and benefits, there is – at the present stage – a ‘chicken-
and-egg’ problem. Neither the opportunities nor the risks can be fully evaluated without
concrete experience accessible to the broad range of stakeholders whose interests are
affected. But without the participation of a sufficiently broad sample of stakeholders, such
concrete examples as emerge will tend to be limited in scope and/or too narrowly focused
to clarify the current uncertainty that is the chief barrier. The risks are on the one hand
that beneficial forms of data re-use will be deterred and on the other that inappropriate re-
use may occur.
This chapter summarises the distinctive role of PSI and the key barriers to its re-use.

RAND Europe The Government Data Mashing Lab

1.2 PSI Re-use


Government departments and agencies collect a wide range of data in the course of their
duties. Many of these are potentially useful in other contexts, ranging from re-use of
individual datasets by other entities to the federation of different datasets into distinct
information products for use by public bodies, private firms and citizens. PSI tends to have
a distinctive character because it can, in principle, be seen and managed as a single,
authoritative repository. Compared to data in private hands, PSI is in general subject to
stringent accountability, accuracy, integrity and transparency constraints. Moreover, costs
associated with, for example, data collection and management are borne by the public,
which makes PSI to some degree a public good and creates a presumption that it should be
used in ways that advance legitimate public interests5.
Across government, two types of data sharing can be observed: sharing at the policy level
and sharing at the operational level. At the policy level, the Data Grand Challenge is a
‘flag bearer’ for policy analysis. Other initiatives include: the studies by the Advisory Panel
on Public Sector Information (APPSI);6 the Office of Fair Trading investigation into PSI
supply7 due to report in October 2006;8 and the Organisation for Economic Cooperation
and Development (OECD) investigations into data mashing using PSI, including the
outputs of the workshop of 31 May 2006.9 We also note the UK contribution to the
European Commission’s work in this regard, particularly the implementation of the 2003
Directive on PSI Re-use10, and examples from the United States11. At the operational level,
there are far fewer UK examples of data sharing across departments. The leading case
identified is that of the Office of Public Sector Information (OPSI) in the Cabinet Office,
with AKTive PSI, which we examine in Section 1.3.1 below.
We can briefly summarise the results of the literature review as arguing that:
1. There is a need to examine the re-use of publicly-held data in particular;
2. This examination should not be limited to the provision of such data in raw form
to private entities for development of commercial products; and
3. Even the commercial provision of information products by public entities
addresses only that part of the public benefit that can be monetised through
markets, and other social uses also need capturing.

1.3 Barriers to re-use of PSI


Progress towards more efficient PSI re-use can be described in terms of barriers to be
overcome. We can suggest a series of ‘steps’ or stages towards solving these problems:
1. Technical barriers. While complex, these barriers are relatively straightforward to
address, and many existing initiatives are developing answers, albeit often in terms
forbiddingly technical for non-specialists.
2. Socio-institutional barriers. These hurdles are more difficult, because the
stakeholders argue from different premises when they communicate at all. The
keys to success here are trust and communication, leading to a willingness to share
suitable data, to expend effort in combining them and to clarify risks, ownership,
discretion and standing.
3. Economic barriers. Once trust and communication have been established among
public-sector stakeholders, it is possible to engage with the market in order to
address the economic hurdles and ensure adequate finance, suitable contractual
arrangements and engagement with demand. The latter is not simply a matter of
marketing, because the innovation surrounding federated data lies as much in how
they are used as in how they are put together. The inherent complexity of
economic barriers reflects not just the range of stakeholders but, especially for
Public Private Partnerships (PPPs), the potential incompatibility of their remit
and objectives, which makes 'efficient contracting' difficult.
4. Legal barriers. These are slow to reform, and their reform depends on resolving the
other barriers. Contractual issues and Intellectual Property Rights (IPRs) play an important part.

1.3.1 Technical barriers


The ‘native’ scope and structure of PSI is generally determined by the collecting
organisation’s need for the data, weighed against the costs of various data management
strategies (collection, validation, storage, etc.). The data are thus not always ‘visible’ to
other potential users in forms that facilitate re-use. This affects the meaning, compatibility
and quality of data. With regard to meaning, it is necessary to develop common standards
or at least labels – the latter known as ontologies.12 These permit different data sets to be
compared using the same classification.
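As a rough illustration of how shared labels let differently coded datasets be compared, the following Python sketch maps two hypothetical departments’ local codes onto one common classification. The department names, codes and terms are invented for illustration, and a simple lookup table stands in for a real ontology (which, in RDF-based work, would also express relationships between terms). It is a sketch under those assumptions, not a description of any existing government vocabulary.

```python
# A minimal sketch: a lookup table standing in for a shared ontology.
# All source names, local codes and shared terms are hypothetical.
SHARED_CLASSIFICATION = {
    "dept_a": {"M": "motorway", "A": "primary_road", "B": "secondary_road"},
    "dept_b": {"MWY": "motorway", "PRI": "primary_road", "SEC": "secondary_road"},
}


def to_shared_term(source: str, local_code: str) -> str:
    """Translate a source-specific code into the shared classification."""
    return SHARED_CLASSIFICATION[source][local_code]


def same_class(record_a: dict, record_b: dict) -> bool:
    """True if two records from different sources fall in the same shared class."""
    term_a = to_shared_term(record_a["source"], record_a["road_class"])
    term_b = to_shared_term(record_b["source"], record_b["road_class"])
    return term_a == term_b


# Two records that use different local codes for the same kind of road.
accidents = {"source": "dept_a", "road_class": "M", "accidents": 12}
traffic = {"source": "dept_b", "road_class": "MWY", "vehicles_per_day": 54000}
print(same_class(accidents, traffic))  # True
```

Compared directly, the codes “M” and “MWY” would not match; only through the shared classification do the two records become comparable.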
In particular, data are often preserved or available either in ‘raw’ forms that do not always
clearly indicate their quality, coverage, limitations, etc. or in ‘cooked’ forms that reflect the
uses to which those data have already been put. This may result in inappropriate forms of
federation. For instance, functionality may be lost if component data are limited (e.g. by
stripping off identifiers to protect privacy) instead of managing the risk at the level of the
combined product (by matching the records and then stripping the identifiers) or vice
versa. Other aggregation problems arise when: component data quality limitations are not
adequately understood by those re-using them; inconsistencies in the organisation or
institutional context of data prevent their linkage; or quality standards conflict.
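The ordering point about identifiers (match the records first, then strip the identifiers from the combined product) can be sketched in Python as follows. The two datasets, their field names and the `person_id` key are hypothetical. The sketch shows only why stripping identifiers before linkage removes the key needed to join records; it is not a complete privacy safeguard, since removing direct identifiers alone does not guarantee that combined data cannot be re-identified.

```python
from typing import Dict, List

Record = Dict[str, object]

# Two hypothetical component datasets sharing a personal identifier.
health: List[Record] = [
    {"person_id": "P1", "region": "North", "hospital_visits": 3},
    {"person_id": "P2", "region": "South", "hospital_visits": 0},
]
education: List[Record] = [
    {"person_id": "P1", "school_absences": 14},
    {"person_id": "P2", "school_absences": 2},
]


def strip_identifiers(records: List[Record], key: str = "person_id") -> List[Record]:
    """Return copies of the records with the identifier field removed."""
    return [{k: v for k, v in r.items() if k != key} for r in records]


def match_then_strip(left: List[Record], right: List[Record],
                     key: str = "person_id") -> List[Record]:
    """Link records on the identifier, then drop it from the combined product."""
    right_by_key = {r[key]: r for r in right}
    combined = []
    for record in left:
        partner = right_by_key.get(record[key])
        if partner is None:
            continue  # no matching record in the other dataset
        merged = {**record, **partner}
        del merged[key]  # identifier removed only after linkage
        combined.append(merged)
    return combined


# Stripping first leaves no common field on which to link the two datasets.
print(set(strip_identifiers(health)[0]) & set(strip_identifiers(education)[0]))
# set()

# Matching first yields linked, de-identified records.
print(match_then_strip(health, education))
# [{'region': 'North', 'hospital_visits': 3, 'school_absences': 14},
#  {'region': 'South', 'hospital_visits': 0, 'school_absences': 2}]
```

The reverse risk described in the text also applies: a combined product can be more revealing than either component, so where the identifiers are removed is a design decision about the federated product as a whole.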
There have been various attempts to create electronic records of PSI, with the most
basic XML13 data input taking place in about 1998. More recent initiatives label data using
Resource Description Framework (RDF), a data standard that enables ‘semantic’ mixing of
data based on the raw data source. These include a project which uses OPSI-controlled
data, notably The London Gazette, to place information in semantic web form, in
collaboration with the EPSRC project AKTive PSI14. The semantic web is a ‘next
generation’15 of web services which will provide far greater richness and functionality for
World Wide Web (‘web’) applications on the Internet. Web inventor Tim Berners-Lee has
recently described the timeframe for such a transformation to take place16.
For this to become commonplace, a network effect is required, in which the RDF
standard is first used by pilot or ‘brochureware’ products and projects, leading early adopters to
trigger a ‘cascade’ effect via growing incentives (learning by doing, peer education and
reputational incentives) for others to follow those who have adopted the standard. The resource
requirements and adoption curve involved mean that moving from basic to semantic web
standards will be, at the shortest, a medium-term project.

1.3.2 Socio-Institutional Barriers


Socio-institutional barriers pertain to the release or sharing of data, partnership with other
data owners and concerns over the ownership and exploitation of federated data products.
These are the least-defined concerns in preliminary studies of PSI federation, not least
because they are often obscured behind technical problems, economic costs or legal
barriers. This lack of clarity accounts to some degree for their persistence. However, the
respondents interviewed for this report17 attest to a ‘silo mentality’ in government that
clearly impedes data sharing: you can only re-use data you can access. The socio-
institutional concerns are generally broader than technical ones. If recognised at all, they
are initially more likely to be seen as barriers than as challenges unless and until the
technical uncertainties are resolved.
A direct concern is the ownership of data, or the loss or uncontrolled transfer of IPRs. The
basic issues of ‘thought ownership’ arise because those institutions contributing component
data ‘own’ them in a holistic sense. They control their use, they are the keepers of
information about their strengths and limitations and they have a right to reap the benefits
of their use. Ownership of the federated product is shared, and the different aspects of
ownership may transfer imperfectly. Participation may be inhibited by insufficient
ownership shares in the federated product or by fears of pre-emption of the data owner’s
ability to use the data in other ways. A less tangible concern is the loss of policy discretion
– in the sense that ‘knowledge is power,’ the sharing of information with other bodies may
raise political or governance concerns. Finally, there may be concerns over confidentiality
regarding the use of data outside the agency’s formal and informal controls on access and
integrity.

1.3.3 Economic Barriers


Economic concerns may be narrowly parochial – the design and implementation of
combined data products impose both data-specific and joint costs, which must be borne by
the stakeholders. These costs may be offset against the returns from commercial
exploitation, but these returns may be uncertain, delayed, distributed in ways that do not
reflect the incidence of costs, diffused across the economy as a whole or – in the case of
federated data offering non-monetisable public returns – insufficient to cover costs. The
OFT has stated that:
“In 2003/4 it was estimated that the turnover of the larger PSI holders was in the region of
£1bn. The total value of public sector information in the UK economy is much higher as
the information is often used as inputs for other products which may be supplied by the
PSI holders themselves, or private bodies.”18
In addition to the static economics of commercial viability and adequate return on public
investment, there are dynamic barriers arising on one hand from lack of clarity at the
outset about the eventual magnitude and distribution of costs and returns among public
sector stakeholders and on the other from the evolving relation with private sector finance
and commercial service suppliers and changing patterns of demand. This includes concerns
regarding trading fund status and the economic purpose of PPPs for PSI.


1.3.4 Legal Issues


Finally, there is a set of essential legal barriers. There are many rules designed for a world in
which data were collected and used in ‘closed’ relationships. Some, including Crown
Copyright and privacy rules, embody a precautionary approach that does not necessarily
encourage mutually beneficial alternative arrangements. Copyright issues in PSI have been
somewhat simplified since 2003 by use of the click-wrap licence issued by Her Majesty’s
Stationery Office (HMSO), which provides users with a class licence to access much PSI.
However, other copyright problems fall into the short-term ‘intractable’ category, and are
thus not examined in detail in this report. Two legal developments in information sharing
pull in opposite directions. One, which mitigates some of the ‘lock-down’ effects of copyright,
is the increasing use of primary legislation to require wider re-use of PSI.19 The converse legal
movement is provided by the vexed issues of IPRs, including Crown Copyright, and data
protection rules. This concerns specifically the right of data subjects to consent to new uses
of their data.20 A related intractable area of concern is privacy – whether data owners may
betray the trust of data subjects21.

1.3.5 Addressing Data Federation Barriers in Stages


This suggests an overall evolutionary path from technical solutions to narrowly-defined
problems to ultimate changes in the legal and regulatory infrastructure that facilitates or
enables data federation, as shown in Figure 2. This is not a single path; for instance, some
initiatives have gone straight from the resolution of technical problems to the market. Just
as the hurdles are not necessarily sequential, the stages represent different approaches that
could be complementary, substitute or sequential, depending on institutional setting. But
such direct-to-market initiatives may not always produce the best products in a social sense,
and the same close definition that allows them to sidestep socio-institutional issues may
limit their ability to engage the wider community.
The order of these barriers is conceptual rather than chronological, and the barriers are not
always independent. To take a simple example, technological uncertainty can reduce economic
prospects, but ensuring an economically viable future can encourage investment and
research necessary to overcome or circumvent technical problems. The sequence is used to:
• Stress that each step is an interaction among stakeholders, and thus generates both
specific knowledge (i.e. about a given application) and general knowledge (about
issues arising and good practice in relation to data federation);
• Give order and flow to the practices investigated below; and
• Highlight the socio-institutional step which: is often overlooked by data federation
initiatives with a narrow remit, involves the widest breadth of potential
stakeholders (and thus subjects initiatives to the most rigorous developmental test)
and offers the greatest unexploited potential for cross-cutting lessons.


[Diagram: ‘Towards better re-use of government data’, showing Stages 0 to 4 set against technical, socio-institutional, economic and legal barriers]

Figure 2: Conceptual visualisation of barriers and stages towards better re-use of PSI.

The importance of the cross-cutting lessons arises from the fact that data federation is not a
matter of design, but a collaborative activity. The uses that provide benefits from
federation are often discovered by users of the products; the problems and solutions are
often of general applicability; and the realisation of potential rests on willingness to
participate and testing of perceived barriers and obstacles.
These barriers differ in complexity and in the speed and ease with which they can be
addressed. Note that we do not say ‘overcome’ because it is not obvious that the concerns
underlying the barriers should all be set aside, nor that all forms of data federation are
justified in light of those concerns. Our point is that the key barrier to progress, which
must be overcome, is uncertainty. Socio-institutional issues are essential to scoping
possibilities for data federation – not only may potentially valuable products be missed if
component data are not available, but alternative institutional arrangements cannot be
developed or bench-tested. This leads on to a different way of resolving economic and legal
issues. These are not distinct: IPRs, trading fund status, and charging are all economic
matters enshrined or institutionalised in law. A better understanding of the socio-institutional
possibilities could therefore inform reform of both market arrangements and the legal framework.
We therefore identify a need to address socio-institutional interests in a shared
environment, where serious exploration of realistic possibilities is both feasible and likely.
Such an environment should have suitable ground rules and wide participation by a
combination of direct participants and observers. In this way, it could serve as a test bed
for developing actual products and ‘solutions’ to (or mechanisms for solving) common
and cross-cutting problems. It could also serve as a simple and understandable proof of
concept, clarifying the potential gains and necessary safeguards for both data owners and
policy makers.

CHAPTER 2 Assessing a staged approach to barriers

2.1 Four stages in PSI re-use identified


The barriers to data federation must be tested and understood in order to be addressed:
some of the concerns are valid limitations on the way forward, some are simple
misunderstandings and others can be resolved through suitable initiatives. We suggest an
approach in four stages beyond the base case, corresponding to the four identified classes
of policy barriers. Progression through the barriers to better PSI federation is not
necessarily sequential: some current initiatives may already be close to market
breakthrough, while others may raise as many problems as they solve. This is inevitable
in a relatively complex, dynamic environment with multi-disciplinary problems. The stages in
reformulating PSI federation described here have varying degrees of ambition, and strategies
for achieving success can be envisaged as requiring different time-scales, from short- to
medium- to long-term (e.g. from a year to an electoral cycle to an entire policy cycle)22.

2.1.1 Stage 0: Base case


In policy and impact assessments, it is good practice to consider the ‘zero option’ of doing
nothing new23. This is partly because reform options often fail to consider the effective
working of existing systems; in common parlance: ‘if it isn’t broke, don’t fix it’. It is useful
to include the counterfactual in this overview in order to compare the benefits of the other
stages against the base case. Existing departmental activities (see for instance Box 1) could in
some cases be stretched by the introduction of a cross-departmental approach.
Box 1. Case Study: A devolved approach at DEFRA and the evolution of SPIRE.

An example of a top-down initiative is increased data sharing across the five
Directorates General of the Department for Environment, Food and Rural Affairs
(DEFRA). The example can be extended. In DEFRA, the Science and
Technology (S&T) function is responsible for supplying data and analysis for
policy making. This function has been split among operational S&T units within
Directorates General, whereas formerly there was a single S&T Directorate.
According to several interviewees for this project, the intention was to devolve
data and analysis into each operational unit, to inculcate a culture of operationally
driven S&T analysis and to place decisions on data sharing at a daily, unit-based
level. Simultaneously with the inception of this devolved approach, the
department as a whole introduced a common information management system,
SPIRE24. This operational, centralized policy project places databases on a
common technology platform. The inspiration for SPIRE comes from Non-Departmental
Public Management and has led to the promulgation of
Geographical Information System standards for DEFRA25. Further policy
impetus came from the 2001 foot-and-mouth epidemic. SPIRE is becoming
widespread, with operational support from the DEFRA Chief Executive Officer.
Askew states: “The securing of high level management buy-in to SPIRE has been
addressed as a priority. SPIRE has been established as a DEFRA corporate
priority programme with support from Board level downwards. High level
support was demonstrated by the DEFRA Minister for Rural Affairs opening and
supporting the SPIRE workshop and conference.”26

A pan-government initiative, described in Stages 2 and 3 below, could provide the
external impetus to help DEFRA think further about data mashing. Such a
pan-government initiative would, however, need to be sensitive to the changes already
happening within departments such as DEFRA, where a combination of decentralised
S&T and the new SPIRE system means there is already significant operational change
taking place. Nevertheless, there is an opportunity, for example, to federate air quality data
with environmental health data to provide ‘quality of environment’ maps for the UK.27

2.1.2 Stage 1: Experiment within existing policy initiatives


Consideration of policy options should include incremental reforms as well as radical
options. The benefits include reduced institutional stress, less potential for reactive
resistance and maintenance of political capital. In the field of information sharing in
particular, proposals to change the status quo are well developed, both within departments
and agencies, and across government. Best practice examples of data sharing within
departments include the DEFRA approach outlined above, as well as the Office for
National Statistics Laboratory.28

2.1.3 Stage 2: Experiment with a new cross-government policy initiative


The non-technical problems most frequently mentioned by interviewees have been the
socio-institutional barriers among agencies and government departments. Interviewees identified
such ‘stove piping’ as an important organisational problem in previous attempts to merge
government agencies or departments and in ‘joined-up government’ initiatives.29 It can be
tackled through incentives and training that encourage personnel to reverse their approach to
information ownership, in the organisational rather than legal-economic sense. A
‘sandpit’, or closed environment in which controlled data sharing can be undertaken in a
collegial and innovative way, has been proposed: the Data Mashing Laboratory
(DML). The analysis of the barriers towards new ways of PSI sharing suggests that Stage 2
should be explored before launching into initiatives requiring more overt and irreversible
commitment. Its terms, funding and structure need extensive examination in order to
clarify the issues and dispel potentially troublesome misunderstandings.

2.1.4 Stage 3: Experiment with a large-scale PPP


Aside from the PSI sharing problems within government (as explored in Stages 0, 1 and 2
above), it is useful to distinguish those arising between government and other parties. This
latter category includes two types of data sharing:


1. Sharing of data between government and known private parties, such as
Department for Transport (DfT) PSI re-use in the Transport Direct project. The
issues arising in this example are at least tractable, and it is on this type that we
focus in Chapter 3.
2. Private re-use of PSI taken from known repositories with no prior government
knowledge of the private entity’s project. Such activities include those that DfT
commissioned from MySociety for various ‘data mashing’ projects, using existing
technical tools and known PSI sets.30
The second type of sharing takes place in a less controlled and regulated space than the
first, and begins to raise a series of issues that are as much legal and economic as
socio-institutional. These include combinatorial factors, such as an increase in the degree
to which government entities are risk-averse with data in such an environment.

2.1.5 Stage 4: Redraw the legal and economic environment to encourage data federation
A laboratory may isolate and identify the socio-institutional barriers and help inform
responses to the other barriers31. It is a gateway to a long-term solution and a decision point
for addressing economic and legal barriers32. Radically redrawing institutional arrangements,
such as the Trading Fund agreements for commercially exploitable agency data and the
Treasury calculation of long-term economic benefit from data sharing with the private sector,
at least requires analysis, ideally in conjunction with the 2007 Comprehensive
Spending Review. Legislative change to implement further reforms would require
Parliamentary scheduling, and could not feasibly be undertaken before 2009-10.33

2.2 Concluding remarks


We have identified five stages, from the ‘base case’ Stage 0 to ‘redrawing the legal and
economic environment to encourage data federation’ Stage 4. In Chapter 3, we focus on
the problems that can be addressed in the short and medium term, through a closer
analysis of the characteristics of Stages 1 to 3, with the focus on a closed test bed
environment (Stage 2). In Chapter 4 we suggest paths towards solving longer-term
economic and legal issues.

CHAPTER 3 Scoping the case for a DML

This chapter sets out the case for a government DML as a short- to medium-term step
towards demonstrating the benefits of PSI re-use34. The case for such a protected
environment rests on its institutional context, inputs, activities and outputs. Figure 3
describes these issues, which are then explored in more detail.
Figure 3: Schematic Representation of Data Mash Laboratory

Inputs:
• Data
• Software, IT
• Personnel
• Outreach
• Finance

DML boundaries:
• IPR arrangements
• Confidentiality
• Location within government

DML activities:
• Data mashing
• Developing software tools, techniques
• Exploring institutional, contractual forms
• Identifying, prototyping products
• ‘Making the case’ through examples, and participation

Outputs:
• Data mash products
• Software, procedures, standards
• Incentives for further data collection, exchange
• Partnerships
• …

Section 3.1 shows the forces creating institutional momentum for such a demonstrable
project. Section 3.2 considers its form and function. Section 3.3 considers its objectives
and the metrics against which its impact can be assessed, and Section 3.4 attempts to
extrapolate future development functions for the DML from interviewee comments. As in
Chapter 2, much of our analysis derives from interview data and workshop presentations
detailed in Appendix 2.

3.1 Placing the DML within government policy towards PSI re-use
In considering the case for a DML, an essential first step is to analyse recent external
developments affecting the DML’s potential to act as an impetus and catalyst for data
sharing. On 1 November, the Office of Public Sector Information (OPSI) will merge with The National
Archives. OPSI itself was a May 2005 re-branding and repurposing of Her Majesty’s
Stationery Office (HMSO) in response to the July 2005 regulations implementing the
Reuse of PSI Directive 2003. This Directive, which will be reviewed in 2007/8, and the
work of other international bodies, notably the OECD, continue to set a transformative
agenda for PSI35. The OPSI Director states: “we are driving forward a transformation in
the creation, management, representation, dissemination and re-use”36 of data under her
control.
Better data sharing has been highlighted by the Better Regulation Executive
(BRE) in the Cabinet Office, inspired by the Hampton Review. The BRE identifies two strands:
more efficient data collection, and better use of existing data. Our concern here is the
latter. There is now a Hampton data-sharing group driven by the BRE. Its structure and
role are not yet clear, but it is unlikely to emphasise analysis.
The Cabinet Office has a further interest in data sharing as part of the ‘Transformational
Government’ remit of the Electronic Government Unit (EGU). Together with the
Delivery Unit in the Prime Minister’s Policy Unit, it has established a leading role in
coordinating departmental policy regarding data sharing and outputs for citizens. The
eGov Monitor states: “Without sharing data, delivering seamless services through joined
up government would not only be difficult, but downright impossible to deliver.”37
A further advisory body is the Chief Information Officer (CIO) Council, comprising the CIOs
of each government department. The CIO Council decided in September to deal with
knowledge management issues more fully by establishing a Knowledge Council (KC),
chaired by The National Archives (TNA) chief executive Natalie Ceeney, and a Delivery
Council responsible for implementing projects requested by the KC38. These bodies have yet
to be constituted, but the suggestion is that the Delivery Council would initially report to
the Prime Minister’s Delivery Unit, in order to ensure political priority. The KC is intended to
comprise neither CIOs nor Chief Scientific Advisors39, but senior ministerial policy aides.
Figure 4 below shows key central data sharing and knowledge management initiatives. The
DML needs to be positioned to take advantage of these existing and planned initiatives.


Figure 4: Central PSI Sharing Initiatives

• Cabinet Committee MISC31
• OPSI - The National Archives
• E-Government Unit (‘Transformational Government’)
• Prime Minister’s Delivery Unit
• Better Regulation Executive
• Department for Constitutional Affairs - FOIA DPA Unit
• CIO Council
• Knowledge Council
• Delivery Council
• Coordination of Research and Analysis (CRAG) Group
• Data Grand Challenge to Science and Innovation Committee

3.2 DML Form and Function


Informed expertise (i.e. interviews and literature40) on successful DML construction and
implementation emphasises the need for the following key factors: definition of roles,
functions and responsibilities; funding; central government position and buy-in;
stakeholder outreach (trust and communication); and high-quality data inputs and
outputs. We deal with each of these in turn.

3.2.1 Definition of DML roles, functions and responsibilities


We consider first the DML’s literal description, given the importance of terminology to the
willingness of entrenched interests to accept and engage with institutional innovation.
While the term ‘data’ is broad and encompasses many different formats and inputs, it is
relatively straightforward to interpret within central government. Some interviewees
suggested that a DML should overtly aim at knowledge transfer; while this is by no means
a semantic distinction, no objection has been raised to the use of the term in this context41.
The word ‘mashing’ caused more concern, with the implicit suggestion from
interviewees that a fashionable, technologically determined term may put off central
government policymakers of a more conservative mindset. The Web 2.0 applications upon
which mashing rests are well understood by technology experts, and their immediate
usefulness has been commented upon. Indeed, despite the radical-sounding term, the DML is
far less technologically ambitious than, for instance, the AKTive PSI initiative. Amazon
Corporation experts have described the broad unauthorized use of data in public
mashing as “extreme re-use”42, which raises many economic and legal issues that the
DML may be well advised to avoid. There is therefore recognition that ‘data mashing’
describes the aspiration of the DML well, but that its technological and legal-economic
implications for uncontrolled re-use of PSI are unwelcome to many.

A term used by the Ministry of Defence (MoD) is ‘data fusion’. This
describes the merger of two data sets to produce a new product that has greater
functionality than the sum of the two parts. In the MoD sense, data fusion is about fusing
in real time all sorts of information (from sensors, intelligence) to create an overall picture
of the ‘battle space’ at any one time. Given the security and integrity concerns of the MoD,
the term ‘data fusion’ carries no unfortunate implications of broad uncontrolled re-use,
and therefore may be a less value-laden term to employ. However, as it describes
statistical inference drawn from multiple datasets, it actually denotes a more
technologically sophisticated approach than data mashing. Alternatives such as ‘data
federation’ invite confusing comparisons with European federalism. Another less value-laden
term would be ‘data meshing’. However, the distinction between ‘mashing’ and ‘meshing’ may
be too subtle to do more than carry the semantic confusion of the current term forward.
The word ‘laboratory’ also has technologically driven implications, but carries the
well-understood meaning of a controlled environment somewhat insulated from external
influences, which can distort both participation and the ability to draw general lessons
from experience. In this case, these influences include at a practical level the data sets, but
more pertinently to this report the various barriers identified in Chapter 2. The playful
term ‘sandpit’, used to describe the environment, captures the spontaneous and risk-free
character of the activities to be undertaken, by comparison with departmental daily
requirements. There is a further tension between the sandpit and the test-bench: whether
people will take the venture seriously. This depends on the balance between what stakeholders
are asked to contribute (inputs in Figure 3) and what they hope to gain. Thus there should be
a credible path for exploiting results.
We therefore consider that the term DML conveys both the benefits of the ‘sandpit’
environment and the reassurance, needed to assuage existing legal, economic
and other concerns, that the experiment will not undertake unnecessarily risky public
activities with PSI data.

3.2.2 Stakeholder outreach (trust and communication)


Public and government outreach and engagement activities will require substantial
communications expertise, such as dedicated corporate communications, conference
organisation and press office staff. Such activities are essential to the public purpose of the
DML. We therefore suggest that the costs of these services, ‘ancillary’ to the DML’s main
technological and experimental function, be estimated and accounted for, in order to scope
the DML’s cost structure more fully and ensure it is fully resourced.

3.2.3 Staffing and Retention


One of the critical internal issues is likely to be staffing. Interviewees have indicated that in
other PSI federation experiments, staff shortages resulted in reduced general support,
postponed development of several minor datasets and projects (websites, blogs and wikis)
and delays in upgrading IT equipment.
Beyond these ‘priority risks’ are specific issues of human capital, awareness and good
practice. Human capital is concerned with accession and secondment as well as retention.
The DML should be staffed with suitably skilled and experienced people, and should plan for
the appropriate mix of retention, return and attrition. This is not a given: one interviewee’s
experience in the US ‘Reinventing Government’ data sharing initiatives of 1993-4
indicated that the visibility of government data mashing expertise to the private sector
inevitably led to substantial recruitment by start-up and other private sector companies,
resulting in a very large and irreplaceable transfer of knowledge out of government43.
Managing these risks will require a well-considered reward structure for both departments
and individuals, to encourage the retention of staff and expertise and their later re-embedding
in departmental structures. Such policies should also take into account both
awareness-raising and the transfer of methodological and good practice advances:
participants will learn from their experience, and take this learning with them when they
go. Thus, while the DML might initially reduce the availability of data mashers44, it should
ultimately contribute to both demand and supply. When DML ‘alumni’ go elsewhere they
will help others understand what the DML is and might be. Finally, those who return to
their ‘homes’ will be able to make useful suggestions as to what data to collect, in what
forms, and how to conceive data mashing opportunities.

3.2.4 PPP issues


Given the existence of private-sector data mashing initiatives and the expertise, market
access and efficiencies of the private sector, it is highly likely that at least some data
mashing products will entail public-private partnerships beyond the DML per se. In the
DML, private sector participants can contribute relevant (non-PSI) data, expertise in technical,
marketing and usability matters, finance, etc. However, they have proprietary interests in the
products and their exploitation, so their engagement must be structured to avoid distorting
subsequent outsourcing or partnership competition, or the existing balance of competition.
Large-scale private sector involvement could bring both costs and benefits. First, we consider
costs. Engagement with the private sector may create legal overhead in negotiating
terms of access to the DML, as well as the usual costs associated with Non-Disclosure
Agreements and other instruments associated with R&D collaboration. Were
commercially useful products to ‘spin out’ of the DML at a later stage, this would require
much fuller consideration of the terms of engagement, ideally before private sector
involvement begins rather than through the more expensive ‘re-engineering’ of IPRs after
the fact45. Conducting full simulations of PPP would require a greater infusion of
commercial engagement expertise on the government side, similar to that which MoD uses
for its much larger-scale deployment simulations for new technologies and systems.
The benefits of early and full private sector involvement in the DML are likely to include
greater entrepreneurial engagement in prototype development at an early stage; however, this
depends to a great extent on the ‘terms of engagement’ with the PPP. Attempting to set
IPR rules at the outset of engagement may deter start-ups in particular, but the danger
of attempting to retrofit IPRs once working prototypes have been developed must also be
considered. Given the experience that institutions such as the BBC and Google Labs
have already developed, as well as OPSI, it is suggested that an open discussion forum may
help to set out the rules of the road prior to the formation of the DML.

3.2.5 Inputs, activities and outputs


Major projects that the DML could undertake in its first year of operation include:

1. Creating a linked dataset for research use and exploring the use of innovative
linking techniques;
2. Acquiring a wider range of data and using methods developed for data mashing to
expand the usefulness of PSI; and
3. A pilot study for linking identifiable data.
The actual types and uses of PSI in the DML are technically complex and outside the
scope of this report, but it is evident that rapid development of initial prototypes will be
highly dependent on access to good quality and relevant data. Interviewees suggested
various interesting types of data mashing that could be performed, including “indices of
multiple deprivation indicators” that could show departments the linkages between their
respective metrics and deprivation.

3.2.6 Funding
There are no independent figures for the costs of PSI gathering, the expense of incidents
caused by inadequate data sharing, or the potential citizen gains from data mashing46. The
OFT inquiry is expected to estimate the total commercial trading in PSI at over £1 billion,
but clearly the costs and benefits of various economic models are worthy of further and
more rigorous investigation47.
Interviewees reiterated that, in comparison with the overall cost of PSI collection, the
amount envisioned for the DML, £10 million over two years, was a relatively trivial
sum48. Those with experience of MoD and other larger projects suggested that there
might be some concern about under-resourcing, especially if all start-up costs and
ongoing overheads are to be met from the £5m annual budget. The costs of IT systems alone
may be substantial (though mitigated by the ready commercial, and indeed free, availability
of much AJAX-based software). We note that the software may be free, but the provision
of support for running it is not, and needs to be budgeted. On the other hand, the tools
developed within the DML may be of interest to, and reusable by, the different participants,
which would justify either a ‘public good’ or a ‘voluntary contribution’ support model.

3.3 Measuring the Impact of the DML

3.3.1 A Logical Framework for Impact Assessment


Impact assessment is often facilitated by a logical framework49. This takes the form of a
matrix as in Table 1: the rows lay out the ‘intervention logic’ showing how the DML
should produce its effects, while the columns identify the criteria, associated measurable
indicators and key assumptions or risk factors for each stage in the process. The overall or
generic structure for the intervention logic runs:
Design → Inputs → Activities → Outputs → Outcomes → Broader Impact → Sustainability [→ Monitoring and evaluation]
While a full logical framework is beyond our current scope, this simplified presentation
does help to highlight the criteria and organise potential performance indicators.

Table 1: Logical framework for DML Evaluation

Intervention logic | Criterion | Indicators | Assumptions
Design | Relevance of design, clarity of objectives | Establishing documents | Resolution of incentive, buy-in issues
Inputs | Quantity, quality | Data, software, processes, personnel (number, training, duration), funding (structure, amount) | Credibility of commitments, ability to reallocate as necessary
Activities | Management coherence | Initiative selection process, entry/exit/transition arrangements | Appropriate management for experimental interdepartmental setting
Outputs | Efficiency | DM products: cost of production, quality; Awareness raising: activities, direct measures, new participation | Resolution of IP, etc. issues; Access and political commitment
Outcomes | Effectiveness | Market outcomes, engagement with external parties, subsequent modification | Development of public demand, appropriate business models
Outcomes | Participation | Contributions of funding, staff, data, uptake, observer/participant status, engagement with further development | Changes in priorities and resource availability
Broader Impact | Utility | Uptake by citizens, complementary (private and mixed) products, other transformational government impact measures | ‘Bottom-up’ innovation
Sustainability | Contribution to wider objectives | Knock-on initiatives, changes in data collection, data policies | Matching private RTD and investment, improvements in data collection

3.3.2 Further specific activities and associated performance metrics


The following items should be considered for inclusion in both qualitative and quantitative
metrics:
• The basis for the DML’s funding, targets and operations: this should be clarified by
the DML team in consultation with external bodies.
• In-house research productivity: staff should be encouraged to publish articles for
government, press, scholarly and corporate audiences.
• An active strategy for improving knowledge of operations and the opportunities
available to the private sector, expert data mash enthusiasts and government50.
• International collaboration: the DML should be in discussions with international
bodies about sharing the UK experience in order to:
− develop best-practice guidelines for remote access facilities;
− explore innovative new ways of sharing data across borders.
• a DML Newsletter (frequency of issue and subscriber numbers as metrics);
• a DML Blog or Wiki51 (again with frequency of posting, numbers of active
participants and website ‘hits’ as metrics);
• an annual conference, for instance of the Terra Future type52;
• regular practitioner Workshops and Seminars (with numbers of participants, tabled papers
and website hits on workshop presentations as metrics);
• Joint events (for instance with Trading Funds, private partners, other departmental
initiatives and international partners including e.g. OECD).

A further key metric is the successful establishment and operation of the ‘Policy
Observation board’, both for internal DML and external cross-government
coordination, and for linking DML outcomes to legal, economic and operational issues
across government.

3.4 DML and developing legal-economic issues


The major medium-term issue is likely to be coordination with other government
departments. This may determine whether the DML’s funding is unsecured or short-term, as
its lifespan will require reassessment at regular intervals. A rather obvious issue is ministerial
attention span, which creates the need to demonstrate progress on prototypes that appeal to
policy-oriented and non-technical audiences. There is therefore a serious risk to the DML
if data sets are unavailable or unreliable.
A further risk arises outside government: insufficient external participation. The
‘triangulation’ between government, existing commercial partners and commercial or social
entrepreneurs who may be technical or thought leaders is likely to be complex, and
contractual terms may most easily be devolved initially to Cabinet Office policy experts.
International liaison may be insufficient, as it is often the ‘Friday afternoon’ task for an
overworked new initiative (APPSI is a potential policy partner).
The DML’s impact on Stage 2 barriers should be evaluated in its life-cycle context. While
some trajectories lead directly from technical development to market exploitation, the
DML can help ensure efficient and effective progression and the capture of wider lessons
relating to Stages 3 and 4. We conclude this Chapter with a short explication of the economic
and legal issues most relevant to the DML. Some are a direct consequence of eventual
market deployment, while others arise from inter-governmental economic relations or the
handling of technical, socio-organisational or legal issues by economic means. For instance,
a purely technical choice among standards may fail to take demand-side considerations
appropriately into account. Market competition among standards or products may aid the
choice and provide incentives for improvement. Similarly, coordination may be simplified by
incentive contracts or (internal) market-testing. The design of property rights and trading
mechanisms is an economic problem, especially when the parties may not know enough
about each other to negotiate suitable divisions of effort or reward, or may not be able to
monitor each other’s activities for opportunism or free-riding.
The main ‘internal’ issue is ground rule design. Participants contribute information,
processes, concepts, legacy intellectual property and market access. Laboratory
development and testing will produce new products, standards, procedures, etc. Some can
be jointly exploited, but others may be better used by one of the parties, or through
third-party collaboration. Some insights can be gleaned from the literature on research joint
ventures, especially with regard to intellectual property53 and liability54, timing (when rights
are negotiated, valued or assigned), standard-form contracts and ‘exit conditions’ for
withdrawal or subsequent exploitation. Further perspective comes from analogous (primarily
third-stage) military initiatives. However, these may be simpler: their public sector participants
share an overall ministry (and thus congruent goals and constraints), while many private
sector partners share the relatively closed world of the Defence Industrial Base. The public
interdepartmental character of the DML creates new possibilities for crossover innovation
but also new concerns: differences in objectives, constraints and freedom to implement
appropriate contractual and financial participation. This calls for suitable subsidies,
institutional funding and (in-kind) ‘pay or play’ provisions.
Familiar issues from private research and technological development (RTD) joint ventures,
namely how to organise rights, responsibilities and liabilities so as to induce efficient
information sharing, effort and exploitation, may need to be clarified for the DML. Measures of
information creation, sharing and exploitation can help it serve as a test bed for
organisational and contractual forms, and for mechanisms for matching partners and joint
business models55.
Data mashing involves both complementary and substitute inputs, which the literature
generally treats separately. Clearly, the data used in the final product are complements, but
substitute components (data, organisational schema, ontologies, interfaces, etc.) may
already exist. Competition among such substitutes may distort development56, while
complementarity can induce free-riding, tipping and an inefficient pace of development57.
A key demand-side issue is the difference (if any) between data mashing products and
exploitation of other public assets by “(re)selling on wider markets58.” How should
development costs be covered and should successes subsidise failures? Should competition
among public-sector data-mashing products be encouraged? How should joint costs and
revenues be hypothecated? Should public and private (sector) data be mashed together or
distributed by means of proprietary software or standards? What public-public or public-
private partnerships should govern market exploitation? How should commercial risk be
assessed, underwritten and allocated to public bodies?
Finally, a successful data mashing product may ‘defeat’ rival products and dominate the
market, which is likely for products that offer uniquely authoritative information or benefit
from network externalities. Should publicly-derived data mashing products run the risk of driving
competitors out of business, or ‘win’ by triggering private- and public-sector imitators,
derivative or complementary products, etc., leading ultimately to greater value for money,
at least for the consuming public? At issue is the balance of interests between the public
‘owners’ of source data and product users, taking into account public benefits (e.g. more
efficient use of transport). This is an important consideration in DML design because
participants’ expectations influence outcomes and because, at some point, market-derived
performance metrics will almost certainly be called for.
The case for the DML rests on socio-institutional issues: it can help change the way
government departments interact with each other and build a common perspective on PSI
re-use. A DML structured and monitored along the lines suggested here is capable of
testing these barriers and, in the process, furthering Transformational Government. But
this is not the end of the process. The DML must reflect its technical, economic and legal
contexts, but it does not in itself pre-empt the barriers or opportunities they present. Beyond the
medium-term issues lie Stage 3-4 barriers to data federation. The concluding Chapter 4
sets out research projects to address these remaining barriers.

CHAPTER 4 Next steps in data sharing

We have set out the problem of PSI re-use, the analysis of the problem, and assessed a
proposal for isolating and managing a specific element of the problem: socio-institutional
barriers to greater PSI re-use. We now identify the barriers to greater data sharing that have
so far proved intractable, and which require legal and economic reform of the trading and sharing environment.

4.1.1 Legal issues


We can analyse legal issues according to three categories: law and the integrity of data
(including the data subject); law and IPRs (including arrangements for Trading Funds);
and law and PPP contractual arrangements. Our interest is in whether a later stage of this
research, conducted either by or for government, could gain traction on these
manifest problems by extending the analysis in the three previous chapters. This could
point to extended remits for the DML or the Knowledge Council, or to entirely new
reform-based efforts. An impact assessment is needed to identify the most efficient approach to
any general legal reform of the terms of re-use of PSI.

4.1.2 Economic issues


The central economic issue in government costing of data for sharing is that of up- or
downstream pricing: whether to charge on a cost-recovery basis, as Trading Funds currently do
(their further instruction from Treasury is to maximise commercial value from data re-use),
or on a marginal-cost basis (as, for instance, in The National Archives census project).
Marginal-cost pricing is not necessarily economically disadvantageous for Treasury or the
wider UK economy, as downstream uses of the data may generate far greater public surplus
through wider re-use and new applications. Upstream, however, the costs of creating
electronic files and of subsequent RDF/semantic web mark-up of PSI are very large, and the
budgetary implications of this investment may be discussed in the CSR. This is clearly a
subject in need of cost-benefit research.
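The pricing trade-off can be illustrated with a standard textbook model. The sketch below is a hypothetical illustration, not a model used by Treasury, the Trading Funds or The National Archives. It assumes linear inverse demand P = a - bQ for a dataset, a constant marginal cost c of supplying it, and a fixed cost F of creating it. All parameter values and function names are invented for illustration. Under cost recovery, the price is set at the lowest level at which revenue covers F. Under marginal-cost pricing, P = c and F is assumed to be met from the public budget. The model ignores both the cost of raising public funds and any downstream spillovers. The first omission would weaken the case for marginal-cost pricing and the second would strengthen it.

```python
import math
from dataclasses import dataclass


@dataclass
class Outcome:
    price: float
    quantity: float
    consumer_surplus: float
    supplier_profit: float  # negative when the fixed cost is left uncovered

    @property
    def total_surplus(self) -> float:
        return self.consumer_surplus + self.supplier_profit


def outcome_at_price(price: float, a: float, b: float, c: float, fixed_cost: float) -> Outcome:
    """Hypothetical linear inverse demand P = a - bQ, marginal cost c, fixed cost F."""
    quantity = max(0.0, (a - price) / b)
    consumer_surplus = 0.5 * (a - price) * quantity  # triangle under the demand curve
    supplier_profit = (price - c) * quantity - fixed_cost
    return Outcome(price, quantity, consumer_surplus, supplier_profit)


def marginal_cost_pricing(a: float, b: float, c: float, fixed_cost: float) -> Outcome:
    # Price equals marginal cost; the fixed cost shows up as a loss met from public funds.
    return outcome_at_price(c, a, b, c, fixed_cost)


def cost_recovery_pricing(a: float, b: float, c: float, fixed_cost: float) -> Outcome:
    # Lowest price P solving (P - c) * (a - P) / b == F, i.e. the smaller root of
    # P^2 - (a + c)P + (ac + bF) = 0.
    disc = (a - c) ** 2 - 4 * b * fixed_cost
    if disc < 0:
        raise ValueError("fixed cost cannot be recovered at any price")
    price = ((a + c) - math.sqrt(disc)) / 2
    return outcome_at_price(price, a, b, c, fixed_cost)


# Invented parameters: near-zero distribution cost, large up-front creation cost.
mc = marginal_cost_pricing(a=100, b=1, c=0, fixed_cost=900)
cr = cost_recovery_pricing(a=100, b=1, c=0, fixed_cost=900)
print(mc.quantity, mc.total_surplus)  # 100.0 4100.0
print(cr.price, cr.quantity, cr.total_surplus)  # 10.0 90.0 4050.0
```

With these invented parameters, cost recovery sets a price of 10 and reduces take-up from 100 to 90 units. Total surplus falls from 4,100 to 4,050, and the difference is the familiar deadweight-loss triangle. Whether such a gap outweighs the cost of funding F publicly is the empirical question that cost-benefit research would need to answer.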

4.1.3 Solving further data federation problems


In this short report, we have analysed a specific proposal for a relatively modest central
government initiative to use current and freely available technologies to attempt to
illustrate the socio-institutional difficulties of greater cross-government data federation.
This is a small interim step in a wider reassessment and analysis that is needed to
investigate the further reform of PSI re-use. In particular, we draw attention to the costs
and benefits of developing commercial partnerships and applications from the PSI that
exists, including the challenges of digitising paper records59 and of handling
sensitive data (for privacy, security or other reasons).

REFERENCES

Reference List

Article 29 Data Protection Working Party (2003) Opinion 7/2003 on the re-use of
public sector information and the protection of personal data, 10936/03/EN at
http://ec.europa.eu/justice_home/fsj/privacy/docs/wpdocs/2003/wp83_en.pdf
Askew, D. (2004) SDI Creation at a Thematic and Organisational Level: Experiences
from the UK, presented at the 10th EC GI & GIS Workshop, ESDI State of the Art,
Warsaw, Poland, 23-25 June, at
http://www.ec-gis.org/Workshops/10ec-Gis/Papers/24june_Askew.Pdf#Search=%22defra%20spire%22
Barker, Anna (2006) 11 July, presentation to Work and Pensions Economics Group, D.1
Topic: The ONS Session. Restricted and Government Datasets for Research Use: The
Practitioners Corner at
http://www.york.ac.uk/res/wpeg/refereeing2006/papers20006/Barker.ppt
Barr, J. (2006) Web Services 2.0: Best Practices for Extreme Reuse, paper given to
WWW2006 conference 23-26 May, at
http://www2006.org/programme/item.php?id=d12
Cabinet Office (2005) Transformational Government: Enabled by Technology, at
http://www.cio.gov.uk/documents/pdf/transgov/transgov-strategy.pdf#search=%22transformational%20government%22
Cabinet Office/Prime Minister’s Strategy Unit with Department of Trade and Industry
(2005) Connecting the UK: The Digital Strategy, at
http://www.dti.gov.uk/files/file13434.pdf#search=%22connecting%20britain%20the%20digital%20strategy%22
CIO Council (September 2006) tabled paper, Information and Knowledge Management
Strategy: Overall framework – outline of approach
Cross, Michael (2006) National Archives squares the data circle, Technology Guardian, 14
September, at 3.
Darlington, John, Jeremy Cohen, William Lee (undated, mimeo) An Architecture for a
Next-Generation Internet based on Web Services and Utility Computing, London e-
Science Centre
Dunleavy, P. (1989) Paradoxes of an Ungrounded Statism, Chapter 7 in Castles, F.G. (ed)
The Comparative History of Public Policy, Polity Press, Cambridge, at 265-266.


eGov Monitor (2006) Data Sharing in Public Sector - Resolving the Conundrum, 11
September, at http://www.egovmonitor.com/node/7533
European Commission SEC (2005) 791 Impact Assessment Guidelines, update 15 March
2006.
Gershon, P. (2004) Releasing Resources for the Frontline: Independent Review of Public
Sector Efficiency, HM Treasury, London, at
http://www.hm-treasury.gov.uk/spending_review/spend_sr04/associated_documents/spending_sr04_efficiency.cfm
Hampton, P. (2005) Reducing administrative burdens: effective inspection and
enforcement, HM Treasury, London, at
http://www.hm-treasury.gov.uk/media/A63/EF/bud05hamptonv1.pdf
Harlow, Carol (1997) “Back to Basics: Reinventing Administrative Law”, Public Law 245-261
Hood, C. (2006) Chapter 22: The Tools of Government in the Information Age, in
Goodin, Robert E., Michael Moran, and Martin Rein (eds) Handbook of Public Policy,
Oxford University Press, Oxford.
Kelly, Frank (2006) Data and innovation – the case for experimentation, Journal of the
Foundation for Science and Technology 19:2, at 14-15
Kingdon, J. (1984) Agendas, alternatives and public policies. Boston: Little Brown.
Lachman, Beth et al (2002) Lessons for the Global Spatial Data Infrastructure:
International Case Study Analysis, Documented briefing, RAND Corporation.
Melody, W. H. (1996) The strategic value of policy research in the information economy,
in Dutton, William H. (ed.) Information and Communication Technologies: Visions and
Realities, 303-317. London: Oxford University Press.
OECD (2006, 30 March) Digital Broadband Content: Public Sector Information and
Content, at http://www.oecd.org/dataoecd/10/22/36481524.pdf and workshop of 31
May at
http://www.oecd.org/document/17/0,2340,en_2649_37441_36860241_1_1_1_37441,00.html
Polak, J. (2006) Presentation to Cambridge-MIT Institute workshop.
Pollock, R. (2006) The Value of the Public Domain, July, Institute for Public Policy
Research, London.
Ritchie, Felix (2006) 11 July, presentation to Work and Pensions Economics Group, D.1
Topic: The ONS Session. Restricted and Government Datasets for Research Use: The
Practitioners Corner, at
http://www.york.ac.uk/res/wpeg/refereeing2006/papers20006/RItchie.ppt
Towers Perrin (2001) Report for Regulatory Steering Group: Ofcom Scoping Project, at
http://www.ofcom.org.uk/static/archive/Oftel/publications/about_oftel/2001/towe1001.pdf


Tullo, Carol (2006) Unlocking the potential of public sector information, Public Servant,
October, at p35.
Weiss, P. (2002) Borders in Cyberspace: Conflicting Public Sector Information Policies
and their Economic Impacts, US Department of Commerce, and comments of Mike
Liebhold at OS Terra Future conference, Southampton, 19 September 2006, for
instance.

APPENDICES


Appendix 1: Examples of Existing Data Federation Initiatives

1. Department for Transport: Transport Direct60


Transport Direct is a clear example of both the benefits and the pitfalls of government data mashing. It
incorporates DfT, agency and private franchisee (bus and train company) data in a tool that allows
travellers to identify total journey time.
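The basic idea of combining timetable data from different providers into a single door-to-door journey time can be sketched as follows. This is a minimal illustration under stated assumptions, not Transport Direct's journey-planning method. The leg structure, the function name `earliest_arrival` and all timetable data are hypothetical. The sketch also assumes that services on the same leg never overtake one another, so that catching the first available departure also gives the earliest arrival.

```python
from bisect import bisect_left
from typing import List, Tuple, Union

# Hypothetical leg format: either a fixed duration in minutes (e.g. a walk), or a
# timetable of (departure, arrival) pairs in minutes after midnight, sorted by departure.
Leg = Union[int, List[Tuple[int, int]]]


def earliest_arrival(start: int, legs: List[Leg]) -> int:
    """Chain legs drawn from separate data sources, catching the first service at or
    after the time the traveller reaches each stop. Assumes no overtaking within a leg."""
    t = start
    for leg in legs:
        if isinstance(leg, int):
            t += leg  # fixed-duration leg, e.g. walking
        else:
            departures = [dep for dep, _ in leg]
            i = bisect_left(departures, t)  # index of first departure >= t
            if i == len(leg):
                raise ValueError(f"no onward service after minute {t}")
            t = leg[i][1]  # arrival time of that service
    return t


# Hypothetical data, each notionally supplied by a different source.
walk_to_station = 10                          # e.g. a public-sector walking-time estimate
rail = [(485, 530), (500, 545), (520, 565)]   # e.g. a train operator's timetable
walk_to_stop = 5
bus = [(540, 560), (555, 575)]                # e.g. a bus operator's timetable

start = 8 * 60  # 08:00
arrival = earliest_arrival(start, [walk_to_station, rail, walk_to_stop, bus])
print(f"Total journey time: {arrival - start} minutes")  # Total journey time: 95 minutes
```

In this example, a traveller who leaves at 08:00 misses the 08:05 train and catches the 08:20. They then catch the 09:15 bus and arrive at 09:35. A real planner would also need to compare alternative routes and reconcile data supplied in different formats.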


2. Google Laboratory: example of a beta ‘data mashing’ community


The use of GIS systems has been transformed since 2005 by Google Maps opening its map data to developers
through a public application programming interface (API), together with the ability to build applications
using AJAX-based software tools.


3. BBC Backstage, taken from backstage.bbc.co.uk


BBC Backstage is
“for individual developers and designers to build things using BBC content and anyone who has an idea for
how to use BBC content in new ways. It is not for big corporates to play around with. backstage.bbc.co.uk
is for non-commercial use by the little people. backstage.bbc.co.uk is part of the BBC’s wider remit to
"build public value" by sharing our content for others to use creatively. backstage.bbc.co.uk aims to
promote innovation amongst the design and developer community: if people are able to do interesting,
productive things with the content then we’d like to support them. Finally and as a useful by-product of
the above, backstage.bbc.co.uk is an opportunity to identify talent in the online community.”


4. Office for National Statistics (ONS)


The Business Data Linking (BDL) branch created the Virtual Microdata Laboratory (VML)61 in 2003.
The VML has since acquired a number of social datasets, and it is also used by other areas in ONS
as a secure area for internal research, e.g. the Census and the Longitudinal Study. Its remit has
therefore extended beyond business data.
“ONS collects large amounts of business microdata in the course of its business. BDL provides access
to the data via its secure "Microdata lab", where academic researchers can carry out statistical analyses.
This data is confidential, therefore access is tightly restricted. The restrictions can be summarised as:
• only researchers fully employed at bona fide academic or charitable research institutes, or civil
servants, may have access. There is no facility at the moment for PhD students.
• the employer is required to sign an agreement taking collective responsibility for the actions of all
its researchers. Researchers are required to agree to standard secondment contract terms. There is
no access without signed agreements.
• projects must be of academic value and demonstrate (a) a clear interest for ONS in the results (b)
the specific need for the datasets requested.
• access is only granted through BDL's secure microdata lab on site at ONS premises.
There are no exceptions to these rules. We do not provide access to data in any other manner. In
particular, please do not request subsets, linked datasets, or aggregated figures.”62
Almost all the recent UK empirical microeconomic studies on productivity referenced in the
2006 Budget Productivity Report are based on VML analyses63. This contrasts sharply with the situation
five years ago, when productivity analysis relied heavily on aggregate data and research from other
countries. The range of productivity analyses at a national level has stimulated the use of these data
for regional analyses, with the devolved assemblies particularly keen to carry out more local
analysis. On a European scale, the VML has been used in a major project, backed by the
European Commission Research Directorate, to create a database of measures of economic
growth, productivity, employment creation, capital formation and technological change at
industry level for EU member states.

5. DfES: Data linking Children in Care and National Pupil Database64


6. Her Majesty’s Revenue and Customs: proposed data lab
This is a “Whole Customer View” project that would allow Heads of Duty in businesses to see a
complete tax picture for the business. The project does not involve any new linking at source,
although existing database links to FAME and the Inter-Departmental Business Register (IDBR) are
being used to establish company structure. There is no comprehensive list even of the major data
sources and the interactions between them.


Appendix 2: Interview and Workshop Schedule

A series of interviews was conducted to inform this report. The organisations interviewed and the
dates of the interviews are listed in the table below.
Table 2: Organisations interviewed and dates of interview.
Date        Organisation
7 Sept      Access to Knowledge
11 Sept     Office for National Statistics
11 Sept     Department of Trade and Industry
12 Sept     Cambridge-MIT Institute workshop
13 Sept     Office of Public Sector Information
13 Sept     Meteorological Office
19 Sept     Terra Future conference
19 Sept     Ordnance Survey
20 Sept     DEFRA
22 Sept     E-Government Unit
20 Sept     Information Commissioner’s Office
20 Sept     ESRC
27 Sept     IBM
22 Sept     OpenStreetMap.org
6 Oct       Cambridge University Centre for Mathematical Sciences
3 Oct       Her Majesty’s Stationery Office

In addition, four workshops were attended to inform this report:


1. Data Federation Strategic Transport workshop held at the Department of Trade and Industry on 12 September 2006
2. TERRA FUTURE conference held at Ordnance Survey, Southampton, on 19 September 2006
3. Data-Mashing Workshop held at The Royal Society on Tuesday 25 July 2006
4. “Merging diverse datasets can produce new insights: What new applications are possible and are there new privacy and regulatory issues?” held at The Royal Society on 9 May 2006
The agendas of these workshops are included in the following pages.


Data Federation Strategic Transport


Tuesday 12th September 2006
Agenda

08:00 - 08:30 Welcome & Registration, coffee, croissants, etc


08:30 - 08:45 Introduction, Peter Landshoff
08:45 - 08:55 John Polak, Professor of Transport Demand and Head of the Centre
for Transport Studies at Imperial College London
08:55 - 09:05 Brian Collins, Chief Scientific Advisor DfT and Cranfield
09:05 - 09:20 Graham Cattell, Group Director of Projects BP International
09:20 - 09:25 Ray Browne, DTI
09:25 - 09:35 Briefing on Break-out Groups and relocation to rooms
09:35 - 10:00 Breakout Session 1: Identify the top three issues in the groups
10:00 - 10:25 Plenary Session 1: Present the top three issues from the groups
10:25 - 10:40 Coffee Break
10:40 - 11:30 Breakout Session 2: Answer questions for the top two issues in the groups
11:30 - 12:28 Plenary Session 2: Dr Jonathan Mosedale presents the encapsulation of what has been presented (10 minutes)
12:28 - 12:30 Close – Peter Landshoff
12:30 - 13:00 Lunch and Networking


TERRA FUTURE
This conference was held on 19 September 2006 at Ordnance Survey, Southampton, SO16 4GU.
The event looked at the impact of future trends on information businesses and invited more than 130
thought leaders from business, government and academia to express their views on new and evolving
technologies, societal change and consumer demands.
Keynote speaker Sir Tim Berners-Lee, inventor of the World Wide Web, opened the event by exploring how
the semantic web – an automated extension of the web using machine-readable information to share and
reuse data – has the potential to boost its reach and functionality: “Everything can be given a uniform
resource identifier (URI), which describes concepts as well as objects. Translating your data into Resource
Description Framework (RDF) language means you can explain what it does, make it available and connect
to other people.”
Keynote speaker: Sir Tim Berners-Lee, Inventor of the World Wide Web
Other speakers were:
John Darlington
Daniel Erasmus
Leticia Gutierrez Villarías
Mike Liebhold
Glenn Lyons
Robin Mannings
Sheila Moorcroft
Dr. Tracy Ross
Jens Jacobsen
Dr. Cathy Dolbear
Sir Tim Berners-Lee, inventor of the World Wide Web, will introduce and inspire debate on the future of
location information. Key themes will include the future of the World Wide Web and the growing
importance of geographic information (GI).
“GI is stimulating new uses of the World Wide Web, evolving existing applications and underpinning the
creation of new ones to adapt to global trends. I am delighted to be addressing the attendees at Terra Future
and anticipate a productive and inspiring debate between those driving the development of location data
and the information businesses looking to embrace it.” (Tim Berners-Lee)
--------------------------------------------------------------------------------
Mike Liebhold is a Senior Researcher for the Institute for the Future (IFTF), California, USA, focusing on
proactive, context-aware and ubiquitous computing including the social implications and technical
evolution of a geospatial web. Most recently, Mike was a producer and program leader for the Technology
Horizons New Geography Conference at the Presidio in San Francisco. Previously, Mike was a visiting
Researcher at Intel® Labs working on a pattern language based on semantic web frameworks for ubiquitous
computing. Mike is also co-author of Proactive Computing through Patterns of Activity and Place,
publication pending. In the 1980s and early 1990s at Apple® Advanced Technology Labs, Mike led the
Terraform project - an investigation of cartographic and location-based hypermedia. Mike also led the
launch of strategic partnerships with National Geographic®, Lucasfilm, Disney®, MIT, AT&T Bell Labs
and others. As Chief Technology Officer for Times Mirror Publishing, Mike helped launch over 20
professional and consumer web content services, led very early large-scale Intranet designs and then worked
as a senior consulting architect at Netscape. During the late 1990s Mike worked on start-ups, building
large-scale international public IT services and IP networks for rural and remote regions in China, India,
Europe and Latin America. Mike occasionally publishes his thoughts about micro-local and geospatial
computing on his web log at http://www.starhill.us.
--------------------------------------------------------------------------------
Leticia Gutierrez Villarías is the DIP Ontology Engineer at Essex County Council (ECC). DIP
(http://dip.semanticweb.org/), a European Integrated Project running for 3 years, aims to produce a new
technology infrastructure for Semantic Web Services (SWS). ECC leads the eGovernment use case,
identifying real eGovernment scenarios which may benefit from these new technologies and implementing
them in order to prove their usefulness. We are currently working on a GIS-based emergency planning
system which combines SWS and GIS technologies together in order to facilitate the automation of
information sharing among different governmental organizations and other partners based on a spatial
point of view during an emergency situation.
--------------------------------------------------------------------------------
John Darlington has over 20 years of experience in the software industry, working for companies including IBM,
Microsoft and Sony. More recently he has helped establish and grow a number of technology startup
companies. He is currently working with the University of Southampton to help engage business and
government in adopting semantically rich web services. One of the projects he manages is the AKTive PSI
project which aims to explore what is possible with a broad range of public sector information, using recent
advances in web-based information technologies.
--------------------------------------------------------------------------------
Ed Parsons is Ordnance Survey's Chief Technology Officer and is responsible for all IT operations at the
national mapping organisation, including the development and implementation of the IT strategy to
underpin all business activities. Ed also manages Ordnance Survey’s web presence and is in charge of its
geospatial management. He also leads the organisation’s Research Group, charged with exploring and
developing Ordnance Survey’s long-term future. Ed has worked in the GI and LBS industry throughout his
career.
--------------------------------------------------------------------------------
Daniel Erasmus has, for the last 10 years, been facilitating scenario processes for a diverse body of clients
across 3 continents. He has worked with a range of private and public sector clients including Nokia,
Rabobank, the City of Rotterdam, the Rijksgebouwendienst, Schlumberger, Telenor and Vodafone. Visit the
DTN’s web site for more detailed information (www.dtn.net). Daniel’s first web site, the Van Gogh
Gauguin experience received a Cannes Nomination, and ID magazine bronze prize. He is a board member
of the European Internet Archive, the foundation Reflecting, and co-developer of Ci’Num.


AGENDA
DATA-MASHING WORKSHOP

Tuesday 25 July 2006, 09:30 – 14:00


The Royal Society, 6-9 Carlton House Terrace, London SW1Y 5AG
09:30 Arrivals and coffee
10:00 Welcome and Introduction (Frank Kelly, Department for Transport)
Professor Frank Kelly, Chief Scientific Adviser at DfT, will introduce the proposal for a cross-government
data mashing lab and the Committee on Science and Innovation’s ‘Data Grand Challenge’, for which DfT is
lead department.
10:15 Presentation: Imperial College Internet Centre (John Darlington, Imperial College)
The Imperial College Internet Centre will continue the work of the London e-Science Centre and develop
applications and technologies for the Internet industries and services: applications, enabled by the
availability of innovative content, that can be accessed and processed electronically and delivered
instantly on demand to a global audience via the Internet. This talk will present the rationale for the
Internet Centre and discuss some of the service-based technologies it is developing.
10:30 Presentation (Nigel Shadbolt, University of Southampton)
The Advanced Knowledge Technologies IRC has been researching and developing infrastructure, tools and
techniques to integrate information rapidly. This integration has drawn on a whole range of structured,
semi-structured and unstructured content. Application contexts have included scientific and engineering
domains, health, defence and government. This presentation will summarise achievements, lessons learned
and planned future work. It will also argue for the importance of a means of exploring governmental data
and information integration.
10:45 Presentation: Isochrones and data mashing (Chris Lightfoot, mySociety)
Chris will present travel time isochrones produced as a proof of concept for DfT's work on the ‘Data Grand
Challenge’ and describe analogous data mashing applications developed by mySociety.
11:00 Questions / Discussion
11:20 Short break


11:40 BBC Backstage: Open Innovation at the BBC (Matthew Locke, BBC)
As internet technologies mature and adoption now extends to the majority, the landscape of innovation is
changing. We are moving from a world of long-term R&D located primarily in research labs and academia
to a vast distributed network of 'lead-users', who innovate via collaborative social networks. Matt Locke
will talk about how the BBC is responding to this new innovation landscape and describe various pilot
projects, including Backstage and Innovation Labs.
12:00 OPSI: Unlocking the potential of public sector information (John Sheridan, Office of Public Sector Information)
John Sheridan presents the Office of Public Sector Information's role, responsibilities and plans for
future web services.
12:20 Transport Direct (Paul Drummond, Department for Transport)
Implementing the transport information portal www.transportdirect.info has involved combining material
originally intended for different purposes. The success of this initiative has depended not just on
technical capability and innovation but also on significant stakeholder support and a solid basis of
consumer research.
12:30 Ito! (Peter Miller, Ito!)
Video demonstration of tools for the visualisation of public transport and related data.
12:40 Questions / Discussion (Chair: Frank Kelly)
13:10 Buffet lunch
14:00 Close
Speaker profiles

Frank Kelly - Department for Transport

Frank Kelly has been DfT's Chief Scientific Adviser since August 2003, with responsibility for the quality
of science and scientific advice. He is also Professor of the Mathematics of Systems at Cambridge
University where his main research interests are in random processes, networks and optimisation.

John Darlington - Director of the London e-Science Centre


John Darlington is Professor in the Department of Computing at Imperial College and Director of the
London e-Science Centre and the recently formed Imperial College Internet Centre.
http://www.lesc.ic.ac.uk/admin/role.html

Nigel Shadbolt - Advanced Knowledge Technologies


Nigel Shadbolt is Professor of Artificial Intelligence (AI) in the School of Electronics and Computer
Science at Southampton University. He is Director of the EPSRC Advanced Knowledge Technologies
IRC, Fellow and Deputy President of the British Computer Society and chairs the Society’s Knowledge
Services Board. http://www.aktors.org/akt/objectives/


Chris Lightfoot - mySociety


Chris Lightfoot is a developer at mySociety, which has two missions. The first is to be a charitable project
which builds websites that give people simple, tangible benefits in the civic and community aspects of their
lives. The second is to teach the public and voluntary sectors, through demonstration, how to most
efficiently use the internet to improve lives. http://www.mysociety.org/

Matthew Locke - BBC Backstage


Matt Locke is Head of Innovation for BBC New Media & Technology. He is responsible for developing
and running research programmes within the BBC and with external partners, including developing
academic and industry partnerships, and developing open innovation initiatives like
http://backstage.bbc.co.uk and http://open.bbc.co.uk/labs/. BBC Backstage was recently nominated for a
New Statesman New Media 2006 Award, in the Innovation category.

John Sheridan - Office of Public Sector Information (OPSI)

The Office of Public Sector Information seeks to provide a framework of best practice for opening up and
encouraging the re-use of public sector information. Formerly known as Her Majesty’s Stationery Office
(HMSO), OPSI recently announced its merger with The National Archives. The new organisation will lead on
information policy and management across government and the wider public sector.
http://www.opsi.gov.uk/

Paul Drummond - Transport Direct

Paul Drummond is Transport Direct's Technical Manager. The Transport Direct web portal provides
comprehensive, easy-to-use, multi-modal travel information and ticketing. Transport Direct works with
private and public sector travel providers and local and national government. The non-profit service is
funded by DfT, the Welsh Assembly and the Scottish Executive. http://www.transportdirect.info/

Peter Miller - Ito!

Ito! is a UK registered company providing web based mapping, movies and data management services for
the transport professional and for the transport user. Our services are based on an advanced multi-modal
transport model of the UK’s transport system including both roads and public transport visualised using
state-of-the-art special effects techniques. Ito! was founded by Peter Miller and Hal Bertram, who have
previous experience in the transport sector and the film industry.


Appendix 3: July 2006 Data Mash Lab Proposal

Summary: proposal to establish a lab/forum having the funding and mandate to experiment in building data mashing services of government data
Background

Government collects and uses a wide range of data to both inform and deliver its policies.
This data is generally used for specific purposes and is rarely made easily accessible for
other uses. Yet data held by government for one purpose can offer immense benefits in the
delivery of other services, particularly when combined or 'mashed' with data from other
sources.

Advances in information and communication technologies and the development of more
sophisticated and easy-to-use software tools continue to remove the technical barriers to
realising data mashing applications65. The ability to produce innovative data applications is no
longer the preserve of computer scientists, whose role is increasingly to provide the tools
and services that permit others to develop highly personalised and specific applications. As a
result, a highly diverse community from the public, private and voluntary sectors is engaged
in the development of novel data mashing applications.

No single data collector or user, government departments included, can reliably predict how
data may be used when combined with data from other sources. Realising these benefits
therefore requires greater access to data, so as to permit experimentation in
developing innovative data applications. Among the obstacles to improving access are
regulatory and administrative barriers, poor incentives, and limited awareness and expertise
across government.

The challenge of realising new data applications is not unique to the public sector. Within the
private sector there has been a trend away from tightly controlled development from
concept to finished product, towards a more iterative approach in which the rapid development
of beta version products[66] is followed by testing and further modification of concept and
design.[67] The engagement of a diverse stakeholder community during conception,
development and testing is essential to success. Such an approach allows the gradual
evolution of a product shaped by the stakeholder community. As well as assisting the
development of identified data applications, the approach has the additional advantage of
helping to identify unforeseen applications for data.

Government could benefit from finding new ways of engaging the existing capability of the
non-government sector in delivering the potential benefits of data mashing. Doing so will require
adopting more flexible ways of working, particularly in the commissioning and
management of projects:

• not seeking to define final data applications, but allowing experimentation and the gradual
evolution of applications;
• recognising the potential added value of making suitably anonymised official data
available for mashing with other data sources.


A government data mashing lab: the vision


What purpose?
i. To enhance the development and delivery of government policies by accelerating the
delivery of innovative real-time and archived data mashing applications.
ii. To improve government accountability and contribute to the empowerment of citizens
by allowing more innovative and customisable information services to develop through
improved access to public sector data.
Key features
• A new (“laboratory-style”) model of working with the non-government sector on data
mashing applications.
• Management team that includes specialists from outside of government.
• An approach that responds to the capabilities of new technologies and to stakeholder
engagement.
Activities:
Identifying and realising data mashing applications by:
• facilitating the development of proofs of principle, pilots and beta applications
demonstrating what is possible;
• stimulating cross-organisational applications where the lack of a clear home is a likely
cause of "resistance" and of failure to commit adequate resources;
• supporting projects that would not otherwise receive funding due to their small scale
or high risk;
• commissioning projects within overall objectives endorsed by ministers and senior
management, but without their direct engagement thereafter;[68]
• engaging the capability of small, creative organisations, businesses and other
working partners as well as specialist units within larger organisations where
appropriate.
To realise these projects, the "labs" will also:
• support the development of mashing toolkits and related middleware;
• resolve data access issues: while the lab will not create original datasets, it will seek
to facilitate developers' access through negotiation with individual data holders;
• establish procurement processes and project management that ensure control of the
project until completion and a focus on continuous product improvement.
In realising these activities, the labs will seek to become a focus for stakeholder engagement,
knowledge exchange and transfer, supporting government and fostering better understanding
of policy needs within the non-government sector.
Organisation
The aim is for a management-light unit that focuses on providing strategic direction, securing
resources and ensuring communication with the broader government community. The
Unit's location within Whitehall is largely immaterial, but it could usefully sit wherever
it can gain access to a Ministerial champion.
i. Management and 'in-house analysts' team - to facilitate and manage project
commissions, with a strong focus on contract negotiation and deliverables. It is
essential that the team have access to cross-disciplinary expertise in information
technology, software development, the regulatory framework and project
management. Options exist for these personnel to be seconded to the Group from
outside of government and from data-owning Departments as a means of ensuring
relevance to policy needs. The team will be supported by a small secretariat
responsible for creating examples of “the art of the possible” and producing an annual
report to participating Departments.
ii. Stakeholder group – In order to build a development community, the forum will need
to attract people from a wide range of backgrounds and interests, and keep them engaged
in their work. The realisation of innovative applications requires a diverse
stakeholder community, not just programmers, analysts and policy makers.
Participation must bring technical specialists together with policy owners to stimulate
innovation and delivery.


iii. Overview panel - to ensure accountability, the activities of the forum will be subject
to regular scrutiny by a cross-government overview panel drawn primarily from key
government stakeholders. The panel will have no involvement in the day-to-day
running of the forum and will convene every six months.
It is expected that governance arrangements will evolve during the life of the forum as lessons
are learnt and better ways of working developed.
Way of working
As well as engaging cross-government interest and support, the forum must be capable of
exciting the interest of a broad range of academic researchers and developers in the private
and not-for-profit sectors.
i. Public-facing - in time the activities of the forum will be highly visible and public
access to project deliverables encouraged. Access to products, and the development
of appropriate e-tools, allowing dialogue between customers, developers and the
public will be essential drivers of innovation. It will have the added benefit of
demonstrating government commitment to exploring innovative ways to deliver
services of social benefit.
ii. Stakeholder engagement at all stages - strategic thinking is done up-front to ensure
that all parties have a common vision. It will seek to engender a dialogue that allows
developers to demonstrate applications to potential policy customers, and policy
customers to communicate their needs.
iii. Rapid decision-making with minimal administration - ideas must not be killed off by
bureaucratic procedures and premature analytic criticism. Guidelines for selecting
ideas are, however, needed in the form of a business case that outlines the
evidence that the innovation is likely to succeed, suggests how the idea can be
developed, and identifies potential benefits commensurate with development costs.
Financial implications
It is proposed that the lab be granted a two-year ‘bedding-in period’, with
guaranteed funding of £10m, to allow experimentation and the building of expertise to be
completed. During this period consideration should be given to allowing private sector
organisations access to the resources. Private sector players could explore the potential
for translating innovation through data mashing into novel products and services. If this
proves possible, the relationship would be governed by a contractual and licensing regime.

[Figure: Possible structure for a Government Data Mashing Lab. Central box: Mashing Lab Management Group, reporting to an Oversight Panel that provides oversight. Surrounding boxes: Gov Policy Customers (government departments and NDPBs); Key Facilitators (Information Commissioner, legal advice (DCA?), OPSI (copyright), e-gov unit); Government and Other Data Holders (Government: ONS, OS, Met Office, Land Registry, OGDs, etc.; Others: private sector, Data Archive, international); 'In-house' Analysts (analyst staff from across government; multi-disciplinary advisers, to include external secondees); Private Sector (established ICT firms, SMEs and start-ups). Flows shown: policy customers communicate needs; facilitators provide advice; the lab communicates obstacles and requests or negotiates data access, which data holders provide; analysts communicate proposals and are consulted or commissioned; the private sector is contracted to supply middleware and development tools, application prototypes, and ideas and proposals; commercial products are licensed.]

Way forward
Key elements required for success are:
• A mandate to build this experimental approach, accepting levels of uncertainty
on outcomes in the belief that this will lead to value-added services.
• A licence to make these experiments public.
• Cross-government support in identifying a management team, access to
financial resources, hosting the Unit, providing Ministerial support, and
developing a stakeholder group.
Establishing a management group and governance
An initial management group will be established that will include representatives from
across government and from the not-for-profit, research and private sectors. The group
will be tasked with defining the proof of concept and terms of reference, and providing a
sounding board during the bedding in period over the first 12-24 months.
Assuming the Unit succeeds in achieving a commercially viable model (including
Departmental support for engagement with not-for-profit sectors), its governance
should be reviewed after five years with a view to changing its location and
nature.
Establish funding
Initial core funding: it is anticipated that the lab will receive core funding of £10m from
the science budget during a 'bedding-in' period of two years.
Product licensing: the management group will explore how, like some public sector
research establishments, the lab could become largely self-funding by being
allowed to alter and license its experiments for private sector use.
Co-funding: It is anticipated that the lab will attract significant co-funding from other
sources to support specific application development and advance its core functions as
appropriate.
Building a stakeholder group
In building a stakeholder group, the management group will focus on engaging:
• Government organisations - as direct policy customers of applications or as
bodies responsible for data use issues (e.g. DCA, OPSI, e-government unit,
BBC[69]), and as a source of technical specialists and policy owners for
short-term secondment into the Unit.
• Private sector - the relative autonomy of application-level software from
underlying infrastructure layers and the relatively low entry costs make the
sector particularly attractive to SMEs, start-ups and venture capital investment.
• Voluntary sector – the forum will seek to engage the innovative ability of a
sector, already demonstrated through the development of
applications by MySociety[70] following a commission by DfT.
• Research community – the forum will seek to remove the non-technical barriers
that inhibit the engagement of the industrial and academic research
community[71] in application development and in the development of middleware data
mashing tools.
All stakeholders will be encouraged to identify and promote potential mashing
applications and public service needs to encourage the development of innovative
solutions. The management team, enabled through appropriate secondments, will
ensure that the work of the lab remains highly relevant to the policy and commercial
objectives that exist or evolve over the lifetime of the project. To heighten awareness
and encourage stakeholder engagement, the management team will consider
establishing a “mash-up competition” granting awards for the most innovative
mash-ups of public sector data.


Liaising with key data holders


To facilitate access to key data sources and resolve legal issues concerning data use,
the management group will consult key data holders such as the Office for National
Statistics, Ordnance Survey, the Environment Agency, the Data Archive and key
government departments.

ENDNOTES

1
For more information on RAND Europe, please see: www.randeurope.org
2
See: http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
3
This development has similarities with the overall Creative Commons movement for attributable non-
commercial copyright. See Lessig, L. (2005) Free Culture: The Nature and Future of Creativity. New York:
Penguin Books.
4
The term has been used in relation to the Internet only since 2004; its best description remains that on Wikipedia, itself
an exemplar of user-generated content: http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29
5
For wider public domain arguments, see Pollock, R. (2006) The Value of the Public Domain, July, Institute
for Public Policy Research, London.
6
For instance, the comparative country case studies found at: http://www.appsi.gov.uk/reports/research.htm
7
See the market study page at: http://www.oft.gov.uk/Business/Market+studies/commercial.htm
8
See Guardian story of April 2006 at: http://technology.guardian.co.uk/weekly/story/0,,1752262,00.html
9
See OECD (2006, 30 March) Digital Broadband Content: Public Sector Information And Content, at
http://www.oecd.org/dataoecd/10/22/36481524.pdf and workshop of 31 May at
http://www.oecd.org/document/17/0,2340,en_2649_37441_36860241_1_1_1_37441,00.html
10
Updates on implementation available at
http://europa.eu.int/information_society/policy/psi/implementation/index_en.htm#psigroup
11
Lachman, Beth et al. (2002) Lessons for the Global Spatial Data Infrastructure: International Case Study
Analysis, Documented Briefing, RAND Corporation.
12
This is a very narrow sense of ontology: in this context, it is a formal specification of how to represent objects.
13
Insert ontology explanation
14
See http://www.aktors.org/people/
15
See Darlington, John, Jeremy Cohen, William Lee (undated, mimeo) An Architecture for a Next-Generation
Internet based on Web Services and Utility Computing, London e-Science Centre
16
Berners-Lee, Tim (2006) Presentation to Terra Future conference, 19 September, at
http://www.w3.org/2006/Talks/0919-os-tbl/
17
For a list of organisations interviewed, see Appendix 2.
18
http://www.oft.gov.uk/Business/Market+studies/commercial.htm
19
Available at: http://www.opsi.gov.uk/ACTS/acts2000/20000036.htm
20
Rights embodied in Directive 95/46/EC and the Data Protection Act 1998.
21
Privacy concerns include risks arising from re-use by other parties with whom data subjects may not have the
above-described ‘informed consent’ relation. See specifically Article 29 Data Protection Working Party
(2003) Opinion 7/2003 on the re-use of public sector information and the protection of personal data,
10936/03/EN at http://ec.europa.eu/justice_home/fsj/privacy/docs/wpdocs/2003/wp83_en.pdf
22
See in this regard the over-arching policies described in:
• Hampton, P. (2005) Reducing administrative burdens: effective inspection and enforcement, HM
Treasury, London at http://www.hm-treasury.gov.uk/media/A63/EF/bud05hamptonv1.pdf


• Gershon, P. (2004) Releasing Resources for the Frontline: Independent Review of Public Sector
Efficiency, HM Treasury, London at
http://www.hm-treasury.gov.uk/spending_review/spend_sr04/associated_documents/spending_sr04_efficiency.cfm
• Cabinet Office (2005) Transformational Government: Enabled by Technology at
http://www.cio.gov.uk/documents/pdf/transgov/transgov-strategy.pdf
• Cabinet Office/Prime Minister’s Strategy Unit with Department of Trade and Industry (2005)
Connecting the UK: The Digital Strategy, at
http://www.dti.gov.uk/files/file13434.pdf
23
See European Commission SEC (2005) 791 Impact Assessment Guidelines, update 15 March 2006.
24
Spatial Information Repository.
25
See Askew, D. (2004) SDI Creation at a Thematic and Organisational Level: Experiences from the UK,
presented at 10th EC GI & GIS Workshop, ESDI State of the Art, Warsaw, Poland, 23-25 June, at
http://www.ec-gis.org/Workshops/10ec-gis/papers/24june_askew.pdf
26
Askew (2004) at 10.
27
KCL did some of this – add URL FELIX to supply mid-October
28
Cite URL and research paper October FELIX
29
cite NAO ‘joint targets’ report.
30
Including those of Trading Funds, such as Ordnance Survey (OS) and Meteorological Office (‘Met Office’).
31
Kelly, Frank (2006) Data and innovation – the case for experimentation, Journal of the Foundation for
Science and Technology 19:2, at 14-15
32
See Harlow, Carol (1997) “Back to Basics: Reinventing Administrative Law”, Public Law 245-261
33
Assuming no dissolution of Parliament before the end of the 2008-9 session.
34
We do not set out a ‘blueprint’ for the lab – for an example of such a report, see Towers Perrin (2001)
Report for Regulatory Steering Group: Ofcom Scoping Project, at
http://www.ofcom.org.uk/static/archive/Oftel/publications/about_oftel/2001/towe1001.pdf
35
On motivations see Kingdon, J. (1984) Agendas, alternatives and public policies. Boston: Little Brown.
36
Tullo, Carol (2006) Unlocking the potential of public sector information, Public Servant, October, at p35.
37
See eGov Monitor (2006) Data Sharing in Public Sector - Resolving the Conundrum, 11 September, at
http://www.egovmonitor.com/node/7533
38
CIO Council (September 2006) tabled paper, Information and Knowledge Management Strategy: Overall
framework – outline of approach
39
Note the coordination role of the Coordination of Research and Analysis Group (CRAG):
http://www.gsr.gov.uk/gsr_network/crag_members.asp
40
See Hood, C. (2006) Chapter 22: The Tools of Government in the Information Age, in Goodin, Robert E.,
Michael Moran, and Martin Rein (eds) Handbook of Public Policy, Oxford University Press, Oxford.
41
On information economy research in particular, see Melody, W. H. (1996) The strategic value of policy
research in the information economy, in Dutton, William H. (ed.) (1996) Information and communication
technologies: Visions and realities, 303-317. London: Oxford University Press.
42
See Barr, J. (2006) Web Services 2.0: Best Practices for Extreme Reuse, paper given to WWW2006
conference 23-26 May, at http://www2006.org/programme/item.php?id=d12
43
Such transitions need not be an unmitigated loss; they can provide ‘intelligent’ partners for public-private
initiatives, disseminate good practice and mobilise competitive forces that increase efficiency and bolster
demand for (and benefits from) data mashing in wider markets.
44
This concern was raised by one interviewee.
45
Paul David OII on e-science and patent pooling for basic research
46
An excellent source on the paramountcy of economic analysis by Treasury in British central government is
Dunleavy, P. (1989) Paradoxes of an Ungrounded Statism, Chapter 7 in Castles, F.G. (ed) The Comparative
History of Public Policy, Polity Press, Cambridge, at 265-266.
47
See Weiss, P. (2002) Borders in Cyberspace: Conflicting Public Sector Information Policies and their
Economic Impacts, US Department of Commerce, and comments of Mike Liebhold at OS Terra Future
conference, Southampton, 19 September 2006, for instance.

RAND Europe Endnotes

48
Alternative claims that BBC Backstage cost £150,000 were dismissed by interviewees as covering only external costs,
rather than the internal BBC costs in staffing, overheads, IT support etc. necessary for the project. See
http://backstage.bbc.co.uk/

50
This might borrow from OGC practice with an initial provision of good practice examples and resources and
leading on to a roadmap or reference for data mashing initiatives. Adherence to such a roadmap might also
provide a solid basis for inter-organisational initiatives.
51
This refers to an open forum for joint authoring: the DML could support Wikis on institutional, economic,
technical and other cross-cutting areas to ensure that engagement is sustained and that valid ‘peripheral’ outputs
are produced.

53
IP issues include: i) sharing returns for commercial data mashing products; ii) valuation of legacy intellectual
property as opposed to products of collaborative or parallel activity; and iii) the distinction (if any) among
rights to information, compendia (per se database protection), interface and other software, rights to specific
uses or channels of distribution, etc.
54
The liability issues include financial liability to third-party rights holders and liabilities arising as a result of
the development and exploitation of data mashing products – for instance, liability for incompleteness, error,
etc. The situation is complicated by legal issues (e.g. the extent to which public information can be relied on
for different purposes), but derives its force from the potential economic consequences.
55
Alternatives range from the sort of ‘internal stock or options markets’ used by large firms such as General
Motors to reallocate research funding to specifically designed auctions for rights to contribute to or exploit
joint products once their characteristics have been clarified. Such mechanisms may be needed to prevent
distortion of data mash product design by strategic cooperative and effort incentives.
56
If, for instance, the owner of the ‘winning’ interface, etc. has lower costs of contributing to or exploiting the
final product, or is able to claim a greater share of the joint proceeds.
57
Katz and Shapiro (1986).
58
HM Treasury (2002) Selling into Wider Markets: A Policy Note for Public Bodies, December.
59
See Cross, Michael (2006) National Archives squares the data circle, Technology Guardian, 14 September at
3, describing CEO Natalie Ceeney’s plans to enter into PPP arrangements to digitise census data, and the use
of search techniques using technology such as the Autonomy IDOL server.
60
Taken from http://www.transportdirect.info
61
Ritchie, Felix (2006) 11 July, presentation to Work and Pensions Economics Group, D.1 Topic: The ONS
Session. Restricted and Government Datasets for Research Use: The Practitioners Corner, at
http://www.york.ac.uk/res/wpeg/refereeing2006/papers20006/RItchie.ppt
62
http://www.statistics.gov.uk/about/bdl/
63
Reference, VML Annual Report 2005/6 at p
64
Barker, Anna (2006) 11 July, presentation to Work and Pensions Economics Group, D.1 Topic: The ONS
Session. Restricted and Government Datasets for Research Use: The Practitioners Corner at
http://www.york.ac.uk/res/wpeg/refereeing2006/papers20006/Barker.ppt
65
Data mashing may be defined as a website or web application that uses content from more than one source
to create a completely new service (see:
http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29)
66
Incomplete versions that function sufficiently to demonstrate a proof of principle and to show that they
can be evolved into something useful in the foreseeable future (see:
http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s03.html)
67
For example, Google Labs (see: http://labs.google.com/)
68
The upper echelons of management are not necessarily best placed to identify the goals and potential benefits
(or the risks) of data mashing.
69
See BBC Backstage as an example of an open mash-up arena (http://backstage.bbc.co.uk/)
70
http://www.mysociety.org/


71
Expressions of interest and support for the forum have been received from the London e-Science Centre
(http://www.lesc.ic.ac.uk/index.html), Advanced Knowledge Technologies (www.aktors.org), University of
Southampton and the Cambridge-MIT Unit (http://www.cambridge-mit.org/cgi-bin/default.pl).
