Evaluating a specific approach to better re-use of public sector information
CHRIS MARSDEN
JONATHAN CAVE
STIJN HOORENS
PM-2169-DFT
Contents
Preface
Executive summary
List of abbreviations
Evaluating a specific approach to better re-use of public sector information RAND Europe
REFERENCES
Reference List
APPENDICES
Appendix 1: Examples of Existing Data Federation Initiatives
1. Department for Transport: Transport Direct
2. Google Laboratory, example of beta ‘data mashing’ community
4. Office for National Statistics (ONS)
5. DfES: Data linking Children in Care and National Pupil Database
6. Her Majesty’s Revenue and Customs proposed data lab
Appendix 2: Interview and Workshop Schedule
Appendix 3: July 2006 Data Mash Lab Proposal
ENDNOTES
Executive summary
Government departments and agencies collect a wide range of data in the course of their
duties. In Chapter 1, we explain that the re-use of such datasets, collectively described as
Public Sector Information (PSI), produces new forms, services or applications. The
possibilities offered by “data mashing”, a particular type of re-use based on certain
published and accepted data standards, have recently received a lot of attention. There is
however a tension between the possibilities offered by data re-use and the barriers to
implementing or even conceiving suitable data transfers and combination. The Cabinet
Office’s Data Grand Challenge has proposed a Data Mashing Laboratory (DML) to
function as a catalyst to test these new ways of data sharing in a confined ‘sandpit’ setting.
In Chapter 2, we explain that the evolution of PSI re-use faces four specific barriers:
technological, socio-institutional, economic and legal hurdles. The barriers identified may
be addressed via a staged approach: 1) Experiment within existing policy initiatives; 2)
Experiment with a new cross-government policy initiative; 3) Experiment with a large-scale
Public-Private Partnership (PPP); and 4) Redraw the legal and economic environment to
encourage data federation. The DML is an example of an experiment with a new cross-
government policy initiative, and analysis of the barriers towards new ways of PSI sharing
suggests that Stage 2 should be explored before launching into initiatives requiring more
overt and irreversible commitment.
Following interviews with key informants and a review of relevant literature, we offer in
Chapter 3 the following findings and recommendations for the DML:
• Position the DML to take advantage of established PSI re-use policy initiatives.
• Essential DML outreach activities require pre-defined budgeted resources.
• Plan the appropriate mix of retention, return and attrition to staff DML with skilled
  and experienced people.
• Provide opportunity for an open discussion forum at the outset of DML to establish
  Intellectual Property Rights (IPR) rules and ‘terms of engagement’ with PPP.
• Rapid DML development of PSI re-use prototypes will be highly dependent on access
  to good quality and relevant data.
• A range of metrics are suggested to evaluate the impact of the DML.
In Chapter 4, we recommend further research into the legal-economic implications of
greater PSI re-use, to be conducted in parallel with the detailed development of the DML
proposal.
List of abbreviations
KM Knowledge Management
MISC31 Ministerial Committee on Data Sharing
MoD Ministry of Defence
NDPB Non-Departmental Public Body
NGO Non-Governmental Organisation
OECD Organisation for Economic Cooperation and Development
OFCOM Office of Communications
OFT Office of Fair Trading
OFTEL Office of Telecommunications
ONS Office for National Statistics
OPSI Office of Public Sector Information
PPP Public-Private Partnership
PSI Public Sector Information
R&D Research and Development
RDF Resource Description Framework
S&T Science and Technology
SPIRE Spatial Information Repository
VML Virtual MicroData Laboratory
XML Extensible Markup Language
CHAPTER 1 Government data sharing analysed
A particular type of data sharing based on common use of published and accepted Asynchronous
JavaScript and XML (AJAX) software family data standards.
Figure 1 shows schematically that, under the definitions of PSI re-use that we have used,
‘data mashing’ is a type of data federation, which is itself a type of data sharing.
Figure 1: Data Sharing, Re-use, Federation and Mashing – A Schematic Representation
[Figure labels: Data sharing; Data re-use; Data federation; Data mashing]
Data mashing has become associated with overblown claims about its potential and current
value through its use by proponents of ‘Web 2.0’ services and applications. O’Reilly states:
“The potential of the web to deliver full scale applications didn't hit the mainstream till
Google introduced Gmail, quickly followed by Google Maps, web based applications with
rich user interfaces and PC-equivalent interactivity.”2
This report uses the term ‘data mashing’ to describe any Internet-based federation of two
or more data types using existing tools to remove technical standardisation as a barrier to
service delivery. Several examples of data mashing can be found in Appendix 1. The public
are important re-users as well as consumers of PSI in data mashing. The user is able to
‘pull’ content3 and even adapt and mix content into a user’s own ‘mash-up’. A mash-up is
a combination of existing media reworked into a new and innovative type4. We caution
that this phenomenon is already generating hype that may prove illusory.
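To make the definition concrete, the following is a minimal sketch of a mash-up in this sense: two sources published in different formats are federated on a shared key using only standard tools. The datasets, field names and the `mash` function are hypothetical illustrations, not drawn from any of the initiatives discussed in this report.

```python
import csv
import io
import json

# Hypothetical inputs: one source publishes JSON, the other CSV.
STOPS_JSON = '[{"area": "A1", "bus_stops": 12}, {"area": "B2", "bus_stops": 3}]'
SCHOOLS_CSV = "area,schools\nA1,4\nB2,1\nC3,2\n"


def mash(stops_json, schools_csv):
    """Federate two differently formatted sources on a shared 'area' key."""
    stops = {row["area"]: row["bus_stops"] for row in json.loads(stops_json)}
    combined = []
    for row in csv.DictReader(io.StringIO(schools_csv)):
        area = row["area"]
        if area in stops:  # keep only areas present in both sources
            combined.append(
                {"area": area, "schools": int(row["schools"]), "bus_stops": stops[area]}
            )
    return combined


mashup = mash(STOPS_JSON, SCHOOLS_CSV)
```

The sketch only works because both sources happen to use the same area identifier. Where they do not, agreeing a common key is a matter for the data owners rather than for the code.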
There is a tension between the possibilities offered by PSI re-use and the barriers to
implementing (or even conceiving) this. The evolution of new PSI re-uses faces specific
technological, socio-institutional, economic and legal hurdles. The barriers are gradually
being tested and – in some cases – overcome through public and private initiatives, as seen
in Appendix 1. However, because the barriers take the form of potential dangers and a
perceived imbalance between costs and benefits, there is – at the present stage – a ‘chicken-
and-egg’ problem. Neither the opportunities nor the risks can be fully evaluated without
concrete experience accessible to the broad range of stakeholders whose interests are
affected. But without the participation of a sufficiently broad sample of stakeholders, such
concrete examples as emerge will tend to be limited in scope and/or too narrowly focused
to clarify the current uncertainty that is the chief barrier. The risks are on the one hand
that beneficial forms of data re-use will be deterred and on the other that inappropriate re-
use may occur.
This chapter summarises the distinctive role of government and the key specific barriers.
RAND Europe The Government Data Mashing Lab
suitable data, to expend effort in combining them and to clarify risks, ownership,
discretion and standing.
3. Economic barriers. Once trust and communication have been established among
public-sector stakeholders, it is possible to engage with the market in order to
address the economic hurdles and ensure adequate finance, suitable contractual
arrangements and engagement with demand. The latter is not simply a matter of
marketing, because the innovation surrounding federated data lies as much in how
they are used as in how they are put together. The inherent complexity of
economic barriers reflects not just the range of stakeholders but, especially for
Public Private Partnerships (PPPs), the potential incompatibility of their remit
and objectives, which makes 'efficient contracting' difficult.
4. Legal barriers are slow to reform, depending on the resolution of other barriers.
Contractual issues and Intellectual Property Rights (IPRs) play an important part.
adoption curve needs in this process mean that it will be, at the shortest, a medium-term
project to move from basic to semantic web standards.
[Figure labels: Towards better re-use of government data; Technical barriers; Socio-institutional barriers; Economic barriers; Legal barriers; Stage 0; Stage 1; Stage 2; Stage 3; Stage 4]
Figure 2: Conceptual visualisation of barriers and stages towards better re-use of PSI.
The importance of the cross-cutting lessons arises from the fact that data federation is not a
matter of design, but a collaborative activity. The uses that provide benefits from
federation are often discovered by users of the products; the problems and solutions are
often of general applicability; and the realisation of potential rests on willingness to
participate and testing of perceived barriers and obstacles.
These barriers differ in complexity and in the speed and ease with which they can be
addressed. Note that we do not say ‘overcome’ because it is not obvious that the concerns
underlying the barriers should all be set aside, nor that all forms of data federation are
justified in light of those concerns. Our point is that the key barrier to progress, which
must be overcome, is uncertainty. Socio-institutional issues are essential to scoping
possibilities for data federation – not only may potentially valuable products be missed if
component data are not available, but alternative institutional arrangements cannot be
developed or bench-tested. This leads on to a different way of resolving economic and legal
issues. These are not distinct: IPRs, trading fund status, and charging are all economic
matters enshrined or institutionalised in law. Understanding of the socio-institutional
possibilities could therefore lead to reform of market forces and the legal framework.
We therefore identify a need to address the socio-institutional interests in a shared
environment, where serious exploration of realistic possibilities is both feasible and likely.
Such an environment should have suitable ground rules and wide participation by a
combination of direct participants and observers. In this way, it could serve as a test bed
for development of actual products and ‘solutions’ to (or mechanisms for solving) common
and cross-cutting problems. It could also serve as a simple and understandable proof of
concept, clarifying the potential gains and necessary safeguards both to data owners and to
policy makers.
CHAPTER 2 Assessing a staged approach to barriers
2.1.5 Stage 4: Redraw the legal and economic environment to encourage data federation
A laboratory may isolate and identify the socio-institutional barriers and help inform the
approach to other barriers31. It is a gateway to a long-term solution and a decision point for
addressing economic and legal barriers32. The radical redrawing of institutional arrangements
such as the Trading Fund agreements with commercially exploitable agency data, and Treasury
calculation of long-term economic benefit from data sharing with the private sector, are
areas that at least require analysis, ideally in conjunction with the 2007 Comprehensive
Spending Review. Legislative change to implement further changes would require
Parliamentary scheduling, and could not feasibly be undertaken before 2009-10.33
CHAPTER 3 Scoping the case for a DML
This chapter sets out the case for a government DML as a short- to medium-term step
towards demonstrating the benefits of PSI re-use34. The case for such a protected
environment rests on its institutional context, inputs, activities and outputs. Figure 3
describes these issues, which are then explored in more detail.
Figure 3: Schematic Representation of Data Mash Laboratory
• Data mashing
• Developing software tools, techniques
• Exploring institutional, contractual forms
• Identifying, prototyping products
• ‘Making the case’ through examples, and participation
Outputs:
• Data mash products
• Software, procedures, standards
• Incentives for further data collection, exchange
• Partnerships
• …
Section 3.1 shows the forces creating institutional momentum for such a demonstration
project. Section 3.2 considers its form and function. Section 3.3 considers its objectives
and the metrics against which its impact can be assessed, and Section 3.4 attempts to
extrapolate future development functions for the DML from interviewee comments. As in
Chapter 2, much of our analysis derives from interview data and workshop presentations
detailed in Appendix 2.
3.1 Placing the DML within government policy towards PSI re-use
In considering the case for a DML, it is an essential first step to analyse recent external
developments affecting the DML’s potential as an impetus and catalyst for data sharing. On 1
November, the Office of Public Sector Information (OPSI) will merge with The National
Archives. OPSI itself was a May 2005 re-branding and repurposing of Her Majesty’s
Stationery Office (HMSO) in response to the July 2005 regulations implementing the
Reuse of PSI Directive 2003. This Directive, which will be reviewed in 2007/8, and the
work of other international bodies, notably the OECD, continue to set a transformative
agenda for PSI35. The OPSI Director states: “we are driving forward a transformation in
the creation, management, representation, dissemination and re-use”36 of data under her
control.
The better use of data sharing has been highlighted by the Better Regulation Executive
(BRE) in the Cabinet Office, inspired by the Hampton Review. They identify two strands
– more efficient data collection, and better use of existing data. Our concern here is the
latter. There is now a Hampton data-sharing group driven by the BRE. Its structure and
role are not yet clear, but it is unlikely to emphasise analysis:
The Cabinet Office has a further interest in data sharing as part of the ‘Transformational
Government’ remit of the Electronic Government Unit (EGU). Together with the
Delivery Unit in the Prime Minister’s Policy Unit, it has established a leading role in
coordinating departmental policy regarding data sharing and outputs for citizens. The
eGov Monitor states: “Without sharing data, delivering seamless services through joined
up government would not only be difficult, but downright impossible to deliver.”37
A further advisory body is the Chief Information Officer (CIO) Council, comprising CIOs
of each government department. The CIO Council decided in September to deal with
knowledge management issues more fully by establishing a Knowledge Council (KC),
chaired by TNA chief executive Natalie Ceeney, and a Delivery Council responsible for
implementing projects requested by that KC38. These bodies have yet to be constituted,
but the suggestion is that the Delivery Council would initially report to the Prime
Minister’s Delivery Unit, in order to ensure political priority. The KC is intended to
comprise neither CIOs nor Chief Scientific Advisors39, but senior ministerial policy aides.
Figure 4 below shows key central data sharing and knowledge management initiatives. The
DML needs to be positioned to take advantage of these established initiatives.
[Figure 4 label: Cabinet Committee MISC31]
A term used by the Ministry of Defence (MoD) is ‘data fusion’. This
describes the merger of two data sets to produce a new product that has greater
functionality than the sum of the two parts. In the MoD sense, data fusion is about fusing
in real time all sorts of information (from sensors, intelligence) to create an overall picture
of the ‘battle space’ at any one time. Given the security and integrity concerns of the MoD,
the term ‘data fusion’ carries no unfortunate implications of broad uncontrolled re-use,
and therefore may be a less value-laden term to employ. However, as it describes the
statistical inference drawn from multiple datasets, it actually describes a more
technologically sophisticated approach than data mashing. Alternatives such as ‘data
federation’ carry confusing comparisons with European federalism. A less value-laden term
would be ‘data meshing’. However, the distinction between ‘mashing’ and ‘meshing’ may
be too subtle to carry more than semantic confusion forward from the current term.
The word ‘laboratory’ also has technologically driven implications, but carries the well-
understood meaning of a controlled environment somewhat insulated from external
influences, which can distort both participation and the ability to draw general lessons
from experience. In this case, these influences include at a practical level the data sets, but
more pertinently to this report the various barriers identified in Chapter 2. The playful
term ‘sandpit’, used to describe the environment, captures the spontaneous and risk-free
character of the activities to be undertaken by comparison with departmental daily
requirements. There is a further issue between the sandpit and the test-bench: whether
people will take the venture seriously. This depends on a balance of what stakeholders are
asked to contribute (inputs in Figure 3) and hope to gain. Thus there should be a credible
path for exploiting results.
We therefore consider that the DML conveys both the benefits of the ‘sandpit’
environment and the reassurance, needed to assuage existing legal, economic and other
concerns, that the experiment will not undertake unnecessarily risky public activities
with PSI.
1. Creating a linked dataset for research use and exploring the use of innovative
linking techniques;
2. Acquiring a wider range of data and using methods developed for data mashing to
expand the usefulness of PSI; and
3. A pilot study for linking identifiable data.
The actual types and uses of PSI in the DML are technically complex and outside the
scope of this report, but it is evident that rapid development of initial prototypes will be
highly dependent on access to good quality and relevant data. Interviewees suggested
various interesting types of data mashing that could be performed, including “indices of
multiple deprivation indicators” that could show departments the linkages between their
respective metrics and deprivation.
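As a purely illustrative sketch of the kind of linkage interviewees had in mind, the fragment below joins a deprivation index to one departmental metric by area and measures how closely the two move together. All area codes, figures and names (`deprivation_index`, `absence_rate`, `link_by_area`, `pearson`) are invented for the example, and the choice of a Pearson correlation is our assumption rather than a method proposed for the DML.

```python
import math

# Hypothetical area-level figures; every name and number here is illustrative.
deprivation_index = {"A1": 0.82, "B2": 0.35, "C3": 0.57, "D4": 0.10}
absence_rate = {"A1": 9.1, "B2": 5.2, "C3": 6.8, "D4": 4.0, "E5": 7.7}


def link_by_area(index, metric):
    """Pair index and metric values for the areas present in both datasets."""
    shared = sorted(index.keys() & metric.keys())
    return [index[a] for a in shared], [metric[a] for a in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs, ys = link_by_area(deprivation_index, absence_rate)
r = pearson(xs, ys)  # close to 1 means the metric rises with deprivation
```

Area "E5" has no deprivation score and is silently dropped by the join, which illustrates in miniature why prototype quality depends on the coverage of the underlying data.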
3.2.6 Funding
There are no independent figures for the costs of PSI gathering, the expense of incidents
caused by inadequate data sharing, or the potential citizen gains from data mashing46. The
OFT inquiry is expected to estimate the total commercial trading in PSI at over £1 billion,
but clearly the costs and benefits of various economic models are worthy of further and
more rigorous investigation47.
Interviewees reiterated that, in comparison with the overall cost of PSI collection, the
amount envisioned for the DML, at £10 million over two years, was a relatively trivial
sum48. It was suggested by those with experience of MoD and other larger projects that
some concern might be expressed at under-resourcing, especially if all start-up costs and
ongoing overheads are accounted for within the £5m annual costs. The costs of IT systems alone
may be substantial (though mitigated by the easy commercial and indeed free availability
of much AJAX-based software). We note that the software may be free, but the provision
of support for running it is not, and needs to be budgeted. On the other hand, the tools
developed within the DML may be of interest to and reusable by the different participants,
which would support either a ‘public good’ or ‘voluntary contribution’ support model.
A further key metric is the successful establishment and operation of the ‘Policy
Observation board’ – both for its internal DML and external cross-government
coordination, and its integration of DML outcomes with law/economics/operational issues
across government.
interdepartmental character of the DML creates new possibilities for crossover innovation
but also new concerns: differences in objectives, constraints and freedom to implement
appropriate contractual and financial participation. This calls for suitable subsidies,
institutional funding and (in-kind) ‘pay or play’ provisions.
Familiar private RTD joint venture issues of organising rights, responsibilities and
liabilities to induce efficient information sharing, effort and exploitation may need to be
clarified for the DML. Use of information creation, sharing and exploitation measures can
help it serve as a test bed for organisational and contractual forms, mechanisms for
matching partners and joint business models55.
Data mashing has aspects of both complements and substitutes, which are treated
separately in the literature. Clearly, data used in the final product are complementary, but
substitute components (data, organisational schema, ontologies, interfaces, etc.) may
already exist. This competition may distort development56, while complementarity can
induce free-riding, tipping and an inefficient pace of development57.
A key demand side issue is the difference (if any) between data mashing products and
exploitation of other public assets by “(re)selling on wider markets58.” How should
development costs be covered and should successes subsidise failures? Should competition
among public-sector data-mashing products be encouraged? How should joint costs and
revenues be hypothecated? Should public and private (sector) data be mashed together or
distributed by means of proprietary software or standards? What public-public or public-
private partnerships should govern market exploitation? How should commercial risk be
assessed, underwritten and allocated to public bodies?
Finally, a successful data mashing product may ‘defeat’ rival products and dominate the
market – likely for products offering uniquely authoritative information or network
externalities. Should publicly-derived data mashing products run the risk of driving
competitors out of business, or ‘win’ by triggering private- and public-sector imitators,
derivative or complementary products, etc. leading ultimately to greater value for money –
at least for the consuming public? At issue is the balance of interests between the public
‘owners’ of source data and product users, taking into account public benefits (e.g. more
efficient use of transport). It is an important consideration in DML design because
participants’ expectations influence outcomes and because at some point market-derived
performance metrics will be strongly suggested.
The case for the DML rests on socio-institutional issues, to help change the way
government departments interact with each other to build a common perspective on PSI
re-use. A DML structured and monitored along the lines suggested here is capable of
testing these barriers and, in the process, furthering Transformational Government. But
this is not the end of the process. The DML must reflect its technical, economic and legal
contexts but does not in itself pre-empt their barriers or opportunities. Beyond the
medium term issues lie Stage 3-4 barriers to data federation. The concluding Chapter 4
sets out research projects to address these remaining barriers.
CHAPTER 4 Next steps in data sharing
We have set out the problem of PSI re-use, analysed it, and assessed a
proposal for isolating and managing a specific element of the problem: socio-institutional
barriers to greater PSI re-use. We now identify the thus far intractable barriers to greater
data sharing: legal and economic reform of the trading and sharing environment.
REFERENCES
Reference List
ARTICLE 29 Data Protection Working Party (2003) Opinion 7/2003 on the re-use of
public sector information and the protection of personal data, 10936/03/EN at
http://ec.europa.eu/justice_home/fsj/privacy/docs/wpdocs/2003/wp83_en.pdf
Askew, D. (2004) SDI Creation at a Thematic and Organisational Level; Experiences
from the UK, presented at 10th EC GI & GIS Workshop, ESDI State of the Art,
Warsaw, Poland, 23-25 June, at
http://Www.Ec-Gis.Org/Workshops/10ec-Gis/Papers/24june_Askew.Pdf#Search=%22defra%20spire%22
Barker, Anna (2006) 11 July, presentation to Work and Pensions Economics Group, D.1
Topic: The ONS Session. Restricted and Government Datasets for Research Use: The
Practitioners Corner at
http://www.york.ac.uk/res/wpeg/refereeing2006/papers20006/Barker.ppt
Barr, J. (2006) Web Services 2.0: Best Practices for Extreme Reuse, paper given to
WWW2006 conference 23-26 May, at
http://www2006.org/programme/item.php?id=d12
Cabinet Office (2005) Transformational Government: Enabled by Technology at
http://www.cio.gov.uk/documents/pdf/transgov/transgov-strategy.pdf#search=%22transformational%20government%22
Cabinet Office/Prime Minister’s Strategy Unit with Department for Trade and Industry
(2005) Connecting the UK: The Digital Strategy, at
http://www.dti.gov.uk/files/file13434.pdf#search=%22connecting%20britain%20the%20digital%20strategy%22
CIO Council (September 2006) tabled paper, Information and Knowledge Management
Strategy: Overall framework – outline of approach
Cross, Michael (2006) National Archives squares the data circle, Technology Guardian, 14
September at 3.
Darlington, John, Jeremy Cohen, William Lee (undated, mimeo) An Architecture for a
Next-Generation Internet based on Web Services and Utility Computing, London e-
Science Centre
Dunleavy, P. (1989) Paradoxes of an Ungrounded Statism, Chapter 7 in Castles, F.G. (ed)
The Comparative History of Public Policy, Polity Press, Cambridge, at 265-266.
eGov Monitor (2006) Data Sharing in Public Sector - Resolving the Conundrum, 11
September, at http://www.egovmonitor.com/node/7533
European Commission SEC (2005) 791 Impact Assessment Guidelines, update 15 March
2006.
Gershon, P. (2004) Releasing Resources for the Frontline: Independent Review of Public
Sector Efficiency, HM Treasury, London at
http://www.hm-treasury.gov.uk/spending_review/spend_sr04/associated_documents/spending_sr04_efficiency.cfm
Hampton, P. (2005) Reducing administrative burdens: effective inspection and
enforcement, HM Treasury, London at
http://www.hm-treasury.gov.uk/media/A63/EF/bud05hamptonv1.pdf
Harlow, Carol (1997) “Back to Basics: Reinventing Administrative Law”, Public Law 245-261
Hood, C. (2006) Chapter 22: The Tools of Government in the Information Age, in
Goodin, Robert E., Michael Moran, and Martin Rein (eds) Handbook of Public Policy,
Oxford University Press, Oxford.
Kelly, Frank (2006) Data and innovation – the case for experimentation, Journal of the
Foundation for Science and Technology 19:2, at 14-15
Kingdon, J. (1984) Agendas, alternatives and public policies. Boston: Little Brown.
Lachman, Beth et al (2002) Lessons for the Global Spatial data Infrastructure:
International Case Study Analysis, Documented briefing, RAND Corporation.
Melody, W. H. (1996) The strategic value of policy research in the information economy,
in Dutton, William H. (ed.) (1996) Information and communication technologies:
Visions and realities, 303-317. London: Oxford University Press.
OECD (2006, 30 March) Digital Broadband Content: Public Sector Information And
Content, at http://www.oecd.org/dataoecd/10/22/36481524.pdf and workshop of 31
May at
http://www.oecd.org/document/17/0,2340,en_2649_37441_36860241_1_1_1_37441,00.html
Polak, J. (2006) Presentation to Cambridge-MIT Institute workshop.
Pollock, R. (2006) The Value of the Public Domain, July, Institute of Public Policy
Research, London.
Ritchie, Felix (2006) 11 July, presentation to Work and Pensions Economics Group, D.1
Topic: The ONS Session. Restricted and Government Datasets for Research Use: The
Practitioners Corner, at
http://www.york.ac.uk/res/wpeg/refereeing2006/papers20006/RItchie.ppt
Towers Perrin (2001) Report for Regulatory Steering Group: Ofcom Scoping Project, at
http://www.ofcom.org.uk/static/archive/Oftel/publications/about_oftel/2001/towe1001.pdf
Tullo, Carol (2006) Unlocking the potential of public sector information, Public Servant,
October, at p35.
Weiss, P. (2002) Borders in Cyberspace: Conflicting Public Sector Information Policies
and their Economic Impacts, US Department of Commerce, and comments of Mike
Liebhold at OS Terra Future conference, Southampton, 19 September 2006, for
instance.
APPENDICES
RAND Europe Appendix 1
RAND Europe Appendix 2
A series of interviews was conducted to inform this report. The organisations interviewed and the
dates of the interviews are listed in the table below.
Table 2: Organisations interviewed and dates of interview.
Date         Organisation
7 Sept       Access to Knowledge
11 Sept      Office for National Statistics
11 Sept      Department for Trade and Industry
12 Sept      Cambridge-MIT Institute workshop
13 Sept      Office of Public Sector Information
13 Sept      Meteorological Office
19 Sept      Terra Future conference
19 Sept      Ordnance Survey
20 Sept      DEFRA
22 Sept      E-Government Unit
20 Sept      Information Commission
20 Sept      ESRC
27 Sept      IBM
22 Sept      Openstreetmap.org
6 October    Cambridge University Centre for Mathematical Sciences
3 October    Her Majesty’s Stationery Office
TERRA FUTURE
This conference was held on 19 September 2006 | Ordnance Survey, Southampton, SO16 4GU
The event looked at the impact of future trends on information businesses and invited more than 130
thought leaders from business, government and academia to express their views on new and evolving
technologies, societal change and consumer demands.
Keynote speaker Sir Tim Berners-Lee, inventor of the World Wide Web, opened the event exploring how
the semantic web – an automated extension of the web using machine-readable information to share and
reuse data – has the potential to boost its reach and functionality: “Everything can be given a uniform
resource identifier (URI), which describes concepts as well as objects. Translating your data into Resource
Description Framework (RDF) language means you can explain what it does, make it available and connect
to other people.”
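As an illustration of the URI and RDF idea in the quotation above, the following minimal Python sketch stores a few statements as subject-predicate-object triples. It is not taken from the conference material: every URI, value and helper function (describe, to_ntriples) is a hypothetical assumption, and a real system would normally use an RDF library rather than plain tuples.

```python
# Hypothetical example: every URI and value below is invented for illustration.
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple, the basic unit of RDF.
triples = [
    ("http://example.org/id/station/1", "http://example.org/def/name", "Example Central"),
    ("http://example.org/id/station/1", "http://example.org/def/locatedIn", "http://example.org/id/town/7"),
    ("http://example.org/id/town/7", "http://example.org/def/name", "Exampleton"),
]

def describe(subject, statements):
    """Collect every predicate/object pair asserted about one URI."""
    description = defaultdict(list)
    for s, p, o in statements:
        if s == subject:
            description[p].append(o)
    return dict(description)

def to_ntriples(statements):
    """Serialise triples in N-Triples style: URIs in angle brackets, literals in quotes.

    Simplification: anything starting with http:// is treated as a URI,
    and literal values are not escaped.
    """
    lines = []
    for s, p, o in statements:
        obj = f"<{o}>" if o.startswith("http://") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

if __name__ == "__main__":
    print(describe("http://example.org/id/station/1", triples))
    print(to_ntriples(triples))
```

Because the station and the town are both identified by URIs, a second data holder could publish further statements about the same town URI and the two sets of triples could be combined without prior coordination.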
Keynote speaker: Sir Tim Berners-Lee, Inventor of the World Wide Web
Other speakers were:
John Darlington
Daniel Erasmus
Leticia Gutierrez Villarías
Mike Liebhold
Glenn Lyons
Robin Mannings
Sheila Moorcroft
Dr. Tracy Ross
Jens Jacobsen
Dr. Cathy Dolbear
Sir Tim Berners-Lee, inventor of the World Wide Web, will introduce and inspire debate on the future of
location information. Key themes will include the future of the World Wide Web and the growing
importance of geographic information (GI).
“GI is stimulating new uses of the World Wide Web, evolving existing applications and underpinning the
creation of new ones to adapt to global trends. I am delighted to be addressing the attendees at Terra Future
and anticipate a productive and inspiring debate between those driving the development of location data
and the information businesses looking to embrace it.” (Tim Berners-Lee)
--------------------------------------------------------------------------------
Mike Liebhold is a Senior Researcher for the Institute for the Future (IFTF), California, USA, focusing on
proactive, context-aware and ubiquitous computing including the social implications and technical
evolution of a geospatial web. Most recently, Mike was a producer and program leader for the Technology
Horizons New Geography Conference at the Presidio in San Francisco. Previously, Mike was a visiting
Researcher at Intel® Labs working on a pattern language based on semantic web frameworks for ubiquitous
computing. Mike is also co-author of Proactive Computing through Patterns of Activity and Place,
publication pending. In the 1980s and early 1990s at Apple® Advanced Technology Labs, Mike led the
Terraform project - an investigation of cartographic and location-based hypermedia. Mike also led the
launch of strategic partnerships with National Geographic®, Lucasfilm, Disney®, MIT, AT&T Bell Labs
and others. As Chief Technology Officer for Times Mirror Publishing, Mike helped launch over 20
professional and consumer web content services, led very early large-scale Intranet designs and then worked
as a senior consulting architect at Netscape. During the late 1990s Mike worked on start-ups, building
large-scale international public IT services and IP networks for rural and remote regions in China, India,
Europe and Latin America. Mike occasionally publishes his thoughts about micro-local and geospatial
computing on his web log at http://www.starhill.us.
--------------------------------------------------------------------------------
Leticia Gutierrez Villarías is the DIP Ontology Engineer at Essex County Council (ECC). DIP
(http://dip.semanticweb.org/), a European Integrated Project running for 3 years, aims to produce a new
technology infrastructure for Semantic Web Services (SWS). ECC leads the eGovernment use case,
identifying real eGovernment scenarios which may benefit from these new technologies and implementing
them in order to prove their usefulness. We are currently working on a GIS-based emergency planning
system which combines SWS and GIS technologies together in order to facilitate the automation of
information sharing among different governmental organizations and other partners based on a spatial
point of view during an emergency situation.
--------------------------------------------------------------------------------
John Darlington has over 20 years' experience in the software industry working for companies including IBM,
Microsoft and Sony. More recently he has helped establish and grow a number of technology startup
companies. He is currently working with the University of Southampton to help engage business and
government in adopting semantically rich web services. One of the projects he manages is the AKTive PSI
project which aims to explore what is possible with a broad range of public sector information, using recent
advances in web-based information technologies.
--------------------------------------------------------------------------------
Ed Parsons is Ordnance Survey's Chief Technology Officer and is responsible for all IT operations at the
national mapping organisation, including the development and implementation of the IT strategy to
underpin all business activities. Ed also manages Ordnance Survey’s web presence and is in charge of its
geospatial management. He also leads the organisation’s Research Group, charged with exploring and
developing Ordnance Survey’s long-term future. Ed has worked in the GI and LBS industry throughout his
career.
--------------------------------------------------------------------------------
Daniel Erasmus has, for the last 10 years, been facilitating scenario processes for a diverse body of clients
across 3 continents. He has worked with a range of private and public sector clients including Nokia,
Rabobank, the city of Rotterdam, the Rijksgebouwendienst, Schlumberger, Telenor and Vodafone. Visit the
DTN’s web site for more detailed information (www.dtn.net). Daniel’s first web site, the Van Gogh
Gauguin experience, received a Cannes nomination and an ID magazine bronze prize. He is a board member
of the European Internet Archive and the foundation Reflecting, and co-developer of Ci’Num.
AGENDA
DATA-MASHING WORKSHOP
14:00 Close
Speaker profiles
Frank Kelly has been DfT's Chief Scientific Adviser since August 2003, with responsibility for the quality
of science and scientific advice. He is also Professor of the Mathematics of Systems at Cambridge
University where his main research interests are in random processes, networks and optimisation.
Ito! is a UK registered company providing web-based mapping, movies and data management services for
the transport professional and for the transport user. Our services are based on an advanced multi-modal
transport model of the UK’s transport system, including both roads and public transport, visualised using
state-of-the-art special effects techniques. Ito! was founded by Peter Miller and Hal Bertram, who have previous
experience in the transport sector and film industry.
RAND Europe Appendix 3
Government collects and uses a wide range of data to both inform and deliver its policies.
This data is generally used for specific purposes and is rarely made easily accessible for
other uses. Yet data held by government for one purpose can offer immense benefits in the
delivery of other services, particularly when combined or 'mashed' with data from other
sources.
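The idea of combining or 'mashing' data held for different purposes can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: both datasets, their field names and all figures are invented, and the helper mash is hypothetical rather than part of any proposal in this report.

```python
# Hypothetical data: both tables, all field names and all figures are invented.
bus_stops = [
    {"area": "SO16", "stops": 42},
    {"area": "CB3", "stops": 17},
]
school_places = [
    {"area": "SO16", "places": 1200},
    {"area": "OX1", "places": 800},
]

def mash(left, right, key):
    """Join two lists of records on a shared key, keeping only matching rows."""
    # Index the right-hand dataset by the shared key for direct lookup.
    index = {row[key]: row for row in right}
    combined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            # Merge the two records into one combined record.
            combined.append({**row, **match})
    return combined

mashed = mash(bus_stops, school_places, "area")

if __name__ == "__main__":
    print(mashed)
```

The sketch only works because both holders describe location with the same key; in practice, agreeing such shared identifiers is one of the access and coordination problems the proposal addresses.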
No single data collector or user, government departments included, can reliably predict how
data may be used when combined with data from other sources. Realising these benefits
therefore requires greater access to data, so that users can experiment in
developing innovative data applications. Among the obstacles to improving access are
regulatory and administrative barriers, poor incentives and limited awareness and expertise
across government.
The challenge of realising new data applications is not unique to the public sector. Within the
private sector there has been a trend away from a highly controlled development from
concept to finished product, towards a more iterative approach where the rapid development
of beta version products[66] is followed by testing and further modification of concept and
design[67]. The engagement of a diverse stakeholder community during conception,
development and testing is essential to success. Such an approach allows the gradual
evolution of a product shaped by the stakeholder community. As well as assisting the
development of identified data applications, the approach has the additional advantage of
helping to identify unforeseen applications for data.
Government could benefit from finding new ways of engaging the existing capability of the
non-government sector in delivering the potential benefits of data mashing. To do so will require
adopting more flexible ways of working, particularly in terms of the commissioning and
management of projects:
• not seeking to define final data applications but allowing experimentation and the gradual
evolution of applications;
• recognising the potential added value of suitably anonymised official data being made
available for mashing with other data sources.
iii. Overview panel - to ensure accountability, the activities of the forum will be subject
to regular scrutiny by a cross-government overview panel drawn primarily from key
government stakeholders. The panel will have no involvement in the day-to-day
running of the forum and will convene every six months.
It is expected that governance arrangements will evolve during the life of the forum as lessons
are learnt and better ways of working developed.
Way of working
As well as engaging cross-government interest and support, the forum must be capable of
exciting the interest of a broad range of academic researchers and developers in the private
and not-for-profit sectors.
i. Public-facing - in time the activities of the forum will be highly visible and public
access to project deliverables will be encouraged. Access to products, and the development
of appropriate e-tools allowing dialogue between customers, developers and the
public, will be essential drivers of innovation. This will have the added benefit of
demonstrating government commitment to exploring innovative ways to deliver
services of social benefit.
ii. Stakeholder engagement at all stages - strategic thinking is done up-front to ensure
that all parties have a common vision. The forum will seek to engender a dialogue that allows
developers to demonstrate applications to potential policy customers and policy
customers to communicate their needs.
iii. Rapid decision-making with minimal administration - ideas must not be killed off by
bureaucratic procedures and premature analytic criticism. Guidelines for selecting
ideas are, however, needed, in the form of a business case that outlines the
evidence that the innovation is likely to succeed, suggests how the idea can be
developed, and identifies potential benefits commensurate with development costs.
Financial implications
It is proposed that the lab is granted a ‘bedding-in period’ of two years, with
guaranteed funding of £10m, to enable experimentation and the creation of expertise.
During this period consideration should be given to allowing access to the
resources by private sector organisations. Private sector players could explore the potential
for translating innovation through data-mashing into novel products and services. If this
proves possible, the relationship would be governed by a contractual and licensing regime.
Figure: Possible structure for a Government Data mashing lab
[Diagram not reproduced; its boxes and link labels are listed below.]
Gov Policy Customers: Gov departments and NDGBs
Key Facilitators: Information Commissioner; Legal advice (DCA?); OPSI (copyright); e-gov unit
Government & Other Data Holders: Government (ONS, OS, Met Office, Land Registry, OGDs, etc.); Others (Private sector, Data Archive, international)
'In-house' Analysts: Analyst staff from across government; Multi-disciplinary; to include external secondees
MASHING LAB: Management Group; Oversight Panel; Advisers
Private Sector: Established ICT, SMEs and start-ups
Link labels: Communicate needs; Provides advice; Communicates obstacles; Request / negotiate access; Provide data access; Communicate proposals; Consults / Contract / Commission; Communicate, Negotiate, & Provides oversight; Reports to; Contract; Supply
Way forward
Key elements required for success are:
• A mandate to build this experimental approach, accepting levels of uncertainty
on outcomes in the belief that this will lead to value-added services.
• A licence to make these experiments public.
• Cross-government support in identifying a management team, access to
financial resources, hosting the Unit, providing Ministerial support, and
developing a stakeholder group.
Establishing a management group and governance
An initial management group will be established that will include representatives from
across government and from the not-for-profit, research and private sectors. The group
will be tasked with defining the proof of concept and terms of reference, and providing a
sounding board during the bedding-in period over the first 12-24 months.
Assuming the Unit succeeds in achieving a commercially viable model (including
Departmental support for engagement with not-for-profit sectors), the governance of
the Unit should be reviewed after five years with a view to changing its location and
nature.
Establish funding
Initial core funding: it is anticipated that the lab will receive core funding of £10m from
the science budget during a 'bedding-in' period of two years.
Product licensing: the management group will explore how, like some public sector
research establishments, the lab could become largely self-funding by being
allowed to alter and license its experiments for private sector use.
Co-funding: It is anticipated that the lab will attract significant co-funding from other
sources to support specific application development and advance its core functions as
appropriate.
Building a stakeholder group
In building a stakeholder group, the management group will focus on engaging:
• Government organisations - as direct policy customers of applications or as
bodies responsible for data use issues (e.g. DCA, OPSI, e-government unit,
BBC[69]), and as a source of short-term secondments into the Unit of technical
specialists and policy owners.
• Private sector - the relative autonomy of application-level software from
underlying infrastructure layers and the relatively low entry costs make the
sector particularly attractive to SMEs, start-ups and venture capital investment.
• Voluntary sector – the forum will seek to engage the innovative ability of a
sector that has already been demonstrated through the development of
applications by MySociety[70] following a commission by DfT.
• Research community – the forum will seek to remove the non-technical barriers
that inhibit the engagement of the industrial and academic research
community[71] in application development and the development of middleware data
mashing tools.
All stakeholders will be encouraged to identify and promote potential mashing
applications and public service needs to encourage the development of innovative
solutions. The management team, enabled through appropriate secondments, will
ensure that the work of the lab retains a high degree of relevance to the policy and commercial
objectives that exist or evolve over the lifetime of the project. To heighten awareness
and encourage stakeholder engagement, the management team will consider
establishing a “mash-up competition” granting awards for the most innovative
mash-ups of public sector data.
ENDNOTES
1
For more information on RAND Europe, please see: www.randeurope.org
2
See: http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
3
This development has similarities with the overall Creative Commons movement for attributable non-
commercial copyright. See Lessig, L. (2005) Free Culture: The Nature and Future of Creativity. New York:
Penguin Books.
4
A term used in relation to the Internet only since 2004, its best description remains that on Wikipedia, itself
an exemplar of user-generated content:
http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29
5
For wider public domain arguments, see Pollock, R. (2006) The Value of the Public Domain, July, Institute
of Public Policy Research, London.
6
For instance, the comparative country case studies found at: http://www.appsi.gov.uk/reports/research.htm
7
See the market study page at: http://www.oft.gov.uk/Business/Market+studies/commercial.htm
8
See Guardian story of April 2006 at: http://technology.guardian.co.uk/weekly/story/0,,1752262,00.html
9
See OECD (2006, 30 March) Digital Broadband Content: Public Sector Information And Content, at
http://www.oecd.org/dataoecd/10/22/36481524.pdf and workshop of 31 May at
http://www.oecd.org/document/17/0,2340,en_2649_37441_36860241_1_1_1_37441,00.html
10
Updates on implementation available at
http://europa.eu.int/information_society/policy/psi/implementation/index_en.htm#psigroup
11
Lachman, Beth et al (2002) Lessons for the Global Spatial data Infrastructure: International Case Study
Analysis, Documented briefing, RAND Corporation.
12
A very narrow sense of ontology –in this context, it is a formal specification of how to represent objects.
13
Insert ontology explanation
14
See http://www.aktors.org/people/
15
See Darlington, John, Jeremy Cohen, William Lee (undated, mimeo) An Architecture for a Next-Generation
Internet based on Web Services and Utility Computing, London e-Science Centre
16
Berners-Lee, Tim (2006) Presentation to Terra Future conference, 19 September, at
http://www.w3.org/2006/Talks/0919-os-tbl/
17
For a list of organisations interviewed, see Appendix 2.
18
http://www.oft.gov.uk/Business/Market+studies/commercial.htm
19
Available at: http://www.opsi.gov.uk/ACTS/acts2000/20000036.htm
20
Rights embodied in Directive 95/46/EC and the Data Protection Act 1998.
21
Privacy concerns include risks arising from re-use by other parties with whom data subjects may not have the
above-described ‘informed consent’ relation. See specifically ARTICLE 29 Data Protection Working Party
(2003) Opinion 7/2003 on the re-use of public sector information and the protection of personal data,
10936/03/EN at http://ec.europa.eu/justice_home/fsj/privacy/docs/wpdocs/2003/wp83_en.pdf
22
See in this regard the over-arching policies described in:
Hampton, P. (2005) Reducing administrative burdens: effective inspection and enforcement, HM
Treasury, London at http://www.hm-treasury.gov.uk/media/A63/EF/bud05hamptonv1.pdf
Gershon, P. (2004) Releasing Resources for the Frontline: Independent Review of Public Sector
Efficiency, HM Treasury, London at
http://www.hm-treasury.gov.uk/spending_review/spend_sr04/associated_documents/spending_sr04_efficiency.cfm
Cabinet Office (2005) Transformational Government: Enabled by Technology at
http://www.cio.gov.uk/documents/pdf/transgov/transgov-strategy.pdf#search=%22transformational%20government%22
Cabinet Office/Prime Minister’s Strategy Unit with Department for Trade and Industry (2005)
Connecting the UK: The Digital Strategy, at
http://www.dti.gov.uk/files/file13434.pdf#search=%22connecting%20britain%20the%20digital%20strategy%22
23
See European Commission SEC (2005) 791 Impact Assessment Guidelines, update 15 March 2006.
24
Spatial Information Repository.
25
See Askew, D. (2004) SDI Creation at a Thematic and Organisational Level; Experiences from the UK,
presented at 10th EC GI & GIS Workshop, ESDI State of the Art, Warsaw, Poland, 23-25 June, at
http://Www.Ec-Gis.Org/Workshops/10ec-Gis/Papers/24june_Askew.Pdf#Search=%22defra%20spire%22
26
Askew (2004) at 10.
27
KCL did some of this – add URL FELIX to supply mid-October
28
Cite URL and research paper October FELIX
29
cite NAO ‘joint targets’ report.
30
Including those of Trading Funds, such as Ordnance Survey (OS) and Meteorological Office (‘Met Office’).
31
Kelly, Frank (2006) Data and innovation – the case for experimentation, Journal of the Foundation for
Science and Technology 19:2, at 14-15
32
See Harlow, Carol (1997) “Back to Basics: Reinventing Administrative Law”, Public Law 245-261
33
Assuming no dissolution of Parliament before the end of the 2008-9 session.
34
We do not set out a ‘blueprint’ for the lab – for an example of such a report, see Towers Perrin (2001)
Report for Regulatory Steering Group: Ofcom Scoping Project, at
http://www.ofcom.org.uk/static/archive/Oftel/publications/about_oftel/2001/towe1001.pdf
35
On motivations see Kingdon, J. (1984) Agendas, alternatives and public policies. Boston: Little Brown.
36
Tullo, Carol (2006) Unlocking the potential of public sector information, Public Servant, October, at p35.
37
See eGov Monitor (2006) Data Sharing in Public Sector - Resolving the Conundrum, 11 September, at
http://www.egovmonitor.com/node/7533
38
CIO Council (September 2006) tabled paper, Information and Knowledge Management Strategy: Overall
framework – outline of approach
39
Note the coordination role of the Coordination of Research and Analysis Group (CRAG):
http://www.gsr.gov.uk/gsr_network/crag_members.asp
40
See Hood, C. (2006) Chapter 22: The Tools of Government in the Information Age, in Goodin, Robert E.,
Michael Moran, and Martin Rein (eds) Handbook of Public Policy, Oxford University Press, Oxford.
41
On information economy research in particular, see Melody, W. H. (1996) The strategic value of policy
research in the information economy, in Dutton, William H. (ed.) (1996) Information and communication
technologies: Visions and realities, 303-317. London: Oxford University Press.
42
See Barr, J. (2006) Web Services 2.0: Best Practices for Extreme Reuse, paper given to WWW2006
conference 23-26 May, at http://www2006.org/programme/item.php?id=d12
43
Such transitions need not be an unmitigated loss; they can provide ‘intelligent’ partners for public-private
initiatives, disseminate good practice and mobilise competitive forces that increase efficiency and bolster
demand for (and benefits from) data mashing in wider markets.
44
This concern was raised by one interviewee.
45
Paul David OII on e-science and patent pooling for basic research
46
An excellent source on the paramountcy of economic analysis by Treasury in British central government is
Dunleavy, P. (1989) Paradoxes of an Ungrounded Statism, Chapter 7 in Castles, F.G. (ed) The Comparative
History of Public Policy, Polity Press, Cambridge, at 265-266.
47
See Weiss, P. (2002) Borders in Cyberspace: Conflicting Public Sector Information Policies and their
Economic Impacts, US Department of Commerce, and comments of Mike Liebhold at OS Terra Future
conference, Southampton, 19 September 2006, for instance.
48
Alternate claims that the BBC Backstage cost £150,000 were dismissed by interviewees as external costs
rather than the internal BBC costs in staffing, overheads, IT support etc. necessary for the project. See
http://backstage.bbc.co.uk/
50
This might borrow from OGC practice with an initial provision of good practice examples and resources and
leading on to a roadmap or reference for data mashing initiatives. Adherence to such a roadmap might also
provide a solid basis for inter-organisational initiatives.
51
This refers to an open forum for joint authoring: the DML could support Wikis on institutional, economic,
technical and other cross-cutting areas to assure that engagement is sustained and that valid ‘peripheral’ outputs
are produced.
53
IP issues include: i) sharing returns for commercial data mashing products; ii) valuation of legacy intellectual
property as opposed to products of collaborative or parallel activity; and iii) the distinction (if any) among
rights to information, compendia (per se database protection), interface and other software, rights to specific
uses or channels of distribution, etc.
54
The liability issues include financial liability to third-party rights holders and liabilities arising as a result of
the development and exploitation of data mashing products – for instance, liability for incompleteness, error,
etc. The situation is complicated by legal issues (e.g. the extent to which public information can be relied on
for different purposes), but derives its force from the potential economic consequences.
55
Alternatives range from the sort of ‘internal stock or options markets’ used by large firms such as General
Motors to reallocate research funding to specifically designed auctions for rights to contribute to or exploit
joint products once their characteristics have been clarified. Such mechanisms may be needed to prevent
distortion of data mash product design by strategic cooperative and effort incentives.
56
If, for instance, the owner of the ‘winning’ interface, etc. has lower costs of contributing to or exploiting the
final product or is able to claim a greater share of the joint proceeds.
57
Katz and Shapiro (1986).
58
HM Treasury (2002) Selling Into Wider Markets: A Policy Note for Public Bodies, December.
59
See Cross, Michael (2006) National Archives squares the data circle, Technology Guardian, 14 September at
3, describing CEO Natalie Ceeney’s plans to enter into PPP arrangements to digitise census data, and the use
of search techniques using technology such as the Autonomy IDOL server.
60
Taken from http://www.transportdirect.info
61
Ritchie, Felix (2006) 11 July, presentation to Work and Pensions Economics Group, D.1 Topic: The ONS
Session. Restricted and Government Datasets for Research Use: The Practitioners Corner, at
http://www.york.ac.uk/res/wpeg/refereeing2006/papers20006/RItchie.ppt
62
http://www.statistics.gov.uk/about/bdl/
63
Reference, VML Annual Report 2005/6 at p
64
Barker, Anna (2006) 11 July, presentation to Work and Pensions Economics Group, D.1 Topic: The ONS
Session. Restricted and Government Datasets for Research Use: The Practitioners Corner at
http://www.york.ac.uk/res/wpeg/refereeing2006/papers20006/Barker.ppt
65
Data mashing may be defined as a website or web application that uses content from more than one source
to create a completely new service (see:
http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29)
66
Incomplete versions that function sufficiently to demonstrate a proof of principle and demonstrate that it
can be evolved into something useful in the foreseeable future (see:
http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s03.html)
67
For example Google labs (see: http://labs.google.com/)
68
The upper echelons of management are not necessarily best placed to identify the goals and potential benefits
(or the risks) of data mashing.
69
See BBC Backstage as an example of an open mash-up arena (http://backstage.bbc.co.uk/)
70
http://www.mysociety.org/
71
Expressions of interest and support for the forum have been received from the London e-science centre
(http://www.lesc.ic.ac.uk/index.html), Advanced Knowledge Technologies (www.aktors.org), University of
Southampton and the Cambridge-MIT Unit (http://www.cambridge-mit.org/cgi-bin/default.pl).