Introduction
This survey has been developed jointly by the United Nations Statistics Division (UNSD) and the United Nations
Economic Commission for Europe (UNECE). Our goal is to provide an overview of active Big Data projects in Official
Statistics in order to facilitate a more informed discussion. The survey has two aims: sharing broad information about
potential Big Data projects in the statistical community, and sharing specific information about partnerships, data
sources, and tools.
This survey addresses individual projects, partnerships, data sources and tools. Please submit it once for each
project.
This survey adopts a fairly broad definition of "Big Data":
Big data are data sources with a high volume, velocity and variety of data, which require new tools and methods to capture,
curate, manage, and process them in an efficient way.
The UNECE working classification of types of big data may also help define the range of potential sources of big data
being considered. This is a working classification and is not expected to be complete; if you find a missing area,
please let us know.
The survey is meant for projects at every stage of development. If your project is still in the idea phase, we would like to
hear about it and about the data sources and partnerships you are exploring. Leave blank any area that is not relevant
to you.
At the start of the survey, you will have a chance to tell us how widely you are able to share the submitted
information. At a minimum, all information submitted will be shared among the survey authors and used in aggregate or
anonymous form at the upcoming International Conference on Big Data for Official Statistics in Beijing and in reports to
the UNECE High-Level Group for the Modernisation of Statistical Production and Services.
If you have any information you would rather send by email, or if you have a question, email tradestat@un.org. Questions may
also be submitted online at the Big Data Inventory Q&A page. Thank you for your time and participation.
PLEASE NOTE: submission for this survey is online only. This PDF copy is for reference only. Submit answers at:
https://www.surveymonkey.com/s/bigdataproject
Organization:
If there are multiple organizations, then the one leading the project.
Division:
If applicable, the division or subunit of the organization doing the work.
Country:
Point of contact:
Name:
Position:
E-mail:
Yes, you may share it with organizations participating in the survey [published online behind password]
Further comments:
Project title:
A descriptive title for the project or proposed project. If no official title has been chosen then something that
communicates the main idea.
Project status:
Idea phase [skip to page 6]
Completed
Environmental statistics
Labour statistics
Mobility statistics
Price statistics
Tourism statistics
Transportation statistics
Future directions:
For completed projects: what are your next steps? For projects still in the early stages: discuss upcoming plans
and challenges. Detailed questions about partnerships and data sources appear later in the questionnaire.
1-2 paragraphs
Do you have any partnerships with other organizations or data providers on this project?
The partnerships may still be in the very early stages.
Yes [skip to page 10]
No [skip to page 12]
Please discuss any arrangements you have with your primary partner organization. If you have more than one partner on
this project, please discuss the others in the "Other comments" field at the end of the partnerships section.
Name of partner:
If you do not wish to disclose the name, please supply a working label - e.g. "Partner - Mobile Phone Data
Provider".
Have you already discussed this partner when submitting information about a different
project?
There is no need to enter partner information again if you have already done so for another project; you may
leave the rest of this section blank. However, if there are details about the partnership that are specific to this
project, you may provide them here.
No
International Organization
Government
Commercial
NGO
Academia
Type of partnership:
Select all that apply.
Data provider
Design partner
Technology partner
Analytical partner
In discussion
Prototyping / Testing (some data partners allow this before a contract is signed)
Contract in place
No
Other comments:
Please discuss the organizational arrangements and the history of the partnership if applicable. If you have
other partners on this project you may discuss them here.
1-2 paragraphs
Yes, we have a new source and are in discussions with the data provider to obtain the data
Yes, we have identified a new source, but no discussion with the data provider has taken place
If you have explored sources but do not yet have data from them, please discuss
them here:
Have you already discussed this data source when submitting information about a
different project?
There is no need to enter the information again if you have already done so for another project; you may leave
the rest of this section blank. However, if there are details about the data source that are specific to this project,
you may provide them here.
No
Regional
National
International
Minutes
Hours
Days
Weeks
Months
Years
Constantly
Hourly
Daily
Weekly
Monthly
Quarterly
Annually
No
Documentation (URL):
Yes - accessible to everyone, but requires significant work to reformat (e.g. PDF, screen scraping, etc.)
Are there any privacy and confidentiality issues related to this data source?
If yes, please provide details about how you have addressed those issues. For instance, did you remove
personal characteristics or change the geographic scope of the data? Was this done by you or by the provider?
Did this degrade the usefulness of the data for analysis?
No
Do you integrate traditional data sources with the new "Big Data" source discussed
above?
No
In your project, what technologies, methods and tools did you use during the Big Data
processing life cycle?
e.g. the SVM implementation in Python/scikit-learn to identify likely tourists, and Hadoop/MapReduce for aggregation
during pre-processing.