
Big Data Project Questionnaire

Introduction

This survey has been developed jointly by the United Nations Statistics Division (UNSD) and the United Nations
Economic Commission for Europe (UNECE). Our goal is to provide an overview of active Big Data projects in Official
Statistics in order to facilitate a more informed discussion. The survey has two focuses: sharing broad information about
potential Big Data projects in the statistical community and sharing specific information about partnerships, data
sources, and tools.

This survey addresses individual projects, partnerships, data sources and tools. Please submit it multiple
times - once for each project.

For this survey we have adopted a fairly wide definition of "Big Data":

Big data are data sources with a high volume, velocity and variety of data, which require new tools and methods to capture,
curate, manage, and process them in an efficient way.

The UNECE working classification of types of big data may also help define the range of potential sources of big data
being considered. This is a working classification and is not expected to be complete, so if you find a missing area,
please let us know.

The survey is meant for projects at every stage of development. If your project is still in the idea phase we would like to
hear about it and the data sources and partnerships you are exploring. Just leave any area that is not relevant to you
blank.

At the start of the survey you will have a chance to let us know how widely you are able to share the submitted
information. At a minimum all information submitted will be shared between the survey authors and used in aggregate or
anonymous form at the upcoming International Conference on Big Data for Official Statistics in Beijing and in reports to
the UNECE High-Level Group for the Modernisation of Statistical Production and Services.

If you have any information you would rather send by email, or if you have a question, write to tradestat@un.org.
Questions may also be submitted online at the Big Data Inventory Q&A page. Thank you for your time and participation.

PLEASE NOTE: submission for this survey is online only. This PDF copy is
only for reference. Submit answers at:
https://www.surveymonkey.com/s/bigdataproject


ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 1


Organizational Information

Organization:
If there are multiple organizations, then the one leading the project.

Division:
If applicable, the division or subunit of the organization doing the work.

Country:

Point of contact:
Name:

Position:

E-mail:

Can we share your organization and project title publicly?

( ) Yes, you may share it publicly [published openly online]
( ) Yes, you may share it with organizations participating in the survey [published online behind password]
( ) No, do not share this information except in aggregate / anonymous form





Can we share the detailed information you submit?
Please be as open as possible. We are collecting this information primarily to help the wider official statistics
community have an informed discussion. If there are a few details you would like to keep confidential, you may
submit them by email instead of including them in this survey.

( ) Yes, you may share it publicly [published openly online]
( ) Yes, you may share it with organizations participating in the survey [published online behind password]
( ) No, do not share this information except in aggregate / anonymous form





Further comments:


ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 2


Project Information

Project title:
A descriptive title for the project or proposed project. If no official title has been chosen then something that
communicates the main idea.

Project status:
( ) Idea phase [skip to page 6]
( ) Proposed (in planning - not yet approved or funded)
( ) Approved (approved - not yet funded)
( ) Funded (approved and funded - not yet started)
( ) Ongoing (in execution phase)
( ) Completed



ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 3



Potential areas of use for this project:
Select all that apply.

[ ] Demographic and social statistics (including subjective well-being)
[ ] Economic and financial statistics
[ ] Environmental statistics
[ ] Information society / ICT statistics
[ ] Labour statistics
[ ] Mobility statistics
[ ] Price statistics
[ ] Tourism statistics
[ ] Transportation statistics
[ ] Vital and civil registration statistics





Other domains of official statistics

Would you qualify the project as:

( ) Exploratory / research
( ) Pilot with a goal of moving it to production if successful
( ) For the production of statistics
( ) Other (please specify)





ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 4


Project overview:
Include broad information about your project objectives and scope with an emphasis on the implications for
official statistics. Also indicate whether the project is primarily for research purposes or for production of
statistics based on Big Data.
1-3 paragraphs

ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 5


Project Information

Outcomes (for incomplete projects include project goals):


A summary of the results or desired results of the project with an emphasis on the implications for official
statistics. When discussing actual outcomes, please note how detailed the project output is, e.g. coordinate-level
(GPS), regional, or national information updated daily, monthly, or annually.
1-2 paragraphs

Most important lessons learned so far in the project:


These might have to do with methodological issues, project management, training personnel, how to get
funding, the technical tools used in the project, or something else entirely. Essentially, describe the largest
challenges you have faced so far and how you have overcome (or plan to overcome) them.
1-2 paragraphs

ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 6


Project Information

Future directions:
For completed projects: what are your next steps? For projects still in the early stages: discuss upcoming plans
and challenges. Detailed questions about partnerships and data sources appear later in the questionnaire.
1-2 paragraphs

ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 7


Partnerships

Do you have any partnerships with other organizations or data providers on this project?
The partnerships may still be in the very early stages.

( ) Yes
( ) No [skip to page 12]



ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 8


Partnerships

Please discuss any arrangements you have with your primary partner organization. If you have more than one partner on
this project please discuss them in the other comments space at the end of the partnerships section.

Name of partner:
If you do not wish to disclose the name, please supply a working label - e.g. "Partner - Mobile Phone Data
Provider".

Have you already discussed this partner when submitting information about a different
project?
There is no need to enter partner information again if you have already done so on another project - you may
leave the rest of this section blank. However, if there are details about the partnership that are specific to this
project, you may provide them.

( ) Yes (skip the partnerships section) [skip to page 12]
( ) Yes (do not skip)
( ) No

If yes, please specify the project title:

ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 9


Partnerships

Type of partner organization:
Select all that apply.

[ ] International Organization
[ ] Government
[ ] Commercial
[ ] NGO
[ ] Academia
[ ] Other (please specify)





Type of partnership:
Select all that apply.

[ ] Data provider
[ ] Data consumer / data aggregator (not first origin of data sources)
[ ] Design partner
[ ] Technology partner
[ ] Analytical partner
[ ] Other (please specify)





Current status of the partnership:
We understand that forming a partnership may not fit cleanly into these categories. Please include further
details if required in the 'Other comments' section below.

( ) In discussion
( ) Prototyping / Testing (some data partners allow this before a contract is signed)
( ) Contract in place
( ) Other (please specify)





ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 10


Are there any payments or financial arrangements with this partner?
( ) Yes
( ) No
( ) Not applicable / Do not wish to share





Details of the financial arrangements:

Other comments:
Please discuss the organizational arrangements and the history of the partnership if applicable. If you have
other partners on this project you may discuss them here.
1-2 paragraphs

ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 11


Data sources

Do you have any data sources for this project?

( ) Yes, we already had the data in our organization [skip to page 14]
( ) Yes, we have identified a new source and received the data [skip to page 14]
( ) Yes, we have a new source and are in discussions with the data provider to obtain the data
( ) Yes, we have identified a new source, but no discussion with the data provider has taken place
( ) No specific source has been identified yet





ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 12


Data sources & analysis (idea / discussion phase)

If there are sources that have been explored, but for which you still do not have data, please discuss
them here:

Please discuss your planned data analysis tools and skills:


For instance, are you considering using R, SAS, Python or other tool(s) for analysis? What tools are you
already familiar with? What are you considering for the data store - local files, Hadoop, a NoSQL database, or a
traditional relational database? Is your preference to run this on your own infrastructure, or on external
infrastructure? Either way, what challenges do you face?

[SKIP TO PAGE 20 - FINAL COMMENTS]

ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 13


Data sources

Name of data source:

Have you already discussed this data source when submitting information about a
different project?
There is no need to enter the information again if you have already done so on another project - you may leave
the rest of this section blank. However, if there are details about the data source that are specific to this project,
you may provide them.

( ) Yes (skip the data sources section) [skip to page 19]
( ) Yes (do not skip)
( ) No

If yes, please specify the project title:

ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 14


Data sources

Data source description:


A brief description of the data source.

Type of Big Data:


Choose the most specific category that describes your data source.
List does not appear in PDF
See: http://www1.unece.org/stat/platform/display/bigdata/Classification+of+Types+of+Big+Data

Who is the provider of the data source?

What is the geographical scope of the data source?

( ) Local
( ) Regional
( ) National
( ) International
( ) Other (please specify)





ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 15


How granular is the information in the data source?
This should correspond to the unit of time used to mark individual records. For instance, a weather station might
have a timestamp associated with each observation, but in the data set from the provider the data may be
aggregated and averaged by hour. If multiple levels of granularity are available, specify the most detailed and
describe the mix in the data description.

( ) Timestamp (seconds, milliseconds, or more specific)
( ) Minutes
( ) Hours
( ) Days
( ) Weeks
( ) Months
( ) Years
( ) Other (please specify)





How frequently are data source updates made available?
You may not consume each update, but the updates are made available for consumption. If the data source
falls between two categories, choose the higher-frequency category, e.g. a data source that posts updates every half
hour can be considered constant.

( ) Constantly
( ) Hourly
( ) Daily
( ) Weekly
( ) Monthly
( ) Quarterly
( ) Annually
( ) Nearly static (highly infrequent / no schedule)
( ) Other (please specify)





ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 16


Have you established automatic links for transmitting this data source (e.g. API, automatic
file download)?
( ) Yes
( ) No
( ) Other (please specify)





Links to the data source (if available):


If available include both the data source and a link to any data documentation. If there aren't public links but
you would like us to host the files please email tradestat@un.org.
Data (URL):

Documentation (URL):

Is this data source publicly available?

( ) Yes - accessible to everyone in an easy to use format (CSV, XML, JSON, API, Excel, etc.)
( ) Yes - accessible to everyone, but requires significant work to reformat (e.g. PDF, screen scraping, etc.)
( ) No - requires explicit permission and is not publicly posted





Are there any privacy and confidentiality issues related to this data source?
If yes, please provide details about how you have addressed those issues. For instance, did you remove
personal characteristics or change the geographic scope of the data? Was this done by you or by the provider?
Did this degrade the usefulness of the data for analysis?

( ) No
( ) Yes (please give details):






ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 17


Any other comments about this data source or data provider:
Some topics to consider addressing are...
- What were the largest limitations in working with this data source and how did you overcome them?
- What were the most useful levels of aggregation?
- What were the greatest challenges you had working with the data?

ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 18


Data analysis, tools and skills

Do you integrate traditional data sources with the new "Big Data" source discussed
above?
( ) No
( ) Yes (please give details):







In your project, what technologies, methods and tools did you use during the Big Data
processing life cycle?
e.g. the SVM implementation in Python/scikit-learn to identify likely tourists, and Hadoop / MapReduce for
pre-processing aggregation.

Hosting provider and/or partner:


Did you use a third party such as Amazon, deploy on your own servers, or share resources with a partner
organization? If you are comfortable sharing it, approximately how much did this cost?

ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 19


Final comments

Do you have any other comments you would like to share?


ONLINE SUBMISSION: https://www.surveymonkey.com/s/bigdataproject Page 20
