Professional Documents
Culture Documents
NSO.
Experimental Indicators in Sentiment and Mobility
The ESS Big Data Workshop 2016
Objective
To share the experience of INEGI in the use of Twitter as a
big data source.
Types of Big Data Sources
Meters and smart sensors. Traffic cameras, GPS devices, power
consume meters, IoT, smartwatches, smartphones, etc.
Social interactions. Conversations and publications on social
networks like Twitter, Facebook, FourSquare, etc.
Business transactions. Credit cards movements, scanned data,
cell phone records, etc.
Electronic files. Documents which are available in electronic
formats such as PDF files, websites, videos, audio, images, photos,
etc.
Broadcast media. Digital video and audio streamed on real-time
Types of Big Data Sources
Meters and smart sensors. Traffic cameras, GPS devices,
power consume meters, IoT, smartwatches, smartphones, etc.
Social interactions. Conversations and publications on social
networks like Twitter, Facebook, FourSquare, etc.
Business transactions. Credit cards movements, electronic
cash registers, cell phone records, etc.
Electronic files. Documents which are available in electronic
formats such as PDF files, websites, videos, audio, digital media
broadcasting
Broadcast media. Digital video and audio streamed on real-time
Process Followed by INEGI (until now)
Production process
Study Case
Tweets Minimal
Representation
Daily
Processing ~260 Millions
Free Access (4 a.m.) of Geo-Tweets
300 K ~130 Millions inside Mexico
Geo-Tweets ~ 2 Years and 8 Months ~ 24/7
Apache Spark
LOGSTASH
Clean & Sentiment
Location Query Analysis
https://twittercommunity.com/t/stream-filter-sample-size/30865
Software Stack (2016)
Tweet Structure
Tweets Map
Tweets Map
We found that
The JSON structure is easy to process (APACHE SPARK)
The content is text that we can examine to make the
sentiment analysis
Geographical coordinates can be used to filter the tweets
and obtain only those of interest (warning: not all the
tweets are geo referenced)
We can make mobility analysis, based in Tweets location
and time (a serendipity)
Exploring Big Data Base
Supervised Training
http://cienciadedatos.inegi.org.mx/animotuitero/
Python
Implementation
Sentiment Visualization (2015)
(Monthly Indicator)
{JSON:File}
http://www.inegi.org.mx/inegi/contenidos/investigacion/experimentales/animotuitero/default.aspx
Sentiment Visualization (Late 2016)
(Daily Indicator)
{NoSQL}
C#
{RESTful:API}
Mobility (Late 2016)
Integration of other sources
Big Spatial Join
(5 M Economic Units with +60 M Tweets)
https://github.com/syoummer/SpatialSpark
Some applications
Tourism Impact analysis of relevant
Migration news
Use of roads Mental health
Regional influence of big cities Misogynist/discriminatory
Mobility patterns language use
Business activity patterns SDG indicators?
Subjective wellbeing
Inequities impact
Collaboration
International
UNECE
ICHEC
UNSD
LAMBDoop
University of Pensylvania
National
KioNetworks
Dattlas
TecMilenio
INFOTEC
Centro Geo
CIDE
CIMAT
Sectur
Internal
INEGI General Directorates
Questions?
Abel.Coronado@inegi.org.mx
M.Sc. Abel Coronado
@abxda
Conociendo Mxico
01 800 111 46 34
www.inegi.org.mx
atencion.usuarios@inegi.org.mx