
BIG DATA AND PREDICTIVE ANALYTICS

Chapter 1
INTRODUCTION

In recent years, Big Data Analytics has emerged as an important area of
interest among practitioners and academicians. The exponential growth of digital devices
and the penetration of the internet, tablet computers and smartphones are spawning large volumes of
data round the clock. Unlike traditional data, Big Data comes from a variety of data
sources in different forms. The volume, variety and velocity of this data pose unique
challenges for those managing data centers. Nevertheless, computing, storage and
analysis capabilities have caught up to meet these challenges, and storing large datasets
has become easy and economical.
Along with traditional business data, firms are realizing value from social media
data obtained from sites such as Twitter or Facebook. These media have shown
potential for gathering the business intelligence required to design competitive strategies.
In this paper, we describe different ways in which firms can derive intelligence
that helps business managers make informed decisions, which can translate into
improved ROI. This paper provides the conceptual underpinnings of Big
Data and Predictive Analytics, applications of Big Data Analytics, challenges and
opportunities, and further research directions. This field has great potential to address
future challenges for business and society, and it offers certain unique advantages
compared to statistical sampling methods.
The present paper is organized as follows. The next section (section 2) provides a
review of the extant literature on Big Data, sources of data and Predictive Analytics.
The third section presents the concepts of Big Data Analytics and then delves into
Predictive Analytics. The fourth section discusses the opportunities and challenges of
dealing with Big Data and Predictive Analytics. The fifth section presents the
conclusion, followed by future research opportunities in this field.
1.1 RESEARCH OBJECTIVES

Driven by the need to further explore the role of Big Data and Predictive
Analytics, this paper aims to bridge the knowledge gap by achieving the following
objectives:

Mayank Gupta – 1CR14IS055 Dept. of ISE – Fab – May 2019 Page 1



a) To explore the existing literature on the fundamental concepts of Big Data and
Predictive Analytics
b) To clarify the evolution and definitions of Big Data and Predictive Analytics
c) To explore the upcoming opportunities and challenges of Big Data and Predictive
Analytics
d) To identify gaps in existing research and suggest further research directions on the role
of Big Data and Predictive Analytics

1.2 RESEARCH METHODOLOGY

After defining the objectives of our research, we identified keywords such as “Big
Data”, “Big Data Analytics”, “Predictive Analytics”, “Social Media Analytics” and
“Twitter Analytics” for searching research papers in top journals, conference proceedings and
web sources. The search returned about 400 research papers. After an initial review of
these papers, we narrowed them down to more than a hundred research papers in the field of
Big Data and Analytics for further study. We also conducted a few short
interviews with practitioners working in this field to gain insights from industry. After
completing our study and analysis, we classified the research papers from different journals
in Table 3 (Annexure 1). The selected papers were published between 1992 and 2016, with
the majority published after 2013. In fact, more than 50% of the papers were
published in the last 3 years, which is a good indicator of the significance of this topic
(indicated in red in Figure 1 below). These papers were reviewed to collect and refine
various aspects of Big Data and Analytics.


Figure 1: Year-wise classification of research papers on Big Data and Predictive Analytics


Chapter 2
LITERATURE REVIEW
In this chapter, we describe the process of shortlisting research papers and
present a critical review of the extensive literature on Big Data and Analytics. Our
findings from the literature review are presented below.

2.1 CLASSIFICATION SCHEME OF LITERATURE

Annexure 1 summarizes the literature reviewed, as follows:

• Table 3 – Classification of research papers on Big Data and Predictive Analytics
• Table 4 – Summary of various definitions that give a conceptual understanding of Big Data
• Table 5 – Summary of the use of Big Data in operations and other domains

2.2 EVOLUTION OF INFORMATION SYSTEMS LEADING TO BIG DATA

Over the last several decades, information systems and the internet have been major
enablers of globalization. From the initial use of information systems for scientific
applications and departmental information systems, we have reached the era of “Smart
Phones” and the “Internet of Things”. The prime purpose of early information
systems was record keeping and efficient processing of business transactions. Since then,
several breakthroughs in computer science and engineering have led to an information
revolution. Chen et al. (2012) summarize this evolution into three distinct phases, as
shown in Table 1 below.

Phase | Description
I – Till 1999 | Database systems to collect, analyze and report structured data in RDBMS systems
II – 2000–2010 | Wide use of the Internet; entry and growth of internet firms such as Yahoo, Google and Amazon; web-based business applications, e-commerce, supply chains
III – 2010 onwards | Entry of smartphones, RFID, sensor technologies, Internet of Things

Table 1: Phases of Data Evolution (Chen et al., 2012)

The era of Big Data seems to have started around the year 2000. Several developments
and trends led to the evolution of Big Data, as depicted in Figure 2 below. Big Data and
Analytics are the natural outcome of this evolution, which was driven by
advancements in computing hardware, digital storage capabilities, high-speed software
solutions, and internet and mobile technologies.
When compared with traditional data, Big Data differs not only in size but also
in form. It is added continuously, unlike the relatively static data in legacy or
ERP systems (Davenport, 2014).

Figure 2: Evolution of Information Systems leading to Big Data (Davenport, 2014)

2.3 BIG DATA


There are several ways in which Big Data has been defined. Waller and Fawcett
(2013) define Big Data as datasets that are too large for traditional data processing
systems and therefore require new technologies to handle them. Chen et al. (2012)
define Big Data as the data sets and analytical techniques in applications that are so large and
complex that they require advanced and unique storage, management, analysis and
visualization technologies. Fan et al. (2014) consider it an explosion of available
information. Big Data cannot be defined by volume alone; it is also high in
velocity, diverse in variety, exhaustive in scope and relational in nature (Kitchin, 2014). In
short, Big Data refers to datasets of terabytes or petabytes created within
a few hours. Traditional database management technologies are unable to scale up
to meet the demands of storing, analyzing or managing such large volumes of continuous
data from a variety of sources. Figure 3 below gives an overview of how Big Data is
created from a variety of sources and how analytics can be performed to enable decision
making.


Figure 3: Overview of Big Data and Analytics

2.4 DIFFERENT DATA TYPES

These diverse sources generate different forms of data, which can be broadly
classified as structured, unstructured and semi-structured data (Figure 4).

Figure 4: Data types in Big Data

Structured Data: Sources of structured data are organizational information systems such
as point-of-sale systems, batch processes, ERP systems and extended enterprise systems such as
SCM and CRM systems. This data is organized into well-defined table structures in a
relational database. In traditional RDBMS environments, ETL tools and processes extract,
transform and load the data into a data warehouse.
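The extract, transform and load steps mentioned above can be sketched as three small functions. This is a minimal illustration only, assuming a hypothetical point-of-sale export in CSV form and an in-memory SQLite database standing in for the data warehouse; the table name `sales_fact` and all column names are illustrative, not taken from any particular system.

```python
import csv
import io
import sqlite3

# Hypothetical point-of-sale export (illustrative data only).
RAW_POS_CSV = """store,sku,qty,unit_price
S01,A100,2,9.99
S01,B200,1,24.50
S02,A100,3,9.99
"""


def extract(raw_csv: str) -> list[dict]:
    """Extract: read source rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: cast types and derive a revenue column."""
    out = []
    for r in rows:
        qty = int(r["qty"])
        price = float(r["unit_price"])
        out.append((r["store"], r["sku"], qty, round(qty * price, 2)))
    return out


def load(conn: sqlite3.Connection, records: list[tuple]) -> None:
    """Load: insert the cleaned records into a warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(store TEXT, sku TEXT, qty INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_POS_CSV)))

# Query the loaded table: total revenue per store.
revenue_by_store = dict(
    conn.execute("SELECT store, SUM(revenue) FROM sales_fact GROUP BY store")
)
```

In a real deployment the load step would target a dedicated warehouse rather than SQLite; the three-function split simply mirrors the ETL stages named in the text.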

Unstructured Data: Unstructured data originates from a variety of sources such as social
media, text messages, emails, attachments, videos, images and sound files. In terms of
volume and velocity it is huge, accounting for over 80% of large datasets. Analysis
and mining of this data is more challenging than for structured data.

Semi-structured data: Semi-structured data is a mix of structured and unstructured data
that originates from a variety of sources. Various information systems are used for speed,
efficiency and accuracy of information exchange with stakeholders. Firms use emails for
communication, RFID technology for faster processing in logistics (Deng et al., 2010) and
sensor devices for tracking objects. Thus there are several sources of semi-structured data:
emails, XML documents, server logs, communication logs from RFID tags, GPS devices,
etc. Some of their fields, such as IP address, date and time stamp and user information, are
structured, whereas others, such as error messages, SQL statements and event logs, are in
unstructured text formats. Hadoop, HDFS (Hadoop Distributed File System) and
MapReduce provide a technological framework to process large volumes of unstructured
data.
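To make the structured/unstructured split in a semi-structured record concrete, the sketch below parses a hypothetical server-log line: the IP address, timestamp and user become structured fields, while the trailing message stays as free text. The log layout and the regular expression are assumptions made for illustration; real log formats vary widely.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical log layout: IP, [timestamp], user=<name>, then free text.
LOG_PATTERN = re.compile(
    r"^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"\[(?P<ts>[^\]]+)\] "
    r"user=(?P<user>\S+) "
    r"(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Split a log line into structured fields plus an unstructured message."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # the line does not follow the assumed layout
    fields = match.groupdict()
    # Convert the timestamp text into a typed (structured) value.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S")
    return fields


record = parse_log_line(
    "10.0.0.7 [2019-05-02 14:03:11] user=alice ERROR: SQL timeout on SELECT * FROM orders"
)
```

The structured fields can then be loaded into tables, while the `message` text is left for text analytics.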

2.5 CHARACTERISTICS OF BIG DATA


Big Data is primarily characterized by three Vs: Volume, Velocity and
Variety. The HACE theorem proposed by Wu et al. (2014) suggests that Big Data is
characterized by its heterogeneity, complexity, decentralization and autonomous nature.
More recently, Fosso Wamba et al. (2015) added Veracity and Value.

Volume: The volume of Big Data is very large; terabytes or petabytes of data are collected
within a few hours in business or social media databases. The amount of data
is doubling every 40 months (Davenport, 2014), and the number of mobile devices is increasing
at unprecedented rates. John Chambers of Cisco predicts that there will be over 40
billion wireless devices connected to the internet within another 5 years (Embry, 2015). John
Sculley, a well-known business leader and former CEO of Apple, foresees four exponential
technologies converging at high speed to create the next generation of the digital age, namely
cloud computing, the Internet of Things, Big Data and mobile (Embry, 2015). The convergence
of these four key technologies will lead to ever higher volumes of data at exponential rates.
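As a worked example of the doubling rate cited above, assume steady exponential growth with a doubling time of 40 months (an idealization; actual growth is not this smooth). With V_0 the current data volume and t measured in months:

```latex
V(t) = V_0 \cdot 2^{t/40}
\qquad\Rightarrow\qquad
\frac{V(12)}{V_0} = 2^{12/40} = 2^{0.3} \approx 1.23,
\qquad
\frac{V(120)}{V_0} = 2^{120/40} = 2^{3} = 8
```

Under this assumption, a 40-month doubling time corresponds to roughly 23% growth per year and an eightfold increase over ten years.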

Velocity: Data is accumulating at unprecedented rates in both traditional
enterprise systems and social media. Walmart estimates that 260 million
customers visit its stores every week, generating revenue of more than 1,300 million dollars.
These sales transactions leave a huge data trail across its supply chain. Social media is
even faster in terms of data generation.

Variety: Big Data originates from a variety of sources: enterprise systems (such
as ERP), social media and many other digital devices. It includes text, video,
audio, location, date and time data, emails, sensor readings, RFID data, web application data, etc.
Data comes in different structured or unstructured formats depending on the source.

Veracity: Veracity refers to the accuracy of data, that is, understanding what percentage
of the data is accurate.

Value: Finally, businesses need to learn how to design models that improve
business outcomes and derive “value” from Big Data. Obtaining value from large
heterogeneous data is key to the success of any industry (Weber et al., 2014).
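One simple way to put a number on veracity, as described above, is to report the share of records that pass a set of validation rules. The sketch below assumes hypothetical sensor readings and two illustrative rules (a non-empty device ID and a plausible temperature range); real data-quality checks are domain specific and usually far more extensive.

```python
# Hypothetical sensor readings (illustrative data only).
readings = [
    {"device_id": "T-01", "temp_c": 21.5},
    {"device_id": "T-02", "temp_c": 999.0},  # implausible value
    {"device_id": "", "temp_c": 19.0},       # missing device ID
    {"device_id": "T-04", "temp_c": 22.1},
]


def is_valid(reading: dict) -> bool:
    """Apply the illustrative validation rules to one record."""
    has_id = bool(reading.get("device_id"))
    temp = reading.get("temp_c")
    plausible = temp is not None and -50.0 <= temp <= 60.0
    return has_id and plausible


def veracity_ratio(records: list[dict]) -> float:
    """Share of records that pass every validation rule."""
    if not records:
        return 0.0
    return sum(is_valid(r) for r in records) / len(records)


ratio = veracity_ratio(readings)  # 2 of 4 records pass
```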

2.6 BIG DATA VS TRADITIONAL DATA

Traditionally, business data resides within a well-defined relational database
management system. Over the last few decades, several large organizations have implemented
ERP systems to achieve operational efficiency and contain costs (Bharathi and
Mandal, 2015). These ERP systems or customized application systems collect large
volumes of transactional data round the clock and support operational and tactical
decisions in the short and medium term. The volume and velocity of this data depend on
business volumes. Information systems have been used as decision support systems for
various levels of management. Based on this need, information systems were used
internally and with business partners (customers, suppliers, dealers, etc.) to run the
business efficiently, so the scope of data collection was limited to this network of
immediate stakeholders. There have been situations in which larger volumes of data were
collected, for example the collection of census data by governments every 10 years (Kitchin,
2014). However, such cases were infrequent and there was no pressure to analyze
this data in real time; large datasets were moved to a data warehouse for analysis
over a period of time. Once the internet became the backbone of information and
communication, the data scenario changed dramatically on several fronts:

A. The arrival of the internet accelerated globalization and the growth of global firms,
as communication anywhere in the world became quick, economical and easy. The business
volumes of multinational companies have grown many-fold since the advent of the
internet. Business firms have further upgraded their information systems to record
every detail of business transactions within the boundaries of the firm and across the
supply chain. Even the Reverse Logistics (RL) business process is now mapped to information
systems, because of its importance in improving customer service (returns processing,
fixing defective or damaged goods), because some countries impose strong legislative
requirements on manufacturers to recover and/or dispose of products (Tiwari et
al., 2015), and because of its significance from a sustainability point of view
(Venkatesh et al., 2015). Retail firms such as Walmart record millions of customer
transactions every hour, which result in several petabytes of data within a few hours.

B. New forms of data in semi-structured format have emerged: websites, clickstreams,
weblogs, XML files, blogs, emails, etc. The further addition of social media data from
Twitter, Facebook, Instagram and LinkedIn has led to exponential growth in data
volume, speed and type.

C. With Big Data, data flow is continuous and comes from a variety of
sources, with no fixed source or structure. Facebook records billions of
posts and likes and millions of photo uploads every hour (Kitchin, 2014).

D. The volume, variety and velocity of data collection have far outstripped the capacity for
manual analysis, and in some cases even the capacity of conventional
databases. Analyzing such huge volumes of data requires a specialized technological
framework such as Hadoop, which is used by technology leaders such as Microsoft,
IBM and Oracle for managing Big Data (Chen et al., 2012).
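The MapReduce model that Hadoop uses splits a job into a map phase, a shuffle that groups intermediate keys, and a reduce phase. The sketch below simulates those three phases in plain Python on a few hypothetical text records, counting words. It is not the Hadoop API; it only illustrates the programming model that Hadoop distributes across a cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input records, e.g. short social media posts.
records = [
    "big data needs new tools",
    "predictive analytics uses big data",
]


def map_phase(record: str):
    """Map: emit (word, 1) for every word in a record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


word_counts = reduce_phase(
    shuffle_phase(chain.from_iterable(map_phase(r) for r in records))
)
```

On a real cluster, map tasks run in parallel on HDFS blocks and the framework performs the shuffle across machines; the logic of each phase stays the same.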

Table 2 below compares traditional data and Big Data:

Parameter | Traditional Data | Big Data
Structure of data | Structures are defined | Mix of structured, semi-structured and unstructured data
Data volume | Based on business volumes and extent of digitization | Very high, in petabytes and even more
Variety of data sources | Data sourced from database systems | Besides data from business information systems: text (emails, documents), weblogs, sensors, RFID, etc.
Velocity | Low to moderate, based on volume of business | High velocity
Flow | Fixed | Continuous, round-the-clock accumulation of data
Structured data | Structured data | Structured, semi-structured and unstructured data
Sources of data | Organizational data, trading partners' data | Organizational data, RFID, sensor data, Google searches, social media (LinkedIn, Facebook, Twitter, WhatsApp, etc.)
Analytics | Provide historical view, status reports | Real-time, direct feedback from the consumer, sentiment analysis, opinions
Functions | Advises senior executives on internal business decisions, focused on analyzing data for | Customer-facing functions get direct market feedback, which can be used for planning market strategies, planning, etc.

2.7 RESEARCH GAPS

A Pareto chart derived from the classification of key literature on Big Data Analytics
(BDA) (Figure 5) indicates that both industry and academic scholars have conducted studies
to tap the potential of BDA. There is no dearth of general articles explaining the
relevance, significance, challenges and opportunities of BDA.
Recent studies have investigated ways in which supply chain managers can
mine and derive value from BDA on a mix of structured and unstructured data (Zhong et
al., 2015; Kitchin, 2014; Chae, 2015; Tan et al., 2015; Schoenherr and Speier‐Pero, 2015;
Hahn and Packowski, 2015; Sahay and Ranjan, 2008; Nair, 2012) or how social media
data can provide competitive intelligence or play a role in brand promotion strategy (Kim et
al., 2016; He and Xu, 2016; Coursaris et al., 2016; Borra and Rieder, 2014; Bell, 2012).
Analytics studies have also been conducted in the domains of HR (Lawler et al.,
2004), World Class Sustainable Manufacturing (Dubey et al., 2015), Process Analytics
(Vera-Baquero et al., 2015), Product Lifecycle Management (Li et al., 2015) and Cloud
Computing (Hashem et al., 2015). However, we found no study that attempts to
understand the role of Big Data and Predictive Analytics and how they add
value across different sectors. This paper aims to address this gap through a review of
the literature and a discussion of the associated challenges, which also points towards
future directions in this field.

Figure 5: Domain Analysis of Research papers on Big Data Analytics


Chapter 3

BIG DATA ANALYTICS


(Proposed system)

Big Data Analytics has its roots in earlier data analysis methodologies that use
statistical techniques such as regression and factor analysis. It includes data mining of
high-speed data streams and sensor data to obtain real-time analytics (Chen et al., 2012). It is
an interdisciplinary field that draws on computer science, data science,
statistics and mathematical models. It is a systematic process of capturing and
analyzing business data and developing a model either to explain a phenomenon
(Descriptive Analytics), to predict future outcomes based on variable
inputs (Predictive Analytics) or to optimize or simulate outcomes
based on variations in inputs (Prescriptive Analytics). It leverages statistical techniques
such as regression, factor analysis and multivariate statistics, and knowledge of mathematics
for developing equations (Dubey and Gunasekaran, 2015).
Levalle et al. (2010) conducted an exploratory study on Big Data analytics and the
path from insights to value. They reported that improving technology has led to the
collection of enormous amounts of data, and that researchers are still searching for
better ways to analyze this data to extract valuable information. Today, the main concern
is no longer only what happened or why it happened (descriptive analytics), but also
what is happening now and what is likely to happen in the future (predictive analytics),
and what actions should be taken to achieve optimal results (prescriptive analytics).
Therefore, business analytics can be classified into Descriptive, Predictive and
Prescriptive Analytics, as explained in Figure 6 below. We elaborate on Predictive
Analytics in the next section, considering its significance for various stakeholders in
society and business.
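The three categories can be contrasted on a toy example. The sketch below uses hypothetical weekly sales figures: the descriptive step summarizes history, the predictive step forecasts next week with a simple drift (average-change) model, and the prescriptive step picks the order quantity that minimizes an assumed cost of overstocking and understocking. The data, the forecasting method and the cost values are all illustrative assumptions.

```python
# Hypothetical weekly unit sales for one product (illustrative data only).
sales = [100, 110, 120, 130]

# Descriptive analytics: what happened?
average_sales = sum(sales) / len(sales)

# Predictive analytics: what is likely to happen?
# Drift forecast: last value plus the average week-over-week change.
changes = [b - a for a, b in zip(sales, sales[1:])]
forecast = sales[-1] + sum(changes) / len(changes)

# Prescriptive analytics: what should we do?
# Assumed costs per unit of unsold stock and per unit of unmet demand.
HOLDING_COST = 1.0
SHORTAGE_COST = 4.0


def stocking_cost(order_qty: int, demand: float) -> float:
    """Cost of ordering order_qty units when demand equals the forecast."""
    leftover = max(order_qty - demand, 0.0)
    shortfall = max(demand - order_qty, 0.0)
    return HOLDING_COST * leftover + SHORTAGE_COST * shortfall


candidate_orders = range(120, 161, 10)
best_order = min(candidate_orders, key=lambda q: stocking_cost(q, forecast))
```

With a single deterministic forecast, the best order simply matches predicted demand; prescriptive models earn their keep when demand is uncertain and asymmetric costs shift the best decision away from the point forecast.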


Figure 6: Framework for Predictive Analytics – adapted from Bose (2014)

3.1 PREDICTIVE ANALYTICS


Predictive Analytics is defined as the process of discovering meaningful patterns in
data using pattern recognition techniques, statistics, machine learning, artificial
intelligence and data mining (Abbott, 2014). Also referred to as Advanced Analytics, it
simply means the application of data analytics techniques to answer questions or solve
problems (Bose, 2014). It is a further progression of Business Intelligence (BI) and data
mining, combined with statistical techniques. Business Intelligence processes help
analyze internal and external data to enable business executives to make intelligent
decisions. In BI, the questions and variables are developed by experts in the field of study,
whereas in Predictive Analytics the selection of the model and relationships is data
driven (Abbott, 2014). It is a systematic analytical process in which a computer algorithm
finds patterns and underlying relationships between dependent and independent variables.
For regression models, it seeks the coefficients that minimize the errors in the model.
The process uses advanced information systems to go through several
iterative steps to find optimal solutions to the problem.
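For the regression case mentioned above, "optimum coefficients" usually means ordinary least squares: the intercept and slope that minimize the sum of squared errors. The sketch below fits a one-variable model in closed form on hypothetical advertising-spend and sales figures, then checks that nudging the slope increases the error. Practical predictive analytics tools handle many variables and often use iterative optimizers, so this is only a minimal illustration.

```python
# Hypothetical data: advertising spend (x) and sales (y), illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared errors of the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))


b0, b1 = fit_ols(x, y)
best_error = sse(x, y, b0, b1)
worse_error = sse(x, y, b0, b1 + 0.1)  # a perturbed slope fits worse
```

For this data the fitted line is approximately sales = 1.09 + 1.97 × spend.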
Process mining has emerged as a new research avenue for analyzing processes based
on event logs. It opens opportunities for conformance checking and for discovering new
processes in various fields such as healthcare, retail and banking (van der Aalst, 2012).
The Hadoop framework provides a solution for dealing with these analytics requirements.
Depending on the source and nature of the data, various analytics methods support
data mining and statistical analysis techniques.
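Process discovery techniques commonly start from the "directly-follows" relation in an event log: how often activity A is immediately followed by activity B within the same case. The sketch below derives that relation from a hypothetical event log of (case ID, activity, timestamp) tuples. It is only a first step toward process discovery, not a complete mining algorithm such as those described by van der Aalst.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity, timestamp), illustrative only.
event_log = [
    ("c1", "register", 1), ("c1", "check", 2), ("c1", "approve", 3),
    ("c2", "register", 1), ("c2", "check", 3), ("c2", "reject", 4),
    ("c3", "register", 2), ("c3", "check", 4), ("c3", "approve", 5),
]


def directly_follows(log):
    """Count how often one activity directly follows another within a case."""
    traces = defaultdict(list)
    # Order events by case and time so each trace is in execution order.
    for case_id, activity, _timestamp in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


dfg = directly_follows(event_log)
```

Comparing such a graph with the intended process is one simple form of conformance checking: an unexpected pair, such as "approve" directly after "register", would flag a deviation.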
Text Analytics techniques derive real-time and meaningful information from
unstructured data sources such as documents, emails, web pages and social media. They are
being applied in emerging areas such as sentiment analysis, opinion mining
and information extraction from text sources (Chen et al., 2012). In recent years,
sentiment analysis of social media data soon after a product launch has provided early
indicators of consumer feedback about the product.
As data on social media grows and contains valuable information for
business firms, governments and NGOs, it is being tapped to derive value. Because of its
large volume, continuous flow and variety, it requires a different process of data
collection and analysis to arrive at meaningful information. We discuss this
in detail in the next sub-section.

3.2 SOCIAL MEDIA ANALYTICS

Social media analytics is an emerging concept that is becoming part of mainstream
marketing strategy. It is based on social media data created on sites such as Twitter,
Facebook or WhatsApp, and is concerned with developing and evaluating informatics tools
and frameworks to collect, monitor, analyze, summarize and visualize social media data
(Zeng et al., 2010).
Big Data created on social media consists of text messages, songs, pictures,
videos, etc., through which people share information.
They often express their intention to purchase a product, request feedback, share their
service experience or post product reviews on social media. This data holds
valuable information for firms that can analyze and mine it. Figure 7 below
summarizes a generic process that consumers follow when purchasing products. It
illustrates how consumers research products using search engines and social media, and
how they contribute their views and opinions on social media before and after purchasing
the product. Technology enables firms to use this medium to support business strategies and
tactical decision making. We summarize some of the ways in which social media
analytics can help firms in the next sub-sections.


Figure 7 - Process of Product Purchase and content generation on social media

3.2.1 Sentiment Analysis


Some companies collect and analyze consumer sentiments expressed on social
media about their products or services. Sentiment can be analyzed using natural
language processing, sentiment lexicons or machine learning algorithms, and these methods
are applied in a wide variety of industries. Sites such as www.sentiment140.com provide
insights into customer sentiment on products or services. Google Analytics provides various
services, such as smarter advertising (data-driven, relevant and effective advertising) and
deep customer insights (capturing customer behavior across CRM, point of sale and call center).
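A lexicon-based approach, one of the methods listed above, scores a text by summing the polarity of the words it recognizes. The sketch below uses a tiny hypothetical lexicon and a simple one-word negation rule; production systems rely on much larger curated lexicons or trained models, and this is not the method used by any particular site or service.

```python
import re

# Tiny hypothetical sentiment lexicon: word -> polarity score.
LEXICON = {"love": 2, "great": 2, "good": 1, "slow": -1, "bad": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text: str) -> int:
    """Sum word polarities, flipping the sign of a word preceded by a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    return score


def label(text: str) -> str:
    """Map a numeric score to a coarse sentiment label."""
    s = sentiment_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

For example, `label("I love this phone, the camera is great")` returns `"positive"`, while `sentiment_score("not good")` returns `-1` because of the negation rule.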

3.2.2 Competitive Intelligence


In today’s highly competitive markets, business decision makers seek consumer
feedback about their own products as well as those of their rivals. Kim et al. (2016)
suggest using social media analytics to gather competitive intelligence about a firm’s
products and the products offered by competitors in the same market segment. Their
comparative analysis of Twitter data for two competing smartphones (Apple iPhone 6 and
Samsung Galaxy S5) over a period of time revealed a correlation between the sentiments
expressed on social media and the difference in market demand for the two products. They
investigated three metrics: a) social media volume, b) purchase intention and c) consumer
sentiment. This approach can help firms predict market sales
performance and estimate the gap between competing products. As a result, decision
makers can rapidly adjust market strategy and compensate for weaknesses relative to
rivals. Firms can tap intelligence from social media in various ways to build
competitive intelligence, helping the organization understand its suppliers,
competitors, environment and overall business trends.


Business intelligence obtained from social media can enable business analysts and
decision makers to develop market insights into consumer behavior, discover new
marketing ideas, improve customer satisfaction and finally improve ROI (Kim et al.,
2016).
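The kind of analysis Kim et al. (2016) describe can be approximated by computing a metric per product and correlating the gap between rivals with the gap in demand. The sketch below assumes hypothetical weekly figures (share of positive posts and units sold for two rival products) and uses the Pearson correlation coefficient; the numbers and the choice of Pearson correlation are illustrative assumptions, not the authors' data or exact method.

```python
from math import sqrt

# Hypothetical weekly metrics for two rival products (illustrative only).
positive_share_a = [0.62, 0.58, 0.65, 0.70, 0.66]
positive_share_b = [0.55, 0.57, 0.52, 0.50, 0.54]
units_sold_a = [1200, 1150, 1300, 1420, 1330]
units_sold_b = [1100, 1140, 1050, 1000, 1080]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Gap between the two products, week by week.
sentiment_gap = [a - b for a, b in zip(positive_share_a, positive_share_b)]
sales_gap = [a - b for a, b in zip(units_sold_a, units_sold_b)]
r = pearson(sentiment_gap, sales_gap)
```

A coefficient close to 1 indicates that the two gaps move together; on its own it does not show that sentiment drives sales.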

3.2.3 Marketing and Brand Promotion Strategy


More and more people are going online owing to the popularity of smartphones and tablet
computers, which is generating ever larger datasets and, in particular,
geo-location data (Bell, 2012). Consumers’ online presence for reading news, searching,
e-commerce and social media provides a huge window of opportunity for firms
to connect with them. Consumers typically research products and services on the internet
and social media before arriving at purchasing decisions. The extent of research depends
on the price and risk of purchasing the particular product or service. For example, for
high-price, high-risk purchases such as a home or an airline ticket, consumers tend to research
extensively, whereas for commodity items their research may be limited to a few
web searches. Firms can understand the nature of this activity, analyze the data
associated with it and plan their marketing and branding strategy accordingly. For high-price
products, where consumer purchase involvement is higher, firms are better off
providing information about products rather than using entertaining content for
advertisements on social media (Lally, 2007). For low-cost commodity items, firms
can use entertaining methods to attract consumer attention (Coursaris et al., 2016).

3.3 APPLICATIONS OF BIG DATA ANALYTICS

In today’s competitive environment, firms use business data and external
information to support tactical and strategic decisions. A firm’s ability to make quick
and informed decisions differentiates it from competitors in highly competitive
markets (Bose, 2008). As described earlier, predictive analytics and
social media analytics provide an opportunity to obtain first-hand market intelligence.
Consumers provide instant feedback about products, services or movies on
social media, which is a valuable source for firms to gather information about consumer
sentiments and opinions. Several organizations are tapping the value of Big Data
to improve customer satisfaction, track customer journeys to analyze customer
attrition or purchase decisions, identify supply chain risks, gather competitive intelligence
or make pricing decisions (Davenport, 2014). There are numerous case studies of
the effective use of Big Data Analytics, as summarized below:

a) Using historical sales data from hurricane-affected regions, Walmart’s CIO
could predict higher demand for certain products just ahead of Hurricane
Frances. MegaTelCo could predict customer churn and design strategies to minimize
it (Provost and Fawcett, 2013).

b) Film studios have found that tweets from the first showings of new films predict
the success of the films, as well as of their DVDs, with staggering accuracy.

c) When dealing with millions or billions of records, Big Data Analytics
can help discover patterns and problems such as new forms of customer churn,
business opportunities such as new customer segments and sales prospects, and
customer behavior through clickstreams (Russom, 2011).

d) Big Data is already being used by businesses for developing market intelligence, by
governments for designing policies, by politicians for designing political campaigns, by
medical practitioners for smart health management. Some of the emerging research
areas in this field are Big Data Analytics, Text Analytics, Network Analytics and
Mobile Analytics (Chen et al., 2012).


Chapter 4

OPPORTUNITIES AND CHALLENGES OF BIG DATA ANALYTICS
Big Data is characterized by massive sample size and high dimensionality (Fan et
al., 2014). Analyzing data with many dimensions, huge volume and
heterogeneity presents opportunities for competitive advantage as well as
challenges in dealing with these large and continuous data sets. We discuss these in detail
in this section.

4.1 OPPORTUNITIES
Just a few years ago, huge volumes of data were a technological problem; now they
present an opportunity (Russom, 2011). Big Data provides many opportunities and
competitive advantages. Early mover Amazon.com started collecting customer
information, preferences, purchase history, search history and book reviews. Based on
this data, it provides product recommendations that motivate customers to buy similar
or related products, improving the chances of additional purchases from the same customer.
Next-generation retailers will be able to track the behavior of individual customers and
develop models to predict that behavior or to identify influencers. Walmart uses a “Social
Genome” that tracks connections between people, products, brands and other related
entities. The Social Genome is used to make product recommendations to customers when
they are online or in store (Direction S., 2012). Big Data provides several advantages over
traditional methods of data collection such as drawing samples.
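The recommendation idea described above (suggesting products that other customers bought together) can be sketched with a simple item-to-item co-occurrence count. This is a minimal illustration under assumed data, not Amazon's or Walmart's actual method; the product names and the functions `build_co_occurrence` and `recommend` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_co_occurrence(baskets):
    """Count how often each pair of products appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        # set() ignores duplicates within a basket; sorted() gives a stable pair order
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(co, purchased, k=3):
    """Score products the customer has not bought by co-occurrence with items they have."""
    scores = Counter()
    for item in purchased:
        for other, count in co.get(item, {}).items():
            if other not in purchased:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]

# Hypothetical purchase baskets, one list per order
baskets = [
    ["novel", "bookmark"],
    ["novel", "bookmark", "reading lamp"],
    ["novel", "reading lamp"],
    ["cookbook", "apron"],
]
co = build_co_occurrence(baskets)
print(recommend(co, {"novel"}, k=2))
```

Production recommenders usually normalize such counts (for example, by product popularity) so that best-sellers do not dominate every list; the raw count is kept here for readability.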

4.1.1 Data Mining


Data mining is the analytical process of identifying patterns in datasets, which
helps in predicting future outcomes. Applied to Big Data, it can discover patterns within a
population and heterogeneities that cannot be detected with small-scale data (Fan et al.,
2014). Data mining is useful for extracting fine-grained information that is not readily
apparent. For example, it can identify the most profitable customers and those who are
the most probable candidates for churn, or monitor levels of customer satisfaction and
loyalty. This information helps in designing customer retention or cross-selling strategies
(Bose, 2014).
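As a rough illustration of the two examples in the paragraph above, the sketch below ranks customers by total profit and flags churn candidates. It deliberately substitutes a simple recency rule (no purchase in the last 90 days) for a mined churn model; the records, their layout, the threshold and the function names are all assumptions.

```python
from datetime import date

# Hypothetical transaction records: (customer_id, order_date, profit)
transactions = [
    ("C1", date(2019, 1, 5), 120.0),
    ("C1", date(2019, 4, 20), 80.0),
    ("C2", date(2018, 9, 1), 300.0),
    ("C3", date(2019, 4, 28), 40.0),
]

def summarize(transactions):
    """Total profit and most recent order date per customer."""
    summary = {}
    for cid, day, profit in transactions:
        total, last = summary.get(cid, (0.0, day))
        summary[cid] = (total + profit, max(last, day))
    return summary

def profitable_and_at_risk(transactions, today, inactive_days=90, top_n=2):
    summary = summarize(transactions)
    # Most profitable customers: rank by total profit, highest first
    top = sorted(summary, key=lambda c: summary[c][0], reverse=True)[:top_n]
    # Churn candidates (assumed rule): no order for more than `inactive_days` days
    at_risk = sorted(c for c, (_, last) in summary.items()
                     if (today - last).days > inactive_days)
    return top, at_risk

top, at_risk = profitable_and_at_risk(transactions, today=date(2019, 5, 1))
print(top, at_risk)
```

In this toy data the most profitable customer (C2) is also the churn candidate, which is exactly the case a retention strategy would target first.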

4.1.2 Large Sample size for Analytics


Big Data provides gigantic statistical samples, which improve the results of analytical tools
(Russom, 2011). In the sampling method, an observation that looks like an outlier in one
sample may in fact belong to a different subpopulation. This can lead to an invalid model
and incorrect results, because not all the parameters that explain variation in the group are
included. With Big Data, we are looking at the entire dataset, which includes all
subpopulations. This helps us better understand heterogeneity within the dataset and the
relationships between dependent and independent variables in a model. Moreover,
analytical tools are now capable of handling large volumes of data at reasonable cost
(Russom, 2011). When the data represents the entire population, a model based on it gives
more accurate information, leading to more accurate business decisions. Big Data
includes both enterprise systems data and unstructured data such as social media,
which provides real-time information from the ground, such as consumer feedback and live
information on weather and traffic. This complements information from enterprise
systems very well.
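The subpopulation argument can be illustrated with a small simulation. Under the assumed (hypothetical) spending distributions below, a random sample of 50 customers usually contains no more than one member of a 1% "premium" subgroup, so that member looks like an outlier, whereas the full dataset lets each subgroup be estimated separately. The group labels and the helper `group_means` are illustrative inventions.

```python
import random
import statistics

random.seed(7)

# Hypothetical population: 99,000 "regular" customers and 1,000 "premium" ones
# whose spend follows a different distribution (a distinct subpopulation).
population = ([("regular", random.gauss(50, 10)) for _ in range(99_000)]
              + [("premium", random.gauss(400, 40)) for _ in range(1_000)])

def group_means(records):
    """Count and mean spend for each subpopulation present in the records."""
    groups = {}
    for label, spend in records:
        groups.setdefault(label, []).append(spend)
    return {label: (len(v), statistics.mean(v)) for label, v in groups.items()}

sample = random.sample(population, 50)
print("sample:   ", group_means(sample))
print("full data:", group_means(population))
```

The sample output varies with the seed; the point is that the premium group is either missing or represented by a single point, while the full data gives both groups a reliable mean.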

4.1.3 Advancement of Technology


Big Data Analytics requires searching within large datasets and analyzing them for
correlations (Davenport, 2014). The challenges presented by large data sets have motivated
the development of new computational infrastructure and data storage methods. The
convergence of data science, statistics and applied mathematics is producing better
optimization algorithms that scale to large, high-dimensional datasets (Fan et al., 2014).
So far, technology has kept pace with the growth of data, offering better, cheaper storage
capacity and high-speed computing capability.
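One example of the scalable algorithms mentioned above is a one-pass computation that never holds the whole dataset in memory. The sketch below uses Welford's online algorithm for mean and variance; `read_in_chunks` is a hypothetical stand-in for reading a large file or distributed store block by block. It is one illustrative technique, not the specific methods surveyed by Fan et al.

```python
def streaming_stats(chunks):
    """One-pass count, mean and sample variance (Welford's algorithm)."""
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        for x in chunk:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)  # uses the already-updated mean
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance

def read_in_chunks(values, size):
    """Hypothetical stand-in for reading a large source one block at a time."""
    for i in range(0, len(values), size):
        yield values[i:i + size]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(streaming_stats(read_in_chunks(data, size=3)))
```

Because each value is touched once and only three numbers are kept, memory use stays constant however large the input grows; frameworks such as Hadoop apply the same idea across many machines.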

4.1.4 Real time


Big Data, whether from structured or unstructured sources, provides real-time
information to those who have the ability to derive value from that particular source of data.
There are numerous examples where real-time information has been used to solve
problems. Natural disasters have the potential to disrupt crucial supply chain links. The
impact is especially high in countries like India, where the logistics system is fragmented,
infrastructure is inadequate and the sector consists of many small and large players
(Rai et al., 2015; Bag and Anand, 2015). Data collected on geographical locations,
weather and developing natural disasters (storms, floods, earthquakes, etc.) can be applied
directly to real-time monitoring of supply chain risks, both proactively, to prevent
disruptions, and reactively, to investigate past events (Yin et al., 2016). The Government of
Singapore uses geolocation data from mobile phones to manage traffic during rush
hours: the data is used to predict real-time demand for transport services and to divert
taxis to the areas of the city where demand is high. Citizens get real-time updates
about traffic and weather conditions through social media and revise their travel plans
accordingly, while netizens post real-time updates about a variety of events on social
media. E-commerce companies such as Amazon and Flipkart make product recommendations
based on earlier purchases and search history, which leads to sales of additional products.
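The Singapore example rests on tracking demand as it happens. A minimal sketch of that idea is a sliding-window counter of ride requests per zone. The `DemandMonitor` class, the zone names and the ten-minute window are hypothetical, and events are assumed to arrive in timestamp order; this is not a description of the system Singapore actually uses.

```python
from collections import Counter, deque

class DemandMonitor:
    """Count ride requests per zone over a sliding time window (seconds).
    Assumes events arrive in non-decreasing timestamp order."""

    def __init__(self, window_seconds=600):
        self.window = window_seconds
        self.events = deque()    # (timestamp, zone) pairs still inside the window
        self.counts = Counter()  # zone -> number of requests inside the window

    def add(self, timestamp, zone):
        self.events.append((timestamp, zone))
        self.counts[zone] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Keep only requests with timestamp > now - window; drop the rest
        while self.events and self.events[0][0] <= now - self.window:
            _, old_zone = self.events.popleft()
            self.counts[old_zone] -= 1
            if self.counts[old_zone] == 0:
                del self.counts[old_zone]

    def hotspots(self, k=2):
        """Zones with the most requests in the current window."""
        return self.counts.most_common(k)

monitor = DemandMonitor(window_seconds=600)  # ten-minute window
for ts, zone in [(0, "CBD"), (60, "Airport"), (650, "CBD"), (700, "CBD"), (710, "Orchard")]:
    monitor.add(ts, zone)
print(monitor.hotspots())
```

The deque makes expiry cheap (old events leave from the front), so each new request costs roughly constant time, which is what makes this kind of monitoring feasible on a live feed.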

4.2 CHALLENGES WITH BIG DATA ANALYTICS


Due to its huge size and high dimensionality, Big Data poses some unique challenges.
Data collected at such enormous speed overwhelms most organizations. The main
challenges are related to its size, quality, noise accumulation, spurious correlation,
incidental endogeneity and measurement errors (Fan et al., 2014). It is important to note
that Big Data is driven by the huge amount of data produced every day and stored at a
lower cost than before. For an effective statistical procedure, it is essential to address
Big Data hurdles such as noise, heterogeneity, spurious correlation and inefficiency (Fan
et al., 2014).

4.2.1 Size and Quality of Data


The biggest challenge in handling Big Data is its size. Traditional data volumes can be
easily managed using Excel or other ETL tools in information systems, whereas Big Data
requires a specialized technology framework such as Hadoop for data management. As
Big Data is generated very fast, transporting and storing it requires advance planning
and investment in infrastructure. The second challenge relates to data quality, which can
be defined in terms of completeness, accuracy and timely availability of data. In a
traditional sampling process, a random sample is drawn from a large population to gather
detailed information, and statistical analysis is conducted on the sample. With Big Data,
the large datasets themselves require sophisticated statistical and computational methods
for analysis (Fan et al., 2014). Hazen et al. (2014) acknowledge the quality issues of Big
Data in supply chain management (SCM) and suggest interdisciplinary research to address
data quality problems in the context of SCM and DPB.
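The three quality dimensions named above (completeness, accuracy and timely availability) can be measured per batch of records. In this sketch the field names, the 24-hour freshness limit and the use of a plausibility rule as a stand-in for accuracy are all assumptions; true accuracy checks need a trusted reference source.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("order_id", "customer_id", "amount")  # hypothetical schema

def quality_report(records, now, max_age=timedelta(hours=24)):
    """Return the share of records that are complete, plausible and timely."""
    if not records:
        return {}
    total = len(records)
    # Completeness: every required field is present and not None
    complete = sum(all(r.get(f) is not None for f in REQUIRED_FIELDS) for r in records)
    # Accuracy proxy (assumed rule): amount is a number and not negative
    plausible = sum(isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
                    for r in records)
    # Timeliness: record was loaded no more than max_age before `now`
    timely = sum(now - r["loaded_at"] <= max_age for r in records)
    return {"completeness": round(complete / total, 2),
            "accuracy": round(plausible / total, 2),
            "timeliness": round(timely / total, 2)}

now = datetime(2019, 5, 1, 12, 0)
records = [
    {"order_id": 1, "customer_id": "C1", "amount": 250.0, "loaded_at": datetime(2019, 4, 28, 9, 0)},
    {"order_id": 2, "customer_id": None, "amount": 99.0, "loaded_at": datetime(2019, 5, 1, 10, 0)},
    {"order_id": 3, "customer_id": "C3", "amount": -5.0, "loaded_at": datetime(2019, 4, 27, 8, 0)},
    {"order_id": 4, "customer_id": "C4", "amount": 40.0, "loaded_at": datetime(2019, 5, 1, 11, 30)},
]
print(quality_report(records, now))
```

Reporting each dimension separately makes it clear which kind of problem (missing fields, implausible values or stale loads) dominates a given feed.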

4.2.2 High dimensionality


High dimensionality also brings spurious correlation: with so many variables, some of them
will appear to be related to the model outcome purely by chance and give misleading
information about their relationship with it. As a result, variables that do not explain the
variability in the outcome may get included in the model, so selecting variables and
developing constructs from high-dimensional Big Data is a challenge. A related issue is
incidental endogeneity, which also arises from high dimensionality: some of the many
variables may be correlated by chance with the model's residual noise, leading to
inaccurate results from the model. High dimensionality combined with large sample size
also creates issues such as heavy computational cost and algorithmic instability (Fan et al.,
2014).
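Spurious correlation can be demonstrated with a short simulation in which every feature is generated independently of the outcome, so any correlation is pure chance. The sample size of 60 and the feature counts are arbitrary, assumed values, and `max_spurious_correlation` is a hypothetical helper written for this illustration.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between an outcome and features generated
    independently of it, so every correlation found is pure noise."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        x = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(x, y)))
    return best

for p in (10, 100, 1000):
    print(p, round(max_spurious_correlation(n_samples=60, n_features=p), 2))
```

Because the same seed is reused, each larger run screens a superset of the smaller run's features, so the printed maximum can only stay the same or grow as dimensionality rises, even though no feature is truly related to the outcome.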

4.2.3 Reliability
Another challenge is the reliability of data. Much of the unstructured data is unreliable
and prone to outages and losses. The data comes from different sources such as social
media, smartphones, emails or text messages (Boyd and Crawford, 2012). In a
manufacturing environment, data comes from heterogeneous sources such as information
systems and a variety of sensors (Le and Pang, 2013). This makes the data mining process
quite intensive, as it requires mining through a large volume of unrelated data to arrive at
a small piece of relevant and meaningful information. The process can be compared to
finding a needle in a haystack.
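The needle-in-a-haystack step can be sketched as a filter that first discards unreliable (malformed) records from heterogeneous sources and then keeps only the few that are relevant. The feed, its per-source field names, the keyword list and the functions below are hypothetical; real pipelines would use schema validation and richer relevance models than keyword matching.

```python
import json

# Hypothetical raw feed mixing several sources; the text field name differs per source.
RAW_FEED = [
    '{"source": "twitter", "text": "Delivery late again, order 1182"}',
    '{"source": "email", "body": "Please update my billing address"}',
    '{"source": "sms", "message": "delivery arrived damaged"}',
    '{"source": "sensor", "temp": 21.4',  # truncated record (lost in transit)
    '{"source": "twitter", "text": "Great weather today"}',
]

TEXT_FIELDS = ("text", "body", "message")

def parse_record(raw):
    """Return (source, text), or None when the record is malformed or has no text."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    text = next((data[f] for f in TEXT_FIELDS if isinstance(data.get(f), str)), None)
    return (data.get("source", "unknown"), text) if text else None

def find_relevant(feed, keywords):
    """Keep well-formed records mentioning any keyword (keywords given in lowercase)."""
    hits = []
    for raw in feed:
        record = parse_record(raw)
        if record and any(k in record[1].lower() for k in keywords):
            hits.append(record)
    return hits

print(find_relevant(RAW_FEED, keywords=("delivery", "damaged")))
```

Only two of the five records survive: one is dropped as corrupted and two are well-formed but irrelevant, which mirrors the small yield described in the paragraph above.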

4.2.4 Completeness of Data


Even when a dataset is large and has several dimensions, it does not necessarily
represent the population. Social media, for example, is a source of large volumes of
unstructured data, but it represents only those who are active on that particular medium
and who express their opinions or participate in online debate on the topic. Some sites,
such as Twitter, provide access only to limited datasets filtered by certain criteria, which
are again a subset of the data. Researchers must be able to account for these biases when
interpreting the data (Behar and Gordon, 1996).
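One standard way to account for such coverage bias is post-stratification: weight each group's observed rate by that group's known share of the population. In the sketch below the age groups, counts and population shares are invented for illustration, and `poststratified_share` is a hypothetical helper. The correction only addresses imbalance on the chosen characteristic, not other differences between people who post online and those who do not.

```python
def poststratified_share(sample_counts, positive_counts, population_shares):
    """Reweight each group's observed rate by its known population share.
    Assumes every group in population_shares appears in the sample."""
    return sum(share * positive_counts[g] / sample_counts[g]
               for g, share in population_shares.items())

# Hypothetical social media sample in which younger users are over-represented
sample_counts = {"18-34": 800, "35+": 200}
positive_counts = {"18-34": 560, "35+": 60}     # users holding the opinion
population_shares = {"18-34": 0.4, "35+": 0.6}  # e.g. from census data (assumed)

naive = sum(positive_counts.values()) / sum(sample_counts.values())
adjusted = poststratified_share(sample_counts, positive_counts, population_shares)
print(round(naive, 2), round(adjusted, 2))
```

The naive estimate is 0.62, while the reweighted one is 0.4 × 0.7 + 0.6 × 0.3 = 0.46, showing how strongly an unrepresentative sample can shift a conclusion.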

4.2.5 Implementation of Analytics


Like any information systems project, implementing a Big Data Analytics project
within an organization has its own challenges: putting the required infrastructure in
place, high initial cost, changes to business processes (Bose, 2014) and the availability of
experienced data scientists.
The growing penetration of the Internet of Things, smartphones and cloud computing
technologies in industry and society will continue to generate even larger volumes of Big
Data. To leverage insights from Big Data, industry and academia will need experienced
professionals in Data Science and Predictive Analytics. This field requires both domain
knowledge and a broad set of quantitative skills such as statistics, forecasting,
optimization, simulation and probability (Waller and Fawcett, 2013). Training thousands
of data scientists and then translating their work into measurable business outcomes will
remain a challenge for academia and industry (Dyche, 2012).
Dealing with Big Data from a variety of sources (organizational legacy systems,
ERP, social media) and deriving value from it requires a clear strategy and careful
implementation. Data scientists with knowledge, experience and a track record need to be
on board, and initial projects must be monitored by top management. After the Big Data
strategy has been implemented successfully at a pilot site, further opportunities in the
business context can be identified.


Chapter 5
CONCLUSION AND FURTHER RESEARCH DIRECTION

Big Data emerged around the year 2000 and has been growing exponentially, fueled by the
digitalization of society and business firms. Technological revolutions such as the
internet, cloud computing, smartphones and the Internet of Things are powering data
generation. Big Data is defined by its volume, velocity and variety, as it is created
through both structured and unstructured sources.
Big Data Analytics is an interdisciplinary field that combines knowledge of data
science, statistics, mathematics and computer science. It can be further classified into three
subcategories based on the purpose of analysis. Predictive analytics primarily deals with
predicting or anticipating future outcomes by mining existing data. The data can be
sourced from business systems or social media; what matters is analyzing it to
understand “what can happen”. This ability to predict gives business firms immense power
to plan ahead of the competition. It can also give governments a window of opportunity for
advance planning in situations such as hurricanes or the spread of epidemics.
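As a minimal illustration of predicting “what can happen” from existing data, the sketch below fits a straight-line trend by ordinary least squares and extrapolates it. The monthly sales figures are invented, the function names are hypothetical, and a linear trend is only the simplest possible predictive model, not a recommendation for real forecasting work.

```python
def fit_trend(values):
    """Ordinary least-squares line y = a + b*t over t = 0..n-1 (needs n >= 2)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx            # slope: average change per period
    a = y_mean - b * t_mean  # intercept at t = 0
    return a, b

def forecast(values, steps):
    """Extrapolate the fitted trend `steps` periods beyond the last observation."""
    a, b = fit_trend(values)
    n = len(values)
    return [a + b * (n + h) for h in range(steps)]

monthly_sales = [100, 108, 115, 125, 131, 140]  # hypothetical history
print([round(v, 1) for v in forecast(monthly_sales, steps=3)])
```

Real predictive analytics adds seasonality, external drivers and uncertainty estimates, but the workflow is the same: fit a model to past data, then use it to anticipate future values.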
Social Media Analytics is an emerging field that showcases several ways in
which firms can derive value. It offers opportunities to obtain firsthand information
about their products from the market through sentiment analysis, to gather competitive
intelligence, and to promote their brands using social media as a platform. Firms that
tap into this medium effectively can gain a competitive advantage.
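Sentiment analysis, mentioned above as a source of firsthand product feedback, can be illustrated with the simplest lexicon-based approach. The tiny word list, the one-word negation rule and the `sentiment` function are assumptions made for brevity; practical systems use large curated lexicons or trained classifiers.

```python
import re

# Tiny illustrative lexicon (assumed); real lexicons score thousands of terms.
LEXICON = {"great": 1, "love": 1, "excellent": 1, "good": 1,
           "bad": -1, "poor": -1, "broken": -1, "terrible": -1}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    """Sum lexicon scores, flipping the sign of a word preceded by a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        s = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            s = -s
        score += s
    return score

reviews = ["Love this phone, great camera", "Battery is not good", "Screen arrived broken"]
for review in reviews:
    print(sentiment(review), review)
```

Aggregating such scores by product and by week is one simple way to turn a stream of social media posts into the market feedback described above.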
The nature of Big Data presents some unique challenges as well as opportunities.
As described above, Big Data can be collected, stored and analyzed through a systematic
process. Deriving insights from the ocean of data requires years of experience and
knowledge of the interdisciplinary field of Analytics. Business firms and governments
need to build infrastructure and a framework for this purpose. Like any strategic project,
an analytics project must be led from the top to succeed and deliver its advantage.
A few obstacles remain in managing large datasets, such as the quality and reliability of
data and the availability of knowledgeable data science professionals. Data size and
completeness, lack of business support, inadequate staffing and skills, and problems with
database software are other barriers to implementing a Big Data Analytics strategy. Big
Data also poses challenges such as data life cycle management, data redundancy,
analytical mechanisms, data confidentiality, energy management, cooperation and data
representation (Chen et al., 2014). However, with strong leadership and commitment,
these challenges can be addressed. Further, Big Data and Predictive Analytics provide
many opportunities for study, investigation and research in different fields.

5.1 FURTHER RESEARCH DIRECTIONS


In this section, we outline the scope for further research using Big Data and Predictive
Analytics that would contribute to the scientific community. We cannot afford to ignore
Big Data Analytics (BDA) and the insightful knowledge it provides about complex
phenomena (Kitchin, 2014), as it can reveal new patterns and correlations that were
previously unknown. Understanding these correlations helps in supporting executive
decisions that maximize value for the firm (Dyche, 2012). There are several opportunities
to search and analyze data, such as data from jet engines, RFID data from supply chain
partners, sensor data from digital devices, e-commerce transactions, business data or
social media data, to understand patterns and correlations in the respective fields.
Researchers have many such opportunities to analyze data using computers at a scale that
cannot be handled manually. As noted from Figure 2 above, opportunities exist in several
fields. The use of Big Data and Predictive Analytics can further help to address the
identified research gaps. Hence, we argue that future research in different domains should
embrace the BDA approach, using structured, semi-structured and unstructured data from a
variety of sources.


REFERENCES
[1] 3PLs Investing Heavily in Big Data Capabilities to Ensure Seamless Supply Chain
Integration. (2014). Retrieved January 26, 2016, from
http://www.supplychain247.com/photos/3pls_investing_heavily_in_big_data_capabilties/3

[2] Abbott, D. (2014). Applied Predictive Analytics: Principles and Techniques for the
Professional Data Analyst. John Wiley & Sons.

[3] Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender
systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on
Knowledge and Data Engineering, 17(6), 734-749.

[4] Alstete, J. W., & Cannarozzi, E. G. M. (2014). Big data in managerial decision-making:
concerns and concepts to reduce risk. International Journal of Business Continuity and
Risk Management, 5(1), 57-71.

[5] Gore, A. (2007). An Inconvenient Truth. Paramount.

[6] Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A., & Buyya, R. (2015).
Big Data computing and clouds: Trends and future directions. Journal of Parallel and
Distributed Computing, 79, 3-15.

[7] Automobile Industry in India. (2015). Retrieved November 2015, from
http://www.ibef.org/industry/india-automobiles.aspx

[8] Bag, S., & Anand, N. (2015). Modelling barriers of sustainable supply chain
network design using interpretive structural modelling: an insight from food processing
sector in India. International Journal of Automation and Logistics, 1(3), 234-255.

[9] Batra, S. (2014). Big Data Analytics and its Reflections on DIKW Hierarchy. Review
of Management, 4(1/2), 5.

[10] Berg, W. F., Carlin, J. D., Kalmbach, M. T., & Schroeder, M. D. (2015). U.S. Patent
No. 8,989,067. Washington, DC: U.S. Patent and Trademark Office.

[11] Bharathi, S. V., & Mandal, T. (2015). Prioritising and ranking critical factors for
sustainable cloud ERP adoption in SMEs. International Journal of Automation and
Logistics, 1(3), 294-316.

[12] Borra, E., & Rieder, B. (2014). Programmed method: developing a toolset for
capturing and analyzing tweets. Aslib Journal of Information Management, 66(3), 262-278.

[13] Bose, R. (2009). Advanced analytics: opportunities and challenges. Industrial
Management & Data Systems, 109(2), 155-172.

[14] Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a
cultural, technological, and scholarly phenomenon. Information, Communication &
Society, 15(5), 662-679.

[15] Bröhl-Kerner, H. (2008). Intelligent replacement: making optimal use of household
appliances. Refereed Sessions I-II, Monday 10 March, 277.


Appendix A – PowerPoint presentation
