By:
Surbhi Vyas (7)
Varsha (8)
Big Data
Big data is a term that describes the large volume
of data, both structured and unstructured, that
inundates (floods) a business on a day-to-day basis.
But it's not the amount of data that's important.
It's what organizations do with the data that
matters. It can be analyzed for insights that lead
to better decisions and strategic business moves.
An example of big data might be petabytes (1,024
terabytes) or exabytes (1,024 petabytes) of data
consisting of billions to trillions of records of
millions of people, all from different sources (e.g.,
web, sales, customer contact center, social media,
mobile data and so on).
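As a quick check on the unit definitions above, the sketch below converts between storage units assuming 1,024-based steps, as the slide does. (Strictly, IEC calls these binary units pebibytes and exbibytes; in decimal SI units 1 PB is 1,000 TB.) The `convert` helper and the unit list are illustrative only.

```python
# Binary storage units: each step up is a factor of 1,024,
# matching the definitions above (1 PB = 1,024 TB, 1 EB = 1,024 PB).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def convert(amount: int, from_unit: str, to_unit: str) -> int:
    """Convert between units, assuming 1,024-based steps and an integer result."""
    shift = UNITS.index(from_unit) - UNITS.index(to_unit)
    if shift >= 0:
        # Converting to a smaller unit: multiply.
        return amount * 1024 ** shift
    # Converting to a larger unit: integer division drops any remainder.
    return amount // 1024 ** (-shift)


print(convert(1, "PB", "TB"))  # 1024
print(convert(1, "EB", "TB"))  # 1048576
print(convert(1, "EB", "B"))   # 1152921504606846976
```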
Big Data History & Current Considerations
While the term big data is relatively new, the
act of gathering and storing large amounts of
information for eventual analysis is ages old.
Characteristics:
Volume: Organizations collect data from a variety
of sources, including business transactions, social
media and information from sensor or machine-to-machine
data. In the past, storing it would've
been a problem, but new technologies (such as
Hadoop) have eased the burden.
Velocity: Data streams in at an unprecedented
speed and must be dealt with in a timely manner.
Variety: Data comes in all types of formats from
structured, numeric data in traditional databases
to unstructured text documents, email, video,
audio, stock ticker data and financial transactions.
Big data has increased the demand for
information management specialists, so much so
that Software AG, Oracle Corporation, IBM, Microsoft,
SAP, EMC, HP and Dell have spent more than $15 billion on
software firms specializing in data management
and analytics.
In 2010, this industry was worth more than
$100 billion and was growing at almost
Who uses big data?
What is MapReduce?
Programming model used by Google
A combination of the Map and Reduce models with
an associated implementation
Used for processing and generating large data sets
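The programming model can be sketched in a few lines of single-process Python. This is not Google's implementation, which distributes the work across many machines; `map_reduce`, the record format and the sample data are hypothetical, chosen only to show how a user-supplied map function and reduce function fit together.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# A record is mapped to zero or more intermediate (key, value) pairs.
MapFn = Callable[[object], Iterable[Tuple[Hashable, object]]]
# A key and all of its values are reduced to a single result.
ReduceFn = Callable[[Hashable, List[object]], object]


def map_reduce(records: Iterable[object], map_fn: MapFn,
               reduce_fn: ReduceFn) -> Dict[Hashable, object]:
    """Single-process sketch of the MapReduce programming model."""
    # Map phase: apply the user's map function to every input record
    # and group the intermediate values by key.
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce phase: apply the user's reduce function once per key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical input: (source, bytes) pairs from different data sources.
records = [("web", 120), ("sales", 40), ("social media", 300),
           ("web", 80), ("mobile", 50), ("sales", 60)]

totals = map_reduce(
    records,
    map_fn=lambda rec: [(rec[0], rec[1])],       # emit (source, bytes)
    reduce_fn=lambda source, sizes: sum(sizes),  # total bytes per source
)
print(totals)  # {'web': 200, 'sales': 100, 'social media': 300, 'mobile': 50}
```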
MapReduce Overview
Map returns information.
Reduce accepts information.
Reduce applies a user-defined function to reduce the amount of data.
MapReduce workflow
[Figure: input data (Split 0, Split 1, Split 2) is read by Map workers, which write intermediate results locally; Reduce workers remote-read and sort those results, then write Output File 0 and Output File 1.]
Map: extract something you care about from each record.
Reduce: aggregate, summarize, filter, or transform.
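The workflow above can be simulated in one process to make the data movement concrete. This sketch assumes keys are assigned to reduce workers by hashing (hash(key) mod R, the default partitioning described in the original MapReduce design); `run_job`, `partition`, the log-line format and the two-reducer setup are hypothetical, and Python lists stand in for the local and remote files.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition(key: str, num_reducers: int) -> int:
    """Pick the reduce worker for a key (deterministic, unlike built-in hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, map_fn, reduce_fn, num_reducers=2):
    # Map phase: one map task per input split. Each task writes its output
    # "locally", already divided into one region per reduce worker.
    map_outputs = []
    for split in splits:
        regions = [[] for _ in range(num_reducers)]
        for record in split:
            for key, value in map_fn(record):
                regions[partition(key, num_reducers)].append((key, value))
        map_outputs.append(regions)

    # Reduce phase: reduce worker r "remote reads" region r from every map task,
    # sorts by key so equal keys are adjacent, and writes one output file.
    output_files = []
    for r in range(num_reducers):
        pairs = [pair for regions in map_outputs for pair in regions[r]]
        pairs.sort(key=itemgetter(0))
        output_files.append([
            (key, reduce_fn(key, [value for _, value in group]))
            for key, group in groupby(pairs, key=itemgetter(0))
        ])
    return output_files


# Hypothetical web-log records spread over three input splits.
splits = [
    ["GET /home 200", "GET /cart 200"],
    ["GET /home 404", "POST /cart 200"],
    ["GET /home 200"],
]

files = run_job(
    splits,
    map_fn=lambda line: [(line.split()[1], 1)],   # extract the requested path
    reduce_fn=lambda path, counts: sum(counts),  # aggregate: hits per path
)
merged = dict(pair for output in files for pair in output)
print(sorted(merged.items()))  # [('/cart', 2), ('/home', 3)]
```

Because every occurrence of a key is routed to the same reduce worker, each key appears in exactly one output file, so the files can be concatenated without further merging.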
Example: Word Count
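The word count example can be sketched in the line-oriented style of Hadoop Streaming, where a mapper emits tab-separated key/value lines, the framework sorts them by key, and a reducer sums runs of equal keys. Here the mapper and reducer are plain Python functions, and `run_locally` (a hypothetical helper) replaces the cluster with an in-memory `sorted`; lower-casing and whitespace splitting are simplifying assumptions.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word (lower-cased, split on whitespace)."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word; input must already be sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Local stand-in for a job: map, sort (the shuffle step), then reduce."""
    return list(reducer(sorted(mapper(lines))))


print(run_locally(["the quick brown fox", "the lazy dog", "The fox"]))
# ['brown\t1', 'dog\t1', 'fox\t2', 'lazy\t1', 'quick\t1', 'the\t3']
```

In an actual Streaming job, each function would live in its own script that reads `sys.stdin` and prints to `sys.stdout`, and Hadoop would perform the sort between them.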
Applications
[Figure: Google search engines, 1996 to 2017]
Framework of Hadoop
Flexible
Fast
Resilient to Failure
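One reason a MapReduce framework can be resilient to failure is that a failed map or reduce task can simply be re-executed, because the user's functions are deterministic and their output is only committed when a task succeeds. The sketch below simulates this with a hypothetical flaky worker; `run_with_retries`, `TaskFailed`, `make_flaky_map_task` and the limit of 4 attempts are assumptions for illustration, not Hadoop's API, and the sketch does not model Hadoop's replication of stored data.

```python
from typing import Callable, List


class TaskFailed(Exception):
    """Raised by a (simulated) worker that crashes mid-task."""


def run_with_retries(task: Callable[[], List], max_attempts: int = 4) -> List:
    """Re-execute a failed task until it succeeds or the attempts run out.

    Re-running is safe only because the task is deterministic and free of
    side effects: a retry produces exactly the same output.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TaskFailed:
            print(f"attempt {attempt} failed; rescheduling")
    raise RuntimeError(f"task failed after {max_attempts} attempts")


def make_flaky_map_task(split: List[str], failures: int) -> Callable[[], List]:
    """Hypothetical map task whose worker crashes on its first `failures` attempts."""
    state = {"remaining": failures}

    def task() -> List:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TaskFailed()
        # The actual map work: emit (word, 1) for every word in the split.
        return [(word, 1) for line in split for word in line.split()]

    return task


output = run_with_retries(make_flaky_map_task(["big data", "data"], failures=2))
print(output)  # [('big', 1), ('data', 1), ('data', 1)]
```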
Disadvantages of Hadoop
Security Concerns
Vulnerable by Nature
General Limitations
Companies Using Hadoop
Used as a source for reporting and machine learning.
Client Projects