
A Seminar on

Introduction to Big Data
Ajay Kumar
PhD (CSE, part-time) Scholar
DIT University Dehradun
20-July-2015
Introduction to Big Data

Lots of data is being collected


Web data, e-commerce
Purchases at department/grocery stores
Bank/credit card transactions
Social networks
Medical records
Population registration
How much Data (is Big Data)?

Google processes 20 PB a day
Facebook has 2.5 PB of user data (~40 billion photos) + 15 TB a day
eBay has 6.5 PB of user data + 50 TB a day
Walmart handles more than 1 million customer transactions every hour

Size of data in:
Bit
Byte
Kilobyte (KB)
Megabyte (MB)
Gigabyte (GB)
Terabyte (TB)
Petabyte (PB)
Exabyte (EB)
Zettabyte (ZB)
Yottabyte (YB)
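
To put the quoted volumes on a common scale, here is a minimal Python sketch that converts between the units listed above. It assumes decimal (SI) prefixes, where each step is a factor of 1000; the slide does not say whether decimal or binary (1024) steps are meant. The helper names `to_bytes` and `convert` are illustrative only.

```python
# Unit ladder from the slide, from byte upward.
# Assumption: decimal (SI) prefixes, i.e. each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a quantity such as (20, "PB") to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, src, dst):
    """Convert a quantity from one unit to another."""
    return to_bytes(value, src) / to_bytes(1, dst)


# Google's quoted 20 PB a day, expressed as an average rate in GB per second.
google_gb_per_second = convert(20, "PB", "GB") / (24 * 60 * 60)
print(f"20 PB/day is about {google_gb_per_second:.0f} GB/s")
```

Expressing a daily volume as a per-second rate makes it easier to compare with network or disk throughput.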
Types of Data

Relational Data (Table/Transaction/Legacy Data)


Text Data (Web based)
Graph Data (social networks, semantic web, RDF)
Structured, semi-structured and unstructured data (XML)
Streaming data
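
As an illustration of the difference between structured and semi-structured data, the sketch below reads a small XML fragment with Python's standard `xml.etree.ElementTree` module and flattens it into table-like rows. The customer records are hypothetical and invented for this example; the point is that an optional element simply may be absent in XML, whereas a fixed relational schema needs an empty column for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML records, invented for this example. The second
# customer has no <email> element, which a fixed relational schema
# would have to represent as a NULL column.
XML = """
<customers>
  <customer id="1"><name>Asha</name><email>asha@example.com</email></customer>
  <customer id="2"><name>Ravi</name></customer>
</customers>
"""

root = ET.fromstring(XML)
rows = []
for customer in root.findall("customer"):
    rows.append({
        "id": int(customer.get("id")),
        "name": customer.findtext("name"),
        "email": customer.findtext("email"),  # None when the element is absent
    })
print(rows)
```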
What to do with these data?
Aggregation & Statistics
Data warehouse & OLAP
Indexing, Searching and Querying
Keyword based search
Pattern matching (XML/RDF)
Knowledge Discovery
Data mining
Statistical Modeling
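
Keyword-based search is commonly built on an inverted index, which maps each word to the documents that contain it. The following is a minimal single-machine sketch of that idea in Python, not a description of any particular search engine. The toy corpus and the function names `build_index` and `search` are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical toy corpus, invented for this example.
docs = {
    1: "big data needs new technologies",
    2: "data mining finds hidden patterns",
    3: "keyword search over big data",
}
index = build_index(docs)
print(sorted(search(index, "big data")))
```

Building the index costs one pass over the data; after that, a query only touches the entries for its own keywords instead of scanning every document.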
What is Big Data?
Big data is defined as a large amount of data that requires
new technologies and architectures to make it possible to
extract value from it through capture and analysis.
Big Data is characterized by four key attributes:
Volume (Data Quantity)
Variety (Data Types)
Velocity (Data Speed)
Value (Targeting focus)
Volume (Data Quantity)
Facebook generates 500 TB of data every day
A Boeing 737 generates 240 TB of flight data on a flight across the USA
Smartphones: the data they create and consume
Sensors, in their billions, constantly generate data about the
environment, location, video, data capture, etc.
Velocity (Data speed)
Click streams and ad impressions capture user behavior at
millions of events per second
High-frequency stock trading algorithms reflect market changes
within microseconds
Machine-to-machine processes exchange data between billions
of devices
Real-time systems keep massive log data for every record
Online gaming systems support millions of concurrent users
Velocity in Big Data deals with the speed of data coming from
various sources. This characteristic is not limited to the speed
of incoming data; it also covers the speed at which data flows
and is aggregated.
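
One way to make "events per second" concrete is a sliding-window counter over a stream of timestamps. The sketch below is a minimal single-process illustration, assuming that events arrive in time order; the class name `SlidingWindowCounter` and the sample timestamps are invented for this example.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen in the last `window` seconds."""

    def __init__(self, window=1.0):
        self.window = window
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event; timestamps must arrive in non-decreasing order."""
        self.timestamps.append(timestamp)
        self._expire(timestamp)

    def count(self, now):
        """Number of events in the half-open interval (now - window, now]."""
        self._expire(now)
        return len(self.timestamps)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()


counter = SlidingWindowCounter(window=1.0)
for t in [0.1, 0.2, 0.9, 1.05, 1.5]:
    counter.add(t)
print(counter.count(now=1.5))
```

Because old timestamps are discarded as the window moves, memory use depends on the event rate within the window, not on the total length of the stream.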
Variety (Data type)
Not only numbers and strings
But also geospatial data, 3D data, audio, video, unstructured text, log
files, social media
Data variety is a measure of the richness of data representation:
text, images, video, audio, etc. The data being produced does not belong
to a single category: it includes not only traditional data but also
semi-structured data from sources such as web pages, web log files,
social media sites, e-mail, and documents.

Value
Ads targeting
Value of data in terms of the opinions we express through likes,
dislikes, tweets, and blogs
Issues & Challenges in Big Data
Difficulties in managing large data:
data capture, storage, search, sharing, analytics, visualization, etc.
data encoded as video, images, audio, or text/numeric information
scalability, unstructured data, accessibility, real-time analytics, fault tolerance, and many
more.

Challenges in Big Data:


Privacy and security
Data Access and Sharing of Information
Analytical Challenges (which are the real data to deal with?)
Human Resources and Manpower
Technical challenges (fault tolerance, scalability, quality of data, heterogeneous data)
Explosion of data
Big Data Resource (who creates data)

Generating data: Users/Application
Resources of data: Mobile devices, Microphone, Reader/scanner, Sensors, Science facility, Program/software, Organization, Social media, Camera
Result: Big Data File (Large & Growing File)
Big Data Analytics

Examining large amounts of data


Extraction of appropriate information
Identification of hidden patterns and unknown correlations
Better business decisions, both strategic and operational
Effective marketing, customer satisfaction, increased revenue
Types of Tools Used in Big Data

Where processing is hosted: distributed servers (distributed computing), e.g. HDFS (Hadoop Distributed File System)
Where data is stored: distributed storage (e.g. Hadoop)
Programming model: distributed processing (e.g. MapReduce)
How data is stored and indexed: high-performance schema-free database (e.g. MongoDB)
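
The MapReduce programming model can be illustrated with the classic word-count task. The sketch below runs the map, shuffle, and reduce phases in a single Python process purely to show the data flow; it does not use the Hadoop API, and the function names and sample input are assumptions made for illustration. In a real cluster the map and reduce calls would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input, invented for this example.
lines = ["big data big value", "data velocity"]
print(word_count(lines))
```

Each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, which is the property that lets a framework distribute the work.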
Advantage of Big Data
Understanding and Targeting Customers
Understanding and Optimizing Business Process
Improving Science and Research
Improving Healthcare and Public Health
Optimizing Machine and Device Performance
Financial Trading (High-Frequency Trading)
Improving Security and Law Enforcement
Scalability and flexibility in storing data for processing (e.g. Hadoop)
Applications of Big Data (Research Areas)

Smarter healthcare
Traffic control
Manufacturing unit
Multi-channel sales
Telecom
Trading analytics
Search quality
Thank you!
