Customer 360 is a Customer Relationship Management (CRM) best practice that aims
to improve relationships with existing customers, find new prospective
customers, and win back former customers. One of the core tenets of CRM is to
have a holistic “360 degree” view of all customer interactions and information that is
easily accessible to the business or organization. Accented solutions operate on a
seamless platform that provides Customer 360 degree views of all applications and
business processes that interact with the customer, improving quality of service to
maximize customer satisfaction and increase revenue.
COMPONENTS USED
1. HDFS
2. HBASE
3. PIG
4. HIVE
WHAT IS HDFS?
The Hadoop Distributed File System (HDFS) is the primary data storage system used by
Hadoop applications. It employs a NameNode and DataNode architecture to
implement a distributed file system that provides high-performance access to data
across highly scalable Hadoop clusters.
HDFS is a key part of many Hadoop ecosystem technologies, as it provides a reliable
means of managing pools of big data and supporting related big data analytics
applications.
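The NameNode and DataNode split described above can be pictured with a small toy model: the NameNode keeps only metadata (which block lives on which node), while the DataNodes hold the blocks themselves. The sketch below is an illustration under stated assumptions, not HDFS's actual placement policy; the function name plan_blocks, the node names, and the round-robin rule are all hypothetical. The 128 MB block size and 3 replicas mirror common HDFS defaults.

```python
def plan_blocks(file_size, block_size, data_nodes, replication):
    """Return NameNode-style metadata: block index -> list of DataNodes.

    Toy round-robin placement for illustration; real HDFS also considers
    rack topology and node load.
    """
    if replication > len(data_nodes):
        raise ValueError("replication cannot exceed the number of DataNodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        # Start each block at a different node so replicas spread out.
        start = block % len(data_nodes)
        placement[block] = [
            data_nodes[(start + i) % len(data_nodes)] for i in range(replication)
        ]
    return placement


# Hypothetical cluster: a 300 MB file, 128 MB blocks, 4 DataNodes, 3 replicas.
layout = plan_blocks(300, 128, ["dn1", "dn2", "dn3", "dn4"], 3)
for block, nodes in layout.items():
    print(block, nodes)
```

Because every block is held by several DataNodes, losing one node in this model still leaves other copies of each block, which is the idea behind the reliability HDFS provides.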
Features of HDFS
It is suitable for distributed storage and processing.
Hadoop provides a command interface to interact with HDFS.
The built-in servers of the NameNode and DataNode help users easily check the
status of the cluster.
It provides streaming access to file system data.
HDFS provides file permissions and authentication.
HBase
What is HBase?
HBase is a distributed, column-oriented database built on top of the Hadoop file system.
It is an open-source project and is horizontally scalable.
Its data model is similar to Google’s Bigtable and is designed to provide quick random
access to huge amounts of structured data. It leverages the fault tolerance provided by
the Hadoop Distributed File System (HDFS).
It is the part of the Hadoop ecosystem that provides random, real-time read/write access to
data in HDFS.
Data can be stored in HDFS either directly or through HBase. Data consumers
read and access the data in HDFS randomly using HBase, which sits on top of
HDFS and provides read and write access.
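The random read/write access described above comes from addressing data by row key and column rather than scanning whole files. The sketch below is a minimal in-memory stand-in for that idea, assuming a "family:qualifier" column naming style; the class ToyTable, the row key cust-001, and the column names are hypothetical and this is not the HBase client API.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase-style table (illustration only)."""

    def __init__(self):
        # row key -> {"family:qualifier": value}
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write one cell, addressed by row key and column."""
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Read one row directly by its key, without scanning other rows."""
        return dict(self._rows.get(row_key, {}))


table = ToyTable()
table.put("cust-001", "profile:name", "A. Example")
table.put("cust-001", "accounts:savings", "yes")
row = table.get("cust-001")
print(row)
```

Grouping columns under families such as profile and accounts is what lets a single customer row collect data from several source systems, which suits a Customer 360 view.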
Features of HBase
HBase is linearly scalable.
It has automatic failover support.
It provides consistent reads and writes.
It integrates with Hadoop, both as a source and a destination.
It has an easy-to-use Java API for clients.
It provides data replication across clusters.
Features of Pig
Rich set of operators- It provides many operators to perform operations like join,
sort, filter, etc.
Ease of programming- Pig Latin is similar to SQL, so it is easy to write a Pig script if you
are good at SQL.
Optimization opportunities- Apache Pig optimizes the execution of its tasks
automatically, so programmers need to focus only on the semantics of the
language.
Extensibility- Using the existing operators, users can develop their own functions
to read, process and write data.
WORKING WITH PIG
Loading and Storing ‘demographics.csv’
Scanning ‘demographics.csv’
Loading and Storing ‘creditcard.csv’
Scanning ‘creditcard.csv’
Loading and Storing ‘depositaccount.csv’
Scanning ‘depositaccount.csv’
Loading and Storing ‘loanaccount.csv’
Scanning ‘loanaccount.csv’
Loading and Storing ‘savingsaccount.csv’
Scanning ‘savingsaccount.csv’
Loading and Storing ‘creditcardtrx.csv’
SUM
DISTINCT TRANSACTION TYPES
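The SUM and DISTINCT steps above can be sketched in plain Python to show what each one computes. This is only an illustration of the semantics, not the Pig script used in the project; the real layout of creditcardtrx.csv is not shown here, so the column names customer_id, trx_type and amount and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical sample rows standing in for creditcardtrx.csv.
SAMPLE = """customer_id,trx_type,amount
c1,purchase,120.50
c1,refund,20.00
c2,purchase,75.25
c2,purchase,10.00
"""

# Load: parse the CSV text into a list of records.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# SUM: group by customer and add up the amounts in each group.
totals = {}
for r in rows:
    totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + float(r["amount"])

# DISTINCT: keep each transaction type once.
trx_types = sorted({r["trx_type"] for r in rows})

print(totals)
print(trx_types)
```

In Pig the same result would typically come from grouping the relation and applying SUM to the grouped amounts, and from projecting the type column and applying DISTINCT.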
HIVE
What is HIVE?
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It
resides on top of Hadoop to summarize Big Data and makes querying and analysing
easy.
Hive was initially developed by Facebook; later the Apache Software Foundation took it
up and developed it further as open source under the name Apache Hive. It is used
by different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Features of Hive
It stores the schema in a database and the processed data in HDFS.
It is designed for OLAP.
It provides an SQL-like query language called HiveQL or HQL.
It is familiar, fast, scalable and extensible.
WORKING WITH HIVE
DATA ANALYSIS
1. SELECT COUNT(*) FROM reception WHERE deposittype IS NOT NULL;
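The query above counts only the rows of reception that have a deposit type recorded. The sketch below reproduces that behaviour with Python's built-in sqlite3 module as a stand-in for Hive, since the filter works the same way in standard SQL. The customer_id column and the sample rows are hypothetical; only the table name reception and the column deposittype come from the query.

```python
import sqlite3

# In-memory database standing in for the Hive table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reception (customer_id TEXT, deposittype TEXT)")
conn.executemany(
    "INSERT INTO reception VALUES (?, ?)",
    [("c1", "fixed"), ("c2", None), ("c3", "recurring"), ("c4", None)],
)

# Rows where deposittype is NULL are excluded by the WHERE clause.
(with_deposit,) = conn.execute(
    "SELECT COUNT(*) FROM reception WHERE deposittype IS NOT NULL"
).fetchone()

# Without the filter, COUNT(*) counts every row.
(all_rows,) = conn.execute("SELECT COUNT(*) FROM reception").fetchone()

print(with_deposit, all_rows)
conn.close()
```

Comparing the two counts shows how many customers in the table have no deposit account on record.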
Thank You