Advanced Training - Hive Installation and Usage

nagarjuna@outlook.com
HIVE INSTALLATION PROCEDURE

Start by downloading a stable release of Hive from one of the Apache download mirrors listed at
http://www.apache.org/dyn/closer.cgi/hive/
The examples in this guide use the hive-0.7.1 release of the Hive distribution.
1. Installation:
1.1 Requirements:
a. Sun JDK 1.6.x must be installed.
b. A Hadoop 0.20.x distribution must be installed.
1.2 Installing hive:

First, unpack the tarball with the following command:
$ tar -xzvf hive-0.7.1-bin.tar.gz
This creates a directory named hive-0.7.1-bin. Set the environment variable HIVE_HOME to point
to this installation directory:
$ cd hive-0.7.1-bin
$ export HIVE_HOME=$PWD
To start the Hive command line interface (CLI) from the shell:
$ bin/hive
The installation is complete once you see the Hive command shell prompt, shown below:
hive>
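If you prefer to launch the CLI from a script, the environment setup above can be captured in a small Python helper. This is a minimal sketch under stated assumptions: the helper name hive_env is hypothetical, prepending $HIVE_HOME/bin to PATH follows the Hive Getting Started guide rather than this document, and HADOOP_HOME (the guide's alternative to having the hadoop command on your PATH) is optional here because its location depends on your Hadoop 0.20.x install.

```python
import os


def hive_env(hive_home, hadoop_home=None, base_env=None):
    """Return an environment dict for running the Hive CLI (hypothetical helper).

    Sets HIVE_HOME (and HADOOP_HOME if given) and puts $HIVE_HOME/bin at the
    front of PATH, so the CLI can be started as plain 'hive' instead of bin/hive.
    """
    # Work on a copy so the caller's environment is never modified.
    env = dict(os.environ if base_env is None else base_env)
    env["HIVE_HOME"] = os.path.abspath(hive_home)
    if hadoop_home is not None:
        env["HADOOP_HOME"] = os.path.abspath(hadoop_home)
    hive_bin = os.path.join(env["HIVE_HOME"], "bin")
    old_path = env.get("PATH", "")
    env["PATH"] = hive_bin + os.pathsep + old_path if old_path else hive_bin
    return env


if __name__ == "__main__":
    # Run from the directory that contains hive-0.7.1-bin; this mirrors
    # "cd hive-0.7.1-bin; export HIVE_HOME=$PWD" without changing directory.
    env = hive_env("hive-0.7.1-bin")
    print("HIVE_HOME =", env["HIVE_HOME"])
    print("PATH starts with", env["PATH"].split(os.pathsep)[0])
```

With the returned dictionary, a call such as subprocess.run(["hive", "-e", "show tables;"], env=hive_env("hive-0.7.1-bin")) would run a single statement non-interactively, assuming Hadoop is reachable from that environment.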
2. Hive usage with examples

2.1 DDL Operations
Copyright 2012 nagarjuna@outlook.com
Creating tables:
In Hive, tables are created with the following command:
hive>create table <tablename>(list_of_attributes_with_their_types);
For example, suppose we have key performance indicator (KPI) data on service fulfillment.
We can create a Hive table for it as shown below:
hive> create table kpi(id int, fulfillment_duration int, time_unit int,
order_fulfillment_startdate string, order_fulfillment_enddate string,
order_id bigint, order_type string, month string, year int)
row format delimited fields terminated by ',' stored as textfile;
The following output can be seen on the command shell.
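Because kpi is declared with row format delimited fields terminated by ',', Hive reads each line of the table's text file, splits it on commas and assigns the pieces to the columns in declaration order. The Python sketch below models that mapping for the kpi schema; it is an illustration under simplifying assumptions, not Hive's own SerDe code, and the names parse_row and convert are hypothetical. As in Hive, a field that cannot be converted to its column type (including an int or bigint overflow) and a missing trailing field both come back as NULL, shown here as None.

```python
import re

# Column names and types from the kpi CREATE TABLE statement above.
KPI_SCHEMA = [
    ("id", "int"),
    ("fulfillment_duration", "int"),
    ("time_unit", "int"),
    ("order_fulfillment_startdate", "string"),
    ("order_fulfillment_enddate", "string"),
    ("order_id", "bigint"),
    ("order_type", "string"),
    ("month", "string"),
    ("year", "int"),
]

# Value ranges of Hive's 32-bit int and 64-bit bigint types.
INT_RANGES = {"int": (-2**31, 2**31 - 1), "bigint": (-2**63, 2**63 - 1)}
# Simplified rule for integer text: optional sign followed by ASCII digits.
INTEGER_TEXT = re.compile(r"[+-]?[0-9]+")


def convert(text, hive_type):
    """Convert one field; return None (NULL) if it does not fit the type."""
    if hive_type == "string":
        return text
    if not INTEGER_TEXT.fullmatch(text):
        return None
    value = int(text)
    low, high = INT_RANGES[hive_type]
    return value if low <= value <= high else None


def parse_row(line, schema=KPI_SCHEMA, delimiter=","):
    """Split one text line on the delimiter and map the fields onto the schema."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, hive_type) in enumerate(schema):
        # Missing trailing fields become NULL.
        row[name] = convert(fields[i], hive_type) if i < len(fields) else None
    return row


if __name__ == "__main__":
    # A made-up sample line in the layout of KPI_SERVICEFULFILLMENTDURATION.csv.
    print(parse_row("1,5,2,10-MAY-11,15-MAY-11,9001,NEW,MAY,2011"))
```

For example, parse_row("2,abc,2,10-MAY-11") yields None for fulfillment_duration, order_fulfillment_enddate and every later column. The delimited format also does not understand CSV quoting: a quoted field that itself contains a comma is split in two, both here and in Hive.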
Displaying the list of tables:

Tables can be listed with the following command:
hive> show tables;
The following output can be seen.
Describing a table:
To see the description of a table in Hive (its column names and types), use the following command:
hive> describe kpi;
The following output should be displayed.
Dropping a table:
A table can be dropped with the following command:
hive> drop table <tablename>;
2.2 DML Operations

Loading data from flat files:
Suppose the KPI details are in a local file named KPI_SERVICEFULFILLMENTDURATION.csv.
Use the following command to load the data from this flat file into the Hive table (the file path
is a string literal and must be quoted):
hive> load data local inpath '/home/hadoop/KPI_SERVICEFULFILLMENTDURATION.csv'
overwrite into table kpi;
The following output can be seen.
Loading data from an HDFS file:

To load data into a Hive table from a file that is already in HDFS, omit the local keyword.
Say KPI_SERVICEFULFILLMENTDURATION.csv is in an HDFS directory named "in" (a relative path
is resolved against your HDFS home directory, /user/<username>). Then use the following command:
hive> load data inpath 'in/KPI_SERVICEFULFILLMENTDURATION.csv' overwrite into table kpi;
The following output can be seen.
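The two LOAD DATA variants differ in what happens to the source file. With local, Hive copies the file from the local filesystem into the table's directory (by default under /user/hive/warehouse in HDFS); without local, the file is moved within HDFS, so it is no longer at in/ afterwards. In both cases the contents are not parsed or validated at load time, and overwrite first deletes the files already in the table. The Python sketch below models these semantics on a local disk, with a plain directory standing in for the table's warehouse location; the function name load_data and the directory layout are hypothetical.

```python
import os
import shutil
import tempfile


def load_data(src_path, table_dir, local, overwrite):
    """Model LOAD DATA [LOCAL] INPATH ... [OVERWRITE] INTO TABLE (hypothetical).

    table_dir stands in for the table's warehouse directory. Nothing is
    parsed: the file is only copied (local) or moved (HDFS path).
    """
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        # OVERWRITE: delete the files already in the table first.
        for name in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, name))
    dest = os.path.join(table_dir, os.path.basename(src_path))
    # This sketch simply replaces a same-named file already in table_dir.
    if local:
        shutil.copy(src_path, dest)  # LOCAL: the source file is kept
    else:
        shutil.move(src_path, dest)  # HDFS path: the source file is moved
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "KPI_SERVICEFULFILLMENTDURATION.csv")
        with open(src, "w") as f:
            f.write("1,5,2,10-MAY-11,15-MAY-11,9001,NEW,MAY,2011\n")  # made-up row
        table = os.path.join(root, "warehouse", "kpi")
        load_data(src, table, local=True, overwrite=True)
        print(os.listdir(table), "source still exists:", os.path.exists(src))
```

Because nothing is checked at load time, a malformed file loads without error; problems only show up later as NULL values when the table is queried.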
2.3 SQL Operations

SELECT
Selecting rows from a Hive table works the same way as in SQL. The date is compared as a string,
so it must be written as a quoted string literal:
hive> select * from kpi where order_fulfillment_startdate='10-MAY-11';
The following output can be seen.
The results are not stored anywhere, but are displayed on the console.
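Because order_fulfillment_startdate is a string column, the date in the WHERE clause has to be a quoted string literal. Written without quotes, 10-MAY-11 is read as the arithmetic expression 10 - MAY - 11, and MAY is then looked up as a column that does not exist, so the query fails. The sketch below shows the difference using Python's built-in sqlite3 module as a stand-in SQL engine (SQLite, not Hive, so the error text differs from Hive's) and two made-up rows; the helper names make_kpi_db and run are hypothetical.

```python
import sqlite3

QUOTED = "select * from kpi where order_fulfillment_startdate='10-MAY-11'"
UNQUOTED = "select * from kpi where order_fulfillment_startdate=10-MAY-11"

# Made-up sample rows in the kpi column order, for illustration only.
SAMPLE_ROWS = [
    (1, 5, 2, "10-MAY-11", "15-MAY-11", 9001, "NEW", "MAY", 2011),
    (2, 3, 2, "11-MAY-11", "14-MAY-11", 9002, "NEW", "MAY", 2011),
]


def make_kpi_db(rows):
    """Build an in-memory SQLite stand-in for the kpi table."""
    conn = sqlite3.connect(":memory:")
    # SQLite's text type stands in for Hive's string type.
    conn.execute(
        "create table kpi(id int, fulfillment_duration int, time_unit int,"
        " order_fulfillment_startdate text, order_fulfillment_enddate text,"
        " order_id bigint, order_type text, month text, year int)"
    )
    conn.executemany("insert into kpi values (?,?,?,?,?,?,?,?,?)", rows)
    return conn


def run(conn, sql):
    """Return the result rows, or the error text if the engine rejects the query."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as err:
        return "error: %s" % err


if __name__ == "__main__":
    conn = make_kpi_db(SAMPLE_ROWS)
    print(run(conn, QUOTED))    # string comparison: matches the first row
    print(run(conn, UNQUOTED))  # 10 - MAY - 11: MAY is not a column
```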
AGGREGATIONS
SUM:
Sums over the rows that satisfy a condition are computed as shown:
hive> select sum(fulfillment_duration) from kpi where year=2011;
The output can be seen as follows.
COUNT:
The following example counts the number of rows in the table:
hive> select count(*) from kpi;
MIN, MAX & AVG:

The examples for each are as follows.

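1. MIN
Finding the minimum fulfillment duration:
hive> select min(fulfillment_duration) from kpi;
2. MAX
Finding the maximum fulfillment duration:
hive> select max(fulfillment_duration) from kpi;
3. AVG
Finding the average fulfillment duration:
hive> select avg(fulfillment_duration) from kpi;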
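All of these aggregates follow the usual SQL rules for NULL: sum, min, max and avg skip NULL values, count(*) counts every row, and count(column) counts only the non-NULL values of that column. With no non-NULL input, sum, min, max and avg return NULL rather than 0. This matters for rows whose fulfillment_duration could not be parsed and therefore reads as NULL. The plain-Python sketch below spells those rules out over one column's values, with None standing for NULL; the function name aggregate and the sample values are hypothetical.

```python
def aggregate(values):
    """Compute the SQL aggregates over one column; None plays the role of NULL."""
    non_null = [v for v in values if v is not None]
    return {
        "count(*)": len(values),      # every row, NULLs included
        "count(col)": len(non_null),  # only non-NULL values
        # With no non-NULL input, SUM/MIN/MAX/AVG yield NULL, not 0.
        "sum": sum(non_null) if non_null else None,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "avg": sum(non_null) / len(non_null) if non_null else None,
    }


if __name__ == "__main__":
    # Made-up fulfillment_duration values; None is a row that reads as NULL.
    durations = [5, 3, None, 10]
    print(aggregate(durations))
```

Note that avg divides by the count of non-NULL values (3 in the sample), not by count(*).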
GROUP BY
The GROUP BY clause is used with aggregate functions to group the result set by one or more
columns. For example, the following query counts the KPI records for April 2011:
hive> select month, count(*) from kpi where month='APR' and year=2011 group by month;
The output can be seen as follows.
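With Hadoop 0.20 as the execution engine, Hive compiles a GROUP BY query into a MapReduce job: the map phase applies the WHERE filter and emits the grouping column as the key, the shuffle brings all values with the same key to the same reducer, and the reducer applies the aggregate. The single-process Python sketch below mimics those three phases for the month/count(*) query; it is a conceptual model only (Hive can also pre-aggregate on the map side, which is omitted), and the function names and sample rows are made up.

```python
from collections import defaultdict


def map_phase(rows):
    """Apply WHERE month='APR' and year=2011, then emit (group key, 1)."""
    for row in rows:
        if row["month"] == "APR" and row["year"] == 2011:
            yield row["month"], 1


def shuffle(pairs):
    """Group emitted values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """count(*) per key: add up the 1s emitted for each month."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_month_count(rows):
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    rows = [  # made-up rows with only the columns the query touches
        {"month": "APR", "year": 2011},
        {"month": "APR", "year": 2011},
        {"month": "APR", "year": 2010},
        {"month": "MAY", "year": 2011},
    ]
    print(group_by_month_count(rows))
```

Because the WHERE clause keeps only April 2011 rows, there is a single group; removing month='APR' from the filter would produce one count per month.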
JOINS
Create two tables, persons and orders, in Hive:
hive> create table persons(pid int, lname string, fname string, address string, city string)
row format delimited fields terminated by ',' stored as textfile;
hive> create table orders(oid int, orderno bigint, pid int)
row format delimited fields terminated by ',' stored as textfile;
The contents of both tables are as follows.
The join of the two tables on pid is as shown.
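By default, Hive executes a plain join on pid as a reduce-side (common) join: each mapper tags its rows with the table they came from and emits pid as the key, the shuffle brings matching pids together, and the reducer pairs every persons row with every orders row that has the same pid. The Python sketch below models that flow in one process; the function name reduce_side_join and the sample rows are hypothetical, and map-side joins, which Hive can also use when one table is small, are not shown.

```python
from collections import defaultdict


def reduce_side_join(persons, orders):
    """Inner join persons and orders on pid, in map/shuffle/reduce style."""
    # Map phase: emit (join key, (table tag, row)) for both inputs.
    tagged = [(p["pid"], ("persons", p)) for p in persons]
    tagged += [(o["pid"], ("orders", o)) for o in orders]

    # Shuffle: bring all tagged rows with the same pid together.
    by_key = defaultdict(lambda: {"persons": [], "orders": []})
    for pid, (tag, row) in tagged:
        by_key[pid][tag].append(row)

    # Reduce: cross product of matching rows; a pid present in only one
    # table produces nothing (inner join). Keys are sorted only to make
    # the sketch's output deterministic.
    joined = []
    for pid in sorted(by_key):
        side = by_key[pid]
        for p in side["persons"]:
            for o in side["orders"]:
                joined.append({**p, "oid": o["oid"], "orderno": o["orderno"]})
    return joined


if __name__ == "__main__":
    # Made-up rows that follow the persons and orders column lists.
    persons = [
        {"pid": 1, "lname": "Smith", "fname": "Ann", "address": "1 Main St", "city": "Springfield"},
        {"pid": 2, "lname": "Jones", "fname": "Bob", "address": "2 Oak Ave", "city": "Shelbyville"},
    ]
    orders = [
        {"oid": 10, "orderno": 77895, "pid": 1},
        {"oid": 11, "orderno": 44678, "pid": 1},
        {"oid": 12, "orderno": 22456, "pid": 3},
    ]
    for row in reduce_side_join(persons, orders):
        print(row)
```

In the demo only pid 1 appears in both inputs, so it yields two joined rows; pid 2 (no orders) and pid 3 (no person) are dropped, as in an inner join.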

