nagarjuna@outlook.com

HIVE INSTALLATION PROCEDURE


Start by downloading the most recent stable release of Hive from one of the Apache download
mirrors: http://www.apache.org/dyn/closer.cgi/hive/. This guide uses the hive-0.7.1 release of the
Hive distribution.

1. Installation:
1.1 Requirements:
a. Sun JDK 1.6.x must be installed.
b. Hadoop-0.20.x distribution must be installed.

1.2 Installing hive:


First, unpack the tarball with the following command:
$ tar -xzvf hive-0.7.1-bin.tar.gz
This creates a directory named hive-0.7.1-bin. Set the environment variable HIVE_HOME to
point to the installation directory:
$ cd hive-0.7.1-bin
$ export HIVE_HOME=$PWD
To start the Hive command line interface (CLI) from the shell:
$ bin/hive
The installation is complete once you see the Hive command prompt, shown below:
hive>

2. Hive usage with examples


2.1 DDL Operations

Copyright 2012 nagarjuna@outlook.com

Creating tables:
In Hive, a table is created with the following command:
hive> create table <tablename>(list_of_attributes_with_their_types);
For example, suppose we have details about a key performance indicator (KPI) on service
fulfillment. We can create a Hive table for it as shown below:
hive> create table kpi(id int,fulfillment_duration int,time_unit int,order_fulfillment_startdate
string,order_fulfillment_enddate string,order_id bigint,order_type string,month string,year int)
row format delimited fields terminated by ',' stored as textfile;
On success, the command shell prints OK followed by the time taken.

Displaying list of tables:


Tables can be listed with the following command:
hive> show tables;
The shell prints the name of each table, one per line.

Describing a table:
To see the description of a table in Hive, use the following command:
hive> describe kpi;
The output lists each column of the table along with its data type.

Dropping table:
A table can be dropped using the following command:
hive> drop table <tablename>;

2.2 DML Operations


Loading data from flat files:
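Suppose we have a file with the details regarding the KPI, say
KPI_SERVICEFULFILLMENTDURATION.csv.
Use the following command to load data into the Hive table from this flat file. The path is a
string literal and must be quoted. The local keyword tells Hive to read from the local file
system, and overwrite replaces any data already in the table.
hive> load data local inpath '/home/hadoop/KPI_SERVICEFULFILLMENTDURATION.csv'
overwrite into table kpi;
On success, the shell reports that the data was loaded into the table and prints OK.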
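A quick way to confirm that the load worked is to look at a few rows of the table. A minimal sketch, assuming the kpi table created above; the limit of 5 is an arbitrary choice, and the -- line is an annotation that can be left out when typing at the prompt:

```sql
-- Show the first few rows of kpi to check that the columns were split on ','.
select * from kpi limit 5;
```

If every column except the first comes back as NULL, the field delimiter in the file probably does not match the one given in the create table statement.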

Loading data from hdfs file:


To load data into a Hive table from a file that is already in HDFS, omit the local keyword. Say we
have KPI_SERVICEFULFILLMENTDURATION.csv in an HDFS directory named "in". Then use
the following command, again with the path quoted.
hive> load data inpath 'in/KPI_SERVICEFULFILLMENTDURATION.csv' overwrite into table
kpi;
Hive moves the file from its HDFS location into the table's storage directory and prints OK.

2.3 SQL Operations


SELECT
Selecting rows from a Hive table works the same way as in SQL. Note that string values, such as
the date below, must be quoted.
hive> select * from kpi where order_fulfillment_startdate='10-MAY-11';
Hive prints every row whose start date matches.

The results are not stored anywhere, but are displayed on the console.
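If the results do need to be kept, Hive can write them to a directory instead of printing them. A minimal sketch, assuming a hypothetical HDFS output directory /tmp/kpi_may_out that is not part of the original example; the -- line is an annotation only:

```sql
-- Hypothetical output path: any existing contents of this directory are replaced.
insert overwrite directory '/tmp/kpi_may_out'
select * from kpi where order_fulfillment_startdate='10-MAY-11';
```

Adding the local keyword (insert overwrite local directory ...) writes to the local file system instead of HDFS.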

AGGREGATIONS
SUM:
A sum over the rows that satisfy a condition can be computed as shown:
hive> select sum(fulfillment_duration) from kpi where year=2011;
Hive runs the query as a MapReduce job and prints the total as a single value.

COUNT:
The following example counts the number of rows in the table:
hive> select count(*) from kpi;

MIN, MAX & AVG:


These functions are used in the same way as SUM. Applied to the fulfillment duration column:


1. MIN
Finding minimum of the fulfillment duration.

2. MAX
Finding maximum of the fulfillment duration.

3. AVG
Finding the average of fulfillment duration.
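
The queries for these three cases follow the same pattern as the SUM example. A sketch, assuming the kpi table defined earlier; the -- lines are annotations and can be left out when typing at the prompt:

```sql
-- 1. Smallest fulfillment duration in the table
select min(fulfillment_duration) from kpi;

-- 2. Largest fulfillment duration in the table
select max(fulfillment_duration) from kpi;

-- 3. Average fulfillment duration (returned as a double)
select avg(fulfillment_duration) from kpi;

-- All three in a single pass over the data
select min(fulfillment_duration), max(fulfillment_duration), avg(fulfillment_duration) from kpi;
```

Computing the three values in one query avoids launching a separate job for each.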

GROUP BY
The GROUP BY clause is used in conjunction with the aggregate functions to group the
result set by one or more columns. For example:
hive> select month,count(*) from kpi where month='APR' and year=2011 group by month;
The output has one row per group; here that is a single row giving the count for APR.
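Grouping becomes more useful when the filter leaves more than one group. A sketch under the same assumptions (the kpi table above), dropping the month condition so that every month of 2011 gets its own row; the -- line is an annotation only:

```sql
-- One output row per month of 2011: the month, its row count, and its average duration.
select month, count(*), avg(fulfillment_duration)
from kpi
where year=2011
group by month;
```

Every column in the select list that is not inside an aggregate function, here month, must also appear in the group by clause.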

JOINS
Create two tables, persons and orders, in Hive as shown:
hive> create table persons(pid int,lname string,fname string,address string,city string) row format
delimited fields terminated by ',' stored as textfile;
hive> create table orders(oid int,orderno bigint,pid int) row format delimited fields terminated by
',' stored as textfile;
Load data into both tables as described in section 2.2.

The two tables can then be joined on pid, the column they have in common.
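A join on pid can be sketched as follows, assuming the persons and orders tables above. The selected columns and the aliases p and o are illustrative choices, not taken from the original example; the -- line is an annotation only:

```sql
-- Inner join: one output row for every order whose pid matches a person.
select p.lname, p.fname, o.orderno
from persons p join orders o on (p.pid = o.pid);
```

Persons without any orders do not appear in this result; a left outer join in place of join would keep them, with NULL in the orderno column.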

