1. Installation:
1.1 Requirements:
a. Sun JDK 1.6.x must be installed.
b. Hadoop-0.20.x distribution must be installed.
nagarjuna@outlook.com
Creating tables:
In Hive, tables are created with the following command:
hive> create table <tablename>(list_of_attributes_with_their_types);
Consider the following example.
Suppose we have details about key performance indicators (KPIs) for service fulfillment. We can
then create a Hive table as shown below:
hive> create table kpi(id int,fulfillment_duration int,time_unit int,
order_fulfillment_startdate string,order_fulfillment_enddate string,order_id bigint,
order_type string,month string,year int)
row format delimited fields terminated by ',' stored as textfile;
The following output can be seen on the command shell:
Describing a table:
To see the description of any table in Hive, use the following command:
hive> describe kpi;
We should see the following output:
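For more detail than the column list, a variant of the same command can be tried. This is only a sketch: it assumes the kpi table created above exists, and the layout of the output depends on the Hive version.
hive> describe extended kpi;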
Dropping table:
A table can be dropped by using the following command:
hive> drop table <tablename>;
To load data into a Hive table from HDFS, we use the load data command. Say we have
KPI_SERVICEFULFILLMENTDURATION.csv in an HDFS directory named in. Then use the
following command, with the file path given as a quoted string:
hive> load data inpath 'in/KPI_SERVICEFULFILLMENTDURATION.csv' overwrite into table
kpi;
The following output can be seen
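If the file is on the local file system rather than in HDFS, the local keyword can be added to the same command. The sketch below assumes a hypothetical local path, /tmp/KPI_SERVICEFULFILLMENTDURATION.csv, which is not part of the original example. Because overwrite is left out, the rows are appended to the table instead of replacing its contents.
hive> load data local inpath '/tmp/KPI_SERVICEFULFILLMENTDURATION.csv' into table kpi;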
The results are not stored anywhere, but are displayed on the console.
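As a minimal sketch of such a query, the statement below prints a few rows of the kpi table on the console. The limit of 10 is an arbitrary choice for illustration.
hive> select * from kpi limit 10;
If the results do need to be kept, one option is to write them to an HDFS directory. The directory name out/kpi_2011 is hypothetical.
hive> insert overwrite directory 'out/kpi_2011' select * from kpi where year=2011;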
AGGREGATIONS
SUM:
Summations based on some condition can be done as shown.
hive>select sum(fulfillment_duration) from kpi where year=2011;
The output can be seen as follows.
COUNT:
The following example counts the number of rows in the table.
hive>select count(*) from kpi;
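Counting can also be restricted by a condition or limited to distinct values. Both statements below are sketches that assume the kpi table defined earlier.
hive> select count(*) from kpi where year=2011;
hive> select count(distinct order_type) from kpi;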
MAX:
Finding the maximum fulfillment duration.
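A query along the following lines should return the maximum. It is a sketch that follows the pattern of the SUM example and assumes the kpi table defined earlier.
hive> select max(fulfillment_duration) from kpi;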
AVG:
Finding the average fulfillment duration.
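A query along the following lines should return the average. It is a sketch that follows the pattern of the SUM example and assumes the kpi table defined earlier. A where clause such as year=2011 can be added to restrict the rows.
hive> select avg(fulfillment_duration) from kpi;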
Copyright 2012 nagarjuna@outlook.com
GROUP BY
The GROUP BY clause is used with aggregate functions to group the result set by one or more
columns. An example is shown below.
hive>select month,count(*) from kpi where month='APR' and year=2011 group by month;
The output can be seen as follows.
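To count the rows for every month of the year rather than for a single month, the month filter can be dropped so that each month forms its own group. This is a sketch that assumes the kpi table defined earlier.
hive> select month,count(*) from kpi where year=2011 group by month;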
JOINS
Create two tables, persons and orders, in Hive as shown:
hive>create table persons(pid int,lname string,fname string,address string,city string) row format
delimited fields terminated by ',' stored as textfile;
hive>create table orders(oid int,orderno bigint,pid int) row format delimited fields terminated by
',' stored as textfile;
The contents in both tables are .
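With both tables in place, a join on the shared pid column can be sketched as follows. The aliases p and o are introduced here for illustration only, and the query assumes that each row in orders refers to a person through its pid value.
hive> select p.lname,p.fname,o.orderno from persons p join orders o on (p.pid = o.pid);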