
Experiments on Networking of Hadoop

Abdul Navaz
University of Houston
fabdulnavaz@uh.edu

Gandhimathi Velusamy
University of Houston
gvelusamy@uh.edu

Deniz Gurkan
University of Houston
dgurkan@uh.edu

Abstract: Hadoop is a popular application for analyzing and processing big-data problems on a networked, distributed system of computers. Performance investigations of application-aware networking have become of interest with the software-defined networking paradigm, which enables on-demand and dynamic policy enforcement. Characterizing the network usage of Hadoop jobs can further help identify which policy enforcements may be needed for particular application use cases, and at-scale experimentation with Hadoop jobs facilitates such characterization. We report how Hadoop network usage can be characterized in an experimentation environment built on GENI (Global Environment for Network Innovation). Furthermore, we report a distributed switch framework that can complement Hadoop's fault-tolerance mechanisms in the forwarding plane. The delay in recovering from failures is reduced by almost 99% with this distributed switch architecture deployed on the GENI experimentation environment.

Index Terms: Hadoop, Big Data, GENI, OpenFlow, LINC, Software Defined Networks (SDN)

I. INTRODUCTION
The Hadoop framework plays a very important role in processing today's massive volumes of data. Using available, inexpensive commodity servers, Hadoop provides the power to process petabytes of data efficiently. Hadoop makes such processing possible by following a divide-and-conquer principle on top of the Hadoop Distributed File System (HDFS): the overall task is divided into subtasks that operate on subsets of the data stored in HDFS, and the results are then combined by means of two processes, the mapper and the reducer. A Hadoop cluster is formed by interconnecting several commodity servers with a reliable network.
The main contribution of our work is twofold. First, this work studies and analyzes the various network messages exchanged between the nodes of a Hadoop cluster built on the GENI (Global Environment for Network Innovation) testbed. GENI is a virtual laboratory, sponsored by NSF to encourage innovation in networking, security, services, and applications, that provides computing and networking resources for experimenters to run their experiments at scale [1].
Second, a fault-tolerant switching system is used with the Hadoop framework in an attempt to augment the fault tolerance that Hadoop provides to its applications, by adding resiliency to the underlying network connecting the nodes of the cluster. Hadoop provides fault tolerance to its applications by replicating data on multiple nodes and by exchanging heartbeat messages between the Name Node (NN) and the Data Nodes (DNs) to verify that the DNs are alive. When a switch that connects a DN to the NN fails, the NN treats this as a DN failure because it no longer receives heartbeat messages from that DN. The NN waits for a certain threshold time interval, then uses the locally available data and recomputes the corresponding map outputs if the failed DN held any. The job therefore still completes, but at the cost of additional time. We propose a method for making the switches resilient to failure in order to improve performance.
The paper is organized as follows. Section II gives a brief overview of the Hadoop architecture. Section III describes the experimental setup and the network characterization of Hadoop. A fault-tolerant switching system with Hadoop is explained in Section IV. Finally, Section V concludes the paper.
II. OVERVIEW OF HADOOP ARCHITECTURE
HDFS is a distributed file system that stores and manages the data in a Hadoop cluster. Hadoop uses a computing framework called MapReduce to process large data sets. To provide data reliability, Hadoop replicates each data block on several Data Nodes.

Fig. 1. HDFS Architecture

A Hadoop cluster contains a single NN and several DNs. The NN is the master node and keeps track of all metadata: it holds the list of all blocks in every HDFS file and the list of DNs on which each block resides. The DNs are commodity hardware; they work independently of each other and store the data blocks [4].
The HDFS architecture is shown in figure 1. The MapReduce framework consists of two phases, Map and Reduce, and each phase generates key-value pairs. The nodes in the Hadoop cluster are categorized as Job Tracker and Task Trackers. The Job Tracker controls all the jobs in the cluster; a Task Tracker accepts tasks such as map, reduce, and shuffle from the Job Tracker and sends the progress of each task back to the Job Tracker.

III. STUDY OF THE NETWORK CHARACTERISTICS OF HADOOP
To study the network characteristics of Hadoop, we created a topology on the GENI testbed with 4 nodes (raw PCs), all belonging to the same rack of the Emulab aggregate and each running Hadoop version 1.1.2. The NN has 2 GB of RAM and a 2.2 GHz CPU, whereas each of the 3 DNs has 256 MB of RAM and a 600 MHz CPU. All nodes run Ubuntu 12.04 LTS. The number of block replicas is set to 3, and the data block size is set to 128 MB. The Hadoop cluster on the GENI testbed is set up as shown in figure 2.
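For reference, the replication factor and block size above are controlled in Hadoop 1.x through the hdfs-site.xml properties dfs.replication and dfs.block.size. The following Python snippet is only an illustrative sketch of how such a configuration fragment could be generated; it is not the exact configuration file used in our experiment.

# Sketch: write an hdfs-site.xml fragment with the replication factor and
# block size used in this experiment (Hadoop 1.x property names assumed).
properties = {
    "dfs.replication": "3",                     # three block replicas
    "dfs.block.size": str(128 * 1024 * 1024),   # 128 MB blocks, in bytes
}

xml = ["<configuration>"]
for name, value in properties.items():
    xml.append(f"  <property><name>{name}</name><value>{value}</value></property>")
xml.append("</configuration>")

with open("hdfs-site.xml", "w") as f:
    f.write("\n".join(xml) + "\n")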

Fig. 2. Hadoop cluster setup on the GENI testbed

We started our experiment by downloading a text file of 1486 bytes and storing it in HDFS. Since the replication factor is set to 3, the data is replicated to all 3 DNs in the cluster.
In Hadoop, the storage and processing layers are completely independent. For instance, reading from and writing to DNs is not affected when the Task Tracker process is absent, provided the DN daemon is running; likewise, when the DN daemon fails, the Task Tracker daemon remains available for service. There are two types of heartbeats exchanged in the cluster: the Data Node heartbeat and the Task Tracker heartbeat. The Data Node heartbeat is sent from the DN daemons to the NN daemon, and the Task Tracker heartbeat is sent from the Task Tracker daemons (Node Manager in Hadoop 2) to the Job Tracker (Resource Manager in Hadoop 2). Both types of heartbeats are used as channels for messages. The Data Node heartbeat is sent by the DN to the NN to indicate the liveness of the DN, the available memory, the status of running tasks, and so on. The NN issues commands to the DN to replicate data blocks, delete corrupted blocks, etc., as part of the heartbeat response. We captured the network messages using Wireshark and found that the DN daemon sends a heartbeat to the NN every 3 seconds. Each heartbeat is acknowledged by the NN and uses TCP port 54310, as configured in the core-site.xml file. The flow graph of these heartbeat messages is shown in figure 3.
The Task Tracker heartbeats tell the Job Tracker that a Task Tracker is alive. As part of the heartbeat, a Task Tracker indicates its readiness to run a new task, any failed tasks, and the status of running tasks. When the Job Tracker is informed that a Task Tracker is ready, it consults the scheduler to assign tasks to that Task Tracker and sends the list of tasks as part of the heartbeat response. We observed that the Task Tracker daemon sends a heartbeat to the Job Tracker, which runs on the Name Node, every 0.3 seconds; it uses TCP port 54311, as configured in the mapred-site.xml file. In addition, on every tenth heartbeat the DN sends its complete block information to the NN. The NN receives such heartbeats from all three DNs.
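The heartbeat traffic described above can also be observed without a full Wireshark session. The short Python sketch below, using the scapy library and the port numbers from the configuration described above, counts packets on the NN and Job Tracker ports; it is an illustrative sketch of the measurement, not the exact tooling used in the experiment.

# Sketch: count Hadoop heartbeat packets on the NN (54310) and Job Tracker
# (54311) ports for one minute. Assumes scapy is installed and the script
# runs with capture privileges on the Name Node host.
from collections import Counter
from scapy.all import sniff, TCP

counts = Counter()

def classify(pkt):
    if TCP in pkt:
        if 54310 in (pkt[TCP].sport, pkt[TCP].dport):
            counts["datanode_heartbeat_port"] += 1
        if 54311 in (pkt[TCP].sport, pkt[TCP].dport):
            counts["tasktracker_heartbeat_port"] += 1

sniff(filter="tcp port 54310 or tcp port 54311", prn=classify, timeout=60)
print(counts)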

Fig. 3. Heartbeat messages between a Data Node and the Name Node, captured using Wireshark

The NN holds all file-system metadata: it knows where each file is located and which blocks make up the file. Before the data is copied to the first available Data Node (DN1 in our case), a three-way TCP handshake is established with DN1 on TCP port 50010. Once the TCP connection is established, the NN shares DN2's and DN3's rack information, including the IP addresses of those Data Nodes, with DN1 and tells DN1 to replicate the data to DN2 and DN3 using TCP port 50010. Upon receiving this information, DN1 sends the rack information of the remaining node (DN3) to DN2, again over TCP port 50010. In this way a pipeline Client → DN1 → DN2 → DN3 is created. Data transfer then begins from the client, which runs on the NN host, to DN1. As soon as DN1 receives a block of data, it pushes the data to DN2, and DN2 pushes it to DN3. When all three nodes have successfully received the data block, DN3 first sends a block-received message to the NN on TCP port 54310, followed by DN2 and DN1. Finally, the connection is closed by DN3 to DN2, then by DN2 to DN1, and then between DN1 and the client (TCP port 50010). The complete HDFS write process is represented in figure 4.
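To make the ordering of the pipelined write concrete, the following minimal Python sketch simulates the replication pipeline described above, with data pushed hop by hop and acknowledgements returned in reverse order. It illustrates the observed protocol sequence only and is not HDFS code.

# Sketch: simulate the HDFS write pipeline Client -> DN1 -> DN2 -> DN3.
# Each node forwards the block downstream; "block received" reports come
# back starting from the last node, mirroring the observed behavior.

PIPELINE = ["DN1", "DN2", "DN3"]

def write_block(block_id: int) -> None:
    # Client (running on the NN host) pushes the block to the first DN;
    # each DN forwards it to the next DN in the pipeline (TCP port 50010).
    sender = "Client"
    for dn in PIPELINE:
        print(f"{sender} -> {dn}: data for block {block_id} (port 50010)")
        sender = dn
    # Acknowledgements: the last DN reports first to the NN (port 54310),
    # followed by the upstream DNs.
    for dn in reversed(PIPELINE):
        print(f"{dn} -> NN: block {block_id} received (port 54310)")

if __name__ == "__main__":
    write_block(0)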
We then ran a word count program on the loaded file, myfile. The Job Tracker delivers the Java code to the Task Trackers running on the DNs to execute the map computation, and each Task Tracker reports its progress through Task Tracker heartbeats. The output of the map tasks, called intermediate data, becomes the input to the reducer as key-value pairs during the shuffle phase.
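As an illustration of this key-value flow through the map and shuffle phases, a word count can be expressed as the following pair of Hadoop Streaming style functions in Python. This is a sketch of the standard example only; the experiment itself used the Java word count delivered by the Job Tracker as described above.

# Sketch of word count in Hadoop Streaming style (illustrative only).
import sys
from itertools import groupby

def mapper(lines):
    # Emits (word, 1) key-value pairs; this intermediate data is what the
    # shuffle phase moves to the reducers.
    for line in lines:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer(lines):
    # Receives the shuffled pairs already sorted by key and sums the counts.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if role == "map" else reducer)(sys.stdin)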
The Hadoop framework uses the following protocols for communication between the nodes in the cluster:
• Name Node Protocol: used by the secondary Name Node to communicate with the Name Node.
• Data Node Protocol: used by the DNs to communicate with the NN.
• Inter Data Node Protocol: used for communication between Data Nodes.
• Client Protocol: used by the client to communicate with the NN.
• Data Transfer Protocol: used for data transfer between the client and the Data Nodes.
Using the job history command, we found that the map task took four seconds, the shuffle phase seven seconds, and the reduce phase two seconds. We repeated the experiment with a file of 500 MB; the mapper then took three minutes and the reducer one minute. When we additionally congested a link with other traffic (generated with iperf), the time taken by the reducer increased to 13 minutes.
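The background traffic can be generated in several ways; one simple option (an assumption on our part, since the exact invocation is not recorded above) is to drive an iperf client from Python for the duration of the job, as sketched below.

# Sketch: generate background TCP traffic toward a node on the congested
# link for the duration of a Hadoop job. Assumes an iperf server is already
# running on the target host; "dn1" is a placeholder hostname.
import subprocess

def congest_link(server_host: str = "dn1", seconds: int = 600) -> None:
    # -c: run as client toward server_host, -t: duration in seconds
    subprocess.run(["iperf", "-c", server_host, "-t", str(seconds)], check=True)

if __name__ == "__main__":
    congest_link()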

Fig. 4. Flow diagram of the HDFS write process

A huge volume of shuffle traffic consumes a large amount of bandwidth. If Hadoop traffic, and especially shuffle traffic, is given high priority using the QoS capabilities provided by OpenFlow and SDN, the job completion time can be reduced [11]. In addition, the controller allows better centralized control over the network to be exercised.
BASS [2] combines Hadoop with SDN to reduce job completion time by assigning each task to a local or a remote node based on bandwidth availability. The volume of shuffle traffic between the map and reduce tasks is determined by the size of the intermediate data generated by the mappers, which is not available until the map tasks finish. By monitoring the traffic volume observed for different tasks, the Job Tracker can predict the shuffle traffic demand of map tasks before they finish [3].
IV. FAULT-TOLERANT LINC-SWITCH ARCHITECTURE
A fault-tolerant switching system, proposed in [9] and built entirely on OpenFlow with the software switch LINC-Switch and the distribution properties of Erlang, is tested with the Hadoop framework. The distributed switching architecture consists of two or three redundant LINC-Switches and uses the failover and takeover mechanisms provided by the Erlang Open Telecom Platform (OTP). LINC-Switch [5] is an OpenFlow switch, written in Erlang, that supports OpenFlow specifications 1.2 and 1.3. The main function of a regular switch in a network of computers is to forward incoming packets to output ports based on the information in the header fields and the input port on which each packet arrived. In an OpenFlow switch, the control plane is separated from the switch and handled by a separate application called the controller [6]; the switch then forwards packets based on the flow entries in its flow table, which are inserted by the controller. A flow entry consists of a header, actions, and counters: the header specifies the packet features to be matched against incoming packets, the actions are the tasks performed on packets that satisfy the matching criteria, and the counters maintain statistics for the flow [7]. The OpenFlow switches are made resilient to failure by instantiating the LINC OpenFlow switch on a redundant set of nodes and by using the failover and takeover mechanisms provided by Erlang [9]. Failover is the mechanism of restarting an application on a node other than the one on which it failed; takeover is the mechanism of stopping the running application on a lower-priority node when the application is restarted on a higher-priority node. The distributed LINC architecture is depicted in figure 5.

Fig. 5. LINC-Switches as a distributed switch connecting the nodes of a Hadoop cluster
The switch is started on three nodes, say l1, l2, and l3, but runs on only one of them, say l1, which is the primary node designated by a kernel parameter. If the switch running on l1 fails, or the node itself fails or is shut down, the switch functionality resumes on l2 (failover), which has the next precedence; if l2 then fails, the switch runs on l3. While the switch is running on l3, if l1 and l2 are restarted, the switch on l3 exits and the switch starts running on l1 again, since l1 has the highest precedence (takeover).
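This precedence logic can be summarized with the small Python sketch below; it only illustrates which node should host the switch given the set of live nodes and is not the Erlang/OTP distributed-application mechanism itself.

# Sketch: node precedence for failover/takeover. The switch runs on the
# highest-priority node that is currently alive (l1 > l2 > l3).
from typing import Optional

PRIORITY = ["l1", "l2", "l3"]

def active_switch_node(alive: set) -> Optional[str]:
    for node in PRIORITY:
        if node in alive:
            return node
    return None  # no node available: the switch cannot run

# Failover: l1 fails, the switch resumes on l2.
print(active_switch_node({"l2", "l3"}))        # -> l2
# Takeover: l1 and l2 restart while l3 is running, the switch moves back to l1.
print(active_switch_node({"l1", "l2", "l3"}))  # -> l1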
Experimental setup of Hadoop with the fault-tolerant switching system:
We tested the distributed switching architecture with a simple two-node Hadoop cluster consisting of one Master node and one Slave node. The switch connecting the Master and the Slave is replaced by the fault-tolerant switching system, comprising three redundant LINC-Switches.
Figure 6 shows the topology built on GENI for this experiment. The Master and Slave nodes are PCs running Ubuntu 12.04 from the Clemson InstaGENI rack. The distributed switching system is implemented by installing the LINC OpenFlow switch as a distributed application on the nodes l1, l2, and l3, which are virtual machines running Ubuntu 12.04. The OpenFlow controller runs on the VM named contro. The Master and Slave are connected to the distributed switching architecture through LINC-Switches running on the VMs named ovs1 and ovs2. These are connected to controllers running on the same local machines, which modify the flow entries whenever the Master and Slave become connected through a new switch after a failover or takeover. In this way, the packet loss that would normally occur during failover or takeover due to MAC-address aging in a hardware switch is avoided by using LINC-Switches and an OpenFlow controller to connect the Master and Slave nodes to the distributed switching system. Hadoop 1.0.3 is used here with a data block size of 64 MB. The data was split into 2 blocks; the replication factor was 2, so both the Master and the Slave hold copies of both blocks.
The word count application is run on an input text file of 75 MB, and the time taken to complete the job is recorded. Under normal conditions the job completed in 2 minutes. When we made the switch fail 1 minute after the word count job had started, at which point approximately 50% of the mapping was done, the job took around 900 seconds to finish, because the map data had to be recomputed from the locally available replicated data. Figure 7 shows the job completion times on Hadoop for the 75 MB of text data (1) under normal operating conditions, (2) with a switch failure in the middle of the job, and (3) with the fault-tolerant switching architecture handling a switch failure in the middle of the job. When the switch failed, the Master node could no longer receive heartbeat signals from the slave node (Task Tracker) and waited 10 minutes before declaring it dead (in our case, inaccessible); the map outputs stored on the unreachable Task Tracker were then no longer available to the reducers. Hadoop recomputes the map outputs once the Job Tracker receives enough notifications from the reducers about the unavailability of map outputs from the disconnected Task Tracker [10]. When we used the fault-tolerant distributed switching architecture between the Master and Slave and made the switch fail, the job took the same time as without a switch failure, because failover occurred and the switch ran from the secondary node as soon as Erlang OTP detected the failure of the primary switch. Hence, when the fault-tolerant switching system is used to connect the nodes of a Hadoop cluster, the resiliency that Hadoop provides to its applications becomes more effective.

Fig. 6. Master and Slave nodes of a Hadoop cluster connected by the distributed switching system

V. CONCLUSION
We continue our research on Hadoop, particularly from the network perspective, to determine how the OpenFlow protocol and SDN can be used to improve the performance of Hadoop in completing submitted jobs. We intend to extend this work toward better utilization of the network, as well as to enhance the fault tolerance that Hadoop provides to its applications by improving the resiliency of the network components used in larger Hadoop clusters.

Fig. 7. Job completion times for Hadoop word count with a regular OpenFlow switch failure vs. a fault-tolerant OpenFlow switch failure in the middle of the job

VI. ACKNOWLEDGEMENTS
We thank the geni-user mailing list participants and all GENI
infrastructure support for their help in realizing our
experiments.
REFERENCES
[1] M. Berman, J. S. Chase, L. Landweber, A. Nakao, M. Ott, D. Raychaudhuri, R. Ricci, and I. Seskar, "GENI: a federated testbed for innovative network experiments," Computer Networks 61 (2014): 5-23.
[2] P. Qin, B. Dai, B. Huang, and G. Xu, "Bandwidth-Aware Scheduling with SDN in Hadoop: A New Trend for Big Data," arXiv preprint arXiv:1403.2800 (2014).
[3] G. Wang, T. S. E. Ng, and A. Shaikh, "Programming your network at run-time for big data applications," in Proceedings of the First Workshop on Hot Topics in Software Defined Networks, pp. 103-108. ACM, 2012.
[4] T. White, Hadoop: The Definitive Guide, 3rd edn., O'Reilly, Sebastopol, CA, 2011.
[5] http://www.opennetsummit.org/pdf/2013/research_track/poster_papers/ons2013-final36.pdf
[6] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, "OpenFlow: enabling innovation in campus networks," ACM SIGCOMM Computer Communication Review 38, no. 2 (2008): 69-74.
[7] B. Pfaff, B. Lantz, and B. Heller, "OpenFlow switch specification, version 1.3.0," Open Networking Foundation (2012).
[8] G. Velusamy, D. Gurkan, S. Narayan, and S. Bailey, "Fault-Tolerant OpenFlow based Software Switch Architecture with LINC-Switches for a reliable Network Data Exchange," in Research and Educational Experiment Workshop (GREE), 2014 Third GENI, IEEE, 2014.
[9] http://www.erlang.org/doc/design_principles/distributed_applications.html
[10] F. Dinu and T. S. E. Ng, "Analysis of Hadoop's Performance under Failures," Rice University.
[11] S. Narayan, S. Bailey, and A. Daga, "Hadoop acceleration in an OpenFlow-based cluster," in High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, pp. 535-538. IEEE, 2012.
