Cloudera Hadoop RHEL/CentOS 6 Install Guide
This guide contains everything you need to get a basic Hadoop cluster up and
running. It is intended as a condensed and easy-to-understand supplement to
the official documentation, so lengthy descriptions are omitted. For the full
documentation, see Cloudera's Install Guide.
Whether you want to start with a basic two-node cluster, or add hundreds or even thousands of nodes, the concepts here apply. Adding nodes can be done at any time without interrupting the cluster's workflow, so as long as you have at least two machines to begin with, you can start small and grow later.
About hardware:
Hadoop was designed to be used on commodity hardware, so you won't need anything special for this project. You'll want 2 or more reasonably fast, modern servers. (I'm using 9 SuperMicro boxes I happened to have lying around). Mine have dual Xeon processors, 48G RAM, and 4 x 7200 RPM SATA disks in JBOD mode. JBOD is recommended over RAID for Hadoop, since Hadoop has its own built-in redundancy which performs better with plain disks.
Unlike High-Availability clustering, an HPC cluster like Hadoop does not require any special fencing hardware. It
handles hardware failure by simply not giving jobs to broken/misbehaving nodes. If a node fails a certain number of
jobs, it's out.
In a Hadoop cluster, the only type of hardware failure that would cause any noticeable disruption is the possible failure of the NameNode. This is why it's always good to have a Secondary NameNode on standby. (Strictly speaking, the Secondary NameNode checkpoints the NameNode's metadata rather than providing automatic failover, but it makes recovering from a NameNode loss much easier.)
Here are a few core concepts that will help you understand what you're about to build. For a very small cluster (let's
say, 9 nodes or less), your cluster will consist of these types of nodes:
1. Head node. Runs the NameNode service and JobTracker service.
2. Worker nodes. All other nodes in the cluster will run DataNode and TaskTracker services.
A larger cluster is almost identical to this, but generally they use a separate machine for the JobTracker service. It's
also common to add a Secondary NameNode for redundancy. So in that scenario you'd have:
1. NameNode machine.
2. Secondary NameNode machine.
3. JobTracker machine.
4. Worker machines, each running DataNode + TaskTracker services.
A brief definition of these components:
NameNode: Stores all metadata for the HDFS filesystem.
DataNodes: Worker nodes that store and retrieve data when told to (by clients or the NameNode).
TaskTrackers: Run tasks and send progress reports to the JobTracker.
JobTracker: Coordinates all jobs and schedules them to run on TaskTracker nodes.
HDFS: Hadoop Distributed File System. An HDFS cluster consists of a NameNode + DataNodes. All Hadoop IO
happens through this. Built for storing very large files across many machines.
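One practical note before installing: Hadoop expects every node to resolve every other node's hostname consistently, so sort out DNS or /etc/hosts first. A hypothetical /etc/hosts layout for a small cluster (the names and addresses here are illustrative, not from the original guide):
/etc/hosts
192.168.1.10   headnode    # NameNode + JobTracker
192.168.1.11   worker01    # DataNode + TaskTracker
192.168.1.12   worker02    # DataNode + TaskTracker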
Hadoop Installation
Now that you have a little background on this software, we can begin installing. The first thing you'll need is Java
JDK 1.6 u8 or higher. You might also want to use a tool like clusterssh to ssh into all your nodes at once to perform
this installation.
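The exact Java and clusterssh packages vary by repository, so treat the following as one plausible way to get both on EL6 (clusterssh comes from EPEL; Cloudera's docs at the time preferred the Oracle JDK):
# on every node: a Java 6 JDK
yum install java-1.6.0-openjdk-devel
# on your workstation: a synchronized shell to all nodes
yum install clusterssh
cssh headnode worker01 worker02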
Disable SELinux
setenforce 0
vim /etc/sysconfig/selinux
SELINUX=disabled
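setenforce 0 switches the running system to permissive mode immediately, and the config file edit makes the change permanent after a reboot. To verify:
getenforce    # should now report 'Permissive' (or 'Disabled' after a reboot)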
Install the Cloudera CDH3 repository
wget http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm
yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm
rpm --import http://archive.cloudera.com/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
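With the repo in place, install the Hadoop packages. On CDH3 each daemon ships as its own package, so the install looks roughly like this (package names as in the CDH3 repo; install only the daemons a given node will run):
# head node
yum install hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-jobtracker
# worker nodes
yum install hadoop-0.20 hadoop-0.20-datanode hadoop-0.20-tasktracker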
Next, create a custom configuration directory and tell the alternatives system to use it:
cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.MyCluster
alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.MyCluster 50
alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.MyCluster
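You can confirm that the new configuration directory is active with:
alternatives --display hadoop-0.20-conf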
Create mount points for the HDFS data disks:
mkdir -p /mnt/hdfs/{1..4}
Add the new disks to /etc/fstab, ensuring that they're mounted with noatime. (This prevents reads from turning into
unnecessary writes, which is generally good for performance.)
vim /etc/fstab
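For reference, entries along these lines would match the mount commands below (a reader comment further down mentions /dev/sdd1, so the original used devices like these). The device names are examples only; run fdisk -l and substitute your own disks:
/dev/sdb1   /mnt/hdfs/1   ext4   defaults,noatime   0 0
/dev/sdc1   /mnt/hdfs/2   ext4   defaults,noatime   0 0
/dev/sdd1   /mnt/hdfs/3   ext4   defaults,noatime   0 0
/dev/sde1   /mnt/hdfs/4   ext4   defaults,noatime   0 0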
mount /mnt/hdfs/1
mount /mnt/hdfs/2
mount /mnt/hdfs/3
mount /mnt/hdfs/4
/etc/hadoop-0.20/conf.MyCluster/core-site.xml
This is where we set the URI of the HDFS filesystem. Change HEADNODE to the name of your machine that runs the NameNode service.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://HEADNODE:54310</value>
</property>
</configuration>
/etc/hadoop-0.20/conf.MyCluster/hdfs-site.xml
This is where we tell Hadoop to use the directories we created earlier. It specifies local storage on each node, used
by the DataNodes and NameNode services to store HDFS data.
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/mnt/hdfs/1/namenode,/mnt/hdfs/2/namenode,/mnt/hdfs/3/namenode,/mnt/hdfs/4/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/mnt/hdfs/1/datanode,/mnt/hdfs/2/datanode,/mnt/hdfs/3/datanode,/mnt/hdfs/4/datanode</value>
</property>
</configuration>
/etc/hadoop-0.20/conf.MyCluster/mapred-site.xml
Specify the JobTracker here, along with all the local directories for writing map/reduce (job-related) data. This is
used by the TaskTracker service on all the worker nodes. Change HEADNODE to the name of your machine that
runs the JobTracker service. (In a small cluster, this machine is the same one that runs the NameNode service.)
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://HEADNODE:54311</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/mnt/hdfs/1/mapred,/mnt/hdfs/2/mapred,/mnt/hdfs/3/mapred,/mnt/hdfs/4/mapred</value>
</property>
</configuration>
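With the configuration in place, format the NameNode. On CDH3 this is done once, as the hdfs user, before anything is started (command per the CDH3 documentation; verify against your version):
sudo -u hdfs hadoop namenode -format
The output lists each dfs.name.dir directory and ends with a 'successfully formatted' message.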
Once the format command reports success, the NameNode's storage is ready. But nothing is running yet, so let's start up our NameNode service on the head node. To start up services, first we'll need to fix permissions on each node.
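First the ownership fixes (the author quotes these again in the comments below):
# hdfs user must own the /mnt/hdfs directory
chown -R hdfs:hadoop /mnt/hdfs/
# mapred user must own the mapred directories
chown -R mapred:hadoop /mnt/hdfs/{1,2,3,4}/mapred
Then start the daemons. Assuming the CDH3 packages named earlier, the init scripts would be (check the exact names with ls /etc/init.d/hadoop* on your system):
# head node
service hadoop-0.20-namenode start
service hadoop-0.20-jobtracker start
# each worker node
service hadoop-0.20-datanode start
service hadoop-0.20-tasktracker start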
Give it a couple seconds to start up the HDFS filesystem. The nodes will connect, and the local storage of each
node will be added to the collective HDFS filesystem. Now we can create core directories.
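The CDH3 documentation's commands for this step (the section the author later calls "First-time HDFS use: create core directories"), run once from the head node, are as follows; this assumes the default mapred.system.dir, so adjust if you changed it:
# world-writable /tmp with the sticky bit, as on a local filesystem
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
# working area for the JobTracker, owned by mapred
sudo -u hdfs hadoop fs -mkdir /mapred/system
sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system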
You now have a fully-functional Hadoop cluster up and running! Check the cluster status on your local Hadoop status pages:
http://localhost:50070 (NameNode / HDFS status)
http://localhost:50030 (JobTracker / MapReduce status)
Comments (13)
Hi,
For an Apache hadoop-2.0.0-alpha installation on two Linux machines, what should the values of the fs.defaultFS, dfs.name.dir, and dfs.data.dir properties be on both name nodes?
One machine's hostname is rsi-nod-nsn1 and the other is rsi-nod-nsn2. I want to make both federated namenodes, and both should be used as datanodes too. I want to configure both federation and YARN.
What should the configuration changes be for that? I am not finding the masters, mapred-site.xml, and hadoop-env.sh files in the hadoopHome/etc/hadoop folder. How do I make changes to these files?
Regards,
rashmi
hadoop-2.0.0-alpha ... that doesn't sound like Cloudera Hadoop. Apache Hadoop works differently and isn't
covered in this guide.
The core configuration options are listed above, so that covers 'fs.default.name' and 'dfs.data.dir'. (Though that was for version 0.22... it might be different in your version). I honestly don't use Hadoop anymore, so I don't know.
But, the manual will show you all available configuration options, so that could be handy:
http://hadoop.apache.org/hdfs/docs/current/hdfs-d...
As far as the location of the configuration files, you can do:
rpm -qa |grep hadoop # find the package name
rpm -ql hadoop-package-name --configfiles
Here's an example of that, using 'httpd' as the package name:
[dakini@nibbana ~]$ sudo rpm -ql httpd --configfiles
/etc/httpd/conf.d/welcome.conf
/etc/httpd/conf/httpd.conf
/etc/httpd/conf/magic
Since I'm a raging Hadummy, is there a more detailed guide on how to partition each of the nodes? Following the
steps above leads to pain, suffering and errors. (Specifically: "special device /dev/sdd1 does not exist")
oh no, it's very dangerous to copy/paste commands like that from the internet, unless you fully understand
what they do. '/dev/sdb' refers to a disk that you don't have, which means you're trying to run a command that
works on someone else's hardware, but not on yours.
When partitioning and mounting, you have to look at your particular hardware, and adjust the commands
accordingly. Otherwise you might find yourself destroying your data!
'sudo fdisk -l' will show you all the disks you have. I suggest googling around for a partitioning guide, since it's
important to learn what this stuff does before attempting to run it.
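For anyone who wants the concrete steps, a minimal sequence for preparing one data disk might look like the following (illustrative only, not from the original guide: /dev/sdX is a placeholder, and mkfs will destroy whatever is on the disk):
fdisk -l                                      # identify your data disks
parted /dev/sdX mklabel gpt                   # new partition table (destroys data!)
parted /dev/sdX mkpart primary ext4 0% 100%   # one partition spanning the disk
mkfs.ext4 /dev/sdX1                           # create the filesystem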
Hm, I am getting some file permissions issues. I installed as root, but my /mnt/hdfs directories (I created them with the same names as you) are:
drwxr-xr-x 5 hdfs hadoop 4096 Apr 20 19:15 1
Would you recommend adding root to the hadoop group and changing the permissions to -R 775?
No, I wouldn't add root to any groups, because the hadoop services don't run as root. They run as 'hdfs' and
'mapred'.
See the permissions section above. It worked for me every time during my hadoop installs 1+ years ago.
# hdfs user must own the /mnt/hdfs directory
chown -R hdfs:hadoop /mnt/hdfs/
# mapred user must own the mapred directories
chown -R mapred:hadoop /mnt/hdfs/{1,2,3,4}/mapred
Though maybe you're talking about the HDFS filesystem itself having permissions issues. If that's the case,
read the section labeled "First-time HDFS use: create core directories".
Otherwise maybe check your logs for a more detailed error message and see the install manual for your
Hadoop version. It's possible that things may have changed, since this guide is a year old.