
BIG DATA ADMINISTRATOR TRAINING FOR APACHE HADOOP

COURSE OVERVIEW
The Apache Hadoop software library is a framework that allows for the distributed processing of large
data sets across clusters of computers using simple programming models. Cloudera's open source big
data platform is the most widely adopted in the world, and Cloudera offers the industry's highest quality
technical support for Apache Hadoop, making it easy to install, configure, and manage a Hadoop cluster.
This 4-day course provides students with a comprehensive understanding of all the steps necessary to
operate and maintain a Hadoop cluster. After completing this course, students will be able to install
Hadoop, MapReduce, Hive, Impala, and Pig; perform initial HDFS configuration; configure HDFS high
availability; secure a Hadoop cluster with Kerberos; and maintain and monitor a Hadoop cluster.

Duration: 4 days

WHO SHOULD ATTEND

System administrators who will be setting up or maintaining a Hadoop cluster.

PREREQUISITES

Some basic knowledge of Linux operating systems is strongly recommended.

COURSE CONTENT
THE CASE FOR APACHE HADOOP

Why Hadoop?
Fundamental Concepts
Core Hadoop Components
HDFS

HDFS Features
Writing and Reading Files
NameNode Memory Considerations
Overview of HDFS Security
Using the NameNode Web UI
Using the Hadoop File Shell
GETTING DATA INTO HDFS

Ingesting Data from External Sources with Flume
Ingesting Data from Relational Databases with Sqoop
REST Interfaces
Best Practices for Importing Data
MAPREDUCE

What Is MapReduce?
Features of MapReduce
Basic Concepts
Architectural Overview
MapReduce Version 2
Failure Recovery
Using the JobTracker Web UI
PLANNING YOUR HADOOP CLUSTER

General Planning Considerations
Choosing the Right Hardware
Network Considerations
Configuring Nodes
Planning for Cluster Management
HADOOP INSTALLATION AND INITIAL CONFIGURATION

Deployment Types
Installing Hadoop
Specifying the Hadoop Configuration
Performing Initial HDFS Configuration
Performing Initial MapReduce Configuration
Hadoop Logging
INSTALLING AND CONFIGURING HIVE, IMPALA, & PIG

Hive
Impala
Pig
CLOUDERA MANAGER

The Motivation for Cloudera Manager
Cloudera Manager Features
Standard and Enterprise Versions
Cloudera Manager Topology
Installing Cloudera Manager
Installing Hadoop Using Cloudera Manager
Performing Basic Administration Tasks Using Cloudera Manager
ADVANCED CLUSTER CONFIGURATION

Advanced Configuration Parameters
Configuring Hadoop Ports
Explicitly Including and Excluding Hosts
Configuring HDFS for Rack Awareness
Configuring HDFS High Availability
HADOOP SECURITY

Why Hadoop Security Is Important
Hadoop's Security System Concepts
What Kerberos Is and How It Works
Securing a Hadoop Cluster with Kerberos
MANAGING AND SCHEDULING JOBS

Managing Running Jobs
Scheduling Hadoop Jobs
Configuring the FairScheduler
CLUSTER MAINTENANCE

Checking HDFS Status
Copying Data between Clusters
Adding and Removing Cluster Nodes
Rebalancing the Cluster
Cluster Upgrading
CLUSTER MONITORING AND TROUBLESHOOTING

General System Monitoring
Monitoring Hadoop Clusters
Troubleshooting Hadoop Clusters
Common Misconfigurations
HADOOP CLIENTS

What Are Hadoop Clients?
Installing and Configuring Hadoop Clients
Installing and Configuring Hue
Hue Authentication and Authorization

SUGGESTED NEXT COURSE

Cloudera Manager Training for Apache Hadoop
Linux Administration and Security
Big Data Developer for Apache Hadoop
