
Importing a MySQL table to HDFS

This scenario illustrates how to use tSqoopImport to import a MySQL table to a given HDFS
system.

The sample data to be used in this scenario reads as follows:

id,wage,mod_date

0,2000,2008-06-26 04:25:59

1,2300,2011-06-12 05:29:45

2,2500,2007-01-15 11:59:13

3,3000,2010-05-02 15:34:05

The data is stored in a MySQL table called sqoopmerge.
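For reference, each row has three fields: an integer id, an integer wage, and a mod_date timestamp. A minimal Python sketch (illustrative only, not part of the Job) parsing the sample data into typed records:

```python
import csv
import io
from datetime import datetime

# The sample rows shown above, verbatim.
SAMPLE = """\
id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05
"""

def parse_rows(text):
    """Parse the CSV sample into typed records (id, wage, mod_date)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "id": int(rec["id"]),
            "wage": int(rec["wage"]),
            "mod_date": datetime.strptime(rec["mod_date"], "%Y-%m-%d %H:%M:%S"),
        }
        for rec in reader
    ]

rows = parse_rows(SAMPLE)
print(len(rows))  # 4
```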

Before starting to replicate this scenario, ensure that you have appropriate rights and
permissions to access the Hadoop distribution to be used. Then proceed as follows:

Dropping the component


1. In the Integration perspective of the Studio, create an empty Job from the Job Designs
node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tSqoopImport onto the workspace.

Importing the MySQL table

Configuring tSqoopImport

1. Double-click tSqoopImport to open its Component view.


2. In the Mode area, select Use Java API.
3. In the Version area, select the Hadoop distribution to be used and its version. If you
cannot find the distribution corresponding to yours in the list, select Custom to
connect to a Hadoop distribution not officially supported in the Studio.
For a step-by-step example of how to use this Custom option, see Connecting to a
custom Hadoop distribution.
4. In the NameNode URI field, enter the location of the master node, the NameNode, of the
distribution to be used. For example, hdfs://talend-cdh4-namenode:8020.
5. In the JobTracker Host field, enter the location of the JobTracker of your distribution.
For example, talend-cdh4-namenode:8021.
Note that the term Job in JobTracker designates the MapReduce (MR) jobs described in
Apache's documentation at http://hadoop.apache.org/.
6. If the distribution to be used requires Kerberos authentication, select the Use Kerberos
authentication check box and complete the authentication details. Otherwise, leave this
check box clear.
7. If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate.
A keytab file contains pairs of Kerberos principals and encrypted keys. Enter
the principal to be used in the Principal field and the access path to the keytab file itself
in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one the
principal designates, but that user must have the right to read the keytab file being used.
For example, if the user name you are using to execute a Job is user1 and the principal to
be used is guest, ensure that user1 has the right to read the keytab file to be used.
8. In the Connection field, enter the URI of the MySQL database where the source table is
stored. For example, jdbc:mysql://10.42.10.13/mysql.
9. In the Username and Password fields, enter the authentication information.
10. Under the Driver JAR table, click the [+] button to add one row, then in this row, click the
[...] button to display the drop-down list and select the JAR file to be used. In
this scenario, it is mysql-connector-java-5.1.30-bin.jar.
If the [...] button does not appear, click anywhere in the row to make it appear.
11. In the Table Name field, enter the name of the source table. In this scenario, it is
sqoopmerge.
12. From the File format list, select the format that corresponds to the data to be used,
textfile in this scenario.
13. Select the Specify target dir check box and enter the directory to which you want to
import the data. For example, /user/ychen/target_old.
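Behind the scenes, tSqoopImport drives Sqoop; the settings above correspond roughly to a plain `sqoop import` invocation. The following sketch assembles that argument list for comparison (the actual component may pass additional options, and the username/password values here are placeholders):

```python
def build_sqoop_import_args(connect, username, password, table, target_dir):
    """Assemble the argv for a `sqoop import` call roughly equivalent
    to the tSqoopImport configuration described above."""
    return [
        "sqoop", "import",
        "--connect", connect,        # JDBC URI from the Connection field
        "--username", username,
        "--password", password,
        "--table", table,            # source table, sqoopmerge in this scenario
        "--target-dir", target_dir,  # value of Specify target dir
        "--as-textfile",             # File format: textfile
    ]

args = build_sqoop_import_args(
    "jdbc:mysql://10.42.10.13/mysql",  # Connection field example
    "user", "password",                # placeholder credentials
    "sqoopmerge",
    "/user/ychen/target_old",
)
print(" ".join(args))
```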

Executing the Job

Then you can press F6 to run this Job.

Once done, you can verify the results in the target directory you specified, using the web
console of the Hadoop distribution used.
