Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It imports data from relational databases such as MySQL and Oracle into HDFS, and exports data from HDFS back to relational databases.
Importing data into HBase with Sqoop :
The following steps describe how to import data from an RDBMS into HBase, Hadoop's real-time database.
Create the Table in HBase :
create 'customer_hbase', 'contactid'
# Here 'customer_hbase' is the table name in HBase and 'contactid' is a column family.
MySQL Table :
Choose the MySQL table that you want to import into HBase.
# The CUSTOMER table in MySQL.
Sqoop Command to Import :
First, create a table in HBase with the required column family. Unlike importing into Hive, Sqoop does not use a default table name when importing into HBase. Rather, you have to specify a valid table name with the --hbase-table parameter.
To insert data into HBase there are three mandatory parameters: the table name, a column family name within the table, and the id of the row into which you are inserting data. Sqoop uses one table and one column family per import job, so you have to specify them using the --hbase-table and --column-family parameters on the command line.
Sqoop Command :
sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password password \
--table CUSTOMER \
--hbase-table customer_hbase \
--column-family contactid \
--hbase-row-key contactid
To identify each individual row in HBase, Sqoop defaults to the column name specified in the --split-by parameter or the column that was automatically identified to serve this purpose (usually the primary key of the table). You can override this behavior using the --hbase-row-key parameter.
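The row-key behavior described above can be sketched as a small wrapper that adds --hbase-row-key only when a column is passed explicitly. This is an illustrative sketch, not part of the original command: the function names run and import_to_hbase and the DRY_RUN variable are hypothetical, and -P (Sqoop's prompt-for-password option) is swapped in for --password so that the password does not appear on the command line. By default the script only prints the command it would run.

```shell
#!/bin/sh
set -eu

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: import_to_hbase SOURCE_TABLE HBASE_TABLE COLUMN_FAMILY [ROW_KEY_COLUMN]
import_to_hbase() {
  src_table=$1
  hbase_table=$2
  family=$3
  row_key=${4:-}

  # Base arguments, mirroring the example connection settings above.
  set -- sqoop import \
    --connect jdbc:mysql://localhost/retail_db \
    --username root \
    -P \
    --table "$src_table" \
    --hbase-table "$hbase_table" \
    --column-family "$family"

  # Override the row key only when a column was given explicitly.
  if [ -n "$row_key" ]; then
    set -- "$@" --hbase-row-key "$row_key"
  fi

  run "$@"
}

# Row key left to Sqoop's default (split-by column or primary key).
import_to_hbase CUSTOMER customer_hbase contactid

# Row key set explicitly to the contactid column.
import_to_hbase CUSTOMER customer_hbase contactid contactid
```

Running the script as-is prints both variants so they can be compared before anything touches the cluster; exporting DRY_RUN=0 executes them instead.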
Final View in HBase :
# scan 'customer_hbase'
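To check the result without opening an interactive session, the statements can be written to a file and piped into the HBase shell. This is a minimal sketch, assuming the hbase launcher is on the PATH and accepts the -n (non-interactive) flag; the file name verify_import.hbase and the LIMIT of 5 rows are illustrative choices, not taken from the original.

```shell
#!/bin/sh
set -eu

# Write the HBase shell statements to a file so they can be reviewed first.
cat > verify_import.hbase <<'EOF'
count 'customer_hbase'
scan 'customer_hbase', {LIMIT => 5}
EOF

# Run the statements only when the hbase launcher is available;
# otherwise just show what would be executed.
if command -v hbase >/dev/null 2>&1; then
  hbase shell -n < verify_import.hbase
else
  cat verify_import.hbase
fi
```

The count gives a quick comparison against the number of rows in the source table, while the limited scan avoids printing the whole table for large imports.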