Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL, Oracle to Hadoop HDFS, and export from Hadoop file system to relational databases.
Import MySQL table into hive and encode the values, which value is NULL in the table, also make sure that table file is partitioned in 3 files.
If u already have data in mysql directly goto STEP 4 or continue the following steps.
First create a table in MySQL and insert values into that table, and then import that table into HIVE using SQOOP and if the table contain any NULL values then, encode that NULL values and also split(Partitioned) the table into 3 files (like part-m-00000, part-m-00001, part-m-00002).
Step-1 : Create database in MySQL
create database sqoopdb;
Step-2 : Create table in sqoopdb;
Create table product_order_details (user_id int, order_id int, product_id varchar(20), order_amount bigint, order_delivery_date varchar(20), user_email_id varchar(20));
Step-3 : Insert values into product_order_details
Step-4 : Import table from MySQL to HIVE with the following conditions,
(i). Encode the values, which value is NULL in the table ,
(ii). Make sure that table file is partitioned in 3 files.
Step-5 : List the table product_order_details
hdfs dfs -ls /user/hive/warehouse/product_order_details
Step-6 : View the content of the partitioned files
hdfs dfs -cat /user/hive/warehouse/product_order_details/part-m-00000
hdfs dfs -cat /user/hive/warehouse/product_order_details/part-m-00001
hdfs dfs -cat /user/hive/warehouse/product_order_details/part-m-00002
Description about command :
sqoop import -> keyword for importing
connect -> keyword
<path> -> path of the mysql
username -> keyword
root -> name of the user, which is stored in the mysql
password -> keyword
password -> password for the user
table -> keyword
product_order_details -> importing table
null-non-string -> keyword for Encoding non-string NULL values
null-string -> keyword for Encoding string NULL values
NA -> encoding value
split-by -> keyword for split the table
user_id -> name of the column (the table splited based on that column)
m 3 -> number of partitioned
target-dir -> keyword for target directory
/user/hive/warehouse/product_order_details -> path for where our table should be stored.