Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL, Oracle to Hadoop HDFS, and export from Hadoop file system to relational databases.
What is Boundary Value Query in Sqoop
Sqoop run it mapper task by execute the SQL like SELECT * FROM table WHERE id >= low AND id < high. Sqoop uses query select minimum value for splitting, maximum value for splitting to find out boundaries for creating splits. This Sqoop operation is known as Boundary Value Query.
Import data from mysql to hdfs using Boundary query.
Create table in mysql and import to hdfs using boundary query through Sqoop.
Create database databasename;
create database geo;
Create table in mysql.
create table purchase(purchaseorder_id int,productname varchar(50),quantity bigint,prize bigint);
Insert values to the purchase table.
insert into purchase values(1,'Diesel',500,5000),(2,'Petrol',250,4000),(3,'oil',1000,7000);
Import the boundary query in hive using mysql table through SQOOP
sqoop import --connect jdbc:mysql://localhost/geo --username root --password root --table purchase --target-dir /user/geouser/newpurchase_hive --boundary-query “select 1,2 from purchase” --columns productname,quantity,prize -m 1 --split-by purchaseorder_id;
Here split-by is used to refer the mapreduce partition
View the result.
hadoop fs -cat /user/geouser/newpurchase_hive/part-m-00000 hadoop fs -cat /user/geouser/newpurchase_hive/part-m-00001