Sqoop Commands
SQOOP => (SQ)L + HAD(OOP)
Relational database systems often store valuable data in a company. If made available, that data can be managed and processed by Apache Hadoop, which is fast becoming the standard for big data processing. Several relational database vendors have championed developing integration with Hadoop within one or more of their products.
Transferring data to and from relational databases is challenging and laborious. Because data transfer requires careful handling, Apache Sqoop, short for “SQL to Hadoop,” was created to perform bidirectional data transfer between Hadoop and almost any external structured datastore. Taking advantage of MapReduce, Hadoop’s execution engine, Sqoop performs the transfers in a parallel manner.
Import = Transferring data from your relational database or warehouse system to the Hadoop ecosystem.
Export = Transferring data from your Hadoop ecosystem to the relational database or warehouse system.
Importing Data:
Here we will use a MySQL database with the account root and password password. We will be connecting to a database named sqoop_ex.
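If the sqoop_ex database does not exist yet, it can be created up front. A minimal sketch (the mysql_table columns here are assumptions, purely for illustration):
mysql -u root -p -e "CREATE DATABASE sqoop_ex;"
mysql -u root -p sqoop_ex -e "CREATE TABLE mysql_table (id INT PRIMARY KEY, name VARCHAR(50));"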
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table mysql_table \
--m 1
Description:
The first parameter after the sqoop executable is import, which selects the appropriate tool. The import tool is used when you want to transfer data from the relational database into Hadoop.
Next, --connect contains the JDBC URL of your database. The syntax of the URL is specific to each database, so consult your DB manual for the proper format. The URL is followed by two parameters, --username and --password, which are the credentials Sqoop should use when connecting to the database.
Then --table names the MySQL table to transfer, and --m specifies the number of map tasks Sqoop launches to perform the transfer.
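With --m 1 the import produces a single output file. Assuming default settings, the result lands in a directory named after the table under your HDFS home directory and can be inspected like this:
hdfs dfs -ls mysql_table
hdfs dfs -cat mysql_table/part-m-00000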
Importing Data into HIVE
This exercise imports data from MySQL into Hive.
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table mysql_table \
--hive-import \
--m 1
Description:
The --hive-import parameter instructs Sqoop to import the data into Hive.
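Once the import finishes, the Hive table can be sanity-checked from the command line (an optional step, not part of the import itself):
hive -e "SELECT * FROM mysql_table LIMIT 5;"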
Importing All Tables into HIVE
This exercise imports all tables of the selected MySQL database into Hive.
Sqoop Command>>
sqoop import-all-tables \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--hive-import \
--m 1
Description:
In the above command, the import-all-tables tool imports every table of the MySQL sqoop_ex database into Hive.
Here the sqoop_imported table is a renamed version of the hive_import table in MySQL. Naming Hive tables and Hive databases is discussed further below.
Importing Only a Subset of Data Using a WHERE Clause
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table batch_info \
--where "batch_code=5" \
--hive-import \
--m 1 \
--target-dir '/user/hive/warehouse/imported'
Description:
The --where parameter specifies a SQL condition that the imported rows must meet.
The --target-dir parameter specifies the HDFS directory into which Sqoop imports the data.
Importing Data with Append
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table batch_info \
--where "batch_code=6" \
--append \
--hive-import \
--m 1 \
--target-dir '/user/hive/warehouse/imported'
Description
The --append parameter appends the newly imported data to the data already present in the target directory (and thus to the existing Hive table).
Importing Free Form Query
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--query "select * from batch_info where course_code = 'H001' and \$CONDITIONS" \
--hive-import \
--hive-table hive_import_table \
--m 1 \
--target-dir '/user/hive/warehouse/imported'
Description:
Sqoop allows you to specify an arbitrary query for importing data. Instead of the --table parameter, use --query with the entire query for obtaining the data you would like to transfer. The query must contain the literal $CONDITIONS placeholder in its WHERE clause; Sqoop replaces it with the conditions it uses to split the work among mappers.
The --hive-table parameter sets the name under which the table is created in the Hive database.
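When --query is combined with more than one mapper, Sqoop also needs --split-by to know which column to partition on; it then substitutes a different range predicate for $CONDITIONS in each mapper's copy of the query. A sketch using batch_code as the split column (the target directory name is just an example):
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--query "select * from batch_info where \$CONDITIONS" \
--split-by batch_code \
--m 4 \
--target-dir '/user/hive/warehouse/imported_parallel'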
Importing Data into a Hive Database
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--query "select * from batch_info where course_code = 'B001' and \$CONDITIONS" \
--hive-database sqoop_import \
--hive-import \
--hive-table hive_import_table \
--m 1 \
--target-dir '/user/hive/warehouse/hive_import_db'
Description
Here the --hive-database parameter specifies the Hive database into which the data is imported.
Creating a Sqoop Job
Sqoop Command>>
sqoop job \
--create test1 \
-- import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table course_view \
--hive-import \
--incremental append \
--check-column course_code \
--last-value 5 \
--m 1 \
--target-dir '/user/hive/warehouse/hive_import_db'
Execution of the Sqoop Job
sqoop job --exec test1
Description
A Sqoop job lets you take advantage of the built-in Sqoop metastore, which saves all of a job's parameters for later reuse. Note that the tool name (import) must follow the -- separator.
The --incremental parameter in append mode transfers only newly created rows.
--check-column indicates the column that is checked for newly appended data.
--last-value contains the last value that was successfully imported into Hadoop; on the next run, only rows with a greater value in the check column are transferred.
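The metastore also lets you inspect and manage saved jobs; for incremental jobs, Sqoop updates the stored --last-value automatically after each successful run.
sqoop job --list
sqoop job --show test1
sqoop job --delete test1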
Other Sqoop Import Options
List all MySQL tables of a particular database
Sqoop Command>>
sqoop-list-tables \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password
List all MySQL databases
Sqoop Command>>
sqoop-list-databases \
--connect jdbc:mysql://localhost/ \
--username root \
--password password
View the MapReduce code generated by Sqoop
Sqoop Command>>
sqoop-codegen \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table course_master
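By default the generated .java source is written to the current working directory; the --outdir parameter redirects it (the path below is just an example):
sqoop-codegen \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table course_master \
--outdir /tmp/sqoop_codegen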
Importing only particular columns from MySQL
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table course_info \
--columns course_code
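--columns also accepts a comma-separated list of columns. A sketch (course_name is a hypothetical column, assuming the course_info schema contains one):
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table course_info \
--columns "course_code,course_name"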
Excluding Tables
To exclude tables while using the import-all-tables tool, use the --exclude-tables parameter.
Sqoop Command>>
sqoop import-all-tables \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--exclude-tables cities,countries
Setting the Warehouse Directory
To specify the parent directory for all your Sqoop imports, use the --warehouse-dir parameter instead of --target-dir.
Sqoop Command>>
sqoop import-all-tables \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--exclude-tables cities,countries \
--warehouse-dir /user/hive/warehouse/imported
Getting the Password from Standard Input
You have two options besides specifying the password on the command line with the --password parameter. The first option is to use the -P parameter, which instructs Sqoop to read the password from standard input.
Sqoop Command>>
sqoop import-all-tables \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
-P \
--exclude-tables cities,countries \
--warehouse-dir /user/hive/warehouse/imported
Alternatively, you can save your password in a file and specify the path to this file with the --password-file parameter.
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password-file my-sqoop-password \
--table mysql_table \
--warehouse-dir /user/hive/warehouse/imported
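The password file can be created as follows; it should contain only the password, with no trailing newline, and its permissions should be locked down:
echo -n "password" > my-sqoop-password
chmod 400 my-sqoop-password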
Using a File Format
Sequence File:
SequenceFile is a special Hadoop file format used for storing objects that implement the Writable interface. The format was customized for MapReduce, so it expects each record to consist of two parts: a key and a value.
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table mysql_table \
--m 1 \
--as-sequencefile
Avro File
Apache Avro is a generic data serialization system. Specifying the --as-avrodatafile parameter instructs Sqoop to use its compact and fast binary encoding format. Avro is a very generic system that can store arbitrary data structures.
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table mysql_table \
--m 1 \
--as-avrodatafile
Compressing Imported Data
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table mysql_table \
--compress
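By default --compress produces gzip-compressed output. A different Hadoop codec can be selected with the --compression-codec parameter, for example Snappy:
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table mysql_table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec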
Speeding Up Transfers
Sqoop Command>>
sqoop import \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table mysql_table \
--direct
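Description:
The --direct parameter instructs Sqoop to bypass plain JDBC and use the database's native utilities (mysqldump in the case of MySQL), which usually transfers data considerably faster. It is available only for databases for which Sqoop ships a direct connector.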
Sqoop Export
When you are exporting data into an RDBMS, the target table (its metadata) must already exist in the database; Sqoop does not create it for you.
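For example, the target table could be created in MySQL first. A minimal sketch (the column definitions are assumptions for illustration; they must match the data being exported):
mysql -u root -p sqoop_ex -e "CREATE TABLE hive_import_new (course_code INT, course_name VARCHAR(100));"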
Sqoop Command>>
sqoop export \
--connect jdbc:mysql://mysql.example.com/sqoop_ex \
--username root \
--password password \
--table hive_import_new \
--export-dir /user/hive/warehouse/imported
Inserting in Batches
Sqoop Command>>
sqoop export \
--connect jdbc:mysql://localhost/sqoop_ex \
--username root \
--password password \
--table hive_import_new \
--export-dir /user/hive/warehouse/hive_import_new \
--batch
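Description:
The --batch parameter enables JDBC batching: Sqoop queues multiple rows and sends them to the database together instead of issuing one INSERT statement at a time, reducing round trips during export.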