AVRO FILE FORMAT
AVRO FILE:
Avro stores both the data definition and the data together in one message or file, making it easy for programs to dynamically understand the information stored in an Avro file or message.
Avro stores the data definition in JSON format, making it easy to read and interpret; the data itself is stored in binary format, making it compact and efficient. Avro files include markers that can be used to split large data sets into subsets suitable for MapReduce processing.
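For example, the data definition that Avro embeds is plain JSON; a minimal illustrative schema for a two-field record (not tied to the walkthrough below) looks like this:
{
  "type": "record",
  "name": "user",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "email", "type": "string"}
  ]
}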
SQOOP:
Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL, Oracle to Hadoop HDFS, and export from Hadoop file system to relational databases.
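As a quick connectivity check before an import, Sqoop can list the tables visible over a JDBC connection (shown here with the same MySQL connection details used in the steps below):
sqoop list-tables \
--connect jdbc:mysql://localhost/sqoopdb \
--username root \
--password password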
HIVE:
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data and makes querying and analysis easy. It is an open-source data warehouse ecosystem under the Apache Software Foundation that manages vast amounts of structured data sets using HQL, a query language similar to SQL.
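For instance, an aggregate query over the purchase_details table created in Step-14 below is written in HQL exactly as it would be in SQL (illustrative query only):
select user_id, count(order_id) from purchase_details group by user_id;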
Step-1
Create database in MySQL
create database sqoopdb;
Step-2
Use the newly created database
use sqoopdb;
Step-3
Create a table in sqoopdb
Create table product_order_details (user_id int, order_id int, product_id varchar(20), order_amount bigint, order_delivery_date varchar(20), user_email_id varchar(20));
Step-4
Insert values into product_order_details
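For example (the row values below are made up purely for this walkthrough):
insert into product_order_details values (101, 5001, 'PRD-01', 1500, '2021-01-15', 'a1@mail.com');
insert into product_order_details values (102, 5002, 'PRD-02', 2300, '2021-01-16', 'b2@mail.com');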
Step-5
Import the table from MySQL into the Hive warehouse directory on HDFS in Avro file format
sqoop import \
--connect jdbc:mysql://localhost/sqoopdb \
--username root \
--password password \
--table product_order_details \
--as-avrodatafile \
--warehouse-dir /user/hive/warehouse/purchase_db \
-m 1
Step-6
View the content of the AVRO files
hdfs dfs -cat /user/hive/warehouse/purchase_db/product_order_details/part-m-00000.avro
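Only the JSON schema header of the file is human-readable here; the records themselves are stored in binary. Once a local copy exists (Step-7), the same avro-tools jar used in Step-8 can print the records as JSON:
java -jar /usr/lib/avro/avro-tools-1.7.4.jar tojson /tmp/product_order.avro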
Convert AVRO file into AVSC:
AVSC:
It contains the schema of the Avro file as plain JSON. We can create a new table using this .avsc schema file.
Step-7
Get the Avro file from HDFS to the local file system
hadoop fs -get /user/hive/warehouse/purchase_db/product_order_details/part-m-00000.avro /tmp/product_order.avro
Step-8
Extract the schema from the Avro file into an .avsc file
java -jar /usr/lib/avro/avro-tools-1.7.4.jar getschema /tmp/product_order.avro > /tmp/product_order.avsc
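The resulting .avsc file is plain JSON. For this table it should look roughly like the sketch below (Sqoop maps varchar to the Avro string type and bigint to long; the exact record name, nullable unions, and extra metadata vary by Sqoop version):
{
  "type": "record",
  "name": "product_order_details",
  "fields": [
    {"name": "user_id", "type": ["null", "int"]},
    {"name": "order_id", "type": ["null", "int"]},
    {"name": "product_id", "type": ["null", "string"]},
    {"name": "order_amount", "type": ["null", "long"]},
    {"name": "order_delivery_date", "type": ["null", "string"]},
    {"name": "user_email_id", "type": ["null", "string"]}
  ]
}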
Create a new table using the AVSC schema file in Hive:
Step-9
List the .avsc files on the local file system
ls -ltr /tmp/*.avsc
Step-10
Put the .avsc file from local to HDFS
hdfs dfs -put /tmp/product_order.avsc /user/hive/warehouse/
Step-11
View the content of the AVSC file
hdfs dfs -cat /user/hive/warehouse/product_order.avsc
Step-12
Create a new Hive table using the .avsc schema file
create external table purchase_details_table stored as avro tblproperties('avro.schema.url'='/user/hive/warehouse/product_order.avsc');
Step-13
Describe and Select the newly created table
(i) Describe
desc purchase_details_table;
(ii) Select
select * from purchase_details_table;
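Note that purchase_details_table returns no rows at this point: Step-12 gave no location clause, so the table points at an empty default directory, and only the column layout comes from the .avsc file. Also, since Avro has no varchar type, desc shows product_id, order_delivery_date, and user_email_id as string (and order_amount as bigint). Step-14 below creates a table whose location points directly at the imported data.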
Step-14
Create a new Hive table using the .avsc schema file, with its location pointing at the imported Avro data files
create external table purchase_details stored as avro location '/user/hive/warehouse/purchase_db/product_order_details' tblproperties('avro.schema.url'='/user/hive/warehouse/product_order.avsc');
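Because the location clause points at the directory Sqoop imported into (Step-5), this table reads the existing part-m-00000.avro file directly; no separate load or insert statement is needed.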
Step-15
Describe and Select the newly created table
(i) Describe
desc purchase_details;
(ii) Select
select * from purchase_details;
Description of the command:
sqoop import -> keyword for importing
--connect -> keyword
jdbc:mysql://localhost/sqoopdb -> JDBC connection string for the MySQL database
--username -> keyword
root -> name of the user stored in MySQL
--password -> keyword
password -> password for the user
--table -> keyword
product_order_details -> the table to be imported
--as-avrodatafile -> keyword for the Avro file format
--warehouse-dir -> keyword for the target directory
/user/hive/warehouse/purchase_db -> parent path under which the imported table directory (product_order_details) is stored
-m 1 -> number of mappers; a single mapper produces the single output file part-m-00000.avro