Hive-HBase Integration
Hive:
Apache Hive is an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop's distributed file system (HDFS). Hadoop is a framework for handling large datasets in a distributed computing environment.
HBase:
Apache HBase is an open-source NoSQL database that provides real-time read/write access to large datasets.
HBase scales linearly to handle huge data sets with billions of rows and millions of columns, and it easily combines data sources that use a wide variety of different structures and schemas.
HBase is natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.
Integration steps:
Step 1: Create a folder "auxlib" in the Hive root directory.
Step 2:
i) Copy the following jar files from the lib directory under the Hive root directory to the auxlib directory:
1. apache-hive-1.2.1-bin/lib/guava-14.0.1.jar
2. apache-hive-1.2.1-bin/lib/hive-hbase-handler-1.2.1.jar
ii) Copy the following jar files from the lib directory under the HBase root directory to the auxlib directory:
3. hbase-1.0.1/lib/hbase-common-1.0.1.jar
4. hbase-1.0.1/lib/zookeeper-3.4.6.jar
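Steps 1 and 2 can be sketched as a short shell script. This is a demonstration only: it stands up temporary directories with empty placeholder jars so the commands are self-contained; on a real cluster, point HIVE_HOME and HBASE_HOME at the actual installation roots instead.

```shell
# Stand-in directories and empty jar files for demonstration purposes.
# In a real installation, HIVE_HOME and HBASE_HOME point at the existing
# Hive and HBase root directories and the touch lines are unnecessary.
HIVE_HOME=$(mktemp -d)/apache-hive-1.2.1-bin
HBASE_HOME=$(mktemp -d)/hbase-1.0.1
mkdir -p "$HIVE_HOME/lib" "$HBASE_HOME/lib"
touch "$HIVE_HOME/lib/guava-14.0.1.jar" \
      "$HIVE_HOME/lib/hive-hbase-handler-1.2.1.jar" \
      "$HBASE_HOME/lib/hbase-common-1.0.1.jar" \
      "$HBASE_HOME/lib/zookeeper-3.4.6.jar"

# Step 1: create the auxlib folder in the Hive root directory.
mkdir -p "$HIVE_HOME/auxlib"

# Step 2: copy the four jars into auxlib.
cp "$HIVE_HOME/lib/guava-14.0.1.jar" \
   "$HIVE_HOME/lib/hive-hbase-handler-1.2.1.jar" \
   "$HBASE_HOME/lib/hbase-common-1.0.1.jar" \
   "$HBASE_HOME/lib/zookeeper-3.4.6.jar" \
   "$HIVE_HOME/auxlib/"

ls "$HIVE_HOME/auxlib"
```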
Step 3: Create a table in HBase
Syntax
create 'tablename','column_family1','column_family2',...
Example
create 'hbase_table','name','details'
Step 4: Insert values into HBase
Syntax
put 'tablename','row_key','column_family:column','value'
Example
put 'hbase_table','1','name:first_name','Sachin'
put 'hbase_table','1','name:last_name','Tendulkar'
put 'hbase_table','1','details:age','42'
put 'hbase_table','1','details:city','Mumbai'
put 'hbase_table','1','details:team','India'
put 'hbase_table','2','name:first_name','Ricky'
put 'hbase_table','2','name:last_name','Ponting'
put 'hbase_table','2','details:age','40'
put 'hbase_table','2','details:city','Sydney'
put 'hbase_table','2','details:team','Australia'
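Rather than typing each put interactively, the commands above can be collected in a script file and run non-interactively with `hbase shell <file>`. A minimal sketch (the file name load_players.hbase is our own choice):

```shell
# Write the Step 4 put commands to a script file, then run them all at
# once with:  hbase shell load_players.hbase
cat > load_players.hbase <<'EOF'
put 'hbase_table','1','name:first_name','Sachin'
put 'hbase_table','1','name:last_name','Tendulkar'
put 'hbase_table','1','details:age','42'
put 'hbase_table','1','details:city','Mumbai'
put 'hbase_table','1','details:team','India'
put 'hbase_table','2','name:first_name','Ricky'
put 'hbase_table','2','name:last_name','Ponting'
put 'hbase_table','2','details:age','40'
put 'hbase_table','2','details:city','Sydney'
put 'hbase_table','2','details:team','Australia'
EOF
```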
Step 5: Describe the HBase table
Syntax
describe 'tablename'
Example
describe 'hbase_table'
Step 6: View table values
Syntax
scan 'tablename'
Example
scan 'hbase_table'
SerDe Overview
SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface handles both serialization and deserialization, and also interprets the results of serialization as individual fields for processing.
A SerDe allows Hive to read data from a table and write it back out to HDFS in any custom format. Anyone can write a SerDe for their own data format.
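As a concrete illustration of naming a SerDe in a Hive DDL, the sketch below writes a HiveQL file that uses the OpenCSVSerde bundled with Hive (0.14+) via the ROW FORMAT SERDE clause. The table name, columns, and HDFS location are illustrative, not from this tutorial:

```shell
# Illustrative only: emit a Hive DDL that selects a SerDe explicitly.
# OpenCSVSerde ships with Hive 0.14+; table name and path are made up.
cat > csv_table.hql <<'EOF'
CREATE EXTERNAL TABLE players_csv (
  first_name string,
  last_name  string,
  age        string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/user/hive/players_csv';
EOF
```

The file can then be run with `hive -f csv_table.hql`. Step 7 below works the same way, except the HBase storage handler supplies its own SerDe, configured through SERDEPROPERTIES instead of ROW FORMAT SERDE.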
Step 7: Create a table in Hive (including SERDEPROPERTIES)
CREATE EXTERNAL TABLE hive_table (row_key string, first_name string, last_name string, age int, city string, team string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,name:first_name,name:last_name,details:age,details:city,details:team")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
Step 8: Describe the Hive table
Syntax
desc tablename;
Example
desc hive_table;
Step 9: View table values in Hive
Syntax
select * from tablename;
Example
select * from hive_table;
Step 10: Update a value in HBase (a put on an existing cell overwrites it)
put 'hbase_table','2','details:city','Canberra'
Step 11: View table values in Hive after the update
Syntax
select * from tablename;
Example
select * from hive_table;
Once the update completes, re-running the Hive query shows the new value.