Hive and HBase Integration

Hive:

Apache Hive is an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop, typically as files in HDFS. Hadoop is a framework for storing and processing large datasets in a distributed computing environment.

HBase:

Apache HBase is an open-source NoSQL database that provides real-time read/write access to those large datasets.

HBase scales linearly to handle huge data sets with billions of rows and millions of columns, and it easily combines data sources that use a wide variety of different structures and schemas.

HBase is natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.

Integration steps:

Step 1: Create a folder named auxlib in the Hive root directory.

Step 2:

i) Copy the following jar files from the lib directory under the Hive root directory to the auxlib directory.

1. apache-hive-1.2.1-bin/lib/guava-14.0.1.jar

2. apache-hive-1.2.1-bin/lib/hive-hbase-handler-1.2.1.jar

ii) Copy the following jar files from the lib directory under the HBase root directory to the auxlib directory.

3. hbase-1.0.1/lib/hbase-common-1.0.1.jar

4. hbase-1.0.1/lib/zookeeper-3.4.6.jar
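Steps 1 and 2 can also be scripted. The following is a minimal Python sketch, assuming Hive and HBase are unpacked in the directories and versions listed above. The helper name stage_aux_jars and its parameters are illustrative and not part of Hive or HBase; adjust the paths and versions to match your installation.

```python
import shutil
from pathlib import Path

# Jar paths relative to each installation root, as listed in Step 2.
HIVE_JARS = ["lib/guava-14.0.1.jar", "lib/hive-hbase-handler-1.2.1.jar"]
HBASE_JARS = ["lib/hbase-common-1.0.1.jar", "lib/zookeeper-3.4.6.jar"]


def stage_aux_jars(hive_home, hbase_home):
    """Create <hive_home>/auxlib and copy the four jars into it.

    Hypothetical helper for illustration; returns the copied file paths.
    """
    hive_home, hbase_home = Path(hive_home), Path(hbase_home)
    auxlib = hive_home / "auxlib"
    auxlib.mkdir(exist_ok=True)  # Step 1: create the auxlib folder

    sources = [hive_home / j for j in HIVE_JARS]
    sources += [hbase_home / j for j in HBASE_JARS]

    copied = []
    for src in sources:  # Step 2: copy each jar into auxlib
        if not src.is_file():
            raise FileNotFoundError(f"expected jar not found: {src}")
        copied.append(Path(shutil.copy2(src, auxlib)))
    return copied
```

A call such as stage_aux_jars("apache-hive-1.2.1-bin", "hbase-1.0.1") would then perform both steps. The sketch fails early if a jar is missing, so a version mismatch shows up at copy time rather than later inside Hive.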

Step 3: Create a table in HBase

Syntax

create 'tablename','column_family1','column_family2'

Example

create 'hbase_table','name','details'

Step 4: Insert values into HBase

Syntax

put 'tablename','row_key','column_family:column_qualifier','value'

Example

put 'hbase_table','1','name:first_name','Sachin'

put 'hbase_table','1','name:last_name','Tendulkar'

put 'hbase_table','1','details:age','42'

put 'hbase_table','1','details:city','Mumbai'

put 'hbase_table','1','details:team','India'

put 'hbase_table','2','name:first_name','Ricky'

put 'hbase_table','2','name:last_name','Ponting'

put 'hbase_table','2','details:age','40'

put 'hbase_table','2','details:city','Sydney'

put 'hbase_table','2','details:team','Australia'
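To make the effect of these put commands concrete, the sketch below models the table as nested Python dictionaries, keyed first by row key and then by family:qualifier. It is a simplified mental model only: real HBase also stores a timestamp with every cell and can keep multiple versions, which this sketch ignores. The names table and put are illustrative and do not call HBase.

```python
from collections import defaultdict

# Column families declared when the table was created in Step 3.
COLUMN_FAMILIES = ("name", "details")

# row key -> {"family:qualifier": value}; a toy stand-in for an HBase table.
table = defaultdict(dict)


def put(row_key, column, value):
    """Mimic the shell's put: set one cell, replacing any earlier value."""
    family, _, qualifier = column.partition(":")
    if family not in COLUMN_FAMILIES or not qualifier:
        raise ValueError(f"unknown family or missing qualifier: {column}")
    table[row_key][column] = value


# Each put writes exactly one cell; a row is assembled from several puts.
put("1", "name:first_name", "Sachin")
put("1", "name:last_name", "Tendulkar")
put("1", "details:age", "42")
put("1", "details:city", "Mumbai")
put("1", "details:team", "India")

put("2", "name:first_name", "Ricky")
put("2", "name:last_name", "Ponting")
put("2", "details:age", "40")
put("2", "details:city", "Sydney")
put("2", "details:team", "Australia")

for row_key, cells in table.items():
    print(row_key, cells)
```

Note that qualifiers such as first_name are not declared up front: only the column families are fixed at table creation, and every value is stored as a string of bytes.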

Step 5: Describe the HBase table

Syntax

describe 'tablename'

Example

describe 'hbase_table'

Step 6: View the table values

Syntax

scan 'tablename'

Example

scan 'hbase_table'

SerDe Overview

SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for I/O. The interface handles both serialization and deserialization, and also interprets the deserialized results as individual fields for processing.

A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.
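As a rough illustration of the idea (this is not Hive's actual Java SerDe interface), the sketch below deserializes a delimited text record into typed fields and serializes it back. The field layout, the comma delimiter, and the function names are assumptions chosen for this example.

```python
# Column names and types for the toy record; an assumption for this example.
SCHEMA = [("first_name", str), ("last_name", str), ("age", int)]


def deserialize(line, delimiter=","):
    """Turn one stored text record into a dict of typed fields."""
    parts = line.split(delimiter)
    if len(parts) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(parts)}")
    return {name: cast(raw) for (name, cast), raw in zip(SCHEMA, parts)}


def serialize(row, delimiter=","):
    """Turn a dict of fields back into the stored text format."""
    return delimiter.join(str(row[name]) for name, _ in SCHEMA)


row = deserialize("Sachin,Tendulkar,42")
print(row)             # fields are now addressable by name
print(serialize(row))  # back to the stored representation
```

The point of the split is that the query engine only ever sees named, typed fields, while the storage format stays free to change behind the deserialize and serialize pair.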

Step 7: Create a table in Hive (including SERDEPROPERTIES)

CREATE EXTERNAL TABLE hive_table (Row_key string, First_name string, last_name string, age int, City string, Team string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,name:first_name,name:last_name,details:age,details:city,details:team")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
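The entries in hbase.columns.mapping are matched to the Hive columns by position, with :key standing for the HBase row key. The sketch below illustrates that pairing; it is not Hive's own parser, and the function name map_columns is hypothetical. It also rejects entries with surrounding whitespace, because spaces inside the mapping string are treated as part of the column name and are an easy mistake to make.

```python
def map_columns(hive_columns, mapping):
    """Pair each Hive column with its HBase mapping entry, by position."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    for entry in entries:
        if entry != entry.strip():
            raise ValueError(f"whitespace in mapping entry: {entry!r}")
    return dict(zip(hive_columns, entries))


# Column list and mapping string taken from the CREATE EXTERNAL TABLE statement.
hive_columns = ["Row_key", "First_name", "last_name", "age", "City", "Team"]
mapping = ":key,name:first_name,name:last_name,details:age,details:city,details:team"

for column, target in map_columns(hive_columns, mapping).items():
    print(f"{column:<10} -> {target}")
```

Reading the pairs side by side makes it easy to confirm that there is exactly one mapping entry per Hive column and that each column lands in the intended column family.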

Step 8: Describe the Hive table

Syntax

desc tablename;

Example

desc hive_table;

Step 9: View the table values in Hive

Syntax

select * from tablename;

Example

select * from hive_table;

Step 10: Update a value in HBase

put 'hbase_table','2','details:city','Canberra'

Step 11: View the table values in Hive after the update

Syntax

select * from tablename;

Example

select * from hive_table;

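Once the update has been applied in HBase, the same query in Hive returns the new value (Canberra) as the city for row 2, because the Hive table reads directly from the HBase table.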
