Hive and HBase Integration


Hive:

Apache Hive is an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop. Hadoop is a framework for storing and processing large datasets in a distributed computing environment.

HBase:

Apache HBase is an open-source NoSQL database that provides real-time read/write access to those large datasets.

HBase scales linearly to handle huge datasets with billions of rows and millions of columns, and it can easily combine data sources that use a wide variety of structures and schemas.

HBase is natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.

Integration steps:

Step 1: Create a folder named “auxlib” in the Hive root directory.

Step 2:

i) Copy the following jar files from the lib directory of the Hive root directory to the auxlib directory:

1. apache-hive-1.2.1-bin/lib/guava-14.0.1.jar

2. apache-hive-1.2.1-bin/lib/hive-hbase-handler-1.2.1.jar

ii) Copy the following jar files from the lib directory of the HBase root directory to the auxlib directory:

3. hbase-1.0.1/lib/hbase-common-1.0.1.jar

4. hbase-1.0.1/lib/zookeeper-3.4.6.jar
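Steps 1 and 2 can also be scripted. The following is a minimal Python sketch, assuming the Hive and HBase install directories are passed in as paths. The function name setup_auxlib is hypothetical. The jar file names are taken from the list above.

```python
import shutil
from pathlib import Path

# Jar paths relative to each root directory, exactly as listed in Step 2.
HIVE_JARS = ["lib/guava-14.0.1.jar", "lib/hive-hbase-handler-1.2.1.jar"]
HBASE_JARS = ["lib/hbase-common-1.0.1.jar", "lib/zookeeper-3.4.6.jar"]


def setup_auxlib(hive_home: Path, hbase_home: Path) -> list[Path]:
    """Create <hive_home>/auxlib (Step 1) and copy the Step 2 jars into it."""
    auxlib = hive_home / "auxlib"
    auxlib.mkdir(exist_ok=True)  # no error if the folder already exists
    copied = []
    for root, jars in ((hive_home, HIVE_JARS), (hbase_home, HBASE_JARS)):
        for rel_path in jars:
            src = root / rel_path
            if not src.is_file():
                # Fail early instead of silently skipping a missing jar.
                raise FileNotFoundError(f"missing jar: {src}")
            dst = auxlib / src.name
            shutil.copy2(src, dst)  # copy2 also preserves file timestamps
            copied.append(dst)
    return copied
```

This article uses the directory names apache-hive-1.2.1-bin and hbase-1.0.1. With those names, the call would be `setup_auxlib(Path("apache-hive-1.2.1-bin"), Path("hbase-1.0.1"))`, run from the directory that contains both installs. Adjust the paths to match your own installation.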

Step 3: Create a table in HBase (from the HBase shell)

Syntax

create 'tablename', 'column_family1', 'column_family2'

Example

create 'hbase_table', 'name', 'details'

Step 4: Insert values into HBase

Syntax

put 'tablename', 'row_key', 'column_family:qualifier', 'value'

Example

put 'hbase_table', '1', 'name:first_name', 'Sachin'

put 'hbase_table', '1', 'name:last_name', 'Tendulkar'

put 'hbase_table', '1', 'details:age', '42'

put 'hbase_table', '1', 'details:city', 'Mumbai'

put 'hbase_table', '1', 'details:team', 'India'

put 'hbase_table', '2', 'name:first_name', 'Ricky'

put 'hbase_table', '2', 'name:last_name', 'Ponting'

put 'hbase_table', '2', 'details:age', '40'

put 'hbase_table', '2', 'details:city', 'Sydney'

put 'hbase_table', '2', 'details:team', 'Australia'
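In HBase terms, Step 3 fixes the column families (name and details). Each put in Step 4 then creates a cell addressed by a row key and a family:qualifier pair. No qualifier has to be declared in advance.

The following Python sketch models this with nested dictionaries. It is a hypothetical illustration: ToyHBaseTable is not part of any HBase API, and the sketch ignores timestamps, versions and storage.

```python
class ToyHBaseTable:
    """In-memory picture of an HBase table: row key -> {"family:qualifier": value}.

    Hypothetical illustration only: no timestamps, no versions, no persistence.
    """

    def __init__(self, name: str, *families: str) -> None:
        # Like `create 'hbase_table', 'name', 'details'`: families are fixed up front.
        self.name = name
        self.families = set(families)
        self.rows: dict[str, dict[str, str]] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        # Like `put 'hbase_table', '1', 'name:first_name', 'Sachin'`.
        family = column.partition(":")[0]
        if family not in self.families:
            # HBase likewise rejects a put to a column family the table lacks.
            raise KeyError(f"unknown column family {family!r}")
        # Qualifiers need no declaration; this toy keeps only the latest value.
        self.rows.setdefault(row_key, {})[column] = value


table = ToyHBaseTable("hbase_table", "name", "details")
table.put("1", "name:first_name", "Sachin")
table.put("1", "details:city", "Mumbai")
table.put("2", "name:first_name", "Ricky")
print(table.rows)
```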

Step 5: Describe the HBase table

Syntax

describe 'tablename'

Example

describe 'hbase_table'

Step 6: View the table values

Syntax

scan 'tablename'

Example

scan 'hbase_table'
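Two ordering rules explain what scan prints:

1. Rows come back in ascending row-key order, and the keys are compared as raw bytes rather than as numbers.
2. The cells within each row are sorted by column family, then by qualifier.

The Python sketch below imitates only that ordering. The scan function here is a hypothetical helper, and the printed lines are not the shell's exact format. It includes a hypothetical extra row key "10" to show that it lands between "1" and "2".

```python
def _column_order(column: str) -> tuple[bytes, bytes]:
    # Within a row, cells are ordered by family, then by qualifier, as bytes.
    family, _, qualifier = column.partition(":")
    return family.encode("utf-8"), qualifier.encode("utf-8")


def scan(rows: dict[str, dict[str, str]]) -> list[tuple[str, str, str]]:
    """Return (row_key, column, value) triples in the order a full scan yields them."""
    result = []
    # Row keys are compared as raw bytes, so "10" sorts before "2".
    for row_key in sorted(rows, key=lambda key: key.encode("utf-8")):
        for column in sorted(rows[row_key], key=_column_order):
            result.append((row_key, column, rows[row_key][column]))
    return result


rows = {
    "2": {"name:first_name": "Ricky", "details:city": "Sydney"},
    "10": {"name:first_name": "placeholder"},  # hypothetical row, for ordering only
    "1": {"name:first_name": "Sachin", "details:city": "Mumbai"},
}
for row_key, column, value in scan(rows):
    print(f"{row_key:<3} column={column}, value={value}")
```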

SerDe Overview

SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for IO. The interface handles both serialization and deserialization, and it also interprets the deserialized data as individual fields for processing.

A SerDe allows Hive to read data from a table and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats. For HBase-backed tables, the HBase storage handler used in Step 7 supplies its own SerDe (HBaseSerDe). That SerDe is configured through SERDEPROPERTIES such as hbase.columns.mapping.
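The following is a minimal Python sketch of the serialize/deserialize split. It is not Hive's Java SerDe interface; DelimitedSerDe and its methods are hypothetical names. The sketch turns a row into a delimited byte string and parses a byte string back into typed fields. It uses Ctrl-A (\x01) as the separator, which is the default field delimiter of Hive's text format.

```python
from typing import Any, Callable

FIELD_DELIM = "\x01"  # Ctrl-A, the default field separator in Hive text tables


class DelimitedSerDe:
    """Toy SerDe; the schema is a list of (column_name, type_converter) pairs."""

    def __init__(self, schema: list[tuple[str, Callable[[str], Any]]]) -> None:
        self.schema = schema

    def serialize(self, row: dict[str, Any]) -> bytes:
        # Row object -> bytes written to storage.
        return FIELD_DELIM.join(str(row[name]) for name, _ in self.schema).encode("utf-8")

    def deserialize(self, data: bytes) -> dict[str, Any]:
        # Bytes read from storage -> individual, typed fields for query processing.
        parts = data.decode("utf-8").split(FIELD_DELIM)
        if len(parts) != len(self.schema):
            # This toy simply rejects rows with the wrong number of fields.
            raise ValueError(f"expected {len(self.schema)} fields, got {len(parts)}")
        return {name: convert(text) for (name, convert), text in zip(self.schema, parts)}


serde = DelimitedSerDe([("first_name", str), ("age", int), ("city", str)])
line = serde.serialize({"first_name": "Sachin", "age": 42, "city": "Mumbai"})
print(serde.deserialize(line))  # {'first_name': 'Sachin', 'age': 42, 'city': 'Mumbai'}
```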

Step 7: Create an external table in Hive (including SERDEPROPERTIES)

CREATE EXTERNAL TABLE hive_table (Row_key string, First_name string, last_name string, age int, City string, Team string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,name:first_name,name:last_name,details:age,details:city,details:team")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
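The hbase.columns.mapping value pairs Hive columns with HBase columns by position. The first Hive column gets the first entry, the second column gets the second entry, and so on. The :key entry stands for the HBase row key.

The Python sketch below performs that pairing for the table above. map_columns is a hypothetical helper, not part of Hive. It catches two easy mistakes: a count mismatch, and a missing or repeated :key entry. Hive keeps column names in lowercase, so the sketch uses lowercase names.

```python
def map_columns(hive_columns: list[str], mapping: str) -> dict[str, str]:
    """Pair each Hive column with its hbase.columns.mapping entry, by position."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries")
    if entries.count(":key") != 1:
        # This sketch insists on exactly one row-key entry.
        raise ValueError("the mapping needs exactly one :key entry")
    for entry in entries:
        if entry != ":key" and ":" not in entry:
            raise ValueError(f"entry {entry!r} is not of the form family:qualifier")
    return dict(zip(hive_columns, entries))


hive_columns = ["row_key", "first_name", "last_name", "age", "city", "team"]
mapping = ":key,name:first_name,name:last_name,details:age,details:city,details:team"
for column, hbase_column in map_columns(hive_columns, mapping).items():
    print(f"{column:<10} <- {hbase_column}")
```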

Step 8: Describe the Hive table

Syntax

desc tablename;

Example

desc hive_table;

Step 9: View the table values in Hive

Syntax

select * from tablename;

Example

select * from hive_table;
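Behind select *, the storage handler reads each HBase row and fills the Hive columns through the mapping:

- The :key column receives the row key.
- Every other column receives its matching cell, converted to the declared Hive type. For example, the stored text "42" becomes an int. This assumes the default string storage format.
- A cell that is absent from the HBase row comes back as NULL.

The Python sketch below approximates that behaviour over an in-memory copy of the data from Step 4. select_all and COLUMNS are hypothetical names, and the real handler reads HBase directly rather than a dictionary.

```python
from typing import Any, Callable, Optional

# Hive column -> (HBase source, converter to the declared Hive type).
# Hypothetical structure mirroring hive_table; ":key" means the HBase row key.
COLUMNS: dict[str, tuple[str, Callable[[str], Any]]] = {
    "row_key": (":key", str),
    "first_name": ("name:first_name", str),
    "last_name": ("name:last_name", str),
    "age": ("details:age", int),
    "city": ("details:city", str),
    "team": ("details:team", str),
}


def select_all(hbase_rows: dict[str, dict[str, str]]) -> list[dict[str, Optional[Any]]]:
    """Approximate `select * from hive_table` over an in-memory copy of hbase_table."""
    records = []
    for row_key in sorted(hbase_rows, key=lambda key: key.encode("utf-8")):
        cells = hbase_rows[row_key]
        record: dict[str, Optional[Any]] = {}
        for column, (source, convert) in COLUMNS.items():
            raw = row_key if source == ":key" else cells.get(source)
            try:
                # A missing cell becomes NULL (None here).
                record[column] = None if raw is None else convert(raw)
            except ValueError:
                record[column] = None  # text that is not a valid int also becomes NULL
        records.append(record)
    return records


hbase_rows = {
    "1": {"name:first_name": "Sachin", "name:last_name": "Tendulkar",
          "details:age": "42", "details:city": "Mumbai", "details:team": "India"},
    "2": {"name:first_name": "Ricky", "name:last_name": "Ponting",
          "details:age": "40", "details:city": "Sydney", "details:team": "Australia"},
}
for record in select_all(hbase_rows):
    print(record)
```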

Step 10: Update a value in HBase (from the HBase shell)

put 'hbase_table', '2', 'details:city', 'Canberra'

Step 11: View the table values in Hive after the update

Syntax

select * from tablename;

Example

select * from hive_table;

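After the update, the query shows the new value: the city for row key 2 is now Canberra instead of Sydney. Because hive_table is an external table backed by HBase, Hive reads the current HBase data at query time. No reload is needed.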
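Step 10 works because an HBase put never edits a cell in place. Instead, it writes a new version of the cell with a timestamp. Reads return the newest version by default, and that newest version is what the Hive query in Step 11 sees.

The Python sketch below models that rule for a single cell. VersionedCell is a hypothetical name, and its integer counter only stands in for HBase's real timestamps.

```python
import itertools
from typing import Optional


class VersionedCell:
    """Toy model of one HBase cell, e.g. row '2', column details:city.

    Every put appends a (timestamp, value) version; a read returns the newest.
    Hypothetical illustration: real column families keep only as many versions
    as their VERSIONS setting allows, while this toy keeps them all.
    """

    _clock = itertools.count(1)  # stand-in for HBase's millisecond timestamps

    def __init__(self) -> None:
        self.versions: list[tuple[int, str]] = []

    def put(self, value: str, timestamp: Optional[int] = None) -> None:
        ts = next(self._clock) if timestamp is None else timestamp
        self.versions.append((ts, value))

    def latest(self) -> str:
        # By default a get or scan returns the version with the highest timestamp.
        return max(self.versions, key=lambda version: version[0])[1]


city = VersionedCell()
city.put("Sydney")    # Step 4
city.put("Canberra")  # Step 10
print(city.latest())  # Canberra
```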
