Hive json file with example:

Hive:

         Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.  It’s an open source project under the Apache Software Foundation, it’s a data warehouse software ecosystem in Hadoop. Which manage vast amount of structured data sets, by using HQL language; it’s similar to SQL.

hive

JSON:

      JSON(JavaScript Object Notation) is a minimal, readable format for structuring data. It is used primarily to transmit data between a server and web application,  as an alternative to XML.  Square space uses JSON to store and organize site content created with the CMS.


Simple Json ( without using JsonSerDe
):

Step-1: Create simple json table with simple value

CREATE TABLE json_table (json string);


Step-2:
Load value into the json_table

LOAD DATA LOCAL INPATH ‘/home/geouser/simple.json’ INTO TABLE json_table;


Step-3:
View the value in the table json_table

SELECT * from json_table;

hive> CREATE TABLE json_table (json string);
OK
Time taken: 0.353 seconds
hive> LOAD DATA LOCAL INPATH ‘/home/geouser/Simple.json’ INTO TABLE  json_table;
Loading data to table default.json_table
Table default.json_table stats: [numFiles=1, totalSize=456]
OK
Time taken: 0.64 seconds
hive> SELECT *  from json_table;
OK
{“Name”:”Alaister briito”,”Id”:”1001265″,”Address”:{“City”:”Amsterdam”,”Country”:”Netherlands”,}}
{“Name”:”Anifa mohammed “,”Id”:”1001285″,”Address”:{“City”:”Beijing”,”Country”:”China”,}}
{“Name”:”Piyush manish  “,”Id”:”1001263″,”Address”:{“City”:”Cairo”,”Country”:”Egypt”,}}
{“Name”:”Vijay karthik  “,”Id”:”1001295″,”Address”:{“City”:”Dublin”,”Country”:”Ireland”,}}
{“Name”:”Narasingh rao  “,”Id”:”1001267″,”Address”:{“City”:”Havana”,”Country”:”Cuba”,}}
Time taken: 0.191 seconds, Fetched: 6 row(s)

big data training

Complex JSON FILE ( Using JsonSerDe ):

Step-1: First add json-serde-1.3.7- SNAPSHOT-jar-with-dependencies jar into hive

ADD JAR /home/geouser/Hive-JSON-Serde/json-serde/target/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;

hive> ADD JAR /home/geouser/Hive-JSON-Serde/json-serde/target/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar; Added [/home/geouser/Hive-JSON-Serde/json-serde/target/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar] to class path Added resources: [/home/geouser/Hive-JSON-Serde/json-serde/target/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar]


Step-2:
Create a json hive table

create table complex_json (E_id string, User_details struct<ID:int, Username:string,  Name:string, Office_Address:struct<Street:string, Area:string, Country:string>,  Orders:array<struct<ItemId:int, OrderDate:string>>> ) row format serde  ‘org.openx.data.jsonserde.JsonSerDe’ stored as textfile;

hive> create table complex_json (
> E_id string,
> User_details struct<ID:int,
> Username:string,
> Name:string,
> Office_Address:struct<Street:string,
> Area:string,
> Country:string>,
> Orders:array<struct<ItemId:int,
> OrderDate:string>>>
> )
> row format serde ‘org.openx.data.jsonserde.JsonSerDe’ stored as textfile;
OK
Time taken: 0.466 seconds

Step-3: Complex.json (file)

{“E_id”:GISS101,”User_details”:{“Id”:101,”Username”:”briito123″,”Name”:”Alaister briito”,”Office_Address”:{“Street”:”309 KENT STREET”,”Area”:”BENTLEY WA”,”Country”:”AUSTRALIA”},”Orders”:[{“ItemId”:6789,”OrderDate”:”11/07/2016″},{“ItemId”:4352,”OrderDate”:”16/07/2016″}]}} {“E_id”:GISS102,”User_details”:{“Id”:102,”Username”:”Anifa435″,”Name”:”Anifa mohammed”,”Office_Address”:{“Street”:”117 RUSSELL DRIVE”,”Area”:”LONDON  WIP”,”Country”:”ENGLAND”},”Orders”:[{“ItemId”:6791,”OrderDate”:”12/07/2016″},{“ItemId”:4354,”OrderDate”:”15/07/2016″}]}} {“E_id”:GISS103,”User_details”:{“Id”:103,”Username”:”Piyu2300″,”Name”:”Piyush manish”,”Office_Address”:{“Street”:”56 Church Street”,”Area”:”Hamilton HM”,”Country”:”BERMUDA”},”Orders”:[{“ItemId”:6795,”OrderDate”:”11/07/2016″},{“ItemId”:4356,”OrderDate”:”18/07/2016″}]}} {“E_id”:GISS104,”User_details”:{“Id”:104,”Username”:”Vithik”,”Name”:”Vijay karthik”,”Office_Address”:{“Street”:”21-B 125252 Moskva”,”Area”:”MOSCOW”,”Country”:”RUSSIA”},”Orders”:[{“ItemId”:6815,”OrderDate”:”11/07/2016″},{“ItemId”:4358,”OrderDate”:”18/07/2016″}]}} {“E_id”:GISS105,”User_details”:{“Id”:105,”Username”:”Narasio”,”Name”:”Narasingh rao”,”Office_Address”:{“Street”:”13 West Chang An Street”,”Area”:”100804 BEIJING”,”Country”:”CHINA”},”Orders”:[{“ItemId”:6825,”OrderDate”:”11/07/2016″},{“ItemId”:4360,”OrderDate”:”18/07/2016″}]}}


Step-4:
Load the json file into hive json table

LOAD DATA LOCAL INPATH ‘/home/geouser/Complex_json.json’ OVERWRITE INTO TABLE complex_json;

Step-5: View the table

select * from complex_json;

hive>select * from complex_json;
OK
{“E_id”:GISS101,”User_details”:{“Id”:101,”Username”:”briito123″,”Name”:”Alaister briito”,”Office_Address”:{“Street”:”309 KENT STREET”,”Area”:”BENTLEY WA”,”Country”:”AUSTRALIA”},”Orders”:[{“ItemId”:6789,”OrderDate”:”11/07/2016″},{“ItemId”:4352,”OrderDate”:”16/07/2016″}]}}
{“E_id”:GISS102,”User_details”:{“Id”:102,”Username”:”Anifa435″,”Name”:”Anifa mohammed”,”Office_Address”:{“Street”:”117 RUSSELL DRIVE”,”Area”:”LONDON  WIP”,”Country”:”ENGLAND”},”Orders”:[{“ItemId”:6791,”OrderDate”:”12/07/2016″},{“ItemId”:4354,”OrderDate”:”15/07/2016″}]}}
{“E_id”:GISS103,”User_details”:{“Id”:103,”Username”:”Piyu2300″,”Name”:”Piyush manish”,”Office_Address”:{“Street”:”56 Church Street”,”Area”:”Hamilton HM”,”Country”:”BERMUDA”},”Orders”:[{“ItemId”:6795,”OrderDate”:”11/07/2016″},{“ItemId”:4356,”OrderDate”:”18/07/2016″}]}}
{“E_id”:GISS104,”User_details”:{“Id”:104,”Username”:”Vithik”,”Name”:”Vijay karthik”,”Office_Address”:{“Street”:”21-B
125252 Moskva”,”Area”:”MOSCOW”,”Country”:”RUSSIA”},”Orders”:[{“ItemId”:6815,”OrderDate”:”11/07/2016″},{“ItemId”:4358,”OrderDate”:”18/07/2016″}]}}
{“E_id”:GISS105,”User_details”:{“Id”:105,”Username”:”Narasio”,”Name”:”Narasingh rao”,”Office_Address”:{“Street”:”13 West Chang An Street”,”Area”:”100804 BEIJING”,”Country”:”CHINA”},”Orders”:[{“ItemId”:6825,”OrderDate”:”11/07/2016″},{“ItemId”:4360,”OrderDate”:”18/07/2016″}]}}
Time taken: 0.315 seconds, Fetched: 3 row(s)

Step-6: View particuler Fields

SELECT E_id, User_details.Id, User_details.Office_Address.Country as country, User_details.Orders[0].ItemId as order0id, User_details.Orders[1].ItemId as order1id FROM complex_json;

hive>SELECT E_id,User_details.Id, User_details.Office_Address.Country as country,
>    User_details.Orders[0].ItemId as order0id,
>    User_details.Orders[1].ItemId as order1id
>    FROM complex_json;
OK
GISS101 101 AUSTRALIA 6789 4352
GISS102 102 ENGLAND   6791 4354
GISS103 103 BERMUDA   6795 4356
GISS104 104 RUSSIA    6815 4358
GISS105 105 CHINA     6825 4360
Time taken: 0.315 seconds, Fetched: 3 row(s)