HIVE JSON
Hive json file with example:
Hive:
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy. It’s an open source project under the Apache Software Foundation, it’s a data warehouse software ecosystem in Hadoop. Which manage vast amount of structured data sets, by using HQL language; it’s similar to SQL.
JSON:
JSON(JavaScript Object Notation) is a minimal, readable format for structuring data. It is used primarily to transmit data between a server and web application, as an alternative to XML. Square space uses JSON to store and organize site content created with the CMS.
Simple Json ( without using JsonSerDe ):
Step-1: Create simple json table with simple value
CREATE TABLE json_table (json string);
Step-2: Load value into the json_table
LOAD DATA LOCAL INPATH ‘/home/geouser/simple.json’ INTO TABLE json_table;
Step-3: View the value in the table json_table
SELECT * from json_table;
hive> CREATE TABLE json_table (json string);
OK
Time taken: 0.353 seconds
hive> LOAD DATA LOCAL INPATH ‘/home/geouser/Simple.json’ INTO TABLE json_table;
Loading data to table default.json_table
Table default.json_table stats: [numFiles=1, totalSize=456]
OK
Time taken: 0.64 seconds
hive> SELECT * from json_table;
OK
{“Name”:”Alaister briito”,”Id”:”1001265″,”Address”:{“City”:”Amsterdam”,”Country”:”Netherlands”,}}
{“Name”:”Anifa mohammed “,”Id”:”1001285″,”Address”:{“City”:”Beijing”,”Country”:”China”,}}
{“Name”:”Piyush manish “,”Id”:”1001263″,”Address”:{“City”:”Cairo”,”Country”:”Egypt”,}}
{“Name”:”Vijay karthik “,”Id”:”1001295″,”Address”:{“City”:”Dublin”,”Country”:”Ireland”,}}
{“Name”:”Narasingh rao “,”Id”:”1001267″,”Address”:{“City”:”Havana”,”Country”:”Cuba”,}}
Time taken: 0.191 seconds, Fetched: 6 row(s)
Complex JSON FILE ( Using JsonSerDe ):
Step-1: First add json-serde-1.3.7- SNAPSHOT-jar-with-dependencies jar into hive
ADD JAR /home/geouser/Hive-JSON-Serde/json-serde/target/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
hive> ADD JAR /home/geouser/Hive-JSON-Serde/json-serde/target/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar; Added [/home/geouser/Hive-JSON-Serde/json-serde/target/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar] to class path Added resources: [/home/geouser/Hive-JSON-Serde/json-serde/target/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar]
Step-2: Create a json hive table
create table complex_json (E_id string, User_details struct<ID:int, Username:string, Name:string, Office_Address:struct<Street:string, Area:string, Country:string>, Orders:array<struct<ItemId:int, OrderDate:string>>> ) row format serde ‘org.openx.data.jsonserde.JsonSerDe’ stored as textfile;
hive> create table complex_json (
> E_id string,
> User_details struct<ID:int,
> Username:string,
> Name:string,
> Office_Address:struct<Street:string,
> Area:string,
> Country:string>,
> Orders:array<struct<ItemId:int,
> OrderDate:string>>>
> )
> row format serde ‘org.openx.data.jsonserde.JsonSerDe’ stored as textfile;
OK
Time taken: 0.466 seconds
Step-3: Complex.json (file)
{“E_id”:GISS101,”User_details”:{“Id”:101,”Username”:”briito123″,”Name”:”Alaister briito”,”Office_Address”:{“Street”:”309 KENT STREET”,”Area”:”BENTLEY WA”,”Country”:”AUSTRALIA”},”Orders”:[{“ItemId”:6789,”OrderDate”:”11/07/2016″},{“ItemId”:4352,”OrderDate”:”16/07/2016″}]}} {“E_id”:GISS102,”User_details”:{“Id”:102,”Username”:”Anifa435″,”Name”:”Anifa mohammed”,”Office_Address”:{“Street”:”117 RUSSELL DRIVE”,”Area”:”LONDON WIP”,”Country”:”ENGLAND”},”Orders”:[{“ItemId”:6791,”OrderDate”:”12/07/2016″},{“ItemId”:4354,”OrderDate”:”15/07/2016″}]}} {“E_id”:GISS103,”User_details”:{“Id”:103,”Username”:”Piyu2300″,”Name”:”Piyush manish”,”Office_Address”:{“Street”:”56 Church Street”,”Area”:”Hamilton HM”,”Country”:”BERMUDA”},”Orders”:[{“ItemId”:6795,”OrderDate”:”11/07/2016″},{“ItemId”:4356,”OrderDate”:”18/07/2016″}]}} {“E_id”:GISS104,”User_details”:{“Id”:104,”Username”:”Vithik”,”Name”:”Vijay karthik”,”Office_Address”:{“Street”:”21-B 125252 Moskva”,”Area”:”MOSCOW”,”Country”:”RUSSIA”},”Orders”:[{“ItemId”:6815,”OrderDate”:”11/07/2016″},{“ItemId”:4358,”OrderDate”:”18/07/2016″}]}} {“E_id”:GISS105,”User_details”:{“Id”:105,”Username”:”Narasio”,”Name”:”Narasingh rao”,”Office_Address”:{“Street”:”13 West Chang An Street”,”Area”:”100804 BEIJING”,”Country”:”CHINA”},”Orders”:[{“ItemId”:6825,”OrderDate”:”11/07/2016″},{“ItemId”:4360,”OrderDate”:”18/07/2016″}]}}
Step-4: Load the json file into hive json table
LOAD DATA LOCAL INPATH ‘/home/geouser/Complex_json.json’ OVERWRITE INTO TABLE complex_json;
Step-5: View the table
select * from complex_json;
hive>select * from complex_json;
OK
{“E_id”:GISS101,”User_details”:{“Id”:101,”Username”:”briito123″,”Name”:”Alaister briito”,”Office_Address”:{“Street”:”309 KENT STREET”,”Area”:”BENTLEY WA”,”Country”:”AUSTRALIA”},”Orders”:[{“ItemId”:6789,”OrderDate”:”11/07/2016″},{“ItemId”:4352,”OrderDate”:”16/07/2016″}]}}
{“E_id”:GISS102,”User_details”:{“Id”:102,”Username”:”Anifa435″,”Name”:”Anifa mohammed”,”Office_Address”:{“Street”:”117 RUSSELL DRIVE”,”Area”:”LONDON WIP”,”Country”:”ENGLAND”},”Orders”:[{“ItemId”:6791,”OrderDate”:”12/07/2016″},{“ItemId”:4354,”OrderDate”:”15/07/2016″}]}}
{“E_id”:GISS103,”User_details”:{“Id”:103,”Username”:”Piyu2300″,”Name”:”Piyush manish”,”Office_Address”:{“Street”:”56 Church Street”,”Area”:”Hamilton HM”,”Country”:”BERMUDA”},”Orders”:[{“ItemId”:6795,”OrderDate”:”11/07/2016″},{“ItemId”:4356,”OrderDate”:”18/07/2016″}]}}
{“E_id”:GISS104,”User_details”:{“Id”:104,”Username”:”Vithik”,”Name”:”Vijay karthik”,”Office_Address”:{“Street”:”21-B
125252 Moskva”,”Area”:”MOSCOW”,”Country”:”RUSSIA”},”Orders”:[{“ItemId”:6815,”OrderDate”:”11/07/2016″},{“ItemId”:4358,”OrderDate”:”18/07/2016″}]}}
{“E_id”:GISS105,”User_details”:{“Id”:105,”Username”:”Narasio”,”Name”:”Narasingh rao”,”Office_Address”:{“Street”:”13 West Chang An Street”,”Area”:”100804 BEIJING”,”Country”:”CHINA”},”Orders”:[{“ItemId”:6825,”OrderDate”:”11/07/2016″},{“ItemId”:4360,”OrderDate”:”18/07/2016″}]}}
Time taken: 0.315 seconds, Fetched: 3 row(s)
Step-6: View particuler Fields
SELECT E_id, User_details.Id, User_details.Office_Address.Country as country, User_details.Orders[0].ItemId as order0id, User_details.Orders[1].ItemId as order1id FROM complex_json;
hive>SELECT E_id,User_details.Id, User_details.Office_Address.Country as country,
> User_details.Orders[0].ItemId as order0id,
> User_details.Orders[1].ItemId as order1id
> FROM complex_json;
OK
GISS101 101 AUSTRALIA 6789 4352
GISS102 102 ENGLAND 6791 4354
GISS103 103 BERMUDA 6795 4356
GISS104 104 RUSSIA 6815 4358
GISS105 105 CHINA 6825 4360
Time taken: 0.315 seconds, Fetched: 3 row(s)