AVRO JSON CONVERSIONS
What is AVRO File Format:
Avro stores both the data definition (the schema) and the data together in one message or file, making it easy for programs to dynamically understand the information stored in an Avro file or message.
Avro stores the data definition in JSON format, making it easy to read and interpret; the data itself is stored in binary format, making it compact and efficient. Avro files also include sync markers that can be used to split large data sets into subsets suitable for MapReduce processing.
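To make this layout concrete, here is a minimal Python sketch of reading the header of an Avro container file: the magic bytes, the metadata map that holds the JSON schema under the avro.schema key, and the 16-byte sync marker. It follows the object container layout from the Avro specification. The helper names (read_long, write_long, read_header) and the one-field demo schema are made up for illustration, and real files are better read with an Avro library.

```python
import io
import json


def read_long(buf):
    """Decode one zigzag, variable-length long from a binary stream."""
    acc, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("unexpected end of data")
        acc |= (chunk[0] & 0x7F) << shift
        if not (chunk[0] & 0x80):    # high bit clear: last byte of this number
            return (acc >> 1) ^ -(acc & 1)
        shift += 7


def write_long(n):
    """Encode an integer as a zigzag, variable-length long (demo helper)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_header(buf):
    """Return (metadata dict, sync marker) from the start of a container file."""
    if buf.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(buf)
        if count == 0:               # a zero count ends the metadata map
            break
        if count < 0:                # negative count: a block byte size follows
            read_long(buf)
            count = -count
        for _ in range(count):
            key = buf.read(read_long(buf)).decode("utf-8")
            meta[key] = buf.read(read_long(buf))
    return meta, buf.read(16)


# Build a tiny in-memory header with a hypothetical one-field schema.
schema = json.dumps({"type": "record", "name": "Avro_File",
                     "fields": [{"name": "Order_Id", "type": ["string", "null"]}]})
sync_marker = bytes(range(16))
header = b"Obj\x01" + write_long(2)
for key, value in (("avro.schema", schema.encode()), ("avro.codec", b"null")):
    header += write_long(len(key)) + key.encode() + write_long(len(value)) + value
header += write_long(0) + sync_marker

meta, sync = read_header(io.BytesIO(header))
print(json.loads(meta["avro.schema"])["name"])   # Avro_File
print(meta["avro.codec"], sync == sync_marker)
```

The same sync marker is repeated after every data block, which is what lets a reader jump into the middle of a large file and resynchronize.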
What is JSON File Format:
JSON (JavaScript Object Notation) is a minimal, readable format for structuring data. It is used primarily to transmit data between a server and a web application, as an alternative to XML. Squarespace, for example, uses JSON to store and organize site content created with its CMS.
AVSC:
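AVSC is an Avro schema file: a JSON document (usually saved with the .avsc extension) that describes the structure of the records.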
AVRO to JSON Conversion:
Step-1: Download avro-tools-1.7.4.jar
Step-2: Example Avro file, viewed as raw bytes (the JSON schema in the header is readable, the records that follow are binary)
Objavro.schema\8E{"type":"record","name":"Avro_File","doc":"Sqoop import of Avro_File","fields":[{"name":"User__Name","type":["string","null"],"columnName":"User__Name","sqlType":"12"},{"name":"Product_Id","type":["int","null"],"columnName":"Product_Id","sqlType":"4"},{"name":"Order_Id","type":["string","null"],"columnName":"Order_Id","sqlType":"12"},{"name":"Delivery_Date","type":["string","null"],"columnName":"Delivery_Date","sqlType":"12"}],"tableName":"Avro_File"}\00\D0!\B2\00e\B1\AC\CBI6\80aɳ\D6\00alaister briito\00ʺ\DA/\00mo862041\0026-09-2016\00anifa mohammed\00̺\DA/\00la862041\0014-09-2016\00piyush manish\00κ\DA/\00mo862032\0016-09-2016\00vijay karthik\00к\DA/\00wa862098\0029-09-2016\D0!\B2\00e\B1\AC\CBI6\80aɳ
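The unreadable part of the dump is the binary record data. As a rough illustration, Avro writes an int as a zigzag variable-length integer, so the four bytes that follow the first user name decode to the first Product_Id. This sketch assumes that the odd character before \DA/ in the dump is the terminal's UTF-8 rendering of the bytes 0xCA 0xBA; the function name decode_zigzag is made up for the example.

```python
def decode_zigzag(data, pos=0):
    """Decode one Avro int/long starting at data[pos]; return (value, next_pos)."""
    acc, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        acc |= (byte & 0x7F) << shift    # low 7 bits carry the payload
        if not (byte & 0x80):            # high bit clear marks the last byte
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos  # undo the zigzag mapping


# Assumption: the dump's bytes for the first Product_Id are CA BA DA 2F ("/" is 0x2F).
raw = b"\xca\xba\xda/"
value, next_pos = decode_zigzag(raw)
print(value, next_pos)  # 50024101 4
```

Under the same assumption, the next record's bytes (CC BA DA 2F) decode to 50024102.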
Step-3: Convert the AVRO file into JSON
java -jar <path to the jar file> <keyword> <path to the avro file> > <path where the json file will be stored>
java -jar /usr/lib/avro/avro-tools-1.7.4.jar tojson /tmp/Avro_Format_File.avro > /tmp/Json_Format_File.json
Step-4: View the JSON file that was converted from the Avro file
hadoop fs -cat file:/tmp/Json_Format_File.json
geouser@geouser:~$ hadoop fs -cat file:/tmp/Json_Format_File.json
{"User__Name":{"string":"alaister briito"},"Product_Id":{"int":50024101},"Order_Id":{"string":"mo862041"},"Delivery_Date":{"string":"26-09-2016"}}
{"User__Name":{"string":"anifa mohammed"},"Product_Id":{"int":50024102},"Order_Id":{"string":"la862041"},"Delivery_Date":{"string":"14-09-2016"}}
{"User__Name":{"string":"piyush manish"},"Product_Id":{"int":50024103},"Order_Id":{"string":"mo862032"},"Delivery_Date":{"string":"16-09-2016"}}
{"User__Name":{"string":"vijay karthik"},"Product_Id":{"int":50024104},"Order_Id":{"string":"wa862098"},"Delivery_Date":{"string":"29-09-2016"}}
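In the output above every value is wrapped in an object such as {"string": ...}. This is how Avro's JSON encoding marks which branch of a union like ["string","null"] was used; a null value appears as a bare null. If a downstream tool expects plain values, the wrappers can be stripped. A minimal Python sketch, assuming the flat schema of this example: unwrap_union and flatten_record are hypothetical helper names, and the one-key-object test would misfire on schemas with nested records or maps.

```python
import json


def unwrap_union(value):
    """Strip the {"type": value} wrapper that Avro's JSON encoding puts on union values."""
    if isinstance(value, dict) and len(value) == 1:
        return next(iter(value.values()))
    return value  # null branches and non-union values are already plain


def flatten_record(line):
    """Parse one line of tojson output and return a dict of plain values."""
    return {name: unwrap_union(value) for name, value in json.loads(line).items()}


# First record from the output above.
line = ('{"User__Name":{"string":"alaister briito"},"Product_Id":{"int":50024101},'
        '"Order_Id":{"string":"mo862041"},"Delivery_Date":{"string":"26-09-2016"}}')
record = flatten_record(line)
print(record["User__Name"], record["Product_Id"])  # alaister briito 50024101
```

Note that the flattened records can no longer be fed back to fromjson as-is, since that step expects the wrapped form.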
AVRO to AVSC Conversion:
This is used to get the schema from the Avro file.
Step-1: Extract the schema from the Avro file into an AVSC file
java -jar /usr/lib/avro/avro-tools-1.7.4.jar getschema /tmp/Avro_Format_File.avro > /tmp/Avsc_Format_File.avsc
Step-2: View the AVSC file that was extracted from the Avro file
hadoop fs -cat file:/tmp/Avsc_Format_File.avsc
geouser@geouser:~$ hadoop fs -cat file:/tmp/Avsc_Format_File.avsc
{
  "type" : "record",
  "name" : "Avro_File",
  "doc" : "Sqoop import of Avro_File",
  "fields" : [ {
    "name" : "User__Name",
    "type" : [ "string", "null" ],
    "columnName" : "User__Name",
    "sqlType" : "12"
  }, {
    "name" : "Product_Id",
    "type" : [ "int", "null" ],
    "columnName" : "Product_Id",
    "sqlType" : "4"
  }, {
    "name" : "Order_Id",
    "type" : [ "string", "null" ],
    "columnName" : "Order_Id",
    "sqlType" : "12"
  }, {
    "name" : "Delivery_Date",
    "type" : [ "string", "null" ],
    "columnName" : "Delivery_Date",
    "sqlType" : "12"
  } ],
  "tableName" : "Avro_File"
}
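Because an AVSC file is plain JSON, it can be inspected with any JSON parser. The Python sketch below lists each field with its non-null type and whether it is nullable. The schema is inlined and trimmed to two fields so the example is self-contained; in practice it would be read from /tmp/Avsc_Format_File.avsc. describe_fields is a made-up helper name.

```python
import json

# Trimmed copy of the schema shown above (two of the four fields).
AVSC_TEXT = """
{
  "type" : "record",
  "name" : "Avro_File",
  "fields" : [ {
    "name" : "User__Name",
    "type" : [ "string", "null" ],
    "sqlType" : "12"
  }, {
    "name" : "Product_Id",
    "type" : [ "int", "null" ],
    "sqlType" : "4"
  } ]
}
"""


def describe_fields(schema):
    """Return (name, type, nullable) tuples for the fields of a record schema."""
    rows = []
    for field in schema["fields"]:
        ftype = field["type"]
        # A union is a JSON list of branches; a plain type is treated as one branch.
        branches = ftype if isinstance(ftype, list) else [ftype]
        non_null = [b for b in branches if b != "null"]
        rows.append((field["name"], "|".join(map(str, non_null)), "null" in branches))
    return rows


for name, ftype, nullable in describe_fields(json.loads(AVSC_TEXT)):
    print(name, ftype, "nullable" if nullable else "required")
```

Attributes such as columnName, sqlType and tableName are extra properties written by Sqoop; they are not needed to decode the data.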
JSON to AVRO Conversion:
This works when we have both the schema file (AVSC) and the JSON file.
Step-1: Convert the JSON file into an Avro file
java -jar <path to the jar file> <keyword> <keyword for schema> <path to the schema file> <path to the json file> > <path where the avro file will be stored>
java -jar /usr/lib/avro/avro-tools-1.7.4.jar fromjson --schema-file /tmp/Avsc_Format_File.avsc /tmp/Json_Format_File.json > /tmp/Avro_Format_Converted_File.avro
Step-2: View the Avro file that was converted from the JSON file
hadoop fs -cat file:/tmp/Avro_Format_Converted_File.avro
geouser@geouser:~$ hadoop fs -cat file:/tmp/Avro_Format_Converted_File.avro
Objavro.schema{"type":"record","name":"Avro_File","doc":"Sqoop import of Avro_File","fields":[{"name":"User__Name","type":["string","null"],"columnName":"User__Name","sqlType":"12"},{"name":"Product_Id","type":["int","null"],"columnName":"Product_Id","sqlType":"4"},{"name":"Order_Id","type":["string","null"],"columnName":"Order_Id","sqlType":"12"},{"name":"Delivery_Date","type":["string","null"],"columnName":"Delivery_Date","sqlType":"12"}],"tableName":"Avro_File"}avro.codenull1�얽w(�t*pV��alaister briitoʺ�/mo86204126-09-2016anifa mohammed̺�/la86204114-09-2016piyush manishκ�/mo86203216-09-2016vijay karthikк�/wa86209
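The fromjson step expects each input line to use the same union-wrapped JSON encoding that tojson produces, so hand-written JSON with plain values is typically rejected. A rough pre-flight check in Python, under the assumption that every field is a union of primitive types as in this schema: check_record is a hypothetical helper, and it only covers a handful of primitive types.

```python
import json

# Python-side checks for a few Avro primitive types (not exhaustive).
PRIMITIVE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def check_record(record, schema):
    """Return a list of problems found in one JSON-encoded record."""
    problems = []
    for field in schema["fields"]:
        name, branches = field["name"], field["type"]  # assumes a union (list)
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if value is None:
            if "null" not in branches:
                problems.append(f"{name}: null not allowed")
            continue
        if not (isinstance(value, dict) and len(value) == 1):
            problems.append(f"{name}: value is not wrapped in a union branch")
            continue
        branch, inner = next(iter(value.items()))
        check = PRIMITIVE_CHECKS.get(branch)
        if branch not in branches:
            problems.append(f"{name}: branch {branch!r} not in schema")
        elif check is not None and not check(inner):
            problems.append(f"{name}: {inner!r} is not a valid {branch}")
    return problems


# Two-field excerpt of the schema, plus one good and one bad record.
schema = {"fields": [{"name": "Order_Id", "type": ["string", "null"]},
                     {"name": "Product_Id", "type": ["int", "null"]}]}
good = json.loads('{"Order_Id":{"string":"mo862041"},"Product_Id":{"int":50024101}}')
bad = json.loads('{"Order_Id":"mo862041","Product_Id":{"int":"50024101"}}')
print(check_record(good, schema))  # []
print(check_record(bad, schema))
```

Running a check like this over /tmp/Json_Format_File.json line by line gives clearer messages than a failed conversion, but it is no substitute for the validation that avro-tools itself performs.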