What is the Avro File Format:

        Avro stores both the data definition (the schema) and the data together in one message or file, making it easy for programs to dynamically understand the information stored in an Avro file or message.

        Avro stores the data definition in JSON format, making it easy to read and interpret; the data itself is stored in binary format, making it compact and efficient. Avro files include markers that can be used to split large data sets into subsets suitable for MapReduce processing.
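To make this layout concrete, here is a minimal Python sketch that reads the header of an Avro container file: the magic bytes, the metadata map that carries the JSON schema under the key avro.schema, and the 16-byte sync marker that later separates data blocks and makes the file splittable. The header is built by hand for illustration, and the helper names (write_long, read_long, read_bytes, read_header) are hypothetical; a real program would normally rely on an Avro library instead.

```python
import io
import json


def write_long(n: int) -> bytes:
    """Encode an integer as an Avro long (zigzag, then base-128 varint)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(stream) -> int:
    """Decode one Avro long from a binary stream."""
    z = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of data inside a varint")
        z |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def read_bytes(stream) -> bytes:
    """Read a length-prefixed byte string (used for map keys and values)."""
    return stream.read(read_long(stream))


def read_header(stream):
    """Return (metadata dict, sync marker) from an Avro container header."""
    if stream.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the metadata map
            break
        if count < 0:  # negative count: a byte size follows, unused here
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta, stream.read(16)


# Hand-built header with a one-field schema, for illustration only.
schema = {"type": "record", "name": "Avro_File",
          "fields": [{"name": "Order_Id", "type": ["string", "null"]}]}
schema_bytes = json.dumps(schema).encode("utf-8")
sync_marker = bytes(range(16))
header = (b"Obj\x01"
          + write_long(1)  # one key/value pair in the metadata map
          + write_long(len(b"avro.schema")) + b"avro.schema"
          + write_long(len(schema_bytes)) + schema_bytes
          + write_long(0)  # end of the metadata map
          + sync_marker)

meta, sync = read_header(io.BytesIO(header))
parsed_schema = json.loads(meta["avro.schema"])
print(parsed_schema["name"], sync == sync_marker)
```

The same read_header function should also work on a real file opened in binary mode, since the header layout is identical.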

What is the JSON File Format:

        JSON (JavaScript Object Notation) is a minimal, readable format for structuring data. It is used primarily to transmit data between a server and a web application, as an alternative to XML. Squarespace, for example, uses JSON to store and organize site content created with its CMS.
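As a small illustration, the sketch below serializes a record to JSON text and parses it back using Python's standard json module. The field names are borrowed from the example file used later in this post.

```python
import json

# A record as a plain Python dictionary.
record = {"User__Name": "alaister briito", "Product_Id": 50024101}

# Serialize to JSON text, then parse the text back into a dictionary.
text = json.dumps(record)
restored = json.loads(text)

print(text)
print(restored == record)
```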


AVSC (.avsc) is the Avro schema file: it holds only the data definition, in JSON format.


AVRO to JSON Conversion:

Step-1: Download avro-tools-1.7.4.jar

Step-2: Example Avro File

Objavro.schema\8E{"type":"record","name":"Avro_File","doc":"Sqoop import of Avro_File","fields":[{"name":"User__Name","type":["string","null"],"columnName":"User__Name","sqlType":"12"},{"name":"Product_Id","type":["int","null"],"columnName":"Product_Id","sqlType":"4"},{"name":"Order_Id","type":["string","null"],"columnName":"Order_Id","sqlType":"12"},{"name":"Delivery_Date","type":["string","null"],"columnName":"Delivery_Date","sqlType":"12"}],"tableName":"Avro_File"}\00\D0!\B2\00e\B1\AC\CBI6\80aɳ\D6\00alaister briito\00ʺ\DA/\00mo862041\0026-09-2016\00anifa mohammed\00̺\DA/\00la862041\0014-09-2016\00piyush manish\00κ\DA/\00mo862032\0016-09-2016\00vijay karthik\00к\DA/\00wa862098\0029-09-2016\D0!\B2\00e\B1\AC\CBI6\80aɳ
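The non-text bytes in this dump are Avro's binary encoding of the record values. As a worked example, the bytes shown as ʺ\DA/ just before mo862041 are the Product_Id of the first record, stored as a zigzag-encoded variable-length integer. The character ʺ is U+02BA, which is the two bytes CA BA in UTF-8, so the full sequence is CA BA DA 2F. The sketch below decodes it; decode_avro_int is a hypothetical helper name, not part of avro-tools.

```python
def decode_avro_int(data: bytes) -> int:
    """Decode one Avro int/long: base-128 varint first, then zigzag."""
    z = 0
    for i, b in enumerate(data):
        z |= (b & 0x7F) << (7 * i)  # low 7 bits carry the payload
        if not b & 0x80:            # high bit clear marks the last byte
            break
    return (z >> 1) ^ -(z & 1)      # undo the zigzag mapping


# Rebuild the four bytes exactly as they appear in the dump.
raw = "ʺ".encode("utf-8") + b"\xda/"
product_id = decode_avro_int(raw)

print(raw.hex(), product_id)  # cabada2f 50024101
```

The decoded value, 50024101, is the Product_Id that the JSON conversion in Step-4 reports for the first record.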

Step-3: Convert the AVRO file into JSON

java -jar <jar file with location> tojson <avro file with location> > <json file with location where the output will be stored>

java -jar /usr/lib/avro/avro-tools-1.7.4.jar tojson /tmp/Avro_Format_File.avro > /tmp/Json_Format_File.json
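If the conversion has to run from a script rather than by hand, the same command can be wrapped in Python. This is only a sketch: it assumes java is on the PATH and that the jar sits at the path used in this post, and the function names tojson_command and avro_to_json are hypothetical.

```python
import subprocess
from typing import List

# Jar location used in this post; adjust to your installation.
AVRO_TOOLS_JAR = "/usr/lib/avro/avro-tools-1.7.4.jar"


def tojson_command(avro_file: str, jar: str = AVRO_TOOLS_JAR) -> List[str]:
    """Build the argument list for 'avro-tools tojson'."""
    return ["java", "-jar", jar, "tojson", avro_file]


def avro_to_json(avro_file: str, json_file: str, jar: str = AVRO_TOOLS_JAR) -> None:
    """Run tojson and write its standard output to json_file."""
    with open(json_file, "wb") as out:
        # check=True raises CalledProcessError if java exits with an error.
        subprocess.run(tojson_command(avro_file, jar), stdout=out, check=True)


# Show the command that would be executed for the example file.
print(" ".join(tojson_command("/tmp/Avro_Format_File.avro")))

# To actually convert (requires java and the jar):
# avro_to_json("/tmp/Avro_Format_File.avro", "/tmp/Json_Format_File.json")
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.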

Step-4: View the JSON file converted from the Avro file

hadoop fs -cat file:/tmp/Json_Format_File.json

geouser@geouser:~$ hadoop fs -cat file:/tmp/Json_Format_File.json
{"User__Name":{"string":"alaister briito"},"Product_Id":{"int":50024101},"Order_Id":{"string":"mo862041"},"Delivery_Date":{"string":"26-09-2016"}}
{"User__Name":{"string":"anifa mohammed"},"Product_Id":{"int":50024102},"Order_Id":{"string":"la862041"},"Delivery_Date":{"string":"14-09-2016"}}
{"User__Name":{"string":"piyush manish"},"Product_Id":{"int":50024103},"Order_Id":{"string":"mo862032"},"Delivery_Date":{"string":"16-09-2016"}}
{"User__Name":{"string":"vijay karthik"},"Product_Id":{"int":50024104},"Order_Id":{"string":"wa862098"},"Delivery_Date":{"string":"29-09-2016"}}
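Each value in this output is wrapped in an object naming its type (for example {"string": ...}) because every field in the schema is a union such as ["string","null"], and Avro's JSON encoding tags non-null union values with the branch that was used. If plain JSON is needed downstream, the wrappers can be stripped. The sketch below assumes, as in this file, that every field is a union of a primitive type and null; unwrap and flatten_record are hypothetical helper names.

```python
import json


def unwrap(value):
    """Strip the {"type": value} wrapper that marks a union branch.

    Assumes fields are unions of primitives, so any single-key object
    is a wrapper; a null value is stored as plain JSON null.
    """
    if isinstance(value, dict) and len(value) == 1:
        return next(iter(value.values()))
    return value


def flatten_record(line: str) -> dict:
    """Parse one line of tojson output into a plain dictionary."""
    record = json.loads(line)
    return {name: unwrap(value) for name, value in record.items()}


# First line of the output above.
line = ('{"User__Name":{"string":"alaister briito"},"Product_Id":{"int":50024101},'
        '"Order_Id":{"string":"mo862041"},"Delivery_Date":{"string":"26-09-2016"}}')
flat = flatten_record(line)
print(flat)
```

For a whole file, flatten_record can be applied to each line in turn, since tojson writes one record per line.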

AVRO to AVSC Conversion:

This is used to get the schema from the Avro file.

Step-1: Extract the schema from the Avro file into an AVSC file

java -jar /usr/lib/avro/avro-tools-1.7.4.jar getschema /tmp/Avro_Format_File.avro > /tmp/Avsc_Format_File.avsc

Step-2: View the AVSC file extracted from the Avro file

hadoop fs -cat file:/tmp/Avsc_Format_File.avsc

geouser@geouser:~$ hadoop fs -cat file:/tmp/Avsc_Format_File.avsc
{
  "type" : "record",
  "name" : "Avro_File",
  "doc" : "Sqoop import of Avro_File",
  "fields" : [ {
    "name" : "User__Name",
    "type" : [ "string", "null" ],
    "columnName" : "User__Name",
    "sqlType" : "12"
  }, {
    "name" : "Product_Id",
    "type" : [ "int", "null" ],
    "columnName" : "Product_Id",
    "sqlType" : "4"
  }, {
    "name" : "Order_Id",
    "type" : [ "string", "null" ],
    "columnName" : "Order_Id",
    "sqlType" : "12"
  }, {
    "name" : "Delivery_Date",
    "type" : [ "string", "null" ],
    "columnName" : "Delivery_Date",
    "sqlType" : "12"
  } ],
  "tableName" : "Avro_File"
}
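Because the AVSC file is ordinary JSON, it can be inspected with any JSON parser. The sketch below lists each field with its Avro type, whether it is nullable, and a readable name for the sqlType code. It uses a shortened copy of the schema above (two of the four fields) and assumes the sqlType values are java.sql.Types codes, where 12 is VARCHAR and 4 is INTEGER; describe_fields is a hypothetical helper name.

```python
import json

# Shortened copy of the schema above (two of the four fields).
schema_text = """
{
  "type" : "record",
  "name" : "Avro_File",
  "fields" : [ {
    "name" : "User__Name",
    "type" : [ "string", "null" ],
    "columnName" : "User__Name",
    "sqlType" : "12"
  }, {
    "name" : "Product_Id",
    "type" : [ "int", "null" ],
    "columnName" : "Product_Id",
    "sqlType" : "4"
  } ]
}
"""

# java.sql.Types codes that appear in this schema (assumed meaning).
SQL_TYPE_NAMES = {"12": "VARCHAR", "4": "INTEGER"}


def describe_fields(schema: dict) -> list:
    """Summarize each field: name, non-null Avro types, nullability, SQL type."""
    rows = []
    for field in schema["fields"]:
        ftype = field["type"]
        # A union is a JSON list; a plain type is a single string.
        branches = ftype if isinstance(ftype, list) else [ftype]
        rows.append({
            "name": field["name"],
            "avro_types": [t for t in branches if t != "null"],
            "nullable": "null" in branches,
            "sql_type": SQL_TYPE_NAMES.get(field.get("sqlType"), "unknown"),
        })
    return rows


fields = describe_fields(json.loads(schema_text))
for row in fields:
    print(row)
```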

JSON to AVRO Conversion:

This works when we have both the schema file (AVSC) and the JSON file.

Step-1: Convert the JSON file into an Avro file

java -jar <jar file with location> fromjson --schema-file <schema file with location> <json file with location> > <avro file with location where the output will be stored>

java -jar /usr/lib/avro/avro-tools-1.7.4.jar fromjson --schema-file /tmp/Avsc_Format_File.avsc /tmp/Json_Format_File.json > /tmp/Avro_Format_Converted_File.avro
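Note that fromjson expects the input in Avro's JSON encoding, where every non-null union value is wrapped in an object naming its type, exactly as in the tojson output shown earlier. JSON from another source usually lacks these wrappers. The sketch below adds them using the schema; it assumes each union has a single non-null primitive branch, as in this schema, and wrap_record is a hypothetical helper name.

```python
import json


def wrap_record(record: dict, schema: dict) -> dict:
    """Convert a plain record into Avro's JSON encoding for the given schema."""
    wrapped = {}
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        value = record.get(name)
        if isinstance(ftype, list):  # union type, e.g. ["string", "null"]
            if value is None:
                wrapped[name] = None  # the null branch stays plain null
            else:
                # Assumption: exactly one non-null branch per union.
                branch = next(t for t in ftype if t != "null")
                wrapped[name] = {branch: value}
        else:
            wrapped[name] = value  # non-union fields are written as-is
    return wrapped


# Reduced schema and a plain record, for illustration only.
schema = {"type": "record", "name": "Avro_File", "fields": [
    {"name": "User__Name", "type": ["string", "null"]},
    {"name": "Product_Id", "type": ["int", "null"]},
    {"name": "Order_Id", "type": ["string", "null"]},
]}
plain = {"User__Name": "vijay karthik", "Product_Id": 50024104, "Order_Id": None}

wrapped = wrap_record(plain, schema)
print(json.dumps(wrapped))
```

Writing one such line per record produces a file in the same shape as Json_Format_File.json.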

Step-2: View the Avro file converted from the JSON file

hadoop fs -cat file:/tmp/Avro_Format_Converted_File.avro

geouser@geouser:~$ hadoop fs -cat file:/tmp/Avro_Format_Converted_File.avro
Objavro.schema{"type":"record","name":"Avro_File","doc":"Sqoop import of Avro_File","fields":[{"name":"User__Name","type":["string","null"],"columnName":"User__Name","sqlType":"12"},{"name":"Product_Id","type":["int","null"],"columnName":"Product_Id","sqlType":"4"},{"name":"Order_Id","type":["string","null"],"columnName":"Order_Id","sqlType":"12"},{"name":"Delivery_Date","type":["string","null"],"columnName":"Delivery_Date","sqlType":"12"}],"tableName":"Avro_File"}avro.codenull1�얽w(�t*pV��alaister briitoʺ�/mo86204126-09-2016anifa mohammed̺�/la86204114-09-2016piyush manishκ�/mo86203216-09-2016vijay karthikк�/wa86209
