Hive File Formats:

A file format is the way in which information is stored or encoded in a computer file. In Hive, it refers to how records are stored inside a file. Because we are dealing with structured data, each record has its own structure, and the way records are encoded in a file defines the file format. File formats differ mainly in data encoding, compression ratio, space usage, and disk I/O.

The most commonly used file formats are text file, sequence file, RC (Record Columnar) file, and ORC (Optimized Row Columnar) file.

TextFile Format:

TEXTFILE is a common input/output format used in Hadoop. In Hive, a table defined as TEXTFILE can load delimited data such as CSV (Comma Separated Values), tab-delimited, or space-delimited files, as well as JSON (JavaScript Object Notation) data. That is, the fields in each record should be separated by a comma, space, or tab, or each record may be a JSON document (reading JSON requires a JSON SerDe rather than a field delimiter).
By default, with the TEXTFILE format each line is treated as one record.
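As a rough sketch of the delimited and JSON cases described above, the two definitions below use hypothetical table names (tsv_file, json_file) and columns. The JSON example assumes the hive-hcatalog-core jar, which provides the named SerDe class, is available to Hive; other JSON SerDes exist and your installation may differ.

```sql
-- Tab-delimited text table (hypothetical name tsv_file).
create table tsv_file(id int, name string, age int)
row format delimited fields terminated by '\t'
stored as textfile;

-- JSON text table (hypothetical name json_file): one JSON object per line.
-- Assumes the hive-hcatalog-core jar that provides this SerDe is on the classpath.
create table json_file(id int, name string, age int)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile;
```

In both cases the data stays a plain text file on HDFS; only the way each line is parsed into columns changes.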

Create a table stored as a text file by specifying STORED AS TEXTFILE at the end of a CREATE TABLE statement.

(e.g.) create table text_file(id int, name string, age int, department string, location string) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile;

# Load a text file into the text_file table

load data local inpath '/home/hduser/txt_file' into table text_file;
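One way to check the load from the Hive prompt, sketched here against the text_file table defined above (the row limit of 5 is arbitrary):

```sql
-- Peek at a few rows to confirm the fields were split on commas.
select * from text_file limit 5;

-- Show table metadata; the InputFormat and OutputFormat entries
-- reflect the storage format chosen in CREATE TABLE.
describe formatted text_file;
```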

# To view the loaded file, go to the browser and open the table directory

SequenceFile Format:

Sequence files are flat files consisting of binary key-value pairs. When Hive converts queries to MapReduce jobs, it decides on the appropriate key-value pairs to use for a given record. Sequence files are a splittable binary format, and their main use is to combine two or more smaller files into a single sequence file.

Create a table stored as a sequence file by specifying STORED AS SEQUENCEFILE at the end of a CREATE TABLE statement.

(e.g.) create table sequence_file(id int, name string, age int, department string, location string) row format delimited fields terminated by ',' lines terminated by '\n' stored as sequencefile;

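# Insert values into the sequence_file table from the text_file table (LOAD DATA only copies or moves files and does not convert a text file into a sequence file)

insert overwrite table sequence_file select * from text_file;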
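Sequence files can also be written compressed. The following is a minimal sketch, assuming the sequence_file and text_file tables defined above. The two property names are the ones given in Hive's compressed-storage documentation and may be superseded in newer Hadoop versions, so treat them as an assumption to verify on your cluster.

```sql
-- Ask Hive to compress the output of the following statements.
set hive.exec.compress.output=true;
-- Compress blocks of records together rather than each record separately.
set io.seqfile.compression.type=BLOCK;

-- Rewrite the table contents so the new settings take effect.
insert overwrite table sequence_file select * from text_file;
```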

# To view the loaded file, go to the browser and open the table directory

RCFile Format:

RCFILE stands for Record Columnar File, another binary file format, which offers a high compression rate. RCFILE is used when we want to perform operations on multiple rows at a time. RCFILEs are flat files consisting of binary key/value pairs and share much similarity with SEQUENCEFILE. RCFILE splits a table's rows into row groups and stores the columns of each group in a columnar manner.

Create a table stored as an RC file by specifying STORED AS RCFILE at the end of a CREATE TABLE statement.

(e.g.) create table rc_file(id int, name string, age int, department string, location string) row format delimited fields terminated by ',' lines terminated by '\n' stored as rcfile;

# Insert values into the rc_file table from the text_file table

insert overwrite table rc_file select * from text_file;
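Because a columnar layout keeps each column's values together, a query that touches only a few columns is the kind of workload it is meant for. A small illustrative query, assuming the rc_file table above has been populated (the alias employees is just for this example):

```sql
-- Only the department column is referenced, so the other
-- columns (id, name, age, location) are not needed to answer the query.
select department, count(*) as employees
from rc_file
group by department;
```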

# To view the loaded file, go to the browser and open the table directory

ORC File Format:

ORC stands for Optimized Row Columnar, which means it can store data in a more optimized way than the other file formats. ORC reduces the size of the original data by up to 75%. As a result, the speed of data processing also increases. ORC shows better performance than the Text, Sequence, and RC file formats.

Create a table stored as an ORC file by specifying STORED AS ORC at the end of a CREATE TABLE statement.

(e.g.) create table orc_file(id int, name string, age int, department string, location string) row format delimited fields terminated by ',' lines terminated by '\n' stored as orc;

# Insert values into the orc_file table from the text_file table

insert overwrite table orc_file select * from text_file;
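The compression codec used by an ORC table can be chosen per table through the orc.compress table property. A sketch under stated assumptions: orc_snappy is a hypothetical table name, the columns mirror the tables above, and SNAPPY is picked only as an example value (NONE and ZLIB are the other commonly documented choices).

```sql
-- Hypothetical ORC table that uses Snappy instead of the default codec.
create table orc_snappy(id int, name string, age int, department string, location string)
stored as orc
tblproperties ("orc.compress"="SNAPPY");

-- Populate it from the text table, as with orc_file.
insert overwrite table orc_snappy select * from text_file;
```

Comparing the size of the two table directories is a simple way to see how the codec choice affects storage for your own data.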

# To view the loaded file, go to the browser and open the table directory
