Hive :

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.

Hive UDF (user-defined functions) :

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment.

In addition to regular user-defined functions (UDFs), Apache Hive also defines user-defined aggregate functions (UDAFs) and user-defined table-generating functions (UDTFs). Hive enables developers to write their own custom functions in Java.

Step 1 :

Start Eclipse and create a Java project.

Step 2 :

Write the Java program.


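package udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hive UDF that converts a string value to lower case.
public class lower extends UDF {

    // Hive calls evaluate() once per row; a NULL column value arrives as null.
    public Text evaluate(Text str) {
        if (str == null) return null;
        return new Text(str.toString().toLowerCase());
    }
}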
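
Before packaging the UDF, it can help to check the conversion logic on its own. The following is a minimal sketch, not part of the original program: the class name LowerLogic and the method toLower are hypothetical, and java.lang.String stands in for org.apache.hadoop.io.Text so that the code compiles without the Hadoop and Hive JARs.

```java
// Hypothetical stand-alone helper: mirrors the body of lower.evaluate(),
// using String instead of Text so no Hadoop or Hive JARs are needed.
final class LowerLogic {

    // Returns null for null input (the same contract as the UDF),
    // otherwise the lower-cased string.
    static String toLower(String str) {
        if (str == null) return null;
        return str.toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(toLower("HeLLo")); // prints hello
        System.out.println(toLower(null));    // prints null
    }
}
```

The null check matters because the UDF receives a null argument whenever the column value is NULL; without it, str.toString() would throw a NullPointerException.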

Step 3 :

Add the external JAR files (Hadoop and Hive JARs) to the project.

Right-click the project –> Build Path –> Configure Build Path –>
Libraries –> Add External JARs –> select the JAR files from the Hadoop and Hive lib folders –> click OK.

Step 4 :

Export the project as a JAR file.

Right-click the project –> Export –> Java –> JAR file –> choose the export destination –> click Finish.

Step 5 :

Open the Hive terminal and add the JAR.

Syntax: add jar filename.jar;

(e.g) add jar lower.jar;

Step 6 :

Create a temporary function in the Hive terminal.

Syntax: create temporary function functionname as 'packagename.classname';

(e.g) create temporary function lower as 'udf.lower';

Step 7 :

Create a Hive table and load data into it.

create table sample(no int, name string, city string) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile;

load data local inpath '/home/geouser/sample2.txt' into table sample;

select * from sample;
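
To make the row format concrete, here is a rough Java sketch of how a text file with fields terminated by ',' and lines terminated by '\n' maps onto the three columns of the table. The class SampleRows, its Row type, and the two data rows are hypothetical illustrations; they are not the contents of sample2.txt and not how Hive is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how a delimited text file maps to the
// columns of table sample (no int, name string, city string).
final class SampleRows {

    // One parsed row; field names follow the table definition.
    static final class Row {
        final int no;
        final String name;
        final String city;

        Row(int no, String name, String city) {
            this.no = no;
            this.name = name;
            this.city = city;
        }
    }

    // Splits the content on '\n' (line terminator) and each line
    // on ',' (field terminator), skipping empty lines.
    static List<Row> parse(String content) {
        List<Row> rows = new ArrayList<>();
        for (String line : content.split("\n")) {
            if (line.isEmpty()) continue;
            String[] f = line.split(",", -1);
            rows.add(new Row(Integer.parseInt(f[0]), f[1], f[2]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data in the expected layout.
        String content = "1,RAVI,CHENNAI\n2,ANU,MADURAI\n";
        for (Row r : parse(content)) {
            System.out.println(r.no + "\t" + r.name + "\t" + r.city);
        }
    }
}
```

One simplification to keep in mind: this sketch throws an exception when the first field is not a valid integer, whereas Hive reads such a value as NULL.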

Step 8 :

Now call the temporary function lower in a select statement.

Syntax: select functionname(columnname) from tablename;

(e.g) select lower(name) from sample;

# lower – temporary function name (here the same as the Java class name).

# name – column name of table sample.

# sample – table name.
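
Conceptually, the query applies the function once to the name value of every row and returns one result per row. The sketch below models that behaviour in plain Java under stated assumptions: the class SelectLowerSketch and its methods are hypothetical, a List<String> stands in for the name column, and the values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "select lower(name) from sample":
// one function call, and one output value, per input row.
final class SelectLowerSketch {

    // Same logic as the UDF's evaluate(), on String instead of Text.
    static String lower(String s) {
        return s == null ? null : s.toLowerCase();
    }

    // Applies lower() to every value of the name column, in row order.
    static List<String> selectLower(List<String> nameColumn) {
        List<String> result = new ArrayList<>();
        for (String name : nameColumn) {
            result.add(lower(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up column values; the null models a NULL in the table.
        List<String> names = Arrays.asList("RAVI", null, "Anu");
        for (String value : selectLower(names)) {
            System.out.println(value);
        }
    }
}
```

The output has exactly as many rows as the input, and a NULL name stays NULL, which is the behaviour expected from a regular (row-level) UDF as opposed to an aggregate function.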