
Hive User Defined Functions (UDF) Java Example

Hive ships with a number of built-in functions that can be used in Hive queries without writing any extra code. Sometimes, however, a requirement is not covered by the built-in functions; in that case you can write your own custom function, called a UDF (User Defined Function).
There are three types of UDFs
1) Regular UDFs
2) User Defined Aggregate Functions (UDAFs)
3) User Defined Table Generating Functions (UDTFs)
Here are the simple steps for writing a Hive UDF example in Java.
Step 1 - Add these jar files to your Java project's build path.
hive-exec*.jar
$HIVE_HOME/lib/*.jar
$HADOOP_HOME/share/hadoop/mapreduce/*.jar
$HADOOP_HOME/share/hadoop/common/*.jar
AutoIncrementUDF.java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// stateful = true tells Hive that this UDF keeps state across calls,
// so evaluate() is invoked once per row and the counter advances.
@UDFType(stateful = true)
public class AutoIncrementUDF extends UDF {
 private int ctr;

 // Returns the next counter value, one per input row.
 // Note: the counter is local to each task JVM, so with multiple
 // mappers each task starts its own sequence.
 public int evaluate() {
  ctr++;
  return ctr;
 }
}
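The counter logic in evaluate() can be sanity-checked outside Hive with a plain Java sketch. The stand-alone class below (the name CounterCheck is made up for illustration) mirrors the UDF body without the Hive dependency:

```java
// Stand-alone sketch of AutoIncrementUDF's counter logic (no Hive dependency).
// Each call to evaluate() bumps and returns the counter, just like the UDF.
public class CounterCheck {
    private int ctr;

    public int evaluate() {
        ctr++;
        return ctr;
    }

    public static void main(String[] args) {
        CounterCheck udf = new CounterCheck();
        // Successive calls print 1, 2, 3 — one value per simulated row.
        System.out.println(udf.evaluate());
        System.out.println(udf.evaluate());
        System.out.println(udf.evaluate());
    }
}
```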
Step 2 - Compile your Java project and package the compiled class into a jar file.
Step 3 - You can add the jar file to Hive in one of the following three ways.
1) Using Hive Shell
Step 4 - Change the directory to /usr/local/hive/bin
$ cd $HIVE_HOME/bin
Step 5 - Enter the hive shell
$ hive
hive> ADD JAR /home/hduser/Desktop/HIVE/AutoIncrementUDF.jar;
OR
2) hive-site.xml
hive-site.xml
<property>
    <name>hive.aux.jars.path</name>
    <value>file:///home/hduser/Desktop/HIVE/AutoIncrementUDF.jar</value>
</property>
OR
3) hive-env.sh
hive-env.sh
export HIVE_AUX_JARS_PATH="/home/hduser/Desktop/HIVE/AutoIncrementUDF.jar"
Step 6 - Create a temporary function
hive> CREATE TEMPORARY FUNCTION incr AS 'AutoIncrementUDF';  
OR
Step 6 - Create a permanent function (Hive 0.13 and later; note that CREATE FUNCTION without the TEMPORARY keyword creates a permanent function)
hive> CREATE FUNCTION incr AS 'AutoIncrementUDF';
Step 7 - Create a data.csv file
data.csv
Step 8 - Add the following lines to the data.csv file, then save and close it.
row1,c1,c2
row2,c1,c2
row3,c1,c2
row4,c1,c2
row5,c1,c2
row6,c1,c2
row7,c1,c2
row8,c1,c2
row9,c1,c2
row10,c1,c2
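The ten lines above can also be generated from the shell (data.csv is written to the current directory here; adjust the path to taste):

```shell
# Write rows row1 .. row10 to data.csv, one "rowN,c1,c2" line each.
for i in $(seq 1 10); do
  echo "row$i,c1,c2"
done > data.csv

cat data.csv   # verify the contents
```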
Step 9 - Create a table t1, load data.csv data into the table and verify.
hive> CREATE TABLE t1 (id STRING, c1 STRING, c2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/data.csv' OVERWRITE INTO TABLE t1;
hive> SELECT * FROM t1;
Step 10 - Create a table increment_table1, execute the UDF and verify. (Because the counter lives inside the UDF instance, the sequence is only gap-free when the query runs in a single task; with multiple mappers each task starts its own count.)
hive> CREATE TABLE increment_table1 (id INT, c1 STRING, c2 STRING, c3 STRING);
hive> INSERT OVERWRITE TABLE increment_table1 SELECT incr() AS inc, id, c1, c2 FROM t1;
hive> SELECT * FROM increment_table1;
Please share this blog post and follow me for the latest updates.
