Hive User Defined Functions (UDF) Java Example
Hive ships with a number of built-in functions that can be used in queries without writing any extra code. Sometimes, however, a requirement is not covered by the built-in functions. In that case you can write your own custom function, called a UDF (user defined function).
There are three types of UDFs:
1) Regular UDFs
2) User Defined Aggregate Functions (UDAFs)
3) User Defined Table Generating Functions (UDTFs)
Here are the steps to write a simple regular Hive UDF in Java.
Step 1 - Add these jar files to the build path of your Java project.
hive-exec*.jar
$HIVE_HOME/lib/*.jar
$HADOOP_HOME/share/hadoop/mapreduce/*.jar
$HADOOP_HOME/share/hadoop/common/*.jar
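AutoIncrementUDF.java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// stateful = true tells Hive that this function keeps state between calls.
@UDFType(stateful = true)
public class AutoIncrementUDF extends UDF {
    // Counter kept per UDF instance; starts at 0.
    int ctr;

    // Called once per row; returns 1, 2, 3, ... for successive rows.
    public int evaluate() {
        ctr++;
        return ctr;
    }
}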
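The counter in AutoIncrementUDF is an ordinary instance field, so every instance of the class counts on its own. The sketch below shows that behaviour in plain Java with no Hive classes involved. CounterSketch is a hypothetical name used only for this illustration, and the two objects stand in for two separate tasks. This is an assumption about how Hive distributes the work, so treat it as a rough model rather than a description of Hive internals.

```java
// Plain-Java stand-in for AutoIncrementUDF (hypothetical class, no Hive dependency).
class CounterSketch {
    // Instance state, like ctr in AutoIncrementUDF; starts at 0.
    private int ctr;

    // Same logic as the UDF's evaluate(): bump the counter and return it.
    public int evaluate() {
        ctr++;
        return ctr;
    }

    public static void main(String[] args) {
        // Two instances, imagined here as two independent tasks.
        CounterSketch taskA = new CounterSketch();
        CounterSketch taskB = new CounterSketch();

        System.out.println(taskA.evaluate()); // prints 1
        System.out.println(taskA.evaluate()); // prints 2
        System.out.println(taskB.evaluate()); // prints 1, because taskB has its own counter
    }
}
```

If the query is split across several tasks and each task creates its own UDF instance, the generated numbers would be unique only within a task. For a small file such as the one used in this post, that is unlikely to matter.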
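Step 2 - Compile and create a jar file of your Java project. Creating the jar file is left to you.
Step 3 - You can add the jar file to Hive in any one of three ways.
1) Using Hive Shell
Step 4 - Change the directory to the Hive bin directory (for example /usr/local/hive/bin)
$ cd $HIVE_HOME/bin
Step 5 - Enter the Hive shell and add the jar
$ hive
hive> ADD JAR /home/hduser/Desktop/HIVE/AutoIncrementUDF.jar;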
OR
2) hive-site.xml
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///home/hduser/Desktop/HIVE/AutoIncrementUDF.jar</value>
</property>
OR
3) hive-env.sh
export HIVE_AUX_JARS_PATH="/home/hduser/Desktop/HIVE/AutoIncrementUDF.jar"
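Whichever of the three ways you pick, it can save some debugging to confirm first that the jar really contains the compiled class. Here is a minimal sketch using only the standard java.util.jar API. JarCheck and containsClass are hypothetical names made up for this example. The check assumes that AutoIncrementUDF has no package declaration, as in the source above, so its entry sits at the root of the jar.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Hive: checks a jar for a compiled class.
class JarCheck {
    // Returns true when the jar has an entry for the given class name.
    // "a.b.C" is looked up as "a/b/C.class"; "AutoIncrementUDF" as "AutoIncrementUDF.class".
    public static boolean containsClass(String jarPath, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the jar path used in this post; pass another path as the first argument.
        String jarPath = args.length > 0
                ? args[0]
                : "/home/hduser/Desktop/HIVE/AutoIncrementUDF.jar";
        System.out.println(containsClass(jarPath, "AutoIncrementUDF"));
    }
}
```

If this prints false, the class was probably compiled into a package or left out of the jar, and the class name given when creating the function would not resolve.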
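Step 6 - Create a temporary function (it lasts only for the current Hive session)
hive> CREATE TEMPORARY FUNCTION incr AS 'AutoIncrementUDF';
OR
Step 6 - Create a permanent function (Hive 0.13 and later). The statement is plain CREATE FUNCTION; there is no PERMANENT keyword. The jar must also be available in later sessions, so use way 2 or 3 above or add a USING JAR clause.
hive> CREATE FUNCTION incr AS 'AutoIncrementUDF';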
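Step 7 - Create a data.csv file
Step 8 - Add the following lines to the data.csv file. Save and close.
data.csv
row1,c1,c2
row2,c1,c2
row3,c1,c2
row4,c1,c2
row5,c1,c2
row6,c1,c2
row7,c1,c2
row8,c1,c2
row9,c1,c2
row10,c1,c2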
Step 9 - Create a table t1, load the data.csv data into the table and verify.
hive> CREATE TABLE t1 (id STRING, c1 STRING, c2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/home/hduser/Desktop/HIVE/data.csv' OVERWRITE INTO TABLE t1;
hive> SELECT * FROM t1;
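Step 10 - Create a table increment_table1, execute the UDF and verify. The selected values are inserted by position, so incr() fills the id column and the original id, c1 and c2 values fill c1, c2 and c3.
hive> CREATE TABLE increment_table1 (id INT, c1 STRING, c2 STRING, c3 STRING);
hive> INSERT OVERWRITE TABLE increment_table1 SELECT incr() AS inc, id, c1, c2 FROM t1;
hive> SELECT * FROM increment_table1;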
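To get a feel for what the final SELECT should return, the numbering can be mimicked in plain Java. The sketch below is not Hive output. It is only a rough model of the INSERT above and assumes that all ten rows are processed in file order by a single UDF instance. IncrementPreview and numberRows are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical preview of the INSERT ... SELECT incr(), id, c1, c2 step (no Hive involved).
class IncrementPreview {
    // Prepends a running number (1, 2, 3, ...) to each CSV row, like incr() does per row.
    public static List<String> numberRows(List<String> csvRows) {
        List<String> numbered = new ArrayList<>();
        int ctr = 0;
        for (String row : csvRows) {
            ctr++;
            numbered.add(ctr + "," + row);
        }
        return numbered;
    }

    public static void main(String[] args) {
        // Rebuild the ten rows of data.csv: row1,c1,c2 ... row10,c1,c2
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add("row" + i + ",c1,c2");
        }
        // The first line printed is 1,row1,c1,c2
        for (String line : numberRows(rows)) {
            System.out.println(line);
        }
    }
}
```

If the real query result shows numbers that repeat or start over, the rows were most likely handled by more than one UDF instance.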