Hadoop Hive UDF

Hive UDFs (User-Defined Functions)

  • Sometimes the query you want to write can't be expressed easily using the built-in functions that Hive provides.
  • By writing a UDF (User-Defined Function), Hive makes it easy to plug in your own processing code and invoke it from a Hive query.
  • UDFs have to be written in Java, the language that Hive itself is written in.

There are three types of UDFs in Hive:

1. UDFs (regular user-defined functions)
2. UDAFs (user-defined aggregate functions)
3. UDTFs (user-defined table-generating functions)

  • They differ in the number of rows that they accept as input and produce as output.

1) UDF: A UDF operates on a single row and produces a single row as its output. Most functions, such as the mathematical and string functions, are of this type.

2) UDAF’S:-

  • UDAF works on multiple input rows and creates a single output row and aggregate functions which include functions such as count and MAX.
  • A UDTF:-Operates on a single row and produces multiple rows- a table- as output.
  • Table–generating function are less well known than the other two types.
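As promised above, here is a minimal sketch of a UDAF; it is not part of the original article. It finds the maximum of a column of integers using the classic UDAF/UDAFEvaluator API (newer Hive versions favour the GenericUDAF interfaces). The package name com.hadoopbook.hive and the class name Maximum are assumptions that mirror the Strip example later in this article.

package com.hadoopbook.hive;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;

public class Maximum extends UDAF {

  public static class MaximumIntUDAFEvaluator implements UDAFEvaluator {

    private IntWritable result;

    // Reset the evaluator's state before a new aggregation
    public void init() {
      result = null;
    }

    // Called once per input row in the group
    public boolean iterate(IntWritable value) {
      if (value == null) {
        return true;
      }
      if (result == null) {
        result = new IntWritable(value.get());
      } else {
        result.set(Math.max(result.get(), value.get()));
      }
      return true;
    }

    // Partial result computed on one portion of the data (e.g. the map side)
    public IntWritable terminatePartial() {
      return result;
    }

    // Fold a partial result from another portion into this evaluator
    public boolean merge(IntWritable other) {
      return iterate(other);
    }

    // Final result for the whole group
    public IntWritable terminate() {
      return result;
    }
  }
}

The evaluator needs both terminatePartial() and merge() because Hive may compute partial aggregates (for example on the map side) and combine them later; registration and invocation then follow the same ADD JAR / CREATE TEMPORARY FUNCTION steps shown for the regular UDF below.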

Example (using the explode UDTF): Consider a table with a single column, x, which contains arrays of strings.

hive> CREATE TABLE arrays (x ARRAY<STRING>)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
    > COLLECTION ITEMS TERMINATED BY '\002';
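For reference, a LOAD DATA command of the following form populates the table (the file path arrays.txt is hypothetical; the file's rows must use the delimiters declared above):

hive> LOAD DATA LOCAL INPATH 'arrays.txt' OVERWRITE INTO TABLE arrays;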
  • After running the LOAD DATA command, the following query confirms that the data was loaded correctly:

hive> SELECT * FROM arrays;

["a","b"]

["c","d","e"]

  • Next, we can use the explode UDTF to transform this table.
  • This function emits a row for each entry in the array, so in this case the type of the output column y is STRING.
  • The result is that the table is flattened into five rows:

hive> SELECT explode(x) AS y FROM arrays;
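Given the two rows loaded above, the query produces one row per array element:

a
b
c
d
e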
  • SELECT statements using UDTFs have some restrictions, such as not being able to retrieve additional column expressions; the LATERAL VIEW query sketched below is the usual way around this.
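As a hedged illustration (this query is not part of the original example), LATERAL VIEW joins each input row to the rows produced by the UDTF, so other columns can be selected alongside the exploded values; here "exploded" is just a table alias:

hive> SELECT x, y FROM arrays LATERAL VIEW explode(x) exploded AS y;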


Writing a Hive UDF

  • As an example, we will write a simple UDF that strips characters from the ends of strings.
  • Hive already has a built-in function called trim, so we will call ours strip.
  • The code for the Strip Java class is shown below:
package com.hadoopbook.hive;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Strip extends UDF {

  // Reuse a single Text object rather than creating a new one on every call
  private Text result = new Text();

  // Strips leading and trailing whitespace
  public Text evaluate(Text str) {
    if (str == null) {
      return null;
    }
    result.set(StringUtils.strip(str.toString()));
    return result;
  }

  // Strips any of the supplied characters from the ends of the string
  public Text evaluate(Text str, String stripChars) {
    if (str == null) {
      return null;
    }
    result.set(StringUtils.strip(str.toString(), stripChars));
    return result;
  }
}
  • A UDF must satisfy the following two properties:

1. A UDF must be a subclass of org.apache.hadoop.hive.ql.exec.UDF.
2. A UDF must implement at least one evaluate() method.

  • The Strip class has two evaluate() methods. These are not defined by an interface, so Hive inspects the UDF to find the evaluate() method that matches the argument types used in the query.
  • The first strips leading and trailing whitespace from the input, while the second strips any of a supplied set of characters from the ends of the string.

To use the UDF in Hive, package the compiled Java class in a JAR file and register the JAR with Hive:

hive> ADD JAR /path/to/hive-examples.jar;
  • We also need to create an alias for the Java class name:

hive> CREATE TEMPORARY FUNCTION strip AS 'com.hadoopbook.hive.Strip';
  • As an alternative to calling ADD JAR, you can specify at launch time a path where Hive looks for auxiliary JAR files to put on its classpath.
  • This technique is useful for automatically adding your own library of UDFs every time you run Hive.
  • There are two ways of specifying the path: either by passing the --auxpath option to the hive command, as shown below:

% hive --auxpath /path/to/hive-examples.jar

or by setting the HIVE_AUX_JARS_PATH environment variable before invoking Hive.
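As a hedged example of the second approach (the JAR path is hypothetical and the syntax is for a Bourne-style shell):

% export HIVE_AUX_JARS_PATH=/path/to/hive-examples.jar
% hive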

  • The UDF is now ready to be used, just like a built-in function:
hive> SELECT EMPID, strip(EMPNAME), ESAL FROM Employee;

or:

hive> SELECT strip('banana', 'ab') FROM dummy;

Output: nan
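As a rough sketch (not part of the original article), you can also exercise the evaluate() methods directly in plain Java before building and deploying the JAR; the StripCheck class name below is hypothetical:

import org.apache.hadoop.io.Text;
import com.hadoopbook.hive.Strip;

public class StripCheck {
  public static void main(String[] args) {
    Strip strip = new Strip();
    // One-argument form: strips leading and trailing whitespace
    System.out.println(strip.evaluate(new Text("  banana  ")));   // prints: banana
    // Two-argument form: strips 'a' and 'b' from both ends
    System.out.println(strip.evaluate(new Text("banana"), "ab")); // prints: nan
  }
}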
