Writing a User Defined Function (UDF) for Pig
Pig provides extensive support for user defined functions as a way to specify custom processing.
Pig UDFs can currently be implemented in several languages: Java, Python, JavaScript, Ruby and Groovy.
Java functions are more efficient because they are implemented in the same language as Pig.
Only limited support is provided for Python, Ruby, JavaScript and Groovy functions.
Pig also provides support for Piggy Bank, a repository of Java UDFs. Through Piggy Bank you can access Java UDFs written by other users and contribute Java UDFs that you have written.
Example of a User Defined Function: an Eval function in Pig
First, create a simple Java project.
Open Eclipse and start a new Java project
Provide the name for the project
Once the project has been created successfully, right-click on the Java project name and select New
Then select Package and provide the package name
Once the package is created successfully, right-click on the package name
Select New and then select Class
Provide the name for the class.
As part of the Apache Pig customization, the created class should extend EvalFunc<String>, a predefined abstract class in Pig. The type parameter (String here) is the return type of the function.
When you extend EvalFunc, you have to override the method called exec, which takes a Tuple as its input.
Write the code as below.
Ex:
package myudfs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
public class UPPER extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;  // nothing to convert
        try {
            return ((String) input.get(0)).toUpperCase();  // first field of the tuple
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row", e);
        }
    }
}
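The shape of this class can be tried out without Pig on the classpath. The following is a minimal sketch, not Pig's actual API: EvalFuncSketch<T> is a hypothetical stand-in for org.apache.pig.EvalFunc<T>, a java.util.List<Object> stands in for Tuple, and the sample values are made up. It shows how the type parameter fixes the return type of exec and how null or empty input is handled.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class UdfSketch {
    // Hypothetical stand-in for org.apache.pig.EvalFunc<T>; not part of Pig.
    static abstract class EvalFuncSketch<T> {
        // A List<Object> stands in for Pig's Tuple in this sketch.
        abstract T exec(List<Object> input) throws IOException;
    }

    // Mirrors the UPPER UDF: the type parameter String is the return type of exec.
    static class Upper extends EvalFuncSketch<String> {
        @Override
        String exec(List<Object> input) throws IOException {
            // Missing input: return null instead of failing.
            if (input == null || input.isEmpty()) {
                return null;
            }
            try {
                return ((String) input.get(0)).toUpperCase();
            } catch (Exception e) {
                // A null or non-String first field ends up here.
                throw new IOException("Caught exception processing input row", e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Upper upper = new Upper();
        System.out.println(upper.exec(Arrays.<Object>asList("john", 18)));
        System.out.println(upper.exec(Collections.<Object>emptyList()));
    }
}
```

Returning null for missing input, and raising an exception only when the field cannot be processed, keeps one bad row from being confused with an empty one.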
To run the Pig customization program, the following imports are needed.
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
For these imports to resolve, add the supporting external Pig jars to the project as follows.
Right click on the Package Name
Click on Build path
Select configure Build path
A window opens; in it, click on the Libraries tab
Click on Add External JARs
Select the required supporting jars from your local drive [multiple selection is possible]
The selected jar files appear under the Referenced Libraries folder in the Project Explorer window
Check for errors
To package the compiled program, create a jar file as follows.
Go to the project
Right-click and select Export
Select JAR file as the export destination
Provide the JAR file name, with the extension .jar
Run a Pig script that uses the JAR file, as shown below.
-- myscript.pig
REGISTER myudfs.jar;
A = LOAD 'Student-data.txt' AS (name: chararray, age: int, gpa: float);
B = FOREACH A GENERATE myudfs.UPPER(name);
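Conceptually, FOREACH ... GENERATE calls the UDF's exec once for every tuple in relation A, passing only the projected field (name), and collects the results into B. The sketch below imitates that in plain Java so it runs without Pig. The class ForeachSketch, the sample rows, and the use of List<Object> in place of Pig tuples are illustrative assumptions, not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ForeachSketch {
    // Same logic as UPPER.exec, on a List<Object> stand-in for a Tuple.
    static String upper(List<Object> input) {
        if (input == null || input.isEmpty()) {
            return null;
        }
        return ((String) input.get(0)).toUpperCase();
    }

    // Imitates: B = FOREACH A GENERATE myudfs.UPPER(name);
    static List<String> foreachUpper(List<List<Object>> a) {
        List<String> b = new ArrayList<>();
        for (List<Object> row : a) {
            // Only the projected field (name, at position 0) is handed to the UDF.
            List<Object> args = Collections.<Object>singletonList(row.get(0));
            b.add(upper(args)); // one call per input row
        }
        return b;
    }

    public static void main(String[] args) {
        // Made-up rows shaped like (name, age, gpa).
        List<List<Object>> a = new ArrayList<>();
        a.add(Arrays.<Object>asList("john", 18, 3.5f));
        a.add(Arrays.<Object>asList("mary", 19, 3.8f));
        System.out.println(foreachUpper(a));
    }
}
```

Because the function is applied row by row, it should not depend on the order in which rows arrive or keep state between calls.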