
Apache Pig User Defined Functions - Hadoop


Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing.

Pig UDFs can currently be implemented in several languages: Java, Python, JavaScript, Ruby, and Groovy.

Java functions are more efficient because they are implemented in the same language as Pig itself.

Limited support is provided for Python, Ruby, JavaScript, and Groovy functions.

Pig also provides support for Piggybank, a repository for Java UDFs. Through Piggybank you can access Java UDFs written by other users and can also contribute Java UDFs that you have written.
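As a quick illustration, a Piggybank UDF can be registered and called from a Pig script along the lines of the sketch below; the jar path and the string UPPER class are assumptions based on a typical Piggybank build and may differ in your environment.

-- Sketch: using a Piggybank UDF (jar path and class name are assumptions)
REGISTER /usr/local/pig/contrib/piggybank/java/piggybank.jar;
A = LOAD 'Student-data.txt' AS (name: chararray, age: int, gpa: float);
B = FOREACH A GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(name);
DUMP B;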


Example of a User Defined Function (Eval Function) in Pig

To create a simple Java project:

Open Eclipse.

Go to the File menu, select New, and then Java Project.

Provide a name for the project.

Once the project has been created successfully, right-click on the project name, select New, then Package, and provide the package name.

Once the package is created successfully, right-click on the package name, select New, then Class, and provide a name for the class.

As part of the Apache Pig customization, the created class should extend EvalFunc, which is a predefined Pig class.

When extending EvalFunc, you have to override the method called exec(), which receives a Tuple as input.

Write the code as below.

Ex:-
package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class UPPER extends EvalFunc<String>
{
    // Converts the first field of the input tuple to upper case.
    public String exec(Tuple input) throws IOException
    {
        if (input == null || input.size() == 0)
            return null;
        try
        {
            String str = (String) input.get(0);
            return str.toUpperCase();
        }
        catch (Exception e)
        {
            throw new IOException("Caught exception processing input row", e);
        }
    }
}

Execution:

For the Pig customization program, we have to import the following packages:

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
 

For these imports to resolve, we have to add the supporting external Pig JARs to the project in the below fashion.


Right-click on the package name.

Click on Build Path and select Configure Build Path.

In the window that opens, click on the Libraries tab.

Click on Add External JARs.

Select the respective supporting JARs from your local drive (multiple selection is possible).

Whichever JAR files have been selected will appear under the Referenced Libraries folder in the Project Explorer window.

Check for errors.

To package the compiled program, we make a JAR file for it in the below fashion:

Go to the project.

Right-click and select Export.

Select JAR file as the export destination.

Click Next.

Provide the JAR file name, with the extension .jar.

Click Finish.

Run Pig in batch mode using the JAR file, as shown below.

-- myscript.pig
REGISTER myudfs.jar;
A = LOAD 'Student-data.txt' AS (name: chararray, age: int, gpa: float);
B = FOREACH A GENERATE myudfs.UPPER(name);
DUMP B;
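As a small usage note, Pig's DEFINE statement can give the fully qualified UDF a shorter alias; the alias name UP in the sketch below is only an illustrative choice.

-- Sketch: giving the UDF a shorter alias with DEFINE (alias name UP is an assumption)
REGISTER myudfs.jar;
DEFINE UP myudfs.UPPER();
A = LOAD 'Student-data.txt' AS (name: chararray, age: int, gpa: float);
B = FOREACH A GENERATE UP(name);
DUMP B;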
