Introduction to Apache Pig – Hadoop

Introduction to Pig

Apache Pig is one of the major components of the Hadoop ecosystem. It is a high-level abstraction layer on top of MapReduce.

Apache Pig is meant for processing huge amounts of data stored in HDFS.

Processing in Apache Pig is carried out with transformation operators such as LOAD, GENERATE (used with FOREACH), FILTER, etc.

So we can call Apache Pig a transformation language or a data flow language.

The data passes through these transformations to achieve the desired functionality.
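
As a minimal sketch of such a data flow, the shell snippet below writes a small Pig Latin script that loads a file, filters it, and generates a projection. The file names (employees.txt, high_paid.pig), the field names, and the salary threshold are hypothetical and chosen only for illustration. The script is run in local mode only if a pig command is available.

```shell
# Hypothetical sample data: tab-separated name, department, salary
printf 'alice\tsales\t60000\nbob\tit\t40000\ncarol\tit\t75000\n' > employees.txt

# Write a small Pig Latin data flow: LOAD -> FILTER -> FOREACH ... GENERATE
cat > high_paid.pig <<'EOF'
-- Relation and field names are illustrative assumptions
emps      = LOAD 'employees.txt' AS (name:chararray, dept:chararray, salary:int);
high_paid = FILTER emps BY salary > 50000;
names     = FOREACH high_paid GENERATE name, dept;
DUMP names;
EOF

# Run in local mode (no cluster needed), but only if Pig is installed
if command -v pig >/dev/null 2>&1; then
  pig -x local high_paid.pig
else
  echo "pig not found; wrote high_paid.pig without running it" >&2
fi
```

Each relation (emps, high_paid, names) is the output of one transformation and the input of the next, which is why Pig Latin is described as a data flow language.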

Note: Apache Pig is an abstraction layer, or high-level language, on top of MapReduce, because Pig statements are internally compiled into MapReduce (MR) jobs.

MapReduce vs. Apache Pig

  1. In MapReduce, processing data requires writing driver code, Mapper code, and Reducer code (if required), irrespective of the business logic being applied. In Apache Pig, we can achieve the same functionality with a scripting language and far fewer lines of code.
  2. MapReduce expects Java programming skills, whereas in Apache Pig even someone who does not program in Java can write the code using simple scripting.
  3. As a rough rule of thumb, about 200 lines of MR code correspond to about 10 lines of Pig code.
  4. In MapReduce, we have to follow a multi-step process: compiling the MR code, packaging it, deploying it to the cluster, and executing it. In Apache Pig, the code can be run easily without so many steps.
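
To give a feel for how compact Pig code can be, here is a hedged sketch of a word count, a job that takes a driver, a Mapper, and a Reducer class in Java MapReduce. The file names (input.txt, wordcount.pig, wordcount_out) and relation names are hypothetical. The snippet writes the script, counts its statements, and runs it in local mode only if pig is installed.

```shell
# Hypothetical input file for the word count
printf 'pig runs on hadoop\npig is a data flow language\n' > input.txt

# The whole job as a Pig Latin script (names are illustrative assumptions)
cat > wordcount.pig <<'EOF'
-- Word count in five Pig Latin statements
lines  = LOAD 'input.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS total;
STORE counts INTO 'wordcount_out';
EOF

# Count the statements (lines containing a terminating semicolon)
echo "Pig statements: $(grep -c ';' wordcount.pig)"

# A single command runs the script; there is no separate compile or package step
if command -v pig >/dev/null 2>&1; then
  pig -x local wordcount.pig
fi
```

Note that STORE fails if the output directory already exists, so wordcount_out has to be removed before a second run.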

Installing and Running Pig

Pig runs as a client-side application.

If you want to run Pig on a Hadoop cluster, there is nothing extra to install on the cluster: Pig launches jobs and interacts with HDFS (or other Hadoop file systems) from your workstation.

Installation is straightforward, and Java 6 is a prerequisite.

Download a stable release from http://pig.apache.org/release.html and unpack the tarball in a suitable place on your workstation, i.e. % tar xzf pig-x.y.z.tar.gz

It's convenient to add Pig's bin directory to your command-line path.

For example:

% export PIG_INSTALL=/home/tom/pig-x.y.z
% export PATH=$PATH:$PIG_INSTALL/bin

You also need to set the JAVA_HOME environment variable to point to a suitable Java installation.

Run the command pig -help to get usage instructions.
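
The setup steps above can be collected into one small shell sketch. It assumes Pig was unpacked under $HOME/pig-x.y.z (x.y.z stands for whatever version was downloaded and is left as a placeholder) and it only warns if JAVA_HOME is unset rather than guessing a location.

```shell
# Assumption: install location; override by setting PIG_INSTALL beforehand
PIG_INSTALL="${PIG_INSTALL:-$HOME/pig-x.y.z}"
export PIG_INSTALL

# Append Pig's bin directory to the existing PATH instead of replacing it
export PATH="$PATH:$PIG_INSTALL/bin"

# JAVA_HOME must point to a Java installation; warn instead of guessing one
if [ -z "${JAVA_HOME:-}" ]; then
  echo "JAVA_HOME is not set; point it to a suitable Java installation" >&2
fi

# Show usage instructions if the pig launcher can be found
if command -v pig >/dev/null 2>&1; then
  pig -help
else
  echo "pig not found on PATH; check PIG_INSTALL=$PIG_INSTALL" >&2
fi
```

Placing these lines in a shell startup file makes the settings available in every new terminal session.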



