Hadoop’s Pig Data Types and Syntax

by Ruchitha Geebu
Last modified: January 30th 2021

Pig Data Types

  • Every piece of data in Pig has one of these four types:
Data Atom: a simple atomic data value. It is stored as a string but can be used as either a string or a number.
Examples: ‘apache.org’ and ‘1-0’
Tuple: a data record consisting of a sequence of “fields”. Each field is a piece of data of any type (data atom, tuple, or data bag).
  • We denote tuples with <> bracketing.
  • Example of a tuple: <‘apache.org’, ‘1-0’>
Data Bag: a collection of tuples (duplicate tuples are allowed).
  • Think of it as a “table”, except that Pig does not require that the tuple field types match, or even that the tuples have the same number of fields. We denote bags with {} bracketing.
Data Map: a map from keys that are string literals to values that can be of any data type.
  • Think of it as a HashMap<String, X>, where X can be any of the four Pig data types.
  • A data map supports the expected get and put interface.
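The four types above can be sketched in a Pig Latin schema. This is a minimal illustration, not taken from the original article: the file name `pages.txt` and all field names are hypothetical. Note that in Pig Latin source code tuples are written with parentheses rather than the <> notation used above, bags with braces, and maps with square brackets and `#`.

```pig
-- Hypothetical input file, one tab-separated record per line, e.g.:
-- apache.org	(1,0)	{(a),(b)}	[lang#en]
pages = LOAD 'pages.txt'
        AS (site:chararray,                        -- data atom
            version:tuple(major:int, minor:int),   -- tuple
            tags:bag{t:tuple(tag:chararray)},      -- data bag of tuples
            props:map[]);                          -- data map with string keys

-- '.' projects a field out of a tuple; '#' looks up a key in a map
fields = FOREACH pages GENERATE site, version.major, props#'lang';

DUMP fields;
```

The `#` lookup corresponds to the "get" side of the map interface described above; a key that is not present yields null.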
Data Types in Pig:
 Other languages   Pig
 int               int
 String            chararray
 float             float
 long              long
 double            double
 boolean           boolean
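As a rough sketch of how these scalar types appear in practice, the schema of a LOAD statement can declare a type for each field, and a cast can convert between types. The file name `users.csv` and the field names are assumptions made for illustration.

```pig
-- Hypothetical comma-separated input file and field names
users = LOAD 'users.csv' USING PigStorage(',')
        AS (name:chararray, age:int, score:float, visits:long, balance:double);

-- An explicit cast converts a value to another type before it is used
scaled = FOREACH users GENERATE name, (double)age * 1.5 AS weighted_age;

DUMP scaled;
```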

Different Transformations in Pig:

REGISTER - Register a JAR file with the Pig runtime.

DEFINE - Create an alias for a macro, UDF, streaming script, or command specification.

IMPORT - Import macros defined in a separate file into a script.
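The three statements above might be used together as follows. Everything named here (`myudfs.jar`, the class `com.example.pig.ToUpper`, `my_macros.pig`, and `names.txt`) is hypothetical and stands in for your own JAR, UDF class, macro file, and data.

```pig
-- All file and class names below are hypothetical placeholders
REGISTER myudfs.jar;                          -- make the JAR's classes available to Pig
DEFINE to_upper com.example.pig.ToUpper();    -- short alias for a UDF in that JAR
IMPORT 'my_macros.pig';                       -- pull in macros defined in another file

names = LOAD 'names.txt' AS (name:chararray);
upper = FOREACH names GENERATE to_upper(name);

DUMP upper;
```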

Typical Transformations:

LOAD: Load data from the file system.

FILTER: Remove unwanted rows from a relation

FOREACH: Apply an expression to every row of a relation, typically to select particular columns

GENERATE: Add or remove fields from a relation (used together with FOREACH)

GROUP: To group data in a single relation.

COGROUP: To group or join data in two or more relations

UNION: To merge the contents of two or more relations

SPLIT: To partition the contents of a relation into multiple relations

JOIN (Inner or Outer): To join the data in two or more relations

ORDER: Sort a relation by one or more fields

LIMIT: Limit the size of a relation to a maximum number of tuples
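The transformations listed above can be combined into one small script. The following is only a sketch under assumed inputs: the files `orders.txt`, `orders_2020.txt`, and `customers.txt`, their fields, and the thresholds are all invented for illustration.

```pig
-- Hypothetical tab-separated input files and field names
orders     = LOAD 'orders.txt'      AS (order_id:int, cust_id:int, amount:double);
old_orders = LOAD 'orders_2020.txt' AS (order_id:int, cust_id:int, amount:double);
customers  = LOAD 'customers.txt'   AS (cust_id:int, name:chararray);

-- UNION: merge the contents of two relations with the same schema
all_orders = UNION orders, old_orders;

-- FILTER: keep only the rows that satisfy a condition
big = FILTER all_orders BY amount > 100.0;

-- FOREACH ... GENERATE: keep only the fields that are needed
slim = FOREACH big GENERATE cust_id, amount;

-- GROUP: collect the rows of one relation by key, then aggregate each group
by_cust = GROUP slim BY cust_id;
totals  = FOREACH by_cust GENERATE group AS cust_id, SUM(slim.amount) AS total;

-- JOIN (inner): match rows from two relations on a key
joined = JOIN totals BY cust_id, customers BY cust_id;
-- Outer variant: JOIN totals BY cust_id LEFT OUTER, customers BY cust_id;

-- COGROUP: group two relations by key, keeping one bag per input relation
co = COGROUP orders BY cust_id, customers BY cust_id;

-- ORDER and LIMIT: sort, then keep at most 10 tuples
sorted = ORDER totals BY total DESC;
top10  = LIMIT sorted 10;

-- SPLIT: partition one relation into several by condition
SPLIT totals INTO small IF total < 1000.0, large IF total >= 1000.0;

STORE top10 INTO 'top_customers';
```

After the JOIN, fields that share a name are disambiguated with the relation name, for example `totals::cust_id` and `customers::cust_id`.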

Debugging Pig Latin:  

  • Pig Latin provides operators that help you debug Pig Latin statements.
DUMP: To display the results on your terminal screen
DESCRIBE: To review the schema of a relation.
EXPLAIN: To view the logical, physical, or MapReduce execution plans used to compute a relation.
ILLUSTRATE: To view the step-by-step execution of a series of statements.
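A short sketch of the four debugging operators applied to one relation follows. The input file `scores.txt` and its fields are assumptions made for the example.

```pig
-- Hypothetical input file and field names
scores = LOAD 'scores.txt' AS (name:chararray, score:int);
passed = FILTER scores BY score >= 50;

DESCRIBE passed;    -- print the schema of the relation
DUMP passed;        -- run the script and print the resulting tuples
EXPLAIN passed;     -- print the logical, physical, and MapReduce plans
ILLUSTRATE passed;  -- show sample data flowing through each statement
```

DESCRIBE and EXPLAIN only inspect the schema and the plans, while DUMP and ILLUSTRATE actually read data, so the first two are the cheaper place to start when a script misbehaves.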
