
Hadoop’s Pig Data Types and Syntax


Pig Data Types

  • Every piece of data in Pig has one of these four types:
Data Atom: A simple atomic data value. It is stored as a string but can be used as either a string or a number.
Examples: ‘apache.org’ and ‘1-0’
 
Tuple: A data record consisting of a sequence of “fields”. Each field is a piece of data of any type (data atom, tuple, data bag or data map).
 
  • We denote tuples with <> bracketing.
  • An example of a tuple, built from the two data atoms above, is <apache.org, 1-0>.
 
Data Bag: A collection of tuples (duplicate tuples are allowed).
 
  • Think of it as a “table”, except that Pig does not require the tuple field types to match, or even that the tuples have the same number of fields.
  • We denote bags with {} bracketing.
 
 
Data Map: A map from keys, which are string literals, to values that can be of any data type.
 
  • Think of it as a HashMap<String, X>, where X can be any of the four Pig data types.
  • A data map supports the expected get and put interface.
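
As a rough illustration of how the four types appear in a script, the sketch below declares them in a LOAD schema. The file name visits.txt and every field name are assumptions made for this example only. Note that current Pig releases write tuples with parentheses and maps with square brackets, rather than the <> notation used above.

```pig
-- Hypothetical input file; with the default PigStorage loader the fields are tab-separated.
visits = LOAD 'visits.txt' AS (
    site:chararray,                          -- data atom
    owner:tuple(name:chararray, age:int),    -- tuple
    links:bag{t:tuple(url:chararray)},       -- data bag of tuples
    props:map[]                              -- data map with string keys
);

-- '.' projects a field out of a tuple; '#' looks up a key in a map.
summary = FOREACH visits GENERATE
    site,
    owner.name   AS owner_name,
    COUNT(links) AS link_count,
    props#'lang' AS lang;

DUMP summary;
```

The # operator is the "get" side of the map interface: it returns null when the key is absent.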


Data Types in Pig:

Other languages    Pig
int                int
String             chararray
float              float
long               long
double             double
boolean            boolean
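
A minimal sketch of these simple types in a schema, assuming a hypothetical comma-separated file users.csv with the columns shown. The field names are invented for illustration.

```pig
-- Hypothetical input: one user per line, fields separated by commas.
users = LOAD 'users.csv' USING PigStorage(',') AS (
    id:int,
    name:chararray,
    score:float,
    visits:long,
    balance:double,
    active:boolean
);

-- An explicit cast converts a field from one simple type to another.
typed = FOREACH users GENERATE id, name, (double)score AS score_d;

-- Print the schema of the result.
DESCRIBE typed;
```

If no type is declared for a field, Pig treats it as a bytearray and converts it when an operation requires a specific type.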

Different Transformations in Pig:

REGISTER: Register a JAR file with the Pig runtime.

DEFINE: Create an alias for a macro, a UDF, a streaming script or a command specification.

IMPORT: Import macros defined in a separate file into a script.
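
The three statements might be combined as in the sketch below. The JAR name, the UDF class com.example.pig.ToUpper, the macro file and the input file are all hypothetical placeholders, so the script only runs once they are replaced with real ones.

```pig
-- Make the classes in a (hypothetical) JAR available to the script.
REGISTER 'myudfs.jar';

-- Give a (hypothetical) UDF class a short alias.
DEFINE ToUpper com.example.pig.ToUpper();

-- Pull in macros defined in a separate (hypothetical) file.
IMPORT 'my_macros.pig';

-- Assumed input file with a single name column.
users = LOAD 'users.txt' AS (name:chararray);

-- Call the UDF through its alias.
upper_names = FOREACH users GENERATE ToUpper(name);
```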

Typical Transformations:

LOAD: Load data from the file system.

FILTER: Remove unwanted rows from a relation.

FOREACH: Iterate over the tuples of a relation, typically to project or transform particular columns.

GENERATE: Used with FOREACH to add or remove fields from a relation.

GROUP: To group data in a single relation.

COGROUP: To group or join data in two or more relations

UNION: To merge the contents of two or more relations

SPLIT: To partition the contents of a relation into multiple relations

JOIN (Inner or Outer): To join the data in two or more relations

ORDER: Sort a relation by one or more fields

LIMIT: Limit the size of a relation to a maximum number of tuples
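
The operators above can be sketched in one short script. The three input files, their comma-separated layout and all field names are assumptions for illustration. The thresholds are arbitrary.

```pig
-- LOAD: hypothetical comma-separated inputs.
orders    = LOAD 'orders.csv'         USING PigStorage(',') AS (order_id:int, cust_id:int, amount:double);
archived  = LOAD 'orders_archive.csv' USING PigStorage(',') AS (order_id:int, cust_id:int, amount:double);
customers = LOAD 'customers.csv'      USING PigStorage(',') AS (cust_id:int, name:chararray);

-- UNION: merge two relations that share a schema.
all_orders = UNION orders, archived;

-- FILTER: keep only the rows that satisfy a condition.
big = FILTER all_orders BY amount > 100.0;

-- FOREACH ... GENERATE: keep only the needed fields.
slim = FOREACH big GENERATE cust_id, amount;

-- GROUP: collect the tuples of one relation by key, then aggregate each group.
by_cust = GROUP slim BY cust_id;
totals  = FOREACH by_cust GENERATE group AS cust_id, SUM(slim.amount) AS total;

-- JOIN (inner): match tuples from two relations on a key.
joined = JOIN totals BY cust_id, customers BY cust_id;

-- JOIN (outer): keep customers that have no matching total.
all_customers = JOIN customers BY cust_id LEFT OUTER, totals BY cust_id;

-- COGROUP: group two relations by the same key without flattening them.
cogrouped = COGROUP orders BY cust_id, customers BY cust_id;

-- SPLIT: partition one relation into several by condition.
SPLIT totals INTO small IF total < 1000.0, large IF total >= 1000.0;

-- ORDER and LIMIT: sort, then keep at most ten tuples.
ranked = ORDER totals BY total DESC;
top10  = LIMIT ranked 10;
```

After GROUP, each output tuple has a field named group holding the key and a bag named after the input relation (slim here), which is why the aggregate is written as SUM(slim.amount).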

Debugging Pig Latin:  

  • Pig Latin provides operators that help you debug Pig Latin statements:
 
DUMP: To display the results to your terminal screen
 
DESCRIBE: To review the schema of a relation.
 
EXPLAIN: To view the logical, physical or MapReduce execution plans used to compute a relation.
 
ILLUSTRATE: To view the step-by-step execution of a series of statements.
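
A short sketch of the four debugging operators applied to one relation, assuming a hypothetical comma-separated file orders.csv with the fields shown:

```pig
-- Hypothetical input file and schema.
orders = LOAD 'orders.csv' USING PigStorage(',') AS (order_id:int, cust_id:int, amount:double);
big    = FILTER orders BY amount > 100.0;

DESCRIBE big;     -- print the schema of the relation
DUMP big;         -- run the script and print the tuples to the terminal
EXPLAIN big;      -- print the logical, physical and MapReduce plans
ILLUSTRATE big;   -- show sample data flowing through each statement
```

DESCRIBE only inspects the schema, while DUMP triggers actual execution, so DESCRIBE is the cheaper first check on a large input.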

Last updated: 23 March 2023
About Author
Ruchitha Geebu

I am Ruchitha, working as a content writer for MindMajix technologies. My writings focus on the latest technical software, tutorials, and innovations. I am also into research about AI and Neuromarketing. I am a media post-graduate from BCU – Birmingham, UK. Before, my writings focused on business articles on digital marketing and social media. You can connect with me on LinkedIn.
