Pig Data Types
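- Every piece of data in Pig has one of these four types: atom, tuple, bag, or map.
- Atom: a simple atomic value, such as a string or a number.
- Tuple: a data record made up of a sequence of fields, each of which can be of any type. We denote tuples with <> bracketing, for example <apache.org, 1.0> (Pig's own syntax writes the same tuple as (apache.org,1.0)).
- Bag: a collection of tuples. Think of it as a “table”, except that Pig does not require the tuple field types to match, or even that the tuples have the same number of fields. We denote bags with {} bracketing.
- Map: a set of key-value pairs with string keys. Think of it as a HashMap<String, X>, where X can be any of the four Pig data types.
- A data map supports the expected get and put interface.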
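The four data types can be pictured with ordinary Python values. This is only a rough sketch of the data model, not Pig's actual API: the variable names and sample values are hypothetical, and Python lists and dicts merely stand in for bags and maps.

```python
# Hypothetical Python stand-ins for Pig's four data types (not Pig's API).

# Atom: a single simple value such as a string or a number.
atom = "apache.org"

# Tuple: an ordered sequence of fields; each field may be of any type.
record = ("apache.org", 1.0)

# Bag: a collection of tuples. Duplicates are allowed, and the tuples need
# not share field types or even the same number of fields.
bag = [
    ("apache.org", 1.0),
    ("apache.org", 1.0),          # duplicate tuple
    ("example.com", 7, "extra"),  # different arity and field types
]

# Map: string keys mapped to values of any of the four types.
data_map = {}
data_map["site"] = atom    # "put" an atom
data_map["visits"] = bag   # "put" a bag
first_visit = data_map.get("visits")[0]  # "get" the bag, then take a tuple
```

Note that the bag holds tuples of different lengths, which a relational table would not allow.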
Other languages | Pig |
int | int |
string | chararray |
float | float |
long | long |
double | double |
boolean | boolean |
Different Transformations in Pig:
REGISTER: Register a JAR file with the Pig runtime.
DEFINE: Create an alias for a macro, UDF, streaming script, or command specification.
IMPORT: Import macros defined in a separate file into a script.
Typical Transformations:
LOAD: Load data from the file system.
FILTER: Remove unwanted rows from a relation.
FOREACH: Process each row of a relation, typically to select particular columns.
GENERATE: Add or remove fields from a relation (used together with FOREACH).
GROUP: To group data in a single relation.
COGROUP: To group or join data in two or more relations
UNION: To merge the contents of two or more relations
SPLIT: To partition the contents of a relation into multiple relations
JOIN (Inner or Outer): To join the data in two or more relations
ORDER: Sort a relation by one or more fields
LIMIT: Limit the size of a relation to a maximum number of tuples
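To make the effect of these operators concrete, the sketch below mimics several of them on a small in-memory relation. It is written in Python rather than Pig Latin and is only an analogy: the sample data and variable names are hypothetical, and real Pig runs these steps over files on a cluster.

```python
from collections import defaultdict

# Hypothetical sample relation: (name, dept, salary) tuples.
employees = [
    ("ann", "eng", 120),
    ("bob", "eng", 100),
    ("cat", "ops", 90),
    ("dan", "ops", 60),
]

# FILTER: keep only the rows that satisfy a condition.
well_paid = [row for row in employees if row[2] >= 90]

# FOREACH ... GENERATE: project each row onto the chosen fields.
projected = [(name, salary) for (name, dept, salary) in well_paid]

# GROUP: collect the rows that share a key into one bag per key.
grouped = defaultdict(list)
for row in well_paid:
    grouped[row[1]].append(row)

# ORDER: sort by one or more fields (here salary, descending).
ordered = sorted(well_paid, key=lambda row: row[2], reverse=True)

# LIMIT: cap the relation at a maximum number of tuples.
top_two = ordered[:2]

# UNION: merge the contents of two relations (duplicates are kept).
contractors = [("eve", "eng", 80)]
everyone = employees + contractors

# SPLIT: partition one relation into several by condition.
eng = [row for row in everyone if row[1] == "eng"]
ops = [row for row in everyone if row[1] == "ops"]

# JOIN (inner): pair rows from two relations on a matching key.
depts = [("eng", "Engineering"), ("ops", "Operations")]
joined = [e + d for e in employees for d in depts if e[1] == d[0]]
```

Each step takes a relation and produces a new one, which is how Pig Latin statements are usually chained in a script.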
Debugging Pig Latin:
- Pig Latin provides operators that help you debug your Pig Latin statements, such as DUMP, DESCRIBE, EXPLAIN, and ILLUSTRATE.