Using Counters in Hadoop MapReduce API with Example
Counters in Hadoop MapReduce
Advantages of counters:
1. Job statistics
2. Quality control
3. Problem diagnosis
While executing a task, if even one task fails because of bad records (for example, records containing binary data), then the whole job also fails.
To avoid that, we need to skip the bad records, and counters help us do this.
For counters, we have to configure some parameters within the mapper class to collect statistics.
At run time, these parameters are used to find out which records are the bad ones.
Hadoop also provides built-in counters, for example:
- Number of records in the mapper input (MAP_INPUT_RECORDS)
- Number of records in the mapper output (MAP_OUTPUT_RECORDS)
We can also have user-defined counters, whose values are read using getCounters() in the driver code.
Bad records will be stored in the task's temp/log files.
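As a sketch of the idea above, a user-defined counter can count and skip bad records inside the mapper. The enum name RecordQuality and the "bad record" check (an empty line) are illustrative assumptions, not from the original notes; only the counter API calls are the standard Hadoop ones.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BadRecordMapper extends Mapper<Object, Text, Text, IntWritable> {

    // User-defined counter group; the enum name is an illustrative choice.
    public enum RecordQuality { GOOD_RECORDS, BAD_RECORDS }

    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Hypothetical bad-record check: treat empty lines as bad records.
        if (line.trim().isEmpty()) {
            context.getCounter(RecordQuality.BAD_RECORDS).increment(1);
            return; // skip the bad record instead of failing the whole task
        }
        context.getCounter(RecordQuality.GOOD_RECORDS).increment(1);
        for (String word : line.split("\\s+")) {
            context.write(new Text(word), ONE);
        }
    }
}
```

In the driver, after job.waitForCompletion(true), the statistics can be read with job.getCounters().findCounter(RecordQuality.BAD_RECORDS).getValue().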
Simple example for MapReduce:
The different phases of MapReduce programming are
1. Map phase
2. Sort and shuffle phase
3. Reduce phase
This is a hadoop class
Hadoop is best in the market
The market is not always good
Good placements are there in hadoop
Learning hadoop is a good thing.
The mapper input key is the byte offset of each line, so the input records look like:
<100101011, This is a hadoop class>
<2011011, hadoop is good in market>
<1110111, learning hadoop is good thing>
The mapper takes space as the delimiter in each line and emits a value for each and every word.
The mapper output is the input for the sort and shuffle phase.
Internally, the framework sorts and shuffles the pairs and gives the grouped output.
Mapper output            Sort and shuffle output
(1 = one occurrence)
<hadoop, 1>              <hadoop, [1,1,1,1]>
<good, 1>                <good, [1,1,1,1]>
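The grouping that the framework performs can be imitated in plain Java (no Hadoop needed) to see how the shuffled values line up; the class name ShuffleSketch is an illustrative choice, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ShuffleSketch {
    // Groups (word, 1) pairs by key in sorted key order, like sort-and-shuffle does.
    public static TreeMap<String, List<Integer>> shuffle(String[][] pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(pair[1]));
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[][] mapperOutput = {
            {"hadoop", "1"}, {"good", "1"}, {"hadoop", "1"}, {"good", "1"}
        };
        System.out.println(shuffle(mapperOutput));
        // prints {good=[1, 1], hadoop=[1, 1]}
    }
}
```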
The reducer iterates over the grouped values for each key and sums them:
for (IntWritable val : values) sum += val.get();
context.write(word, new IntWritable(sum));
The reducer output is, for example, <hadoop, 4>.
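The reducer loop above can be sketched as a complete class; the class name IntSumReducer matches the one used in the driver snippet later in these notes.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the grouped 1s for each word, e.g. <hadoop, [1,1,1,1]> -> <hadoop, 4>.
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();    // accumulate the count for this word
        }
        result.set(sum);
        context.write(key, result);  // e.g. (hadoop, 4)
    }
}
```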
Map Reduce Programming
Any MapReduce program can be split into 3 modules
1. Driver code
2. Mapper code
3. Reducer code
Note:- The reducer program is not a mandatory one.
If you wish to conclude the business logic in the mapper class itself, you can do that very well, and in that case there is no reducer code.
The selection of the mapper and reducer depends upon the use case.
1. Driver code:
The driver code generally falls under the main() method of the MapReduce program.
We deal with the following information in our driver code:
1) All the job-level config details (job name, jar file, etc.)
2) Mapper, combiner, and reducer class-level details.
Ex:- job.setMapperClass(TokenizeMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
- The final output (key, value) details, which set the data types of the final output:
Ex:- job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
- Input- and output-related information of HDFS: the input path from where the MapReduce program will expect the input file, and the output path where exactly MapReduce will produce the results.
Note:- The developer has to create the HDFS input path well in advance, before executing the MapReduce program, and upload the input file to the same path (via the HDFS CLI).
The program itself will create the HDFS output path after the successful execution of the program.
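Putting the driver points above together, a minimal main() might look like the sketch below. The class name WordCountNew is taken from the jar command later in these notes; TokenizeMapper and IntSumReducer are the classes from the mapper/reducer snippets.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountNew {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");  // job-level details
        job.setJarByClass(WordCountNew.class);

        // Mapper, combiner, and reducer class-level details
        job.setMapperClass(TokenizeMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        // Final output (key, value) data types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // HDFS input path (must already exist) and output path (created by the job)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```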
2. Mapper code:
Any mapper class should extend the Mapper base class in order to provide the mapper business logic.
Mapper takes 4 type arguments, and the first two correspond to the mapper input (key, value):
Ex:- public static class TokenizeMapper extends Mapper<Object, Text, Text, IntWritable>
(input key, input value, output key, output value)
The last two arguments correspond to the mapper output (key, value).
In any mapper class, we have to override a method called map().
Syntax:- public void map(Object key, Text value, Context context) throws IOException, InterruptedException
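A complete sketch of the mapper described above, tokenizing each line on whitespace and emitting (word, 1); StringTokenizer is the usual choice in the classic word-count example.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper<input key, input value, output key, output value>
public class TokenizeMapper extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace and emit (word, 1) for each token
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}
```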
How to add the external (dependent) jars
Right-click on the project name
Select Build Path
Select Configure Build Path, then go to the Libraries tab
Click on Add External JARs
Select the appropriate jar files
Click on OK.
How to export the jar
Right-click on the project name
Click on Export
Go to Java and select JAR file
Click on Next
Browse the path where you wish to place the jar file
Click Finish.
> hadoop fs -mkdir /user/Batch4/mapReduce/input/
To create the directory
> hadoop fs -put input.txt /user/Batch4/mapReduce/input
To copy the input file to HDFS
> hadoop jar wordCountNew.jar WordCountNew /user/Batch4/mapReduce/input/input.txt /user/Batch4/mapReduce/output
To run the job
> hadoop fs -cat /user/Batch4/mapReduce/output/part-r-00000
To check the output