
Using Counters in Hadoop MapReduce API with Example

Counters in Hadoop MapReduce

 
MapReduce counters provide a way to measure the progress or the number of operations that occur within a MapReduce program.
 
The advantages of counters are:

1. Job statistics
2. Quality control
3. Problem diagnosis
While a job is executing, if any one of its tasks fails because of bad records (for example, records containing binary data), the whole job fails as well.
At that point, we need to avoid the bad records, i.e., skip the bad records using counters.
 
To use counters, we have to configure some parameters within the mapper class to collect the statistics.

At run time, these parameters are used to find out which records are the bad ones.

Hadoop also provides built-in counters, for example:

  • Map input records (the number of records in the mapper input)
  • Map output records (the number of records in the mapper output)

We can also have user-defined counters, which we read with getCounters() in the driver code; a sketch follows below.

Bad records will be recorded in the temp/log files.
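As an illustration, here is a minimal sketch of a user-defined counter that counts (and skips) bad records in the mapper. The class name CounterMapper, the enum RecordCounters, and the notion of a "bad" record used here are assumptions for illustration only, not part of the original example:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CounterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Hypothetical user-defined counters; any Java enum can serve as a counter group.
    public enum RecordCounters { GOOD_RECORDS, BAD_RECORDS }

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Assumed definition of a bad record: an empty line (adapt to your data).
        if (line.trim().isEmpty()) {
            context.getCounter(RecordCounters.BAD_RECORDS).increment(1);
            return; // skip the bad record instead of failing the whole job
        }
        context.getCounter(RecordCounters.GOOD_RECORDS).increment(1);
        for (String token : line.split(" ")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}

After the job completes, the driver can read these counters, for example: job.getCounters().findCounter(CounterMapper.RecordCounters.BAD_RECORDS).getValue().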
 

Simple MapReduce Example:

 
The different phases of MapReduce programming are:

1. Map phase
2. Sort and shuffle phase
3. Reducer phase
 
Ex:-
This is a hadoop class
 
Hadoop is best in the market
 
The market is not always good
 
Good placements are there in hadoop
 
Learning hadoop is a good thing.
 
Flow:

part-m-00000: map output file

part-r-00000: reduce output file

Each line of the input file is converted into a (key, value) pair, where the key is a system-generated byte offset and the value is the line itself, for example:

<100101011, This is a hadoop class>

<2011011, Hadoop is best in the market>

<1110111, Learning hadoop is a good thing>

(the numeric keys are system generated)
 

Mapper Output:

The mapper takes space as the delimiter in each line and emits a (word, 1) pair for each and every word. For example, the first line produces <This,1>, <is,1>, <a,1>, <hadoop,1> and <class,1>.

The mapper output is the input for the sort and shuffle phase.

 
Internally, it will sort and shuffle and give the desired output.
 
Mapper output                       Sort and shuffle output
                                    (each 1 marks one occurrence)

<hadoop,1>                          <hadoop,1,1,1,1>

<hadoop,1>                          <good,1,1,1>

...                                 ...
The reducer output is produced by iterating over the grouped values for each key and summing them up:

sum = 0;
while (values.hasMoreElements()) {
    sum = sum + value.get();
}
context.write(word, sum);

Reducer output:

<This,1>
<is,4>
<hadoop,4>
<class,1>
<good,3>
....
<thing,1>
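A minimal Java reducer sketch implementing the summing logic above; the class name IntSumReducer matches the one referenced in the driver code later in this post, but the exact implementation is illustrative:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // Sum the grouped occurrence counts for this word.
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result); // e.g. <hadoop,4>
    }
}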
 

 

MapReduce Programming

Program Flow:-

Any MapReduce program can be split into three modules:

1. Driver code
2. Mapper code
3. Reducer code

Note:- The Reducer program is not mandatory.

If the business logic can be concluded in the mapper class itself, we can do that very well, and in that case no Reducer code is expected; such a job is a map-only job (a minimal illustration follows below).

The selection of the mapper and reducer therefore depends on the use case we have selected.
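For instance, a map-only job can be configured in the driver by setting the number of reduce tasks to zero; this one-liner uses the standard Job API and is an illustration, not something shown in the original post:

job.setNumReduceTasks(0);  // no reduce phase; map output is written directly as part-m-* files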
 
1. Driver code:
 
The driver code generally falls under the main() method of the MapReduce program.

We deal with the following information in our driver code:
 
1) All the job-level configuration details (job name, jar file, etc.)

2) The mapper, combiner, and reducer class-level details.

Ex:- job.setMapperClass(TokenizerMapper.class);

     job.setCombinerClass(IntSumReducer.class);

     job.setReducerClass(IntSumReducer.class);

  • The final output (key, value) details
Ex:- job.setOutputKeyClass(Text.class);

     job.setOutputValueClass(IntWritable.class);

(Text.class and IntWritable.class are the data types of the final output key and value; they change according to the output types of the job.)
 
  • Input- and output-related information for HDFS (the input path from where the MapReduce program will expect the input file on HDFS, and the output path on HDFS where exactly the MapReduce job will produce the results).
Note:- The developer has to create the HDFS input path well in advance, before executing the MapReduce program, and upload the input file to that same path (via the HDFS CLI).

The program itself will create the HDFS output path after the successful execution of the program.
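Putting the above details together, a minimal driver sketch might look like the following; the class name WordCountNew mirrors the jar name used in the execution commands below, and should be treated as illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountNew {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count new"); // job-level config: job name

        job.setJarByClass(WordCountNew.class);             // jar file detail

        // Mapper, combiner, and reducer class-level details
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        // Final output (key, value) data types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // HDFS input and output paths, taken from the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}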
 
2. Mapper code:
 
Any mapper class should extend the Mapper base class in order to provide the mapper business logic.

The Mapper class takes four type arguments; the first two correspond to the mapper input (key, value).

Ex:- public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>

Here the four type arguments are, in order: the mapper input key, the input value, the output key, and the output value.

The last two arguments correspond to the mapper output (key, value).

In any mapper class, we have to override a method called map().

Syntax:- public void map(Object key, Text value, Context context) throws IOException, InterruptedException
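A minimal TokenizerMapper sketch following this signature; it is shown here as a top-level class for brevity (in the classic word count example it is a nested static class):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace and emit (word, 1) for every token.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}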
 
How to add the external (dependent) jars

Right-click on the project name → select Build Path → Configure Build Path → go to the Libraries tab → click on Add External JARs → select the appropriate jar files → click OK.

How to export the jar

Right-click on the project name → click on Export → go to Java → select JAR file → click Next → browse to the path where you wish to place the jar file → Save → Finish.
 

Execution commands:-

 
> hadoop fs -mkdir /user/Batch4/mapreduce/input/

                               (to create the directory)

> hadoop fs -put input.txt /user/Batch4/mapreduce/input

                               (copying the input file to HDFS)

> hadoop jar WordCountNew.jar WordCountNew /user/Batch4/mapreduce/input/input.txt /user/Batch4/mapreduce/output

                               (running the word count job)

> hadoop fs -cat /user/Batch4/mapreduce/output/part-r-00000

                               (to check the output)
