
Using Counters in Hadoop MapReduce API with Example


Counters in Hadoop MapReduce

MapReduce counters provide a way to measure the progress or the number of operations that occur within a MapReduce program.
 
The advantages of counters are:

1. Job statistics
2. Quality control
3. Problem diagnosis
While executing a task, if any record fails because it contains bad data (for example, binary data), the whole job fails as well.
At that point, we need to skip the bad records, and counters give us a way to track and avoid them.
 
To use counters, we have to configure some parameters within the mapper class to collect statistics.

At run time, these parameters are used to find out which records are bad.

Hadoop also provides built-in counters, for example:
 
  • The number of records in the mapper input (MAP_INPUT_RECORDS)
  • The number of records in the mapper output (MAP_OUTPUT_RECORDS)
We can also define user-defined counters and read them back with getCounters(), which has to be called in the driver code.
 
Bad records will be stored in temp/log files.
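As a minimal sketch of how this looks in code: a user-defined counter can be declared as a Java enum and incremented in the mapper whenever a bad record is seen. The enum name, class name, and the bad-record test below are illustrative assumptions; Counter and context.getCounter() are standard Hadoop MapReduce API.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BadRecordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Illustrative user-defined counters, declared as a Java enum
    public enum RecordQuality { GOOD_RECORDS, BAD_RECORDS }

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.trim().isEmpty()) {   // illustrative "bad record" test
            context.getCounter(RecordQuality.BAD_RECORDS).increment(1);
            return;                    // skip the bad record instead of failing
        }
        context.getCounter(RecordQuality.GOOD_RECORDS).increment(1);
        for (String token : line.split(" ")) {
            word.set(token);
            context.write(word, one);
        }
    }
}

In the driver, after job.waitForCompletion(true) returns, the totals can be read back with job.getCounters().findCounter(RecordQuality.BAD_RECORDS).getValue().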

Simple Example of MapReduce:

The different phases of MapReduce programming are:
 
1. Map phase
2. Sort and shuffle phase
3. Reducer phase
 
Ex:-
This is a Hadoop class
 
Hadoop is the best in the market
 
The market is not always good
 
Good placements are there in Hadoop
 
Learning Hadoop is a good thing.
 
Flow:

part-m-00000: the map output file

part-r-00000: the reduce output file

Each line of the input file reaches the mapper as a (key, value) pair, where the key is a system-generated byte offset and the value is the line itself:

<100101011, This is a Hadoop class>

<2011011, Hadoop is good in the market>

<1110111, learning Hadoop is good thing>
 


Mapper Output:

The mapper takes space as the delimiter in each line and emits a value for each and every word.

The mapper output becomes the input to the sort-and-shuffle phase.

Internally, the framework sorts and shuffles these pairs and gives the output shown below.
 
Mapper output                      Sort-and-shuffle output (each 1 = one occurrence)

<hadoop,1>                         <hadoop,1,1,1,1>

<hadoop,1>                         <good,1,1,1,1>

...                                ...
Reducer output:

For each word, the reducer sums its list of occurrences, so <hadoop,1,1,1,1> becomes <hadoop,4>. In pseudocode:

sum = 0;
while (values.hasMoreElements()) {
    sum = sum + value.get();
}
...
context.write(word, sum);
 
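As a sketch, the same logic written as a real Hadoop reducer might look like this (the class name IntSumReducer matches the fragment used in the driver code below; Reducer, Text, IntWritable, and context.write() are standard Hadoop API):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the occurrence list produced by sort-and-shuffle,
// e.g. <hadoop, [1,1,1,1]> becomes <hadoop, 4>.
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text word, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {  // iterate over the grouped values
            sum += value.get();
        }
        result.set(sum);
        context.write(word, result);        // emit <word, total count>
    }
}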
 

MapReduce Programming

Program Flow:-
 
Any MapReduce program can be split into three modules:
 
1. Driver code
2. Mapper code
3. Reducer code
Note:- The Reducer program is not mandatory.

The reason is that if you wish to conclude the business logic in the mapper class itself, you can do so, and in that case no Reducer code is needed (see the snippet below).

The selection of the mapper and reducer depends on the use case at hand.
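As an aside, such a map-only job can be requested explicitly in the driver; setNumReduceTasks() is part of the standard Job API, and the job object here is the one configured in the driver code below:

// With zero reducers, the mapper output is written straight to HDFS
// as part-m-* files, and no sort-and-shuffle or reduce phase runs.
job.setNumReduceTasks(0);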
 
 
1. Driver code:
 
The Driver code generally falls under the main() method of the MapReduce program.

We deal with the following information in our driver code:
 
1) All the job-level configuration details (job name, jar file, etc.)

2) The mapper, combiner, and reducer class-level details.
 
Ex:- job.setMapperClass(TokenizerMapper.class);

        job.setCombinerClass(IntSumReducer.class);

        job.setReducerClass(IntSumReducer.class);
 
  • The final output (key, value) details
Ex:- job.setOutputKeyClass(Text.class);

        job.setOutputValueClass(IntWritable.class);

Here Text and IntWritable are the data types of the final output key and value.
 
  • Input- and output-related information for HDFS (the input path from which the MapReduce program expects the input file, and the output path in HDFS where MapReduce will produce the results).
Note:- The developer has to create the HDFS input path well in advance, before executing the MapReduce program, and upload the input file to that same path (via the HDFS CLI).

The program itself will create the HDFS output path after the successful execution of the program.
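Putting the pieces above together, a minimal driver sketch could look like the following. The class names WordCountNew, TokenizerMapper, and IntSumReducer follow the fragments used in this post; the input and output paths are passed as command-line arguments, matching the execution commands shown later:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountNew {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");   // job-level config details

        job.setJarByClass(WordCountNew.class);

        // Mapper, combiner, and reducer class-level details
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        // Final output (key, value) data types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // HDFS input and output paths, taken from the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}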
 
2. Mapper code:
 
Any mapper class should extend the Mapper base class in order to provide the mapper business logic.
 
It takes four type arguments; the first two correspond to the mapper input (key, value).

Ex:- public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>

The four type parameters are, in order: mapper input key, mapper input value, mapper output key, mapper output value.

The last two arguments correspond to the mapper output (key, value).
 
In any mapper class, we have to override a method called map().

Syntax:- public void map(Object key, Text value, Context context) throws IOException, InterruptedException
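A complete mapper along these lines, as a sketch (the tokenizing logic mirrors the classic word-count example; StringTokenizer splits each line on whitespace):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Type parameters: <input key, input value, output key, output value>
public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);   // emit <word, 1> for every token
        }
    }
}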
 
How to add external (dependent) jars

Right-click on the project name > select Build Path > Configure Build Path > go to the Libraries tab > click Add External JARs > select the appropriate jar files > click OK.
 
How to export the jar

Right-click on the project name > click Export > go to Java > select JAR file > click Next > browse the path where you wish to place the jar file > save > Finish.

Execution commands:-

 
> hadoop fs -mkdir /user/Batch4/mapreduce/input/

                               (to create the input directory)

> hadoop fs -put input.txt /user/Batch4/mapreduce/input

                               (to copy the input file to HDFS)

> hadoop jar wordcountNew.jar WordCountNew /user/Batch4/mapreduce/input/input.txt /user/Batch4/mapreduce/output

                               (to run the word count job)

> hadoop fs -cat /user/Batch4/mapreduce/output/part-r-00000

                               (to check the output)
Last updated: 21 March 2023
About Author
Ravindra Savaram

Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
