Using Counters in Hadoop MapReduce API with Example


Counters in Hadoop MapReduce

A MapReduce counter provides a way to measure the progress of a job, or the number of operations that occur within MapReduce programs.

Counters are useful for:

  1. Job statistics
  2. Quality control
  3. Problem diagnosis

While a job is executing, if any one of its tasks fails because of bad records (for example, records containing binary data), the whole job also fails.
To prevent that, the mapper has to skip the bad records, and counters keep track of how many records were skipped.

For counters, we have to configure some parameters within the mapper class to collect the statistics.

At run time, these parameters are used to find out which records are bad.

Hadoop also provides built-in counters, for example:

  • Number of records in the mapper input (Map input records)
  • Number of records in the mapper output (Map output records)

We can also have user-defined counters; their values are read with getCounters(), which has to be called in the driver code.

Bad records will be stored in temp/log files.
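
The bad-record handling described above can be sketched in plain Java. This is a minimal illustration, not Hadoop code: CounterBook and RecordCounter are hypothetical stand-ins so the example runs without a cluster, and the rule for what counts as a bad record (empty, or containing a control character) is an assumption made for the example. In a real job the mapper would typically call context.getCounter(...).increment(1), and the driver would read the totals through job.getCounters().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of counting and skipping bad records (no Hadoop classes).
class BadRecordCounterSketch {

    // User-defined counters are usually declared as an enum.
    enum RecordCounter { GOOD_RECORDS, BAD_RECORDS }

    // Hypothetical stand-in for the task context: one running total per counter.
    static class CounterBook {
        private final Map<RecordCounter, Long> totals = new EnumMap<>(RecordCounter.class);

        void increment(RecordCounter counter, long amount) {
            totals.merge(counter, amount, Long::sum);
        }

        long getValue(RecordCounter counter) {
            return totals.getOrDefault(counter, 0L);
        }
    }

    // Assumption for illustration: a record is bad if it is empty or
    // contains a control (binary) character other than a tab.
    static boolean isBadRecord(String record) {
        if (record.isEmpty()) {
            return true;
        }
        for (char c : record.toCharArray()) {
            if (c < 0x20 && c != '\t') {
                return true;
            }
        }
        return false;
    }

    // Mapper-side logic: count every record, keep only the good ones.
    static List<String> filterRecords(List<String> records, CounterBook counters) {
        List<String> kept = new ArrayList<>();
        for (String record : records) {
            if (isBadRecord(record)) {
                counters.increment(RecordCounter.BAD_RECORDS, 1);
                continue; // skip the bad record instead of failing the task
            }
            counters.increment(RecordCounter.GOOD_RECORDS, 1);
            kept.add(record);
        }
        return kept;
    }

    public static void main(String[] args) {
        CounterBook counters = new CounterBook();
        List<String> input = Arrays.asList("This is a hadoop class", "bin" + (char) 1 + "ary", "");
        List<String> kept = filterRecords(input, counters);
        // Driver-side logic: read the totals once processing has finished.
        System.out.println("kept=" + kept.size()
                + " good=" + counters.getValue(RecordCounter.GOOD_RECORDS)
                + " bad=" + counters.getValue(RecordCounter.BAD_RECORDS));
        // prints: kept=1 good=1 bad=2
    }
}
```

Declaring the counters as an enum keeps their names in one place, so the code that increments them and the code that reads them cannot drift apart.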

Simple example for MapReduce:

The different phases of MapReduce programming are:

  1. Map phase
  2. Sort and shuffle phase
  3. Reducer phase


This is a hadoop class

Hadoop is good in the market

The market is not always good

Good placements are there in hadoop

Learning hadoop is a good thing.



   1-1 Relation

Part-m-00000: Map file

Part-r-00000: Reduce file

<100101011, This is a hadoop class>

(The key is system generated.)

<2011011, hadoop is good in market>

<1110111, learning hadoop is good thing>
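
As a rough sketch of where a system-generated key can come from: assuming the default text input format, the key for each line is the byte offset at which that line starts in the file. The class and method names here are illustrative, and the sketch assumes UTF-8 text with newline line endings. The key values in the listing above are illustrative and will not match real offsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch: turn file contents into <byte offset, line> records.
class LineOffsetSketch {

    // Assumes '\n' line endings and UTF-8 encoding.
    static Map<Long, String> toRecords(String fileContents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.put(offset, line);
            // Advance past the line's bytes plus one byte for the newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String contents = "This is a hadoop class\nThe market is not always good\n";
        for (Map.Entry<Long, String> record : toRecords(contents).entrySet()) {
            System.out.println("<" + record.getKey() + ", " + record.getValue() + ">");
        }
    }
}
```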

Mapper Output:

The mapper takes space as the delimiter in each line and emits a value of 1 for each and every word:

<This,1>

<is,1>

<Hadoop,1>

<hadoop,1>
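
The tokenizing step can be sketched in plain Java as follows. MapStepSketch is a hypothetical name and the sketch does not use the Hadoop API; it only shows one line being split on whitespace and each word being paired with 1.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the map step of word count.
class MapStepSketch {

    // Splits one input line on whitespace and emits a <word, 1> pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) { // a blank line yields no pairs
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : map("This is a hadoop class")) {
            System.out.println("<" + pair.getKey() + "," + pair.getValue() + ">");
        }
    }
}
```

Note that the words are kept exactly as written, so Hadoop and hadoop would be emitted as two different keys unless the mapper normalizes case.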






The mapper output is the input for the sort and shuffle phase.

Internally, the framework sorts and shuffles the pairs and groups the values for each key.

Mapper output        Sort and shuffle output

<This,1>             <This,1>   (1 = number of occurrences)
<is,1>               <is,1,1,1,1>
<hadoop,1>           <hadoop,1,1,1,1>
<class,1>            <class,1>
<hadoop,1>           <good,1,1,1,1>
…                    …
<thing,1>            <thing,1>
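
The grouping that sort and shuffle performs can be approximated in plain Java. This is only a sketch under simplifying assumptions: everything happens in memory on one machine, a TreeMap stands in for the framework's sorting by key, and ShuffleSketch and pair are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of sort and shuffle: group mapper pairs by key, sorted by key.
class ShuffleSketch {

    // Helper that builds one <word, 1> mapper output pair.
    static Map.Entry<String, Integer> pair(String word) {
        return new SimpleEntry<>(word, 1);
    }

    // Collects all values that share a key into one list per key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // keeps keys sorted
        for (Map.Entry<String, Integer> p : mapperOutput) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapperOutput = new ArrayList<>();
        for (String word : new String[] {"this", "is", "a", "hadoop", "class", "hadoop", "is"}) {
            mapperOutput.add(pair(word));
        }
        for (Map.Entry<String, List<Integer>> group : shuffle(mapperOutput).entrySet()) {
            System.out.println("<" + group.getKey() + "," + group.getValue() + ">");
        }
    }
}
```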

The reducer logic is:

sum = 0
for (value : values)
{
    sum = sum + value.get()
}
context.write(word, sum)

The reducer output is:

<This,1>
<is,4>
<hadoop,4>
<class,1>
<good,4>
…
<thing,1>
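
A runnable plain-Java version of the same summing logic might look like this. ReduceStepSketch is a hypothetical name, and ordinary Integer values stand in for Hadoop's writable types.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the reduce step of word count.
class ReduceStepSketch {

    // Sums the grouped values for one key. The key is unused in the sum itself;
    // it is kept in the signature to mirror a reducer receiving (key, values).
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum = sum + value;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("This", Arrays.asList(1));
        grouped.put("is", Arrays.asList(1, 1, 1, 1));
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int total = reduce(group.getKey(), group.getValue());
            System.out.println("<" + group.getKey() + "," + total + ">");
        }
    }
}
```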

Map Reduce Programming

Program Flow:-

Any Map Reduce program can be split into 3 modules

  1. Driver code
  2. Mapper code
  3. Reducer code

Note: The reducer is not mandatory.

       If the business logic can be completed in the mapper class itself, the job can run without any reducer code.

       Whether a mapper alone or both a mapper and a reducer are needed depends on the use case.

1. Driver code:

The driver code generally goes in the main() method of the MapReduce program.

The driver code deals with the following information:

1) All the job-level configuration details (job name, jar file, etc.)

2) Mapper, combiner and reducer class-level details.

Ex: job.setMapperClass(TokenizeMapper.class);

        job.setCombinerClass(IntSumReduce.class);

        job.setReducerClass(IntSumReduce.class);

  • The final output (key, value) details

Ex: job.setOutputKeyClass(Text.class);            // data type of the output key; it changes from job to job

        job.setOutputValueClass(IntWritable.class);   // data type of the output value

  • Input and output information for HDFS: the input path from which the MapReduce program expects to read the input file, and the output path where it will write the results.

Note: The developer has to create the HDFS input path well in advance, before executing the MapReduce program, and upload the input file to that path (using the HDFS CLI).

The program itself creates the HDFS output path when it runs successfully, so the output path must not exist beforehand.

2. Mapper code:

Any mapper class should extend the Mapper base class in order to provide the mapper business logic.

The Mapper class takes 4 type arguments; the first two correspond to the mapper input (key, value).

Ex: public static class TokenizeMapper extends

        Mapper<Object, Text, Text, IntWritable>

        Object       : mapper input key
        Text         : mapper input value
        Text         : mapper output key
        IntWritable  : mapper output value

The last two arguments correspond to the mapper output (key, value).

In any mapper class, we have to override a method called map.

Syntax: public void map(Object key, Text value, Context context) throws IOException, InterruptedException

How to add the external (dependent) jars:

Right-click on the project name > select Build Path > Configure Build Path > go to the Libraries tab > click on Add External JARs > select the appropriate jar files > click OK.

How to export the jar:

Right-click on the project name > click Export > go to Java > select JAR file > click Next > browse to the path where you wish to place the jar file > Save > Finish.

Execution commands:

> hadoop fs -mkdir /user/Batch4/mapreduce/input/
        (to create the directory)

> hadoop fs -put input.txt /user/Batch4/mapreduce/input
        (to copy the input file to HDFS)

> hadoop jar WordCountNew.jar WordCountNew /user/Batch4/mapreduce/input/input.txt /user/Batch4/mapreduce/output
        (to run the job)

> hadoop fs -cat /user/Batch4/mapreduce/output/part-r-00000
        (to check the output)


Copy Rights Reserved © Mindmajix.com All rights reserved. Disclaimer.