Counters in Hadoop MapReduce
A MapReduce counter provides a way to measure the progress of a job or to count the number of operations that occur within a MapReduce program.
Counters are useful for:
1. Job statistics
2. Quality control
3. Problem diagnosis
While a job is executing, if any task fails because of bad records (for example, records containing binary data), the whole job fails as well.
In that case, we need to skip the bad records, and counters help us keep track of them.
To use counters, we configure some parameters within the mapper class to collect the statistics.
At run time, these parameters are used to find out which records are bad.
Hadoop provides built-in counters, for example:
- Number of records in the mapper input (MAP_INPUT_RECORDS)
- Number of records in the mapper output (MAP_OUTPUT_RECORDS)
We can also define our own counters. The driver code reads them by calling getCounters() on the job once it has run.
Bad records will be stored in temp/log files.
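The counting idea can be sketched in plain Java without a cluster. This is a minimal sketch, not Hadoop code: the RecordCounter enum, the process method, and the rule that decides what a "bad" record is are all assumptions made for illustration. In a real job, the mapper would increment the counter with context.getCounter(RecordCounter.BAD_RECORDS).increment(1), and the driver would read it with job.getCounters().findCounter(RecordCounter.BAD_RECORDS).getValue().

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class BadRecordCounterSketch {
    // Hypothetical user-defined counters; the names are illustrative only.
    enum RecordCounter { GOOD_RECORDS, BAD_RECORDS }

    // Plain-Java stand-in for the counter store that Hadoop aggregates across tasks.
    static Map<RecordCounter, Long> process(List<String> records) {
        Map<RecordCounter, Long> counters = new EnumMap<>(RecordCounter.class);
        for (RecordCounter c : RecordCounter.values()) {
            counters.put(c, 0L);
        }
        for (String record : records) {
            // Assumed rule: a record containing control characters (other than tab) is bad.
            boolean bad = record.chars().anyMatch(ch -> ch < 0x20 && ch != '\t');
            RecordCounter key = bad ? RecordCounter.BAD_RECORDS : RecordCounter.GOOD_RECORDS;
            // A bad record is only counted here, not processed any further.
            counters.merge(key, 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        List<String> input = List.of("This is a Hadoop class", "bad\0record", "Hadoop is good");
        System.out.println(process(input));
    }
}
```

Using an enum for the counter names mirrors how user-defined counters are usually declared in Hadoop, so the same names can be used in the mapper and in the driver.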
Simple example for MapReduce:
The different phases of MapReduce programming are:
1. Map phase
2. Sort and shuffle phase
3. Reducer phase
Ex:-
This is a Hadoop class
Hadoop is the best in the market
The market is not always good
Good placements are there in Hadoop
Learning Hadoop is a good thing.
Flow:
part-m-00000 is the map output file.
part-r-00000 is the reduce output file.
Each input line reaches the mapper as a <key, value> pair, where the key is system-generated:
<100101011, This is a Hadoop class>
<2011011, Hadoop is good in the market>
<1110111, learning Hadoop is good thing>
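With Hadoop's default TextInputFormat, the system-generated key is the byte offset at which the line starts in the file, so the key values in the pairs above are best read as placeholders. The sketch below shows how such offsets could be derived in plain Java. It assumes UTF-8 text with "\n" line endings, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class LineOffsetSketch {
    // Returns (byte offset -> line) pairs, assuming UTF-8 text and '\n' line endings.
    static Map<Long, String> toRecords(String fileContent) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            records.put(offset, line);
            // +1 accounts for the newline character that ended this line.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String content = "This is a Hadoop class\nHadoop is the best in the market\n";
        toRecords(content).forEach((offset, line) ->
                System.out.println("<" + offset + ", " + line + ">"));
    }
}
```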

Mapper output:
The mapper splits each line on the space delimiter and emits a value of 1 for each and every word.
The mapper output is the input to the sort and shuffle phase.
Internally, the framework sorts and shuffles the pairs and groups the values by key.

Mapper output           Sort and shuffle output
(1 = one occurrence)
<hadoop,1>              <hadoop,1,1,1,1>
<hadoop,1>              <good,1,1,1,1>
...                     ...
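The map and sort-and-shuffle steps described above can be sketched in plain Java. This is a rough illustration rather than Hadoop's implementation: the class and method names are hypothetical, and lower-casing each word is an assumption made so that "Hadoop" and "hadoop" fall under the same key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapShuffleSketch {
    // Map phase: emit (word, 1) for every space-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                // Assumption: words are lower-cased before being emitted.
                out.add(Map.entry(token.toLowerCase(), 1));
            }
        }
        return out;
    }

    // Sort and shuffle: group the values by key; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.addAll(map("This is a Hadoop class"));
        pairs.addAll(map("Hadoop is the best in the market"));
        System.out.println(shuffle(pairs));
    }
}
```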
Reducer output:
For each key, the reducer adds up the values and writes the total:

    sum = 0
    while (values.hasNext())
    {
        value = values.next().get()
        sum = sum + value
    }
    context.write(word, sum)

For the key hadoop, this gives <hadoop,4>.
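The summing loop of the reducer can be tried out in plain Java. This is a minimal sketch with hypothetical names; a real Hadoop reducer would receive an Iterable of IntWritable values and write the result through its context instead of returning it.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class SumReduceSketch {
    // Reduce phase for one key: add up all the values grouped under that key.
    static Map.Entry<String, Integer> reduce(String word, Iterable<Integer> values) {
        int sum = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            sum = sum + it.next();
        }
        // Stand-in for context.write(word, sum).
        return Map.entry(word, sum);
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> result = reduce("hadoop", List.of(1, 1, 1, 1));
        // Prints <hadoop,4>
        System.out.println("<" + result.getKey() + "," + result.getValue() + ">");
    }
}
```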
MapReduce Programming
Program flow:-
Any MapReduce program can be split into 3 modules:
1. Driver code
2. Mapper code
3. Reducer code
Note:- The reducer code is not mandatory.
If the business logic can be completed in the mapper class itself, the job can run with a mapper only, and in that case there is no reducer code.
Whether a job uses a mapper alone or a mapper and a reducer depends on the use case.
1. Driver code:
The driver code generally lives in the main() method of the MapReduce program.
The driver code deals with the following information:
1) All the job-level configuration details (job name, jar file, etc.)
2) The mapper, combiner, and reducer class details.
Ex:- job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
- The final output (key, value) details. The data types given here can be changed to suit the job.
Ex:- job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
- The input and output locations in HDFS: the input path from which the MapReduce program reads the input file, and the output path where it writes the results.
Note:- The developer has to create the HDFS input path and upload the input file to it (using the HDFS CLI) before executing the MapReduce program.
The program itself creates the HDFS output path when it runs successfully, so this path must not already exist.
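The driver example above registers the same class as both the combiner and the reducer. That works for word count because addition is associative and commutative, so summing partial counts per map task and then summing those partial sums gives the same totals. The plain-Java sketch below illustrates this under stated assumptions: the names are hypothetical and the two "map tasks" are hand-written lists, not real Hadoop output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerSketch {
    // Sums the counts per word; used here both as the combiner and as the reducer.
    static Map<String, Integer> sumByWord(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Output of two hypothetical map tasks.
        List<Map.Entry<String, Integer>> task1 =
                List.of(Map.entry("hadoop", 1), Map.entry("hadoop", 1), Map.entry("good", 1));
        List<Map.Entry<String, Integer>> task2 =
                List.of(Map.entry("hadoop", 1), Map.entry("good", 1));

        // Combiner step: pre-aggregate each task's output locally.
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        combined.addAll(sumByWord(task1).entrySet());
        combined.addAll(sumByWord(task2).entrySet());

        // Reducer step: the same function produces the final totals.
        System.out.println(sumByWord(combined));
    }
}
```

A combiner mainly reduces the amount of data moved during the shuffle. An operation such as an average would not be safe to reuse this way without changing what is emitted.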
2. Mapper code:
Any mapper class should extend the Mapper base class in order to provide the mapper business logic.
Mapper takes 4 type arguments. The first two correspond to the mapper input (key, value) and the last two correspond to the mapper output (key, value).
Ex:- public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>
Object      - mapper input key
Text        - mapper input value
Text        - mapper output key
IntWritable - mapper output value
In any mapper class, we have to override a method called map.
Syntax:- public void map(Object key, Text value, Context context) throws IOException, InterruptedException
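To make the role of the four type arguments concrete, here is a small plain-Java sketch. SimpleMapper and Emitter are hypothetical stand-ins written for this example, not Hadoop's Mapper and Context classes, and Long, String, and Integer stand in for Hadoop's writable types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapperShapeSketch {
    // Hypothetical stand-in for the framework's output collector (the "context").
    interface Emitter<KO, VO> {
        void write(KO key, VO value);
    }

    // Hypothetical mapper base type: KI and VI describe the input (key, value),
    // KO and VO describe the output (key, value).
    interface SimpleMapper<KI, VI, KO, VO> {
        void map(KI key, VI value, Emitter<KO, VO> context);
    }

    // Input: (line offset, line text). Output: (word, 1).
    static class TokenizerMapper implements SimpleMapper<Long, String, String, Integer> {
        @Override
        public void map(Long key, String value, Emitter<String, Integer> context) {
            for (String token : value.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    context.write(token, 1);
                }
            }
        }
    }

    // Runs the mapper on one line and collects everything it emits.
    static List<Map.Entry<String, Integer>> run(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        new TokenizerMapper().map(0L, line, (k, v) -> out.add(Map.entry(k, v)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("This is a Hadoop class"));
    }
}
```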
How to add the external (dependent) jars:
1. Right-click on the project name
2. Select Build Path
3. Click on Configure Build Path and go to the Libraries tab
4. Click on Add External JARs
5. Select the appropriate jar files
6. Click on OK
How to export the jar:
1. Right-click on the project name
2. Click on Export
3. Go to Java
4. Select JAR file
5. Click on Next
6. Browse to the path where you wish to place the jar file
7. Click on Save, then Finish
Execution commands:-
> hadoop fs -mkdir /user/Batch4/mapreduce/input/
To create the directory
> hadoop fs -put input.txt /user/Batch4/mapreduce/input
To copy the input file to HDFS
> hadoop jar wordcountNew.jar wordcountnew /user/Batch4/mapreduce/input/input.txt /user/Batch4/mapreduce/output
To run the job
> hadoop fs -cat /user/Batch4/mapreduce/output/part-r-00000
To check the output